Help for package SurveyStat

Type: Package
Title: Survey Data Cleaning, Weighting and Analysis
Version: 1.0.3
Description: Provides utilities for cleaning survey data, computing weights, and performing descriptive statistical analysis. Methods follow Lohr (2019, ISBN:978-0367272454) "Sampling: Design and Analysis" and Lumley (2010) <doi:10.1002/9780470580066>.
License: GPL-3
Encoding: UTF-8
LazyData: true
Depends: R (≥ 4.0.0)
Imports: dplyr, ggplot2, rlang, stats
Suggests: knitr, rmarkdown, markdown, testthat
VignetteBuilder: knitr
RoxygenNote: 7.3.3
NeedsCompilation:
Packaged: 2026-01-19 05:21:43 UTC; HP Computers
Author: Muhammad Ali [aut, cre]
Maintainer: Muhammad Ali <aliawan1170@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-22 21:20:02 UTC

Apply survey weights to data

Description

This function applies survey weights by returning a weighted version of the dataset. The weights are rescaled so that they sum to the sample size (the number of rows), which keeps weighted computations numerically stable.

Usage

apply_weights(data, weight_col)

Arguments

data

A data.frame containing survey data

weight_col

Character string specifying column name containing weights

Value

A data.frame with normalized weights

Examples

data <- data.frame(age = c(25, 30, 35), weight = c(1.2, 0.8, 1.0))
weighted_data <- apply_weights(data, "weight")
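
The normalization described above can be sketched in base R. This is a minimal illustration, not the SurveyStat implementation: the helper name normalize_weights is hypothetical, and it assumes non-negative weights with no missing values.

```r
# Hypothetical helper: rescale weights so they sum to the sample size.
# Assumes w is numeric, non-negative, and free of NA values.
normalize_weights <- function(w) {
  stopifnot(is.numeric(w), !anyNA(w), all(w >= 0), sum(w) > 0)
  w * length(w) / sum(w)
}

w <- c(2, 1, 1)
w_norm <- normalize_weights(w)
w_norm       # 1.50 0.75 0.75
sum(w_norm)  # 3, the number of observations
```

Rescaling by a constant leaves weighted means and proportions unchanged; only weighted totals are affected.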

Clean missing values in specified column

Description

This function replaces missing values in a column using the specified imputation method. It supports mean, median, and mode imputation for numeric variables.

Usage

clean_missing(data, col, method = c("mean", "median", "mode"))

Arguments

data

A data.frame containing survey data

col

Character string specifying column name to clean

method

Character string specifying imputation method ("mean", "median", or "mode")

Value

A data.frame with missing values imputed

Examples

data <- data.frame(age = c(25, NA, 30, NA, 35))
clean_data <- clean_missing(data, "age", method = "mean")
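
The three imputation methods can be illustrated with a small base R sketch. The helper impute_column is hypothetical and is not taken from the package; in particular, how SurveyStat breaks ties for the mode is not documented here, so the sketch simply takes the first most frequent value.

```r
# Hypothetical helper: fill NA values in a numeric vector.
impute_column <- function(x, method = c("mean", "median", "mode")) {
  method <- match.arg(method)
  observed <- x[!is.na(x)]
  fill <- switch(method,
    mean = mean(observed),
    median = stats::median(observed),
    mode = {
      # Most frequent observed value; ties go to the value seen first.
      ux <- unique(observed)
      ux[which.max(tabulate(match(observed, ux)))]
    }
  )
  x[is.na(x)] <- fill
  x
}

age <- c(25, NA, 30, NA, 35)
impute_column(age, "mean")               # NA values become 30
impute_column(c(1, 2, 2, NA), "mode")    # NA becomes 2
```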

Generate cross-tabulation table with chi-square test

Description

This function creates a cross-tabulation between two categorical variables and performs a chi-square test of independence. Can incorporate survey weights.

Usage

cross_tabulation(data, col1, col2, weight_col = NULL)

Arguments

data

A data.frame containing survey data

col1

Character string specifying first categorical variable

col2

Character string specifying second categorical variable

weight_col

Character string specifying column name containing weights (optional)

Value

A list containing cross-tabulation and chi-square test results

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), 
                   education = c("HS", "College", "HS", "College"))
cross_tab <- cross_tabulation(data, "gender", "education")
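
For orientation, the same kind of result can be approximated with base R and stats alone. This is a sketch under assumptions, not the package's code: the data are made up, and this help page does not say how cross_tabulation() uses the weights in the test, so the chi-square test below is run on the unweighted counts only.

```r
# Illustrative data (not from the package).
survey <- data.frame(
  gender    = c("M", "F", "M", "F", "M", "F"),
  education = c("HS", "College", "HS", "College", "College", "HS"),
  weight    = c(1, 1.2, 0.8, 1.1, 0.9, 1.0)
)

# Unweighted counts and weighted cell totals.
counts   <- table(survey$gender, survey$education)
weighted <- stats::xtabs(weight ~ gender + education, data = survey)

# Pearson chi-square test of independence on the unweighted counts.
# Tiny samples trigger an approximation warning, suppressed here.
test <- suppressWarnings(stats::chisq.test(counts))
test$p.value
```

A plain Pearson test applied to weighted counts does not account for the sampling design, so weighted p-values should be read with care.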

Generate comprehensive survey description

Description

This function provides a comprehensive description of survey data including sample size, variable types, missing value patterns, and basic statistics. Can incorporate survey weights if provided.

Usage

describe_survey(data, weight_col = NULL)

Arguments

data

A data.frame containing survey data

weight_col

Character string specifying column name containing weights (optional)

Value

A list containing descriptive statistics

Examples

data <- data.frame(
  age = c(25, 30, 35),
  gender = c("M", "F", "M"),
  weight = c(1.2, 0.8, 1.0)
)
desc <- describe_survey(data)
desc_weighted <- describe_survey(data, "weight")
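
The kind of summary described above (sample size, variable types, missing values) can be assembled in base R as follows. The structure of the list is an assumption for illustration; the element names returned by describe_survey() are not documented on this page.

```r
# Illustrative data with one missing value.
survey <- data.frame(
  age    = c(25, NA, 35),
  gender = c("M", "F", "M"),
  weight = c(1.2, 0.8, 1.0)
)

# Hypothetical summary structure.
overview <- list(
  n       = nrow(survey),
  types   = vapply(survey, function(col) class(col)[1], character(1)),
  missing = colSums(is.na(survey))
)
overview$missing  # age has 1 missing value, the other columns 0
```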

Example Survey Dataset

Description

A small example dataset used to demonstrate SurveyStat functions.

Usage

example_survey

Format

A data frame with 10 rows and 5 variables:

Age: Numeric age of respondent
Gender: Gender of respondent (Male/Female)
Education: Education level (High School/Bachelor/Graduate)
Income: Numeric income value
Weight: Survey weight

Source

Simulated data for demonstration purposes

Generate frequency table for categorical variable

Description

This function creates a frequency table for a categorical variable, optionally incorporating survey weights.

Usage

frequency_table(data, col, weight_col = NULL)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for categorical variable

weight_col

Character string specifying column name containing weights (optional)

Value

A data.frame with frequency statistics

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), weight = c(1, 1.2, 0.8, 1.1))
freq_table <- frequency_table(data, "gender")
weighted_freq <- frequency_table(data, "gender", "weight")
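
A weighted frequency table sums the weights within each category instead of counting rows. The base R sketch below shows the idea; the helper freq_sketch and its column names are hypothetical and may differ from what frequency_table() returns.

```r
# Hypothetical helper: weighted frequencies and percentages.
# With the default unit weights this reduces to plain counts.
freq_sketch <- function(x, w = rep(1, length(x))) {
  totals <- vapply(split(w, x), sum, numeric(1))
  data.frame(
    category  = names(totals),
    frequency = unname(totals),
    percent   = 100 * unname(totals) / sum(totals)
  )
}

gender <- c("M", "F", "M", "F")
weight <- c(1, 1.2, 0.8, 1.1)
freq_sketch(gender)          # F = 2, M = 2
freq_sketch(gender, weight)  # F = 2.3, M = 1.8
```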

Declare global variables to suppress NOTES in CRAN checks

Description

This file declares variables that are used in non-standard evaluation contexts (dplyr pipelines, ggplot aesthetics) to avoid "no visible binding" notes during CRAN checks.

Create publication-quality box plot

Description

This function creates a clean, publication-quality box plot for numeric variables, optionally grouped by a categorical variable.

Usage

plot_boxplot(data, col, group_col = NULL, add_points = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for numeric variable

group_col

Character string specifying column name for grouping variable (optional)

add_points

Logical; whether to add individual data points (default: TRUE)

Value

A ggplot object

Examples

data <- data.frame(age = c(25, 30, 35, 40, 45), gender = c("M", "F", "M", "F", "M"))
box_plot <- plot_boxplot(data, "age")
grouped_box <- plot_boxplot(data, "age", "gender")

Create publication-quality histogram

Description

This function creates a clean, publication-quality histogram for numeric variables using ggplot2 with minimal theme and appropriate statistical overlays.

Usage

plot_histogram(data, col, bins = 30, add_density = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for numeric variable

bins

Number of bins for histogram (default: 30)

add_density

Logical; whether to add a density curve (default: TRUE)

Value

A ggplot object

Examples

set.seed(123)  # make the simulated ages reproducible
data <- data.frame(age = rnorm(100, 35, 10))
hist_plot <- plot_histogram(data, "age")
print(hist_plot)

Create weighted bar plot for categorical variables

Description

This function creates a bar plot for categorical variables, optionally using survey weights to show weighted frequencies.

Usage

plot_weighted_bar(data, col, weight_col = NULL, show_percentages = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for categorical variable

weight_col

Character string specifying column name containing weights (optional)

show_percentages

Logical; whether to show percentage labels (default: TRUE)

Value

A ggplot object

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), weight = c(1, 1.2, 0.8, 1.1))
bar_plot <- plot_weighted_bar(data, "gender")
weighted_bar <- plot_weighted_bar(data, "gender", "weight")

Rake survey weights to match population targets

Description

This function implements simple raking (iterative proportional fitting) to adjust survey weights to match known population marginal totals. Assumes two-dimensional raking for simplicity.

Usage

rake_weights(data, population_targets, weight_col = "weight")

Arguments

data

A data.frame containing survey data

population_targets

Named list with population totals for each variable

weight_col

Character string specifying initial weight column name

Value

A data.frame with raked weights

Examples

# Assuming we have gender and education population totals
targets <- list(
  gender = c(Male = 1000000, Female = 1050000),
  education = c(HighSchool = 800000, Bachelor = 900000, Graduate = 350000)
)
data <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male"), 
  education = c("HighSchool", "Bachelor", "Bachelor", "HighSchool", "Graduate"),
  weight = c(1, 1, 1, 1, 1)
)
raked_data <- rake_weights(data, targets, "weight")
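
Raking alternates between the variables, rescaling the weights within each category so that the weighted margin matches its target, and repeats until the weights stop changing. The sketch below shows that loop in base R. It is an illustration under assumptions, not the SurveyStat source: rake_sketch, max_iter, and tol are hypothetical names, and every observed category is assumed to have a target.

```r
# Hypothetical helper: iterative proportional fitting on the margins.
rake_sketch <- function(data, targets, weight_col = "weight",
                        max_iter = 100, tol = 1e-8) {
  w <- data[[weight_col]]
  for (iter in seq_len(max_iter)) {
    w_before <- w
    for (v in names(targets)) {
      groups  <- as.character(data[[v]])
      current <- vapply(split(w, groups), sum, numeric(1))
      stopifnot(all(names(current) %in% names(targets[[v]])))
      # Scale each category so its weighted total hits the target.
      adj <- targets[[v]][names(current)] / current
      w <- w * unname(adj[groups])
    }
    # Stop once a full pass barely changes the weights.
    if (max(abs(w - w_before)) < tol * max(w)) break
  }
  data[[weight_col]] <- w
  data
}

targets <- list(
  gender = c(Male = 1000000, Female = 1050000),
  education = c(HighSchool = 800000, Bachelor = 900000, Graduate = 350000)
)
survey <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male"),
  education = c("HighSchool", "Bachelor", "Bachelor", "HighSchool", "Graduate"),
  weight = c(1, 1, 1, 1, 1)
)
raked <- rake_sketch(survey, targets)
tapply(raked$weight, raked$gender, sum)     # close to the gender targets
tapply(raked$weight, raked$education, sum)  # close to the education targets
```

The targets for the two variables must add up to the same population total, as they do here (2,050,000); otherwise the margins cannot all be matched at once.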

Remove duplicate rows from survey data

Description

This function identifies and removes rows that are exact duplicates across all columns. The first occurrence of each duplicated row is kept.

Usage

remove_duplicates(data)

Arguments

data

A data.frame containing survey data

Value

A data.frame with duplicates removed

Examples

data <- data.frame(id = c(1, 2, 2, 3), age = c(25, 30, 30, 35))
clean_data <- remove_duplicates(data)
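
The behaviour described above, full-row duplicates with the first occurrence kept, corresponds to the following base R idiom. It is shown as an equivalent sketch; the package may implement it differently (for example with dplyr).

```r
survey <- data.frame(id = c(1, 2, 2, 3), age = c(25, 30, 30, 35))

# duplicated() flags the second and later copies of a row, so negating it
# keeps the first occurrence of each distinct row.
deduped <- survey[!duplicated(survey), , drop = FALSE]
nrow(deduped)  # 3
```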

Standardize categorical values

Description

This function standardizes categorical variables by mapping values to standardized categories. Useful for consolidating different representations of the same category.

Usage

standardize_categories(data, col, mapping)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name to standardize

mapping

Named list or vector mapping old values to new values

Value

A data.frame with standardized categories

Examples

data <- data.frame(gender = c("M", "Male", "F", "Female", "m"))
mapping <- list("M" = "Male", "Male" = "Male", "F" = "Female", "Female" = "Female", "m" = "Male")
clean_data <- standardize_categories(data, "gender", mapping)
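
Recoding with a named mapping amounts to a lookup by name. The sketch below uses a hypothetical helper, recode_values, and makes one assumption that this page does not confirm for standardize_categories(): values without an entry in the mapping are left unchanged.

```r
# Hypothetical helper: look each value up in a named mapping.
recode_values <- function(x, mapping) {
  mapping <- unlist(mapping)            # accept a named list or vector
  x <- as.character(x)
  recoded <- unname(mapping[x])
  # Assumption: unmapped values keep their original label.
  ifelse(is.na(recoded), x, recoded)
}

mapping <- list("M" = "Male", "Male" = "Male", "F" = "Female",
                "Female" = "Female", "m" = "Male")
recode_values(c("M", "Male", "F", "Female", "m", "Other"), mapping)
# "Male" "Male" "Female" "Female" "Male" "Other"
```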

Calculate weighted mean

Description

This function calculates the weighted mean of a numeric variable. It uses the standard weighted mean formula: sum(x * w) / sum(w).

Usage

weighted_mean(data, target_col, weight_col)

Arguments

data

A data.frame containing survey data

target_col

Character string specifying column name for target variable

weight_col

Character string specifying column name containing weights

Value

Numeric weighted mean

Examples

data <- data.frame(income = c(50000, 75000, 100000), weight = c(1.2, 0.8, 1.0))
weighted_income <- weighted_mean(data, "income", "weight")
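
The formula sum(x * w) / sum(w) can be checked by hand on the example values and compared with stats::weighted.mean(), which computes the same quantity. Missing-value handling in the package function is not covered by this sketch.

```r
x <- c(50000, 75000, 100000)
w <- c(1.2, 0.8, 1.0)

# Numerator: 60000 + 60000 + 100000 = 220000; denominator: 3
manual <- sum(x * w) / sum(w)
manual                                   # 73333.33
all.equal(manual, stats::weighted.mean(x, w))  # TRUE
```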

Calculate weighted total

Description

This function calculates the weighted total of a numeric variable. Useful for estimating population totals from survey data.

Usage

weighted_total(data, target_col, weight_col)

Arguments

data

A data.frame containing survey data

target_col

Character string specifying column name for target variable

weight_col

Character string specifying column name containing weights

Value

Numeric weighted total

Examples

data <- data.frame(income = c(50000, 75000, 100000), weight = c(1000, 800, 1200))
total_income <- weighted_total(data, "income", "weight")
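
A weighted total is the sum of each value multiplied by its weight. When a weight is read as the number of population units a respondent represents, the result estimates the population total. A base R check on the example values, offered as a sketch rather than the package's code:

```r
income <- c(50000, 75000, 100000)
weight <- c(1000, 800, 1200)

# 50000 * 1000 + 75000 * 800 + 100000 * 1200
total <- sum(income * weight)
total  # 2.3e+08
```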