
Imputes missing values in a data frame using one of several methods: mean, median, random forest, K-nearest neighbors (KNN), or multiple imputation.

Usage

leo_impute_na(data, method = "mean", ...)

Arguments

data

A data frame that contains missing values

method

Imputation method, one of "mean" (the default), "median", "rf" (random forest), "knn" (K-nearest neighbors), or "mice" (multiple imputation)

...

Additional arguments passed to the imputation function for the selected method

Value

A data frame with missing values imputed

Examples

# Create sample data with missing values
set.seed(123)
sample_data <- data.frame(
  age = c(25, 30, NA, 40, 45),
  score = c(80, NA, 90, 85, NA),
  group = factor(c("A", "B", "A", NA, "B"))
)

# Mean imputation (the factor column `group` keeps its NA, as the warning below reports)
leo_impute_na(sample_data, method = "mean")
#>  [10:00:42] Missing values per column (count/percentage):
#>  [10:00:42]   age: 1/5 (20%)
#>  [10:00:42]   score: 2/5 (40%)
#>  [10:00:42]   group: 1/5 (20%)
#> ! [10:00:42] Remaining missing values after imputation:
#> ! [10:00:42]   group: 1
#>   age score group
#> 1  25    80     A
#> 2  30    85     B
#> 3  35    90     A
#> 4  40    85  <NA>
#> 5  45    85     B

# Random forest imputation
leo_impute_na(sample_data, method = "rf")
#>  [10:00:42] Missing values per column (count/percentage):
#>  [10:00:42]   age: 1/5 (20%)
#>  [10:00:42]   score: 2/5 (40%)
#>  [10:00:42]   group: 1/5 (20%)
#> Missing value imputation by random forests
#> 
#> Variables to impute:		age, group, score
#> Variables used to impute:	age, score, group
#> 
#> iter 1 
#> 
#>  [10:00:42] Imputation completed using rf method
#>   age score group
#> 1  25 80.00     A
#> 2  30 84.91     B
#> 3  40 90.00     A
#> 4  40 85.00     A
#> 5  45 84.91     B
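
The values in the mean-imputation output above can be checked by hand. The following is a minimal base-R sketch of column-wise mean imputation, not the implementation used by leo_impute_na(); the helper name impute_mean_sketch is hypothetical and exists only for this illustration. It assumes that only numeric columns are filled, which matches the output shown for method = "mean".

```r
# Same sample data as in the examples above
sample_data <- data.frame(
  age = c(25, 30, NA, 40, 45),
  score = c(80, NA, 90, 85, NA),
  group = factor(c("A", "B", "A", NA, "B"))
)

# Hypothetical helper: replace NA in each numeric column with that column's mean.
# Non-numeric columns (such as the factor `group`) are left untouched.
impute_mean_sketch <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
    }
  }
  df
}

result <- impute_mean_sketch(sample_data)

# age:   (25 + 30 + 40 + 45) / 4 = 35
# score: (80 + 90 + 85) / 3      = 85
print(result)

# Count the missing values that remain in each column
print(colSums(is.na(result)))
```

In this sketch the single NA in group remains, because a mean is not defined for a factor. To fill categorical columns as well, one of the model-based methods such as "rf" is the more natural choice, as the second example shows.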