
Basic imputation function for missing values
leo_impute_na.Rd
This function provides multiple methods for imputing missing values in a dataset, including mean, median, random forest, KNN, and multiple imputation.
Examples
# Create sample data with missing values
set.seed(123)
sample_data <- data.frame(
  age = c(25, 30, NA, 40, 45),
  score = c(80, NA, 90, 85, NA),
  group = factor(c("A", "B", "A", NA, "B"))
)
# Mean imputation
leo_impute_na(sample_data, method = "mean")
#> ℹ [10:00:42] Missing values per column (count/percentage):
#> ℹ [10:00:42] age: 1/5 (20%)
#> ℹ [10:00:42] score: 2/5 (40%)
#> ℹ [10:00:42] group: 1/5 (20%)
#> ! [10:00:42] Remaining missing values after imputation:
#> ! [10:00:42] group: 1
#>   age score group
#> 1  25    80     A
#> 2  30    85     B
#> 3  35    90     A
#> 4  40    85  <NA>
#> 5  45    85     B
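The mean-imputation output above is consistent with replacing each numeric column's missing entries by that column's mean, while leaving the factor column `group` untouched (hence the "Remaining missing values" warning). A minimal base-R sketch of that behaviour follows; `impute_mean_sketch` is a hypothetical helper written for illustration and is not the package's implementation.

```r
# Hypothetical helper (not part of the package): mean-impute numeric columns only.
impute_mean_sketch <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      # Replace missing entries with the mean of the observed values.
      df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
    }
    # Non-numeric columns (e.g. factors) are left as-is in this sketch.
  }
  df
}

sample_data <- data.frame(
  age = c(25, 30, NA, 40, 45),
  score = c(80, NA, 90, 85, NA),
  group = factor(c("A", "B", "A", NA, "B"))
)

imputed <- impute_mean_sketch(sample_data)
imputed$age    # 25 30 35 40 45
imputed$score  # 80 85 90 85 85
```

Under this assumption, `age` is filled with 35 (the mean of 25, 30, 40, 45) and `score` with 85 (the mean of 80, 90, 85), matching the printed result; the `NA` in `group` remains because a mean is undefined for a factor.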
# Random forest imputation
leo_impute_na(sample_data, method = "rf")
#> ℹ [10:00:42] Missing values per column (count/percentage):
#> ℹ [10:00:42] age: 1/5 (20%)
#> ℹ [10:00:42] score: 2/5 (40%)
#> ℹ [10:00:42] group: 1/5 (20%)
#> Missing value imputation by random forests
#>
#> Variables to impute: age, group, score
#> Variables used to impute: age, score, group
#>
#> iter 1
#>
#> ℹ [10:00:42] Imputation completed using rf method
#>   age score group
#> 1  25 80.00     A
#> 2  30 84.91     B
#> 3  40 90.00     A
#> 4  40 85.00     A
#> 5  45 84.91     B
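The "Missing values per column (count/percentage)" summary logged at the start of each call can be reproduced independently in base R, which is a convenient way to check a dataset before and after imputation. This is a sketch using only `is.na()` and `colSums()`; the variable names `na_count` and `na_pct` are illustrative, and it makes no claim about how the package computes its log messages.

```r
sample_data <- data.frame(
  age = c(25, 30, NA, 40, 45),
  score = c(80, NA, 90, 85, NA),
  group = factor(c("A", "B", "A", NA, "B"))
)

# Count missing values per column, then express them as a percentage of rows.
na_count <- colSums(is.na(sample_data))
na_pct <- 100 * na_count / nrow(sample_data)

na_summary <- data.frame(count = na_count, percent = na_pct)
na_summary
# age: 1 (20%), score: 2 (40%), group: 1 (20%)
```

Running the same two lines on an imputed result shows at a glance whether any column still contains missing values.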