variable_select selects observation variables based on the specified variable selection method.

variable_select(
  population,
  variables,
  sample = NULL,
  operation = "variance_threshold",
  ...
)

Arguments

population

tbl with grouping (metadata) and observation variables.

variables

character vector specifying observation variables.

sample

tbl containing the sample used by some variable selection methods. sample has the same structure as population. Defaults to NULL.

operation

optional character string specifying the method for variable selection. This must be one of "variance_threshold" (the default), "correlation_threshold", or "drop_na_columns".

...

arguments passed to selection operation.
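The way an operation string and the extra arguments in ... fit together can be sketched in base R. This is only an illustration of the calling pattern, not the package's implementation: select_sketch, zero_variance_filter, na_fraction_filter, and the cutoff parameter are hypothetical names introduced here.

```r
# Hypothetical stand-ins for selection operations. Each returns the names of
# the variables to exclude. They are NOT the package's own methods.
zero_variance_filter <- function(population, variables) {
  v <- vapply(
    variables,
    function(name) stats::var(population[[name]], na.rm = TRUE),
    numeric(1)
  )
  names(v)[is.na(v) | v == 0]
}

na_fraction_filter <- function(population, variables, cutoff = 0.05) {
  frac <- vapply(
    variables,
    function(name) mean(is.na(population[[name]])),
    numeric(1)
  )
  names(frac)[frac > cutoff]
}

# Hypothetical dispatcher: validate the operation string, forward `...` to the
# chosen operation, then drop the excluded columns.
select_sketch <- function(population, variables,
                          operation = c("zero_variance", "na_fraction"), ...) {
  operation <- match.arg(operation)
  excluded <- switch(operation,
    zero_variance = zero_variance_filter(population, variables, ...),
    na_fraction = na_fraction_filter(population, variables, ...)
  )
  population[, setdiff(names(population), excluded), drop = FALSE]
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(NA, NA, 1, 2), c = c(5, 5, 5, 5))

# `cutoff` travels through `...` to na_fraction_filter; `b` is half NA.
kept_na <- names(select_sketch(df, c("a", "b", "c"), "na_fraction", cutoff = 0.25))

# With no operation given, the first choice ("zero_variance") is used.
kept_var <- names(select_sketch(df, c("a", "b", "c")))
```

In this sketch an unknown operation string fails early in match.arg, and any argument the chosen operation does not accept raises an error instead of being silently ignored.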

Value

variable-selected data of the same class as population.

Examples

# In this example, we use `correlation_threshold` as the operation for
# variable selection.
suppressMessages(suppressWarnings(library(magrittr)))
population <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100) / 1000
)
population %<>% dplyr::mutate(z = x + rnorm(100) / 10)
sample <- population %>% dplyr::slice(1:30)
variables <- c("x", "y", "z")
operation <- "correlation_threshold"
cor(sample)
#>             x           y           z
#> x  1.00000000 -0.08022343  0.99463331
#> y -0.08022343  1.00000000 -0.06732153
#> z  0.99463331 -0.06732153  1.00000000
# `x` and `z` are highly correlated; one of them will be removed
head(population)
#> # A tibble: 6 x 3
#>        x         y      z
#>    <dbl>     <dbl>  <dbl>
#> 1  0.380  0.00105   0.328
#> 2 -0.502 -0.00105  -0.551
#> 3 -0.333 -0.00126  -0.328
#> 4 -1.02   0.00324  -0.889
#> 5 -1.07  -0.000417 -0.842
#> 6  0.304  0.000298  0.458
futile.logger::flog.threshold(futile.logger::ERROR)
#> NULL
variable_select(population, variables, sample, operation) %>% head()
#> # A tibble: 6 x 2
#>           y      z
#>       <dbl>  <dbl>
#> 1  0.00105   0.328
#> 2 -0.00105  -0.551
#> 3 -0.00126  -0.328
#> 4  0.00324  -0.889
#> 5 -0.000417 -0.842
#> 6  0.000298  0.458
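
To make the idea behind the example concrete, here is a minimal base-R sketch of a correlation-based filter: correlations are estimated on the sample, and the resulting exclusions are applied to the full population. The helper drop_correlated and its cutoff argument are hypothetical, and the greedy rule it uses (drop the earlier variable of each highly correlated pair) is an assumption chosen for simplicity, not necessarily the rule the package applies.

```r
# Hypothetical helper, not part of the package: greedy correlation filter.
drop_correlated <- function(population, variables, sample, cutoff = 0.9) {
  # Absolute pairwise correlations, estimated on the sample only.
  cors <- abs(stats::cor(sample[, variables]))
  excluded <- character(0)
  for (i in seq_along(variables)) {
    for (j in seq_len(i - 1)) {
      a <- variables[j]
      b <- variables[i]
      # Skip pairs where one member has already been dropped.
      if (a %in% excluded || b %in% excluded) next
      # Assumed tie-break: drop the earlier variable of the pair.
      if (cors[a, b] > cutoff) excluded <- c(excluded, a)
    }
  }
  # Apply the exclusions to the full population, not just the sample.
  population[, setdiff(names(population), excluded), drop = FALSE]
}

set.seed(1)
x <- rnorm(100)
population <- data.frame(
  x = x,
  y = rnorm(100) / 1000,
  z = x + rnorm(100) / 10  # nearly a copy of x
)
sample <- population[1:30, ]

kept <- names(drop_correlated(population, c("x", "y", "z"), sample))
kept
```

Only one member of the x/z pair survives, while y is kept because it is nearly uncorrelated with both.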