Select observation variables. — variable_select

variable_select selects observation variables based on the specified variable selection method.

variable_select(
  population,
  variables,
  sample = NULL,
  operation = "variance_threshold",
  ...
)

Arguments

population	tbl with grouping (metadata) and observation variables.
variables	character vector specifying observation variables.
sample	tbl containing the sample used by some variable selection methods. `sample` has the same structure as `population`.
operation	optional character string specifying the method for variable selection. This must be one of the strings `"variance_threshold"` (the default), `"correlation_threshold"`, `"drop_na_columns"`.
...	arguments passed to selection operation.
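
To make the roles of `operation` and `...` concrete, the following is a minimal base R sketch of the general pattern: a string picks the selection method, and any extra arguments are forwarded to it. Everything here is hypothetical and for illustration only: `select_sketch`, `na_fraction_filter`, `low_variance_filter`, and their `cutoff` and `tol` parameters are not part of the package, and the two filters are simplified stand-ins rather than the package's own methods.

```r
# Hypothetical stand-ins for two selection methods (not the package's code).
# Each returns the names of the variables to exclude.
na_fraction_filter <- function(population, variables, cutoff = 0.05) {
  # Fraction of missing values per variable
  frac_na <- colMeans(is.na(population[, variables, drop = FALSE]))
  variables[frac_na > cutoff]
}

low_variance_filter <- function(sample, variables, tol = 1e-8) {
  # Sample variance per variable, ignoring missing values
  v <- vapply(
    variables,
    function(nm) stats::var(sample[[nm]], na.rm = TRUE),
    numeric(1)
  )
  variables[is.na(v) | v < tol]
}

# Hypothetical dispatcher: `operation` picks the method, `...` is forwarded.
select_sketch <- function(population, variables, sample = NULL,
                          operation = c("low_variance", "na_fraction"), ...) {
  operation <- match.arg(operation)
  excluded <- switch(operation,
    low_variance = low_variance_filter(sample, variables, ...),
    na_fraction  = na_fraction_filter(population, variables, ...)
  )
  # Drop the excluded variables, keep every other column
  population[, setdiff(names(population), excluded), drop = FALSE]
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5), c = c(NA, NA, 1, 2))

names(select_sketch(df, c("a", "b", "c"), sample = df, operation = "low_variance"))
#> [1] "a" "c"

# `cutoff` travels through `...` to the chosen method
names(select_sketch(df, c("a", "b", "c"), operation = "na_fraction", cutoff = 0.25))
#> [1] "a" "b"
```

In this sketch the constant column `b` is dropped by the variance filter, and `c`, which is half missing, is dropped by the NA filter.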

Value

`population` with the excluded observation variables removed; the result has the same class as `population`.

Examples


# In this example, we use `correlation_threshold` as the operation for
# variable selection.

suppressMessages(suppressWarnings(library(magrittr)))
population <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100) / 1000
)

population %<>% dplyr::mutate(z = x + rnorm(100) / 10)

sample <- population %>% dplyr::slice(1:30)

variables <- c("x", "y", "z")

operation <- "correlation_threshold"

cor(sample)
#>             x           y           z
#> x  1.00000000 -0.08022343  0.99463331
#> y -0.08022343  1.00000000 -0.06732153
#> z  0.99463331 -0.06732153  1.00000000

# `x` and `z` are highly correlated; one of them will be removed

head(population)
#> # A tibble: 6 x 3
#>        x         y      z
#>    <dbl>     <dbl>  <dbl>
#> 1  0.380  0.00105   0.328
#> 2 -0.502 -0.00105  -0.551
#> 3 -0.333 -0.00126  -0.328
#> 4 -1.02   0.00324  -0.889
#> 5 -1.07  -0.000417 -0.842
#> 6  0.304  0.000298  0.458

futile.logger::flog.threshold(futile.logger::ERROR)
#> NULL

variable_select(population, variables, sample, operation) %>% head()
#> # A tibble: 6 x 2
#>           y      z
#>       <dbl>  <dbl>
#> 1  0.00105   0.328
#> 2 -0.00105  -0.551
#> 3 -0.00126  -0.328
#> 4  0.00324  -0.889
#> 5 -0.000417 -0.842
#> 6  0.000298  0.458
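
The example above removes one variable of a highly correlated pair. As a rough illustration of how such a correlation filter can work, here is a self-contained base R sketch. It is a simplified greedy procedure written for this page, not the package's implementation: the helper `drop_correlated` and its `cutoff` default of 0.9 are assumptions, and it may keep a different member of a correlated pair than `variable_select` does.

```r
# Hypothetical helper (not part of the package): greedy correlation filter.
# Returns the names of the variables that are kept.
drop_correlated <- function(sample, variables, cutoff = 0.9) {
  # Absolute pairwise correlations among the candidate variables
  cors <- abs(stats::cor(as.matrix(sample[, variables, drop = FALSE])))
  cors[is.na(cors)] <- 0  # treat undefined correlations as uncorrelated
  diag(cors) <- 0         # ignore self-correlation

  keep <- variables
  repeat {
    sub <- cors[keep, keep, drop = FALSE]
    if (length(keep) < 2 || max(sub) <= cutoff) break
    # Variables that take part in at least one pair above the cutoff
    offenders <- keep[apply(sub, 1, max) > cutoff]
    # Drop the offender with the largest mean absolute correlation
    mean_cor <- rowMeans(sub[offenders, , drop = FALSE])
    keep <- setdiff(keep, offenders[which.max(mean_cor)])
  }
  keep
}

set.seed(1)
demo <- data.frame(x = rnorm(50), y = rnorm(50))
demo$z <- demo$x + rnorm(50) / 10  # z is nearly a copy of x

kept <- drop_correlated(demo, c("x", "y", "z"))
kept
```

With this data, `kept` contains `y` and exactly one of `x` and `z`; which of the two survives depends on how each correlates with `y`.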