correlation_threshold
Returns the list of variables to exclude so that no two of the remaining
variables have a correlation greater than a specified threshold.
correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")
Argument | Description |
---|---|
variables | character vector specifying observation variables. |
sample | tbl containing the sample used to estimate parameters. |
cutoff | threshold in [0, 1] that defines the minimum correlation at which a variable is selected for exclusion. |
method | optional character string specifying the method for calculating correlation; the default is "pearson". |
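To make the role of cutoff concrete, the following sketch lists the variable pairs whose absolute correlation exceeds a chosen cutoff. It uses base R only; the toy data frame and the names toy, correlations and pairs_above are hypothetical, and comparing absolute values is an assumption based on how caret::findCorrelation describes its cutoff.

```r
# Toy data (hypothetical): z tracks x closely, y does not.
toy <- data.frame(x = 1:8, y = rep(c(1, -1), 4))
toy$z <- toy$x + 0.1 * toy$y

cutoff <- 0.9

# Absolute pairwise correlations (assumption: the sign is ignored).
correlations <- abs(stats::cor(toy, method = "pearson"))

# Look at each pair once (upper triangle) and keep those above the cutoff.
above <- which(correlations > cutoff & upper.tri(correlations), arr.ind = TRUE)

pairs_above <- data.frame(
  first = rownames(correlations)[above[, "row"]],
  second = colnames(correlations)[above[, "col"]],
  correlation = correlations[above]
)

print(pairs_above)
```

Only pairs listed here can cause a variable to be excluded; lowering cutoff makes the list longer.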
The return value is a character vector specifying the observation variables to be excluded.
correlation_threshold is a wrapper for caret::findCorrelation.
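As a rough illustration of what such a wrapper could look like, the sketch below computes a correlation matrix with stats::cor and passes it to caret::findCorrelation, which returns the indices of the columns it suggests removing. The function name correlation_threshold_sketch and the toy data are hypothetical; this is not the package's source and the actual implementation may differ.

```r
# Sketch only: a hypothetical re-implementation, not the package's source.
correlation_threshold_sketch <- function(variables, sample,
                                         cutoff = 0.9, method = "pearson") {
  # Correlation matrix of the selected columns.
  selected <- as.data.frame(sample)[, variables, drop = FALSE]
  correlations <- stats::cor(selected, method = method)

  # findCorrelation() returns the column indices it suggests removing.
  excluded <- caret::findCorrelation(correlations, cutoff = cutoff)

  variables[excluded]
}

# Deterministic toy data (hypothetical): z is x plus a small perturbation.
toy <- data.frame(x = 1:8, y = rep(c(1, -1), 4))
toy$z <- toy$x + 0.1 * toy$y

# Only run the example when caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  print(correlation_threshold_sketch(c("x", "y", "z"), toy))
}
```

With this toy data exactly one of x and z should be reported, since they are the only pair correlated above 0.9.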
suppressMessages(suppressWarnings(library(magrittr)))

sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)

# add `z`, a noisy copy of `x`
sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)

variables <- c("x", "y", "z")

head(sample)
#> # A tibble: 6 x 3
#>          x         y       z
#>      <dbl>     <dbl>   <dbl>
#> 1 -1.40     0.000935 -1.29
#> 2  0.255    0.000176  0.189
#> 3 -2.44     0.000244 -2.33
#> 4 -0.00557  0.00162  -0.0302
#> 5  0.622    0.000112  0.504
#> 6  1.15    -0.000134  1.05

cor(sample)
#>            x          y          z
#> x 1.00000000 0.05080466 0.99554738
#> y 0.05080466 1.00000000 0.06550465
#> z 0.99554738 0.06550465 1.00000000

# `x` and `z` are highly correlated; one of them will be removed
correlation_threshold(variables, sample)
#> [1] "z"
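Because the function returns the variables to exclude rather than the ones to keep, a typical follow-up step is to drop them from the variable list and from the data. A minimal sketch in base R, assuming the result "z" from the example above; toy_sample, excluded, kept and filtered are hypothetical names, and the data values are made up for illustration:

```r
variables <- c("x", "y", "z")

# Value returned by correlation_threshold() in the example above.
excluded <- "z"

# Variables that remain after removing the excluded ones.
kept <- setdiff(variables, excluded)

# Small stand-in for the sample (hypothetical values).
toy_sample <- data.frame(
  x = c(1.0, 2.0, 3.0),
  y = c(0.3, -0.1, 0.2),
  z = c(1.1, 1.9, 3.2)
)

# Keep only the retained columns.
filtered <- toy_sample[, kept, drop = FALSE]

print(names(filtered))
```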