mark_outlier_rows
mark_outlier_rows marks outlier rows: it adds an indicator column to population rather than dropping the flagged rows.
mark_outlier_rows(
  population,
  variables,
  sample,
  method = "svd+iqr",
  outlier_col = "is_outlier",
  ...
)
Argument | Description |
---|---|
population | tbl with grouping (metadata) and observation variables. |
variables | character vector specifying observation variables. |
sample | tbl containing the sample that is used by outlier removal methods to estimate parameters. |
method | optional character string specifying the method for outlier removal. There is currently only one option ("svd+iqr"). |
outlier_col | optional character string specifying the name of the column that will indicate outliers (in the output). Default "is_outlier". |
... | arguments passed to the outlier removal method. |
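The split between population and sample can be easier to see in a small sketch. The code below is not cytominer's "svd+iqr" implementation; it is a simplified, single-variable IQR rule written only to illustrate the idea of estimating parameters on sample and applying them to every row of population. The helper name mark_iqr_sketch and its k argument are hypothetical and exist only for this illustration.

```r
# Hypothetical helper, NOT part of cytominer: a univariate IQR rule whose
# fences are estimated on `sample` and then applied to `population`.
mark_iqr_sketch <- function(population, variable, sample,
                            outlier_col = "is_outlier", k = 1.5) {
  # Estimate the quartiles on the sample only
  q <- stats::quantile(sample[[variable]], probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr

  # Apply the sample-derived fences to every row of the population
  x <- population[[variable]]
  population[[outlier_col]] <- x < lower | x > upper
  population
}

population <- data.frame(
  Metadata_type = c("control", "control", "control", "control", "trt", "trt"),
  AreaShape_Area = c(1, 2, 3, 4, 2.5, 30)
)

# Use the control rows as the sample for parameter estimation
sample <- population[population$Metadata_type == "control", ]

marked <- mark_iqr_sketch(population, "AreaShape_Area", sample)
print(marked)
```

Here the fences come from the four control rows alone, so only the treated row with AreaShape_Area = 30 is flagged; no rows are removed.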
Returns population with an extra column is_outlier (or the name given in outlier_col).
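Because the rows are only marked, any filtering is left to the caller. The sketch below shows one way to count and then drop flagged rows in base R. The data frame is a hand-built stand-in for the function's output, not real cytominer output, and the code assumes that TRUE in is_outlier means the row was flagged, as the column name suggests.

```r
# Hand-built stand-in for the output of mark_outlier_rows (illustrative data)
marked <- data.frame(
  Metadata_group = c("a", "b", "a", "b"),
  AreaShape_Area = c(0.4, -1.2, 20, 30),
  is_outlier = c(FALSE, FALSE, TRUE, TRUE)
)

# Count flagged and unflagged rows
counts <- table(marked$is_outlier)
print(counts)

# Keep only the rows that were not flagged, then drop the indicator column
kept <- marked[!marked$is_outlier, , drop = FALSE]
kept$is_outlier <- NULL
print(kept)
```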
suppressMessages(suppressWarnings(library(magrittr)))

# Simulated population; the last two AreaShape_Area values (20, 30) are extreme
population <- tibble::tibble(
  Metadata_group = sample(c("a", "b"), 100, replace = TRUE),
  Metadata_type = sample(c("control", "trt"), 100, replace = TRUE),
  AreaShape_Area = c(rnorm(98), 20, 30),
  AreaShape_Eccentricity = rnorm(100)
)
variables <- c("AreaShape_Area", "AreaShape_Eccentricity")

# Use the control rows as the sample for parameter estimation
sample <- population %>% dplyr::filter(Metadata_type == "control")

population_marked <- cytominer::mark_outlier_rows(
  population,
  variables,
  sample,
  method = "svd+iqr"
)

# Show three rows from each value of is_outlier
population_marked %>%
  dplyr::group_by(is_outlier) %>%
  dplyr::sample_n(3)
#> # A tibble: 6 x 5
#> # Groups:   is_outlier [2]
#>   Metadata_group Metadata_type AreaShape_Area AreaShape_Eccentricity is_outlier
#>   <chr>          <chr>                  <dbl>                  <dbl> <lgl>
#> 1 a              trt                   0.390                -0.00636 FALSE
#> 2 b              control               0.0767               -0.0596  FALSE
#> 3 a              trt                   1.52                  0.695   FALSE
#> 4 a              control              -2.33                 -1.91    TRUE
#> 5 b              control              -1.56                  1.58    TRUE
#> 6 a              trt                   2.30                  1.31    TRUE