extract_subpopulations identifies clusters in the reference and population sets and reports the frequency of points in each cluster for the two sets.
extract_subpopulations(population, reference, variables, k)
Argument | Description |
---|---|
population | tbl with grouping (metadata) and observation variables. |
reference | tbl with grouping (metadata) and observation variables. Columns of |
variables | character vector specifying observation variables. |
k | scalar specifying number of clusters. |
list containing cluster centers (subpop_centers), two normalized histograms specifying the frequency of each cluster in population and reference (subpop_profiles), and the cluster prediction and distance to the predicted cluster for all input data (population_clusters and reference_clusters).
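To make the shape of the returned list concrete, here is a minimal base R sketch of one way such a result could be assembled. It is not the package's implementation. It assumes that rows with missing observations are dropped, that the pooled population and reference rows are clustered with stats::kmeans, and that distances are Euclidean. The function name sketch_subpopulations is hypothetical, and unlike the real function it does not carry the metadata columns through.

```r
# Hypothetical sketch, not the package's code: base R only.
sketch_subpopulations <- function(population, reference, variables, k) {
  # Assumption: rows with missing observation values are dropped.
  keep <- function(x) {
    x[stats::complete.cases(x[, variables, drop = FALSE]), , drop = FALSE]
  }
  population <- keep(population)
  reference <- keep(reference)

  # Assumption: clusters are found by k-means on the pooled observations.
  pooled <- as.matrix(rbind(
    population[, variables, drop = FALSE],
    reference[, variables, drop = FALSE]
  ))
  fit <- stats::kmeans(pooled, centers = k, nstart = 10)

  # Assign each row to its nearest center and keep the Euclidean distance.
  nearest <- function(x) {
    m <- as.matrix(x[, variables, drop = FALSE])
    d <- vapply(
      seq_len(k),
      function(j) sqrt(rowSums(sweep(m, 2, fit$centers[j, ])^2)),
      numeric(nrow(m))
    )
    d <- matrix(d, nrow = nrow(m)) # rows x k, also when there is one row
    id <- max.col(-d, ties.method = "first")
    data.frame(
      cluster_id = id,
      dist_to_cluster = d[cbind(seq_len(nrow(m)), id)]
    )
  }
  population_clusters <- nearest(population)
  reference_clusters <- nearest(reference)

  # Normalized histogram: fraction of rows that fall in each cluster.
  profile <- function(cl) tabulate(cl$cluster_id, nbins = k) / nrow(cl)

  list(
    subpop_centers = fit$centers,
    subpop_profiles = data.frame(
      cluster_id = seq_len(k),
      population = profile(population_clusters),
      reference = profile(reference_clusters)
    ),
    population_clusters = population_clusters,
    reference_clusters = reference_clusters
  )
}

set.seed(1) # k-means starts from random centers
data <- data.frame(
  Metadata_group = rep(c("control", "experiment"), each = 4),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)
res <- sketch_subpopulations(
  population = data[data$Metadata_group == "experiment", ],
  reference = data[data$Metadata_group == "control", ],
  variables = c("AreaShape_Area", "AreaShape_Length"),
  k = 3
)
```

Cluster labels from k-means are arbitrary, so the numbering of cluster_id in such a sketch may differ between runs even when the grouping itself is the same. Each profile column sums to 1 because it is divided by the number of rows kept for that set.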
data <- tibble::tibble(
  Metadata_group = c(
    "control", "control", "control", "control",
    "experiment", "experiment", "experiment", "experiment"
  ),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)
variables <- c("AreaShape_Area", "AreaShape_Length")
population <- dplyr::filter(data, Metadata_group == "experiment")
reference <- dplyr::filter(data, Metadata_group == "control")
extract_subpopulations(
  population = population,
  reference = reference,
  variables = variables,
  k = 3
)
#> $subpop_centers
#>   AreaShape_Area AreaShape_Length
#> 1       7.000000         1.000000
#> 2       7.666667         4.666667
#> 3      11.000000         2.500000
#>
#> $subpop_profiles
#> # A tibble: 3 x 3
#>   cluster_id population reference
#>        <int>      <dbl>     <dbl>
#> 1          1       0.25         0
#> 2          2       0.75         0
#> 3          3       0            1
#>
#> $population_clusters
#> # A tibble: 4 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 experiment              2           0.745
#> 2 experiment              2           0.471
#> 3 experiment              1           0
#> 4 experiment              2           0.745
#>
#> $reference_clusters
#> # A tibble: 2 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 control                 3            1.12
#> 2 control                 3            1.12
#>