extract_subpopulations identifies clusters in the reference and population sets and reports the frequency of points in each cluster for the two sets.

extract_subpopulations(population, reference, variables, k)



population: tbl with grouping (metadata) and observation variables.


reference: tbl with grouping (metadata) and observation variables. Columns of population and reference should be identical.


variables: character vector specifying observation variables.


k: scalar specifying number of clusters.


list containing the cluster centers (subpop_centers), two normalized histograms specifying the frequency of each cluster in population and reference (subpop_profiles), and the predicted cluster and distance to that cluster for every input row (population_clusters and reference_clusters).


data <- tibble::tibble(
  Metadata_group = c(
    "control", "control", "control", "control",
    "experiment", "experiment", "experiment", "experiment"
  ),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)
variables <- c("AreaShape_Area", "AreaShape_Length")
population <- dplyr::filter(data, Metadata_group == "experiment")
reference <- dplyr::filter(data, Metadata_group == "control")
extract_subpopulations(
  population = population,
  reference = reference,
  variables = variables,
  k = 3
)
#> $subpop_centers
#>   AreaShape_Area AreaShape_Length
#> 1       7.000000         1.000000
#> 2       7.666667         4.666667
#> 3      11.000000         2.500000
#>
#> $subpop_profiles
#> # A tibble: 3 x 3
#>   cluster_id population reference
#>        <int>      <dbl>     <dbl>
#> 1          1       0.25         0
#> 2          2       0.75         0
#> 3          3       0            1
#>
#> $population_clusters
#> # A tibble: 4 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 experiment              2           0.745
#> 2 experiment              2           0.471
#> 3 experiment              1           0
#> 4 experiment              2           0.745
#>
#> $reference_clusters
#> # A tibble: 2 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 control                 3            1.12
#> 2 control                 3            1.12
#>
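
To make the meaning of subpop_centers and subpop_profiles more concrete, the following base R sketch computes comparable quantities by hand. It is not the package's implementation. It assumes that the clusters come from k-means (stats::kmeans) fitted on the pooled rows of both sets and that rows with missing observation variables are dropped first; both assumptions are consistent with the example output but are not stated on this page. The helper name sketch_subpopulations is hypothetical, and the sketch leaves out the per-row outputs (population_clusters and reference_clusters).

```r
# Hypothetical helper, not part of the package: an illustration only.
# Assumption: k-means on the pooled data, rows with NA values dropped.
sketch_subpopulations <- function(population, reference, variables, k) {
  # Keep only rows that are complete for the observation variables.
  keep_pop <- stats::complete.cases(population[, variables, drop = FALSE])
  keep_ref <- stats::complete.cases(reference[, variables, drop = FALSE])
  population <- population[keep_pop, , drop = FALSE]
  reference <- reference[keep_ref, , drop = FALSE]

  # Cluster the pooled observations so both sets share the same centers.
  pooled <- rbind(
    population[, variables, drop = FALSE],
    reference[, variables, drop = FALSE]
  )
  fit <- stats::kmeans(as.matrix(pooled), centers = k, nstart = 10)

  # Split the cluster assignments back into the two sets.
  n_pop <- nrow(population)
  pop_id <- fit$cluster[seq_len(n_pop)]
  ref_id <- fit$cluster[n_pop + seq_len(nrow(reference))]

  # Normalized histogram: fraction of a set's points in each cluster.
  profile <- function(ids) {
    as.numeric(table(factor(ids, levels = seq_len(k)))) / length(ids)
  }

  list(
    subpop_centers = fit$centers,
    subpop_profiles = data.frame(
      cluster_id = seq_len(k),
      population = profile(pop_id),
      reference = profile(ref_id)
    )
  )
}

# Same toy data as the example above, built with base R only.
data <- data.frame(
  Metadata_group = rep(c("control", "experiment"), each = 4),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)
variables <- c("AreaShape_Area", "AreaShape_Length")

set.seed(1)  # k-means starts from random centers
result <- sketch_subpopulations(
  population = data[data$Metadata_group == "experiment", ],
  reference = data[data$Metadata_group == "control", ],
  variables = variables,
  k = 3
)
result$subpop_profiles
```

Each profile column sums to 1 because it is a frequency over that set's retained rows. The cluster numbering returned by k-means is arbitrary, so the cluster_id values from this sketch need not match those shown in the example output.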