extract_subpopulations identifies clusters in the reference and population sets and reports the frequency of points in each cluster for the two sets.
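This page does not name the clustering algorithm. In the example output further down, each center in subpop_centers is the mean of its member rows, which is consistent with k-means. The base R sketch below therefore assumes k-means on the pooled, complete rows of both sets and then computes one normalized histogram per set. It is an illustration of the idea, not the package's implementation. The `set` column, `nstart = 10` and the seed are assumptions chosen for this sketch.

```r
# Sketch only: ASSUMES k-means on the pooled rows of both sets (not the package code)
variables <- c("AreaShape_Area", "AreaShape_Length")
k <- 3

population <- data.frame(
  AreaShape_Area = c(8, 8, 7, 7),
  AreaShape_Length = c(4, 5, 1, 5)
)
reference <- data.frame(
  AreaShape_Area = c(10, 12, NA, 16),
  AreaShape_Length = c(2, 3, NA, NA)
)

# Pool both sets, remembering where each row came from (hypothetical `set` column)
pooled <- rbind(
  data.frame(set = "population", population[variables]),
  data.frame(set = "reference", reference[variables])
)
# Drop rows with missing observations; stats::kmeans does not accept NA
pooled <- pooled[stats::complete.cases(pooled), , drop = FALSE]

set.seed(1)  # k-means starts from randomly chosen rows
fit <- stats::kmeans(as.matrix(pooled[variables]), centers = k, nstart = 10)

# Normalized histogram per set: fraction of that set's rows in each cluster
profiles <- prop.table(
  table(cluster_id = factor(fit$cluster, levels = seq_len(k)), set = pooled$set),
  margin = 2
)
print(fit$centers)
print(profiles)
```

Because k-means is randomly initialized, the numbering of clusters in such a sketch can differ between runs even when the grouping of rows is the same.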

extract_subpopulations(population, reference, variables, k)

Arguments

population

tbl with grouping (metadata) and observation variables.

reference

tbl with grouping (metadata) and observation variables. Columns of population and reference should be identical.

variables

character vector specifying the observation variables.

k

scalar specifying the number of clusters.

Value

list containing the cluster centers (subpop_centers), two normalized histograms giving the frequency of each cluster in population and reference (subpop_profiles), and, for the input rows, the predicted cluster and the distance to that cluster's center (population_clusters and reference_clusters).
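To show how subpop_centers, dist_to_cluster and subpop_profiles relate, the base R sketch below starts from the centers printed in the example output, assigns each complete row to its nearest center, and normalizes the cluster counts within each set. Euclidean distance is an assumption here, although it reproduces the printed dist_to_cluster values (for instance, sqrt((8 - 23/3)^2 + (4 - 14/3)^2) is about 0.745). The helpers `assign_clusters` and `cluster_profile` are hypothetical. The two control rows containing NA are dropped, matching their absence from reference_clusters in the example output.

```r
# Centers taken from subpop_centers in the example output (23/3 = 7.666667, 14/3 = 4.666667)
centers <- matrix(
  c(7, 1,
    23 / 3, 14 / 3,
    11, 2.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("AreaShape_Area", "AreaShape_Length"))
)

# Hypothetical helper: nearest center by (ASSUMED) Euclidean distance for every row of x
assign_clusters <- function(x, centers) {
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)  # squared distance of each row to center j
  })
  d2 <- matrix(d2, nrow = nrow(x))  # n x k, even when x has a single row
  cluster_id <- max.col(-d2, ties.method = "first")
  data.frame(
    cluster_id = cluster_id,
    dist_to_cluster = sqrt(d2[cbind(seq_len(nrow(x)), cluster_id)])
  )
}

# Hypothetical helper: normalized histogram of cluster ids over k clusters
cluster_profile <- function(cluster_id, k) {
  counts <- tabulate(cluster_id, nbins = k)
  counts / sum(counts)
}

population <- cbind(
  AreaShape_Area = c(8, 8, 7, 7),
  AreaShape_Length = c(4, 5, 1, 5)
)
reference <- cbind(
  AreaShape_Area = c(10, 12, NA, 16),
  AreaShape_Length = c(2, 3, NA, NA)
)
# Keep only complete rows, as the example output suggests
reference <- reference[stats::complete.cases(reference), , drop = FALSE]

pop <- assign_clusters(population, centers)
ref <- assign_clusters(reference, centers)
k <- nrow(centers)
subpop_profiles <- data.frame(
  cluster_id = seq_len(k),
  population = cluster_profile(pop$cluster_id, k),
  reference = cluster_profile(ref$cluster_id, k)
)
print(subpop_profiles)  # population: 0.25 0.75 0; reference: 0 0 1
print(round(pop$dist_to_cluster, 3))  # 0.745 0.471 0.000 0.745
```

Each histogram is normalized within its own set, so the population column and the reference column each sum to 1 regardless of how many rows the two sets contain.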

Examples

data <- tibble::tibble(
  Metadata_group = c(
    "control", "control", "control", "control",
    "experiment", "experiment", "experiment", "experiment"
  ),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)

variables <- c("AreaShape_Area", "AreaShape_Length")

population <- dplyr::filter(data, Metadata_group == "experiment")

reference <- dplyr::filter(data, Metadata_group == "control")

extract_subpopulations(
  population = population,
  reference = reference,
  variables = variables,
  k = 3
)
#> $subpop_centers
#>   AreaShape_Area AreaShape_Length
#> 1       7.000000         1.000000
#> 2       7.666667         4.666667
#> 3      11.000000         2.500000
#>
#> $subpop_profiles
#> # A tibble: 3 x 3
#>   cluster_id population reference
#>        <int>      <dbl>     <dbl>
#> 1          1       0.25         0
#> 2          2       0.75         0
#> 3          3       0            1
#>
#> $population_clusters
#> # A tibble: 4 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 experiment              2           0.745
#> 2 experiment              2           0.471
#> 3 experiment              1           0
#> 4 experiment              2           0.745
#>
#> $reference_clusters
#> # A tibble: 2 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 control                 3            1.12
#> 2 control                 3            1.12
#>