Extract subpopulations. — extract_subpopulations • cytominer

extract_subpopulations identifies clusters in the reference and population sets and reports the frequency of points in each cluster for the two sets.

extract_subpopulations(population, reference, variables, k)

Arguments

population	tbl with grouping (metadata) and observation variables.
reference	tbl with grouping (metadata) and observation variables. Columns of `population` and `reference` should be identical.
variables	character vector specifying observation variables.
k	scalar specifying number of clusters.
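
The requirements in the argument table can be checked before clustering. The following is a minimal sketch, not part of cytominer: `check_inputs` is a hypothetical helper, and it reads "identical columns" as meaning the same column names in the same order, which is an assumption.

```r
# Hypothetical helper (not part of cytominer): validate the arguments
# described above before any clustering is attempted.
check_inputs <- function(population, reference, variables, k) {
  stopifnot(
    # Assumption: "identical columns" means same names in the same order.
    identical(names(population), names(reference)),
    is.character(variables),
    all(variables %in% names(population)),
    is.numeric(k), length(k) == 1, k >= 1
  )
  invisible(TRUE)
}

population <- data.frame(
  Metadata_group = "experiment",
  AreaShape_Area = c(8, 7)
)
reference <- data.frame(
  Metadata_group = "control",
  AreaShape_Area = c(10, 12)
)

# Passes silently: columns match and the variable exists.
check_inputs(population, reference, variables = "AreaShape_Area", k = 2)

# A misspelled variable name is rejected up front.
bad <- tryCatch(
  check_inputs(population, reference, variables = "AreaShape_Lenght", k = 2),
  error = function(e) "rejected"
)
```

Failing early on a column mismatch tends to give a clearer error than one raised deep inside the clustering step.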

Value

list containing cluster centers (subpop_centers), two normalized histograms specifying the frequency of each cluster in population and reference (subpop_profiles), and the cluster prediction and distance to the predicted cluster for all input data (population_clusters and reference_clusters).

Examples

data <- tibble::tibble(
  Metadata_group = c(
    "control", "control", "control", "control",
    "experiment", "experiment", "experiment", "experiment"
  ),
  AreaShape_Area = c(10, 12, NA, 16, 8, 8, 7, 7),
  AreaShape_Length = c(2, 3, NA, NA, 4, 5, 1, 5)
)
variables <- c("AreaShape_Area", "AreaShape_Length")
population <- dplyr::filter(data, Metadata_group == "experiment")
reference <- dplyr::filter(data, Metadata_group == "control")
extract_subpopulations(
  population = population,
  reference = reference,
  variables = variables,
  k = 3
)
#> $subpop_centers
#>   AreaShape_Area AreaShape_Length
#> 1       7.000000         1.000000
#> 2       7.666667         4.666667
#> 3      11.000000         2.500000
#> 
#> $subpop_profiles
#> # A tibble: 3 x 3
#>   cluster_id population reference
#>        <int>      <dbl>     <dbl>
#> 1          1       0.25         0
#> 2          2       0.75         0
#> 3          3       0            1
#> 
#> $population_clusters
#> # A tibble: 4 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 experiment              2           0.745
#> 2 experiment              2           0.471
#> 3 experiment              1           0    
#> 4 experiment              2           0.745
#> 
#> $reference_clusters
#> # A tibble: 2 x 3
#>   Metadata_group cluster_id dist_to_cluster
#>   <chr>               <int>           <dbl>
#> 1 control                 3            1.12
#> 2 control                 3            1.12
#>
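
The printed numbers can be reproduced by hand, which helps when reading the output. The sketch below is base R only and is not the package's implementation. It assumes each row is assigned to its nearest center by Euclidean distance, and that rows containing `NA` are dropped, as the two-row `$reference_clusters` table suggests. `assign_clusters` is a hypothetical helper.

```r
# Centers as printed in $subpop_centers (7.666667 and 4.666667 are 23/3 and 14/3).
centers <- rbind(
  c(7, 1),
  c(23 / 3, 14 / 3),
  c(11, 2.5)
)

# Complete rows of the example data (AreaShape_Area, AreaShape_Length);
# the two control rows containing NA are left out.
population <- rbind(c(8, 4), c(8, 5), c(7, 1), c(7, 5))
reference <- rbind(c(10, 2), c(12, 3))

# Hypothetical helper: nearest center and Euclidean distance for each row.
assign_clusters <- function(points, centers) {
  # One row per point, one column per center.
  d <- apply(centers, 1, function(center) {
    sqrt(rowSums(sweep(points, 2, center)^2))
  })
  d <- matrix(d, nrow = nrow(points))
  data.frame(
    cluster_id = max.col(-d, ties.method = "first"),
    dist_to_cluster = apply(d, 1, min)
  )
}

pop <- assign_clusters(population, centers)
ref <- assign_clusters(reference, centers)

# Normalized histograms: share of rows per cluster, so each column sums to 1.
n_clusters <- nrow(centers)
profiles <- data.frame(
  cluster_id = seq_len(n_clusters),
  population = tabulate(pop$cluster_id, nbins = n_clusters) / nrow(population),
  reference = tabulate(ref$cluster_id, nbins = n_clusters) / nrow(reference)
)
print(profiles)
```

Under these assumptions, `profiles` matches `$subpop_profiles` above, and the distances in `pop` and `ref` round to the printed `dist_to_cluster` values.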