R/sim_collate.R
sim_collate.Rdsim_collate collates several subsets of a melted similarity matrix,
required for computing metrics.
sim_collate(
sim_df,
all_same_cols_rep,
annotation_cols,
any_different_cols_rep = NULL,
all_different_cols_rep = NULL,
all_same_cols_ref = NULL,
all_same_cols_rep_ref = NULL,
all_same_cols_non_rep = NULL,
any_different_cols_non_rep = NULL,
all_different_cols_non_rep = NULL,
any_different_cols_group = NULL,
all_same_cols_group = NULL,
reference = NULL,
drop_reference = FALSE,
drop_group = NULL
)metric_sim object.
optional character vector specifying columns.
character vector specifying which columns from
metadata to annotate the left index of the filtered sim_df
with.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character vector specifying columns.
optional character string specifying reference.
optional boolean specifying whether to filter (drop)
pairs using reference on the left index.
optional tbl; rows that match on drop_group on the
left or right index are dropped.
metric_sim object comprising a filtered sim_df with
sets of pairs, preserving the same metric_sim attributes as
sim_df.
Fetch similarities between
(a) all rows (except, optionally those containing reference), and
(b) all rows containing reference
Do so only for those (a, b) pairs that
have same values in all columns of all_same_cols_ref
Fetch similarities between
(a) all rows except reference rows, and
(b) all rows except reference rows (i.e. to each other)
Do so for only those (a, b) pairs that
have same values in all columns of all_same_cols_rep
have different values in all columns of all_different_cols_rep
(if specified)
have different values in at least one column of
any_different_cols_rep (if specified)
Keep, both, (a, b) and (b, a)
Fetch similarities between
(a) all rows containing reference, and
(b) all rows containing reference (i.e. to each other)
Do so for only those (a, b) pairs that
have same values in all columns of all_same_cols_rep_ref.
Keep, both, (a, b) and (b, a)
Fetch similarities between
(a) all rows (except, optionally, reference rows), and
(b) all rows except reference rows
Do so for only those (a, b) pairs that
have same values in all columns of all_same_cols_non_rep
have different values in all columns all_different_cols_non_rep
have different values in at least one column of
any_different_cols_non_rep
Keep, both, (a, b) and (b, a)
Fetch similarities between
(a) all rows (except, optionally, reference rows), and
(b) all rows (except, optionally, reference rows)
Do so for only those (a, b) pairs that
have same values in all columns of all_same_cols_group
have different values in at least one column of
any_different_cols_group
Keep, both, (a, b) and (b, a)
sim_df <- matric::sim_calculate(matric::cellhealth)
drop_group <-
data.frame(Metadata_gene_name = "EMPTY")
reference <-
data.frame(Metadata_gene_name = c("Chr2"))
all_same_cols_ref <-
c(
"Metadata_cell_line",
"Metadata_Plate"
)
all_same_cols_rep <-
c(
"Metadata_cell_line",
"Metadata_gene_name",
"Metadata_pert_name"
)
all_same_cols_rep_ref <-
c(
"Metadata_cell_line",
"Metadata_gene_name",
"Metadata_pert_name",
"Metadata_Plate"
)
any_different_cols_non_rep <-
c(
"Metadata_cell_line",
"Metadata_gene_name",
"Metadata_pert_name"
)
all_same_cols_non_rep <-
c(
"Metadata_cell_line",
"Metadata_Plate"
)
all_different_cols_non_rep <-
c("Metadata_gene_name")
all_same_cols_group <-
c(
"Metadata_cell_line",
"Metadata_gene_name"
)
any_different_cols_group <-
c(
"Metadata_cell_line",
"Metadata_gene_name",
"Metadata_pert_name"
)
annotation_cols <-
c(
"Metadata_cell_line",
"Metadata_gene_name",
"Metadata_pert_name"
)
collated_sim <-
matric::sim_collate(
sim_df,
reference = reference,
all_same_cols_rep = all_same_cols_rep,
all_same_cols_rep_ref = all_same_cols_rep_ref,
all_same_cols_ref = all_same_cols_ref,
any_different_cols_non_rep = any_different_cols_non_rep,
all_same_cols_non_rep = all_same_cols_non_rep,
all_different_cols_non_rep = all_different_cols_non_rep,
any_different_cols_group = any_different_cols_group,
all_same_cols_group = all_same_cols_group,
annotation_cols = annotation_cols,
drop_group = drop_group
)
head(collated_sim)
#> # A tibble: 6 × 7
#> id1 id2 sim Metadata_cell_line Metadata_gene_name Metadata_pert_name
#> <int> <int> <dbl> <chr> <chr> <chr>
#> 1 2 1 -0.959 A549 AKT1 AKT1-1
#> 2 23 1 -0.983 A549 AKT1 AKT1-1
#> 3 24 1 -0.990 A549 AKT1 AKT1-1
#> 4 45 1 -0.932 A549 AKT1 AKT1-1
#> 5 46 1 0.982 A549 AKT1 AKT1-1
#> 6 1 2 -0.959 A549 AKT1 AKT1-1
#> # ℹ 1 more variable: type <chr>
collated_sim %>%
dplyr::group_by(type) %>%
dplyr::tally()
#> # A tibble: 4 × 2
#> type n
#> <chr> <int>
#> 1 non_rep 1152
#> 2 ref 1944
#> 3 rep 468
#> 4 rep_group 3672