sim_filter_some_different_drop_some filters a melted similarity matrix to keep pairs that have the same values in some metadata columns and different values in others, and optionally drops pairs whose left or right index matches given metadata.

sim_filter_some_different_drop_some(
  sim_df,
  row_metadata,
  any_different_cols,
  all_same_cols = NULL,
  all_different_cols = NULL,
  filter_drop_left = NULL,
  filter_drop_right = NULL,
  annotation_cols = NULL,
  sim_cols = c("id1", "id2", "sim")
)

Arguments

sim_df

data.frame with melted similarity matrix.

row_metadata

data.frame with row metadata.

any_different_cols

character vector specifying columns; a pair is kept only if it has different values in at least one of these columns.

all_same_cols

optional character vector specifying columns; a pair is kept only if it has the same values in all of these columns.

all_different_cols

optional character vector specifying columns; a pair is kept only if it has different values in all of these columns.

filter_drop_left

optional data.frame of metadata specifying which rows to drop on the left index.

filter_drop_right

optional data.frame of metadata specifying which rows to drop on the right index.

annotation_cols

optional character vector specifying which columns from metadata to annotate the left index of the filtered sim_df with.

sim_cols

optional character vector specifying the minimal set of columns for a similarity matrix.
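
As a rough illustration of the inputs described above, the following base R sketch builds a melted similarity matrix with the default sim_cols (id1, id2, sim) and a matching metadata table. The feature values, and the assumption that id1 and id2 index rows of row_metadata through an id column, are made up for illustration and are not taken from the package source.

```r
# Four observations (rows) with three features (columns); values are made up.
features <- matrix(
  c(1, 2, 3,
    2, 4, 7,
    3, 1, 2,
    5, 5, 4),
  nrow = 4, byrow = TRUE
)

# Pearson similarity between observations: cor() correlates columns,
# so transpose to correlate rows.
sim_matrix <- cor(t(features), method = "pearson")

# Melt the 4 x 4 matrix into one row per (id1, id2) pair.
sim_df <- data.frame(
  id1 = as.vector(row(sim_matrix)),
  id2 = as.vector(col(sim_matrix)),
  sim = as.vector(sim_matrix)
)

# Hypothetical metadata table; `id` is assumed to match id1 / id2.
row_metadata <- data.frame(
  id = 1:4,
  Metadata_group = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)
```

In the Examples below, matric::sim_calculate produces both objects, so this manual construction is only meant to show their shape.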

Value

Filtered sim_df as a data.frame, keeping only pairs that have

  • same values in all columns of all_same_cols,

  • different values in all columns of all_different_cols, and

  • different values in at least one column of any_different_cols,

with further filtering using filter_drop_left and filter_drop_right. If annotation_cols is specified, rows are annotated with those columns based on the first (left) index.
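
The three conditions and the drop filters can be sketched in base R on a small hand-made table. This is a minimal illustration of the rules as described above, not the package's implementation. The helpers count_equal and matches_filter are hypothetical, and the sketch assumes that filter_drop_left and filter_drop_right remove a pair when the metadata of its left or right row matches every column of a filter row.

```r
# Hand-made metadata; `id` is assumed to match id1 / id2 in sim_df.
row_metadata <- data.frame(
  id = 1:4,
  Metadata_group = c("a", "a", "a", "b"),
  Metadata_type1 = c("x", "y", "y", "y"),
  Metadata_type2 = c("p", "q", "p", "q"),
  stringsAsFactors = FALSE
)

# All ordered pairs, with a placeholder similarity value.
sim_df <- expand.grid(id1 = 1:4, id2 = 1:4)
sim_df$sim <- 0.5

all_same_cols <- c("Metadata_group")
all_different_cols <- c("Metadata_type1")
any_different_cols <- c("Metadata_type2")
filter_drop_left <- data.frame(
  Metadata_group = "a", Metadata_type1 = "x", stringsAsFactors = FALSE
)
filter_drop_right <- NULL

# Metadata of the left and right member of each pair.
left <- row_metadata[match(sim_df$id1, row_metadata$id), ]
right <- row_metadata[match(sim_df$id2, row_metadata$id), ]

# Hypothetical helper: per pair, how many of `cols` agree on both sides.
count_equal <- function(cols) {
  counts <- integer(nrow(sim_df))
  for (col in cols) {
    counts <- counts + (left[[col]] == right[[col]])
  }
  counts
}

# Hypothetical helper: TRUE where a metadata row matches any filter row.
matches_filter <- function(meta, filter) {
  if (is.null(filter)) {
    return(rep(FALSE, nrow(meta)))
  }
  keys <- names(filter)
  meta_key <- do.call(paste, c(unname(meta[keys]), sep = "|"))
  filter_key <- do.call(paste, c(unname(filter[keys]), sep = "|"))
  meta_key %in% filter_key
}

keep <-
  count_equal(all_same_cols) == length(all_same_cols) &       # same in all
  count_equal(all_different_cols) == 0 &                      # different in all
  count_equal(any_different_cols) < length(any_different_cols) & # differs in >= 1
  !matches_filter(left, filter_drop_left) &
  !matches_filter(right, filter_drop_right)

filtered <- sim_df[keep, ]
```

With this table, the pairs (1, 2) and (2, 1) satisfy the three column conditions, and the left filter then removes (1, 2) because row 1 has Metadata_group "a" and Metadata_type1 "x", leaving only the pair with id1 = 2 and id2 = 1.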

Examples

suppressMessages(suppressWarnings(library(magrittr)))
population <- tibble::tibble(
  Metadata_group = sample(c("a", "b"), 4, replace = TRUE),
  Metadata_type1 = sample(c("x", "y"), 4, replace = TRUE),
  Metadata_type2 = sample(c("p", "q"), 4, replace = TRUE),
  x = rnorm(4),
  y = x + rnorm(4) / 100,
  z = y + rnorm(4) / 1000
)
annotation_cols <- c("Metadata_group", "Metadata_type")
sim_df <- matric::sim_calculate(population, method = "pearson")
row_metadata <- attr(sim_df, "row_metadata")
sim_df <- matric::sim_annotate(sim_df, row_metadata, annotation_cols)
all_same_cols <- c("Metadata_group")
all_different_cols <- c("Metadata_type1")
any_different_cols <- c("Metadata_type2")
filter_drop_left <-
  tibble::tibble(Metadata_group = "a", Metadata_type1 = "x")
filter_drop_right <-
  tibble::tibble(Metadata_group = "a", Metadata_type1 = "x")
drop_reference <- FALSE
matric::sim_filter_some_different_drop_some(
  sim_df,
  row_metadata,
  any_different_cols,
  all_same_cols,
  all_different_cols,
  filter_drop_left,
  filter_drop_right,
  annotation_cols
)
#> [1] id1            id2            sim            Metadata_group
#> <0 rows> (or 0-length row.names)