Compute similarity between pairs of rows of a matrix

cosine_sparse computes the cosine similarity between specified pairs of rows of a matrix. pearson_sparse computes the Pearson similarity (correlation) between specified pairs of rows of a matrix.

cosine_sparse(X, id1, id2, ...)

pearson_sparse(X, id1, id2, ...)

Arguments

X: matrix
id1: vector of integers specifying the list of rows of X (first set)
id2: vector of integers specifying the list of rows of X (second set); must have the same length as id1.
...: arguments passed downstream for parallel processing.

Value

data.frame with the same number of rows as the length of id1 (and id2), containing the similarity between the pairs of rows of X in the column sim, alongside id1 and id2. sim[i] == similarity(X[id1[i], ], X[id2[i], ]).
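
The relation sim[i] == similarity(X[id1[i], ], X[id2[i], ]) can be illustrated with a small base-R sketch for the cosine case. The helper name cosine_pairs_sketch is hypothetical and is not part of matric; it assumes a dense numeric matrix and ignores the parallel-processing arguments, so it only shows the shape of the computation, not how the package implements it.

```r
# Hypothetical reference helper (not part of matric): cosine similarity
# for the row pairs (id1[i], id2[i]) of a dense numeric matrix X.
cosine_pairs_sketch <- function(X, id1, id2) {
  stopifnot(length(id1) == length(id2))
  A <- X[id1, , drop = FALSE] # rows of the first set
  B <- X[id2, , drop = FALSE] # rows of the second set
  # Row-wise dot product divided by the product of the row norms.
  sim <- rowSums(A * B) / sqrt(rowSums(A * A) * rowSums(B * B))
  data.frame(id1 = id1, id2 = id2, sim = sim)
}

X <- matrix(c(1, 0, 1,
              2, 0, 2,
              0, 3, 0), nrow = 3, byrow = TRUE)

# Rows 1 and 2 are parallel (sim is 1); rows 3 and 2 are orthogonal (sim is 0).
cosine_pairs_sketch(X, id1 = c(1, 3), id2 = c(2, 2))
```

Only the requested pairs are evaluated, so the cost grows with the length of id1 rather than with the square of the number of rows of X.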

Examples


library(magrittr) # provides the %>% pipe used below

set.seed(42)
X <- matrix(rnorm(5 * 3), 5, 3)

id1 <- c(1, 3)
id2 <- c(5, 4)

s1 <- matric::cosine_sparse(X, id1, id2) %>% dplyr::arrange(id1, id2)

Xn <- X / sqrt(rowSums(X * X))

n_rows <- nrow(Xn)

s2 <-
  expand.grid(
    id1 = seq(n_rows),
    id2 = seq(n_rows),
    KEEP.OUT.ATTRS = FALSE
  ) %>%
  dplyr::mutate(sim = as.vector(tcrossprod(Xn))) %>%
  dplyr::inner_join(s1 %>% dplyr::select(id1, id2)) %>%
  dplyr::arrange(id1, id2)
#> Joining with `by = join_by(id1, id2)`

s1
#>   id1 id2       sim
#> 1   1   5 0.4743700
#> 2   3   4 0.1387656

all.equal(s1, s2)
#> [1] TRUE

Xm <- X - rowMeans(X)
s3 <- matric::cosine_sparse(Xm, id1, id2) %>% dplyr::arrange(id1, id2)
s4 <- matric::pearson_sparse(X, id1, id2) %>% dplyr::arrange(id1, id2)

all.equal(s3, s4)
#> [1] TRUE
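
The final comparison in the examples rests on the identity that the Pearson correlation of two rows equals the cosine similarity of the same rows after each has been centered on its own mean. A minimal base-R check of that identity, independent of matric and using stats::cor as the reference, might look like this (the matrix and the chosen rows are arbitrary):

```r
set.seed(1)
X <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6)

# Center each row on its own mean. The vector of row means is recycled
# down the columns, so element [i, j] has rowMeans(X)[i] subtracted.
Xm <- X - rowMeans(X)

# Cosine similarity between the centered rows 1 and 2.
cos_centered <- sum(Xm[1, ] * Xm[2, ]) /
  sqrt(sum(Xm[1, ]^2) * sum(Xm[2, ]^2))

# Pearson correlation between the original rows 1 and 2.
pearson <- stats::cor(X[1, ], X[2, ])

all.equal(cos_centered, pearson)
#> [1] TRUE
```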