The similarity matrix represents a graph with vertices and edges.

Each vertex belongs to 3 nested sets:

  • Level 0 “singleton” set - i.e. just itself
  • Level 1 “replicate” set - e.g. replicates of the same perturbation
  • Level 2 “group replicate” set - e.g. replicates of perturbations with the same MOA

We calculate metrics hierarchically:

  • Level 1-0: similarity of elements of a Level 0 (singleton) set to elements of its Level 1 (replicates) set, except elements of its Level 0 set. In simpler terms, this is the replicate similarity of a vertex, i.e. the similarity of a vertex to its replicates (except itself). This is a Level 0 (singleton) set metric.
  • Level 2-1: similarity of each element of a Level 1 set (replicates) to elements of its Level 2 set (group replicates), except elements of its Level 1 set (replicates). In simpler terms, this is a group replicate similarity of a replicate set, i.e. similarity of elements of a replicate set to its group replicates (except to other elements of its replicate set). This is a Level 1 (replicate) set metric.

We can aggregate each of these metrics to produce more metrics:

  • Level 1: average Level 1-0 similarity across all Level 0 (singleton) sets that are nested in the Level 1 set. In simpler terms, this is the average replicate similarity of a set of replicate vertices. This is a Level 1 (replicate) set metric.
  • Level 2: average Level 2-1 similarity across all Level 1 (replicate) sets that are nested in the Level 2 set. In simpler terms, this is the average group replicate similarity of a set of replicate sets. This is a Level 2 (group replicate) set metric.

Consider a compound perturbation experiment done in replicates in a multi-well plate. Each compound belongs to one (or more) MOAs.

  • Each replicate well has a Level 1-0 metric, which is the similarity of that well to its replicates.
  • Each compound has a Level 2-1 metric, which is the average similarity of each of its replicate wells to replicate wells of other compounds with the same MOA.


  • Each compound has a Level 1 metric, which is the average Level 1-0 metric across all its replicate wells.
  • Each MOA has a Level 2 metric, which is the average Level 2-1 metric across all its compounds.
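The Level 1-0 and Level 1 metrics in this example can be sketched as follows. This is a minimal illustration in Python, not matric's implementation; the toy similarity matrix, the `compound` list, and the helper names `level_1_0` and `level_1` are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: four wells, two compounds (A and B), two replicates each.
compound = ["A", "A", "B", "B"]

# Symmetric similarity matrix; sim[i][j] is the similarity of well i to well j.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.0, 0.6, 1.0],
]


def level_1_0(i):
    """Mean similarity of well i to its replicates, excluding itself."""
    replicates = [
        j for j in range(len(compound)) if j != i and compound[j] == compound[i]
    ]
    return mean(sim[i][j] for j in replicates)


def level_1(name):
    """Mean Level 1-0 metric across all replicate wells of one compound."""
    wells = [i for i, c in enumerate(compound) if c == name]
    return mean(level_1_0(i) for i in wells)


per_well = [level_1_0(i) for i in range(len(compound))]
per_compound = {name: level_1(name) for name in sorted(set(compound))}
print(per_well)
print(per_compound)
```

Each well gets its own Level 1-0 value, and each compound gets one Level 1 value that summarizes its wells.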

The metrics implemented in matric are defined below.

Level 1-0

Raw metrics

| Metric | Description |
| --- | --- |
| sim_mean_i | mean similarity of a vertex to its replicate vertices |

Related: sim_median_i which uses median instead of mean.

Scaled metrics

| Metric | Description |
| --- | --- |
| sim_scaled_mean_non_rep_i | scale sim_mean_i using sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i |


  • sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i are the mean and s.d. of similarity of a vertex to its non-replicate vertices.


Related:

  • sim_scaled_median_non_rep_i which scales sim_median_i instead of sim_mean_i.
  • sim_scaled_mean_ref_i which scales sim_mean_i w.r.t. reference vertices (i.e. it uses sim_mean_stat_ref_i and sim_sd_stat_ref_i, the mean and s.d. of similarity of a vertex to the reference vertices, to scale).
  • sim_scaled_median_ref_i which is the same as sim_scaled_mean_ref_i except that it scales sim_median_i instead of sim_mean_i.
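The text does not spell out the scaling formula. A plausible reading, assumed here, is a z-score: subtract the mean of the non-replicate similarities and divide by their s.d. The sketch below uses hypothetical similarity values and the sample s.d., which is also an assumption.

```python
from statistics import mean, stdev

# Hypothetical similarities of one vertex to the other vertices.
sim_to_replicates = [0.8, 0.7]
sim_to_non_replicates = [0.1, 0.3, 0.2, 0.0, 0.4]

sim_mean_i = mean(sim_to_replicates)
sim_mean_stat_non_rep_i = mean(sim_to_non_replicates)
sim_sd_stat_non_rep_i = stdev(sim_to_non_replicates)  # sample s.d.; an assumption

# Assumed z-score form of the scaling; the source only says "scale ... using".
sim_scaled_mean_non_rep_i = (
    sim_mean_i - sim_mean_stat_non_rep_i
) / sim_sd_stat_non_rep_i
print(round(sim_scaled_mean_non_rep_i, 3))
```

Under this reading, the reference-based variants would swap in the statistics computed against the reference vertices.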

Rank-based and retrieval-based metrics

Consider a list of vertices comprising

  • the replicates of the vertex
  • the non-replicates of the vertex

| Metric | Description |
| --- | --- |
| sim_ranked_relrank_mean_non_rep_i | the mean percentile of the vertex’s replicates in this list |
| sim_retrieval_average_precision_non_rep_i | the average precision reported on the list, with the replicates being the positive class |
| sim_retrieval_r_precision_non_rep_i | similarly, the R-precision reported on the list |


Related:

  • sim_ranked_relrank_median_non_rep_i reports the median percentile instead of the mean percentile.
  • sim_ranked_relrank_mean_ref_i, sim_ranked_relrank_median_ref_i, sim_retrieval_average_precision_ref_i, and sim_retrieval_r_precision_ref_i use a list of vertices comprising the reference vertices instead of the non-replicate vertices.
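To make the rank-based and retrieval-based metrics concrete, here is a minimal Python sketch on one hypothetical list. Average precision and R-precision follow their standard definitions. The percentile definition (1.0 at the top of the list, 0.0 at the bottom) is an assumption and may differ from what matric computes.

```python
from statistics import mean

# Hypothetical list for one vertex: (similarity to the vertex, is_replicate).
entries = [(0.9, True), (0.7, False), (0.6, True), (0.4, False), (0.2, False)]

# Rank by decreasing similarity; True marks the positive class (replicates).
ranked = [is_rep for _, is_rep in sorted(entries, key=lambda e: e[0], reverse=True)]
n = len(ranked)
n_pos = sum(ranked)


def precision_at(k):
    """Fraction of the top-k entries that are replicates."""
    return sum(ranked[:k]) / k


# Average precision: mean of precision@k over the ranks k that hold a replicate.
average_precision = mean(precision_at(k) for k in range(1, n + 1) if ranked[k - 1])

# R-precision: precision at rank R, where R is the number of replicates.
r_precision = precision_at(n_pos)

# Mean percentile of the replicates; this percentile definition is an assumption.
mean_percentile = mean((n - k) / (n - 1) for k in range(1, n + 1) if ranked[k - 1])

print(average_precision, r_precision, mean_percentile)
```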

Level 1 aggregations of Level 1-0 metrics

  • sim_mean_i_mean_i is the mean sim_mean_i across all replicate vertices in a replicate set.
  • sim_mean_i_median_i, sim_median_i_mean_i, and sim_median_i_median_i are the corresponding Level 1 aggregated metrics for other combinations of Level 1-0 raw metrics and summary statistics.
  • sim_scaled_mean_non_rep_i_mean_i, sim_scaled_median_non_rep_i_median_i, sim_scaled_mean_ref_i_mean_i, sim_scaled_median_ref_i_median_i are the corresponding Level 1 aggregated metrics for the scaled Level 1-0 metrics.
  • sim_ranked_relrank_mean_ref_i_mean_i, sim_ranked_relrank_mean_ref_i_median_i, sim_ranked_relrank_median_ref_i_mean_i, sim_ranked_relrank_median_ref_i_median_i are the corresponding Level 1 aggregated metrics for the rank-based Level 1-0 metrics.
  • sim_retrieval_average_precision_ref_i_mean_i, sim_retrieval_average_precision_ref_i_median_i, sim_retrieval_r_precision_ref_i_mean_i, sim_retrieval_r_precision_ref_i_median_i are the corresponding Level 1 aggregated metrics for the retrieval-based Level 1-0 metrics.

Note: These are Level 1 summaries of the scaling parameters; they are not themselves used for scaling:

  • sim_mean_stat_non_rep_i_mean_i, sim_sd_stat_non_rep_i_mean_i, sim_mean_stat_non_rep_i_median_i, sim_sd_stat_non_rep_i_median_i
  • sim_mean_stat_ref_i_mean_i, sim_sd_stat_ref_i_mean_i, sim_mean_stat_ref_i_median_i, sim_sd_stat_ref_i_median_i

Level 2-1

Raw metrics

| Metric | Description |
| --- | --- |
| sim_mean_g | mean similarity of vertices in a replicate set to its group replicate vertices |

Related: sim_median_g which uses median instead of mean.
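As an illustration of the raw Level 2-1 metric, the sketch below computes a group replicate similarity per compound on hypothetical data. It assumes each compound has exactly one MOA and pools all well pairs before averaging. Both are simplifying assumptions, and the function name merely mirrors the metric name.

```python
from statistics import mean

# Hypothetical wells as (compound, MOA). Compounds A and B share MOA m1.
wells = [("A", "m1"), ("A", "m1"), ("B", "m1"), ("B", "m1"), ("C", "m2"), ("C", "m2")]

# Symmetric similarity matrix over the six wells.
sim = [
    [1.0, 0.9, 0.5, 0.7, 0.1, 0.0],
    [0.9, 1.0, 0.6, 0.4, 0.2, 0.1],
    [0.5, 0.6, 1.0, 0.8, 0.0, 0.3],
    [0.7, 0.4, 0.8, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.0, 0.1, 1.0, 0.7],
    [0.0, 0.1, 0.3, 0.2, 0.7, 1.0],
]


def sim_mean_g(name):
    """Mean similarity of a compound's wells to wells of other compounds
    with the same MOA; None if the compound has no group replicates."""
    members = [i for i, (c, _) in enumerate(wells) if c == name]
    moa = wells[members[0]][1]
    group = [j for j, (c, m) in enumerate(wells) if m == moa and c != name]
    if not group:
        return None
    return mean(sim[i][j] for i in members for j in group)


for name in ["A", "B", "C"]:
    print(name, sim_mean_g(name))
```

Compound C has no other compound with its MOA, so its group replicate set is empty and the metric is undefined for it.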

Scaled metrics

| Metric | Description |
| --- | --- |
| sim_scaled_mean_non_rep_g | scale sim_mean_g using sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g |


  • sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g are the mean and s.d. of similarity of vertices in a replicate set to their non-replicate (and non-group replicate) vertices.


Related:

  • sim_scaled_median_non_rep_g which scales sim_median_g instead of sim_mean_g.
  • sim_scaled_mean_ref_g which scales sim_mean_g w.r.t. reference vertices (i.e. it uses sim_mean_stat_ref_g and sim_sd_stat_ref_g, the mean and s.d. of similarity of vertices in a replicate set to the reference vertices, to scale).
  • sim_scaled_median_ref_g which is the same as sim_scaled_mean_ref_g except that it scales sim_median_g instead of sim_mean_g.

Rank-based and retrieval-based metrics

Consider a list of vertices comprising

  • the vertices in a replicate set
  • the corresponding non-replicate (and non-group replicate) vertices

We define metrics similar to the corresponding Level 1-0 metrics:

  • sim_ranked_relrank_mean_non_rep_g
  • sim_ranked_relrank_median_non_rep_g
  • sim_retrieval_average_precision_non_rep_g
  • sim_retrieval_r_precision_non_rep_g
  • sim_ranked_relrank_mean_ref_g
  • sim_ranked_relrank_median_ref_g
  • sim_retrieval_average_precision_ref_g
  • sim_retrieval_r_precision_ref_g

Level 2 aggregations of Level 2-1 metrics

These are not implemented.


This is a related discussion on metrics, from here.

We have a weighted graph where the vertices are perturbations with multiple labels (e.g. pathways in the case of genetic perturbations), and edges are the similarity between the vertices (e.g. the cosine similarity between image-based profiles of two CRISPR knockouts).

There are three levels of ranked lists of edges, each of which can produce global metrics (based on classification metrics like average precision or other so-called class probability metrics). These global metrics can be used to compare representations.

In all 3 cases, we pose it as a binary classification problem on the edges:

  • Class 1 edges: vertices have a shared label (e.g. at least one MOA in common)
  • Class 0 edges: vertices do not have a shared label
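The edge labelling can be sketched in a few lines of Python. The vertex names and label sets below are hypothetical; the rule itself (Class 1 if the two vertices share at least one label) is the one stated above.

```python
from itertools import combinations

# Hypothetical vertices with one or more labels each (e.g. MOAs).
labels = {
    "p1": {"m1"},
    "p2": {"m1", "m2"},
    "p3": {"m2"},
    "p4": {"m3"},
}

# Class 1 if the two vertices share at least one label, otherwise Class 0.
edge_class = {
    (u, v): int(bool(labels[u] & labels[v]))
    for u, v in combinations(sorted(labels), 2)
}

for edge, cls in edge_class.items():
    print(edge, cls)
```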

The three levels of ranked lists of edges, along with the metrics they induce, are below.

(Not all the metrics are useful, and some may be very similar to others. I have highlighted the ones I think are useful.)

  0. Global: a single list comprising all edges.
     a. We can directly compute a single global metric from this list.
  1. Label-specific: one list per label, comprising all edges that have at least one vertex with the label.
     a. We can compute a label-specific metric from each list, with an additional constraint on Class 1 edges: both vertices should share the label being evaluated.
     b. We can then take a (weighted) average of the label-specific metrics to get a single global metric.
     c. We can also compute a global metric directly across all the label-specific lists.
  2. Sample-specific: one list per sample, comprising all edges that have that sample as at least one vertex.
     a. We can compute a sample-specific metric from each list.
     b. We can then average the sample-specific metrics to get a label-specific metric, filtered as in 1.a, although this may not be quite as straightforward; 2.d might be better.
     c. We can further take a (weighted) average of the label-specific metrics to get a single global metric.
     d. We can also compute a label-specific metric directly across the sample-specific lists, filtered as in 1.a.
     e. We can also directly average the sample-specific metrics to get a single global metric.
     f. We can also compute a single global metric directly across all the sample-specific lists.
     g. We can also take a (weighted) average of the label-specific metrics in 2.d to get a single global metric.


  • This discussion on metrics does not address the notion of “group replicates”.
  • Level 1 metrics are macro-averaged metrics because we are taking averages of Level 1-0 metrics. Micro-averaged metrics are not currently implemented.
  • The difference between 1.a and 2.d is in how we construct the label-specific list: 1.a combines the sample-specific lists and then ranks, whereas 2.d first ranks the sample-specific lists and then combines the ranked lists.
  • Similarly, the difference between 0.a and 2.f is in how we construct the global list: 0.a combines the sample-specific lists and then ranks, whereas 2.f first ranks the sample-specific lists and then combines the ranked lists.
  • 1.b and 2.g are similar; they both aggregate their corresponding label-specific metrics (1.a and 2.d respectively) to get a global metric.
  • sim_retrieval_average_precision_non_rep_i is an example of 2.a
  • sim_retrieval_average_precision_non_rep_i_mean_i is an example of 2.b

Categorization based on (I did not double-check; there could be errors)

| Index | Averaging | Metric type |
| --- | --- | --- |
| 0.a | micro | global |
| 1.a | micro | label-specific |
| 1.b | macro | global |
| 1.c | micro | global |
| 2.b | macro | label-specific |
| 2.c | macro of macro-label-specific | global |
| 2.d | micro | label-specific |
| 2.e | macro | global |
| 2.f | micro | global |
| 2.g | macro of micro-label-specific | global |