The similarity matrix represents a graph with vertices and edges.
Each vertex belongs to three nested sets:
We calculate metrics hierarchically:
We can aggregate each of these metrics to produce more metrics:
Consider a compound perturbation experiment done in replicates in a multi-well plate. Each compound belongs to one (or more) MOAs.
Further,
The metrics implemented in matric are defined below.

Metric | Description |
---|---|
sim_mean_i | mean similarity of a vertex to its replicate vertices |

Related: sim_median_i, which uses median instead of mean.
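
As an illustration only, the Python sketch below computes sim_mean_i and sim_median_i for every vertex from a square similarity matrix and a replicate-set label per vertex. It is not matric's own code: the function name replicate_similarity and the toy matrix (five wells of two hypothetical compounds, A and B) are made up for the example, and a vertex is assumed not to count as its own replicate.

```python
from statistics import mean, median


def replicate_similarity(sim, replicate_set):
    """Return {vertex: (sim_mean_i, sim_median_i)}.

    sim: square, symmetric similarity matrix (list of lists).
    replicate_set: replicate-set label of each vertex (e.g. compound of a well).
    """
    n = len(sim)
    out = {}
    for v in range(n):
        # Replicate vertices: same replicate set, excluding v itself (assumption).
        reps = [u for u in range(n) if u != v and replicate_set[u] == replicate_set[v]]
        if not reps:
            continue  # a singleton replicate set has no replicate edges
        values = [sim[v][u] for u in reps]
        out[v] = (mean(values), median(values))
    return out


# Hypothetical toy data: wells 0-2 are compound A, wells 3-4 are compound B.
sim = [
    [1.0, 0.8, 0.6, 0.1, 0.2],
    [0.8, 1.0, 0.7, 0.0, 0.1],
    [0.6, 0.7, 1.0, 0.2, 0.3],
    [0.1, 0.0, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.3, 0.9, 1.0],
]
compound = ["A", "A", "A", "B", "B"]
print(replicate_similarity(sim, compound))
```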

Metric | Description |
---|---|
sim_scaled_mean_non_rep_i | scale sim_mean_i using sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i |

where sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i are the mean and s.d. of similarity of a vertex to its non-replicate vertices.

Related:

- sim_scaled_median_non_rep_i, which scales sim_median_i instead of sim_mean_i
- sim_scaled_mean_ref_i, which scales sim_mean_i w.r.t. reference vertices (i.e., it uses sim_mean_stat_ref_i and sim_sd_stat_ref_i, the mean and s.d. of similarity of a vertex to the reference vertices, to scale)
- sim_scaled_median_ref_i, which is the same as sim_scaled_mean_ref_i except that it scales sim_median_i instead of sim_mean_i
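
The text does not spell out the scaling formula. A common choice, assumed here, is a z-score: (sim_mean_i - sim_mean_stat_non_rep_i) / sim_sd_stat_non_rep_i, and likewise with the _ref_i statistics. The sketch below further assumes that the reference vertices are hypothetical negative-control (DMSO) wells, that "non-replicate" means every vertex outside the vertex's replicate set (references included), and that s.d. is the sample standard deviation. The function name scaled_metrics and the toy data are hypothetical.

```python
from statistics import mean, median, stdev


def scaled_metrics(sim, replicate_set, is_reference):
    """Sketch of sim_scaled_{mean,median}_{non_rep,ref}_i for non-reference vertices.

    ASSUMPTIONS: scaling is a z-score, (x - mean_stat) / sd_stat; "s.d." is the
    sample s.d. (needs >= 2 values); non-replicates include the reference vertices.
    """
    n = len(sim)
    refs = [u for u in range(n) if is_reference[u]]
    out = {}
    for v in range(n):
        if is_reference[v]:
            continue
        reps = [u for u in range(n) if u != v and replicate_set[u] == replicate_set[v]]
        non_reps = [u for u in range(n) if replicate_set[u] != replicate_set[v]]
        rep_sims = [sim[v][u] for u in reps]
        sim_mean_i, sim_median_i = mean(rep_sims), median(rep_sims)
        for suffix, others in (("non_rep", non_reps), ("ref", refs)):
            vals = [sim[v][u] for u in others]
            # mu, sd play the role of sim_mean_stat_<suffix>_i and sim_sd_stat_<suffix>_i.
            mu, sd = mean(vals), stdev(vals)
            out[(v, f"sim_scaled_mean_{suffix}_i")] = (sim_mean_i - mu) / sd
            out[(v, f"sim_scaled_median_{suffix}_i")] = (sim_median_i - mu) / sd
    return out


# Hypothetical toy data: compounds A (wells 0-1), B (wells 2-3), DMSO references (4-5).
sim = [
    [1.0, 0.8, 0.1, 0.2, 0.0, 0.2],
    [0.8, 1.0, 0.3, 0.1, 0.1, 0.3],
    [0.1, 0.3, 1.0, 0.9, 0.2, 0.0],
    [0.2, 0.1, 0.9, 1.0, 0.4, 0.2],
    [0.0, 0.1, 0.2, 0.4, 1.0, 0.5],
    [0.2, 0.3, 0.0, 0.2, 0.5, 1.0],
]
treatment = ["A", "A", "B", "B", "DMSO", "DMSO"]
is_reference = [t == "DMSO" for t in treatment]
print(scaled_metrics(sim, treatment, is_reference))
```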
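Consider a list of vertices comprising the replicate vertices and the non-replicate vertices of a vertex, ranked by decreasing similarity to that vertex.
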

Metric | Description |
---|---|
sim_ranked_relrank_mean_non_rep_i | the mean percentile of the vertex’s replicates in this list |
sim_retrieval_average_precision_non_rep_i | the average precision reported on the list, with the replicates being the positive class |
sim_retrieval_r_precision_non_rep_i | similarly, the R-precision reported on the list |
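
The sketch below computes the three ranked-list metrics in the table for a single vertex. Several conventions are assumptions, not taken from matric: the list is sorted by decreasing similarity with ties kept in input order, and a replicate's percentile is its 1-based rank divided by the list length. Average precision is the mean of precision@k over the ranks of the replicates, and R-precision is the precision over the top R entries, where R is the number of replicates. The function name ranked_metrics, its dictionary keys, and the toy data are hypothetical.

```python
from statistics import mean


def ranked_metrics(sim, replicate_set, v, candidates):
    """Sketch of the Level 1-0 ranked/retrieval metrics for vertex v.

    candidates: the other vertices in the list (v's replicates plus either its
    non-replicates or the reference vertices); replicates are the positives.
    Requires at least one replicate among the candidates.
    """
    # Sort by decreasing similarity to v; sorted() is stable, so ties keep input order.
    ranked = sorted(candidates, key=lambda u: sim[v][u], reverse=True)
    is_rep = [replicate_set[u] == replicate_set[v] for u in ranked]
    n, r = len(ranked), sum(is_rep)

    # Percentile of each replicate (assumed convention: 1-based rank / list length).
    percentiles = [(k + 1) / n for k, rep in enumerate(is_rep) if rep]

    # Average precision: mean of precision@k taken at the rank k of each replicate.
    hits, precisions = 0, []
    for k, rep in enumerate(is_rep, start=1):
        if rep:
            hits += 1
            precisions.append(hits / k)

    return {
        "relrank_mean": mean(percentiles),
        "average_precision": mean(precisions),
        # R-precision: fraction of replicates among the top R, R = number of replicates.
        "r_precision": sum(is_rep[:r]) / r,
    }


# Hypothetical toy data: wells 0-2 are compound A, wells 3-5 are compound B.
sim = [
    [1.0, 0.9, 0.3, 0.5, 0.1, 0.2],
    [0.9, 1.0, 0.4, 0.2, 0.3, 0.1],
    [0.3, 0.4, 1.0, 0.1, 0.2, 0.0],
    [0.5, 0.2, 0.1, 1.0, 0.8, 0.7],
    [0.1, 0.3, 0.2, 0.8, 1.0, 0.6],
    [0.2, 0.1, 0.0, 0.7, 0.6, 1.0],
]
compound = ["A", "A", "A", "B", "B", "B"]
# Non-replicate list for well 0: all other wells (its replicates plus its non-replicates).
print(ranked_metrics(sim, compound, 0, [u for u in range(6) if u != 0]))
```

For the _ref_i variants, the candidate list would instead hold the vertex's replicates plus the reference vertices.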

Related:

- sim_ranked_relrank_median_non_rep_i reports the median percentile instead of the mean percentile
- sim_ranked_relrank_mean_ref_i, sim_ranked_relrank_median_ref_i, sim_retrieval_average_precision_ref_i, and sim_retrieval_r_precision_ref_i use a list of vertices comprising the reference vertices instead of the non-replicate vertices

sim_mean_i_mean_i is the mean of sim_mean_i across all replicate vertices in a replicate set. sim_mean_i_median_i, sim_median_i_mean_i, and sim_median_i_median_i are the corresponding Level 1 aggregated metrics for the other combinations of Level 1-0 raw metrics and summary statistics.

sim_scaled_mean_non_rep_i_mean_i, sim_scaled_median_non_rep_i_median_i, sim_scaled_mean_ref_i_mean_i, and sim_scaled_median_ref_i_median_i are the corresponding Level 1 aggregated metrics for the scaled Level 1-0 metrics.

sim_ranked_relrank_mean_ref_i_mean_i, sim_ranked_relrank_mean_ref_i_median_i, sim_ranked_relrank_median_ref_i_mean_i, and sim_ranked_relrank_median_ref_i_median_i are the corresponding Level 1 aggregated metrics for the rank-based Level 1-0 metrics.

sim_retrieval_average_precision_ref_i_mean_i, sim_retrieval_average_precision_ref_i_median_i, sim_retrieval_r_precision_ref_i_mean_i, and sim_retrieval_r_precision_ref_i_median_i are the corresponding Level 1 aggregated metrics for the retrieval-based Level 1-0 metrics.

Note: These are Level 1 summaries of scaling parameters; they are not themselves used for scaling:

- sim_mean_stat_non_rep_i_mean_i
- sim_sd_stat_non_rep_i_mean_i
- sim_mean_stat_non_rep_i_median_i
- sim_sd_stat_non_rep_i_median_i
- sim_mean_stat_ref_i_mean_i
- sim_sd_stat_ref_i_mean_i
- sim_mean_stat_ref_i_median_i
- sim_sd_stat_ref_i_median_i
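
Level 1 aggregation amounts to a group-by over replicate sets. Here is a minimal sketch, assuming the per-vertex Level 1-0 metrics are already available as a dictionary and that each Level 1 name is formed by appending _mean_i or _median_i to the Level 1-0 name, as the names above suggest. The function name level1_aggregate and the metric values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def level1_aggregate(level10, replicate_set):
    """Summarize per-vertex Level 1-0 metrics within each replicate set.

    level10: {vertex: {metric_name: value}}; replicate_set[vertex] -> set label.
    Returns {label: {metric_name + "_mean_i" or "_median_i": value}}.
    """
    # Collect the values of each metric per replicate set.
    grouped = defaultdict(lambda: defaultdict(list))
    for v, metrics in level10.items():
        for name, value in metrics.items():
            grouped[replicate_set[v]][name].append(value)

    summary = {}
    for label, metrics in grouped.items():
        summary[label] = {}
        for name, values in metrics.items():
            summary[label][f"{name}_mean_i"] = mean(values)
            summary[label][f"{name}_median_i"] = median(values)
    return summary


# Hypothetical Level 1-0 values for wells of compounds A (0-2) and B (3-4).
compound = ["A", "A", "A", "B", "B"]
level10 = {
    0: {"sim_mean_i": 0.70, "sim_median_i": 0.70},
    1: {"sim_mean_i": 0.75, "sim_median_i": 0.75},
    2: {"sim_mean_i": 0.65, "sim_median_i": 0.65},
    3: {"sim_mean_i": 0.90, "sim_median_i": 0.90},
    4: {"sim_mean_i": 0.90, "sim_median_i": 0.90},
}
print(level1_aggregate(level10, compound)["A"])
```

Feeding the scaling statistics (e.g. sim_mean_stat_non_rep_i) through the same function would produce summaries such as sim_mean_stat_non_rep_i_mean_i.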

Metric | Description |
---|---|
sim_mean_g | mean similarity of vertices in a replicate set to its group replicate vertices |

Related: sim_median_g, which uses median instead of mean.

Metric | Description |
---|---|
sim_scaled_mean_non_rep_g | scale sim_mean_g using sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g |

where sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g are the mean and s.d. of similarity of vertices in a replicate set to their non-replicate (and non-group replicate) vertices.

Related:

- sim_scaled_median_non_rep_g, which scales sim_median_g instead of sim_mean_g
- sim_scaled_mean_ref_g, which scales sim_mean_g w.r.t. reference vertices (i.e., it uses sim_mean_stat_ref_g and sim_sd_stat_ref_g, the mean and s.d. of similarity of vertices in a replicate set to the reference vertices, to scale)
- sim_scaled_median_ref_g, which is the same as sim_scaled_mean_ref_g except that it scales sim_median_g instead of sim_mean_g
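
A sketch of sim_mean_g, sim_median_g, and sim_scaled_mean_non_rep_g for each replicate set, under assumptions that the text does not confirm: each vertex carries exactly one group label (e.g. one MOA per compound, although the text allows several); group replicate vertices are those in the same group but a different replicate set; the similarities summarized are all pairs between the replicate set and its group replicates; and scaling is a z-score against similarities to vertices outside the group. The function name level21_metrics and the toy data are hypothetical.

```python
from statistics import mean, median, stdev


def level21_metrics(sim, replicate_set, group):
    """Sketch of sim_mean_g, sim_median_g and sim_scaled_mean_non_rep_g per replicate set.

    replicate_set[v]: replicate-set label of vertex v (e.g. compound of a well).
    group[v]: group label of vertex v (e.g. MOA); ASSUMES one group per vertex.
    """
    n = len(sim)
    out = {}
    for s in sorted(set(replicate_set)):
        members = [v for v in range(n) if replicate_set[v] == s]
        g = group[members[0]]
        # Group replicates: same group, different replicate set.
        group_reps = [u for u in range(n) if group[u] == g and replicate_set[u] != s]
        if not group_reps:
            continue  # nothing to compare against
        # Non-replicate and non-group-replicate vertices: everything outside the group.
        others = [u for u in range(n) if group[u] != g]
        grp_sims = [sim[v][u] for v in members for u in group_reps]
        bg_sims = [sim[v][u] for v in members for u in others]
        # Assumed z-score; mu and sd stand in for sim_mean_stat_non_rep_g, sim_sd_stat_non_rep_g.
        mu, sd = mean(bg_sims), stdev(bg_sims)
        out[s] = {
            "sim_mean_g": mean(grp_sims),
            "sim_median_g": median(grp_sims),
            "sim_scaled_mean_non_rep_g": (mean(grp_sims) - mu) / sd,
        }
    return out


# Hypothetical toy data: compounds A, B share MOA m1; compound C has MOA m2.
sim = [
    [1.0, 0.9, 0.6, 0.5, 0.1, 0.0],
    [0.9, 1.0, 0.4, 0.7, 0.2, 0.1],
    [0.6, 0.4, 1.0, 0.8, 0.3, 0.2],
    [0.5, 0.7, 0.8, 1.0, 0.0, 0.1],
    [0.1, 0.2, 0.3, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.2, 0.1, 0.9, 1.0],
]
compound = ["A", "A", "B", "B", "C", "C"]
moa = ["m1", "m1", "m1", "m1", "m2", "m2"]
print(level21_metrics(sim, compound, moa))
```

Compound C has no group replicates in this toy example, so the sketch reports nothing for it.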
Consider a list of vertices comprising

We define metrics similar to the corresponding Level 1-0 metrics:

- sim_ranked_relrank_mean_non_rep_g
- sim_ranked_relrank_median_non_rep_g
- sim_retrieval_average_precision_non_rep_g
- sim_retrieval_r_precision_non_rep_g
- sim_ranked_relrank_mean_ref_g
- sim_ranked_relrank_median_ref_g
- sim_retrieval_average_precision_ref_g
- sim_retrieval_r_precision_ref_g

This is a related discussion on metrics, from here.
We have a weighted graph where the vertices are perturbations with multiple labels (e.g. pathways in the case of genetic perturbations), and the edges are weighted by the similarity between the vertices (e.g. the cosine similarity between image-based profiles of two CRISPR knockouts).
There are three levels of ranked lists of edges, each of which can produce global metrics (based on classification metrics like average precision or other so-called class probability metrics). These global metrics can be used to compare representations.
In all 3 cases, we pose it as a binary classification problem on the edges:
The three levels of ranked lists of edges, along with the metrics they induce, are listed below:
(Not all the metrics are useful, and some may be very similar to others. I have highlighted the ones I think are useful.)
Notes:

- sim_retrieval_average_precision_non_rep_i is an example of 2.a
- sim_retrieval_average_precision_non_rep_i_mean_i is an example of 2.b

Categorization based on https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification (I did not double-check; there could be errors):

Index | Averaging | Metric type |
---|---|---|
0.a | micro | global |
1.a | micro | label-specific |
1.b | macro | global |
1.c | micro | global |
2.b | macro | label-specific |
2.c | macro of macro-label-specific | global |
2.d | micro | label-specific |
2.e | macro | global |
2.f | micro | global |
2.g | macro of micro-label-specific | global |
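
To illustrate the micro vs. macro distinction in the Averaging column, the sketch below poses a per-label binary classification on edges and averages average precision (AP) both ways. Assumptions: for a label L, an edge is positive when both endpoints carry L; every edge is scored by the similarity of its endpoints; labels with no positive edge are dropped; and AP uses the threshold-based form described in the scikit-learn documentation linked above, AP = sum over thresholds of (R_n - R_(n-1)) * P_n, so tied scores count as one threshold. Macro averaging takes the mean of the per-label APs; micro averaging pools every (edge, label) pair into one ranked list. The function names, labels, and similarities are hypothetical, and the sketch does not claim to reproduce any particular row of the table.

```python
from itertools import combinations
from statistics import mean


def average_precision(labels, scores):
    """Threshold-based AP: sum over distinct scores t (high to low) of
    (recall gain at t) * (precision at t); tied scores share one threshold."""
    total_pos = sum(labels)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    for t in sorted(set(scores), reverse=True):
        for y, s in zip(labels, scores):
            if s == t:
                if y:
                    tp += 1
                else:
                    fp += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def micro_macro_ap(sim, vertex_labels):
    """Return (micro AP, macro AP) across labels for the edge-level problem."""
    edges = list(combinations(range(len(sim)), 2))  # each undirected edge once
    scores = [sim[u][v] for u, v in edges]
    all_labels = sorted(set().union(*vertex_labels))
    # Per-label binary problem: an edge is positive if both endpoints carry the label.
    y = {lab: [lab in vertex_labels[u] and lab in vertex_labels[v] for u, v in edges]
         for lab in all_labels}
    usable = [lab for lab in all_labels if any(y[lab])]  # AP is undefined without positives
    # Macro: average the per-label APs.
    macro = mean(average_precision(y[lab], scores) for lab in usable)
    # Micro: pool all (edge, label) pairs into a single ranked list.
    micro = average_precision([p for lab in usable for p in y[lab]],
                              [s for lab in usable for s in scores])
    return micro, macro


# Hypothetical toy data: four genes with pathway labels.
vertex_labels = [{"p1"}, {"p1", "p2"}, {"p2"}, {"p1"}]
sim = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.7, 0.3],
    [0.2, 0.7, 1.0, 0.1],
    [0.4, 0.3, 0.1, 1.0],
]
micro, macro = micro_macro_ap(sim, vertex_labels)
print(f"micro AP = {micro:.3f}, macro AP = {macro:.3f}")
```

With these toy values, the frequent label p1 ranks well and the rare label p2 ranks poorly. Micro averaging, which pools all pairs, and macro averaging, which weights each label equally, therefore give different summaries.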