copairs.map

Module to compute mAP-based metrics.

apply_fdr_correction(map_scores, method='fdr_bh')

Apply standard FDR correction across all tests.

Parameters:

  • map_scores (DataFrame) –

    DataFrame containing mAP scores with a 'p_value' column.

  • method (str, default: 'fdr_bh' ) –

    Multiple testing correction method (default: 'fdr_bh'). See statsmodels.stats.multitest.multipletests for options.

Returns:

  • DataFrame

    Input DataFrame with 'corrected_p_value' column added.
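
A minimal usage sketch (hypothetical values; assumes apply_fdr_correction is importable from copairs.map as documented on this page):

import pandas as pd
from copairs.map import apply_fdr_correction

# Hypothetical mAP results with raw p-values
map_scores = pd.DataFrame({
    "compound": ["A", "B", "C"],
    "p_value": [0.001, 0.020, 0.400],
})

corrected = apply_fdr_correction(map_scores, method="fdr_bh")
# 'corrected' now carries a 'corrected_p_value' column with BH-adjusted p-values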

Source code in src/copairs/map/hierarchical_fdr.py
def apply_fdr_correction(
    map_scores: pd.DataFrame,
    method: str = "fdr_bh",
) -> pd.DataFrame:
    """Apply standard FDR correction across all tests.

    Parameters
    ----------
    map_scores : pd.DataFrame
        DataFrame containing mAP scores with a 'p_value' column.
    method : str, optional
        Multiple testing correction method (default: 'fdr_bh').
        See statsmodels.stats.multitest.multipletests for options.

    Returns
    -------
    pd.DataFrame
        Input DataFrame with 'corrected_p_value' column added.

    """
    map_scores = map_scores.copy()
    _, pvals_corrected, _, _ = multipletests(map_scores["p_value"], method=method)
    map_scores["corrected_p_value"] = pvals_corrected
    return map_scores

apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

Apply hierarchical FDR correction for grouped hypotheses.

Implements a two-stage testing procedure appropriate for dose-response data where only high doses are expected to be active:

  • Stage 1: Use minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
  • Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

Parameters:

  • map_scores (DataFrame) –

    DataFrame containing mAP scores with a 'p_value' column.

  • hierarchical_by (list) –

    Metadata column(s) defining the group structure (e.g., ['compound']).

  • sameby (list) –

    Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

Returns:

  • DataFrame

    Input DataFrame with additional columns:

    • corrected_p_value: BH-corrected p-value (1.0 for groups that didn't pass Stage 1).
    • stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
    • stage1_corrected_p_value: BH-corrected Stage 1 p-value.
    • stage1_significant: Whether the group passed Stage 1.

Raises:

  • ValueError

    If hierarchical_by is not a proper subset of sameby.

Notes

This method uses minimum p-value (rather than Simes) for Stage 1 aggregation. Min-p is appropriate for dose-response data where only high doses are expected to be active. Simes would penalize compounds for having inactive low doses, which is the expected biological behavior.
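
A minimal usage sketch for dose-response data (hypothetical values; assumes apply_hierarchical_fdr_correction is importable from copairs.map as documented on this page):

import pandas as pd
from copairs.map import apply_hierarchical_fdr_correction

# Hypothetical per-(compound, dose) mAP results
map_scores = pd.DataFrame({
    "compound": ["A", "A", "B", "B"],
    "dose": [0.1, 10.0, 0.1, 10.0],
    "p_value": [0.30, 0.001, 0.45, 0.50],
})

out = apply_hierarchical_fdr_correction(
    map_scores,
    hierarchical_by=["compound"],  # group structure tested in Stage 1
    sameby=["compound", "dose"],   # level at which mAP was computed
)
# 'out' gains corrected_p_value, stage1_p_value, stage1_corrected_p_value and stage1_significant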

Source code in src/copairs/map/hierarchical_fdr.py
def apply_hierarchical_fdr_correction(
    map_scores: pd.DataFrame,
    hierarchical_by: List[str],
    sameby: List[str],
) -> pd.DataFrame:
    """Apply hierarchical FDR correction for grouped hypotheses.

    Implements a two-stage testing procedure appropriate for dose-response data
    where only high doses are expected to be active:

    - Stage 1: Use minimum p-value within each group defined by `hierarchical_by`,
      then apply BH correction at the group level. A group passes if any member
      is significant.
    - Stage 2: For groups that pass Stage 1, apply BH correction to the
      individual tests within each group.

    Parameters
    ----------
    map_scores : pd.DataFrame
        DataFrame containing mAP scores with a 'p_value' column.
    hierarchical_by : list
        Metadata column(s) defining the group structure (e.g., ['compound']).
    sameby : list
        Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

    Returns
    -------
    pd.DataFrame
        Input DataFrame with additional columns:
        - `corrected_p_value`: BH-corrected p-value (1.0 for groups that didn't pass Stage 1).
        - `stage1_p_value`: Group-level p-value from Stage 1 (minimum p-value).
        - `stage1_corrected_p_value`: BH-corrected Stage 1 p-value.
        - `stage1_significant`: Whether the group passed Stage 1.

    Raises
    ------
    ValueError
        If `hierarchical_by` is not a proper subset of `sameby`.

    Notes
    -----
    This method uses minimum p-value (rather than Simes) for Stage 1 aggregation.
    Min-p is appropriate for dose-response data where only high doses are expected
    to be active. Simes would penalize compounds for having inactive low doses,
    which is the expected biological behavior.

    """
    # Validate that hierarchical_by is a subset of sameby
    if not set(hierarchical_by).issubset(set(sameby)):
        raise ValueError(
            f"hierarchical_by columns {hierarchical_by} must be a subset of "
            f"sameby columns {sameby}"
        )

    if set(hierarchical_by) == set(sameby):
        raise ValueError(
            f"hierarchical_by columns {hierarchical_by} must be a proper subset of "
            f"sameby columns {sameby}. If they are equal, use standard correction "
            f"by not specifying hierarchical_by."
        )

    logger.info("Applying hierarchical FDR correction...")
    map_scores = map_scores.copy()

    # Stage 1: Aggregate p-values to group level using minimum p-value
    # Min-p is appropriate for dose-response where only high doses are expected to be active
    stage1_pvals = map_scores.groupby(hierarchical_by, observed=True).agg(
        {"p_value": "min"}
    )
    stage1_pvals.columns = ["stage1_p_value"]

    # Apply BH correction at the group level
    reject_stage1, stage1_corrected, _, _ = multipletests(
        stage1_pvals["stage1_p_value"], method="fdr_bh"
    )
    stage1_pvals["stage1_corrected_p_value"] = stage1_corrected
    stage1_pvals["stage1_significant"] = reject_stage1

    # Merge Stage 1 results back to map_scores
    map_scores = map_scores.merge(
        stage1_pvals.reset_index(), on=hierarchical_by, how="left"
    )

    # Stage 2: For groups that passed Stage 1, apply BH within each group
    # For groups that didn't pass, set corrected_p_value to 1.0
    map_scores["corrected_p_value"] = 1.0

    for group_key, group_df in map_scores.groupby(hierarchical_by, observed=True):
        if not group_df["stage1_significant"].iloc[0]:
            # Group didn't pass Stage 1, skip
            continue

        group_indices = group_df.index
        group_pvals = group_df["p_value"].values

        if len(group_pvals) == 1:
            # Single test in group, no additional correction needed
            map_scores.loc[group_indices, "corrected_p_value"] = group_pvals[0]
        else:
            # Apply BH correction within the group
            _, group_corrected, _, _ = multipletests(group_pvals, method="fdr_bh")
            map_scores.loc[group_indices, "corrected_p_value"] = group_corrected

    return map_scores

get_map_pvalue(ap_scores, sameby, null_size, seed, progress_bar=True, max_workers=None, cache_dir=None)

Compute mAP scores and p-values from AP scores.

This function groups AP scores by the specified columns, computes the mean Average Precision (mAP) for each group, and calculates p-values by comparing against null distributions.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • seed (int) –

    Random seed for reproducibility.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    • Columns from sameby (group identifiers).
    • mean_average_precision: Mean AP score for each group.
    • mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    • p_value: p-value comparing mAP to the null distribution.
    • indices: List of indices in the original ap_scores for this group.
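
A minimal usage sketch (assumes ap_scores is the AP table returned by average_precision, documented below, and that get_map_pvalue is importable from copairs.map as shown on this page):

from copairs.map import get_map_pvalue

map_scores = get_map_pvalue(
    ap_scores=ap_scores,          # per-profile AP scores with n_pos_pairs / n_total_pairs
    sameby=["compound", "dose"],  # hypothetical grouping columns
    null_size=10000,
    seed=0,
)
# One row per group: mean_average_precision, mean_normalized_average_precision, p_value, indices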

Source code in src/copairs/map/map.py
def get_map_pvalue(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    seed: int,
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Compute mAP scores and p-values from AP scores.

    This function groups AP scores by the specified columns, computes the mean
    Average Precision (mAP) for each group, and calculates p-values by comparing
    against null distributions.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    seed : int
        Random seed for reproducibility.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - Columns from `sameby` (group identifiers).
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `indices`: List of indices in the original ap_scores for this group.

    """
    # Filter out invalid or incomplete AP scores
    ap_scores = ap_scores.query("~average_precision.isna() and n_pos_pairs > 0")
    ap_scores = ap_scores.reset_index(drop=True).copy()

    logger.info("Computing null_dist...")
    # Extract configurations for null distribution generation
    null_confs = ap_scores[["n_pos_pairs", "n_total_pairs"]].values
    null_confs, rev_ix = np.unique(null_confs, axis=0, return_inverse=True)

    # Generate null distributions for each unique configuration
    null_dists = compute.get_null_dists(
        null_confs, null_size, seed=seed, cache_dir=cache_dir, progress_bar=progress_bar
    )
    ap_scores["null_ix"] = rev_ix

    # Function to calculate the p-value for a mAP score based on the null distribution
    def get_p_value(params):
        map_score, indices = params
        null_dist = null_dists[rev_ix[indices]].mean(axis=0)
        num = (null_dist > map_score).sum()
        p_value = (num + 1) / (null_size + 1)  # Add 1 for stability
        return p_value

    logger.info("Computing p-values...")

    # Group by the specified metadata column(s) and calculate mean AP
    map_scores = ap_scores.groupby(sameby, observed=True, as_index=False).agg(
        {
            "average_precision": ["mean", lambda x: list(x.index)],
            "normalized_average_precision": "mean",
        }
    )
    map_scores.columns = sameby + [
        "mean_average_precision",
        "indices",
        "mean_normalized_average_precision",
    ]

    # Compute p-values for each group using the null distributions
    params = map_scores[["mean_average_precision", "indices"]]

    if progress_bar:
        from tqdm.contrib.concurrent import thread_map

        p_values = thread_map(
            get_p_value, params.values, leave=False, max_workers=max_workers
        )
    else:
        p_values = silent_thread_map(
            get_p_value, params.values, max_workers=max_workers
        )
    map_scores["p_value"] = p_values

    return map_scores

mean_average_precision(ap_scores, sameby, null_size, threshold, seed, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score and associated p-values.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It calculates the significance of mAP scores by comparing them to a null distribution and performs multiple testing corrections using Benjamini-Hochberg FDR.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • threshold (float) –

    p-value threshold for identifying significant mAP scores.

  • seed (int) –

    Random seed for reproducibility.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    • mean_average_precision: Mean AP score for each group.
    • mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    • p_value: p-value comparing mAP to the null distribution.
    • corrected_p_value: Adjusted p-value after multiple testing correction.
    • below_p: Boolean indicating if the p-value is below the threshold.
    • below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.

See Also

mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.
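
A minimal usage sketch (assumes ap_scores comes from average_precision, documented below, and the re-export shown on this page):

from copairs.map import mean_average_precision

map_scores = mean_average_precision(
    ap_scores=ap_scores,
    sameby=["compound"],  # hypothetical grouping column
    null_size=10000,
    threshold=0.05,
    seed=0,
)
# Adds corrected_p_value (BH) plus the below_p / below_corrected_p flags at the 0.05 threshold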

Source code in src/copairs/map/map.py
def mean_average_precision(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    threshold: float,
    seed: int,
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Calculate the Mean Average Precision (mAP) score and associated p-values.

    This function computes the Mean Average Precision (mAP) score by grouping profiles
    based on the specified criteria (`sameby`). It calculates the significance of mAP
    scores by comparing them to a null distribution and performs multiple testing
    corrections using Benjamini-Hochberg FDR.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    threshold : float
        p-value threshold for identifying significant mAP scores.
    seed : int
        Random seed for reproducibility.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `corrected_p_value`: Adjusted p-value after multiple testing correction.
        - `below_p`: Boolean indicating if the p-value is below the threshold.
        - `below_corrected_p`: Boolean indicating if the corrected p-value is below the threshold.

    See Also
    --------
    mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.

    """
    # Step 1: Compute mAP scores and p-values
    map_scores = get_map_pvalue(
        ap_scores=ap_scores,
        sameby=sameby,
        null_size=null_size,
        seed=seed,
        progress_bar=progress_bar,
        max_workers=max_workers,
        cache_dir=cache_dir,
    )

    # Step 2: Apply multiple testing correction
    map_scores = apply_fdr_correction(map_scores)

    # Step 3: Mark scores below the p-value threshold
    map_scores["below_p"] = map_scores["p_value"] < threshold
    map_scores["below_corrected_p"] = map_scores["corrected_p_value"] < threshold

    return map_scores

mean_average_precision_hierarchical(ap_scores, sameby, null_size, threshold, seed, hierarchical_by, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It applies hierarchical FDR correction appropriate for grouped hypothesis testing, such as dose-response data.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • threshold (float) –

    p-value threshold for identifying significant mAP scores.

  • seed (int) –

    Random seed for reproducibility.

  • hierarchical_by (list) –

    Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:

    • Stage 1: Use minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
    • Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

    This is designed for dose-response data where only high doses are expected to be active. The hierarchical_by columns must be a proper subset of sameby. For example, with sameby=['compound', 'dose'] and hierarchical_by=['compound'], mAP is calculated per compound×dose, but FDR correction accounts for the grouped structure.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    • mean_average_precision: Mean AP score for each group.
    • mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    • p_value: p-value comparing mAP to the null distribution.
    • corrected_p_value: Adjusted p-value after multiple testing correction.
    • below_p: Boolean indicating if the p-value is below the threshold.
    • below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
    • stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
    • stage1_corrected_p_value: BH-corrected Stage 1 p-value.
    • stage1_significant: Whether the group passed Stage 1.

See Also

mean_average_precision : For standard BH FDR correction.
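
A minimal usage sketch for dose-response data (hypothetical column names; assumes the re-export shown on this page):

from copairs.map import mean_average_precision_hierarchical

map_scores = mean_average_precision_hierarchical(
    ap_scores=ap_scores,           # AP table from average_precision (documented below)
    sameby=["compound", "dose"],   # mAP is computed per compound-dose combination
    null_size=10000,
    threshold=0.05,
    seed=0,
    hierarchical_by=["compound"],  # FDR correction is grouped by compound
)
# Adds corrected_p_value plus the stage1_* diagnostic columns described above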

Source code in src/copairs/map/map.py
def mean_average_precision_hierarchical(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    threshold: float,
    seed: int,
    hierarchical_by: List[str],
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

    This function computes the Mean Average Precision (mAP) score by grouping profiles
    based on the specified criteria (`sameby`). It applies hierarchical FDR correction
    appropriate for grouped hypothesis testing, such as dose-response data.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    threshold : float
        p-value threshold for identifying significant mAP scores.
    seed : int
        Random seed for reproducibility.
    hierarchical_by : list
        Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:

        - Stage 1: Use minimum p-value within each group defined by `hierarchical_by`,
          then apply BH correction at the group level. A group passes if any member
          is significant.
        - Stage 2: For groups that pass Stage 1, apply BH correction to the
          individual tests within each group.

        This is designed for dose-response data where only high doses are expected
        to be active. The `hierarchical_by` columns must be a proper subset of `sameby`.
        For example, with `sameby=['compound', 'dose']` and `hierarchical_by=['compound']`,
        mAP is calculated per compound×dose, but FDR correction accounts for the
        grouped structure.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `corrected_p_value`: Adjusted p-value after multiple testing correction.
        - `below_p`: Boolean indicating if the p-value is below the threshold.
        - `below_corrected_p`: Boolean indicating if the corrected p-value is below the threshold.
        - `stage1_p_value`: Group-level p-value from Stage 1 (minimum p-value).
        - `stage1_corrected_p_value`: BH-corrected Stage 1 p-value.
        - `stage1_significant`: Whether the group passed Stage 1.

    See Also
    --------
    mean_average_precision : For standard BH FDR correction.

    """
    # Step 1: Compute mAP scores and p-values
    map_scores = get_map_pvalue(
        ap_scores=ap_scores,
        sameby=sameby,
        null_size=null_size,
        seed=seed,
        progress_bar=progress_bar,
        max_workers=max_workers,
        cache_dir=cache_dir,
    )

    # Step 2: Apply hierarchical multiple testing correction
    # Includes stage1_* columns for transparency. Could drop these in future
    # for cleaner output (only corrected_p_value is needed downstream).
    map_scores = apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

    # Step 3: Mark scores below the p-value threshold
    map_scores["below_p"] = map_scores["p_value"] < threshold
    map_scores["below_corrected_p"] = map_scores["corrected_p_value"] < threshold

    return map_scores

average_precision

Functions to compute average precision.

average_precision(meta, feats, pos_sameby, pos_diffby, neg_sameby, neg_diffby, batch_size=20000, distance='cosine', progress_bar=True)

Calculate average precision (AP) scores for pairs of profiles based on their similarity.

This function identifies positive and negative pairs of profiles using metadata rules, computes their similarity scores, and calculates average precision scores for each profile. The results include the number of positive and total pairs for each profile.

Parameters:

  • meta (DataFrame) –

    Metadata of the profiles, including columns used for defining pairs. This DataFrame should include the columns specified in pos_sameby, pos_diffby, neg_sameby, and neg_diffby.

  • feats (ndarray) –

    Feature matrix representing the profiles, where rows correspond to profiles and columns to features.

  • pos_sameby (list) –

    Metadata columns used to define positive pairs. Two profiles are considered a positive pair if they belong to the same group that is not a control group. For example, replicate profiles of the same compound are positive pairs and should share the same value in a column identifying compounds.

  • pos_diffby (list) –

    Metadata columns used to differentiate positive pairs. Positive pairs do not need to differ in any metadata columns, so this is typically left empty. However, if necessary (e.g., to account for batch effects), you can specify columns such as batch identifiers.

  • neg_sameby (list) –

    Metadata columns used to define negative pairs. Typically left empty, as profiles forming a negative pair (e.g., a compound and a DMSO/control) do not need to share any metadata values. This ensures comparisons are made without enforcing unnecessary constraints.

  • neg_diffby (list) –

    Metadata columns used to differentiate negative pairs. Two profiles are considered a negative pair if one belongs to a compound group and the other to a DMSO/ control group. They must differ in specified metadata columns, such as those identifying the compound and the treatment index, to ensure comparisons are only made between compounds and DMSO controls (not between different compounds).

  • batch_size (int, default: 20000 ) –

    The batch size for similarity computations to optimize memory usage. Default is 20000.

  • distance (str, default: 'cosine' ) –

    The distance function used for computing similarities. Default is "cosine".

Returns:

  • DataFrame

    A DataFrame containing the following columns:

    • 'average_precision': The calculated average precision score for each profile.
    • 'normalized_average_precision': The normalized AP score (scale-independent).
    • 'n_pos_pairs': The number of positive pairs for each profile.
    • 'n_total_pairs': The total number of pairs for each profile.
    • Additional metadata columns from the input.

Raises:

  • UnpairedException

    If no positive or negative pairs are found in the dataset.

Notes

  • Positive Pair Rules:
    • Positive pairs are defined by pos_sameby (profiles share these metadata values) and optionally differentiated by pos_diffby (profiles must differ in these metadata values if specified).
  • Negative Pair Rules:
    • Negative pairs are defined by neg_diffby (profiles differ in these metadata values) and optionally constrained by neg_sameby (profiles share these metadata values if specified).
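
A minimal usage sketch (hypothetical metadata and random features; assumes the re-export shown on this page; the negative-pair rule here simply contrasts different compound groups rather than the compound-vs-DMSO setup described above):

import numpy as np
import pandas as pd
from copairs.map import average_precision

# Hypothetical metadata: two replicates each of two compounds plus a DMSO control
meta = pd.DataFrame({
    "compound": ["cmpdA", "cmpdA", "cmpdB", "cmpdB", "DMSO", "DMSO"],
    "well": ["A1", "A2", "B1", "B2", "C1", "C2"],
})
feats = np.random.default_rng(0).normal(size=(len(meta), 50))

ap_scores = average_precision(
    meta,
    feats,
    pos_sameby=["compound"],  # replicates of the same compound form positive pairs
    pos_diffby=[],
    neg_sameby=[],
    neg_diffby=["compound"],  # profiles from different compound groups form negative pairs
)
# ap_scores extends meta with average_precision, normalized_average_precision,
# n_pos_pairs and n_total_pairs
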
Source code in src/copairs/map/average_precision.py
def average_precision(
    meta: pd.DataFrame,
    feats: pd.DataFrame,
    pos_sameby: List[str],
    pos_diffby: List[str],
    neg_sameby: List[str],
    neg_diffby: List[str],
    batch_size: int = 20000,
    distance: str = "cosine",
    progress_bar: bool = True,
) -> pd.DataFrame:
    """Calculate average precision (AP) scores for pairs of profiles based on their similarity.

    This function identifies positive and negative pairs of profiles using  metadata
    rules, computes their similarity scores, and calculates average precision
    scores for each profile. The results include the number of positive and total pairs
    for each profile.

    Parameters
    ----------
    meta : pd.DataFrame
        Metadata of the profiles, including columns used for defining pairs.
        This DataFrame should include the columns specified in `pos_sameby`,
        `pos_diffby`, `neg_sameby`, and `neg_diffby`.

    feats : np.ndarray
        Feature matrix representing the profiles, where rows correspond to profiles
        and columns to features.

    pos_sameby : list
        Metadata columns used to define positive pairs. Two profiles are considered a
        positive pair if they belong to the same group that is not a control group.
        For example, replicate profiles of the same compound are positive pairs and
        should share the same value in a column identifying compounds.

    pos_diffby : list
        Metadata columns used to differentiate positive pairs. Positive pairs do not need
        to differ in any metadata columns, so this is typically left empty. However,
        if necessary (e.g., to account for batch effects), you can specify columns
        such as batch identifiers.

    neg_sameby : list
        Metadata columns used to define negative pairs. Typically left empty, as profiles
        forming a negative pair (e.g., a compound and a DMSO/control) do not need to
        share any metadata values. This ensures comparisons are made without enforcing
        unnecessary constraints.

    neg_diffby : list
        Metadata columns used to differentiate negative pairs. Two profiles are considered
        a negative pair if one belongs to a compound group and the other to a DMSO/
        control group. They must differ in specified metadata columns, such as those
        identifying the compound and the treatment index, to ensure comparisons are
        only made between compounds and DMSO controls (not between different compounds).

    batch_size : int
        The batch size for similarity computations to optimize memory usage.
        Default is 20000.

    distance : str
        The distance function used for computing similarities. Default is "cosine".

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the following columns:
        - 'average_precision': The calculated average precision score for each profile.
        - 'normalized_average_precision': The normalized AP score (scale-independent).
        - 'n_pos_pairs': The number of positive pairs for each profile.
        - 'n_total_pairs': The total number of pairs for each profile.
        - Additional metadata columns from the input.

    Raises
    ------
    UnpairedException
        If no positive or negative pairs are found in the dataset.

    Notes
    -----
    - Positive Pair Rules:
        * Positive pairs are defined by `pos_sameby` (profiles share these metadata values)
          and optionally differentiated by `pos_diffby` (profiles must differ in these metadata values if specified).
    - Negative Pair Rules:
        * Negative pairs are defined by `neg_diffby` (profiles differ in these metadata values)
          and optionally constrained by `neg_sameby` (profiles share these metadata values if specified).
    """
    # Combine all metadata columns needed for pair definitions
    columns = flatten_str_list(pos_sameby, pos_diffby, neg_sameby, neg_diffby)

    # Validate and filter metadata to ensure the required columns are present and usable
    meta, columns = evaluate_and_filter(meta, columns)
    validate_pipeline_input(meta, feats, columns)

    # Get the distance function for similarity calculations (e.g., cosine)
    similarity_fn = compute.get_similarity_fn(distance, progress_bar=progress_bar)

    # Reset metadata index for consistent indexing
    meta = meta.reset_index(drop=True).copy()

    logger.info("Indexing metadata...")

    # Identify positive pairs based on `pos_sameby` and `pos_diffby`
    logger.info("Finding positive pairs...")
    pos_pairs = find_pairs(meta, sameby=pos_sameby, diffby=pos_diffby)
    if len(pos_pairs) == 0:
        raise UnpairedException("Unable to find positive pairs.")

    # Identify negative pairs based on `neg_sameby` and `neg_diffby`
    logger.info("Finding negative pairs...")
    neg_pairs = find_pairs(meta, sameby=neg_sameby, diffby=neg_diffby)
    if len(neg_pairs) == 0:
        raise UnpairedException("Unable to find negative pairs.")

    # Compute similarities for positive pairs
    logger.info("Computing positive similarities...")
    pos_sims = similarity_fn(feats, pos_pairs, batch_size)

    # Compute similarities for negative pairs
    logger.info("Computing negative similarities...")
    neg_sims = similarity_fn(feats, neg_pairs, batch_size)

    # Build rank lists for calculating average precision
    logger.info("Building rank lists...")
    paired_ix, rel_k_list, counts = build_rank_lists(
        pos_pairs, neg_pairs, pos_sims, neg_sims
    )

    # Compute average precision scores and associated configurations
    logger.info("Computing average precision...")
    ap_scores, null_confs = compute.ap_contiguous(rel_k_list, counts)

    # Add AP scores and pair counts to the metadata DataFrame
    logger.info("Creating result DataFrame...")
    meta["n_pos_pairs"] = 0
    meta["n_total_pairs"] = 0
    meta.loc[paired_ix, "average_precision"] = ap_scores
    meta.loc[paired_ix, "n_pos_pairs"] = null_confs[:, 0]
    meta.loc[paired_ix, "n_total_pairs"] = null_confs[:, 1]

    # Compute normalized AP scores
    logger.info("Computing normalized average precision...")
    meta["normalized_average_precision"] = np.nan
    # Compute normalized scores for profiles with pairs
    M = null_confs[:, 0]  # n_pos_pairs
    L = null_confs[:, 1]  # n_total_pairs
    N = L - M  # n_neg_pairs
    normalized_scores = normalize_ap(ap_scores, M, N)
    meta.loc[paired_ix, "normalized_average_precision"] = normalized_scores

    logger.info("Finished.")
    return meta

build_rank_lists(pos_pairs, neg_pairs, pos_sims, neg_sims)

Build rank lists for calculating average precision.

This function processes positive and negative pairs along with their similarity scores to construct rank lists and determine unique profile indices with their associated counts.

Parameters:

  • pos_pairs (ndarray) –

    Array of positive pair indices, where each pair is represented as a pair of integers.

  • neg_pairs (ndarray) –

    Array of negative pair indices, where each pair is represented as a pair of integers.

  • pos_sims (ndarray) –

    Array of similarity scores for positive pairs.

  • neg_sims (ndarray) –

    Array of similarity scores for negative pairs.

Returns:

  • paired_ix ( ndarray ) –

    Unique indices of profiles that appear in the rank lists.

  • rel_k_list ( ndarray ) –

    Array of relevance labels (1 for positive pairs, 0 for negative pairs) sorted by decreasing similarity within each profile.

  • counts ( ndarray ) –

    Array of counts indicating how many times each profile index appears in the rank lists.
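
A toy sketch of inputs and outputs (values chosen by hand; the import path follows the source location below):

import numpy as np
from copairs.map.average_precision import build_rank_lists

pos_pairs = np.array([[0, 1]])          # one positive pair of profile indices
neg_pairs = np.array([[0, 2], [1, 2]])  # two negative pairs
pos_sims = np.array([0.9])
neg_sims = np.array([0.2, 0.4])

paired_ix, rel_k_list, counts = build_rank_lists(pos_pairs, neg_pairs, pos_sims, neg_sims)
# paired_ix -> [0, 1, 2]; counts -> [2, 2, 2]
# rel_k_list concatenates, per profile, the labels sorted by decreasing similarity:
# profile 0: [1, 0], profile 1: [1, 0], profile 2: [0, 0]  ->  [1, 0, 1, 0, 0, 0]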

Source code in src/copairs/map/average_precision.py
def build_rank_lists(
    pos_pairs: np.ndarray,
    neg_pairs: np.ndarray,
    pos_sims: np.ndarray,
    neg_sims: np.ndarray,
):
    """Build rank lists for calculating average precision.

    This function processes positive and negative pairs along with their similarity scores
    to construct rank lists and determine unique profile indices with their associated counts.

    Parameters
    ----------
    pos_pairs : np.ndarray
        Array of positive pair indices, where each pair is represented as a pair of integers.

    neg_pairs : np.ndarray
        Array of negative pair indices, where each pair is represented as a pair of integers.

    pos_sims : np.ndarray
        Array of similarity scores for positive pairs.

    neg_sims : np.ndarray
        Array of similarity scores for negative pairs.

    Returns
    -------
    paired_ix : np.ndarray
        Unique indices of profiles that appear in the rank lists.

    rel_k_list : np.ndarray
        Array of relevance labels (1 for positive pairs, 0 for negative pairs) sorted by
        decreasing similarity within each profile.

    counts : np.ndarray
        Array of counts indicating how many times each profile index appears in the rank lists.
    """
    # Combine relevance labels: 1 for positive pairs, 0 for negative pairs
    labels = np.concatenate(
        [
            np.ones(pos_pairs.size, dtype=np.uint32),
            np.zeros(neg_pairs.size, dtype=np.uint32),
        ]
    )

    # Flatten positive and negative pair indices for ranking
    ix = np.concatenate([pos_pairs.ravel(), neg_pairs.ravel()])

    # Expand similarity scores to match the flattened pair indices
    sim_all = np.concatenate([np.repeat(pos_sims, 2), np.repeat(neg_sims, 2)])

    # Sort by index (lexicographical order) and then by similarity (descending)
    # `1 - sim_all` ensures higher similarity values appear first, prioritizing
    # pairs with stronger similarity scores for ranking.
    # `ix` acts as a secondary criterion, ensuring consistent ordering of pairs
    # with equal similarity scores by their indices (lexicographical order).
    ix_sort = np.lexsort([1 - sim_all, ix])

    # Create the rank list of relevance labels sorted by similarity and index
    rel_k_list = labels[ix_sort]

    # Find unique profile indices and count their occurrences in the pairs
    paired_ix, counts = np.unique(ix, return_counts=True)

    return paired_ix, rel_k_list, counts.astype(np.uint32)

p_values(dframe, null_size, seed, progress_bar=True)

Compute p-values for average precision scores based on a null distribution.

This function calculates the p-values for each profile in the input DataFrame, comparing their average precision scores (average_precision) against a null distribution generated for their specific configurations (number of positive and total pairs). Profiles with no positive pairs are excluded from the p-value calculation.

Parameters:

  • dframe (DataFrame) –

    A DataFrame containing the following columns:

    • average_precision: The AP scores for each profile.
    • n_pos_pairs: Number of positive pairs for each profile.
    • n_total_pairs: Total number of pairs (positive + negative) for each profile.

  • null_size (int) –

    The number of samples to generate in the null distribution for significance testing.

  • seed (int) –

    Random seed for reproducibility of the null distribution.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

Returns:

  • ndarray

    An array of p-values for each profile in the DataFrame. Profiles with no positive pairs will have NaN as their p-value.
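
A minimal usage sketch (reuses the hypothetical ap_scores table from the average_precision example above; the import path follows the source location below):

from copairs.map.average_precision import p_values

pvals = p_values(ap_scores, null_size=10000, seed=0)
ap_scores["p_value"] = pvals  # NaN for profiles without positive pairs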

Source code in src/copairs/map/average_precision.py
def p_values(
    dframe: pd.DataFrame, null_size: int, seed: int, progress_bar: bool = True
) -> np.ndarray:
    """Compute p-values for average precision scores based on a null distribution.

    This function calculates the p-values for each profile in the input DataFrame,
    comparing their average precision scores (`average_precision`) against a null
    distribution generated for their specific configurations (number of positive
    and total pairs). Profiles with no positive pairs are excluded from the p-value calculation.

    Parameters
    ----------
    dframe : pd.DataFrame
        A DataFrame containing the following columns:
        - `average_precision`: The AP scores for each profile.
        - `n_pos_pairs`: Number of positive pairs for each profile.
        - `n_total_pairs`: Total number of pairs (positive + negative) for each profile.
    null_size : int
        The number of samples to generate in the null distribution for significance testing.
    seed : int
        Random seed for reproducibility of the null distribution.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.

    Returns
    -------
    np.ndarray
        An array of p-values for each profile in the DataFrame. Profiles with no positive
        pairs will have NaN as their p-value.
    """
    # Create a mask to filter profiles with at least one positive pair
    mask = dframe["n_pos_pairs"] > 0

    # Initialize the p-values array with NaN for all profiles
    pvals = np.full(len(dframe), np.nan, dtype=np.float32)

    # Extract the average precision scores and null configurations for valid profiles
    scores = dframe.loc[mask, "average_precision"].values
    null_confs = dframe.loc[mask, ["n_pos_pairs", "n_total_pairs"]].values

    # Compute p-values for profiles with valid configurations using the null distribution
    pvals[mask] = compute.p_values(scores, null_confs, null_size, seed, progress_bar)

    # Return the array of p-values, including NaN for invalid profiles
    return pvals

filter

Functions to support query-like syntax when finding the matches.

apply_filters(df, query_list)

Combine and apply query filters to a DataFrame.

This function takes a list of query expressions and applies them to a DataFrame to filter its rows. If no query expressions are provided, the original DataFrame is returned unchanged.

Parameters:

  • df (DataFrame) –

    The DataFrame to which the filters will be applied.

  • query_list (List[str]) –

    A list of query expressions (e.g., "column_name > 5"). These expressions should follow the syntax supported by pd.DataFrame.query.

Returns:

  • DataFrame

    The DataFrame filtered based on the provided query expressions.

Raises:

  • ValueError:
    • If the combined query results in an empty DataFrame.
    • If the combined query expression is invalid.
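
A minimal usage sketch (hypothetical DataFrame; the import path follows the source location below):

import pandas as pd
from copairs.map.filter import apply_filters

df = pd.DataFrame({"compound": ["A", "B", "C"], "dose": [0.1, 1.0, 10.0]})
filtered = apply_filters(df, ["dose > 0.5", "compound != 'A'"])
# Queries are combined with logical AND, keeping the rows for compounds B and C
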
Source code in src/copairs/map/filter.py
def apply_filters(df: pd.DataFrame, query_list: List[str]) -> pd.DataFrame:
    """Combine and apply query filters to a DataFrame.

    This function takes a list of query expressions and applies them to a DataFrame
    to filter its rows. If no query expressions are provided, the original DataFrame
    is returned unchanged.

    Parameters
    ----------
    df : pd.DataFrame
        The DataFrame to which the filters will be applied.
    query_list : List[str]
        A list of query expressions (e.g., "column_name > 5"). These expressions
        should follow the syntax supported by `pd.DataFrame.query`.

    Returns
    -------
    pd.DataFrame
        The DataFrame filtered based on the provided query expressions.

    Raises
    ------
    ValueError:
        - If the combined query results in an empty DataFrame.
        - If the combined query expression is invalid.
    """
    # If no queries are provided, return the original DataFrame unchanged
    if not query_list:
        return df

    # Combine the query expressions into a single string using logical AND (&)
    combined_query = " & ".join(f"({query})" for query in query_list)

    try:
        # Apply the combined query to filter the DataFrame
        df_filtered = df.query(combined_query)

        # Raise an error if the filtered DataFrame is empty
        if df_filtered.empty:
            raise ValueError(f"No data matched the query: {combined_query}")
    except Exception as e:
        # Handle any issues with the query expression and provide feedback
        raise ValueError(
            f"Invalid combined query expression: {combined_query}. Error: {e}"
        )

    # Return the filtered DataFrame
    return df_filtered

evaluate_and_filter(df, columns)

Evaluate query filters and filter the metadata DataFrame based on specified columns.

This function processes column specifications, extracts any filter conditions, applies these conditions to the metadata DataFrame, and returns the filtered metadata along with the updated list of columns.

Parameters:

  • df (DataFrame) –

    The metadata DataFrame containing information about profiles to be filtered.

  • columns (List[str]) –

    A list of metadata column names.

Returns:

  • Tuple[DataFrame, List[str]]
    • The filtered metadata DataFrame.
    • The updated list of columns after processing any filter specifications.
Source code in src/copairs/map/filter.py
def evaluate_and_filter(
    df: pd.DataFrame, columns: List[str]
) -> Tuple[pd.DataFrame, List[str]]:
    """Evaluate query filters and filter the metadata DataFrame based on specified columns.

    This function processes column specifications, extracts any filter conditions,
    applies these conditions to the metadata DataFrame, and returns the filtered metadata
    along with the updated list of columns.

    Parameters
    ----------
    df : pd.DataFrame
        The metadata DataFrame containing information about profiles to be filtered.
    columns : List[str]
        A list of metadata column names.

    Returns
    -------
    Tuple[pd.DataFrame, List[str]]
        - The filtered metadata DataFrame.
        - The updated list of columns after processing any filter specifications.
    """
    # Extract query filters from the column specifications
    query_list, columns = extract_filters(columns, df.columns)

    # Apply the extracted filters to the metadata DataFrame
    df = apply_filters(df, query_list)

    # Return the filtered metadata DataFrame and the updated list of columns
    return df, columns

extract_filters(columns, df_columns)

Extract and validate query filters from selected metadata columns.

Parameters:

  • columns (List[str]) –

    A list of selected metadata column names or query expressions. Query expressions should follow a valid syntax (e.g., "metadata_column > 5" or "metadata_column == 'value'").

  • df_columns (List[str]) –

    All available metadata column names to validate against.

Returns:

  • Tuple[List[str], List[str]]
    • queries_to_eval: A list of valid query expressions to evaluate.
    • parsed_cols: A list of valid metadata column names extracted from the input columns.

Raises:

  • ValueError:
    • If a metadata column or query expression is invalid (e.g., references a non-existent column).
    • If duplicate queries are found for the same metadata column.
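
A minimal usage sketch (hypothetical column specification mixing a plain column name with a query; the import path follows the source location below):

import pandas as pd
from copairs.map.filter import extract_filters

df = pd.DataFrame({"compound": ["A", "B"], "dose": [0.1, 1.0]})
queries, parsed_cols = extract_filters(["compound", "dose > 0.5"], df.columns)
# queries -> ['dose > 0.5']; parsed_cols -> ['compound', 'dose']
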
Source code in src/copairs/map/filter.py
def extract_filters(
    columns: List[str], df_columns: List[str]
) -> Tuple[List[str], List[str]]:
    """Extract and validate query filters from selected metadata columns.

    Parameters
    ----------
    columns : List[str]
        A list of selected metadata column names or query expressions. Query expressions
        should follow a valid syntax (e.g., "metadata_column > 5" or "metadata_column == 'value'").
    df_columns : List[str]
        All available metadata column names to validate against.

    Returns
    -------
    Tuple[List[str], List[str]]
        - `queries_to_eval`: A list of valid query expressions to evaluate.
        - `parsed_cols`: A list of valid metadata column names extracted from the input `columns`.

    Raises
    ------
    ValueError:
        - If a metadata column or query expression is invalid (e.g., references a non-existent column).
        - If duplicate queries are found for the same metadata column.
    """
    # Initialize lists to store parsed metadata column names and query expressions
    parsed_cols = []
    queries_to_eval = []

    # Iterate through each entry in the selected metadata columns
    for col in columns:
        if col in df_columns:
            # If the entry is a valid metadata column name, add it to parsed_cols
            parsed_cols.append(col)
            continue

        # Use regex to extract metadata column names from query expressions
        column_names = re.findall(r"(\w+)\s*[=<>!]+", col)

        # Validate the extracted metadata column names against all available metadata columns
        valid_column_names = [col for col in column_names if col in df_columns]
        if not valid_column_names:
            raise ValueError(f"Invalid query or metadata column name: {col}")

        # Add valid query expressions and associated metadata columns
        queries_to_eval.append(col)
        parsed_cols.extend(valid_column_names)

        # Check for duplicate metadata columns in the parsed list
        if len(parsed_cols) != len(set(parsed_cols)):
            raise ValueError(f"Duplicate queries for column: {col}")

    # Return the queries to evaluate and the parsed metadata column names
    return queries_to_eval, parsed_cols

flatten_str_list(*args)

Create a single list with all the params given.
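
A minimal usage sketch (hypothetical arguments; the import path follows the source location below):

from copairs.map.filter import flatten_str_list

cols = flatten_str_list("compound", ["dose", "batch"], {"neg": ["control_type"]})
# -> a de-duplicated list such as ['compound', 'dose', 'batch', 'control_type']
# (ordering is not guaranteed because values are collected in a set)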

Source code in src/copairs/map/filter.py
def flatten_str_list(*args):
    """Create a single list with all the params given."""
    columns = set()
    for col in args:
        if isinstance(col, str):
            columns.add(col)
        elif isinstance(col, dict):
            columns.update(itertools.chain.from_iterable(col.values()))
        else:
            columns.update(col)
    columns = list(columns)
    return columns

validate_pipeline_input(meta, feats, columns)

Validate the metadata and features for consistency and completeness.

Parameters:

  • meta (DataFrame) –

    The metadata DataFrame describing the profiles.

  • feats (ndarray) –

    The feature matrix where rows correspond to profiles in the metadata.

  • columns (List[str]) –

    List of column names in the metadata to validate for null values.

Raises:

  • ValueError:
    • If any of the specified metadata columns contain null values.
    • If the number of rows in the metadata and features are not equal.
    • If the feature matrix contains null values.
Source code in src/copairs/map/filter.py
def validate_pipeline_input(
    meta: pd.DataFrame, feats: np.ndarray, columns: List[str]
) -> None:
    """Validate the metadata and features for consistency and completeness.

    Parameters
    ----------
    meta : pd.DataFrame
        The metadata DataFrame describing the profiles.
    feats : np.ndarray
        The feature matrix where rows correspond to profiles in the metadata.
    columns : List[str]
        List of column names in the metadata to validate for null values.

    Raises
    ------
    ValueError:
        - If any of the specified metadata columns contain null values.
        - If the number of rows in the metadata and features are not equal.
        - If the feature matrix contains null values.
    """
    # Check for null values in the specified metadata columns
    if meta[columns].isna().any(axis=None):
        raise ValueError("metadata columns should not have null values.")

    # Check if the number of rows in metadata matches the feature matrix
    if len(meta) != len(feats):
        raise ValueError("Metadata and features must have the same number of rows.")

    # Check for null values in the feature matrix
    if np.isnan(feats).any():
        raise ValueError("features should not have null values.")

hierarchical_fdr

Hierarchical FDR correction for grouped hypothesis testing.

apply_fdr_correction(map_scores, method='fdr_bh')

Apply standard FDR correction across all tests.

Parameters:

  • map_scores (DataFrame) –

    DataFrame containing mAP scores with a 'p_value' column.

  • method (str, default: 'fdr_bh' ) –

    Multiple testing correction method (default: 'fdr_bh'). See statsmodels.stats.multitest.multipletests for options.

Returns:

  • DataFrame

    Input DataFrame with 'corrected_p_value' column added.

Source code in src/copairs/map/hierarchical_fdr.py
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
def apply_fdr_correction(
    map_scores: pd.DataFrame,
    method: str = "fdr_bh",
) -> pd.DataFrame:
    """Apply standard FDR correction across all tests.

    Parameters
    ----------
    map_scores : pd.DataFrame
        DataFrame containing mAP scores with a 'p_value' column.
    method : str, optional
        Multiple testing correction method (default: 'fdr_bh').
        See statsmodels.stats.multitest.multipletests for options.

    Returns
    -------
    pd.DataFrame
        Input DataFrame with 'corrected_p_value' column added.

    """
    map_scores = map_scores.copy()
    _, pvals_corrected, _, _ = multipletests(map_scores["p_value"], method=method)
    map_scores["corrected_p_value"] = pvals_corrected
    return map_scores

apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

Apply hierarchical FDR correction for grouped hypotheses.

Implements a two-stage testing procedure appropriate for dose-response data where only high doses are expected to be active:

  • Stage 1: Use minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
  • Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

Parameters:

  • map_scores (DataFrame) –

    DataFrame containing mAP scores with a 'p_value' column.

  • hierarchical_by (list) –

    Metadata column(s) defining the group structure (e.g., ['compound']).

  • sameby (list) –

    Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

Returns:

  • DataFrame

    Input DataFrame with additional columns: - corrected_p_value: BH-corrected p-value (1.0 for groups that didn't pass Stage 1). - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value). - stage1_corrected_p_value: BH-corrected Stage 1 p-value. - stage1_significant: Whether the group passed Stage 1.

Raises:

  • ValueError

    If hierarchical_by is not a proper subset of sameby.

Notes

This method uses minimum p-value (rather than Simes) for Stage 1 aggregation. Min-p is appropriate for dose-response data where only high doses are expected to be active. Simes would penalize compounds for having inactive low doses, which is the expected biological behavior.

Source code in src/copairs/map/hierarchical_fdr.py (lines 12-115)
def apply_hierarchical_fdr_correction(
    map_scores: pd.DataFrame,
    hierarchical_by: List[str],
    sameby: List[str],
) -> pd.DataFrame:
    """Apply hierarchical FDR correction for grouped hypotheses.

    Implements a two-stage testing procedure appropriate for dose-response data
    where only high doses are expected to be active:

    - Stage 1: Use minimum p-value within each group defined by `hierarchical_by`,
      then apply BH correction at the group level. A group passes if any member
      is significant.
    - Stage 2: For groups that pass Stage 1, apply BH correction to the
      individual tests within each group.

    Parameters
    ----------
    map_scores : pd.DataFrame
        DataFrame containing mAP scores with a 'p_value' column.
    hierarchical_by : list
        Metadata column(s) defining the group structure (e.g., ['compound']).
    sameby : list
        Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

    Returns
    -------
    pd.DataFrame
        Input DataFrame with additional columns:
        - `corrected_p_value`: BH-corrected p-value (1.0 for groups that didn't pass Stage 1).
        - `stage1_p_value`: Group-level p-value from Stage 1 (minimum p-value).
        - `stage1_corrected_p_value`: BH-corrected Stage 1 p-value.
        - `stage1_significant`: Whether the group passed Stage 1.

    Raises
    ------
    ValueError
        If `hierarchical_by` is not a proper subset of `sameby`.

    Notes
    -----
    This method uses the minimum p-value (rather than Simes) for Stage 1
    aggregation. Min-p is appropriate for dose-response data where only high
    doses are expected to be active; Simes would penalize compounds for having
    inactive low doses, even though inactive low doses are the expected
    biological behavior.

    """
    # Validate that hierarchical_by is a subset of sameby
    if not set(hierarchical_by).issubset(set(sameby)):
        raise ValueError(
            f"hierarchical_by columns {hierarchical_by} must be a subset of "
            f"sameby columns {sameby}"
        )

    if set(hierarchical_by) == set(sameby):
        raise ValueError(
            f"hierarchical_by columns {hierarchical_by} must be a proper subset of "
            f"sameby columns {sameby}. If they are equal, use standard correction "
            f"by not specifying hierarchical_by."
        )

    logger.info("Applying hierarchical FDR correction...")
    map_scores = map_scores.copy()

    # Stage 1: Aggregate p-values to group level using minimum p-value
    # Min-p is appropriate for dose-response where only high doses are expected to be active
    stage1_pvals = map_scores.groupby(hierarchical_by, observed=True).agg(
        {"p_value": "min"}
    )
    stage1_pvals.columns = ["stage1_p_value"]

    # Apply BH correction at the group level
    reject_stage1, stage1_corrected, _, _ = multipletests(
        stage1_pvals["stage1_p_value"], method="fdr_bh"
    )
    stage1_pvals["stage1_corrected_p_value"] = stage1_corrected
    stage1_pvals["stage1_significant"] = reject_stage1

    # Merge Stage 1 results back to map_scores
    map_scores = map_scores.merge(
        stage1_pvals.reset_index(), on=hierarchical_by, how="left"
    )

    # Stage 2: For groups that passed Stage 1, apply BH within each group
    # For groups that didn't pass, set corrected_p_value to 1.0
    map_scores["corrected_p_value"] = 1.0

    for group_key, group_df in map_scores.groupby(hierarchical_by, observed=True):
        if not group_df["stage1_significant"].iloc[0]:
            # Group didn't pass Stage 1, skip
            continue

        group_indices = group_df.index
        group_pvals = group_df["p_value"].values

        if len(group_pvals) == 1:
            # Single test in group, no additional correction needed
            map_scores.loc[group_indices, "corrected_p_value"] = group_pvals[0]
        else:
            # Apply BH correction within the group
            _, group_corrected, _, _ = multipletests(group_pvals, method="fdr_bh")
            map_scores.loc[group_indices, "corrected_p_value"] = group_corrected

    return map_scores
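
A hedged sketch of the two-stage behaviour (compound names, doses, and p-values are illustrative; the import path follows the source location noted above):

import pandas as pd
from copairs.map.hierarchical_fdr import apply_hierarchical_fdr_correction

# Per compound-dose p-values: cmpdA is only active at the high dose,
# cmpdB is inactive at both doses.
map_scores = pd.DataFrame(
    {
        "compound": ["cmpdA", "cmpdA", "cmpdB", "cmpdB"],
        "dose": [0.1, 10.0, 0.1, 10.0],
        "p_value": [0.300, 0.001, 0.450, 0.400],
    }
)

result = apply_hierarchical_fdr_correction(
    map_scores, hierarchical_by=["compound"], sameby=["compound", "dose"]
)
# cmpdA passes Stage 1 via its high-dose p-value and gets within-group BH
# correction in Stage 2; cmpdB fails Stage 1, so its corrected_p_value stays 1.0.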

map

Functions to compute mean average precision.

get_map_pvalue(ap_scores, sameby, null_size, seed, progress_bar=True, max_workers=None, cache_dir=None)

Compute mAP scores and p-values from AP scores.

This function groups AP scores by the specified columns, computes the mean Average Precision (mAP) for each group, and calculates p-values by comparing against null distributions.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • seed (int) –

    Random seed for reproducibility.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    - Columns from sameby (group identifiers).
    - mean_average_precision: Mean AP score for each group.
    - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    - p_value: p-value comparing mAP to the null distribution.
    - indices: List of indices in the original ap_scores for this group.

Source code in src/copairs/map/map.py (lines 21-118)
def get_map_pvalue(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    seed: int,
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Compute mAP scores and p-values from AP scores.

    This function groups AP scores by the specified columns, computes the mean
    Average Precision (mAP) for each group, and calculates p-values by comparing
    against null distributions.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    seed : int
        Random seed for reproducibility.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - Columns from `sameby` (group identifiers).
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `indices`: List of indices in the original ap_scores for this group.

    """
    # Filter out invalid or incomplete AP scores
    ap_scores = ap_scores.query("~average_precision.isna() and n_pos_pairs > 0")
    ap_scores = ap_scores.reset_index(drop=True).copy()

    logger.info("Computing null_dist...")
    # Extract configurations for null distribution generation
    null_confs = ap_scores[["n_pos_pairs", "n_total_pairs"]].values
    null_confs, rev_ix = np.unique(null_confs, axis=0, return_inverse=True)

    # Generate null distributions for each unique configuration
    null_dists = compute.get_null_dists(
        null_confs, null_size, seed=seed, cache_dir=cache_dir, progress_bar=progress_bar
    )
    ap_scores["null_ix"] = rev_ix

    # Function to calculate the p-value for a mAP score based on the null distribution
    def get_p_value(params):
        map_score, indices = params
        null_dist = null_dists[rev_ix[indices]].mean(axis=0)
        num = (null_dist > map_score).sum()
        p_value = (num + 1) / (null_size + 1)  # Add 1 for stability
        return p_value

    logger.info("Computing p-values...")

    # Group by the specified metadata column(s) and calculate mean AP
    map_scores = ap_scores.groupby(sameby, observed=True, as_index=False).agg(
        {
            "average_precision": ["mean", lambda x: list(x.index)],
            "normalized_average_precision": "mean",
        }
    )
    map_scores.columns = sameby + [
        "mean_average_precision",
        "indices",
        "mean_normalized_average_precision",
    ]

    # Compute p-values for each group using the null distributions
    params = map_scores[["mean_average_precision", "indices"]]

    if progress_bar:
        from tqdm.contrib.concurrent import thread_map

        p_values = thread_map(
            get_p_value, params.values, leave=False, max_workers=max_workers
        )
    else:
        p_values = silent_thread_map(
            get_p_value, params.values, max_workers=max_workers
        )
    map_scores["p_value"] = p_values

    return map_scores
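
A hedged sketch of the expected input and call (in practice ap_scores comes from copairs' average precision step; the columns and values below only illustrate what get_map_pvalue relies on):

import pandas as pd
from copairs.map.map import get_map_pvalue

ap_scores = pd.DataFrame(
    {
        "compound": ["cmpdA"] * 3 + ["cmpdB"] * 3,
        "average_precision": [0.90, 0.80, 0.85, 0.30, 0.40, 0.35],
        "normalized_average_precision": [0.80, 0.60, 0.70, -0.10, 0.10, 0.00],
        "n_pos_pairs": [2] * 6,
        "n_total_pairs": [10] * 6,
    }
)

# One mAP score and one null-distribution p-value per compound.
map_scores = get_map_pvalue(
    ap_scores, sameby=["compound"], null_size=1000, seed=0, progress_bar=False
)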

mean_average_precision(ap_scores, sameby, null_size, threshold, seed, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score and associated p-values.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It calculates the significance of mAP scores by comparing them to a null distribution and performs multiple testing corrections using Benjamini-Hochberg FDR.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • threshold (float) –

    p-value threshold for identifying significant mAP scores.

  • seed (int) –

    Random seed for reproducibility.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    - mean_average_precision: Mean AP score for each group.
    - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    - p_value: p-value comparing mAP to the null distribution.
    - corrected_p_value: Adjusted p-value after multiple testing correction.
    - below_p: Boolean indicating if the p-value is below the threshold.
    - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.

See Also

mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.

Source code in src/copairs/map/map.py (lines 121-192)
def mean_average_precision(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    threshold: float,
    seed: int,
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Calculate the Mean Average Precision (mAP) score and associated p-values.

    This function computes the Mean Average Precision (mAP) score by grouping profiles
    based on the specified criteria (`sameby`). It calculates the significance of mAP
    scores by comparing them to a null distribution and performs multiple testing
    corrections using Benjamini-Hochberg FDR.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    threshold : float
        p-value threshold for identifying significant mAP scores.
    seed : int
        Random seed for reproducibility.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `corrected_p_value`: Adjusted p-value after multiple testing correction.
        - `below_p`: Boolean indicating if the p-value is below the threshold.
        - `below_corrected_p`: Boolean indicating if the corrected p-value is below the threshold.

    See Also
    --------
    mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.

    """
    # Step 1: Compute mAP scores and p-values
    map_scores = get_map_pvalue(
        ap_scores=ap_scores,
        sameby=sameby,
        null_size=null_size,
        seed=seed,
        progress_bar=progress_bar,
        max_workers=max_workers,
        cache_dir=cache_dir,
    )

    # Step 2: Apply multiple testing correction
    map_scores = apply_fdr_correction(map_scores)

    # Step 3: Mark scores below the p-value threshold
    map_scores["below_p"] = map_scores["p_value"] < threshold
    map_scores["below_corrected_p"] = map_scores["corrected_p_value"] < threshold

    return map_scores
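
A hedged end-to-end sketch reusing the ap_scores frame from the get_map_pvalue example above (threshold and null_size values are illustrative):

from copairs.map.map import mean_average_precision

map_scores = mean_average_precision(
    ap_scores,
    sameby=["compound"],
    null_size=1000,
    threshold=0.05,
    seed=0,
    progress_bar=False,
)
# Adds corrected_p_value (BH across all groups) plus below_p / below_corrected_p flags.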

mean_average_precision_hierarchical(ap_scores, sameby, null_size, threshold, seed, hierarchical_by, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It applies hierarchical FDR correction appropriate for grouped hypothesis testing, such as dose-response data.

Parameters:

  • ap_scores (DataFrame) –

    DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).

  • sameby (list or str) –

    Metadata column(s) used to group profiles for mAP calculation.

  • null_size (int) –

    Number of samples in the null distribution for significance testing.

  • threshold (float) –

    p-value threshold for identifying significant mAP scores.

  • seed (int) –

    Random seed for reproducibility.

  • hierarchical_by (list) –

    Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:

    • Stage 1: Use minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
    • Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

    This is designed for dose-response data where only high doses are expected to be active. The hierarchical_by columns must be a proper subset of sameby. For example, with sameby=['compound', 'dose'] and hierarchical_by=['compound'], mAP is calculated per compound×dose, but FDR correction accounts for the grouped structure.

  • progress_bar (bool, default: True ) –

    Whether or not to show tqdm's progress bar.

  • max_workers (int, default: None ) –

    Number of workers used. Default defined by tqdm's thread_map.

  • cache_dir (str or Path, default: None ) –

    Location to save the cache.

Returns:

  • DataFrame

    DataFrame with the following columns:

    - mean_average_precision: Mean AP score for each group.
    - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
    - p_value: p-value comparing mAP to the null distribution.
    - corrected_p_value: Adjusted p-value after multiple testing correction.
    - below_p: Boolean indicating if the p-value is below the threshold.
    - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
    - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
    - stage1_corrected_p_value: BH-corrected Stage 1 p-value.
    - stage1_significant: Whether the group passed Stage 1.

See Also

mean_average_precision : For standard BH FDR correction.

Source code in src/copairs/map/map.py (lines 195-285)
def mean_average_precision_hierarchical(
    ap_scores: pd.DataFrame,
    sameby: List[str],
    null_size: int,
    threshold: float,
    seed: int,
    hierarchical_by: List[str],
    progress_bar: bool = True,
    max_workers: Optional[int] = None,
    cache_dir: Optional[Union[str, Path]] = None,
) -> pd.DataFrame:
    """Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

    This function computes the Mean Average Precision (mAP) score by grouping profiles
    based on the specified criteria (`sameby`). It applies hierarchical FDR correction
    appropriate for grouped hypothesis testing, such as dose-response data.

    Parameters
    ----------
    ap_scores : pd.DataFrame
        DataFrame containing individual Average Precision (AP) scores and pair statistics
        (e.g., number of positive pairs `n_pos_pairs` and total pairs `n_total_pairs`).
    sameby : list or str
        Metadata column(s) used to group profiles for mAP calculation.
    null_size : int
        Number of samples in the null distribution for significance testing.
    threshold : float
        p-value threshold for identifying significant mAP scores.
    seed : int
        Random seed for reproducibility.
    hierarchical_by : list
        Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:

        - Stage 1: Use minimum p-value within each group defined by `hierarchical_by`,
          then apply BH correction at the group level. A group passes if any member
          is significant.
        - Stage 2: For groups that pass Stage 1, apply BH correction to the
          individual tests within each group.

        This is designed for dose-response data where only high doses are expected
        to be active. The `hierarchical_by` columns must be a proper subset of `sameby`.
        For example, with `sameby=['compound', 'dose']` and `hierarchical_by=['compound']`,
        mAP is calculated per compound×dose, but FDR correction accounts for the
        grouped structure.
    progress_bar : bool
        Whether or not to show tqdm's progress bar.
    max_workers : int
        Number of workers used. Default defined by tqdm's `thread_map`.
    cache_dir : str or Path
        Location to save the cache.

    Returns
    -------
    pd.DataFrame
        DataFrame with the following columns:
        - `mean_average_precision`: Mean AP score for each group.
        - `mean_normalized_average_precision`: Mean normalized AP score (scale-independent).
        - `p_value`: p-value comparing mAP to the null distribution.
        - `corrected_p_value`: Adjusted p-value after multiple testing correction.
        - `below_p`: Boolean indicating if the p-value is below the threshold.
        - `below_corrected_p`: Boolean indicating if the corrected p-value is below the threshold.
        - `stage1_p_value`: Group-level p-value from Stage 1 (minimum p-value).
        - `stage1_corrected_p_value`: BH-corrected Stage 1 p-value.
        - `stage1_significant`: Whether the group passed Stage 1.

    See Also
    --------
    mean_average_precision : For standard BH FDR correction.

    """
    # Step 1: Compute mAP scores and p-values
    map_scores = get_map_pvalue(
        ap_scores=ap_scores,
        sameby=sameby,
        null_size=null_size,
        seed=seed,
        progress_bar=progress_bar,
        max_workers=max_workers,
        cache_dir=cache_dir,
    )

    # Step 2: Apply hierarchical multiple testing correction
    # Includes stage1_* columns for transparency. Could drop these in future
    # for cleaner output (only corrected_p_value is needed downstream).
    map_scores = apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

    # Step 3: Mark scores below the p-value threshold
    map_scores["below_p"] = map_scores["p_value"] < threshold
    map_scores["below_corrected_p"] = map_scores["corrected_p_value"] < threshold

    return map_scores
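
A hedged sketch of the hierarchical variant (column names are illustrative; hierarchical_by must be a proper subset of sameby, as described above):

from copairs.map.map import mean_average_precision_hierarchical

map_scores = mean_average_precision_hierarchical(
    ap_scores,                      # per-profile AP scores with compound and dose metadata
    sameby=["compound", "dose"],
    null_size=1000,
    threshold=0.05,
    seed=0,
    hierarchical_by=["compound"],
    progress_bar=False,
)
# corrected_p_value reflects the two-stage procedure; the stage1_* columns
# report the group-level (per-compound) results.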

silent_thread_map(fn, *iterables, **kwargs)

Map a function over iterables using a thread pool, without a progress bar.

Parameters:

  • fn (callable) –

    Function to map over iterables.

  • *iterables (tuple, default: () ) –

    Iterables to map over.

  • **kwargs (dict, default: {} ) –

    Additional keyword arguments. Accepts:

    - max_workers : int, optional
        Maximum number of workers [default: min(32, cpu_count() + 4)].
    - chunksize : int, optional
        Size of chunks for each worker [default: 1].

Source code in src/copairs/map/map.py (lines 288-311)
def silent_thread_map(fn, *iterables, **kwargs):
    """Map iterables and kwargs to a function.

    Parameters
    ----------
    fn : callable
        Function to map over iterables.
    *iterables : tuple
        Iterables to map over.
    **kwargs : dict
        Additional keyword arguments. Accepts:
        - max_workers : int, optional
            Maximum number of workers [default: min(32, cpu_count() + 4)].
        - chunksize : int, optional
            Size of chunks for each worker [default: 1].
    """
    # Based on tqdm's original implementation for consistency
    # (github.com/tqdm/tqdm/blob/0ed5d7f18fa3153834cbac0aa57e8092b217cc16/tqdm/contrib/concurrent.py#L29).

    kwargs = kwargs.copy()
    max_workers = kwargs.pop("max_workers", min(32, cpu_count() + 4))
    chunksize = kwargs.pop("chunksize", 1)
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(fn, *iterables, chunksize=chunksize, **kwargs))
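
A small illustration (assuming the module path matches the source location above); the helper behaves like tqdm's thread_map but prints nothing:

from copairs.map.map import silent_thread_map

squares = silent_thread_map(lambda x: x * x, range(5), max_workers=2)
# squares == [0, 1, 4, 9, 16]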

multilabel

Functions to compute mAP with multilabel support.

average_precision(meta, feats, pos_sameby, pos_diffby, neg_sameby, neg_diffby, multilabel_col, batch_size=20000, distance='cosine', progress_bar=True)

Compute average precision with multilabel support.

Returns normalized_average_precision in addition to average_precision.

See Also

copairs.map.average_precision : Average precision without multilabel support.

Source code in src/copairs/map/multilabel.py (lines 71-158)
def average_precision(
    meta: pd.DataFrame,
    feats: pd.DataFrame,
    pos_sameby: List[str],
    pos_diffby: List[str],
    neg_sameby: List[str],
    neg_diffby: List[str],
    multilabel_col,
    batch_size=20000,
    distance="cosine",
    progress_bar: bool = True,
) -> pd.DataFrame:
    """
    Compute average precision with multilabel support.

    Returns normalized_average_precision in addition to average_precision.

    See Also
    --------
    copairs.map.average_precision : Average precision without multilabel support.
    """
    columns = flatten_str_list(pos_sameby, pos_diffby, neg_sameby, neg_diffby)
    meta, columns = evaluate_and_filter(meta, columns)
    validate_pipeline_input(meta, feats, columns)
    distance_fn = compute.get_similarity_fn(distance, progress_bar=progress_bar)
    # Critical! Otherwise the indexing won't work.
    meta = meta.reset_index(drop=True).copy()

    logger.info("Indexing metadata...")

    logger.info("Finding positive pairs...")
    pos_pairs, keys, pos_counts = find_pairs_multilabel(
        meta, sameby=pos_sameby, diffby=pos_diffby, multilabel_col=multilabel_col
    )
    if len(pos_pairs) == 0:
        raise UnpairedException("Unable to find positive pairs.")

    logger.info("Finding negative pairs...")
    neg_pairs = find_pairs_multilabel(
        meta, sameby=neg_sameby, diffby=neg_diffby, multilabel_col=multilabel_col
    )
    if len(neg_pairs) == 0:
        raise UnpairedException("Unable to find any negative pairs.")

    logger.info("Dropping dups in negative pairs...")
    neg_pairs = np.unique(neg_pairs, axis=0)

    logger.info("Computing positive similarities...")
    pos_sims = distance_fn(feats, pos_pairs, batch_size)

    logger.info("Computing negative similarities...")
    neg_sims = distance_fn(feats, neg_pairs, batch_size)

    logger.info("Computing AP per label...")
    negs_for = _create_neg_query_solver(neg_pairs, neg_sims)
    ap_scores_list, null_confs_list, ix_list = _build_rank_lists_multi(
        pos_pairs, pos_sims, pos_counts, negs_for
    )

    logger.info("Creating result DataFrame...")
    results = []
    "Here the positive pairs are per-item inside multilabel_col"
    # TODO Check if multi-label key is necessary
    for i, key in enumerate(keys):
        # Compute normalized AP for this label group
        M = null_confs_list[i][:, 0]  # n_pos_pairs
        L = null_confs_list[i][:, 1]  # n_total_pairs
        N = L - M  # n_neg_pairs
        normalized_scores = normalize_ap(ap_scores_list[i], M, N)

        result = pd.DataFrame(
            {
                "average_precision": ap_scores_list[i],
                "normalized_average_precision": normalized_scores,
                "n_pos_pairs": null_confs_list[i][:, 0],
                "n_total_pairs": null_confs_list[i][:, 1],
                "ix": ix_list[i],
                multilabel_col: key,
            }
        )
        results.append(result)
    results = pd.concat(results).reset_index(drop=True)
    meta = meta.drop(multilabel_col, axis=1)
    results = meta.merge(results, right_on="ix", left_index=True).drop("ix", axis=1)
    results["n_pos_pairs"] = results["n_pos_pairs"].fillna(0).astype(np.uint32)
    results["n_total_pairs"] = results["n_total_pairs"].fillna(0).astype(np.uint32)
    logger.info("Finished.")
    return results
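
A hedged usage sketch with toy data (the multilabel column holds a list of labels per profile; real experiments need enough profiles to form positive and negative pairs, and exact column requirements may vary between copairs versions):

import numpy as np
import pandas as pd
from copairs.map.multilabel import average_precision

meta = pd.DataFrame(
    {
        "target": [["geneA"], ["geneA", "geneB"], ["geneB"], ["geneC"], ["geneC"]],
        "plate": ["p1", "p1", "p2", "p2", "p1"],
    }
)
feats = np.random.default_rng(0).normal(size=(5, 16))

ap_scores = average_precision(
    meta,
    feats,
    pos_sameby=["target"],
    pos_diffby=[],
    neg_sameby=[],
    neg_diffby=["target"],
    multilabel_col="target",
)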

normalization

Functions for normalizing Average Precision scores.

compute_normalized_ap_scores(ap_scores, null_confs)

Compute both raw and normalized Average Precision scores.

Parameters:

  • ap_scores (ndarray) –

    Array of raw Average Precision scores.

  • null_confs (ndarray) –

    Array of configurations where each row is [n_pos_pairs, n_total_pairs].

Returns:

  • ap_scores ( ndarray ) –

    The original raw AP scores.

  • normalized_ap_scores ( ndarray ) –

    The normalized AP scores.

Source code in src/copairs/map/normalization.py (lines 139-166)
def compute_normalized_ap_scores(
    ap_scores: np.ndarray, null_confs: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Compute both raw and normalized Average Precision scores.

    Parameters
    ----------
    ap_scores : np.ndarray
        Array of raw Average Precision scores.
    null_confs : np.ndarray
        Array of configurations where each row is [n_pos_pairs, n_total_pairs].

    Returns
    -------
    ap_scores : np.ndarray
        The original raw AP scores.
    normalized_ap_scores : np.ndarray
        The normalized AP scores.
    """
    # Extract M (positive pairs) and compute N (negative pairs)
    M = null_confs[:, 0].astype(int)
    L = null_confs[:, 1].astype(int)
    N = L - M

    # Compute normalized scores
    normalized_ap_scores = normalize_ap(ap_scores, M, N)

    return ap_scores, normalized_ap_scores
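
A short numeric sketch (import path per the source location above). With one positive and one negative pair, the expected AP under random ranking is 0.75, so a raw AP of 0.75 normalizes to roughly zero:

import numpy as np
from copairs.map.normalization import compute_normalized_ap_scores

ap_scores = np.array([0.75, 0.90])
null_confs = np.array([[1, 2], [3, 10]])  # rows are [n_pos_pairs, n_total_pairs]

raw, normalized = compute_normalized_ap_scores(ap_scores, null_confs)
# raw is returned unchanged; normalized[0] is ~0.0 because 0.75 is chance level
# for one positive among two items.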

expected_ap(M, N)

Compute the expected Average Precision under random ranking.

This implements the exact finite-sample formula for expected AP when items are randomly ranked.

Parameters:

  • M (int) –

    Number of positive items (relevant documents).

  • N (int) –

    Number of negative items (irrelevant documents).

Returns:

  • float

    The expected Average Precision under random ranking.

Notes

Formula: E[AP] = (1/L) × [(M-1)/(L-1) × (L - H_L) + H_L] where L = M + N and H_L is the L-th harmonic number.

Source code in src/copairs/map/normalization.py (lines 26-67)
def expected_ap(M: int, N: int) -> float:
    """Compute the expected Average Precision under random ranking.

    This implements the exact finite-sample formula for expected AP when
    items are randomly ranked.

    Parameters
    ----------
    M : int
        Number of positive items (relevant documents).
    N : int
        Number of negative items (irrelevant documents).

    Returns
    -------
    float
        The expected Average Precision under random ranking.

    Notes
    -----
    Formula: E[AP] = (1/L) × [(M-1)/(L-1) × (L - H_L) + H_L]
    where L = M + N and H_L is the L-th harmonic number.
    """
    L = M + N

    # Handle edge cases
    if L < 1 or M < 0 or N < 0:
        raise ValueError(f"Invalid inputs: M={M}, N={N}")
    if L == 1:
        return 1.0 if M == 1 else 0.0
    if M == 0:
        return 0.0
    if M == L:  # All items are positive
        return 1.0

    # Compute the L-th harmonic number
    H_L = harmonic_number(L)

    # Apply the exact formula
    mu0 = (1.0 / L) * (((M - 1.0) / (L - 1.0)) * (L - H_L) + H_L)

    return mu0
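
A quick hand check of the formula: with one positive and one negative item, L = 2 and H_2 = 1.5, so E[AP] = (1/2)[(0/1)(2 - 1.5) + 1.5] = 0.75, which matches the direct calculation (the positive ranks first with probability 1/2, giving AP = 1, and second with probability 1/2, giving AP = 1/2):

from copairs.map.normalization import expected_ap

assert abs(expected_ap(1, 1) - 0.75) < 1e-12
assert expected_ap(0, 5) == 0.0  # no positives: AP is defined as 0
assert expected_ap(5, 0) == 1.0  # all items positive: AP is always 1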

harmonic_number(n)

Compute the n-th harmonic number H_n = Σ(1/k) for k=1 to n.

Parameters:

  • n (int) –

    The index of the harmonic number to compute.

Returns:

  • float

    The n-th harmonic number.

Source code in src/copairs/map/normalization.py (lines 8-23)
def harmonic_number(n: int) -> float:
    """Compute the n-th harmonic number H_n = Σ(1/k) for k=1 to n.

    Parameters
    ----------
    n : int
        The index of the harmonic number to compute.

    Returns
    -------
    float
        The n-th harmonic number.
    """
    if n <= 0:
        return 0.0
    return sum(1.0 / k for k in range(1, n + 1))

normalize_ap(ap, M, N, eps=1e-10)

Normalize Average Precision scores to be scale-independent.

Computes the normalized AP as (AP - μ₀) / (1 - μ₀) where μ₀ is the expected AP under random ranking.

Parameters:

  • ap (float or ndarray) –

    The Average Precision score(s) to normalize.

  • M (int or ndarray) –

    Number of positive items for each AP score.

  • N (int or ndarray) –

    Number of negative items for each AP score.

  • eps (float, default: 1e-10 ) –

    Small epsilon to avoid division by zero when μ₀ ≈ 1.

Returns:

  • float or ndarray

    The normalized Average Precision score(s).

Notes

  • Normalized AP = 0 when performance equals random chance
  • Normalized AP = 1 when performance is perfect
  • Negative values indicate worse-than-random performance

Source code in src/copairs/map/normalization.py (lines 70-136)
def normalize_ap(
    ap: Union[float, np.ndarray],
    M: Union[int, np.ndarray],
    N: Union[int, np.ndarray],
    eps: float = 1e-10,
) -> Union[float, np.ndarray]:
    """Normalize Average Precision scores to be scale-independent.

    Computes the normalized AP as (AP - μ₀) / (1 - μ₀) where μ₀ is the
    expected AP under random ranking.

    Parameters
    ----------
    ap : float or np.ndarray
        The Average Precision score(s) to normalize.
    M : int or np.ndarray
        Number of positive items for each AP score.
    N : int or np.ndarray
        Number of negative items for each AP score.
    eps : float
        Small epsilon to avoid division by zero when μ₀ ≈ 1.

    Returns
    -------
    float or np.ndarray
        The normalized Average Precision score(s).

    Notes
    -----
    - Normalized AP = 0 when performance equals random chance
    - Normalized AP = 1 when performance is perfect
    - Negative values indicate worse-than-random performance
    """
    # Handle scalar or array inputs
    is_scalar = np.isscalar(ap)

    ap = np.atleast_1d(ap)
    M = np.atleast_1d(M)
    N = np.atleast_1d(N)

    # Validate that all arrays have compatible lengths
    lengths = [
        len(ap),
        len(M) if len(M) > 1 else len(ap),
        len(N) if len(N) > 1 else len(ap),
    ]
    if len(set(lengths)) > 1:
        raise ValueError(
            f"Array lengths must match: ap={len(ap)}, M={len(M)}, N={len(N)}"
        )

    # Compute expected AP for each configuration
    mu0 = np.zeros_like(ap, dtype=float)
    for i in range(len(ap)):
        M_i = M[i] if len(M) > 1 else M[0]
        N_i = N[i] if len(N) > 1 else N[0]
        mu0[i] = expected_ap(int(M_i), int(N_i))

    # Normalize: (AP - μ₀) / (1 - μ₀)
    # Use eps to avoid division by zero when μ₀ ≈ 1
    denominator = np.maximum(1 - mu0, eps)
    normalized = (ap - mu0) / denominator

    # Clip to [-1, 1] range to handle numerical edge cases
    normalized = np.clip(normalized, -1.0, 1.0)

    return float(normalized[0]) if is_scalar else normalized
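
Continuing the M = 1, N = 1 example above (chance-level AP of 0.75), a short sketch of normalize_ap:

from copairs.map.normalization import normalize_ap

print(normalize_ap(0.75, M=1, N=1))  # ~0.0: chance-level performance
print(normalize_ap(1.00, M=1, N=1))  # 1.0: perfect ranking
print(normalize_ap(0.50, M=1, N=1))  # -1.0 after clipping: worse than random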