copairs.map
Module to compute mAP-based metrics.
apply_fdr_correction(map_scores, method='fdr_bh')

Apply standard FDR correction across all tests.

Parameters:
- map_scores (DataFrame) – DataFrame containing mAP scores with a 'p_value' column.
- method (str, default: 'fdr_bh') – Multiple testing correction method. See statsmodels.stats.multitest.multipletests for options.

Returns:
- DataFrame – Input DataFrame with a 'corrected_p_value' column added.
Source code in src/copairs/map/hierarchical_fdr.py
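apply_fdr_correction delegates to statsmodels.stats.multitest.multipletests. As a rough illustration of what the default 'fdr_bh' method does, here is a minimal pure-Python sketch of Benjamini-Hochberg adjustment (bh_correct is a hypothetical helper, not part of copairs or statsmodels):

```python
def bh_correct(p_values):
    """Benjamini-Hochberg adjusted p-values (simplified sketch)."""
    m = len(p_values)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For input [0.01, 0.02, 0.03, 0.5] this yields [0.04, 0.04, 0.04, 0.5], matching statsmodels' 'fdr_bh' output on the same values.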
apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

Apply hierarchical FDR correction for grouped hypotheses.

Implements a two-stage testing procedure appropriate for dose-response data where only high doses are expected to be active:
- Stage 1: Take the minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
- Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

Parameters:
- map_scores (DataFrame) – DataFrame containing mAP scores with a 'p_value' column.
- hierarchical_by (list) – Metadata column(s) defining the group structure (e.g., ['compound']).
- sameby (list) – Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

Returns:
- DataFrame – Input DataFrame with additional columns:
  - corrected_p_value: BH-corrected p-value (1.0 for groups that did not pass Stage 1).
  - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
  - stage1_corrected_p_value: BH-corrected Stage 1 p-value.
  - stage1_significant: Whether the group passed Stage 1.

Raises:
- ValueError – If hierarchical_by is not a proper subset of sameby.
Notes
This method uses the minimum p-value (rather than Simes) for Stage 1 aggregation. Min-p is appropriate for dose-response data where only high doses are expected to be active; Simes would penalize compounds for having inactive low doses, even though inactivity at low doses is the expected biological behavior.
Source code in src/copairs/map/hierarchical_fdr.py
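The two-stage procedure can be sketched with plain Python dicts in place of DataFrames. This is a simplified illustration under assumed inputs; bh and hierarchical_fdr are hypothetical names, not the library's API:

```python
def bh(ps):
    """Benjamini-Hochberg adjusted p-values (helper for the sketch)."""
    m = len(ps)
    order = sorted(range(m), key=lambda i: ps[i])
    adj, run = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        run = min(run, ps[i] * m / rank)
        adj[i] = run
    return adj

def hierarchical_fdr(groups, alpha=0.05):
    """groups: {group_id: [p_value, ...]}. Returns corrected p-values per group."""
    names = list(groups)
    # Stage 1: minimum p-value per group, BH-corrected at the group level.
    stage1 = bh([min(groups[g]) for g in names])
    results = {}
    for g, s1 in zip(names, stage1):
        if s1 <= alpha:
            # Stage 2: BH within the group for groups that pass Stage 1.
            results[g] = bh(groups[g])
        else:
            # Groups failing Stage 1 get corrected p-values of 1.0.
            results[g] = [1.0] * len(groups[g])
    return results
```

For example, a compound with one strongly significant dose passes Stage 1 and its per-dose tests are corrected within the group, while a compound whose best dose is non-significant is set to 1.0 throughout.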
get_map_pvalue(ap_scores, sameby, null_size, seed, progress_bar=True, max_workers=None, cache_dir=None)

Compute mAP scores and p-values from AP scores.

This function groups AP scores by the specified columns, computes the mean Average Precision (mAP) for each group, and calculates p-values by comparing against null distributions.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- seed (int) – Random seed for reproducibility.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - Columns from sameby (group identifiers).
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - indices: List of indices in the original ap_scores for this group.
Source code in src/copairs/map/map.py
mean_average_precision(ap_scores, sameby, null_size, threshold, seed, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score and associated p-values.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It calculates the significance of mAP scores by comparing them to a null distribution and performs multiple testing corrections using Benjamini-Hochberg FDR.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- threshold (float) – p-value threshold for identifying significant mAP scores.
- seed (int) – Random seed for reproducibility.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - corrected_p_value: Adjusted p-value after multiple testing correction.
  - below_p: Boolean indicating if the p-value is below the threshold.
  - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
See Also
mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.
Source code in src/copairs/map/map.py
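The grouping step at the heart of mAP can be sketched without pandas: collect AP scores under their sameby key and average them. mean_ap_by_group is a hypothetical helper, not the library's API; the real function additionally computes normalized scores and p-values:

```python
from collections import defaultdict

def mean_ap_by_group(records, sameby):
    """Group AP records by the `sameby` keys and average them (sketch)."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[col] for col in sameby)
        groups[key].append(rec["average_precision"])
    # mAP is the mean AP within each group.
    return {key: sum(aps) / len(aps) for key, aps in groups.items()}
```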
mean_average_precision_hierarchical(ap_scores, sameby, null_size, threshold, seed, hierarchical_by, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It applies hierarchical FDR correction appropriate for grouped hypothesis testing, such as dose-response data.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- threshold (float) – p-value threshold for identifying significant mAP scores.
- seed (int) – Random seed for reproducibility.
- hierarchical_by (list) – Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:
  - Stage 1: Take the minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
  - Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.
  This is designed for dose-response data where only high doses are expected to be active. The hierarchical_by columns must be a proper subset of sameby. For example, with sameby=['compound', 'dose'] and hierarchical_by=['compound'], mAP is calculated per compound×dose, but FDR correction accounts for the grouped structure.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - corrected_p_value: Adjusted p-value after multiple testing correction.
  - below_p: Boolean indicating if the p-value is below the threshold.
  - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
  - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
  - stage1_corrected_p_value: BH-corrected Stage 1 p-value.
  - stage1_significant: Whether the group passed Stage 1.
See Also
mean_average_precision : For standard BH FDR correction.
Source code in src/copairs/map/map.py
average_precision
Functions to compute average precision.
average_precision(meta, feats, pos_sameby, pos_diffby, neg_sameby, neg_diffby, batch_size=20000, distance='cosine', progress_bar=True)

Calculate average precision (AP) scores for pairs of profiles based on their similarity.

This function identifies positive and negative pairs of profiles using metadata rules, computes their similarity scores, and calculates average precision scores for each profile. The results include the number of positive and total pairs for each profile.

Parameters:
- meta (DataFrame) – Metadata of the profiles, including columns used for defining pairs. This DataFrame should include the columns specified in pos_sameby, pos_diffby, neg_sameby, and neg_diffby.
- feats (ndarray) – Feature matrix representing the profiles, where rows correspond to profiles and columns to features.
- pos_sameby (list) – Metadata columns used to define positive pairs. Two profiles are considered a positive pair if they belong to the same group that is not a control group. For example, replicate profiles of the same compound are positive pairs and should share the same value in a column identifying compounds.
- pos_diffby (list) – Metadata columns used to differentiate positive pairs. Positive pairs do not need to differ in any metadata columns, so this is typically left empty. However, if necessary (e.g., to account for batch effects), you can specify columns such as batch identifiers.
- neg_sameby (list) – Metadata columns used to constrain negative pairs. Typically left empty, as profiles forming a negative pair (e.g., a compound and a DMSO/control) do not need to share any metadata values. This ensures comparisons are made without enforcing unnecessary constraints.
- neg_diffby (list) – Metadata columns used to differentiate negative pairs. Two profiles are considered a negative pair if one belongs to a compound group and the other to a DMSO/control group. They must differ in the specified metadata columns, such as those identifying the compound and the treatment index, to ensure comparisons are only made between compounds and DMSO controls (not between different compounds).
- batch_size (int, default: 20000) – The batch size for similarity computations to optimize memory usage.
- distance (str, default: 'cosine') – The distance function used for computing similarities.

Returns:
- DataFrame – A DataFrame containing the following columns:
  - average_precision: The calculated average precision score for each profile.
  - normalized_average_precision: The normalized AP score (scale-independent).
  - n_pos_pairs: The number of positive pairs for each profile.
  - n_total_pairs: The total number of pairs for each profile.
  - Additional metadata columns from the input.

Raises:
- UnpairedException – If no positive or negative pairs are found in the dataset.

Notes
- Positive pair rules: positive pairs are defined by pos_sameby (profiles share these metadata values) and optionally differentiated by pos_diffby (profiles must differ in these metadata values if specified).
- Negative pair rules: negative pairs are defined by neg_diffby (profiles differ in these metadata values) and optionally constrained by neg_sameby (profiles share these metadata values if specified).
Source code in src/copairs/map/average_precision.py
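The sameby/diffby pair rules can be illustrated with a brute-force sketch over metadata rows (find_pairs is a hypothetical helper for illustration; the library uses a much faster implementation):

```python
from itertools import combinations

def find_pairs(meta, sameby, diffby):
    """Return index pairs that share every `sameby` column and differ in
    every `diffby` column (illustrative version of the pair rules)."""
    pairs = []
    for i, j in combinations(range(len(meta)), 2):
        same_ok = all(meta[i][c] == meta[j][c] for c in sameby)
        diff_ok = all(meta[i][c] != meta[j][c] for c in diffby)
        if same_ok and diff_ok:
            pairs.append((i, j))
    return pairs
```

With three rows (two replicates of one compound plus a control), pos_sameby=['compound'] pairs the replicates with each other, while neg_diffby=['compound'] pairs each replicate with the control.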
build_rank_lists(pos_pairs, neg_pairs, pos_sims, neg_sims)

Build rank lists for calculating average precision.

This function processes positive and negative pairs along with their similarity scores to construct rank lists and determine unique profile indices with their associated counts.

Parameters:
- pos_pairs (ndarray) – Array of positive pair indices, where each pair is represented as a pair of integers.
- neg_pairs (ndarray) – Array of negative pair indices, where each pair is represented as a pair of integers.
- pos_sims (ndarray) – Array of similarity scores for positive pairs.
- neg_sims (ndarray) – Array of similarity scores for negative pairs.

Returns:
- paired_ix (ndarray) – Unique indices of profiles that appear in the rank lists.
- rel_k_list (ndarray) – Array of relevance labels (1 for positive pairs, 0 for negative pairs) sorted by decreasing similarity within each profile.
- counts (ndarray) – Array of counts indicating how many times each profile index appears in the rank lists.
Source code in src/copairs/map/average_precision.py
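Given a rank list of relevance labels like the one this function builds, average precision is the mean of the precision values at the ranks where positive pairs appear. A minimal sketch (illustrative; not the library's vectorized implementation):

```python
def average_precision_from_ranks(rel_k):
    """AP from a relevance list ordered by decreasing similarity
    (1 = positive pair, 0 = negative pair)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(rel_k, start=1):
        if rel:
            hits += 1
            # Precision at each rank where a positive pair appears.
            precision_sum += hits / rank
    return precision_sum / hits if hits else float("nan")
```

For [1, 0, 1, 0] the positives sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6.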
p_values(dframe, null_size, seed, progress_bar=True)

Compute p-values for average precision scores based on a null distribution.

This function calculates the p-values for each profile in the input DataFrame, comparing their average precision scores (average_precision) against a null distribution generated for their specific configurations (number of positive and total pairs). Profiles with no positive pairs are excluded from the p-value calculation.

Parameters:
- dframe (DataFrame) – A DataFrame containing the following columns:
  - average_precision: The AP scores for each profile.
  - n_pos_pairs: Number of positive pairs for each profile.
  - n_total_pairs: Total number of pairs (positive + negative) for each profile.
- null_size (int) – The number of samples to generate in the null distribution for significance testing.
- seed (int) – Random seed for reproducibility of the null distribution.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.

Returns:
- ndarray – An array of p-values for each profile in the DataFrame. Profiles with no positive pairs will have NaN as their p-value.
Source code in src/copairs/map/average_precision.py
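The comparison against a null distribution can be sketched as a right-tailed empirical p-value. The +1 correction shown here is a common convention that keeps p strictly positive; the library's exact estimator may differ:

```python
def empirical_p_value(observed_ap, null_aps):
    """Fraction of null AP samples at least as large as the observed AP,
    with the standard +1 correction (sketch, not the library's estimator)."""
    exceed = sum(1 for x in null_aps if x >= observed_ap)
    return (exceed + 1) / (len(null_aps) + 1)
```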
filter
Functions to support query-like syntax when finding the matches.
apply_filters(df, query_list)

Combine and apply query filters to a DataFrame.

This function takes a list of query expressions and applies them to a DataFrame to filter its rows. If no query expressions are provided, the original DataFrame is returned unchanged.

Parameters:
- df (DataFrame) – The DataFrame to which the filters will be applied.
- query_list (List[str]) – A list of query expressions (e.g., "column_name > 5"). These expressions should follow the syntax supported by pd.DataFrame.query.

Returns:
- DataFrame – The DataFrame filtered based on the provided query expressions.

Raises:
- ValueError –
  - If the combined query results in an empty DataFrame.
  - If the combined query expression is invalid.
Source code in src/copairs/map/filter.py
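The combine-and-apply behavior can be sketched by joining the expressions with "and" and calling pd.DataFrame.query once. This is a simplified stand-in (apply_query_filters is a hypothetical name; the empty-result and invalid-expression error handling is omitted):

```python
import pandas as pd

def apply_query_filters(df, query_list):
    """Combine query expressions and apply them in a single query call."""
    if not query_list:
        # No filters: return the DataFrame unchanged.
        return df
    combined = " and ".join(f"({q})" for q in query_list)
    return df.query(combined)
```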
evaluate_and_filter(df, columns)

Evaluate query filters and filter the metadata DataFrame based on specified columns.

This function processes column specifications, extracts any filter conditions, applies these conditions to the metadata DataFrame, and returns the filtered metadata along with the updated list of columns.

Parameters:
- df (DataFrame) – The metadata DataFrame containing information about profiles to be filtered.
- columns (List[str]) – A list of metadata column names.

Returns:
- Tuple[DataFrame, List[str]] –
  - The filtered metadata DataFrame.
  - The updated list of columns after processing any filter specifications.
Source code in src/copairs/map/filter.py
extract_filters(columns, df_columns)

Extract and validate query filters from selected metadata columns.

Parameters:
- columns (List[str]) – A list of selected metadata column names or query expressions. Query expressions should follow a valid syntax (e.g., "metadata_column > 5" or "metadata_column == 'value'").
- df_columns (List[str]) – All available metadata column names to validate against.

Returns:
- Tuple[List[str], List[str]] –
  - queries_to_eval: A list of valid query expressions to evaluate.
  - parsed_cols: A list of valid metadata column names extracted from the input columns.

Raises:
- ValueError –
  - If a metadata column or query expression is invalid (e.g., references a non-existent column).
  - If duplicate queries are found for the same metadata column.
Source code in src/copairs/map/filter.py
flatten_str_list(*args)

Create a single flat list from all the given parameters.
Source code in src/copairs/map/filter.py
validate_pipeline_input(meta, feats, columns)

Validate the metadata and features for consistency and completeness.

Parameters:
- meta (DataFrame) – The metadata DataFrame describing the profiles.
- feats (ndarray) – The feature matrix where rows correspond to profiles in the metadata.
- columns (List[str]) – List of column names in the metadata to validate for null values.

Raises:
- ValueError –
  - If any of the specified metadata columns contain null values.
  - If the number of rows in the metadata and features are not equal.
  - If the feature matrix contains null values.
Source code in src/copairs/map/filter.py
hierarchical_fdr
Hierarchical FDR correction for grouped hypothesis testing.
apply_fdr_correction(map_scores, method='fdr_bh')

Apply standard FDR correction across all tests.

Parameters:
- map_scores (DataFrame) – DataFrame containing mAP scores with a 'p_value' column.
- method (str, default: 'fdr_bh') – Multiple testing correction method. See statsmodels.stats.multitest.multipletests for options.

Returns:
- DataFrame – Input DataFrame with a 'corrected_p_value' column added.
Source code in src/copairs/map/hierarchical_fdr.py
apply_hierarchical_fdr_correction(map_scores, hierarchical_by, sameby)

Apply hierarchical FDR correction for grouped hypotheses.

Implements a two-stage testing procedure appropriate for dose-response data where only high doses are expected to be active:
- Stage 1: Take the minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
- Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.

Parameters:
- map_scores (DataFrame) – DataFrame containing mAP scores with a 'p_value' column.
- hierarchical_by (list) – Metadata column(s) defining the group structure (e.g., ['compound']).
- sameby (list) – Metadata column(s) used for mAP calculation (e.g., ['compound', 'dose']).

Returns:
- DataFrame – Input DataFrame with additional columns:
  - corrected_p_value: BH-corrected p-value (1.0 for groups that did not pass Stage 1).
  - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
  - stage1_corrected_p_value: BH-corrected Stage 1 p-value.
  - stage1_significant: Whether the group passed Stage 1.

Raises:
- ValueError – If hierarchical_by is not a proper subset of sameby.
Notes
This method uses the minimum p-value (rather than Simes) for Stage 1 aggregation. Min-p is appropriate for dose-response data where only high doses are expected to be active; Simes would penalize compounds for having inactive low doses, even though inactivity at low doses is the expected biological behavior.
Source code in src/copairs/map/hierarchical_fdr.py
map
Functions to compute mean average precision.
get_map_pvalue(ap_scores, sameby, null_size, seed, progress_bar=True, max_workers=None, cache_dir=None)

Compute mAP scores and p-values from AP scores.

This function groups AP scores by the specified columns, computes the mean Average Precision (mAP) for each group, and calculates p-values by comparing against null distributions.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- seed (int) – Random seed for reproducibility.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - Columns from sameby (group identifiers).
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - indices: List of indices in the original ap_scores for this group.
Source code in src/copairs/map/map.py
mean_average_precision(ap_scores, sameby, null_size, threshold, seed, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score and associated p-values.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It calculates the significance of mAP scores by comparing them to a null distribution and performs multiple testing corrections using Benjamini-Hochberg FDR.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- threshold (float) – p-value threshold for identifying significant mAP scores.
- seed (int) – Random seed for reproducibility.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - corrected_p_value: Adjusted p-value after multiple testing correction.
  - below_p: Boolean indicating if the p-value is below the threshold.
  - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
See Also
mean_average_precision_hierarchical : For hierarchical FDR correction with grouped data.
Source code in src/copairs/map/map.py
mean_average_precision_hierarchical(ap_scores, sameby, null_size, threshold, seed, hierarchical_by, progress_bar=True, max_workers=None, cache_dir=None)

Calculate the Mean Average Precision (mAP) score with hierarchical FDR correction.

This function computes the Mean Average Precision (mAP) score by grouping profiles based on the specified criteria (sameby). It applies hierarchical FDR correction appropriate for grouped hypothesis testing, such as dose-response data.

Parameters:
- ap_scores (DataFrame) – DataFrame containing individual Average Precision (AP) scores and pair statistics (e.g., number of positive pairs n_pos_pairs and total pairs n_total_pairs).
- sameby (list or str) – Metadata column(s) used to group profiles for mAP calculation.
- null_size (int) – Number of samples in the null distribution for significance testing.
- threshold (float) – p-value threshold for identifying significant mAP scores.
- seed (int) – Random seed for reproducibility.
- hierarchical_by (list) – Metadata column(s) for hierarchical FDR correction. Enables two-stage testing:
  - Stage 1: Take the minimum p-value within each group defined by hierarchical_by, then apply BH correction at the group level. A group passes if any member is significant.
  - Stage 2: For groups that pass Stage 1, apply BH correction to the individual tests within each group.
  This is designed for dose-response data where only high doses are expected to be active. The hierarchical_by columns must be a proper subset of sameby. For example, with sameby=['compound', 'dose'] and hierarchical_by=['compound'], mAP is calculated per compound×dose, but FDR correction accounts for the grouped structure.
- progress_bar (bool, default: True) – Whether to show tqdm's progress bar.
- max_workers (int, default: None) – Number of workers used. Default defined by tqdm's thread_map.
- cache_dir (str or Path, default: None) – Location to save the cache.

Returns:
- DataFrame – DataFrame with the following columns:
  - mean_average_precision: Mean AP score for each group.
  - mean_normalized_average_precision: Mean normalized AP score (scale-independent).
  - p_value: p-value comparing mAP to the null distribution.
  - corrected_p_value: Adjusted p-value after multiple testing correction.
  - below_p: Boolean indicating if the p-value is below the threshold.
  - below_corrected_p: Boolean indicating if the corrected p-value is below the threshold.
  - stage1_p_value: Group-level p-value from Stage 1 (minimum p-value).
  - stage1_corrected_p_value: BH-corrected Stage 1 p-value.
  - stage1_significant: Whether the group passed Stage 1.
See Also
mean_average_precision : For standard BH FDR correction.
Source code in src/copairs/map/map.py
silent_thread_map(fn, *iterables, **kwargs)

Map iterables and kwargs to a function.

Parameters:
- fn (callable) – Function to map over iterables.
- *iterables (tuple) – Iterables to map over.
- **kwargs (dict) – Additional keyword arguments. Accepts:
  - max_workers (int, optional) – Maximum number of workers (default: min(32, cpu_count() + 4)).
  - chunksize (int, optional) – Size of chunks for each worker (default: 1).
Source code in src/copairs/map/map.py
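A stand-in built on concurrent.futures shows the idea of a thread-based map without a progress bar (silent_map is hypothetical; the library wraps tqdm's thread_map, and this is not its implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def silent_map(fn, iterable, max_workers=None, chunksize=1):
    """Map fn over iterable with a thread pool, collecting results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order in its results.
        return list(pool.map(fn, iterable, chunksize=chunksize))
```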
multilabel
Functions to compute mAP with multilabel support.
average_precision(meta, feats, pos_sameby, pos_diffby, neg_sameby, neg_diffby, multilabel_col, batch_size=20000, distance='cosine', progress_bar=True)
Compute average precision with multilabel support.
Returns normalized_average_precision in addition to average_precision.
See Also
copairs.map.average_precision : Average precision without multilabel support.
Source code in src/copairs/map/multilabel.py
normalization
Functions for normalizing Average Precision scores.
compute_normalized_ap_scores(ap_scores, null_confs)

Compute both raw and normalized Average Precision scores.

Parameters:
- ap_scores (ndarray) – Array of raw Average Precision scores.
- null_confs (ndarray) – Array of configurations where each row is [n_pos_pairs, n_total_pairs].

Returns:
- ap_scores (ndarray) – The original raw AP scores.
- normalized_ap_scores (ndarray) – The normalized AP scores.
Source code in src/copairs/map/normalization.py
expected_ap(M, N)

Compute the expected Average Precision under random ranking.

This implements the exact finite-sample formula for expected AP when items are randomly ranked.

Parameters:
- M (int) – Number of positive items (relevant documents).
- N (int) – Number of negative items (irrelevant documents).

Returns:
- float – The expected Average Precision under random ranking.

Notes
Formula: E[AP] = (1/L) × [(M-1)/(L-1) × (L - H_L) + H_L], where L = M + N and H_L is the L-th harmonic number.
Source code in src/copairs/map/normalization.py
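The formula in the Notes can be implemented directly. A sketch mirroring the stated formula, with a guard for the single-item case where L - 1 = 0 (function names here are illustrative, not the library's code):

```python
def harmonic(n):
    """n-th harmonic number H_n = sum of 1/k for k = 1..n."""
    return sum(1.0 / k for k in range(1, n + 1))

def expected_ap(M, N):
    """Expected AP of M positives among M + N randomly ranked items."""
    L = M + N
    if L == 1:
        return 1.0  # a single positive item is trivially ranked first
    H = harmonic(L)
    # E[AP] = (1/L) * [(M-1)/(L-1) * (L - H_L) + H_L]
    return ((M - 1) / (L - 1) * (L - H) + H) / L
```

As a sanity check, with one positive and one negative the positive lands at rank 1 or 2 with equal probability, so E[AP] = (1 + 1/2) / 2 = 0.75, which the formula reproduces.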
harmonic_number(n)

Compute the n-th harmonic number H_n = Σ(1/k) for k = 1 to n.

Parameters:
- n (int) – The index of the harmonic number to compute.

Returns:
- float – The n-th harmonic number.
Source code in src/copairs/map/normalization.py
normalize_ap(ap, M, N, eps=1e-10)

Normalize Average Precision scores to be scale-independent.

Computes the normalized AP as (AP - μ₀) / (1 - μ₀), where μ₀ is the expected AP under random ranking.

Parameters:
- ap (float or ndarray) – The Average Precision score(s) to normalize.
- M (int or ndarray) – Number of positive items for each AP score.
- N (int or ndarray) – Number of negative items for each AP score.
- eps (float, default: 1e-10) – Small epsilon to avoid division by zero when μ₀ ≈ 1.

Returns:
- float or ndarray – The normalized Average Precision score(s).
Notes
- Normalized AP = 0 when performance equals random chance
- Normalized AP = 1 when performance is perfect
- Negative values indicate worse-than-random performance
Source code in src/copairs/map/normalization.py
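The normalization itself is a one-liner once μ₀ is known. A sketch that takes μ₀ directly as an argument for simplicity (the library instead computes it from M and N via expected_ap; normalize is an illustrative name):

```python
def normalize(ap, mu0, eps=1e-10):
    """(AP - mu0) / (1 - mu0): 0 at chance level, 1 for a perfect ranking,
    negative for worse-than-random performance."""
    # eps guards the division when mu0 is (numerically) 1.
    return (ap - mu0) / max(1.0 - mu0, eps)
```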