You're reading the documentation for a development version. For the latest released version, please have a look at v1.0.1.
Python API
cosmicqc.analyze
Module for detecting various quality control aspects from source data.
- src.cosmicqc.analyze._convert_feature_threshold_input_to_named_threshold_dicts(feature_thresholds_file: str, feature_thresholds: Dict[str, float] | Dict[str, Dict[str, float]] | str | None = None) -> List[Tuple[str, Dict[str, float]]]
Convert feature threshold input into named threshold dictionaries for processing.
- Parameters:
feature_thresholds_file – str Path to the YAML file containing threshold definitions.
feature_thresholds – LabelOutliersFeatureThresholdInput If you do not provide feature_thresholds, the function will default to using the default YAML file.
- None: Use all thresholds from the file (only applicable to label_outliers).
- str: Named threshold set from the file.
- Dict[str, float]: Single unnamed threshold set.
- Dict[str, Dict[str, float]]: Multiple named threshold sets.
- Returns:
A list of (name, thresholds_dict) tuples. If a single unnamed threshold dictionary is provided as input, it will be returned with the name “custom”.
- Return type:
List[Tuple[str, Dict[str, float]]]
- Raises:
ValueError – If the input format is invalid.
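The four accepted input shapes can be illustrated with a small, library-free sketch. It is not the coSMicQC implementation: `to_named_thresholds` and `file_sets` are hypothetical names, `file_sets` stands in for the already-parsed YAML file, and the feature and condition names are made up for illustration.

```python
from typing import Dict, List, Tuple, Union

Thresholds = Dict[str, float]
ThresholdInput = Union[Thresholds, Dict[str, Thresholds], str, None]


def to_named_thresholds(
    file_sets: Dict[str, Thresholds],
    feature_thresholds: ThresholdInput = None,
) -> List[Tuple[str, Thresholds]]:
    """Hypothetical stand-in: normalize threshold input to (name, dict) pairs."""
    if feature_thresholds is None:
        # None: use every named set from the (parsed) file.
        return list(file_sets.items())
    if isinstance(feature_thresholds, str):
        # str: look up one named set from the (parsed) file.
        return [(feature_thresholds, file_sets[feature_thresholds])]
    if isinstance(feature_thresholds, dict) and feature_thresholds:
        values = list(feature_thresholds.values())
        if all(isinstance(v, dict) for v in values):
            # Dict[str, Dict[str, float]]: multiple named sets passed inline.
            return list(feature_thresholds.items())
        if all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        ):
            # Dict[str, float]: a single unnamed set is named "custom".
            return [("custom", dict(feature_thresholds))]
    raise ValueError(f"Invalid feature_thresholds input: {feature_thresholds!r}")


# Illustrative data only; these names are not taken from the default file.
file_sets = {"small_nuclei": {"Nuclei_AreaShape_Area": -1.0}}
print(to_named_thresholds(file_sets, {"Nuclei_AreaShape_Area": 2.0}))
```

Normalizing every input to the same list of pairs lets downstream code loop over conditions without caring how they were supplied.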
- src.cosmicqc.analyze._create_condition_map(df: CytoDataFrame, outlier_df: CytoDataFrame, thresholds: Dict[str, float], name_prefix: str) -> Tuple[List[Series], Dict[str, str]]
Create boolean outlier conditions and z-score column names for one threshold set.
- Parameters:
df – CytoDataFrame Source data used to calculate z-scores.
outlier_df – CytoDataFrame Working dataframe where z-score columns are stored.
thresholds – Dict[str, float] Dictionary of feature thresholds.
name_prefix – str Prefix to use when naming z-score columns.
- Raises:
ValueError – If a feature in thresholds does not exist in the DataFrame.
- Returns:
- Tuple[List[pd.Series], Dict[str, str]]
A list of boolean condition series and a mapping of features to their z-score column names.
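As a rough illustration of what such a condition map might look like, the sketch below uses plain dicts of lists in place of dataframes. Several details are assumptions rather than documented behavior: the population standard deviation, the `<prefix>_<feature>_zscore` column naming, and the rule that a positive threshold flags z-scores above it while a negative threshold flags z-scores below it (the latter follows the find_outliers parameter description).

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def create_condition_map(
    data: Dict[str, List[float]],
    working: Dict[str, List[float]],
    thresholds: Dict[str, float],
    name_prefix: str,
) -> Tuple[List[List[bool]], Dict[str, str]]:
    """Illustrative stand-in using dicts of lists instead of dataframes."""
    conditions: List[List[bool]] = []
    zscore_columns: Dict[str, str] = {}
    for feature, threshold in thresholds.items():
        if feature not in data:
            raise ValueError(f"Feature '{feature}' does not exist in the data.")
        values = data[feature]
        # Assumption: population standard deviation; constant columns
        # (standard deviation of zero) are not handled in this sketch.
        mu, sigma = mean(values), pstdev(values)
        column = f"{name_prefix}_{feature}_zscore"  # assumed naming scheme
        working[column] = [(v - mu) / sigma for v in values]
        zscore_columns[feature] = column
        if threshold > 0:
            # Positive threshold: flag values far above the mean.
            conditions.append([z > threshold for z in working[column]])
        else:
            # Negative threshold: flag values far below the mean.
            conditions.append([z < threshold for z in working[column]])
    return conditions, zscore_columns


data = {"area": [1.0, 1.0, 1.0, 1.0, 10.0]}
working: Dict[str, List[float]] = {}
conditions, columns = create_condition_map(data, working, {"area": 1.5}, "cqc")
print(conditions, columns)
```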
- src.cosmicqc.analyze._warn_if_inline_thresholds_ignore_file(feature_thresholds_file: str | None, feature_thresholds: Dict[str, float] | Dict[str, Dict[str, float]] | str | None) -> None
Warn when inline thresholds override a non-default thresholds file.
- src.cosmicqc.analyze.find_outliers(df: CytoDataFrame | DataFrame | str, metadata_columns: List[str], feature_thresholds: Dict[str, float] | str, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, export_path: str | None = None) -> DataFrame
This function uses identify_outliers to return a dataframe with only the outliers and provided metadata columns.
NOTE: This function can only be used with a single set of feature thresholds for finding what outliers look like in the data. For multiple sets of feature thresholds, use label_outliers instead and filter the results for the condition(s) of interest.
- Parameters:
df – Union[CytoDataFrame, pd.DataFrame, str] DataFrame, or a string file path to a Parquet, CSV, or TSV file containing CytoTable output or similar data.
metadata_columns – List[str] List of metadata columns to include in the output alongside the outlier data.
feature_thresholds – Union[Dict[str, float], str] One of two options: a dictionary with feature names as keys and their assigned outlier thresholds as values, where a positive threshold detects outliers above the mean and a negative threshold detects outliers below the mean; or a string naming a key defined in the feature_thresholds_file YAML file.
feature_thresholds_file – Optional[str] = DEFAULT_QC_THRESHOLD_FILE An optional YAML file in which named feature thresholds are defined.
export_path – Optional[str] = None An optional path to export the data using CytoDataFrame export capabilities. If None, no export is performed. Compatible export formats are CSV, TSV, and Parquet.
- Returns:
Outlier data frame for the given conditions.
- Return type:
pd.DataFrame
- src.cosmicqc.analyze.identify_outliers(df: CytoDataFrame | DataFrame | str, feature_thresholds: Dict[str, float] | Dict[str, Dict[str, float]] | str, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, condition_name: str | None = None, include_threshold_scores: bool = False, export_path: str | None = None) -> Series | CytoDataFrame
This function uses z-scoring to format the data for detecting outlier nuclei or cells using specific CellProfiler features.
- Parameters:
df – Union[CytoDataFrame, pd.DataFrame, str] Input dataframe or file path.
feature_thresholds – IdentifyOutliersFeatureThresholdInput One of:
- {feature: threshold}
- {condition_name: {feature: threshold}}
- A string key from a YAML file
include_threshold_scores – bool Whether to include z-score columns in the output.
condition_name – Optional[str] Optional explicit name to use for CQC columns when feature_thresholds is a single dict (only features and thresholds). Default name is “custom” if not provided.
export_path – Optional[str] If provided, export the result.
- Returns:
- Union[pd.Series, CytoDataFrame]
Return shape depends on whether one or multiple conditions are provided:
- Single condition:
Returns a boolean pd.Series if include_threshold_scores is False. Returns a CytoDataFrame with z-score columns and the outlier column if include_threshold_scores is True.
- Multiple conditions via Dict[str, Dict[str, float]]:
Returns a CytoDataFrame. If include_threshold_scores is False, it contains one outlier column per condition. If include_threshold_scores is True, it contains the per-condition z-score columns plus one outlier column per condition.
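The difference between the single-condition and multi-condition return shapes can be sketched without dataframes. The example below starts from precomputed z-scores and is only an approximation of the idea: `flag_condition` and `identify` are hypothetical helpers, and combining the per-feature checks with a logical AND is an assumption that this page does not state.

```python
from typing import Dict, List, Union

Thresholds = Dict[str, float]


def flag_condition(
    zscores: Dict[str, List[float]], thresholds: Thresholds
) -> List[bool]:
    """Flag rows that satisfy every feature threshold of one condition."""
    n_rows = len(next(iter(zscores.values())))
    flags = [True] * n_rows
    for feature, threshold in thresholds.items():
        for i, z in enumerate(zscores[feature]):
            hit = z > threshold if threshold > 0 else z < threshold
            # Assumption: all feature checks must hold (logical AND).
            flags[i] = flags[i] and hit
    return flags


def identify(
    zscores: Dict[str, List[float]],
    feature_thresholds: Union[Thresholds, Dict[str, Thresholds]],
) -> Union[List[bool], Dict[str, List[bool]]]:
    """Return one boolean sequence, or one per named condition."""
    if all(isinstance(v, dict) for v in feature_thresholds.values()):
        # Multiple named conditions: one outlier column per condition.
        return {
            name: flag_condition(zscores, thresholds)
            for name, thresholds in feature_thresholds.items()
        }
    # Single condition: a single boolean sequence.
    return flag_condition(zscores, feature_thresholds)


# Illustrative z-scores for two made-up features.
zscores = {
    "area": [-0.5, -0.5, -0.5, -0.5, 2.0],
    "eccentricity": [2.0, -0.5, -0.5, -0.5, -0.5],
}
print(identify(zscores, {"area": 1.5, "eccentricity": -0.4}))
print(identify(zscores, {"large": {"area": 1.5}, "round": {"eccentricity": -0.4}}))
```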
- src.cosmicqc.analyze.label_outliers(df: CytoDataFrame | DataFrame | str, feature_thresholds: Dict[str, float] | Dict[str, Dict[str, float]] | str | None = None, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, include_threshold_scores: bool = False, export_path: str | None = None, export_as_annotations: bool = False, annotation_metadata_columns: List[str] | None = None) -> CytoDataFrame
This function labels outliers in the input dataframe based on specified feature thresholds and exports the whole dataframe or an annotations file with just metadata and outlier labels.
- Parameters:
df – Union[CytoDataFrame, pd.DataFrame, str] DataFrame or file path (Parquet, CSV, or TSV).
feature_thresholds –
LabelOutliersFeatureThresholdInput Defines one or more QC conditions.
- Single condition:
{“feature”: threshold}
- Multiple conditions:
{“undersegmented_cells”: {“feature1”: -1, “feature2”: -1}, “oversegmented_nuclei”: {“feature3”: 2}}
- String:
Named condition from the feature_thresholds_file.
- None:
Run all conditions defined in the thresholds file.
feature_thresholds_file – Optional[str] = DEFAULT_QC_THRESHOLD_FILE YAML file containing named threshold conditions.
include_threshold_scores – bool = False If True, include per-feature z-score columns in the output.
export_path – Optional[str] = None Path to export results.
export_as_annotations – bool = False If True, export only metadata + QC columns (annotations file). If False, export the full dataset.
annotation_metadata_columns – Optional[List[str]] = None Metadata columns to include when export_as_annotations=True. If annotation export is requested, these columns are required and will be written alongside the generated Metadata_cqc_* columns.
- Returns:
Either the full dataframe or only metadata with added QC columns:
Metadata_cqc_<condition>_is_outlier
(optional) Metadata_cqc_<condition>_<feature>_zscore if include_threshold_scores=True
When export_as_annotations=True, the exported file contains only annotation_metadata_columns plus QC-related columns.
- Return type:
CytoDataFrame
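The column naming pattern described above can be spelled out with a short sketch. The helpers `qc_column_names` and `annotation_columns` are hypothetical and not part of coSMicQC; the column order and the use of ValueError for missing annotation metadata columns are assumptions.

```python
from typing import Dict, List, Optional


def qc_column_names(
    conditions: Dict[str, Dict[str, float]],
    include_threshold_scores: bool = False,
) -> List[str]:
    """Build the Metadata_cqc_* column names for each named condition."""
    columns: List[str] = []
    for condition, thresholds in conditions.items():
        if include_threshold_scores:
            # One z-score column per thresholded feature.
            columns.extend(
                f"Metadata_cqc_{condition}_{feature}_zscore"
                for feature in thresholds
            )
        # One boolean outlier column per condition.
        columns.append(f"Metadata_cqc_{condition}_is_outlier")
    return columns


def annotation_columns(
    annotation_metadata_columns: Optional[List[str]],
    qc_columns: List[str],
) -> List[str]:
    """Columns kept in an annotations export: metadata plus QC columns."""
    if not annotation_metadata_columns:
        # Assumption: the exact error type used by the library is not documented here.
        raise ValueError("annotation_metadata_columns is required for annotations.")
    return [*annotation_metadata_columns, *qc_columns]


conditions = {
    "undersegmented_cells": {"feature1": -1, "feature2": -1},
    "oversegmented_nuclei": {"feature3": 2},
}
print(qc_column_names(conditions, include_threshold_scores=True))
```

Filtering the labeled output for one condition then amounts to selecting rows where the matching `Metadata_cqc_<condition>_is_outlier` column is true.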
- src.cosmicqc.analyze.read_thresholds_set_from_file(feature_thresholds_file: str, feature_thresholds: str | None = None) -> Dict[str, int] | Dict[str, Dict[str, int]]
Reads a set of feature thresholds from a specified file.
This function takes the path to a feature thresholds file and a specific feature threshold string, reads the file, and returns the thresholds set from the file.
- Parameters:
feature_thresholds_file (str) – The path to the file containing feature thresholds.
feature_thresholds (Optional[str], default None) – The name of the threshold set to read from the file. If None, all threshold sets in the file are returned.
- Returns:
A dictionary containing the processed feature thresholds.
- Return type:
dict
- Raises:
LookupError – If the file does not contain the specified feature_thresholds key.
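The lookup behavior can be sketched over an already-parsed mapping. This is a simplified stand-in and not the library code: `read_thresholds_set` and `parsed_file` are hypothetical names, file parsing is left out, and the flat layout of the mapping is an assumption about the file structure.

```python
from typing import Dict, Optional, Union

ThresholdSet = Dict[str, int]


def read_thresholds_set(
    parsed_file: Dict[str, ThresholdSet],
    feature_thresholds: Optional[str] = None,
) -> Union[ThresholdSet, Dict[str, ThresholdSet]]:
    """Return one named threshold set, or all sets when no name is given."""
    if feature_thresholds is None:
        # No name requested: hand back every set in the file.
        return parsed_file
    if feature_thresholds not in parsed_file:
        raise LookupError(
            f"Threshold set '{feature_thresholds}' was not found in the file."
        )
    return parsed_file[feature_thresholds]


# Illustrative contents; the names are not taken from the default file.
parsed_file = {"small_nuclei": {"Nuclei_AreaShape_Area": -1}}
print(read_thresholds_set(parsed_file, "small_nuclei"))
```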
cosmicqc.cli
Set up the coSMicQC CLI through python-fire.
- src.cosmicqc.cli.HasCustomRepr(component: object) -> bool
Reproduces the logic of the HasCustomStr function to determine whether component has a custom __repr__ method.
…
- Parameters:
component – The object to check for a custom __repr__ method.
- Returns:
Whether component has a custom __repr__ method.
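One plausible way to perform such a check with the standard library is to find which class in the method resolution order defines `__repr__`. The sketch below is an assumption about the approach, not the coSMicQC or python-fire source, and `has_custom_repr` is a hypothetical name.

```python
import inspect


def has_custom_repr(component: object) -> bool:
    """Return True if the component's class overrides object.__repr__."""
    for attr in inspect.classify_class_attrs(type(component)):
        if attr.name == "__repr__":
            # defining_class is the class in the MRO that provides __repr__.
            return attr.defining_class is not object
    return False


class WithRepr:
    def __repr__(self) -> str:
        return "WithRepr()"


class WithoutRepr:
    pass


print(has_custom_repr(WithRepr()), has_custom_repr(WithoutRepr()))
```

Built-in types such as `int` or `dict` define their own `__repr__`, so this check reports them as custom as well.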