{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `ContaminationDetector` in action\n", "\n", "In this example, we apply `ContaminationDetector` from `coSMicQC` to an example dataset from the NF1 project.\n", "\n", "The NF1 project example includes wells from a cell line that was contaminated with mycoplasma.\n", "In the wet lab, these cells tested positive for mycoplasma.\n", "We do not want to process contaminated cells, so we can use this methodology to confirm the contamination and its extent on the plate.\n", "\n", "The result of this method is either a pass or a fail.\n", "If the data is clean, the method stops at step 1 and reports that the data is ready for downstream analysis.\n", "If the data shows contamination, the method continues past step 1 to determine whether the problem affects the whole plate or only part of it.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from cosmicqc import ContaminationDetector\n", "\n", "# set a path for the NF1 parquet-based dataset\n", "data_path = (\n", " \"../../../tests/data/cytotable/NF1_cellpainting_data/Plate_3_filtered.parquet\"\n", ")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1355, 2321)\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Metadata_ImageNumber</th><th>Image_Metadata_Plate</th><th>Metadata_number_of_singlecells</th><th>Image_Metadata_Site</th><th>Image_Metadata_Well</th><th>Metadata_Cells_Number_Object_Number</th><th>Metadata_Cytoplasm_Parent_Cells</th><th>Metadata_Cytoplasm_Parent_Nuclei</th><th>Metadata_Nuclei_Number_Object_Number</th><th>Image_FileName_CY5</th><th>...</th><th>Nuclei_Texture_Variance_DAPI_3_02_256</th><th>Nuclei_Texture_Variance_DAPI_3_03_256</th><th>Nuclei_Texture_Variance_GFP_3_00_256</th><th>Nuclei_Texture_Variance_GFP_3_01_256</th><th>Nuclei_Texture_Variance_GFP_3_02_256</th><th>Nuclei_Texture_Variance_GFP_3_03_256</th><th>Nuclei_Texture_Variance_RFP_3_00_256</th><th>Nuclei_Texture_Variance_RFP_3_01_256</th><th>Nuclei_Texture_Variance_RFP_3_02_256</th><th>Nuclei_Texture_Variance_RFP_3_03_256</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>30</td><td>Plate_3</td><td>279</td><td>15</td><td>B11</td><td>1</td><td>1</td><td>2</td><td>2</td><td>B11_01_3_15_CY5_001_illumcorrect.tiff</td><td>...</td><td>619.327600</td><td>594.798669</td><td>271.137249</td><td>268.157417</td><td>311.088206</td><td>282.370923</td><td>198.402061</td><td>202.133683</td><td>203.094321</td><td>193.875072</td></tr>\n",
" <tr><th>1</th><td>31</td><td>Plate_3</td><td>279</td><td>16</td><td>B11</td><td>1</td><td>1</td><td>2</td><td>2</td><td>B11_01_3_16_CY5_001_illumcorrect.tiff</td><td>...</td><td>323.170295</td><td>321.310711</td><td>34.841145</td><td>35.139114</td><td>38.075206</td><td>38.080602</td><td>131.691809</td><td>126.174866</td><td>136.433036</td><td>132.735107</td></tr>\n",
" <tr><th>2</th><td>34</td><td>Plate_3</td><td>279</td><td>19</td><td>B11</td><td>1</td><td>1</td><td>2</td><td>2</td><td>B11_01_3_19_CY5_001_illumcorrect.tiff</td><td>...</td><td>321.457911</td><td>314.851226</td><td>286.810209</td><td>261.637391</td><td>257.878700</td><td>259.463388</td><td>157.252242</td><td>156.042241</td><td>154.576787</td><td>154.894240</td></tr>\n",
" <tr><th>3</th><td>35</td><td>Plate_3</td><td>279</td><td>1</td><td>B11</td><td>1</td><td>1</td><td>2</td><td>2</td><td>B11_01_3_1_CY5_001_illumcorrect.tiff</td><td>...</td><td>1487.354034</td><td>1468.971582</td><td>516.742751</td><td>489.945367</td><td>519.912829</td><td>510.173091</td><td>369.462002</td><td>366.631748</td><td>383.771987</td><td>364.529179</td></tr>\n",
" <tr><th>4</th><td>44</td><td>Plate_3</td><td>279</td><td>6</td><td>B11</td><td>1</td><td>1</td><td>2</td><td>2</td><td>B11_01_3_6_CY5_001_illumcorrect.tiff</td><td>...</td><td>508.054695</td><td>501.770497</td><td>51.695327</td><td>54.248623</td><td>57.984869</td><td>52.494053</td><td>262.420251</td><td>255.894670</td><td>259.081931</td><td>266.519397</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 2321 columns</p>\n",
"</div>
" ], "text/plain": [ " Metadata_ImageNumber Image_Metadata_Plate Metadata_number_of_singlecells \\\n", "0 30 Plate_3 279 \n", "1 31 Plate_3 279 \n", "2 34 Plate_3 279 \n", "3 35 Plate_3 279 \n", "4 44 Plate_3 279 \n", "\n", " Image_Metadata_Site Image_Metadata_Well \\\n", "0 15 B11 \n", "1 16 B11 \n", "2 19 B11 \n", "3 1 B11 \n", "4 6 B11 \n", "\n", " Metadata_Cells_Number_Object_Number Metadata_Cytoplasm_Parent_Cells \\\n", "0 1 1 \n", "1 1 1 \n", "2 1 1 \n", "3 1 1 \n", "4 1 1 \n", "\n", " Metadata_Cytoplasm_Parent_Nuclei Metadata_Nuclei_Number_Object_Number \\\n", "0 2 2 \n", "1 2 2 \n", "2 2 2 \n", "3 2 2 \n", "4 2 2 \n", "\n", " Image_FileName_CY5 ... \\\n", "0 B11_01_3_15_CY5_001_illumcorrect.tiff ... \n", "1 B11_01_3_16_CY5_001_illumcorrect.tiff ... \n", "2 B11_01_3_19_CY5_001_illumcorrect.tiff ... \n", "3 B11_01_3_1_CY5_001_illumcorrect.tiff ... \n", "4 B11_01_3_6_CY5_001_illumcorrect.tiff ... \n", "\n", " Nuclei_Texture_Variance_DAPI_3_02_256 Nuclei_Texture_Variance_DAPI_3_03_256 \\\n", "0 619.327600 594.798669 \n", "1 323.170295 321.310711 \n", "2 321.457911 314.851226 \n", "3 1487.354034 1468.971582 \n", "4 508.054695 501.770497 \n", "\n", " Nuclei_Texture_Variance_GFP_3_00_256 Nuclei_Texture_Variance_GFP_3_01_256 \\\n", "0 271.137249 268.157417 \n", "1 34.841145 35.139114 \n", "2 286.810209 261.637391 \n", "3 516.742751 489.945367 \n", "4 51.695327 54.248623 \n", "\n", " Nuclei_Texture_Variance_GFP_3_02_256 Nuclei_Texture_Variance_GFP_3_03_256 \\\n", "0 311.088206 282.370923 \n", "1 38.075206 38.080602 \n", "2 257.878700 259.463388 \n", "3 519.912829 510.173091 \n", "4 57.984869 52.494053 \n", "\n", " Nuclei_Texture_Variance_RFP_3_00_256 Nuclei_Texture_Variance_RFP_3_01_256 \\\n", "0 198.402061 202.133683 \n", "1 131.691809 126.174866 \n", "2 157.252242 156.042241 \n", "3 369.462002 366.631748 \n", "4 262.420251 255.894670 \n", "\n", " Nuclei_Texture_Variance_RFP_3_02_256 Nuclei_Texture_Variance_RFP_3_03_256 \n", "0 203.094321 193.875072 \n", "1 
136.433036 132.735107 \n", "2 154.576787 154.894240 \n", "3 383.771987 364.529179 \n", "4 259.081931 266.519397 \n", "\n", "[5 rows x 2321 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load in the dataset\n", "filtered_nf1_df = pd.read_parquet(data_path)\n", "\n", "# Look over the data to check it is correct\n", "print(filtered_nf1_df.shape)\n", "filtered_nf1_df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running step 1...\n", "Summary:\n", "\n", "Check Result\n", "----------------------- --------\n", "Texture skewed? ✅\n", "Nucleus shape variable? ❌\n", "Interpretation:\n", "Contamination detected! 🚨\n", "Anomalous texture around nuclei detected but nuclei segmentation not clearly impacted.\n", "Proceeding to step 2...\n", "Running step 2...\n", "Summary:\n", "\n", "Check Result\n", "------------------------------------------------------- --------\n", "Texture skewed? ✅\n", "Nucleus shape variable? ❌\n", "Whole plate contaminated due to abnormal texture? ❌\n", "Whole plate contaminated due to abnormal nucleus shape? ❌\n", "Interpretation:\n", "Partial plate contamination in texture only. Proceed to step 3.\n", "Running step 3...\n", "Finding outlier cells with anomalous texture around the nucleus...\n", "Number of outliers: 198 (14.61%)\n", "Outliers Range:\n", "Cytoplasm_Texture_InfoMeas1_DAPI_3_02_256 Min: -0.3808007336913458\n", "Cytoplasm_Texture_InfoMeas1_DAPI_3_02_256 Max: -0.23152918999076016\n", "Number of outliers: 120 (8.86%)\n", "Outliers Range:\n", "Cytoplasm_Granularity_2_DAPI Min: 1.8500394938002351\n", "Cytoplasm_Granularity_2_DAPI Max: 27.639655282904474\n", "Total number of outliers detected: 242\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Number of wells in the top 25%: 3\n", "Wells in the top 25% of highest outlier proportions:\n", "B5, C5, D5\n" ] } ], "source": [ "# Instantiate the ContaminationDetector class and run the contamination detection process\n", "detector = ContaminationDetector(\n", " dataframe=filtered_nf1_df, nucleus_channel_naming=\"DAPI\"\n", ").run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, the detector found anomalous texture surrounding the nuclei on this plate, which likely reflects contamination.\n", "\n", "In step 2, based on the mean of the texture, the detector determined that the problem affects only part of the plate.\n", "\n", "In step 3, the detector flagged 3 wells (B5, C5, and D5) with a high proportion of outlier single cells with abnormal texture. These wells were concluded to belong to the one cell line on the plate with mycoplasma contamination, while the rest of the cell lines were fine." ] } ], "metadata": { "kernelspec": { "display_name": "cosmicqc-vNopUmqk-py3.11", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }