Experimental Tools

Next-generation tools under active development in the WayScience organization, designed to become the foundation of Cytomining v2.

What does Cytomining v2 solve?

The current Cytomining stack was designed around 2D single-cell data from CellProfiler. As the field moves toward 3D organoid imaging, larger-scale archives, and deep learning feature extraction, several gaps have emerged: no standardized image catalog, images and features stored separately, no 3D support, and hit calling that collapses single-cell heterogeneity. The tools below are purpose-built to close each of these gaps — together forming a fully traceable, format-agnostic, 3D-capable profiling pipeline.

Tools

buscar

Hit calling — identifies biologically active perturbations from single-cell morphological profiles using distribution-level scoring.

iceberg-bioimage

Data cataloging — scans bioimaging stores and publishes image metadata to Cytomining-compatible Parquet warehouses via Apache Iceberg.

OME-arrow

Image storage — stores microscopy images alongside metadata and derived data in a unified, queryable Apache Arrow format.

zedprofiler

3D feature extraction — extracts morphological features from volumetric microscopy images for CPU-efficient high-content profiling.

Cytomining v2 pipelines

Standard (2D)

🔬 Raw Images → OME-arrow (store) → 📊 Feature Extraction (extract) → CytoTable (harmonize) → pycytominer (process) → buscar (hit call)

3D Organoid

🔬 Raw Images → OME-arrow (store) → zedprofiler (3D extract) → CytoTable (harmonize) → pycytominer (process) → buscar (hit call)

Yellow = new 3D-capable step. Purple = new data infrastructure. Blue = existing Cytomining tools.

What each tool solves

🗄️

iceberg-bioimage

Problem: Raw bioimaging archives have no standard catalog — finding, versioning, and joining images to downstream data requires bespoke scripts per lab. Solution: Scans any image store into a versioned Apache Iceberg catalog that directly exports Cytomining-compatible Parquet warehouses.

🏹

OME-arrow

Problem: Images and feature tables live in separate systems — linking a numeric outlier back to its source cell requires error-prone manual joins across formats. Solution: Embeds images as first-class columns in Arrow tables so features, metadata, and pixel data travel together and can be queried or exported as tensors.

📦

zedprofiler

Problem: Classical profiling tools only extract 2D features — organoid, cleared-tissue, and confocal z-stack experiments are left without a first-class CPU-efficient feature extractor. Solution: Extracts morphological features directly from 3D volumetric images with anisotropic spacing correction, no GPU required.
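To illustrate why voxel spacing matters for 3D features, here is a minimal sketch in plain Python. It is not zedprofiler's API: the function name `physical_measurements`, the nested-list mask layout, and the micrometer units are assumptions made for this example, and zedprofiler may handle anisotropy differently (for instance by resampling the volume). The sketch only shows that an object which looks like a cube in voxel units can be strongly elongated once the z-step is taken into account.

```python
def physical_measurements(mask, spacing):
    """Measure a binary 3D object in physical units.

    mask    -- nested lists indexed as mask[z][y][x], truthy where the object is
    spacing -- (dz, dy, dx) voxel size, assumed here to be in micrometers
    """
    # Collect the index of every voxel that belongs to the object.
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("mask contains no object voxels")

    dz, dy, dx = spacing
    voxel_volume = dz * dy * dx

    # Bounding-box extent along each axis, scaled by that axis's spacing.
    extent = []
    for axis, step in enumerate(spacing):
        indices = [c[axis] for c in coords]
        extent.append((max(indices) - min(indices) + 1) * step)

    return {
        "volume_um3": len(coords) * voxel_volume,
        "extent_um": tuple(extent),
    }


# A 2 x 2 x 2 block of voxels: a cube in index space.
cube = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]

# Hypothetical confocal z-stack: 2.0 um between slices, 0.5 um pixels in-plane.
result = physical_measurements(cube, spacing=(2.0, 0.5, 0.5))
print(result)
```

With the hypothetical spacing above, the object spans 4.0 µm in z but only 1.0 µm in y and x, so any shape feature computed in raw voxel units would misreport it as isotropic.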

🔍

buscar

Problem: Population-level hit calling averages away biologically meaningful cell-to-cell variation — heterogeneous responses and rare subpopulations are invisible to copairs-style metrics. Solution: Scores perturbation efficacy and specificity directly on single-cell distributions using Earth Mover's Distance, preserving heterogeneity throughout hit calling.
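The contrast between population averages and distribution-level scoring can be sketched with a one-dimensional Earth Mover's Distance. This is an illustration only, not buscar's API or its actual scoring procedure: the function `emd_1d`, the single-feature setup, and the toy values are assumptions made for this example. The distance is computed as the area between the two empirical cumulative distribution functions, which is the standard formulation for 1D samples.

```python
from bisect import bisect_right
from statistics import mean


def emd_1d(sample_a, sample_b):
    """Earth Mover's Distance between two 1D empirical distributions.

    Computed as the area between the two empirical CDFs.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    # Every point where either CDF can change value.
    grid = sorted(set(a_sorted) | set(b_sorted))

    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Fraction of each sample that is <= left; constant on [left, right).
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical values of one morphology feature, one number per cell.
control = [0.0] * 10
# Treated cells split into two subpopulations that respond in opposite directions.
treated = [-2.0] * 5 + [2.0] * 5

mean_shift = abs(mean(treated) - mean(control))
distance = emd_1d(control, treated)
print(mean_shift, distance)
```

In this toy case the mean shift is 0.0, so an average-based comparison sees no effect, while the EMD is 2.0 because every treated cell has moved two units away from the control distribution.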