Tutorial: Merging multiple plates with TableNumber
Goal: combine multiple CellProfiler SQLite exports (plates) into a single Parquet output while preserving plate identity via TableNumber.
What you will accomplish
Point CytoTable at a folder of multiple plate exports.
Add TableNumber so downstream analyses can distinguish rows from different plates.
Verify merged outputs.
Setup (copy-paste)
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install cytotable
Inputs and outputs
Input: A folder of CellProfiler SQLite files (example structure):
data/plates/PlateA.sqlite
data/plates/PlateB.sqlite
Output: Parquet file at ./outputs/multi_plate.parquet, with a Metadata_TableNumber column indicating plate.
Step 1: define your paths
export SOURCE_PATH="./data/plates"
export DEST_PATH="./outputs/multi_plate.parquet"
export CACHE_DIR="./sqlite_cache"
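# Create the output's parent folder (not a directory at the Parquet path itself) and the cache folder.
mkdir -p "$(dirname "$DEST_PATH")" "$CACHE_DIR"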
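Before converting, it can help to confirm that the source folder actually contains the plate files you expect. The following is a minimal sketch using only the Python standard library. The helper names find_plate_files and total_size_bytes are hypothetical (they are not part of CytoTable), and the .sqlite suffix is assumed from the example structure above.

```python
from pathlib import Path


def find_plate_files(source_dir, suffix=".sqlite"):
    """Return sorted plate files in source_dir (hypothetical helper, not a CytoTable API)."""
    root = Path(source_dir)
    # Only the top level of the folder is scanned, matching the example layout.
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == suffix)


def total_size_bytes(paths):
    """Sum file sizes; a rough lower bound for the cache space the plates need."""
    return sum(Path(p).stat().st_size for p in paths)


# Example usage (assumes the folder from Step 1 exists):
# plates = find_plate_files("./data/plates")
# print([p.name for p in plates], total_size_bytes(plates))
```

If the list is empty or shorter than expected, check SOURCE_PATH before running the conversion.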
Step 2: run the conversion with TableNumber
import os

import cytotable

# Read the paths exported in Step 1.
source_path = os.environ["SOURCE_PATH"]
dest_path = os.environ["DEST_PATH"]
cache_dir = os.environ["CACHE_DIR"]

result = cytotable.convert(
    source_path=source_path,
    source_datatype="sqlite",
    dest_path=dest_path,
    dest_datatype="parquet",
    preset="cellprofiler_sqlite",
    local_cache_dir=cache_dir,
    add_tablenumber=True,  # key for multi-plate merges
    chunk_size=30000,
)
print(result)
Why this matters:
add_tablenumber=True adds Metadata_TableNumber so you can filter/group by plate later.
Pointing source_path to a folder makes CytoTable glob multiple plates.
local_cache_dir keeps each plate cached locally for reliable DuckDB access.
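To illustrate the first point, here is a toy sketch in plain Python of why a per-plate identifier matters once rows are merged. The rows, the table-number values (101, 202), and the area field are invented for illustration, and group_rows is a hypothetical helper; the sketch only assumes that separate exports can reuse the same image numbers.

```python
from collections import defaultdict

# Hypothetical merged rows from two plates; both exports number their images from 1.
rows = [
    {"Metadata_TableNumber": 101, "Metadata_ImageNumber": 1, "area": 10.0},
    {"Metadata_TableNumber": 101, "Metadata_ImageNumber": 2, "area": 12.0},
    {"Metadata_TableNumber": 202, "Metadata_ImageNumber": 1, "area": 30.0},
    {"Metadata_TableNumber": 202, "Metadata_ImageNumber": 2, "area": 32.0},
]


def group_rows(rows, keys):
    """Group rows by the given column names and return {key_tuple: [rows]}."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


by_image = group_rows(rows, ["Metadata_ImageNumber"])
by_plate_and_image = group_rows(rows, ["Metadata_TableNumber", "Metadata_ImageNumber"])

print(len(by_image))            # 2 groups: rows from different plates are mixed together
print(len(by_plate_and_image))  # 4 groups: each plate's images stay separate
```

Grouping on the image number alone silently mixes plates, while including the table number keeps them apart.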
Step 3: validate plate separation
You should see one Parquet file (multi_plate.parquet) at DEST_PATH.
Opening the file with pandas or PyArrow should show a Metadata_TableNumber column and a non-zero row count.
If you processed multiple plates, expect multiple distinct values in that column.
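These checks can be scripted. The sketch below assumes PyArrow is installed and that the column is named Metadata_TableNumber as described above. load_tablenumbers and summarize_plates are hypothetical helper names, not CytoTable functions.

```python
from collections import Counter


def load_tablenumbers(parquet_path, column="Metadata_TableNumber"):
    """Read only the table-number column from a Parquet file (requires pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so summarize_plates works without it

    table = pq.read_table(parquet_path, columns=[column])
    return table.column(column).to_pylist()


def summarize_plates(tablenumbers):
    """Return {table_number: row_count}; raise if there are no rows at all."""
    counts = Counter(tablenumbers)
    if not counts:
        raise ValueError("no rows found; the merged output appears to be empty")
    return dict(counts)


# Example usage (assumes the output from Step 2 exists):
# summary = summarize_plates(load_tablenumbers("./outputs/multi_plate.parquet"))
# print(summary)  # expect one key per plate
```

With two plates in the source folder, the summary should contain two keys; a single key suggests only one plate was picked up.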
Scenario callouts (“if your data looks like this…”)
Local SQLite files: set source_path to the folder of local .sqlite files; remove no_sign_request.
Only certain compartments: pass targets=["cells", "nuclei"] to limit tables.
Memory constrained: lower chunk_size (e.g., 10000) and ensure CACHE_DIR is on a disk with enough space for all plates + parquet output.
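One way to keep these variations manageable is to assemble the arguments in one place and adjust only what a scenario changes. This is a sketch, not a CytoTable feature: build_convert_kwargs is a hypothetical helper, and the values mirror the Step 2 call. Whether the preset's joins still apply when targets is restricted is not covered here and should be checked against your data.

```python
def build_convert_kwargs(source_path, dest_path, cache_dir, targets=None, chunk_size=30000):
    """Assemble keyword arguments for cytotable.convert (hypothetical helper)."""
    kwargs = {
        "source_path": source_path,
        "source_datatype": "sqlite",
        "dest_path": dest_path,
        "dest_datatype": "parquet",
        "preset": "cellprofiler_sqlite",
        "local_cache_dir": cache_dir,
        "add_tablenumber": True,
        "chunk_size": chunk_size,
    }
    # Only pass targets when the caller wants to limit compartments.
    if targets is not None:
        kwargs["targets"] = list(targets)
    return kwargs


# Memory-constrained run limited to two compartments.
small_run = build_convert_kwargs(
    "./data/plates",
    "./outputs/multi_plate.parquet",
    "./sqlite_cache",
    targets=["cells", "nuclei"],
    chunk_size=10000,
)
# result = cytotable.convert(**small_run)  # run where CytoTable is installed
```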