Tutorial

This page covers brief tutorials and notes on how to use CytoTable.

CellProfiler CSV Output to Parquet

CellProfiler pipelines or projects may produce various CSV-based compartment outputs (for example, “Cells.csv”, “Cytoplasm.csv”, etc.). CytoTable converts this data to Parquet from either local paths or object-storage locations (such as S3).

By default, files with matching names nested within sub-folders are concatenated (the rows of each file are appended one after another) and used to create a single Parquet file per compartment. For example, given folder/subfolder_a/cells.csv and folder/subfolder_b/cells.csv, calling convert(source_path="folder", ...) produces folder.cells.parquet. Set concat=False to disable this behavior.
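The concatenation step can be illustrated with a small, standard-library-only sketch. This is not CytoTable's implementation: the helpers group_compartment_csvs and concat_rows are hypothetical, treating "matching names" as identical file names ignoring case is an assumption, and nothing here writes Parquet. It only shows how two cells.csv files in different sub-folders end up grouped under a single folder.cells key with their rows appended.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def group_compartment_csvs(source_path: Path) -> dict[str, list[Path]]:
    """Hypothetical helper: group CSVs in sub-folders by file name.

    Assumption: names are matched case-insensitively, and keys follow the
    "<source folder>.<compartment>" pattern, e.g. "folder.cells".
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for csv_file in sorted(source_path.rglob("*.csv")):
        key = f"{source_path.name}.{csv_file.stem.lower()}"
        groups[key].append(csv_file)
    return dict(groups)


def concat_rows(csv_files: list[Path]) -> list[dict[str, str]]:
    """Hypothetical helper: append the rows of each file, in order."""
    rows: list[dict[str, str]] = []
    for csv_file in csv_files:
        with csv_file.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "folder"
    # Build folder/subfolder_a/cells.csv and folder/subfolder_b/cells.csv
    for sub, object_number in [("subfolder_a", "1"), ("subfolder_b", "2")]:
        (folder / sub).mkdir(parents=True)
        (folder / sub / "cells.csv").write_text(
            f"ObjectNumber,AreaShape_Area\n{object_number},100\n"
        )
    groups = group_compartment_csvs(folder)
    rows = concat_rows(groups["folder.cells"])

print(sorted(groups))  # ['folder.cells']
print(len(rows))  # 2
```

Both files collapse into one group, mirroring the single folder.cells.parquet output described above; with concat=False, CytoTable does not combine them this way.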

Note: The dest_path parameter (convert(dest_path="")) is used for intermediary data work and must be a new file or directory path. It results in directory output when join=False and single-file output when join=True.

For example:

from cytotable import convert

# using a local path with cellprofiler csv presets
convert(
    source_path="./tests/data/cellprofiler/ExampleHuman",
    source_datatype="csv",
    dest_path="ExampleHuman.parquet",
    dest_datatype="parquet",
    preset="cellprofiler_csv",
)

# using an S3-compatible source path with unsigned (anonymous)
# client requests and cellprofiler csv presets; dest_path is local
convert(
    source_path="s3://s3path",
    source_datatype="csv",
    dest_path="s3_local_result",
    dest_datatype="parquet",
    concat=True,
    preset="cellprofiler_csv",
    no_sign_request=True,
)
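Because dest_path must be a new file or directory path, it can help to check this before calling convert. The sketch below is a hypothetical pre-flight check, not part of CytoTable's API: ensure_new_dest_path is an illustrative name, and it only inspects local paths (it does not handle S3 destinations).

```python
import tempfile
from pathlib import Path


def ensure_new_dest_path(dest_path: str) -> Path:
    """Hypothetical check: refuse a local dest_path that already exists."""
    path = Path(dest_path)
    if path.exists():
        raise FileExistsError(f"dest_path already exists: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # A path that does not exist yet passes the check.
    fresh = ensure_new_dest_path(f"{tmp}/ExampleHuman.parquet")
    try:
        # The temporary directory itself exists, so it is rejected.
        ensure_new_dest_path(tmp)
        reused_path_rejected = False
    except FileExistsError:
        reused_path_rejected = True

print(fresh.name)  # ExampleHuman.parquet
print(reused_path_rejected)  # True
```

In practice, the returned path would then be passed on, e.g. convert(..., dest_path=str(fresh), ...), so a rerun fails early instead of reusing a previous output location.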