CytoTable mise en place
This notebook includes a quick demonstration of CytoTable to help you understand the basics of using this project.
The name of the notebook comes from the French phrase mise en place:
“Mise en place (French pronunciation: [mi zɑ̃ ˈplas]) is a French culinary phrase which means ‘putting in place’ or ‘gather’. It refers to the setup required before cooking, and is often used in professional kitchens to refer to organizing and arranging the ingredients …”
import pathlib
from collections import Counter
import pyarrow.parquet as pq
import cytotable
# setup variables for use throughout the notebook
source_path = "../../../tests/data/cellprofiler/examplehuman"
dest_path = "./example.parquet"
# remove the dest_path if it's present
if pathlib.Path(dest_path).is_file():
    pathlib.Path(dest_path).unlink()
# show the files we will use as source data with CytoTable
list(pathlib.Path(source_path).glob("*.csv"))
[PosixPath('../../../tests/data/cellprofiler/examplehuman/Experiment.csv'),
PosixPath('../../../tests/data/cellprofiler/examplehuman/PH3.csv'),
PosixPath('../../../tests/data/cellprofiler/examplehuman/Cytoplasm.csv'),
PosixPath('../../../tests/data/cellprofiler/examplehuman/Image.csv'),
PosixPath('../../../tests/data/cellprofiler/examplehuman/Nuclei.csv'),
PosixPath('../../../tests/data/cellprofiler/examplehuman/Cells.csv')]
%%time
# run cytotable convert
result = cytotable.convert(
    source_path=source_path,
    dest_path=dest_path,
    # specify a destination data format type
    dest_datatype="parquet",
    # specify a preset which enables quick use of common input file formats
    preset="cellprofiler_csv",
)
result.name
CPU times: user 327 ms, sys: 201 ms, total: 528 ms
Wall time: 22.4 s
'example.parquet'
# show the table head using pandas
pq.read_table(source=result).to_pandas().head()
| Metadata_ImageNumber | Metadata_Cells_Parent_Nuclei | Metadata_Cytoplasm_Parent_Cells | Metadata_Cytoplasm_Parent_Nuclei | Metadata_ObjectNumber | Image_FileName_DNA | Image_FileName_OrigOverlay | Image_FileName_PH3 | Image_FileName_cellbody | Cytoplasm_AreaShape_Area | ... | Nuclei_Location_Center_X | Nuclei_Location_Center_Y | Nuclei_Location_Center_Z | Nuclei_Location_MaxIntensity_X_DNA | Nuclei_Location_MaxIntensity_X_PH3 | Nuclei_Location_MaxIntensity_Y_DNA | Nuclei_Location_MaxIntensity_Y_PH3 | Nuclei_Location_MaxIntensity_Z_DNA | Nuclei_Location_MaxIntensity_Z_PH3 | Nuclei_Number_Object_Number |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | AS_09125_050116030001_D03f00d0.tif | AS_09125_050116030001_D03f00d0_Overlay.png | AS_09125_050116030001_D03f00d1.tif | AS_09125_050116030001_D03f00d2.tif | 288 | ... | 477.099237 | 7.580153 | 0 | 477.0 | 478.0 | 8.0 | 13.0 | 0.0 | 0.0 | 1 |
1 | 1 | 2 | 2 | 2 | 2 | AS_09125_050116030001_D03f00d0.tif | AS_09125_050116030001_D03f00d0_Overlay.png | AS_09125_050116030001_D03f00d1.tif | AS_09125_050116030001_D03f00d2.tif | 256 | ... | 495.750000 | 11.098684 | 0 | 495.0 | 502.0 | 9.0 | 14.0 | 0.0 | 0.0 | 2 |
2 | 1 | 3 | 3 | 3 | 3 | AS_09125_050116030001_D03f00d0.tif | AS_09125_050116030001_D03f00d0_Overlay.png | AS_09125_050116030001_D03f00d1.tif | AS_09125_050116030001_D03f00d2.tif | 52 | ... | 438.959184 | 11.374150 | 0 | 440.0 | 439.0 | 11.0 | 16.0 | 0.0 | 0.0 | 3 |
3 | 1 | 4 | 4 | 4 | 4 | AS_09125_050116030001_D03f00d0.tif | AS_09125_050116030001_D03f00d0_Overlay.png | AS_09125_050116030001_D03f00d1.tif | AS_09125_050116030001_D03f00d2.tif | 466 | ... | 80.459184 | 11.163265 | 0 | 80.0 | 81.0 | 13.0 | 10.0 | 0.0 | 0.0 | 4 |
4 | 1 | 5 | 5 | 5 | 5 | AS_09125_050116030001_D03f00d0.tif | AS_09125_050116030001_D03f00d0_Overlay.png | AS_09125_050116030001_D03f00d1.tif | AS_09125_050116030001_D03f00d2.tif | 296 | ... | 58.423077 | 15.509615 | 0 | 62.0 | 52.0 | 14.0 | 15.0 | 0.0 | 0.0 | 5 |
5 rows × 312 columns
# show metadata for the result file
pq.read_metadata(result)
<pyarrow._parquet.FileMetaData object at 0x17e23fab0>
created_by: parquet-cpp-arrow version 20.0.0
num_columns: 312
num_rows: 289
num_row_groups: 1
format_version: 2.6
serialized_size: 87762
# show schema metadata which includes CytoTable information
# note: this information will travel with the file.
pq.read_schema(result).metadata
{b'data-producer': b'https://github.com/cytomining/CytoTable',
b'data-producer-version': b'0.0.15.post15.dev0+c2a924c'}
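The schema metadata is returned as a dictionary whose keys and values are byte strings. The following is a minimal sketch, assuming the metadata is UTF-8 encoded, of turning it into ordinary text. The `raw_metadata` literal is copied from the output above, and the variable names are illustrative rather than part of CytoTable.

```python
# Schema metadata as shown in the output above: keys and values are bytes.
raw_metadata = {
    b"data-producer": b"https://github.com/cytomining/CytoTable",
    b"data-producer-version": b"0.0.15.post15.dev0+c2a924c",
}

# Decode both keys and values to str (assumption: UTF-8 encoding).
metadata = {
    key.decode("utf-8"): value.decode("utf-8")
    for key, value in raw_metadata.items()
}

print(metadata["data-producer-version"])
```

In the notebook itself, the same comprehension could be applied to `pq.read_schema(result).metadata` in place of the literal.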
# show schema column name summaries
print("Column name prefix counts:")
dict(Counter(w.split("_", 1)[0] for w in pq.read_schema(result).names))
Column name prefix counts:
{'Metadata': 5, 'Image': 4, 'Cytoplasm': 99, 'Cells': 101, 'Nuclei': 103}
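The prefix count splits each column name at its first underscore and tallies the leading part. As a rough sketch of the same idea, the example below runs it on a small hand-picked sample of column names from this file (the `columns` list is illustrative, not the full schema) and adds a second tally by the first two name parts, which groups columns by compartment and measurement category.

```python
from collections import Counter

# Illustrative sample of column names taken from the schema in this notebook.
columns = [
    "Metadata_ImageNumber",
    "Image_FileName_DNA",
    "Cytoplasm_AreaShape_Area",
    "Cells_AreaShape_Area",
    "Cells_Intensity_MeanIntensity_DNA",
    "Nuclei_AreaShape_Area",
    "Nuclei_Location_Center_X",
]

# Tally by the text before the first underscore (same approach as the cell above).
prefix_counts = dict(Counter(name.split("_", 1)[0] for name in columns))

# Tally by the first two underscore-separated parts, e.g. "Cells_AreaShape".
group_counts = dict(Counter("_".join(name.split("_", 2)[:2]) for name in columns))

print(prefix_counts)
print(group_counts)
```

Passing `pq.read_schema(result).names` instead of the sample list would apply the same tallies to the full set of 312 columns.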
# show full schema details
pq.read_schema(result)
Metadata_ImageNumber: int64
Metadata_Cells_Parent_Nuclei: int64
Metadata_Cytoplasm_Parent_Cells: int64
Metadata_Cytoplasm_Parent_Nuclei: int64
Metadata_ObjectNumber: int64
Image_FileName_DNA: string
Image_FileName_OrigOverlay: string
Image_FileName_PH3: string
Image_FileName_cellbody: string
Cytoplasm_AreaShape_Area: int64
Cytoplasm_AreaShape_BoundingBoxArea: int64
Cytoplasm_AreaShape_BoundingBoxMaximum_X: int64
Cytoplasm_AreaShape_BoundingBoxMaximum_Y: int64
Cytoplasm_AreaShape_BoundingBoxMinimum_X: int64
Cytoplasm_AreaShape_BoundingBoxMinimum_Y: int64
Cytoplasm_AreaShape_Center_X: double
Cytoplasm_AreaShape_Center_Y: double
Cytoplasm_AreaShape_Compactness: double
Cytoplasm_AreaShape_Eccentricity: double
Cytoplasm_AreaShape_EquivalentDiameter: double
Cytoplasm_AreaShape_EulerNumber: int64
Cytoplasm_AreaShape_Extent: double
Cytoplasm_AreaShape_FormFactor: double
Cytoplasm_AreaShape_MajorAxisLength: double
Cytoplasm_AreaShape_MaxFeretDiameter: double
Cytoplasm_AreaShape_MaximumRadius: double
Cytoplasm_AreaShape_MeanRadius: double
Cytoplasm_AreaShape_MedianRadius: double
Cytoplasm_AreaShape_MinFeretDiameter: double
Cytoplasm_AreaShape_MinorAxisLength: double
Cytoplasm_AreaShape_Orientation: double
Cytoplasm_AreaShape_Perimeter: double
Cytoplasm_AreaShape_Solidity: double
Cytoplasm_AreaShape_Zernike_0_0: double
Cytoplasm_AreaShape_Zernike_1_1: double
Cytoplasm_AreaShape_Zernike_2_0: double
Cytoplasm_AreaShape_Zernike_2_2: double
Cytoplasm_AreaShape_Zernike_3_1: double
Cytoplasm_AreaShape_Zernike_3_3: double
Cytoplasm_AreaShape_Zernike_4_0: double
Cytoplasm_AreaShape_Zernike_4_2: double
Cytoplasm_AreaShape_Zernike_4_4: double
Cytoplasm_AreaShape_Zernike_5_1: double
Cytoplasm_AreaShape_Zernike_5_3: double
Cytoplasm_AreaShape_Zernike_5_5: double
Cytoplasm_AreaShape_Zernike_6_0: double
Cytoplasm_AreaShape_Zernike_6_2: double
Cytoplasm_AreaShape_Zernike_6_4: double
Cytoplasm_AreaShape_Zernike_6_6: double
Cytoplasm_AreaShape_Zernike_7_1: double
Cytoplasm_AreaShape_Zernike_7_3: double
Cytoplasm_AreaShape_Zernike_7_5: double
Cytoplasm_AreaShape_Zernike_7_7: double
Cytoplasm_AreaShape_Zernike_8_0: double
Cytoplasm_AreaShape_Zernike_8_2: double
Cytoplasm_AreaShape_Zernike_8_4: double
Cytoplasm_AreaShape_Zernike_8_6: double
Cytoplasm_AreaShape_Zernike_8_8: double
Cytoplasm_AreaShape_Zernike_9_1: double
Cytoplasm_AreaShape_Zernike_9_3: double
Cytoplasm_AreaShape_Zernike_9_5: double
Cytoplasm_AreaShape_Zernike_9_7: double
Cytoplasm_AreaShape_Zernike_9_9: double
Cytoplasm_Intensity_IntegratedIntensityEdge_DNA: double
Cytoplasm_Intensity_IntegratedIntensityEdge_PH3: double
Cytoplasm_Intensity_IntegratedIntensity_DNA: double
Cytoplasm_Intensity_IntegratedIntensity_PH3: double
Cytoplasm_Intensity_LowerQuartileIntensity_DNA: double
Cytoplasm_Intensity_LowerQuartileIntensity_PH3: double
Cytoplasm_Intensity_MADIntensity_DNA: double
Cytoplasm_Intensity_MADIntensity_PH3: double
Cytoplasm_Intensity_MassDisplacement_DNA: double
Cytoplasm_Intensity_MassDisplacement_PH3: double
Cytoplasm_Intensity_MaxIntensityEdge_DNA: double
Cytoplasm_Intensity_MaxIntensityEdge_PH3: double
Cytoplasm_Intensity_MaxIntensity_DNA: double
Cytoplasm_Intensity_MaxIntensity_PH3: double
Cytoplasm_Intensity_MeanIntensityEdge_DNA: double
Cytoplasm_Intensity_MeanIntensityEdge_PH3: double
Cytoplasm_Intensity_MeanIntensity_DNA: double
Cytoplasm_Intensity_MeanIntensity_PH3: double
Cytoplasm_Intensity_MedianIntensity_DNA: double
Cytoplasm_Intensity_MedianIntensity_PH3: double
Cytoplasm_Intensity_MinIntensityEdge_DNA: double
Cytoplasm_Intensity_MinIntensityEdge_PH3: double
Cytoplasm_Intensity_MinIntensity_DNA: double
Cytoplasm_Intensity_MinIntensity_PH3: double
Cytoplasm_Intensity_StdIntensityEdge_DNA: double
Cytoplasm_Intensity_StdIntensityEdge_PH3: double
Cytoplasm_Intensity_StdIntensity_DNA: double
Cytoplasm_Intensity_StdIntensity_PH3: double
Cytoplasm_Intensity_UpperQuartileIntensity_DNA: double
Cytoplasm_Intensity_UpperQuartileIntensity_PH3: double
Cytoplasm_Location_CenterMassIntensity_X_DNA: double
Cytoplasm_Location_CenterMassIntensity_X_PH3: double
Cytoplasm_Location_CenterMassIntensity_Y_DNA: double
Cytoplasm_Location_CenterMassIntensity_Y_PH3: double
Cytoplasm_Location_CenterMassIntensity_Z_DNA: double
Cytoplasm_Location_CenterMassIntensity_Z_PH3: double
Cytoplasm_Location_Center_X: double
Cytoplasm_Location_Center_Y: double
Cytoplasm_Location_MaxIntensity_X_DNA: double
Cytoplasm_Location_MaxIntensity_X_PH3: double
Cytoplasm_Location_MaxIntensity_Y_DNA: double
Cytoplasm_Location_MaxIntensity_Y_PH3: double
Cytoplasm_Location_MaxIntensity_Z_DNA: double
Cytoplasm_Location_MaxIntensity_Z_PH3: double
Cytoplasm_Number_Object_Number: int64
Cells_AreaShape_Area: int64
Cells_AreaShape_BoundingBoxArea: int64
Cells_AreaShape_BoundingBoxMaximum_X: int64
Cells_AreaShape_BoundingBoxMaximum_Y: int64
Cells_AreaShape_BoundingBoxMinimum_X: int64
Cells_AreaShape_BoundingBoxMinimum_Y: int64
Cells_AreaShape_Center_X: double
Cells_AreaShape_Center_Y: double
Cells_AreaShape_Compactness: double
Cells_AreaShape_Eccentricity: double
Cells_AreaShape_EquivalentDiameter: double
Cells_AreaShape_EulerNumber: int64
Cells_AreaShape_Extent: double
Cells_AreaShape_FormFactor: double
Cells_AreaShape_MajorAxisLength: double
Cells_AreaShape_MaxFeretDiameter: double
Cells_AreaShape_MaximumRadius: double
Cells_AreaShape_MeanRadius: double
Cells_AreaShape_MedianRadius: double
Cells_AreaShape_MinFeretDiameter: double
Cells_AreaShape_MinorAxisLength: double
Cells_AreaShape_Orientation: double
Cells_AreaShape_Perimeter: double
Cells_AreaShape_Solidity: double
Cells_AreaShape_Zernike_0_0: double
Cells_AreaShape_Zernike_1_1: double
Cells_AreaShape_Zernike_2_0: double
Cells_AreaShape_Zernike_2_2: double
Cells_AreaShape_Zernike_3_1: double
Cells_AreaShape_Zernike_3_3: double
Cells_AreaShape_Zernike_4_0: double
Cells_AreaShape_Zernike_4_2: double
Cells_AreaShape_Zernike_4_4: double
Cells_AreaShape_Zernike_5_1: double
Cells_AreaShape_Zernike_5_3: double
Cells_AreaShape_Zernike_5_5: double
Cells_AreaShape_Zernike_6_0: double
Cells_AreaShape_Zernike_6_2: double
Cells_AreaShape_Zernike_6_4: double
Cells_AreaShape_Zernike_6_6: double
Cells_AreaShape_Zernike_7_1: double
Cells_AreaShape_Zernike_7_3: double
Cells_AreaShape_Zernike_7_5: double
Cells_AreaShape_Zernike_7_7: double
Cells_AreaShape_Zernike_8_0: double
Cells_AreaShape_Zernike_8_2: double
Cells_AreaShape_Zernike_8_4: double
Cells_AreaShape_Zernike_8_6: double
Cells_AreaShape_Zernike_8_8: double
Cells_AreaShape_Zernike_9_1: double
Cells_AreaShape_Zernike_9_3: double
Cells_AreaShape_Zernike_9_5: double
Cells_AreaShape_Zernike_9_7: double
Cells_AreaShape_Zernike_9_9: double
Cells_Children_Cytoplasm_Count: int64
Cells_Intensity_IntegratedIntensityEdge_DNA: double
Cells_Intensity_IntegratedIntensityEdge_PH3: double
Cells_Intensity_IntegratedIntensity_DNA: double
Cells_Intensity_IntegratedIntensity_PH3: double
Cells_Intensity_LowerQuartileIntensity_DNA: double
Cells_Intensity_LowerQuartileIntensity_PH3: double
Cells_Intensity_MADIntensity_DNA: double
Cells_Intensity_MADIntensity_PH3: double
Cells_Intensity_MassDisplacement_DNA: double
Cells_Intensity_MassDisplacement_PH3: double
Cells_Intensity_MaxIntensityEdge_DNA: double
Cells_Intensity_MaxIntensityEdge_PH3: double
Cells_Intensity_MaxIntensity_DNA: double
Cells_Intensity_MaxIntensity_PH3: double
Cells_Intensity_MeanIntensityEdge_DNA: double
Cells_Intensity_MeanIntensityEdge_PH3: double
Cells_Intensity_MeanIntensity_DNA: double
Cells_Intensity_MeanIntensity_PH3: double
Cells_Intensity_MedianIntensity_DNA: double
Cells_Intensity_MedianIntensity_PH3: double
Cells_Intensity_MinIntensityEdge_DNA: double
Cells_Intensity_MinIntensityEdge_PH3: double
Cells_Intensity_MinIntensity_DNA: double
Cells_Intensity_MinIntensity_PH3: double
Cells_Intensity_StdIntensityEdge_DNA: double
Cells_Intensity_StdIntensityEdge_PH3: double
Cells_Intensity_StdIntensity_DNA: double
Cells_Intensity_StdIntensity_PH3: double
Cells_Intensity_UpperQuartileIntensity_DNA: double
Cells_Intensity_UpperQuartileIntensity_PH3: double
Cells_Location_CenterMassIntensity_X_DNA: double
Cells_Location_CenterMassIntensity_X_PH3: double
Cells_Location_CenterMassIntensity_Y_DNA: double
Cells_Location_CenterMassIntensity_Y_PH3: double
Cells_Location_CenterMassIntensity_Z_DNA: double
Cells_Location_CenterMassIntensity_Z_PH3: double
Cells_Location_Center_X: double
Cells_Location_Center_Y: double
Cells_Location_Center_Z: int64
Cells_Location_MaxIntensity_X_DNA: double
Cells_Location_MaxIntensity_X_PH3: double
Cells_Location_MaxIntensity_Y_DNA: double
Cells_Location_MaxIntensity_Y_PH3: double
Cells_Location_MaxIntensity_Z_DNA: double
Cells_Location_MaxIntensity_Z_PH3: double
Cells_Number_Object_Number: int64
Nuclei_AreaShape_Area: int64
Nuclei_AreaShape_BoundingBoxArea: int64
Nuclei_AreaShape_BoundingBoxMaximum_X: int64
Nuclei_AreaShape_BoundingBoxMaximum_Y: int64
Nuclei_AreaShape_BoundingBoxMinimum_X: int64
Nuclei_AreaShape_BoundingBoxMinimum_Y: int64
Nuclei_AreaShape_Center_X: double
Nuclei_AreaShape_Center_Y: double
Nuclei_AreaShape_Compactness: double
Nuclei_AreaShape_Eccentricity: double
Nuclei_AreaShape_EquivalentDiameter: double
Nuclei_AreaShape_EulerNumber: int64
Nuclei_AreaShape_Extent: double
Nuclei_AreaShape_FormFactor: double
Nuclei_AreaShape_MajorAxisLength: double
Nuclei_AreaShape_MaxFeretDiameter: double
Nuclei_AreaShape_MaximumRadius: double
Nuclei_AreaShape_MeanRadius: double
Nuclei_AreaShape_MedianRadius: double
Nuclei_AreaShape_MinFeretDiameter: double
Nuclei_AreaShape_MinorAxisLength: double
Nuclei_AreaShape_Orientation: double
Nuclei_AreaShape_Perimeter: double
Nuclei_AreaShape_Solidity: double
Nuclei_AreaShape_Zernike_0_0: double
Nuclei_AreaShape_Zernike_1_1: double
Nuclei_AreaShape_Zernike_2_0: double
Nuclei_AreaShape_Zernike_2_2: double
Nuclei_AreaShape_Zernike_3_1: double
Nuclei_AreaShape_Zernike_3_3: double
Nuclei_AreaShape_Zernike_4_0: double
Nuclei_AreaShape_Zernike_4_2: double
Nuclei_AreaShape_Zernike_4_4: double
Nuclei_AreaShape_Zernike_5_1: double
Nuclei_AreaShape_Zernike_5_3: double
Nuclei_AreaShape_Zernike_5_5: double
Nuclei_AreaShape_Zernike_6_0: double
Nuclei_AreaShape_Zernike_6_2: double
Nuclei_AreaShape_Zernike_6_4: double
Nuclei_AreaShape_Zernike_6_6: double
Nuclei_AreaShape_Zernike_7_1: double
Nuclei_AreaShape_Zernike_7_3: double
Nuclei_AreaShape_Zernike_7_5: double
Nuclei_AreaShape_Zernike_7_7: double
Nuclei_AreaShape_Zernike_8_0: double
Nuclei_AreaShape_Zernike_8_2: double
Nuclei_AreaShape_Zernike_8_4: double
Nuclei_AreaShape_Zernike_8_6: double
Nuclei_AreaShape_Zernike_8_8: double
Nuclei_AreaShape_Zernike_9_1: double
Nuclei_AreaShape_Zernike_9_3: double
Nuclei_AreaShape_Zernike_9_5: double
Nuclei_AreaShape_Zernike_9_7: double
Nuclei_AreaShape_Zernike_9_9: double
Nuclei_Children_Cells_Count: int64
Nuclei_Children_Cytoplasm_Count: int64
Nuclei_Children_PH3_Count: int64
Nuclei_Intensity_IntegratedIntensityEdge_DNA: double
Nuclei_Intensity_IntegratedIntensityEdge_PH3: double
Nuclei_Intensity_IntegratedIntensity_DNA: double
Nuclei_Intensity_IntegratedIntensity_PH3: double
Nuclei_Intensity_LowerQuartileIntensity_DNA: double
Nuclei_Intensity_LowerQuartileIntensity_PH3: double
Nuclei_Intensity_MADIntensity_DNA: double
Nuclei_Intensity_MADIntensity_PH3: double
Nuclei_Intensity_MassDisplacement_DNA: double
Nuclei_Intensity_MassDisplacement_PH3: double
Nuclei_Intensity_MaxIntensityEdge_DNA: double
Nuclei_Intensity_MaxIntensityEdge_PH3: double
Nuclei_Intensity_MaxIntensity_DNA: double
Nuclei_Intensity_MaxIntensity_PH3: double
Nuclei_Intensity_MeanIntensityEdge_DNA: double
Nuclei_Intensity_MeanIntensityEdge_PH3: double
Nuclei_Intensity_MeanIntensity_DNA: double
Nuclei_Intensity_MeanIntensity_PH3: double
Nuclei_Intensity_MedianIntensity_DNA: double
Nuclei_Intensity_MedianIntensity_PH3: double
Nuclei_Intensity_MinIntensityEdge_DNA: double
Nuclei_Intensity_MinIntensityEdge_PH3: double
Nuclei_Intensity_MinIntensity_DNA: double
Nuclei_Intensity_MinIntensity_PH3: double
Nuclei_Intensity_StdIntensityEdge_DNA: double
Nuclei_Intensity_StdIntensityEdge_PH3: double
Nuclei_Intensity_StdIntensity_DNA: double
Nuclei_Intensity_StdIntensity_PH3: double
Nuclei_Intensity_UpperQuartileIntensity_DNA: double
Nuclei_Intensity_UpperQuartileIntensity_PH3: double
Nuclei_Location_CenterMassIntensity_X_DNA: double
Nuclei_Location_CenterMassIntensity_X_PH3: double
Nuclei_Location_CenterMassIntensity_Y_DNA: double
Nuclei_Location_CenterMassIntensity_Y_PH3: double
Nuclei_Location_CenterMassIntensity_Z_DNA: double
Nuclei_Location_CenterMassIntensity_Z_PH3: double
Nuclei_Location_Center_X: double
Nuclei_Location_Center_Y: double
Nuclei_Location_Center_Z: int64
Nuclei_Location_MaxIntensity_X_DNA: double
Nuclei_Location_MaxIntensity_X_PH3: double
Nuclei_Location_MaxIntensity_Y_DNA: double
Nuclei_Location_MaxIntensity_Y_PH3: double
Nuclei_Location_MaxIntensity_Z_DNA: double
Nuclei_Location_MaxIntensity_Z_PH3: double
Nuclei_Number_Object_Number: int64
-- schema metadata --
data-producer: 'https://github.com/cytomining/CytoTable'
data-producer-version: '0.0.15.post15.dev0+c2a924c'