# Tutorial: NPZ embeddings + metadata to Parquet

A start-to-finish walkthrough for converting NPZ feature files (for example, DeepProfiler outputs) and their metadata into Parquet.
It uses a small example dataset bundled in the repo.

## What you will accomplish

- Read NPZ feature files and matching metadata from disk.
- Combine them into Parquet with a preset that aligns common keys.
- Validate the output shape and schema.

```{admonition} If your data looks like this, change...
- NPZ in a different folder: point `source_path` there; keep `preset="deepprofiler"`.
- Memory-constrained: add `chunk_size=10000` to the convert call.
- `.npy` files or plain CSV feature tables: this tutorial/preset does not cover them; use the CellProfiler CSV/SQLite flows instead.
```
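As a minimal sketch of the memory-constrained case, the helper below builds the same keyword arguments used later in this tutorial and adds `chunk_size` only when requested. The names `build_convert_kwargs` and `run_conversion` are hypothetical helpers for illustration, not part of CytoTable.

```python
from typing import Any, Dict, Optional


def build_convert_kwargs(
    source_path: str, dest_path: str, chunk_size: Optional[int] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for the conversion call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "source_path": source_path,
        "source_datatype": "npz",
        "dest_path": dest_path,
        "dest_datatype": "parquet",
        "preset": "deepprofiler",
        "concat": True,
        "join": False,
    }
    # Only pass chunk_size when the caller asked for it, so the
    # library default applies otherwise.
    if chunk_size is not None:
        kwargs["chunk_size"] = chunk_size
    return kwargs


def run_conversion(source_path: str, dest_path: str, chunk_size: Optional[int] = None):
    """Run the conversion; cytotable is imported lazily so the helper above stays usable without it."""
    import cytotable

    return cytotable.convert(**build_convert_kwargs(source_path, dest_path, chunk_size))
```

For example, `run_conversion("path/to/npz", "./outputs/example.parquet", chunk_size=10000)` would apply the memory-constrained setting from the list above.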

## Setup (copy-paste)

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install cytotable
```

## Inputs and outputs

- **Input:** Example NPZ + metadata in this repo: `tests/data/deepprofiler/pycytominer_example`. The path is relative to the repository root, so run the commands from a clone of the repo.
- **Output:** A Parquet file at `./outputs/deepprofiler_example.parquet`

## Step 1: define your paths

```bash
export SOURCE_PATH="tests/data/deepprofiler/pycytominer_example"
export DEST_PATH="./outputs/deepprofiler_example.parquet"
# Create only the parent folder; the conversion writes the Parquet output itself.
mkdir -p "$(dirname "$DEST_PATH")"
```
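If you prefer to prepare paths from Python, a similar setup can be sketched with `pathlib`. This version also checks that the source folder exists and contains `.npz` files before anything is converted. `prepare_paths` is a hypothetical helper name, and the recursive `*.npz` search is an assumption about how the example folder is laid out.

```python
from pathlib import Path
from typing import List, Tuple


def prepare_paths(source: str, dest: str) -> Tuple[Path, Path, List[Path]]:
    """Resolve the tutorial paths and make sure the output folder exists."""
    source_path = Path(source)
    dest_path = Path(dest)

    # Fail early with a clear message if the input folder is missing.
    if not source_path.is_dir():
        raise FileNotFoundError(f"Source folder not found: {source_path}")

    # Collect NPZ files anywhere under the source folder (assumed layout).
    npz_files = sorted(source_path.rglob("*.npz"))
    if not npz_files:
        raise FileNotFoundError(f"No .npz files found under: {source_path}")

    # Create the parent output folder; the Parquet output itself is
    # left for the conversion step to write.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    return source_path, dest_path, npz_files
```

Calling `prepare_paths(os.environ["SOURCE_PATH"], os.environ["DEST_PATH"])` after the exports above returns the two paths plus the list of NPZ files found.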

## Step 2: run the conversion

```python
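import os
import cytotable

# Read the paths exported in Step 1.
source_path = os.environ["SOURCE_PATH"]
dest_path = os.environ["DEST_PATH"]

result = cytotable.convert(
    source_path=source_path,
    source_datatype="npz",
    dest_path=dest_path,
    dest_datatype="parquet",
    preset="deepprofiler",  # align NPZ feature arrays with metadata columns
    concat=True,  # merge multiple NPZ shards
    join=False,  # write per-table Parquet output
)

# Show what the conversion returned.
print(result)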
```

Notes (why these flags matter):

- `preset="deepprofiler"` aligns NPZ feature arrays with metadata columns.
- `concat=True` merges multiple NPZ shards.
- `join=False` writes per-table Parquet files (the preset produces `all_files.npz` as the logical table).
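The printed `result` can be hard to scan, so here is a small sketch for pulling Parquet paths out of it. This tutorial does not spell out the exact shape of the return value, so the hypothetical helper `collect_parquet_paths` assumes only that it is some nesting of dicts, lists, and path-like strings.

```python
import os
from typing import Any, List


def collect_parquet_paths(result: Any) -> List[str]:
    """Walk a nested result and return every path that ends in .parquet."""
    found: List[str] = []
    if isinstance(result, (str, os.PathLike)):
        # Normalize str and pathlib.Path values to plain strings.
        text = str(os.fspath(result))
        if text.endswith(".parquet"):
            found.append(text)
    elif isinstance(result, dict):
        for value in result.values():
            found.extend(collect_parquet_paths(value))
    elif isinstance(result, (list, tuple)):
        for item in result:
            found.extend(collect_parquet_paths(item))
    # Any other type (None, numbers, ...) carries no paths and is skipped.
    return found
```

After the conversion, `collect_parquet_paths(result)` gives a flat list of output locations that you can pass to the validation step.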

## Step 3: validate the output

You should see `deepprofiler_example.parquet` at the location stored in `DEST_PATH`.
Opening it with pandas or PyArrow should show a non-zero row count and both feature (`efficientnet_*`) and metadata columns.
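These checks can be scripted. The sketch below separates the column logic (`check_columns`) from the file reading (`validate_parquet`, which uses PyArrow). Both are hypothetical helper names. The metadata check assumes the column names contain "plate", "well", and "site" in some capitalization; adjust the hints if your metadata is named differently.

```python
from typing import Iterable, List, Sequence


def check_columns(
    column_names: Iterable[str],
    num_rows: int,
    feature_prefix: str = "efficientnet_",
    metadata_hints: Sequence[str] = ("plate", "well", "site"),
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    names = list(column_names)
    problems: List[str] = []

    if num_rows <= 0:
        problems.append("table has no rows")

    if not any(name.startswith(feature_prefix) for name in names):
        problems.append(f"no feature columns starting with '{feature_prefix}'")

    # Case-insensitive substring match, since exact metadata names are assumed.
    lowered = [name.lower() for name in names]
    for hint in metadata_hints:
        if not any(hint in name for name in lowered):
            problems.append(f"no metadata column mentioning '{hint}'")

    return problems


def validate_parquet(path: str) -> List[str]:
    """Read a Parquet file with PyArrow and run the column checks on it."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    table = pq.read_table(path)
    return check_columns(table.schema.names, table.num_rows)
```

Running `validate_parquet(os.environ["DEST_PATH"])` and getting an empty list back means the output matches the expectations described above.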

## What success looks like

- A Parquet file `deepprofiler_example.parquet` exists at the location stored in `DEST_PATH`.
- DuckDB or pandas can read the file, and the row count is non-zero.
- Feature columns (for example, `efficientnet_*`) and metadata columns (plate/well/site) both appear.