Package Overview
legend-dataflow-scripts provides the Python scripts and library utilities that power the LEGEND-200 data production pipeline. The package is designed to calibrate and optimise hundreds of HPGe detector channels in parallel and then merge the results before building the final analysis-ready data tiers.
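The calibrate-per-channel-then-merge pattern described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: `calibrate_channel`, `merge_results` and `calibrate_all` are hypothetical names, the returned parameters are placeholders, and a thread pool stands in for whatever mechanism the production workflow uses to fan out jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def calibrate_channel(channel: str) -> dict:
    """Hypothetical per-channel step: return that channel's parameters."""
    # Placeholder result; a real step would fit calibration parameters here.
    return {channel: {"status": "calibrated"}}


def merge_results(per_channel: list[dict]) -> dict:
    """Combine the per-channel dictionaries into a single mapping."""
    merged: dict = {}
    for result in per_channel:
        merged.update(result)
    return merged


def calibrate_all(channels: list[str], max_workers: int = 4) -> dict:
    """Run the per-channel step concurrently, then merge the outputs."""
    # Channels are independent of each other, so the calls can overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_channel = list(pool.map(calibrate_channel, channels))
    return merge_results(per_channel)
```

Because each channel is processed independently, the merge step is the only point where results from different channels meet, which keeps the per-channel work easy to distribute.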
Note
The package is intended to be run by the legend-dataflow workflow. The CLI entry points can also be invoked directly, which is useful for development and testing.
Architecture
Data are processed in sequential tiers, each represented by an LH5 (HDF5-based) file. The pipeline is:
Raw detector data (raw tier)
           │
           ▼
┌───────────────────────────┐
│ DSP parameter optimisation│
│ (par/geds/dsp/)           │
│  - PZ correction          │
│  - Noise optimisation     │
│  - Energy optimisation    │
│  - Event selection        │
│  - DPLMS filter           │
│  - SVM classifier         │
└──────────┬────────────────┘
           │ par_dsp.yaml
           ▼
┌───────────────────────────┐
│ build-tier-dsp            │ (tier/dsp.py)
│ dspeed processing         │
└──────────┬────────────────┘
           │ dsp LH5 file
           ▼
┌───────────────────────────┐
│ HIT parameter optimisation│
│ (par/geds/hit/)           │
│  - Quality cuts           │
│  - Energy calibration     │
│  - A/E calibration        │
│  - LQ calibration         │
└──────────┬────────────────┘
           │ par_hit.yaml
           ▼
┌───────────────────────────┐
│ build-tier-hit            │ (tier/hit.py)
│ pygama hit builder        │
└──────────┬────────────────┘
           │ hit LH5 file
           ▼
Physics analysis
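The ordering constraint in the diagram can be made explicit with a small data model. The sketch below is illustrative only: the stage and artifact names are taken from the diagram, the inputs listed for each stage are an interpretation of it, and `Stage`, `PIPELINE` and `check_order` are hypothetical names that do not exist in the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: tuple[str, ...]
    produces: str


# Names follow the diagram; the data model itself is an assumption.
PIPELINE = [
    Stage("DSP parameter optimisation", ("raw",), "par_dsp.yaml"),
    Stage("build-tier-dsp", ("raw", "par_dsp.yaml"), "dsp"),
    Stage("HIT parameter optimisation", ("dsp",), "par_hit.yaml"),
    Stage("build-tier-hit", ("dsp", "par_hit.yaml"), "hit"),
]


def check_order(stages, initial=("raw",)):
    """Return the artifacts available after running the stages in order.

    Raises ValueError if a stage needs an artifact not yet produced.
    """
    available = set(initial)
    for stage in stages:
        missing = [a for a in stage.consumes if a not in available]
        if missing:
            raise ValueError(f"{stage.name} is missing inputs: {missing}")
        available.add(stage.produces)
    return available
```

Calling `check_order(PIPELINE)` succeeds, while running the same stages in reverse raises a `ValueError`, because each tier builder needs the parameter file produced by the optimisation step before it.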
Module Reference
The package is organised into four sub-packages:
legenddataflowscripts.tier
Data-tier building scripts. Each function is a self-contained CLI entry point that reads one tier of LH5 data and writes the next.
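As a rough picture of what "a self-contained CLI entry point" means, the sketch below shows the general shape such a function could take. It does not reproduce the package's real command-line interface: the function name `build_tier_example` and the options `--input`, `--pars` and `--output` are hypothetical, and the processing step is left as a comment.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # All option names here are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(
        description="Build one tier from the previous one (sketch)."
    )
    parser.add_argument("--input", type=Path, required=True,
                        help="LH5 file of the input tier")
    parser.add_argument("--pars", type=Path, required=True,
                        help="parameter file for this tier")
    parser.add_argument("--output", type=Path, required=True,
                        help="LH5 file to write")
    return parser


def build_tier_example(argv: list[str] | None = None) -> dict:
    """Parse the arguments and return the resolved job description."""
    args = build_parser().parse_args(argv)
    # A real entry point would run the processing here and write args.output.
    return {"input": args.input, "pars": args.pars, "output": args.output}
```

Accepting an explicit `argv` list keeps the function callable both from the shell, where `argv` defaults to `None` and `sys.argv` is used, and from a test or workflow rule that passes the arguments directly.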
legenddataflowscripts.par
Calibration and parameter-optimisation scripts, organised by detector type
(currently only geds for germanium detectors) and tier.
DSP parameter optimisation (par.geds.dsp):
HIT parameter optimisation (par.geds.hit):
legenddataflowscripts.utils
Shared utility functions used across the calibration and tier-building scripts.
legenddataflowscripts.workflow
Workflow infrastructure: execution environment management, file-database construction, configuration variable substitution, and catalog pre-compilation.
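Configuration variable substitution, one of the tasks listed above, can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the package's implementation: the `$name` placeholder syntax, the `substitute` function, and the example keys and paths are all hypothetical.

```python
from string import Template


def substitute(config, variables):
    """Recursively expand $name placeholders in every string of a nested config."""
    if isinstance(config, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(config).safe_substitute(variables)
    if isinstance(config, dict):
        return {key: substitute(value, variables) for key, value in config.items()}
    if isinstance(config, list):
        return [substitute(item, variables) for item in config]
    # Numbers, booleans and None pass through unchanged.
    return config


# Hypothetical configuration; keys and paths are made up for the example.
config = {
    "paths": {"tier_dsp": "$base/generated/tier/dsp", "log": "$base/log"},
    "threads": 4,
    "tiers": ["$first_tier", "dsp"],
}
resolved = substitute(config, {"base": "/data/prod", "first_tier": "raw"})
```

Walking the structure recursively means a single variable such as `base` can be defined once and reused in every path, which is the usual motivation for this kind of substitution.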
Key Dependencies
| Package | Role |
|---|---|
| dspeed | Digital signal processing engine used by the DSP tier builder. |
| pygama | Calibration algorithms (energy cal, A/E, LQ, noise optimisation, etc.) and the HIT tier builder. |
| legend-pydataobj | LEGEND Data Object format and LH5 file I/O. |
| dbetto | Configuration file handling (JSON/YAML, TextDB, validity catalogs). |
| pylegendmeta | LEGEND metadata access. |
| scikit-learn | SVM classifier training and inference. |
| Snakemake | Workflow scheduler that orchestrates all processing rules. |