Package Overview

legend-dataflow-scripts provides the Python scripts and library utilities that power the LEGEND-200 data production pipeline. It calibrates and optimises hundreds of HPGe detector channels in parallel, merges the per-channel results, and then builds the final analysis-ready data tiers.

Note

The package is intended to be run by legend-dataflow. The CLI entry points can also be invoked directly for development and testing.
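
The calibrate-in-parallel-then-merge pattern described in the overview can be illustrated with a minimal sketch. Everything here is hypothetical: calibrate_channel, merge_results, run_all and the channel names are placeholders, not functions of this package. In production the scheduling is done by the workflow manager, so the thread pool only stands in for that role.

```python
from concurrent.futures import ThreadPoolExecutor


def calibrate_channel(channel):
    """Stand-in for one per-channel optimisation job (hypothetical)."""
    # A real job would read LH5 data and fit parameters; this placeholder
    # only returns a small result dictionary keyed by the channel name.
    return {channel: {"status": "calibrated"}}


def merge_results(results):
    """Merge the per-channel dictionaries into a single parameter set."""
    merged = {}
    for result in results:
        merged.update(result)
    return merged


def run_all(channels, max_workers=4):
    """Fan out one job per channel, then merge once all have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the channels
        results = list(pool.map(calibrate_channel, channels))
    return merge_results(results)
```

Calling run_all(["ch001", "ch002"]) returns one dictionary with an entry per channel, which is the shape a downstream tier-building step would consume in this simplified picture.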

Architecture

Data are processed in sequential tiers, each represented by an LH5 (HDF5-based) file. The pipeline is:

Raw detector data (raw tier)
        │
        ▼
┌───────────────────────────┐
│  DSP parameter optimisation│
│  (par/geds/dsp/)           │
│  - PZ correction           │
│  - Noise optimisation      │
│  - Energy optimisation     │
│  - Event selection         │
│  - DPLMS filter            │
│  - SVM classifier          │
└──────────┬────────────────┘
           │  par_dsp.yaml
           ▼
┌───────────────────────────┐
│    build-tier-dsp          │  (tier/dsp.py)
│    dspeed processing       │
└──────────┬────────────────┘
           │  dsp LH5 file
           ▼
┌───────────────────────────┐
│  HIT parameter optimisation│
│  (par/geds/hit/)           │
│  - Quality cuts            │
│  - Energy calibration      │
│  - A/E calibration         │
│  - LQ calibration          │
└──────────┬────────────────┘
           │  par_hit.yaml
           ▼
┌───────────────────────────┐
│    build-tier-hit          │  (tier/hit.py)
│    pygama hit builder      │
└──────────┬────────────────┘
           │  hit LH5 file
           ▼
      Physics analysis
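
As a reading aid, the stage order in the diagram can be written down as plain data. This is only a sketch: the Stage class and the describe function are hypothetical, and each stage is shown consuming just the artefact on the arrow above it, which simplifies the real inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # step label as shown in the diagram
    location: str  # module path as shown in the diagram
    output: str    # artefact handed to the next stage


# Stage order copied from the diagram above
PIPELINE = (
    Stage("DSP parameter optimisation", "par/geds/dsp/", "par_dsp.yaml"),
    Stage("build-tier-dsp", "tier/dsp.py", "dsp LH5 file"),
    Stage("HIT parameter optimisation", "par/geds/hit/", "par_hit.yaml"),
    Stage("build-tier-hit", "tier/hit.py", "hit LH5 file"),
)


def describe(pipeline=PIPELINE, source="raw tier"):
    """Return one line per stage showing what it receives and produces."""
    lines = []
    upstream = source
    for stage in pipeline:
        lines.append(f"{stage.name} ({stage.location}): {upstream} -> {stage.output}")
        upstream = stage.output
    return lines
```

Printing the lines returned by describe() gives a compact textual version of the flow from the raw tier to the hit LH5 file.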

Module Reference

The package is organised into four sub-packages:

legenddataflowscripts.tier

Data-tier building scripts. Each function is a self-contained CLI entry point that reads one tier of LH5 data and writes the next.
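
The general shape of such an entry point, one function that parses its own arguments, reads one file and writes the next, might look like the sketch below. The --input and --output options and the build_tier helper are assumptions made for illustration; they are not the actual options or functions of the scripts in this package.

```python
import argparse
from pathlib import Path


def build_tier(input_file, output_file):
    """Placeholder for the tier-building step (hypothetical)."""
    # A real builder would process LH5 tables; this sketch only copies
    # the bytes so that the read-one-file, write-the-next flow is visible.
    output_file.write_bytes(input_file.read_bytes())


def main(argv=None):
    """Self-contained entry point: parse arguments, then build the tier."""
    parser = argparse.ArgumentParser(
        description="Build one data tier from the previous one (sketch)"
    )
    parser.add_argument("--input", type=Path, required=True)   # hypothetical option
    parser.add_argument("--output", type=Path, required=True)  # hypothetical option
    args = parser.parse_args(argv)
    build_tier(args.input, args.output)
    return 0
```

Accepting argv as a parameter keeps the function callable both from a console-script wrapper and from tests, which suits the development and testing use mentioned earlier.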

legenddataflowscripts.par

Calibration and parameter-optimisation scripts, organised by detector type (currently only geds for germanium detectors) and tier.

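DSP parameter optimisation (par.geds.dsp): PZ correction, noise optimisation, energy optimisation, event selection, DPLMS filter, and SVM classifier.

HIT parameter optimisation (par.geds.hit): quality cuts, energy calibration, A/E calibration, and LQ calibration.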

legenddataflowscripts.utils

Shared utility functions used across the calibration and tier-building scripts.

legenddataflowscripts.workflow

Workflow infrastructure: execution environment management, file-database construction, configuration variable substitution, and catalog pre-compilation.
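
To make "configuration variable substitution" concrete, here is a minimal sketch built on the standard library's string.Template. The function name substitute_variables, the placeholder syntax and the example keys are assumptions; the package's own implementation may differ.

```python
from string import Template


def substitute_variables(config, variables):
    """Recursively replace $name / ${name} placeholders in all strings."""
    if isinstance(config, str):
        # safe_substitute leaves unknown placeholders untouched
        return Template(config).safe_substitute(variables)
    if isinstance(config, dict):
        return {key: substitute_variables(value, variables) for key, value in config.items()}
    if isinstance(config, list):
        return [substitute_variables(item, variables) for item in config]
    # Numbers, booleans and None pass through unchanged
    return config


# Hypothetical configuration with a placeholder for the production root
example = {"paths": {"raw": "$root/tier/raw", "dsp": "${root}/tier/dsp"}, "threads": 4}
resolved = substitute_variables(example, {"root": "/data/prod"})
```

After this call, resolved["paths"]["raw"] is "/data/prod/tier/raw", while non-string values such as threads are passed through as they are.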

Key Dependencies

Package        Role
-------------  ----------------------------------------------------------------
dspeed         Digital signal processing engine used by the DSP tier builder.
pygama         Calibration algorithms (energy cal, A/E, LQ, noise optimisation, etc.) and the HIT tier builder.
lgdo           LEGEND Data Object types.
lh5            LH5 file I/O.
dbetto         Configuration file handling (JSON/YAML, TextDB, validity catalogs).
pylegendmeta   LEGEND metadata access.
scikit-learn   SVM classifier training and inference.
Snakemake      Workflow scheduler that orchestrates all processing rules.