SQI Pipeline and Calibration

This page describes in full detail:

How raw physiological signals are segmented and passed through the SQI extraction pipeline.
How each SQI is computed, routed, and packaged into a result DataFrame.
How the rule-based classifier works and makes accept/reject decisions.
How the calibration system derives thresholds empirically from synthetic signals.

All source lives in vital_sqi/pipeline/pipeline_functions.py, vital_sqi/rule/, and vital_sqi/calibration/.

—

Part 1 — SQI Extraction Pipeline 

Overview 

The extraction pipeline transforms a list of signal segments into a structured DataFrame of quality metrics. The call chain is:

extract_sqi(segments, milestones, sqi_dict_filename)
    └── for each segment:
            extract_segment_sqi(segment, sqi_list, sqi_names, sqi_arg_list, wave_type)
                ├── [once] get_nn(signal)          # nn_intervals cache
                └── for each SQI:
                        get_sqi(sqi_func, sqi_name, segment, ...)
                            ├── [if per_beat] per_beat_sqi(...)
                            └── [else]         sqi_func(signal, **kwargs)
                                    └── get_sqi_dict(result, sqi_name)

Step 1 — `extract_sqi` (entry point)

df = extract_sqi(segments, milestones, "vital_sqi/resource/sqi_dict.json", wave_type="PPG")

Inputs

segments

A list of DataFrames produced by split_segment(). Each DataFrame has two columns: time (timestamps) and signal (raw waveform values).

milestones

A two-column DataFrame with start_idx and end_idx (sample positions in the original recording) for each segment.

sqi_dict_filename

Path to a JSON configuration file. Each top-level key is a user-chosen label; each value is a dict with:

"sqi" — the function name registered in sqi_mapping (e.g. "kurtosis_sqi").
"args" — keyword arguments forwarded to the SQI function.

Example:

{
  "perfusion":  {"sqi": "perfusion_sqi",  "args": {}},
  "kurtosis":   {"sqi": "kurtosis_sqi",   "args": {"axis": 0}},
  "poincare":   {"sqi": "poincare_sqi",   "args": {}}
}

What it does

Loads sqi_dict_filename and resolves each "sqi" key to a callable from sqi_mapping.
Iterates over every segment (with a tqdm progress bar).
Calls extract_segment_sqi() for each segment, collecting a pd.Series of SQI values.
Assembles all series into a pd.DataFrame (one row per segment).
Appends start_idx / end_idx columns from milestones.

Output

A DataFrame with one row per segment. Columns are the SQI labels from the config file. Multi-value SQIs (e.g. poincare_sqi which returns sd1, sd2, area, ratio) expand into sub-columns. Per-beat SQIs that return lists expand into _mean_sqi, _median_sqi, and _std_sqi columns.

Step 2 — `extract_segment_sqi` (per-segment)

Called once per segment. Responsible for:

Extracting the raw signal array from the segment DataFrame once (avoiding repeated iloc calls inside the loop).
Computing nn_intervals once and caching the result for reuse by all HRV SQIs (see nn_intervals Caching).
Iterating over every (function, name) pair, calling get_sqi(), and collecting results.

Special handling:

perfusion_sqi requires two signals (raw + filtered). When it appears in the list, extract_segment_sqi() passes y=signal_values directly and skips the standard dispatch path.
Any SQI that raises an exception is caught; a warning is emitted and that SQI column receives NaN for the segment.

nn_intervals Caching

HRV SQIs (sdnn_sqi, rmssd_sqi, poincare_sqi, etc.) take nn_intervals as their first positional argument rather than the raw signal. NN intervals are derived from peak detection, which involves a Kalman filter and is expensive (~14 ms per segment at 100 Hz).

Without caching, get_nn would be called once per HRV SQI (12–18 calls per segment). The optimisation: extract_segment_sqi inspects the first argument name of each SQI function using inspect.getfullargspec (cached in _argspec_cache at module level so introspection runs only once per function lifetime) and routes HRV SQIs through a single shared _nn_cache:

_nn_cache = None
for sqi_func, sqi_name in zip(sqi_list, sqi_names):
    first_arg = _argspec_cache[sqi_func][0]
    if first_arg == "nn_intervals":
        if _nn_cache is None:
            _nn_cache = get_nn(signal_values)   # computed once
        args["_nn_intervals"] = _nn_cache       # injected into get_sqi

This reduces peak-detection calls from N (one per HRV SQI) to 1 per segment, giving a ~3.6x speedup.

Step 3 — `get_sqi` (per-SQI dispatch)

Handles two execution modes:

Whole-segment mode (per_beat=False, default): The SQI function is called on the full signal array (or on nn_intervals for HRV SQIs). The result — scalar, dict, or list — is packaged by get_sqi_dict().
Per-beat mode (per_beat=True): extract_segment_sqi() runs peak detection once per segment on the first per-beat SQI encountered and caches the resulting peak_list / trough_list. All subsequent per-beat SQIs in the same segment reuse the cached peaks without re-running the detector. get_sqi() accepts the cached peaks via the _peak_list / _trough_list private keyword arguments; per_beat_sqi() then slices the signal into individual beats and calls the SQI function on each. If use_mean_beat=True (default), all beats are resampled to mean_resample_size samples, averaged, and the SQI is computed once on the mean beat (one value per segment).

wave_type is forwarded to every SQI function whose signature includes a wave_type parameter. sampling_rate / sample_rate arguments in sqi_arg_list are automatically updated to the actual signal sampling rate before calling (via _patch_fs in the calibration runner).

Step 4 — `get_sqi_dict` (result packaging)

Converts whatever the SQI function returns into a {column: value} dict that maps cleanly to DataFrame columns:

`get_sqi_dict` rules
Raw return type	Column(s) produced
`dict`	Returned unchanged (e.g. `poincare_sqi` → `{"sd1": …, "sd2": …, "area": …, "ratio": …}`).
scalar (`float` / `int`)	`{sqi_name: scalar}`.
1-element `list` / `ndarray`	`{sqi_name: value}`.
multi-element `list` / `ndarray`	`{sqi_name_mean_sqi, sqi_name_median_sqi, sqi_name_std_sqi}`.

—

Part 2 — SQI Catalogue 

The following table lists every SQI registered in sqi_mapping. SQIs whose first argument is nn_intervals are automatically routed through the cached NN-interval path. All others receive the raw signal.

Registered SQIs
Function name	Input type	What it measures
`perfusion_sqi`	raw + filtered	`(max - min of filtered) / abs(mean of raw) * 100`
`kurtosis_sqi`	signal	Tail heaviness of the amplitude distribution
`skewness_sqi`	signal	Asymmetry of the amplitude distribution
`entropy_sqi`	signal	Shannon entropy of the normalised amplitude histogram
`signal_to_noise_sqi`	signal	`abs(mean) / std` — higher is cleaner
`zero_crossings_rate_sqi`	signal	Rate of sign changes — high value indicates noise
`mean_crossing_rate_sqi`	signal	Rate at which signal crosses its own mean
`clipping_sqi`	signal	Fraction of samples at the amplitude rail (saturation)
`baseline_wander_sqi`	signal	Low-frequency (<0.5 Hz) energy fraction — drift indicator
`spectral_snr_sqi`	signal	In-band vs out-of-band power ratio (dB)
`ectopic_sqi`	signal	Ratio of ectopic/outlier RR intervals
`correlogram_sqi`	signal	Mean of top ACF peaks — periodicity score
`msq_sqi`	signal	Peak-detector agreement ratio between two algorithms
`amplitude_consistency_sqi`	signal	Beat-to-beat peak amplitude coefficient of variation
`band_energy_sqi`	signal	Peak STFT time-marginal in a frequency band
`lfe_sqi`	signal	Low-frequency (0–0.5 Hz) STFT energy
`qrse_sqi`	signal	QRS-band (5–25 Hz) STFT energy
`hfe_sqi`	signal	High-frequency (100+ Hz) STFT energy
`vhfp_sqi`	signal	Very-high-frequency normalised power (>150 Hz)
`qrsa_sqi`	signal	Median QRS amplitude via WaveformMorphology
`dtw_sqi`	signal	DTW distance to a reference template (4 types)
`sdnn_sqi`	nn_intervals	Sample std of NN intervals
`sdsd_sqi`	nn_intervals	Sample std of successive NN differences
`rmssd_sqi`	nn_intervals	Root mean square of successive differences
`cvsd_sqi`	nn_intervals	RMSSD / mean NN (normalised short-term variability)
`cvnn_sqi`	nn_intervals	SDNN / mean NN (normalised long-term variability)
`mean_nn_sqi`	nn_intervals	Mean of NN intervals
`median_nn_sqi`	nn_intervals	Median of NN intervals
`pnn_sqi`	nn_intervals	% of successive differences strictly > threshold (pNN50)
`hr_mean_sqi`	nn_intervals	Mean heart rate in BPM
`hr_median_sqi`	nn_intervals	Median heart rate in BPM
`hr_min_sqi`	nn_intervals	Minimum instantaneous heart rate
`hr_max_sqi`	nn_intervals	Maximum instantaneous heart rate
`hr_std_sqi`	nn_intervals	Sample std of instantaneous heart rate
`hr_range_sqi`	nn_intervals	Fraction of HR values outside [range_min, range_max]
`peak_frequency_sqi`	nn_intervals	Spectral peak frequency in the LF band (0.04–0.15 Hz)
`absolute_power_sqi`	nn_intervals	Absolute power in the LF band
`log_power_sqi`	nn_intervals	Log power in the LF band
`relative_power_sqi`	nn_intervals	LF band power / total power
`normalized_power_sqi`	nn_intervals	LF band power / (total power − VLF power), per HRV Task Force 1996
`lf_hf_ratio_sqi`	nn_intervals	LF/HF power ratio (sympatho-vagal balance)
`poincare_sqi`	nn_intervals	SD1, SD2, ellipse area, SD1/SD2 ratio
`rr_irregularity_sqi`	nn_intervals	MAD of successive RR diffs / median RR
`sample_entropy_sqi`	nn_intervals	Sample entropy — complexity / regularity of RR series
`dfa_sqi`	nn_intervals	DFA alpha1 — short-range fractal scaling exponent
`hurst_sqi`	nn_intervals	Hurst exponent — long-range correlation (R/S analysis)

—

Part 3 — Rule-Based Classification 

Overview 

After SQI extraction, every segment receives an "accept" or "reject" decision by running a RuleSet over its SQI row. The workflow:

classify_segments(sqis, rule_dict_filename, ruleset_order)
    ├── load rule_dict.json
    ├── [auto_mode] adjust thresholds to observed p5/p95
    ├── build Rule objects → RuleSet
    └── for each segment row:
            RuleSet.execute(row_df) → "accept" | "reject"

Rule Format 

Rules are stored in rule_dict.json. Each entry has the structure:

{
  "kurtosis_sqi": {
    "name": "kurtosis_sqi",
    "def": [
      {"op": ">",  "value": "0.5",  "label": "accept"},
      {"op": "<=", "value": "0.5",  "label": "reject"},
      {"op": ">=", "value": "8.0",  "label": "reject"},
      {"op": "<",  "value": "8.0",  "label": "accept"}
    ],
    "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.5 to p95=8.0.",
    "ref":  "vital_sqi.calibration"
  }
}

The four-element "def" array encodes a half-open accept interval (lower, upper):

Condition 1&2: value > lower → accept; value <= lower → reject
Condition 3&4: value >= upper → reject; value < upper → accept

The Rule class parses this into a sorted boundaries array and a labels array, then uses bisect for O(log n) lookup.

The `Rule` class 

Rule represents a single threshold rule for one SQI.

load_def(path) — loads rule from JSON file by name.
update_def(op_list, value_list, label_list) — programmatically sets new thresholds.
apply_rule(x) — applies bisect_left on the sorted boundaries array to find the interval containing x, returns its label.
save_def(path, overwrite=False) — writes or merges the rule to JSON.

The `RuleSet` class 

RuleSet manages an ordered dict of Rule objects and executes them sequentially.

`execute(value_df)`

for order, rule in sorted(self.rules.items()):
    value = value_df.iloc[0][rule.name]
    decision = rule.apply_rule(value)
    if decision == "reject":
        return "reject"
return "accept"

This is a linear early-exit scan, not recursion. Rules are checked in ascending integer order. The moment any rule returns "reject", the loop short-circuits and returns immediately without checking the remaining rules. Only if every rule returns "accept" does the function return "accept".

This design is:

Optimal — O(R) where R is the number of rules; cannot be done faster than linear since every rule must be checked in the worst case.
Ordered — rule priority is controlled by the integer key; lower key = checked first.
Short-circuiting — reject-heavy rules placed early (low key) minimize average rule evaluations per segment.

Threshold-selection strategies (`auto_mode`)

classify_segments supports three strategies for deciding the (lower, upper) accept band on each rule:

auto_mode=False (or "manual"): Use the bounds stored in rule_dict.json verbatim. Pick this when you want to apply externally calibrated thresholds without adapting them to the current recording.
auto_mode=True (or "quantile") — default: Replace each rule’s bounds with the empirical lower_bound / upper_bound quantiles (p5 / p95 by default) of the SQI values observed across all segments. Self-adapting, but the joint accept rate falls off geometrically as you stack more rules: 5 rules at p5/p95 on uniformly clean data only accept ~60 % of segments because each rule independently trims its own tails.
auto_mode="tune": Per-rule quantile is computed automatically so the joint accept rate (under the independence approximation) hits target_accept_rate (default 0.85). Solving the standard equation target = (1 - 2q)^n for the symmetric trim q and n rules: with 5 rules and target 0.85 each rule keeps ~96.8 % of its distribution, i.e. bands at p1.6 / p98.4. Much more forgiving than plain "quantile" mode when several rules are active.

Degenerate rules — SQIs whose distribution is constant across the recording (e.g. zero_crossings_rate_sqi == 0 for mean-centred clean PPG) — are dropped from the rule set with a warning instead of producing a “reject everything” 0-width band.

See vital_sqi.rule.auto_threshold for the underlying helpers, which the Inspect view in the web app reuses for its live preview.

The independence-approximation math

Auto-tune mode picks a per-rule quantile q such that the joint accept rate hits the user’s target. Under the assumption that rules are independent — a reasonable simplification for SQIs that measure different facets of the signal — the joint accept rate is the product of the per-rule rates:

\[P(\text{accept}) = \prod_{i=1}^{n} P(\text{accept}_i) = (1 - 2q)^{n}\]

Solving for the symmetric trim q given a target p:

\[q = \frac{1 - p^{1/n}}{2}\]

That’s exactly what vital_sqi.rule.auto_threshold.per_rule_quantile() computes. For typical configurations:

`n_rules`	`p`	per-rule keep rate	per-rule trim `q`
1	0.85	0.850	7.5 % (p7.5 / p92.5)
3	0.85	0.947	2.6 % (p2.6 / p97.4)
5	0.85	0.968	1.6 % (p1.6 / p98.4)
10	0.85	0.984	0.8 % (p0.8 / p99.2)
5	0.95	0.990	0.5 % (p0.5 / p99.5)

In practice rules are not perfectly independent — kurtosis and dtw both react to morphology disturbances, for instance — so the empirical joint accept rate is typically ± 3 percentage points of the target. Good enough for an interactive UI control; if you need an exact rate, use the iterative widening strategy in vital_sqi.rule.auto_threshold (not yet exposed, see the auto_threshold module for the building blocks).

Worked example — auto-tune on a clean PPG recording

The reference recording is a 2-hour OUCRU SmartCare PPG file (100 Hz, 833 800 samples → 278 segments at 30 s windows). Same SQI catalogue, same five-rule default subset (kurtosis_sqi, perfusion_sqi, correlogram_sqi, msq_sqi, dtw_sqi); only the auto_mode configuration changes:

`auto_mode`	accept rate (278 segments)
`"quantile"`, p5 / p95	60.1 % — over-restrictive
`"quantile"`, p1 / p99	83.8 %
`"tune"`, target 0.80	76.6 %
`"tune"`, target 0.85	80.9 % — matches target
`"tune"`, target 0.90	83.8 %

The p5 / p95 row shows the failure mode: with 5 independent rules each trimming 10 % of the distribution, the joint accept ceiling is 0.9^5 ≈ 59 % — which is exactly what the table shows on a clean recording. Tune mode at 85 % is the recommended default for interactive use; the GUI’s Threshold mode → Auto-tune slider exposes this directly.

Implementation outline

classify_segments dispatches on auto_mode in three branches:

if auto_mode == "manual":
    # Use the bounds in rule_dict.json verbatim.
    rule = generate_rule(name, rule_dict[name]["def"])

elif auto_mode == "quantile":
    # Per-column p5/p95-style band; drop degenerate columns.
    band = auto_threshold.quantile_band(
        name, sqi_df[name].values,
        lower_pct=lower_bound, upper_pct=upper_bound,
    )
    if band is None:
        continue  # degenerate or all-NaN; drop with warning

elif auto_mode == "tune":
    # Pre-compute every band so per-rule quantile is uniform.
    all_bands = auto_threshold.tuned_bands(
        {name: sqi_df[name].values for name in candidate_columns},
        target_accept_rate=target_accept_rate,
    )

After the loop the surviving bands populate a single RuleSet; consecutive integer priorities are re-assigned so the rule set’s “no gaps starting at 1” invariant holds even after drops. Classification is then the existing linear early-exit scan from RuleSet.execute.

Degenerate-band guard

Both quantile and tune modes share a single test:

band_width = upper - lower
if band_width < 1e-6:
    # Column is essentially constant; skip with warning.
    continue

The threshold (DEGENERATE_BAND_HALF_WIDTH = 1e-6) is the same one used by vital_sqi.calibration.threshold_estimator for its constant-column epsilon guard. This catches:

zero_crossings_rate_sqi on already-mean-centred clean signals (always 0).
ectopic_sqi on recordings with no detected ectopics (rule_index=0 → always returns the no-ectopic outlier ratio of ~1.0).
hfe_sqi whose default band sits above Nyquist for sampling_rate=100 (always 0).

Each dropped rule logs a warning naming the column and the observed band; the user sees them as Auto-skipped: … in the Inspect view’s rule panel.

Strictest-rule detection

The Inspect view’s Drop strictest rule button calls vital_sqi.rule.auto_threshold.strictest_columns() over the per-rule rejection tally:

counts = {"kurtosis_sqi": 23, "perfusion_sqi": 28, "msq_sqi": 28,
          "correlogram_sqi": 21, "dtw_sqi": 11}
median = 23
mad    = 5
threshold = median + 3*MAD = 38
# No outliers — all counts within bounds.

This uses the modified Z-score (median + k × MAD) rather than mean + k·σ, because the very outlier you’re trying to detect would inflate the mean and standard deviation and hide itself. The default k = 3 flags clear upward outliers without false-positive nags on roughly-even distributions.

`ruleset_order` and `get_decision_segments`

ruleset_order is a dict mapping integer priority keys to rule names:

ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "sdnn_sqi"}

Only the SQIs listed here participate in classification. All other SQI columns in the DataFrame are ignored.

After classification, get_decision_segments(segments, decision, reject_decision) separates segments into accepted and rejected lists by combining the rule-set decision with any pre-existing rejection flags.

—

Part 4 — Peak Detection Algorithms 

Overview 

Peak detection is used by:

All per_beat=True SQIs (DTW, skewness, kurtosis on individual beats).
HRV SQIs — get_nn() calls the PPG/ECG detector to derive RR intervals.
ectopic_sqi, correlogram_sqi, msq_sqi, amplitude_consistency_sqi.

The PeakDetector class exposes two independent method families: ppg_detector (9 algorithms) and ecg_detector (4 algorithms).

PPG Detectors 

Select via the peak_detector argument in sqi_dict.json or via the detector_type constant imported from vital_sqi.common.rpeak_detection.

PPG detection algorithms
Constant	Value	Description
`DEFAULT`	6	vitalDSP `WaveformMorphology.systolic_peaks` — height + distance threshold on the band-passed signal. Recommended default.
`ADAPTIVE_THRESHOLD`	1	Threshold adapts to the local signal amplitude (running max). Suitable for recordings with slow baseline drift.
`COUNT_ORIG_METHOD`	2	Local maxima above a `0.75`-quantile × 0.2 threshold. Fast but sensitive to non-zero DC baselines.
`CLUSTERER_METHOD`	3	KMeans (k=2) separates peaks from non-peaks. Non-deterministic; set `random_state` for reproducibility.
`SLOPE_SUM_METHOD`	4	Zong 2003 slope-sum energy with onset back-search. Robust on noisy PPG; requires `fs`-scaled windows.
`MOVING_AVERAGE_METHOD`	5	Elgendi two-moving-average (MA_peak > MA_beat). Returns raw-signal peaks within each detected “block of interest”.
`BILLAUER_METHOD`	7	Billauer delta-based peak/trough tracker. Records `mxpos` (true peak sample) not the detection crossing.
`AMPD_METHOD`	8	Automatic Multiscale Peak Detection (Scholkmann 2012). Parameter-free; works well on quasi-periodic PPG at 50–250 Hz.
`LOCAL_MAX_IBI`	9	Local-maxima with inter-beat-interval gating. Filters candidate peaks by physiological IBI range (300–2000 ms).

ECG Detectors 

Used by ecg_detector(s, detector_type=ECG_DEFAULT). All algorithms detect R-peaks; Q/S/P/T morphology extraction always uses vitalDSP WaveformMorphology, anchored to the chosen R-peak indices.

ECG detection algorithms
Constant	Value	Description
`ECG_DEFAULT`	10	vitalDSP `WaveformMorphology` derivative-square-MWI with height threshold. Provides R + Q/S/P/T in a single call. Recommended default.
`PAN_TOMPKINS`	11	Classic Pan-Tompkins 1985. Bandpass 5–15 Hz → derivative → squaring → 150 ms MWI → adaptive dual threshold with 2 s learning period. Back-projects within ±60 ms for true R-peak.
`HAMILTON`	12	Hamilton-Tompkins simplified 2002. Bandpass 8–16 Hz → first difference → 80 ms moving average → mean+0.5σ threshold. Single-pass, suitable for real-time use.
`ENGZEE`	13	Engzee-Zeelenberg 1979. High-pass filtered derivative with a dynamic threshold that adapts after each detected beat.

Usage example:

from vital_sqi.common.rpeak_detection import (
    PeakDetector, ECG_DEFAULT, PAN_TOMPKINS, HAMILTON, ENGZEE,
    DEFAULT, AMPD_METHOD
)

ppg_detector = PeakDetector(wave_type="PPG", fs=100)
peaks, troughs = ppg_detector.ppg_detector(ppg_signal)          # DEFAULT
peaks, troughs = ppg_detector.ppg_detector(ppg_signal, AMPD_METHOD)

ecg_detector = PeakDetector(wave_type="ECG", fs=256)
r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal)           # ECG_DEFAULT
r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal, PAN_TOMPKINS)

—

Part 5 — Calibration System 

Purpose 

The calibration system derives lower and upper threshold bounds for every SQI empirically, using large pools of synthetic clean and noisy signals. Results are written to:

vital_sqi/resource/rule_dict.json — threshold rules for the classifier.
vital_sqi/resource/sqi_dict.json — SQI function registry (only successfully calibrated SQIs are included).

Architecture 

run_calibration.calibrate()
    ├── 1. generate_clean_ppg / generate_clean_ecg   (signal_generator.py)
    ├── 2. inject_noise × NOISE_PROFILES              (noise_injector.py)
    │       ├── accept pool: clean + very-mild-noise segments
    │       └── reject pool: moderate-to-severe noise segments
    ├── 3. compute_sqi_distributions(accept + reject) (sqi_runner.py)
    ├── 4. estimate_thresholds(accept_df, reject_df)  (threshold_estimator.py)
    └── 5. export_rule_dict + export_sqi_dict         (exporter.py)

Step 1 — Signal Generation 

generate_clean_ppg() and generate_clean_ecg() produce n_segments clean synthetic waveforms using vitalDSP’s physiological signal generators.

PPG: sampling rate 100 Hz, default segment duration 30 s.
ECG: sampling rate 256 Hz, default segment duration 30 s.
noise_floor=0.0 — no noise added at generation time.
Each segment is a (signal_array, fs) tuple.

Step 2 — Noise Injection 

inject_noise() applies one of 11 noise types to a clean segment:

Noise types
Type	Description
`gaussian`	Additive white Gaussian noise scaled to `amplitude × peak-to-peak`
`baseline_wander`	Sinusoidal drift at 0.05–0.5 Hz (respiration-like)
`motion`	3 burst-style high-amplitude artifacts at random positions
`harmonic`	Sum of sine/cosine harmonics at a random base frequency 1–15 Hz
`powerline`	50 Hz or 60 Hz sinusoidal interference
`polynomial`	Degree 2–4 polynomial trend (temperature / electrode drift)
`impulse`	Random isolated spikes at ~2% of samples
`pink`	1/f colored noise via FFT shaping
`brown`	1/f² colored noise (Brownian motion-like)
`time_shift`	Circular roll by a random number of samples (sync jitter)
`clock_drift`	Resample ±amplitude% then trim/pad (wrong fs label simulation)
`dropout`	Replace amplitude% of samples with their predecessor (packet loss)
`combined_mild`	Gaussian + baseline wander + polynomial (mild realistic composite)
`combined_severe`	Gaussian + harmonic + motion + impulse (severe realistic composite)

NOISE_PROFILES defines 33 (label, noise_type, amplitude) triples.

The accept pool includes:

All n_segments pure clean segments.
All segments from profiles labelled clean or gaussian_very_mild (amplitude 0.02 — barely perceptible noise).

The reject pool includes segments from all other profiles (amplitude 0.05–0.80, representing clinically poor signal quality). The reject pool uses only n_reject_segments (default 50) per profile for speed — it is used for diagnostics only and does not influence the p5/p95 bounds.

Step 3 — SQI Distribution Computation 

compute_sqi_distributions() runs all 47 SQIs over every segment in both pools:

accept_df = compute_sqi_distributions(accept_segments, wave_type="PPG")
reject_df  = compute_sqi_distributions(reject_segments, wave_type="PPG")

Each call returns a DataFrame with one row per segment and one column per SQI output. poincare_sqi expands to 4 sub-columns (sd1, sd2, area, ratio). Failed SQIs are NaN for that segment.

_patch_fs automatically updates any sample_rate or sampling_rate keyword argument to the actual segment sampling rate.

Step 4 — Threshold Estimation 

estimate_thresholds() processes each SQI column independently:

for each column in accept_df:
    accept_vals = drop NaN and Inf
    if n_valid < 5:
        skip (mark calibrated=False)
    lower = percentile(accept_vals, lower_pct)   # default p5
    upper = percentile(accept_vals, upper_pct)   # default p95
    if upper - lower < 1e-6:
        widen band by epsilon (constant signal guard)
    if reject distribution overlaps accept band:
        record note (thresholds are NOT changed)
    mark calibrated=True

The result for each column is a SQIThreshold dataclass containing:

Field	Meaning
`lower`, `upper`	Accept region bounds
`accept_median`, `accept_std`	Statistics of the clean distribution
`reject_median`	Median of the noisy distribution
`n_accept`, `n_reject`	Sample counts used
`calibrated`	False if SQI could not be calibrated
`note`	Human-readable diagnostic note

SQIs that cannot be calibrated (always NaN on synthetic data):

interpolation_sqi — unimplemented stub, always 0.
vhfp_sqi — band (>150 Hz) is above the Nyquist limit for 100/256 Hz signals.
sample_entropy_sqi, dfa_sqi, hurst_sqi — require natural HRV variability; synthetic signals have constant RR intervals (std=0), making entropy/DFA/Hurst undefined. These SQIs are fully functional on real recordings.

Step 5 — Export 

rule_dict.json — written by export_rule_dict().

Each calibrated SQI produces a four-element "def" array encoding the accept interval (lower, upper):

"kurtosis_sqi": {
  "name": "kurtosis_sqi",
  "def": [
    {"op": ">",  "value": "0.497", "label": "accept"},
    {"op": "<=", "value": "0.497", "label": "reject"},
    {"op": ">=", "value": "8.21",  "label": "reject"},
    {"op": "<",  "value": "8.21",  "label": "accept"}
  ],
  "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.497 to p95=8.21."
}

Existing entries for SQIs that were not calibrated in the current run are preserved unchanged (merge, not overwrite). This protects manually curated thresholds.

sqi_dict.json — written by export_sqi_dict().

Only SQIs that were successfully calibrated AND appear in _SQI_ARG_TEMPLATES are emitted. The output is a ready-to-use configuration file for extract_sqi().

Running Calibration 

From the repo root:

# PPG calibration (default: 200 accept segments, 50 reject per profile)
python -m vital_sqi.calibration.run_calibration --wave_type PPG

# ECG calibration
python -m vital_sqi.calibration.run_calibration --wave_type ECG

# Dry run — compute thresholds but do not write files
python -m vital_sqi.calibration.run_calibration --wave_type PPG --dry_run

# All options
python -m vital_sqi.calibration.run_calibration \
    --wave_type PPG \
    --n_segments 500 \
    --n_reject_segments 100 \
    --duration 30 \
    --lower_pct 5 \
    --upper_pct 95 \
    --output_dir path/to/output \
    --seed 42

Or from Python:

from vital_sqi.calibration.run_calibration import calibrate

thresholds = calibrate(
    wave_type="PPG",
    n_segments=200,
    n_reject_segments=50,
    duration=30.0,
    lower_pct=5.0,
    upper_pct=95.0,
    dry_run=False,
    seed=42,
)

The returned thresholds dict maps SQI column names to SQIThreshold objects and can be inspected with thresholds_to_dataframe().

—

Part 6 — Complete End-to-End Example 

import vital_sqi.data.signal_io as sio
from vital_sqi.preprocess.segment_split import split_segment
from vital_sqi.pipeline.pipeline_functions import extract_sqi, classify_segments, get_decision_segments

# 1. Load signal
df = sio.read_ppg("recording.csv", sampling_rate=100)

# 2. Segment into 30-second windows
segments, milestones = split_segment(df, duration=30, fs=100)

# 3. Extract SQIs
sqi_df = extract_sqi(
    segments, milestones,
    "vital_sqi/resource/sqi_dict.json",
    wave_type="PPG",
)

# 4. Classify — each segment gets 'accept' or 'reject'
ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "correlogram_sqi"}
ruleset, sqis_with_decisions = classify_segments(
    [sqi_df.iloc[[i]] for i in range(len(sqi_df))],
    rule_dict_filename="vital_sqi/resource/rule_dict.json",
    ruleset_order=ruleset_order,
    auto_mode=True,
)

# 5. Separate accepted / rejected segments
decisions    = [df["decision"].iloc[0] for df in sqis_with_decisions]
pre_reject   = ["accept"] * len(segments)   # or from get_reject_segments()
accepted, rejected = get_decision_segments(segments, decisions, pre_reject)

print(f"Accepted: {len(accepted)}  Rejected: {len(rejected)}")

—

Performance Notes 

A few hot paths have been hand-tuned and are worth knowing about when benchmarking or extending the library:

``_argspec_cache` in vital_sqi.pipeline.pipeline_functions memoises inspect.getfullargspec per SQI callable. The cache uses a sentinel value so SQIs with no positional arguments are still cached after the first call (an earlier or-based lookup silently recomputed for empty lists).
DTW reference templates are cached in vital_sqi.sqi.dtw_sqi keyed by (template_type, template_size). Because rr_process() is now seeded by default, ecg_dynamic_template is reproducible between runs and the cached template is identical from one call to the next.
``sample_entropy_sqi`` vectorises the inner template-match counter with sliding_window_view; complexity remains O(n²) but the inner loop runs in NumPy rather than Python.
``dfa_sqi`` uses a closed-form per-block linear detrend (no np.polyfit call per block), giving a 10-20× speed-up on long recordings.
``squeeze_template`` is fully vectorised via a cumulative-sum trick; template generation for DTW is now negligible cost.
``nn_intervals`` are computed once per segment by extract_segment_sqi() and injected into every HRV SQI via a private _nn_intervals= kwarg.

Numerical Edge Cases 

The following SQIs intentionally return np.nan (not 0) when their underlying assumption fails — so downstream rules can treat the result as “missing” rather than as a legitimate quality score:

hf_energy_sqi / vhf_norm_power_sqi — band lies above Nyquist
sample_entropy_sqi — either the m-length or (m+1)-length template match count is zero (would otherwise produce ±inf)
perfusion_sqi — raw-signal mean is near zero
baseline_wander_sqi / spectral_snr_sqi — total or out-of-band power is zero
amplitude_consistency_sqi — fewer than two peaks detected

Rules built from the calibrated rule_dict.json treat NaN as "reject" (see apply_rule()), so these guards fail safe.

SQI Pipeline and Calibration

nn_intervals Caching

execute(value_df)

Threshold-selection strategies (auto_mode)

The independence-approximation math

Worked example — auto-tune on a clean PPG recording

Implementation outline

Degenerate-band guard

Strictest-rule detection

ruleset_order and get_decision_segments

`execute(value_df)`

Threshold-selection strategies (`auto_mode`)

`ruleset_order` and `get_decision_segments`