SQI Pipeline and Calibration
This page describes in full detail:
How raw physiological signals are segmented and passed through the SQI extraction pipeline.
How each SQI is computed, routed, and packaged into a result DataFrame.
How the rule-based classifier works and makes accept/reject decisions.
How the calibration system derives thresholds empirically from synthetic signals.
All source lives in vital_sqi/pipeline/pipeline_functions.py,
vital_sqi/rule/, and vital_sqi/calibration/.
—
Part 1 — SQI Extraction Pipeline
Overview
The extraction pipeline transforms a list of signal segments into a structured DataFrame of quality metrics. The call chain is:
extract_sqi(segments, milestones, sqi_dict_filename)
└── for each segment:
    extract_segment_sqi(segment, sqi_list, sqi_names, sqi_arg_list, wave_type)
    ├── [once] get_nn(signal)  # nn_intervals cache
    └── for each SQI:
        get_sqi(sqi_func, sqi_name, segment, ...)
        ├── [if per_beat] per_beat_sqi(...)
        └── [else] sqi_func(signal, **kwargs)
            └── get_sqi_dict(result, sqi_name)
Step 1 — extract_sqi (entry point)
df = extract_sqi(segments, milestones, "vital_sqi/resource/sqi_dict.json", wave_type="PPG")
Inputs
segments
    A list of DataFrames produced by split_segment(). Each DataFrame has two columns: time (timestamps) and signal (raw waveform values).
milestones
    A two-column DataFrame with start_idx and end_idx (sample positions in the original recording) for each segment.
sqi_dict_filename
    Path to a JSON configuration file. Each top-level key is a user-chosen label; each value is a dict with:
    "sqi": the function name registered in sqi_mapping (e.g. "kurtosis_sqi").
    "args": keyword arguments forwarded to the SQI function.
Example:
{
    "perfusion": {"sqi": "perfusion_sqi", "args": {}},
    "kurtosis": {"sqi": "kurtosis_sqi", "args": {"axis": 0}},
    "poincare": {"sqi": "poincare_sqi", "args": {}}
}
What it does
Loads sqi_dict_filename and resolves each "sqi" key to a callable from sqi_mapping.
Iterates over every segment (with a tqdm progress bar).
Calls extract_segment_sqi() for each segment, collecting a pd.Series of SQI values.
Assembles all series into a pd.DataFrame (one row per segment).
Appends start_idx / end_idx columns from milestones.
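The steps above can be sketched without the library. This is a minimal illustration, not the actual implementation: REGISTRY, mean_level, peak_to_peak, resolve_config and extract_rows are hypothetical stand-ins (the real registry is sqi_mapping), and plain dicts take the place of the pd.Series / pd.DataFrame rows.

```python
import json

# Hypothetical SQI functions standing in for entries of sqi_mapping.
def mean_level(signal):
    return sum(signal) / len(signal)

def peak_to_peak(signal, scale=1.0):
    return scale * (max(signal) - min(signal))

REGISTRY = {"mean_level": mean_level, "peak_to_peak": peak_to_peak}

# Same shape as sqi_dict.json: label -> {"sqi": registered name, "args": kwargs}.
CONFIG_TEXT = """
{
  "level": {"sqi": "mean_level", "args": {}},
  "range": {"sqi": "peak_to_peak", "args": {"scale": 2.0}}
}
"""

def resolve_config(config, registry):
    """Return parallel lists of labels, callables and kwargs, in config order."""
    labels, funcs, arg_list = [], [], []
    for label, entry in config.items():
        labels.append(label)
        funcs.append(registry[entry["sqi"]])  # KeyError if the name is unregistered
        arg_list.append(dict(entry.get("args", {})))
    return labels, funcs, arg_list

def extract_rows(segments, config, registry):
    """Build one {label: value} row per segment."""
    labels, funcs, arg_list = resolve_config(config, registry)
    return [
        {label: func(segment, **kwargs)
         for label, func, kwargs in zip(labels, funcs, arg_list)}
        for segment in segments
    ]

config = json.loads(CONFIG_TEXT)
rows = extract_rows([[1.0, 2.0, 3.0], [0.0, 4.0]], config, REGISTRY)
print(rows)
```

The column names in each row come from the user-chosen labels, not from the function names, which matches the Output description that follows.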
Output
A DataFrame with one row per segment. Columns are the SQI labels from
the config file. Multi-value SQIs (e.g. poincare_sqi which returns
sd1, sd2, area, ratio) expand into sub-columns.
Per-beat SQIs that return lists expand into _mean_sqi,
_median_sqi, and _std_sqi columns.
Step 2 — extract_segment_sqi (per-segment)
Called once per segment. Responsible for:
Extracting the raw signal array from the segment DataFrame once (avoiding repeated iloc calls inside the loop).
Computing nn_intervals once and caching the result for reuse by all HRV SQIs (see nn_intervals Caching).
Iterating over every (function, name) pair, calling get_sqi(), and collecting results.
Special handling:
perfusion_sqi requires two signals (raw + filtered). When it appears in the list, extract_segment_sqi() passes y=signal_values directly and skips the standard dispatch path.
Any SQI that raises an exception is caught; a warning is emitted and that SQI column receives NaN for the segment.
nn_intervals Caching
HRV SQIs (sdnn_sqi, rmssd_sqi, poincare_sqi, etc.) take
nn_intervals as their first positional argument rather than the raw
signal. NN intervals are derived from peak detection, which involves a
Kalman filter and is expensive (~14 ms per segment at 100 Hz).
Without caching, get_nn would be called once per HRV SQI (12–18
calls per segment). The optimisation: extract_segment_sqi inspects
the first argument name of each SQI function using
inspect.getfullargspec (cached in _argspec_cache at module level
so introspection runs only once per function lifetime) and routes HRV
SQIs through a single shared _nn_cache:
_nn_cache = None
for sqi_func, sqi_name in zip(sqi_list, sqi_names):
    first_arg = _argspec_cache[sqi_func][0]
    if first_arg == "nn_intervals":
        if _nn_cache is None:
            _nn_cache = get_nn(signal_values)  # computed once
        args["_nn_intervals"] = _nn_cache  # injected into get_sqi
This reduces peak-detection calls from N (one per HRV SQI) to 1 per segment, giving a ~3.6x speedup.
Step 3 — get_sqi (per-SQI dispatch)
Handles two execution modes:
- Whole-segment mode (per_beat=False, default)
    The SQI function is called on the full signal array (or on nn_intervals for HRV SQIs). The result (scalar, dict, or list) is packaged by get_sqi_dict().
- Per-beat mode (per_beat=True)
    extract_segment_sqi() runs peak detection once per segment on the first per-beat SQI encountered and caches the resulting peak_list / trough_list. All subsequent per-beat SQIs in the same segment reuse the cached peaks without re-running the detector. get_sqi() accepts the cached peaks via the _peak_list / _trough_list private keyword arguments; per_beat_sqi() then slices the signal into individual beats and calls the SQI function on each. If use_mean_beat=True (default), all beats are resampled to mean_resample_size samples, averaged, and the SQI is computed once on the mean beat (one value per segment).
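The mean-beat step can be illustrated with a small sketch. It assumes that beats are delimited by consecutive trough indices and that resampling is linear interpolation; both are assumptions for illustration, and resample_linear and mean_beat are hypothetical helpers, not library functions.

```python
def resample_linear(values, size):
    """Linearly interpolate `values` onto `size` evenly spaced points (size >= 2)."""
    if len(values) == 1:
        return [float(values[0])] * size
    out = []
    for i in range(size):
        pos = i * (len(values) - 1) / (size - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

def mean_beat(signal, trough_list, mean_resample_size=5):
    """Slice trough-to-trough beats, resample each, and average pointwise."""
    beats = [signal[a:b + 1] for a, b in zip(trough_list, trough_list[1:])]
    resampled = [resample_linear(beat, mean_resample_size) for beat in beats]
    return [sum(column) / len(column) for column in zip(*resampled)]

# Two beats of different length (3 and 5 samples) with the same triangular shape.
signal = [0, 2, 0, 1, 2, 1, 0]
troughs = [0, 2, 6]
beat = mean_beat(signal, troughs, mean_resample_size=5)
print(beat)
```

Because both beats are brought to the same length before averaging, a single SQI call on the averaged beat yields one value per segment regardless of how many beats the segment contains.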
wave_type is forwarded to every SQI function whose signature includes
a wave_type parameter. sampling_rate / sample_rate arguments
in sqi_arg_list are automatically updated to the actual signal
sampling rate before calling (via _patch_fs in the calibration
runner).
Step 4 — get_sqi_dict (result packaging)
Converts whatever the SQI function returns into a {column: value}
dict that maps cleanly to DataFrame columns:
| Raw return type | Column(s) produced |
|---|---|
| dict | Returned unchanged |
| scalar | |
| 1-element list | |
| multi-element list | _mean_sqi, _median_sqi and _std_sqi columns |
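A hedged sketch of this kind of packaging follows. The column naming (sqi_name as a prefix for the three summary columns), the handling of 1-element lists, and the use of the sample standard deviation are all assumptions made for illustration; package_result is a hypothetical name and does not reproduce get_sqi_dict().

```python
import statistics

def package_result(result, sqi_name):
    """Turn a raw SQI return value into a {column: value} dict."""
    if isinstance(result, dict):
        return result                                  # already column-shaped
    if isinstance(result, (list, tuple)):
        if len(result) == 1:
            return {sqi_name: result[0]}               # assumption: unwrap single value
        return {                                       # assumption: prefix naming
            f"{sqi_name}_mean_sqi": statistics.mean(result),
            f"{sqi_name}_median_sqi": statistics.median(result),
            f"{sqi_name}_std_sqi": statistics.stdev(result),
        }
    return {sqi_name: result}                          # scalar

scalar_cols = package_result(3.5, "kurtosis_sqi")
dict_cols = package_result({"sd1": 0.1, "sd2": 0.2}, "poincare_sqi")
list_cols = package_result([1.0, 2.0, 3.0], "dtw")
print(scalar_cols, dict_cols, list_cols)
```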
—
Part 2 — SQI Catalogue
The following table lists every SQI registered in sqi_mapping.
SQIs whose first argument is nn_intervals are automatically routed
through the cached NN-interval path. All others receive the raw signal.
| Function name | Input type | What it measures |
|---|---|---|
| | raw + filtered | |
| | signal | Tail heaviness of the amplitude distribution |
| | signal | Asymmetry of the amplitude distribution |
| | signal | Shannon entropy of the normalised amplitude histogram |
| | signal | |
| | signal | Rate of sign changes; a high value indicates noise |
| | signal | Rate at which signal crosses its own mean |
| | signal | Fraction of samples at the amplitude rail (saturation) |
| | signal | Low-frequency (<0.5 Hz) energy fraction (drift indicator) |
| | signal | In-band vs out-of-band power ratio (dB) |
| | signal | Ratio of ectopic/outlier RR intervals |
| | signal | Mean of top ACF peaks (periodicity score) |
| | signal | Peak-detector agreement ratio between two algorithms |
| | signal | Beat-to-beat peak amplitude coefficient of variation |
| | signal | Peak STFT time-marginal in a frequency band |
| | signal | Low-frequency (0–0.5 Hz) STFT energy |
| | signal | QRS-band (5–25 Hz) STFT energy |
| | signal | High-frequency (100+ Hz) STFT energy |
| | signal | Very-high-frequency normalised power (>150 Hz) |
| | signal | Median QRS amplitude via WaveformMorphology |
| | signal | DTW distance to a reference template (4 types) |
| | nn_intervals | Sample std of NN intervals |
| | nn_intervals | Sample std of successive NN differences |
| | nn_intervals | Root mean square of successive differences |
| | nn_intervals | RMSSD / mean NN (normalised short-term variability) |
| | nn_intervals | SDNN / mean NN (normalised long-term variability) |
| | nn_intervals | Mean of NN intervals |
| | nn_intervals | Median of NN intervals |
| | nn_intervals | % of successive differences strictly > threshold (pNN50) |
| | nn_intervals | Mean heart rate in BPM |
| | nn_intervals | Median heart rate in BPM |
| | nn_intervals | Minimum instantaneous heart rate |
| | nn_intervals | Maximum instantaneous heart rate |
| | nn_intervals | Sample std of instantaneous heart rate |
| | nn_intervals | Fraction of HR values outside [range_min, range_max] |
| | nn_intervals | Spectral peak frequency in the LF band (0.04–0.15 Hz) |
| | nn_intervals | Absolute power in the LF band |
| | nn_intervals | Log power in the LF band |
| | nn_intervals | LF band power / total power |
| | nn_intervals | LF band power / (total power − VLF power), per HRV Task Force 1996 |
| | nn_intervals | LF/HF power ratio (sympatho-vagal balance) |
| | nn_intervals | SD1, SD2, ellipse area, SD1/SD2 ratio |
| | nn_intervals | MAD of successive RR diffs / median RR |
| | nn_intervals | Sample entropy: complexity / regularity of RR series |
| | nn_intervals | DFA alpha1: short-range fractal scaling exponent |
| | nn_intervals | Hurst exponent: long-range correlation (R/S analysis) |
—
Part 3 — Rule-Based Classification
Overview
After SQI extraction, every segment receives an "accept" or
"reject" decision by running a RuleSet over its SQI row. The
workflow:
classify_segments(sqis, rule_dict_filename, ruleset_order)
├── load rule_dict.json
├── [auto_mode] adjust thresholds to observed p5/p95
├── build Rule objects → RuleSet
└── for each segment row:
    RuleSet.execute(row_df) → "accept" | "reject"
Rule Format
Rules are stored in rule_dict.json. Each entry has the structure:
{
    "kurtosis_sqi": {
        "name": "kurtosis_sqi",
        "def": [
            {"op": ">", "value": "0.5", "label": "accept"},
            {"op": "<=", "value": "0.5", "label": "reject"},
            {"op": ">=", "value": "8.0", "label": "reject"},
            {"op": "<", "value": "8.0", "label": "accept"}
        ],
        "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.5 to p95=8.0.",
        "ref": "vital_sqi.calibration"
    }
}
The four-element "def" array encodes an open accept interval
(lower, upper): a value is accepted only when lower < value < upper,
and both endpoints themselves are rejected:
Condition 1&2: value > lower → accept; value <= lower → reject
Condition 3&4: value >= upper → reject; value < upper → accept
The Rule class parses this into a sorted boundaries array and a
labels array, then uses bisect for O(log n) lookup.
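One way such a lookup could be organised is sketched below. It is an assumption-laden illustration rather than the Rule class itself: it keeps one label per open interval between boundaries plus one label per boundary point (so the strict and non-strict operators are honoured), treats a matching "reject" condition as decisive, and rejects NaN explicitly. compile_rule, apply_rule_sketch and label_for are hypothetical names.

```python
import math
import operator
from bisect import bisect_left

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

RULE_DEF = [
    {"op": ">", "value": "0.5", "label": "accept"},
    {"op": "<=", "value": "0.5", "label": "reject"},
    {"op": ">=", "value": "8.0", "label": "reject"},
    {"op": "<", "value": "8.0", "label": "accept"},
]

def label_for(x, conditions):
    """Reference semantics: any matching 'reject' condition wins."""
    hits = [c["label"] for c in conditions if OPS[c["op"]](x, float(c["value"]))]
    return "reject" if not hits or "reject" in hits else "accept"

def compile_rule(conditions):
    """Precompute labels for each interval between boundaries and each boundary."""
    boundaries = sorted({float(c["value"]) for c in conditions})
    probes = [boundaries[0] - 1.0]
    probes += [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    probes.append(boundaries[-1] + 1.0)
    between = [label_for(p, conditions) for p in probes]
    at = [label_for(b, conditions) for b in boundaries]
    return boundaries, between, at

def apply_rule_sketch(x, compiled):
    """O(log n) lookup of the label for x."""
    boundaries, between, at = compiled
    if x is None or math.isnan(x):
        return "reject"                      # missing values fail safe
    i = bisect_left(boundaries, x)
    if i < len(boundaries) and boundaries[i] == x:
        return at[i]                         # x sits exactly on a boundary
    return between[i]

compiled = compile_rule(RULE_DEF)
print([apply_rule_sketch(v, compiled) for v in (0.0, 0.5, 4.0, 8.0, 9.0)])
```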
The Rule class
Rule represents a single threshold rule for one
SQI.
load_def(path): loads the rule from a JSON file by name.
update_def(op_list, value_list, label_list): programmatically sets new thresholds.
apply_rule(x): applies bisect_left on the sorted boundaries array to find the interval containing x and returns its label.
save_def(path, overwrite=False): writes or merges the rule to JSON.
The RuleSet class
RuleSet manages an ordered dict of Rule
objects and executes them sequentially.
execute(value_df)
for order, rule in sorted(self.rules.items()):
    value = value_df.iloc[0][rule.name]
    decision = rule.apply_rule(value)
    if decision == "reject":
        return "reject"
return "accept"
This is a linear early-exit scan, not recursion. Rules are checked
in ascending integer order. The moment any rule returns "reject",
the loop short-circuits and returns immediately without checking the
remaining rules. Only if every rule returns "accept" does the
function return "accept".
This design is:
Optimal — O(R) where R is the number of rules; cannot be done faster than linear since every rule must be checked in the worst case.
Ordered — rule priority is controlled by the integer key; lower key = checked first.
Short-circuiting — reject-heavy rules placed early (low key) minimize average rule evaluations per segment.
Threshold-selection strategies (auto_mode)
classify_segments supports three strategies for deciding the
(lower, upper) accept band on each rule:
auto_mode=False (or "manual")
    Use the bounds stored in rule_dict.json verbatim. Pick this when you want to apply externally calibrated thresholds without adapting them to the current recording.
auto_mode=True (or "quantile"), the default
    Replace each rule’s bounds with the empirical lower_bound / upper_bound quantiles (p5 / p95 by default) of the SQI values observed across all segments. Self-adapting, but the joint accept rate falls off geometrically as you stack more rules: 5 rules at p5/p95 on uniformly clean data only accept ~60 % of segments because each rule independently trims its own tails.
auto_mode="tune"
    Per-rule quantile is computed automatically so the joint accept rate (under the independence approximation) hits target_accept_rate (default 0.85). Solving the standard equation target = (1 - 2q)^n for the symmetric trim q and n rules: with 5 rules and target 0.85 each rule keeps ~96.8 % of its distribution, i.e. bands at p1.6 / p98.4. Much more forgiving than plain "quantile" mode when several rules are active.
Degenerate rules — SQIs whose distribution is constant across the
recording (e.g. zero_crossings_rate_sqi == 0 for mean-centred
clean PPG) — are dropped from the rule set with a warning instead of
producing a “reject everything” 0-width band.
See vital_sqi.rule.auto_threshold for the underlying helpers,
which the Inspect view in the web app reuses for its live preview.
The independence-approximation math
Auto-tune mode picks a per-rule quantile q such that the joint
accept rate hits the user’s target. Under the assumption that rules
are independent — a reasonable simplification for SQIs that measure
different facets of the signal — the joint accept rate is the product
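of the per-rule rates:
$$P_{\text{joint}} = (1 - 2q)^n$$
where each of the n rules keeps a fraction 1 - 2q of its distribution.
Solving for the symmetric trim q given a target p:
$$q = \frac{1 - p^{1/n}}{2}$$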
That’s exactly what
vital_sqi.rule.auto_threshold.per_rule_quantile() computes.
For typical configurations:
| n (rules) | target p | per-rule keep rate | per-rule trim |
|---|---|---|---|
| 1 | 0.85 | 0.850 | 7.5 % (p7.5 / p92.5) |
| 3 | 0.85 | 0.947 | 2.6 % (p2.6 / p97.4) |
| 5 | 0.85 | 0.968 | 1.6 % (p1.6 / p98.4) |
| 10 | 0.85 | 0.984 | 0.8 % (p0.8 / p99.2) |
| 5 | 0.95 | 0.990 | 0.5 % (p0.5 / p99.5) |
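The figures in this table follow directly from the relation target = (1 - 2q)^n. The short sketch below recomputes them; per_rule_trim and joint_accept_rate are hypothetical helper names chosen for illustration and are not claimed to match the signature of per_rule_quantile().

```python
def per_rule_trim(target, n_rules):
    """Symmetric per-rule trim q such that (1 - 2q) ** n_rules == target."""
    keep = target ** (1.0 / n_rules)
    return (1.0 - keep) / 2.0

def joint_accept_rate(trim, n_rules):
    """Joint accept rate of n independent rules, each trimming `trim` per tail."""
    return (1.0 - 2.0 * trim) ** n_rules

for n, target in [(1, 0.85), (3, 0.85), (5, 0.85), (10, 0.85), (5, 0.95)]:
    q = per_rule_trim(target, n)
    print(f"n={n:2d} target={target:.2f} keep={1 - 2 * q:.3f} "
          f"band=p{100 * q:.1f} / p{100 * (1 - q):.1f}")

# Plain p5 / p95 bands on five rules, for comparison with the tuned case.
print(f"p5/p95 with 5 rules: {joint_accept_rate(0.05, 5):.3f}")
```

Running the round trip (compute q from a target, then feed q back into joint_accept_rate) recovers the target, which is a quick sanity check on any change to the formula.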
In practice rules are not perfectly independent (kurtosis and dtw
both react to morphology disturbances, for instance), so the
empirical joint accept rate is typically within ± 3 percentage points
of the target. Good enough for an interactive UI control; if you need
an exact rate, use the iterative widening strategy in
vital_sqi.rule.auto_threshold (not yet exposed, see the
auto_threshold module for the building blocks).
Worked example — auto-tune on a clean PPG recording
The reference recording is a 2-hour OUCRU SmartCare PPG file
(100 Hz, 833 800 samples → 278 segments at 30 s windows). Same SQI
catalogue, same five-rule default subset (kurtosis_sqi,
perfusion_sqi, correlogram_sqi, msq_sqi, dtw_sqi);
only the auto_mode configuration changes:
| | accept rate (278 segments) |
|---|---|
| | 60.1 % (over-restrictive) |
| | 83.8 % |
| | 76.6 % |
| | 80.9 % (matches target) |
| | 83.8 % |
The p5 / p95 row shows the failure mode: with 5 independent rules each
trimming 10 % of the distribution, the joint accept ceiling is
0.9^5 ≈ 59 %, which is close to the 60.1 % the table shows on a clean
recording. Tune mode at 85 % is the recommended default for
interactive use; the GUI’s Threshold mode → Auto-tune slider
exposes this directly.
Implementation outline
classify_segments dispatches on auto_mode in three branches:
if auto_mode == "manual":
    # Use the bounds in rule_dict.json verbatim.
    rule = generate_rule(name, rule_dict[name]["def"])
elif auto_mode == "quantile":
    # Per-column p5/p95-style band; drop degenerate columns.
    band = auto_threshold.quantile_band(
        name, sqi_df[name].values,
        lower_pct=lower_bound, upper_pct=upper_bound,
    )
    if band is None:
        continue  # degenerate or all-NaN; drop with warning
elif auto_mode == "tune":
    # Pre-compute every band so per-rule quantile is uniform.
    all_bands = auto_threshold.tuned_bands(
        {name: sqi_df[name].values for name in candidate_columns},
        target_accept_rate=target_accept_rate,
    )
After the loop the surviving bands populate a single
RuleSet; consecutive integer priorities are
re-assigned so the rule set’s “no gaps starting at 1” invariant
holds even after drops. Classification is then the existing linear
early-exit scan from RuleSet.execute.
Degenerate-band guard
Both quantile and tune modes share a single test:
band_width = upper - lower
if band_width < 1e-6:
    # Column is essentially constant; skip with warning.
    continue
The threshold (DEGENERATE_BAND_HALF_WIDTH = 1e-6) is the same one
used by vital_sqi.calibration.threshold_estimator for its
constant-column epsilon guard. This catches:
zero_crossings_rate_sqi on already-mean-centred clean signals (always 0).
ectopic_sqi on recordings with no detected ectopics (rule_index=0 → always returns the no-ectopic outlier ratio of ~1.0).
hfe_sqi whose default band sits above Nyquist for sampling_rate=100 (always 0).
Each dropped rule logs a warning naming the column and the observed band; the user sees them as Auto-skipped: … in the Inspect view’s rule panel.
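A standalone sketch of a quantile band with this guard is shown below. It assumes linear-interpolation percentiles and returns None for degenerate or empty columns; quantile_band_sketch and percentile are hypothetical helpers and are not the functions in vital_sqi.rule.auto_threshold.

```python
import math

DEGENERATE_WIDTH = 1e-6  # same magnitude as the guard quoted above

def percentile(sorted_vals, pct):
    """Linear-interpolation percentile (pct in 0..100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def quantile_band_sketch(values, lower_pct=5.0, upper_pct=95.0):
    """Return (lower, upper), or None when the column cannot form a usable band."""
    finite = sorted(v for v in values if math.isfinite(v))
    if not finite:
        return None                      # all NaN / Inf
    lower = percentile(finite, lower_pct)
    upper = percentile(finite, upper_pct)
    if upper - lower < DEGENERATE_WIDTH:
        return None                      # essentially constant column
    return lower, upper

spread = quantile_band_sketch([float(i) for i in range(101)])
constant = quantile_band_sketch([0.0] * 50)
missing = quantile_band_sketch([float("nan")] * 3)
print(spread, constant, missing)
```

Returning None rather than a zero-width band lets the caller drop the rule and log a warning instead of rejecting every segment.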
Strictest-rule detection
The Inspect view’s Drop strictest rule button calls
vital_sqi.rule.auto_threshold.strictest_columns() over the
per-rule rejection tally:
counts = {"kurtosis_sqi": 23, "perfusion_sqi": 28, "msq_sqi": 28,
          "correlogram_sqi": 21, "dtw_sqi": 11}
median = 23
mad = 5
threshold = median + 3 * mad  # 23 + 15 = 38
# No outliers: all counts are within bounds.
This uses the modified Z-score (median + k × MAD) rather than
mean + k·σ, because the very outlier you’re trying to detect would
inflate the mean and standard deviation and hide itself. The default
k = 3 flags clear upward outliers without false-positive nags on
roughly-even distributions.
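The median + k × MAD test can be reproduced with the standard library. The sketch below is illustrative only: strictest_sketch is a hypothetical name, and the second tally is invented purely to show a case where one rule is flagged.

```python
import statistics

def strictest_sketch(counts, k=3.0):
    """Return (flagged rule names, threshold) using median + k * MAD."""
    values = list(counts.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    threshold = med + k * mad
    flagged = [name for name, count in counts.items() if count > threshold]
    return flagged, threshold

# Tally from the example above: nothing stands out.
even_counts = {"kurtosis_sqi": 23, "perfusion_sqi": 28, "msq_sqi": 28,
               "correlogram_sqi": 21, "dtw_sqi": 11}
even_flagged, even_threshold = strictest_sketch(even_counts)

# Invented tally where one rule rejects far more segments than the rest.
skewed_counts = {"a": 10, "b": 12, "c": 11, "d": 60}
skewed_flagged, skewed_threshold = strictest_sketch(skewed_counts)

print(even_flagged, even_threshold, skewed_flagged, skewed_threshold)
```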
ruleset_order and get_decision_segments
ruleset_order is a dict mapping integer priority keys to rule names:
ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "sdnn_sqi"}
Only the SQIs listed here participate in classification. All other SQI columns in the DataFrame are ignored.
After classification, get_decision_segments(segments, decision,
reject_decision) separates segments into accepted and rejected lists
by combining the rule-set decision with any pre-existing rejection
flags.
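As an illustration of that combination step, the sketch below accepts a segment only when neither the rule-set decision nor the pre-existing flag says "reject". The exact semantics of get_decision_segments are not spelled out here, so this is an assumption; split_by_decision is a hypothetical helper.

```python
def split_by_decision(segments, decisions, pre_flags, reject_label="reject"):
    """Accept a segment only if neither source marks it as rejected."""
    accepted, rejected = [], []
    for segment, decision, pre_flag in zip(segments, decisions, pre_flags):
        if reject_label in (decision, pre_flag):
            rejected.append(segment)
        else:
            accepted.append(segment)
    return accepted, rejected

accepted, rejected = split_by_decision(
    ["seg0", "seg1", "seg2"],
    ["accept", "reject", "accept"],   # rule-set decisions
    ["accept", "accept", "reject"],   # pre-existing flags
)
print(accepted, rejected)
```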
—
Part 4 — Peak Detection Algorithms
Overview
Peak detection is used by:
All per_beat=True SQIs (DTW, skewness, kurtosis on individual beats).
HRV SQIs: get_nn() calls the PPG/ECG detector to derive RR intervals.
ectopic_sqi, correlogram_sqi, msq_sqi, amplitude_consistency_sqi.
The PeakDetector class exposes two
independent method families: ppg_detector (9 algorithms) and
ecg_detector (4 algorithms).
PPG Detectors
Select via the peak_detector argument in sqi_dict.json or via the
detector_type constant imported from
vital_sqi.common.rpeak_detection.
| Constant | Value | Description |
|---|---|---|
| | 6 | vitalDSP |
| | 1 | Threshold adapts to the local signal amplitude (running max). Suitable for recordings with slow baseline drift. |
| | 2 | Local maxima above a |
| | 3 | KMeans (k=2) separates peaks from non-peaks. Non-deterministic; set |
| | 4 | Zong 2003 slope-sum energy with onset back-search. Robust on noisy PPG; requires |
| | 5 | Elgendi two-moving-average (MApeak > MAbeat). Returns raw-signal peaks within each detected “block of interest”. |
| | 7 | Billauer delta-based peak/trough tracker. Records |
| | 8 | Automatic Multiscale Peak Detection (Scholkmann 2012). Parameter-free; works well on quasi-periodic PPG at 50–250 Hz. |
| | 9 | Local-maxima with inter-beat-interval gating. Filters candidate peaks by physiological IBI range (300–2000 ms). |
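To give a feel for inter-beat-interval gating, here is a deliberately simple sketch. It is not the library's detector: the greedy "keep the taller of two close candidates" policy and the reporting of over-long gaps are assumptions, and local_maxima and gate_by_ibi are hypothetical names.

```python
def local_maxima(signal):
    """Indices strictly higher than the left neighbour and not lower than the right."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]

def gate_by_ibi(signal, fs, min_ibi_ms=300.0, max_ibi_ms=2000.0):
    """Keep candidates at least min_ibi_ms apart; report gaps longer than max_ibi_ms."""
    min_gap = min_ibi_ms * fs / 1000.0
    max_gap = max_ibi_ms * fs / 1000.0
    kept = []
    for p in local_maxima(signal):
        if kept and p - kept[-1] < min_gap:
            if signal[p] > signal[kept[-1]]:
                kept[-1] = p             # taller candidate replaces the earlier one
        else:
            kept.append(p)
    long_gaps = [(a, b) for a, b in zip(kept, kept[1:]) if b - a > max_gap]
    return kept, long_gaps

fs = 10  # Hz, so 300 ms corresponds to 3 samples
signal = [0, 1, 0, 0.4, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0]
peaks, long_gaps = gate_by_ibi(signal, fs)
print(peaks, long_gaps)
```

The small bump at index 3 is discarded because it falls within 300 ms of the taller peak at index 1, while the 1 s gap to the next peak is well inside the physiological range.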
ECG Detectors
Used by ecg_detector(s, detector_type=ECG_DEFAULT). All algorithms
detect R-peaks; Q/S/P/T morphology extraction always uses vitalDSP
WaveformMorphology, anchored to the chosen R-peak indices.
| Constant | Value | Description |
|---|---|---|
| | 10 | vitalDSP |
| | 11 | Classic Pan-Tompkins 1985. Bandpass 5–15 Hz → derivative → squaring → 150 ms MWI → adaptive dual threshold with 2 s learning period. Back-projects within ±60 ms for true R-peak. |
| | 12 | Hamilton-Tompkins simplified 2002. Bandpass 8–16 Hz → first difference → 80 ms moving average → mean+0.5σ threshold. Single-pass, suitable for real-time use. |
| | 13 | Engzee-Zeelenberg 1979. High-pass filtered derivative with a dynamic threshold that adapts after each detected beat. |
Usage example:
from vital_sqi.common.rpeak_detection import (
    PeakDetector, ECG_DEFAULT, PAN_TOMPKINS, HAMILTON, ENGZEE,
    DEFAULT, AMPD_METHOD
)
ppg_detector = PeakDetector(wave_type="PPG", fs=100)
peaks, troughs = ppg_detector.ppg_detector(ppg_signal) # DEFAULT
peaks, troughs = ppg_detector.ppg_detector(ppg_signal, AMPD_METHOD)
ecg_detector = PeakDetector(wave_type="ECG", fs=256)
r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal) # ECG_DEFAULT
r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal, PAN_TOMPKINS)
—
Part 5 — Calibration System
Purpose
The calibration system derives lower and upper threshold bounds
for every SQI empirically, using large pools of synthetic clean and
noisy signals. Results are written to:
vital_sqi/resource/rule_dict.json: threshold rules for the classifier.
vital_sqi/resource/sqi_dict.json: SQI function registry (only successfully calibrated SQIs are included).
Architecture
run_calibration.calibrate()
├── 1. generate_clean_ppg / generate_clean_ecg (signal_generator.py)
├── 2. inject_noise × NOISE_PROFILES (noise_injector.py)
│   ├── accept pool: clean + very-mild-noise segments
│   └── reject pool: moderate-to-severe noise segments
├── 3. compute_sqi_distributions(accept + reject) (sqi_runner.py)
├── 4. estimate_thresholds(accept_df, reject_df) (threshold_estimator.py)
└── 5. export_rule_dict + export_sqi_dict (exporter.py)
Step 1 — Signal Generation
generate_clean_ppg() and
generate_clean_ecg()
produce n_segments clean synthetic waveforms using vitalDSP’s
physiological signal generators.
PPG: sampling rate 100 Hz, default segment duration 30 s.
ECG: sampling rate 256 Hz, default segment duration 30 s.
noise_floor=0.0: no noise added at generation time.
Each segment is a (signal_array, fs) tuple.
Step 2 — Noise Injection
inject_noise() applies one
of the noise types listed below to a clean segment:
| Type | Description |
|---|---|
| | Additive white Gaussian noise scaled to |
| | Sinusoidal drift at 0.05–0.5 Hz (respiration-like) |
| | 3 burst-style high-amplitude artifacts at random positions |
| | Sum of sine/cosine harmonics at a random base frequency 1–15 Hz |
| | 50 Hz or 60 Hz sinusoidal interference |
| | Degree 2–4 polynomial trend (temperature / electrode drift) |
| | Random isolated spikes at ~2% of samples |
| | 1/f colored noise via FFT shaping |
| | 1/f² colored noise (Brownian motion-like) |
| | Circular roll by a random number of samples (sync jitter) |
| | Resample ±amplitude% then trim/pad (wrong fs label simulation) |
| | Replace amplitude% of samples with their predecessor (packet loss) |
| | Gaussian + baseline wander + polynomial (mild realistic composite) |
| | Gaussian + harmonic + motion + impulse (severe realistic composite) |
NOISE_PROFILES defines 33 (label, noise_type, amplitude) triples.
The accept pool includes:
All n_segments pure clean segments.
All segments from profiles labelled clean or gaussian_very_mild (amplitude 0.02, barely perceptible noise).
The reject pool includes segments from all other profiles (amplitude
0.05–0.80, representing clinically poor signal quality). The reject pool
uses only n_reject_segments (default 50) per profile for speed — it
is used for diagnostics only and does not influence the p5/p95 bounds.
Step 3 — SQI Distribution Computation
compute_sqi_distributions()
runs all 47 SQIs over every segment in both pools:
accept_df = compute_sqi_distributions(accept_segments, wave_type="PPG")
reject_df = compute_sqi_distributions(reject_segments, wave_type="PPG")
Each call returns a DataFrame with one row per segment and one column
per SQI output. poincare_sqi expands to 4 sub-columns (sd1, sd2,
area, ratio). Failed SQIs are NaN for that segment.
_patch_fs automatically updates any sample_rate or
sampling_rate keyword argument to the actual segment sampling rate.
Step 4 — Threshold Estimation
estimate_thresholds()
processes each SQI column independently:
for each column in accept_df:
    accept_vals = drop NaN and Inf
    if n_valid < 5:
        skip (mark calibrated=False)
    lower = percentile(accept_vals, lower_pct)  # default p5
    upper = percentile(accept_vals, upper_pct)  # default p95
    if upper - lower < 1e-6:
        widen band by epsilon (constant signal guard)
    if reject distribution overlaps accept band:
        record note (thresholds are NOT changed)
    mark calibrated=True
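The pseudocode above translates fairly directly into runnable Python. The sketch below handles a single column and returns a plain dict instead of an SQIThreshold; the symmetric widening by eps and the wording of the note are assumptions, and estimate_threshold_sketch and percentile are hypothetical helpers.

```python
import math

def percentile(sorted_vals, pct):
    """Linear-interpolation percentile (pct in 0..100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def estimate_threshold_sketch(accept_vals, reject_vals=(), lower_pct=5.0,
                              upper_pct=95.0, eps=1e-6, min_valid=5):
    vals = sorted(v for v in accept_vals if math.isfinite(v))  # drop NaN and Inf
    if len(vals) < min_valid:
        return {"calibrated": False, "n_accept": len(vals)}
    lower = percentile(vals, lower_pct)
    upper = percentile(vals, upper_pct)
    if upper - lower < eps:
        lower, upper = lower - eps, upper + eps   # assumed widening for constant columns
    rejects = [v for v in reject_vals if math.isfinite(v)]
    inside = sum(1 for v in rejects if lower < v < upper)
    # Overlap is only recorded; the bounds themselves are left untouched.
    note = f"{inside}/{len(rejects)} reject values inside accept band" if inside else ""
    return {"calibrated": True, "lower": lower, "upper": upper,
            "n_accept": len(vals), "n_reject": len(rejects), "note": note}

clean = [float(i) for i in range(101)]
result = estimate_threshold_sketch(clean, reject_vals=[50.0, 200.0, float("nan")])
too_few = estimate_threshold_sketch([1.0, float("nan")])
constant = estimate_threshold_sketch([2.0] * 10)
print(result, too_few, constant)
```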
The result for each column is a SQIThreshold
dataclass containing:
| Field | Meaning |
|---|---|
| | Accept region bounds |
| | Statistics of the clean distribution |
| | Median of the noisy distribution |
| | Sample counts used |
| | False if SQI could not be calibrated |
| | Human-readable diagnostic note |
SQIs that cannot be calibrated (always NaN on synthetic data):
interpolation_sqi: unimplemented stub, always 0.
vhfp_sqi: band (>150 Hz) is above the Nyquist limit for 100/256 Hz signals.
sample_entropy_sqi, dfa_sqi, hurst_sqi: require natural HRV variability; synthetic signals have constant RR intervals (std=0), making entropy/DFA/Hurst undefined. These SQIs are fully functional on real recordings.
Step 5 — Export
rule_dict.json — written by
export_rule_dict().
Each calibrated SQI produces a four-element "def" array encoding the
accept interval (lower, upper):
"kurtosis_sqi": {
    "name": "kurtosis_sqi",
    "def": [
        {"op": ">", "value": "0.497", "label": "accept"},
        {"op": "<=", "value": "0.497", "label": "reject"},
        {"op": ">=", "value": "8.21", "label": "reject"},
        {"op": "<", "value": "8.21", "label": "accept"}
    ],
    "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.497 to p95=8.21."
}
Existing entries for SQIs that were not calibrated in the current run are preserved unchanged (merge, not overwrite). This protects manually curated thresholds.
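The entry layout and the merge behaviour can be sketched as follows. The three-significant-digit formatting and both helper names (build_rule_entry, merge_rules) are assumptions for illustration; the actual exporter may format values and handle files differently.

```python
def build_rule_entry(name, lower, upper, wave_type="PPG"):
    """Build the four-condition entry for an accept interval (lower, upper)."""
    lo, hi = f"{lower:.3g}", f"{upper:.3g}"   # assumed number formatting
    return {
        "name": name,
        "def": [
            {"op": ">", "value": lo, "label": "accept"},
            {"op": "<=", "value": lo, "label": "reject"},
            {"op": ">=", "value": hi, "label": "reject"},
            {"op": "<", "value": hi, "label": "accept"},
        ],
        "desc": (f"Calibrated from synthetic {wave_type} signals. "
                 f"Accept: p5={lo} to p95={hi}."),
    }

def merge_rules(existing, calibrated):
    """Calibrated entries replace same-named ones; all other entries survive."""
    merged = dict(existing)
    merged.update(calibrated)
    return merged

existing = {
    "manual_sqi": {"name": "manual_sqi", "def": [], "desc": "hand-curated"},
    "kurtosis_sqi": {"name": "kurtosis_sqi", "def": [], "desc": "stale"},
}
calibrated = {"kurtosis_sqi": build_rule_entry("kurtosis_sqi", 0.497, 8.21)}
merged = merge_rules(existing, calibrated)
print(sorted(merged), merged["kurtosis_sqi"]["desc"])
```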
sqi_dict.json — written by
export_sqi_dict().
Only SQIs that were successfully calibrated AND appear in
_SQI_ARG_TEMPLATES are emitted. The output is a ready-to-use
configuration file for extract_sqi().
Running Calibration
From the repo root:
# PPG calibration (default: 200 accept segments, 50 reject per profile)
python -m vital_sqi.calibration.run_calibration --wave_type PPG
# ECG calibration
python -m vital_sqi.calibration.run_calibration --wave_type ECG
# Dry run — compute thresholds but do not write files
python -m vital_sqi.calibration.run_calibration --wave_type PPG --dry_run
# All options
python -m vital_sqi.calibration.run_calibration \
    --wave_type PPG \
    --n_segments 500 \
    --n_reject_segments 100 \
    --duration 30 \
    --lower_pct 5 \
    --upper_pct 95 \
    --output_dir path/to/output \
    --seed 42
Or from Python:
from vital_sqi.calibration.run_calibration import calibrate
thresholds = calibrate(
    wave_type="PPG",
    n_segments=200,
    n_reject_segments=50,
    duration=30.0,
    lower_pct=5.0,
    upper_pct=95.0,
    dry_run=False,
    seed=42,
)
The returned thresholds dict maps SQI column names to
SQIThreshold objects
and can be inspected with
thresholds_to_dataframe().
—
Part 6 — Complete End-to-End Example
import vital_sqi.data.signal_io as sio
from vital_sqi.preprocess.segment_split import split_segment
from vital_sqi.pipeline.pipeline_functions import extract_sqi, classify_segments, get_decision_segments
# 1. Load signal
df = sio.read_ppg("recording.csv", sampling_rate=100)
# 2. Segment into 30-second windows
segments, milestones = split_segment(df, duration=30, fs=100)
# 3. Extract SQIs
sqi_df = extract_sqi(
    segments, milestones,
    "vital_sqi/resource/sqi_dict.json",
    wave_type="PPG",
)
# 4. Classify — each segment gets 'accept' or 'reject'
ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "correlogram_sqi"}
ruleset, sqis_with_decisions = classify_segments(
    [sqi_df.iloc[[i]] for i in range(len(sqi_df))],
    rule_dict_filename="vital_sqi/resource/rule_dict.json",
    ruleset_order=ruleset_order,
    auto_mode=True,
)
# 5. Separate accepted / rejected segments
decisions = [df["decision"].iloc[0] for df in sqis_with_decisions]
pre_reject = ["accept"] * len(segments) # or from get_reject_segments()
accepted, rejected = get_decision_segments(segments, decisions, pre_reject)
print(f"Accepted: {len(accepted)} Rejected: {len(rejected)}")
—
Performance Notes
A few hot paths have been hand-tuned and are worth knowing about when benchmarking or extending the library:
_argspec_cache in vital_sqi.pipeline.pipeline_functions memoises inspect.getfullargspec per SQI callable. The cache uses a sentinel value so SQIs with no positional arguments are still cached after the first call (an earlier or-based lookup silently recomputed for empty lists).
DTW reference templates are cached in vital_sqi.sqi.dtw_sqi keyed by (template_type, template_size). Because rr_process() is now seeded by default, ecg_dynamic_template is reproducible between runs and the cached template is identical from one call to the next.
sample_entropy_sqi vectorises the inner template-match counter with sliding_window_view; complexity remains O(n²) but the inner loop runs in NumPy rather than Python.
dfa_sqi uses a closed-form per-block linear detrend (no np.polyfit call per block), giving a 10-20× speed-up on long recordings.
squeeze_template is fully vectorised via a cumulative-sum trick; template generation for DTW is now negligible cost.
nn_intervals are computed once per segment by extract_segment_sqi() and injected into every HRV SQI via a private _nn_intervals= kwarg.
Numerical Edge Cases
The following SQIs intentionally return np.nan (not 0) when their
underlying assumption fails — so downstream rules can treat the result as
“missing” rather than as a legitimate quality score:
hf_energy_sqi / vhf_norm_power_sqi: band lies above Nyquist
sample_entropy_sqi: either the m-length or (m+1)-length template match count is zero (would otherwise produce ±inf)
perfusion_sqi: raw-signal mean is near zero
baseline_wander_sqi / spectral_snr_sqi: total or out-of-band power is zero
amplitude_consistency_sqi: fewer than two peaks detected
Rules built from the calibrated rule_dict.json treat NaN as
"reject" (see apply_rule()), so these guards
fail safe.
See Also
vital_sqi.sqi: all SQI functions and sqi_mapping
vital_sqi.rule: Rule and RuleSet classes
vital_sqi.calibration: calibration subpackage API