SQI Pipeline and Calibration ============================ This page describes in full detail: 1. How raw physiological signals are segmented and passed through the SQI extraction pipeline. 2. How each SQI is computed, routed, and packaged into a result DataFrame. 3. How the rule-based classifier works and makes accept/reject decisions. 4. How the calibration system derives thresholds empirically from synthetic signals. All source lives in ``vital_sqi/pipeline/pipeline_functions.py``, ``vital_sqi/rule/``, and ``vital_sqi/calibration/``. --- .. contents:: Table of Contents :local: :depth: 2 --- Part 1 — SQI Extraction Pipeline --------------------------------- Overview ~~~~~~~~ The extraction pipeline transforms a list of signal segments into a structured DataFrame of quality metrics. The call chain is:: extract_sqi(segments, milestones, sqi_dict_filename) └── for each segment: extract_segment_sqi(segment, sqi_list, sqi_names, sqi_arg_list, wave_type) ├── [once] get_nn(signal) # nn_intervals cache └── for each SQI: get_sqi(sqi_func, sqi_name, segment, ...) ├── [if per_beat] per_beat_sqi(...) └── [else] sqi_func(signal, **kwargs) └── get_sqi_dict(result, sqi_name) Step 1 — ``extract_sqi`` (entry point) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python df = extract_sqi(segments, milestones, "vital_sqi/resource/sqi_dict.json", wave_type="PPG") **Inputs** ``segments`` A list of DataFrames produced by :func:`~vital_sqi.preprocess.segment_split.split_segment`. Each DataFrame has two columns: ``time`` (timestamps) and ``signal`` (raw waveform values). ``milestones`` A two-column DataFrame with ``start_idx`` and ``end_idx`` (sample positions in the original recording) for each segment. ``sqi_dict_filename`` Path to a JSON configuration file. Each top-level key is a user-chosen label; each value is a dict with: - ``"sqi"`` — the function name registered in ``sqi_mapping`` (e.g. ``"kurtosis_sqi"``). - ``"args"`` — keyword arguments forwarded to the SQI function. Example:: { "perfusion": {"sqi": "perfusion_sqi", "args": {}}, "kurtosis": {"sqi": "kurtosis_sqi", "args": {"axis": 0}}, "poincare": {"sqi": "poincare_sqi", "args": {}} } **What it does** 1. Loads ``sqi_dict_filename`` and resolves each ``"sqi"`` key to a callable from ``sqi_mapping``. 2. Iterates over every segment (with a ``tqdm`` progress bar). 3. Calls :func:`extract_segment_sqi` for each segment, collecting a ``pd.Series`` of SQI values. 4. Assembles all series into a ``pd.DataFrame`` (one row per segment). 5. Appends ``start_idx`` / ``end_idx`` columns from *milestones*. **Output** A DataFrame with one row per segment. Columns are the SQI labels from the config file. Multi-value SQIs (e.g. ``poincare_sqi`` which returns ``sd1``, ``sd2``, ``area``, ``ratio``) expand into sub-columns. Per-beat SQIs that return lists expand into ``_mean_sqi``, ``_median_sqi``, and ``_std_sqi`` columns. Step 2 — ``extract_segment_sqi`` (per-segment) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Called once per segment. Responsible for: - Extracting the raw signal array from the segment DataFrame once (avoiding repeated ``iloc`` calls inside the loop). - Computing ``nn_intervals`` **once** and caching the result for reuse by all HRV SQIs (see :ref:`nn-caching`). - Iterating over every (function, name) pair, calling :func:`get_sqi`, and collecting results. Special handling: - ``perfusion_sqi`` requires two signals (raw + filtered). When it appears in the list, :func:`extract_segment_sqi` passes ``y=signal_values`` directly and skips the standard dispatch path. - Any SQI that raises an exception is caught; a warning is emitted and that SQI column receives ``NaN`` for the segment. .. _nn-caching: nn_intervals Caching ^^^^^^^^^^^^^^^^^^^^^ HRV SQIs (``sdnn_sqi``, ``rmssd_sqi``, ``poincare_sqi``, etc.) take ``nn_intervals`` as their first positional argument rather than the raw signal. NN intervals are derived from peak detection, which involves a Kalman filter and is expensive (~14 ms per segment at 100 Hz). Without caching, ``get_nn`` would be called once per HRV SQI (12–18 calls per segment). The optimisation: ``extract_segment_sqi`` inspects the first argument name of each SQI function using ``inspect.getfullargspec`` (cached in ``_argspec_cache`` at module level so introspection runs only once per function lifetime) and routes HRV SQIs through a single shared ``_nn_cache``: .. code-block:: python _nn_cache = None for sqi_func, sqi_name in zip(sqi_list, sqi_names): first_arg = _argspec_cache[sqi_func][0] if first_arg == "nn_intervals": if _nn_cache is None: _nn_cache = get_nn(signal_values) # computed once args["_nn_intervals"] = _nn_cache # injected into get_sqi This reduces peak-detection calls from N (one per HRV SQI) to 1 per segment, giving a ~3.6x speedup. Step 3 — ``get_sqi`` (per-SQI dispatch) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Handles two execution modes: **Whole-segment mode** (``per_beat=False``, default) The SQI function is called on the full signal array (or on ``nn_intervals`` for HRV SQIs). The result — scalar, dict, or list — is packaged by :func:`get_sqi_dict`. **Per-beat mode** (``per_beat=True``) :func:`extract_segment_sqi` runs peak detection **once** per segment on the first per-beat SQI encountered and caches the resulting ``peak_list`` / ``trough_list``. All subsequent per-beat SQIs in the same segment reuse the cached peaks without re-running the detector. :func:`get_sqi` accepts the cached peaks via the ``_peak_list`` / ``_trough_list`` private keyword arguments; :func:`per_beat_sqi` then slices the signal into individual beats and calls the SQI function on each. If ``use_mean_beat=True`` (default), all beats are resampled to ``mean_resample_size`` samples, averaged, and the SQI is computed once on the mean beat (one value per segment). ``wave_type`` is forwarded to every SQI function whose signature includes a ``wave_type`` parameter. ``sampling_rate`` / ``sample_rate`` arguments in ``sqi_arg_list`` are automatically updated to the actual signal sampling rate before calling (via ``_patch_fs`` in the calibration runner). Step 4 — ``get_sqi_dict`` (result packaging) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Converts whatever the SQI function returns into a ``{column: value}`` dict that maps cleanly to DataFrame columns: .. list-table:: ``get_sqi_dict`` rules :header-rows: 1 :widths: 35 65 * - Raw return type - Column(s) produced * - ``dict`` - Returned unchanged (e.g. ``poincare_sqi`` → ``{"sd1": …, "sd2": …, "area": …, "ratio": …}``). * - scalar (``float`` / ``int``) - ``{sqi_name: scalar}``. * - 1-element ``list`` / ``ndarray`` - ``{sqi_name: value}``. * - multi-element ``list`` / ``ndarray`` - ``{sqi_name_mean_sqi, sqi_name_median_sqi, sqi_name_std_sqi}``. --- Part 2 — SQI Catalogue ----------------------- The following table lists every SQI registered in ``sqi_mapping``. SQIs whose first argument is ``nn_intervals`` are automatically routed through the cached NN-interval path. All others receive the raw signal. .. list-table:: Registered SQIs :header-rows: 1 :widths: 30 15 55 * - Function name - Input type - What it measures * - ``perfusion_sqi`` - raw + filtered - ``(max - min of filtered) / abs(mean of raw) * 100`` * - ``kurtosis_sqi`` - signal - Tail heaviness of the amplitude distribution * - ``skewness_sqi`` - signal - Asymmetry of the amplitude distribution * - ``entropy_sqi`` - signal - Shannon entropy of the normalised amplitude histogram * - ``signal_to_noise_sqi`` - signal - ``abs(mean) / std`` — higher is cleaner * - ``zero_crossings_rate_sqi`` - signal - Rate of sign changes — high value indicates noise * - ``mean_crossing_rate_sqi`` - signal - Rate at which signal crosses its own mean * - ``clipping_sqi`` - signal - Fraction of samples at the amplitude rail (saturation) * - ``baseline_wander_sqi`` - signal - Low-frequency (<0.5 Hz) energy fraction — drift indicator * - ``spectral_snr_sqi`` - signal - In-band vs out-of-band power ratio (dB) * - ``ectopic_sqi`` - signal - Ratio of ectopic/outlier RR intervals * - ``correlogram_sqi`` - signal - Mean of top ACF peaks — periodicity score * - ``msq_sqi`` - signal - Peak-detector agreement ratio between two algorithms * - ``amplitude_consistency_sqi`` - signal - Beat-to-beat peak amplitude coefficient of variation * - ``band_energy_sqi`` - signal - Peak STFT time-marginal in a frequency band * - ``lfe_sqi`` - signal - Low-frequency (0–0.5 Hz) STFT energy * - ``qrse_sqi`` - signal - QRS-band (5–25 Hz) STFT energy * - ``hfe_sqi`` - signal - High-frequency (100+ Hz) STFT energy * - ``vhfp_sqi`` - signal - Very-high-frequency normalised power (>150 Hz) * - ``qrsa_sqi`` - signal - Median QRS amplitude via WaveformMorphology * - ``dtw_sqi`` - signal - DTW distance to a reference template (4 types) * - ``sdnn_sqi`` - nn_intervals - Sample std of NN intervals * - ``sdsd_sqi`` - nn_intervals - Sample std of successive NN differences * - ``rmssd_sqi`` - nn_intervals - Root mean square of successive differences * - ``cvsd_sqi`` - nn_intervals - RMSSD / mean NN (normalised short-term variability) * - ``cvnn_sqi`` - nn_intervals - SDNN / mean NN (normalised long-term variability) * - ``mean_nn_sqi`` - nn_intervals - Mean of NN intervals * - ``median_nn_sqi`` - nn_intervals - Median of NN intervals * - ``pnn_sqi`` - nn_intervals - % of successive differences strictly > threshold (pNN50) * - ``hr_mean_sqi`` - nn_intervals - Mean heart rate in BPM * - ``hr_median_sqi`` - nn_intervals - Median heart rate in BPM * - ``hr_min_sqi`` - nn_intervals - Minimum instantaneous heart rate * - ``hr_max_sqi`` - nn_intervals - Maximum instantaneous heart rate * - ``hr_std_sqi`` - nn_intervals - Sample std of instantaneous heart rate * - ``hr_range_sqi`` - nn_intervals - Fraction of HR values outside [range_min, range_max] * - ``peak_frequency_sqi`` - nn_intervals - Spectral peak frequency in the LF band (0.04–0.15 Hz) * - ``absolute_power_sqi`` - nn_intervals - Absolute power in the LF band * - ``log_power_sqi`` - nn_intervals - Log power in the LF band * - ``relative_power_sqi`` - nn_intervals - LF band power / total power * - ``normalized_power_sqi`` - nn_intervals - LF band power / (total power − VLF power), per HRV Task Force 1996 * - ``lf_hf_ratio_sqi`` - nn_intervals - LF/HF power ratio (sympatho-vagal balance) * - ``poincare_sqi`` - nn_intervals - SD1, SD2, ellipse area, SD1/SD2 ratio * - ``rr_irregularity_sqi`` - nn_intervals - MAD of successive RR diffs / median RR * - ``sample_entropy_sqi`` - nn_intervals - Sample entropy — complexity / regularity of RR series * - ``dfa_sqi`` - nn_intervals - DFA alpha1 — short-range fractal scaling exponent * - ``hurst_sqi`` - nn_intervals - Hurst exponent — long-range correlation (R/S analysis) --- Part 3 — Rule-Based Classification ------------------------------------ Overview ~~~~~~~~ After SQI extraction, every segment receives an ``"accept"`` or ``"reject"`` decision by running a ``RuleSet`` over its SQI row. The workflow:: classify_segments(sqis, rule_dict_filename, ruleset_order) ├── load rule_dict.json ├── [auto_mode] adjust thresholds to observed p5/p95 ├── build Rule objects → RuleSet └── for each segment row: RuleSet.execute(row_df) → "accept" | "reject" Rule Format ~~~~~~~~~~~ Rules are stored in ``rule_dict.json``. Each entry has the structure: .. code-block:: json { "kurtosis_sqi": { "name": "kurtosis_sqi", "def": [ {"op": ">", "value": "0.5", "label": "accept"}, {"op": "<=", "value": "0.5", "label": "reject"}, {"op": ">=", "value": "8.0", "label": "reject"}, {"op": "<", "value": "8.0", "label": "accept"} ], "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.5 to p95=8.0.", "ref": "vital_sqi.calibration" } } The four-element ``"def"`` array encodes a half-open accept interval ``(lower, upper)``: - Condition 1&2: value > lower → accept; value <= lower → reject - Condition 3&4: value >= upper → reject; value < upper → accept The ``Rule`` class parses this into a sorted ``boundaries`` array and a ``labels`` array, then uses ``bisect`` for O(log n) lookup. The ``Rule`` class ~~~~~~~~~~~~~~~~~~ :class:`~vital_sqi.rule.Rule` represents a single threshold rule for one SQI. - ``load_def(path)`` — loads rule from JSON file by name. - ``update_def(op_list, value_list, label_list)`` — programmatically sets new thresholds. - ``apply_rule(x)`` — applies ``bisect_left`` on the sorted boundaries array to find the interval containing ``x``, returns its label. - ``save_def(path, overwrite=False)`` — writes or merges the rule to JSON. The ``RuleSet`` class ~~~~~~~~~~~~~~~~~~~~~ :class:`~vital_sqi.rule.RuleSet` manages an ordered dict of ``Rule`` objects and executes them sequentially. ``execute(value_df)`` ^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python for order, rule in sorted(self.rules.items()): value = value_df.iloc[0][rule.name] decision = rule.apply_rule(value) if decision == "reject": return "reject" return "accept" **This is a linear early-exit scan, not recursion.** Rules are checked in ascending integer order. The moment any rule returns ``"reject"``, the loop short-circuits and returns immediately without checking the remaining rules. Only if every rule returns ``"accept"`` does the function return ``"accept"``. This design is: - **Optimal** — O(R) where R is the number of rules; cannot be done faster than linear since every rule must be checked in the worst case. - **Ordered** — rule priority is controlled by the integer key; lower key = checked first. - **Short-circuiting** — reject-heavy rules placed early (low key) minimize average rule evaluations per segment. Threshold-selection strategies (``auto_mode``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``classify_segments`` supports three strategies for deciding the ``(lower, upper)`` accept band on each rule: ``auto_mode=False`` (or ``"manual"``) Use the bounds stored in ``rule_dict.json`` verbatim. Pick this when you want to apply externally calibrated thresholds without adapting them to the current recording. ``auto_mode=True`` (or ``"quantile"``) — *default* Replace each rule's bounds with the empirical *lower_bound* / *upper_bound* quantiles (p5 / p95 by default) of the SQI values observed across all segments. Self-adapting, but the joint accept rate falls off geometrically as you stack more rules: 5 rules at p5/p95 on uniformly clean data only accept ~60 % of segments because each rule independently trims its own tails. ``auto_mode="tune"`` Per-rule quantile is computed automatically so the **joint** accept rate (under the independence approximation) hits ``target_accept_rate`` (default ``0.85``). Solving the standard equation ``target = (1 - 2q)^n`` for the symmetric trim ``q`` and ``n`` rules: with 5 rules and target 0.85 each rule keeps ~96.8 % of its distribution, i.e. bands at p1.6 / p98.4. Much more forgiving than plain ``"quantile"`` mode when several rules are active. Degenerate rules — SQIs whose distribution is constant across the recording (e.g. ``zero_crossings_rate_sqi == 0`` for mean-centred clean PPG) — are dropped from the rule set with a warning instead of producing a "reject everything" 0-width band. See :mod:`vital_sqi.rule.auto_threshold` for the underlying helpers, which the Inspect view in the web app reuses for its live preview. The independence-approximation math ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Auto-tune mode picks a per-rule quantile ``q`` such that the *joint* accept rate hits the user's target. Under the assumption that rules are independent — a reasonable simplification for SQIs that measure different facets of the signal — the joint accept rate is the product of the per-rule rates: .. math:: P(\text{accept}) = \prod_{i=1}^{n} P(\text{accept}_i) = (1 - 2q)^{n} Solving for the symmetric trim ``q`` given a target ``p``: .. math:: q = \frac{1 - p^{1/n}}{2} That's exactly what :func:`vital_sqi.rule.auto_threshold.per_rule_quantile` computes. For typical configurations: ============ ======== ================== =================== ``n_rules`` ``p`` per-rule keep rate per-rule trim ``q`` ============ ======== ================== =================== 1 0.85 0.850 7.5 % (p7.5 / p92.5) 3 0.85 0.947 2.6 % (p2.6 / p97.4) 5 0.85 0.968 1.6 % (p1.6 / p98.4) 10 0.85 0.984 0.8 % (p0.8 / p99.2) 5 0.95 0.990 0.5 % (p0.5 / p99.5) ============ ======== ================== =================== In practice rules are not perfectly independent — kurtosis and dtw both react to morphology disturbances, for instance — so the empirical joint accept rate is typically ± 3 percentage points of the target. Good enough for an interactive UI control; if you need an exact rate, use the iterative widening strategy in :mod:`vital_sqi.rule.auto_threshold` (not yet exposed, see the auto_threshold module for the building blocks). Worked example — auto-tune on a clean PPG recording ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The reference recording is a 2-hour OUCRU SmartCare PPG file (100 Hz, 833 800 samples → 278 segments at 30 s windows). Same SQI catalogue, same five-rule default subset (``kurtosis_sqi``, ``perfusion_sqi``, ``correlogram_sqi``, ``msq_sqi``, ``dtw_sqi``); only the ``auto_mode`` configuration changes: ============================== =========================================== ``auto_mode`` accept rate (278 segments) ============================== =========================================== ``"quantile"``, p5 / p95 **60.1 %** — over-restrictive ``"quantile"``, p1 / p99 83.8 % ``"tune"``, target 0.80 76.6 % ``"tune"``, target 0.85 **80.9 %** — matches target ``"tune"``, target 0.90 83.8 % ============================== =========================================== The p5 / p95 row shows the failure mode: with 5 independent rules each trimming 10 % of the distribution, the joint accept ceiling is ``0.9^5 ≈ 59 %`` — which is exactly what the table shows on a clean recording. Tune mode at 85 % is the recommended default for interactive use; the GUI's *Threshold mode → Auto-tune* slider exposes this directly. Implementation outline ^^^^^^^^^^^^^^^^^^^^^^ ``classify_segments`` dispatches on ``auto_mode`` in three branches:: if auto_mode == "manual": # Use the bounds in rule_dict.json verbatim. rule = generate_rule(name, rule_dict[name]["def"]) elif auto_mode == "quantile": # Per-column p5/p95-style band; drop degenerate columns. band = auto_threshold.quantile_band( name, sqi_df[name].values, lower_pct=lower_bound, upper_pct=upper_bound, ) if band is None: continue # degenerate or all-NaN; drop with warning elif auto_mode == "tune": # Pre-compute every band so per-rule quantile is uniform. all_bands = auto_threshold.tuned_bands( {name: sqi_df[name].values for name in candidate_columns}, target_accept_rate=target_accept_rate, ) After the loop the surviving bands populate a single :class:`~vital_sqi.rule.RuleSet`; consecutive integer priorities are re-assigned so the rule set's "no gaps starting at 1" invariant holds even after drops. Classification is then the existing linear early-exit scan from ``RuleSet.execute``. Degenerate-band guard ^^^^^^^^^^^^^^^^^^^^^ Both ``quantile`` and ``tune`` modes share a single test:: band_width = upper - lower if band_width < 1e-6: # Column is essentially constant; skip with warning. continue The threshold (``DEGENERATE_BAND_HALF_WIDTH = 1e-6``) is the same one used by :mod:`vital_sqi.calibration.threshold_estimator` for its constant-column epsilon guard. This catches: * ``zero_crossings_rate_sqi`` on already-mean-centred clean signals (always 0). * ``ectopic_sqi`` on recordings with no detected ectopics (``rule_index=0`` → always returns the no-ectopic outlier ratio of ~1.0). * ``hfe_sqi`` whose default band sits above Nyquist for ``sampling_rate=100`` (always 0). Each dropped rule logs a warning naming the column and the observed band; the user sees them as *Auto-skipped: …* in the Inspect view's rule panel. Strictest-rule detection ^^^^^^^^^^^^^^^^^^^^^^^^ The Inspect view's *Drop strictest rule* button calls :func:`vital_sqi.rule.auto_threshold.strictest_columns` over the per-rule rejection tally:: counts = {"kurtosis_sqi": 23, "perfusion_sqi": 28, "msq_sqi": 28, "correlogram_sqi": 21, "dtw_sqi": 11} median = 23 mad = 5 threshold = median + 3*MAD = 38 # No outliers — all counts within bounds. This uses the **modified Z-score** (median + ``k`` × MAD) rather than mean + k·σ, because the very outlier you're trying to detect would inflate the mean and standard deviation and hide itself. The default ``k = 3`` flags clear upward outliers without false-positive nags on roughly-even distributions. ``ruleset_order`` and ``get_decision_segments`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``ruleset_order`` is a dict mapping integer priority keys to rule names:: ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "sdnn_sqi"} Only the SQIs listed here participate in classification. All other SQI columns in the DataFrame are ignored. After classification, ``get_decision_segments(segments, decision, reject_decision)`` separates segments into accepted and rejected lists by combining the rule-set decision with any pre-existing rejection flags. --- Part 4 — Peak Detection Algorithms ------------------------------------ Overview ~~~~~~~~ Peak detection is used by: - All ``per_beat=True`` SQIs (DTW, skewness, kurtosis on individual beats). - HRV SQIs — ``get_nn()`` calls the PPG/ECG detector to derive RR intervals. - ``ectopic_sqi``, ``correlogram_sqi``, ``msq_sqi``, ``amplitude_consistency_sqi``. The :class:`~vital_sqi.common.rpeak_detection.PeakDetector` class exposes two independent method families: ``ppg_detector`` (9 algorithms) and ``ecg_detector`` (4 algorithms). PPG Detectors ~~~~~~~~~~~~~ Select via the ``peak_detector`` argument in ``sqi_dict.json`` or via the ``detector_type`` constant imported from ``vital_sqi.common.rpeak_detection``. .. list-table:: PPG detection algorithms :header-rows: 1 :widths: 10 25 65 * - Constant - Value - Description * - ``DEFAULT`` - 6 - vitalDSP ``WaveformMorphology.systolic_peaks`` — height + distance threshold on the band-passed signal. **Recommended default.** * - ``ADAPTIVE_THRESHOLD`` - 1 - Threshold adapts to the local signal amplitude (running max). Suitable for recordings with slow baseline drift. * - ``COUNT_ORIG_METHOD`` - 2 - Local maxima above a ``0.75``-quantile × 0.2 threshold. Fast but sensitive to non-zero DC baselines. * - ``CLUSTERER_METHOD`` - 3 - KMeans (k=2) separates peaks from non-peaks. Non-deterministic; set ``random_state`` for reproducibility. * - ``SLOPE_SUM_METHOD`` - 4 - Zong 2003 slope-sum energy with onset back-search. Robust on noisy PPG; requires ``fs``-scaled windows. * - ``MOVING_AVERAGE_METHOD`` - 5 - Elgendi two-moving-average (MA\ :sub:`peak` > MA\ :sub:`beat`). Returns raw-signal peaks within each detected "block of interest". * - ``BILLAUER_METHOD`` - 7 - Billauer delta-based peak/trough tracker. Records ``mxpos`` (true peak sample) not the detection crossing. * - ``AMPD_METHOD`` - 8 - Automatic Multiscale Peak Detection (Scholkmann 2012). Parameter-free; works well on quasi-periodic PPG at 50–250 Hz. * - ``LOCAL_MAX_IBI`` - 9 - Local-maxima with inter-beat-interval gating. Filters candidate peaks by physiological IBI range (300–2000 ms). ECG Detectors ~~~~~~~~~~~~~ Used by ``ecg_detector(s, detector_type=ECG_DEFAULT)``. All algorithms detect R-peaks; Q/S/P/T morphology extraction always uses vitalDSP ``WaveformMorphology``, anchored to the chosen R-peak indices. .. list-table:: ECG detection algorithms :header-rows: 1 :widths: 15 8 77 * - Constant - Value - Description * - ``ECG_DEFAULT`` - 10 - vitalDSP ``WaveformMorphology`` derivative-square-MWI with height threshold. Provides R + Q/S/P/T in a single call. **Recommended default.** * - ``PAN_TOMPKINS`` - 11 - Classic Pan-Tompkins 1985. Bandpass 5–15 Hz → derivative → squaring → 150 ms MWI → adaptive dual threshold with 2 s learning period. Back-projects within ±60 ms for true R-peak. * - ``HAMILTON`` - 12 - Hamilton-Tompkins simplified 2002. Bandpass 8–16 Hz → first difference → 80 ms moving average → mean+0.5σ threshold. Single-pass, suitable for real-time use. * - ``ENGZEE`` - 13 - Engzee-Zeelenberg 1979. High-pass filtered derivative with a dynamic threshold that adapts after each detected beat. Usage example:: from vital_sqi.common.rpeak_detection import ( PeakDetector, ECG_DEFAULT, PAN_TOMPKINS, HAMILTON, ENGZEE, DEFAULT, AMPD_METHOD ) ppg_detector = PeakDetector(wave_type="PPG", fs=100) peaks, troughs = ppg_detector.ppg_detector(ppg_signal) # DEFAULT peaks, troughs = ppg_detector.ppg_detector(ppg_signal, AMPD_METHOD) ecg_detector = PeakDetector(wave_type="ECG", fs=256) r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal) # ECG_DEFAULT r, q, s, p, t = ecg_detector.ecg_detector(ecg_signal, PAN_TOMPKINS) --- Part 5 — Calibration System ---------------------------- Purpose ~~~~~~~ The calibration system derives ``lower`` and ``upper`` threshold bounds for every SQI empirically, using large pools of synthetic clean and noisy signals. Results are written to: - ``vital_sqi/resource/rule_dict.json`` — threshold rules for the classifier. - ``vital_sqi/resource/sqi_dict.json`` — SQI function registry (only successfully calibrated SQIs are included). Architecture ~~~~~~~~~~~~ :: run_calibration.calibrate() ├── 1. generate_clean_ppg / generate_clean_ecg (signal_generator.py) ├── 2. inject_noise × NOISE_PROFILES (noise_injector.py) │ ├── accept pool: clean + very-mild-noise segments │ └── reject pool: moderate-to-severe noise segments ├── 3. compute_sqi_distributions(accept + reject) (sqi_runner.py) ├── 4. estimate_thresholds(accept_df, reject_df) (threshold_estimator.py) └── 5. export_rule_dict + export_sqi_dict (exporter.py) Step 1 — Signal Generation ~~~~~~~~~~~~~~~~~~~~~~~~~~~ :func:`~vital_sqi.calibration.signal_generator.generate_clean_ppg` and :func:`~vital_sqi.calibration.signal_generator.generate_clean_ecg` produce ``n_segments`` clean synthetic waveforms using ``vitalDSP``'s physiological signal generators. - PPG: sampling rate 100 Hz, default segment duration 30 s. - ECG: sampling rate 256 Hz, default segment duration 30 s. - ``noise_floor=0.0`` — no noise added at generation time. - Each segment is a ``(signal_array, fs)`` tuple. Step 2 — Noise Injection ~~~~~~~~~~~~~~~~~~~~~~~~~ :func:`~vital_sqi.calibration.noise_injector.inject_noise` applies one of 11 noise types to a clean segment: .. list-table:: Noise types :header-rows: 1 :widths: 20 80 * - Type - Description * - ``gaussian`` - Additive white Gaussian noise scaled to ``amplitude × peak-to-peak`` * - ``baseline_wander`` - Sinusoidal drift at 0.05–0.5 Hz (respiration-like) * - ``motion`` - 3 burst-style high-amplitude artifacts at random positions * - ``harmonic`` - Sum of sine/cosine harmonics at a random base frequency 1–15 Hz * - ``powerline`` - 50 Hz or 60 Hz sinusoidal interference * - ``polynomial`` - Degree 2–4 polynomial trend (temperature / electrode drift) * - ``impulse`` - Random isolated spikes at ~2% of samples * - ``pink`` - 1/f colored noise via FFT shaping * - ``brown`` - 1/f² colored noise (Brownian motion-like) * - ``time_shift`` - Circular roll by a random number of samples (sync jitter) * - ``clock_drift`` - Resample ±amplitude% then trim/pad (wrong fs label simulation) * - ``dropout`` - Replace amplitude% of samples with their predecessor (packet loss) * - ``combined_mild`` - Gaussian + baseline wander + polynomial (mild realistic composite) * - ``combined_severe`` - Gaussian + harmonic + motion + impulse (severe realistic composite) ``NOISE_PROFILES`` defines 33 ``(label, noise_type, amplitude)`` triples. The **accept pool** includes: - All ``n_segments`` pure clean segments. - All segments from profiles labelled ``clean`` or ``gaussian_very_mild`` (amplitude 0.02 — barely perceptible noise). The **reject pool** includes segments from all other profiles (amplitude 0.05–0.80, representing clinically poor signal quality). The reject pool uses only ``n_reject_segments`` (default 50) per profile for speed — it is used for **diagnostics only** and does not influence the p5/p95 bounds. Step 3 — SQI Distribution Computation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :func:`~vital_sqi.calibration.sqi_runner.compute_sqi_distributions` runs all 47 SQIs over every segment in both pools: .. code-block:: python accept_df = compute_sqi_distributions(accept_segments, wave_type="PPG") reject_df = compute_sqi_distributions(reject_segments, wave_type="PPG") Each call returns a DataFrame with one row per segment and one column per SQI output. ``poincare_sqi`` expands to 4 sub-columns (sd1, sd2, area, ratio). Failed SQIs are NaN for that segment. ``_patch_fs`` automatically updates any ``sample_rate`` or ``sampling_rate`` keyword argument to the actual segment sampling rate. Step 4 — Threshold Estimation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :func:`~vital_sqi.calibration.threshold_estimator.estimate_thresholds` processes each SQI column independently: .. code-block:: text for each column in accept_df: accept_vals = drop NaN and Inf if n_valid < 5: skip (mark calibrated=False) lower = percentile(accept_vals, lower_pct) # default p5 upper = percentile(accept_vals, upper_pct) # default p95 if upper - lower < 1e-6: widen band by epsilon (constant signal guard) if reject distribution overlaps accept band: record note (thresholds are NOT changed) mark calibrated=True The result for each column is a :class:`~vital_sqi.calibration.threshold_estimator.SQIThreshold` dataclass containing: ========================================= ==================================== Field Meaning ========================================= ==================================== ``lower``, ``upper`` Accept region bounds ``accept_median``, ``accept_std`` Statistics of the clean distribution ``reject_median`` Median of the noisy distribution ``n_accept``, ``n_reject`` Sample counts used ``calibrated`` False if SQI could not be calibrated ``note`` Human-readable diagnostic note ========================================= ==================================== SQIs that cannot be calibrated (always NaN on synthetic data): - ``interpolation_sqi`` — unimplemented stub, always 0. - ``vhfp_sqi`` — band (>150 Hz) is above the Nyquist limit for 100/256 Hz signals. - ``sample_entropy_sqi``, ``dfa_sqi``, ``hurst_sqi`` — require natural HRV variability; synthetic signals have constant RR intervals (std=0), making entropy/DFA/Hurst undefined. These SQIs are fully functional on real recordings. Step 5 — Export ~~~~~~~~~~~~~~~~ **rule_dict.json** — written by :func:`~vital_sqi.calibration.exporter.export_rule_dict`. Each calibrated SQI produces a four-element ``"def"`` array encoding the accept interval ``(lower, upper)``:: "kurtosis_sqi": { "name": "kurtosis_sqi", "def": [ {"op": ">", "value": "0.497", "label": "accept"}, {"op": "<=", "value": "0.497", "label": "reject"}, {"op": ">=", "value": "8.21", "label": "reject"}, {"op": "<", "value": "8.21", "label": "accept"} ], "desc": "Calibrated from synthetic PPG signals. Accept: p5=0.497 to p95=8.21." } Existing entries for SQIs that were **not** calibrated in the current run are **preserved unchanged** (merge, not overwrite). This protects manually curated thresholds. **sqi_dict.json** — written by :func:`~vital_sqi.calibration.exporter.export_sqi_dict`. Only SQIs that were successfully calibrated AND appear in ``_SQI_ARG_TEMPLATES`` are emitted. The output is a ready-to-use configuration file for :func:`~vital_sqi.pipeline.pipeline_functions.extract_sqi`. Running Calibration ~~~~~~~~~~~~~~~~~~~~ From the repo root:: # PPG calibration (default: 200 accept segments, 50 reject per profile) python -m vital_sqi.calibration.run_calibration --wave_type PPG # ECG calibration python -m vital_sqi.calibration.run_calibration --wave_type ECG # Dry run — compute thresholds but do not write files python -m vital_sqi.calibration.run_calibration --wave_type PPG --dry_run # All options python -m vital_sqi.calibration.run_calibration \ --wave_type PPG \ --n_segments 500 \ --n_reject_segments 100 \ --duration 30 \ --lower_pct 5 \ --upper_pct 95 \ --output_dir path/to/output \ --seed 42 Or from Python: .. code-block:: python from vital_sqi.calibration.run_calibration import calibrate thresholds = calibrate( wave_type="PPG", n_segments=200, n_reject_segments=50, duration=30.0, lower_pct=5.0, upper_pct=95.0, dry_run=False, seed=42, ) The returned ``thresholds`` dict maps SQI column names to :class:`~vital_sqi.calibration.threshold_estimator.SQIThreshold` objects and can be inspected with :func:`~vital_sqi.calibration.threshold_estimator.thresholds_to_dataframe`. --- Part 6 — Complete End-to-End Example -------------------------------------- .. code-block:: python import vital_sqi.data.signal_io as sio from vital_sqi.preprocess.segment_split import split_segment from vital_sqi.pipeline.pipeline_functions import extract_sqi, classify_segments, get_decision_segments # 1. Load signal df = sio.read_ppg("recording.csv", sampling_rate=100) # 2. Segment into 30-second windows segments, milestones = split_segment(df, duration=30, fs=100) # 3. Extract SQIs sqi_df = extract_sqi( segments, milestones, "vital_sqi/resource/sqi_dict.json", wave_type="PPG", ) # 4. Classify — each segment gets 'accept' or 'reject' ruleset_order = {1: "kurtosis_sqi", 2: "perfusion_sqi", 3: "correlogram_sqi"} ruleset, sqis_with_decisions = classify_segments( [sqi_df.iloc[[i]] for i in range(len(sqi_df))], rule_dict_filename="vital_sqi/resource/rule_dict.json", ruleset_order=ruleset_order, auto_mode=True, ) # 5. Separate accepted / rejected segments decisions = [df["decision"].iloc[0] for df in sqis_with_decisions] pre_reject = ["accept"] * len(segments) # or from get_reject_segments() accepted, rejected = get_decision_segments(segments, decisions, pre_reject) print(f"Accepted: {len(accepted)} Rejected: {len(rejected)}") --- Performance Notes ----------------- A few hot paths have been hand-tuned and are worth knowing about when benchmarking or extending the library: - **``_argspec_cache`** in :mod:`vital_sqi.pipeline.pipeline_functions` memoises ``inspect.getfullargspec`` per SQI callable. The cache uses a sentinel value so SQIs with **no positional arguments** are still cached after the first call (an earlier ``or``-based lookup silently recomputed for empty lists). - **DTW reference templates** are cached in :mod:`vital_sqi.sqi.dtw_sqi` keyed by ``(template_type, template_size)``. Because :func:`~vital_sqi.common.generate_template.rr_process` is now seeded by default, ``ecg_dynamic_template`` is reproducible between runs and the cached template is identical from one call to the next. - **``sample_entropy_sqi``** vectorises the inner template-match counter with ``sliding_window_view``; complexity remains O(n²) but the inner loop runs in NumPy rather than Python. - **``dfa_sqi``** uses a closed-form per-block linear detrend (no ``np.polyfit`` call per block), giving a 10-20× speed-up on long recordings. - **``squeeze_template``** is fully vectorised via a cumulative-sum trick; template generation for DTW is now negligible cost. - **``nn_intervals``** are computed once per segment by :func:`~vital_sqi.pipeline.pipeline_functions.extract_segment_sqi` and injected into every HRV SQI via a private ``_nn_intervals=`` kwarg. Numerical Edge Cases -------------------- The following SQIs intentionally return ``np.nan`` (not ``0``) when their underlying assumption fails — so downstream rules can treat the result as "missing" rather than as a legitimate quality score: - ``hf_energy_sqi`` / ``vhf_norm_power_sqi`` — band lies above Nyquist - ``sample_entropy_sqi`` — either the m-length or (m+1)-length template match count is zero (would otherwise produce ``±inf``) - ``perfusion_sqi`` — raw-signal mean is near zero - ``baseline_wander_sqi`` / ``spectral_snr_sqi`` — total or out-of-band power is zero - ``amplitude_consistency_sqi`` — fewer than two peaks detected Rules built from the calibrated ``rule_dict.json`` treat ``NaN`` as ``"reject"`` (see :meth:`~vital_sqi.rule.Rule.apply_rule`), so these guards fail safe. See Also -------- - :py:mod:`vital_sqi.sqi` — all SQI functions and ``sqi_mapping`` - :py:mod:`vital_sqi.rule` — ``Rule`` and ``RuleSet`` classes - :py:mod:`vital_sqi.calibration` — calibration subpackage API - :py:func:`vital_sqi.pipeline.pipeline_functions.extract_sqi` - :py:func:`vital_sqi.pipeline.pipeline_functions.classify_segments` - :py:func:`vital_sqi.calibration.run_calibration.calibrate`