Calibration (vital_sqi.calibration)
Automated derivation of SQI accept/reject thresholds. The calibration
workflow generates clean synthetic signals, injects parameterised noise,
computes SQIs over both pools, and writes a rule_dict.json and
sqi_dict.json that can be loaded by the pipeline.
Top-level runner
Top-level calibration experiment runner.
Usage (from repo root)
    python -m vital_sqi.calibration.run_calibration --wave_type PPG
    python -m vital_sqi.calibration.run_calibration --wave_type ECG
    python -m vital_sqi.calibration.run_calibration --wave_type PPG --n_segments 500 --dry_run
What it does
1. Generate n_segments clean signals (noise_floor=0).
2. For every noise profile in NOISE_PROFILES that is labelled as clean, accumulate the resulting segments into the accept pool.
3. For every noise profile labelled as reject-level, generate n_reject_segments degraded signals and accumulate them into the reject pool.
4. Compute SQIs over both pools via compute_sqi_distributions().
5. Estimate lower_pct/upper_pct (default p5/p95) thresholds from the accept pool.
6. Export rule_dict.json and sqi_dict.json to output_dir.
7. Write a diagnostics CSV alongside the outputs.
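A minimal sketch of how steps 2 and 3 could route profiles into the two pools, assuming each clean-labelled profile contributes n_segments segments and each reject-level profile contributes n_reject_segments. plan_pools is a hypothetical helper, not part of vital_sqi, and only five entries of NOISE_PROFILES are copied in so the snippet runs on its own.

```python
# Hypothetical sketch: routing noise profiles into accept/reject pools.
# Not vital_sqi code; the profile list is a small subset of NOISE_PROFILES.
CLEAN_PROFILE_LABELS = {"clean", "gaussian_very_mild"}

# (label, noise_type, amplitude) triples copied from NOISE_PROFILES
PROFILES = [
    ("clean", "gaussian", 0.0),
    ("gaussian_very_mild", "gaussian", 0.02),
    ("gaussian_mild", "gaussian", 0.05),
    ("bw_severe", "baseline_wander", 0.5),
    ("dropout_mild", "dropout", 0.02),
]


def plan_pools(profiles, clean_labels, n_segments=200, n_reject_segments=50):
    """Return {"accept": [...], "reject": [...]} of (label, n_to_generate)."""
    pools = {"accept": [], "reject": []}
    for label, _noise_type, _amplitude in profiles:
        if label in clean_labels:
            pools["accept"].append((label, n_segments))
        else:
            pools["reject"].append((label, n_reject_segments))
    return pools


pools = plan_pools(PROFILES, CLEAN_PROFILE_LABELS)
print(pools["accept"])  # [('clean', 200), ('gaussian_very_mild', 200)]
```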
The script prints a summary table on completion showing which SQIs were calibrated and their derived thresholds.
- vital_sqi.calibration.run_calibration.calibrate(wave_type='PPG', n_segments=200, n_reject_segments=50, duration=30.0, lower_pct=5.0, upper_pct=95.0, output_dir=None, dry_run=False, seed=42, show_progress=True)
Run the full calibration experiment and (optionally) export results.
- Parameters:
wave_type (str) – 'PPG' or 'ECG'.
n_segments (int) – Number of clean segments (and accept-pool segments per noise condition).
n_reject_segments (int) – Number of segments per reject-pool noise condition (default 50). Smaller than n_segments is fine: the reject pool is used only for diagnostics and overlap detection, not for the p5/p95 accept-band thresholds.
duration (float) – Segment duration in seconds.
lower_pct (float) – Lower percentile for the accept band (default 5).
upper_pct (float) – Upper percentile for the accept band (default 95).
output_dir (str, optional) – Directory to write rule_dict.json, sqi_dict.json, and the diagnostics CSV. Defaults to vital_sqi/resource/.
dry_run (bool) – If True, compute thresholds but do NOT write any files.
seed (int) – Random seed for reproducibility (default 42).
show_progress (bool) – Show tqdm progress bars.
- Returns:
Calibrated thresholds as returned by estimate_thresholds().
- Return type:
dict
Synthetic signal generation
Generate clean synthetic ECG and PPG segments for calibration.
Each function returns a list of (signal_array, fs) tuples, one per segment. Segments vary heart rate across the physiological range so the SQI distribution covers realistic within-subject variability.
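The return shape and the per-segment heart-rate draw can be illustrated with a toy generator. This is only a sketch: toy_clean_ppg is hypothetical, uses small defaults so it runs quickly, and builds a crude two-harmonic pulse rather than using vitalDSP's synthetic models, so its waveform says nothing about the real generator's morphology.

```python
import numpy as np


def toy_clean_ppg(n_segments=3, duration=10.0, sampling_rate=100,
                  hr_range=(50, 110), rng=None):
    """Hypothetical stand-in: list of (signal_array, fs), one HR per segment."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(duration * sampling_rate)) / sampling_rate
    segments = []
    for _ in range(n_segments):
        hr_bpm = rng.uniform(hr_range[0], hr_range[1])  # per-segment heart rate
        phase = 2 * np.pi * (hr_bpm / 60.0) * t          # beats/s -> phase
        # Fundamental plus a weaker second harmonic gives a skewed pulse
        signal = np.sin(phase) + 0.3 * np.sin(2 * phase + 0.5)
        segments.append((signal, sampling_rate))
    return segments


segs = toy_clean_ppg(rng=np.random.default_rng(42))
print(len(segs), segs[0][0].shape, segs[0][1])  # 3 (1000,) 100
```

Passing a seeded np.random.Generator makes the draw reproducible, which mirrors the role of the rng parameter documented below.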
- vital_sqi.calibration.signal_generator.generate_clean_ecg(n_segments=200, duration=30.0, sampling_rate=256, hr_range=(50, 110), noise_floor=0.0, rng=None)
Generate clean synthetic ECG segments.
- Parameters:
n_segments (int) – Number of independent segments to generate.
duration (float) – Length of each segment in seconds.
sampling_rate (int) – Samples per second (vitalDSP ECG uses sfecg).
hr_range (tuple of int) – (min_hr, max_hr) in bpm; each segment gets a randomly drawn HR.
noise_floor (float) – Additive noise amplitude (Anoise in vitalDSP units). Use 0.0 for clean signals.
rng (np.random.Generator, optional) – Random number generator for reproducibility.
- Returns:
Each element is (signal_array: np.ndarray, fs: int).
- Return type:
list of tuple
- vital_sqi.calibration.signal_generator.generate_clean_ppg(n_segments=200, duration=30.0, sampling_rate=100, hr_range=(50, 110), noise_floor=0.0, rng=None)
Generate clean synthetic PPG segments.
- Parameters:
n_segments (int) – Number of independent segments to generate.
duration (float) – Length of each segment in seconds.
sampling_rate (int) – Samples per second.
hr_range (tuple of int) – (min_hr, max_hr) in bpm; each segment gets a randomly drawn HR.
noise_floor (float) – Baseline noise amplitude (fraction of peak-to-peak). Use 0.0 for truly clean signals.
rng (np.random.Generator, optional) – Random number generator for reproducibility.
- Returns:
Each element is (signal_array: np.ndarray, fs: int).
- Return type:
list of tuple
Noise injection
Noise injection for calibration experiments.
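Each noise function takes a clean signal array and returns a degraded copy. For the additive noise types, amplitudes are expressed as a fraction of the signal’s peak-to-peak range, so they scale correctly regardless of the raw signal magnitude; time_shift, clock_drift and sample_dropout interpret amplitude differently, as described in their entries.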
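A minimal sketch of peak-to-peak scaling, assuming the noise is a standard-normal draw multiplied by amplitude × np.ptp(signal). scaled_gaussian_noise is a hypothetical name, and the library may apply the scale differently. The check at the end shows why relative amplitudes matter: multiplying the signal by 1000 multiplies the injected noise by the same factor.

```python
import numpy as np


def scaled_gaussian_noise(signal, amplitude, rng):
    """Add white Gaussian noise scaled by amplitude x peak-to-peak (assumed)."""
    signal = np.asarray(signal, dtype=float)
    scale = amplitude * np.ptp(signal)  # relative, unit-free strength
    return signal + scale * rng.standard_normal(signal.shape)


x = np.sin(np.linspace(0, 4 * np.pi, 500))
y_small = scaled_gaussian_noise(x, 0.05, np.random.default_rng(1))
y_large = scaled_gaussian_noise(1000 * x, 0.05, np.random.default_rng(1))
# Same seed, so the injected noise grows with the signal's range
same = np.allclose(y_large - 1000 * x, 1000 * (y_small - x))
print(same)  # True
```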
Available noise types
- gaussian: additive white Gaussian noise
- baseline_wander: slow sinusoidal drift (0.05–0.5 Hz)
- motion: burst-style motion artifacts (random amplitude spikes)
- harmonic: sum of sine/cosine harmonics at multiples of a base frequency
- powerline: 50 Hz or 60 Hz sinusoidal interference
- polynomial: low-degree polynomial trend added across the segment
- impulse: random isolated spike impulses
- colored: pink (1/f) or brown (1/f²) noise
- time_shift: circular shift of the whole segment
- clock_drift: resampling at a perturbed sampling rate
- dropout: dropped samples replaced by duplicated neighbours
NOISE_PROFILES is a list of (label, noise_type, amplitude) triples that
define the experiments run by the calibration pipeline.
- vital_sqi.calibration.noise_injector.CLEAN_PROFILE_LABELS = {'clean', 'gaussian_very_mild'}
Profiles considered “clean enough” to contribute to the accept distribution. Any profile NOT in this set contributes to the reject distribution.
- vital_sqi.calibration.noise_injector.NOISE_PROFILES = [
    ('clean', 'gaussian', 0.0), ('gaussian_very_mild', 'gaussian', 0.02), ('gaussian_mild', 'gaussian', 0.05), ('gaussian_moderate', 'gaussian', 0.15), ('gaussian_severe', 'gaussian', 0.4), ('gaussian_extreme', 'gaussian', 0.8),
    ('bw_mild', 'baseline_wander', 0.05), ('bw_moderate', 'baseline_wander', 0.2), ('bw_severe', 'baseline_wander', 0.5),
    ('motion_mild', 'motion', 0.1), ('motion_moderate', 'motion', 0.4), ('motion_severe', 'motion', 0.8),
    ('harmonic_mild', 'harmonic', 0.05), ('harmonic_moderate', 'harmonic', 0.2), ('harmonic_severe', 'harmonic', 0.5),
    ('powerline_mild', 'powerline', 0.05), ('powerline_severe', 'powerline', 0.3),
    ('poly_mild', 'polynomial', 0.1), ('poly_severe', 'polynomial', 0.5),
    ('impulse_mild', 'impulse', 0.05), ('impulse_severe', 'impulse', 0.3),
    ('pink_mild', 'pink', 0.05), ('pink_severe', 'pink', 0.3),
    ('brown_mild', 'brown', 0.05), ('brown_severe', 'brown', 0.3),
    ('combined_mild', 'combined_mild', 0.1), ('combined_severe', 'combined_severe', 0.5),
    ('shift_mild', 'time_shift', 0.05), ('shift_severe', 'time_shift', 0.2),
    ('drift_mild', 'clock_drift', 0.02), ('drift_severe', 'clock_drift', 0.05),
    ('dropout_mild', 'dropout', 0.02), ('dropout_severe', 'dropout', 0.1)]
Full set of noise conditions used in calibration experiments. Each entry is (label, noise_type, amplitude), where amplitude is the strength parameter passed to the noise function (a fraction of the signal peak-to-peak range for the additive noise types).
- vital_sqi.calibration.noise_injector.baseline_wander(signal, amplitude, rng, fs=100)
Slow sinusoidal drift at a random frequency in [0.05, 0.5] Hz, mimicking respiration-induced baseline wander.
- vital_sqi.calibration.noise_injector.clock_drift(signal, amplitude, rng, fs=100)
Simulate clock drift by resampling the signal at a perturbed rate.
The effective sampling rate is stretched or compressed by the amplitude fraction (e.g. amplitude=0.02 → ±2% clock error). The signal is resampled to a perturbed length and then trimmed or zero-padded back to the original length, simulating what happens when data is labelled with the wrong fs.
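The resample, trim and zero-pad sequence can be sketched with linear interpolation. Assumptions: np.interp stands in for whatever resampler the library uses (it may use scipy.signal.resample instead), the drift direction is drawn at random, and the full amplitude is applied rather than a random value within ±amplitude. clock_drift_sketch is hypothetical.

```python
import numpy as np


def clock_drift_sketch(signal, amplitude, rng):
    """Resample to a perturbed length, then trim or zero-pad to the original."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    sign = rng.choice([-1.0, 1.0])                 # stretch or compress (assumed)
    new_len = max(2, int(round(n * (1.0 + sign * amplitude))))
    # Linearly interpolate the original samples onto new_len evenly spaced points
    old_grid = np.linspace(0.0, 1.0, n)
    new_grid = np.linspace(0.0, 1.0, new_len)
    resampled = np.interp(new_grid, old_grid, signal)
    if new_len >= n:
        return resampled[:n]                       # trim the excess
    return np.pad(resampled, (0, n - new_len))     # zero-pad the shortfall


x = np.sin(2 * np.pi * 1.2 * np.arange(1000) / 100)
y = clock_drift_sketch(x, 0.02, np.random.default_rng(3))
print(y.shape)  # (1000,)
```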
- vital_sqi.calibration.noise_injector.colored_noise(signal, amplitude, rng, exponent=1.0)
Colored noise with power spectrum ∝ 1/f^exponent.
exponent=1 → pink noise, exponent=2 → brown noise. Generated via FFT shaping of white noise.
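A sketch of FFT shaping: take white noise, divide each rFFT bin by f^(exponent/2) so that power falls off as 1/f^exponent, then invert. The unit peak-to-peak normalisation before applying amplitude × peak-to-peak is an assumption, as is reusing the first non-zero frequency at DC; colored_noise_sketch is a hypothetical name.

```python
import numpy as np


def colored_noise_sketch(signal, amplitude, rng, exponent=1.0):
    """Add 1/f**exponent noise (1 = pink, 2 = brown) shaped in the FFT domain."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(rng.standard_normal(n))  # white noise spectrum
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1] if n > 1 else 1.0           # avoid dividing by zero at DC
    # Power ~ 1/f**exponent means magnitude ~ 1/f**(exponent / 2)
    spectrum /= freqs ** (exponent / 2.0)
    noise = np.fft.irfft(spectrum, n=n)
    noise /= np.ptp(noise) or 1.0                   # unit peak-to-peak (assumed)
    return signal + amplitude * np.ptp(signal) * noise


x = np.sin(np.linspace(0, 20 * np.pi, 4096))
y = colored_noise_sketch(x, 0.3, np.random.default_rng(0), exponent=2.0)  # brown
power = np.abs(np.fft.rfft(y - x)) ** 2
print(power[1:100].mean() > power[-100:].mean())
```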
- vital_sqi.calibration.noise_injector.gaussian_noise(signal, amplitude, rng)
Additive white Gaussian noise scaled to amplitude × peak-to-peak.
- vital_sqi.calibration.noise_injector.harmonic_interference(signal, amplitude, rng, fs=100)
Sum of sine and cosine harmonics at a random base frequency in the 1–15 Hz range, with up to 5 harmonics. Mimics periodic electrical interference or muscle noise.
- vital_sqi.calibration.noise_injector.impulse_noise(signal, amplitude, rng, rate=0.02)
Random isolated spike impulses affecting a fraction rate of the samples (default 0.02, i.e. 2%). Positive and negative spikes occur with equal probability.
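A minimal sketch of impulse injection, assuming every spike has height amplitude × peak-to-peak and that spike positions are drawn without replacement. impulse_noise_sketch is hypothetical; the library may randomise spike heights.

```python
import numpy as np


def impulse_noise_sketch(signal, amplitude, rng, rate=0.02):
    """Add +/- spikes of height amplitude x peak-to-peak at a `rate` fraction of samples."""
    out = np.array(signal, dtype=float, copy=True)
    scale = amplitude * np.ptp(out)                            # from the clean signal
    n_spikes = int(round(rate * out.size))
    idx = rng.choice(out.size, size=n_spikes, replace=False)   # distinct positions
    signs = rng.choice([-1.0, 1.0], size=n_spikes)             # equally likely signs
    out[idx] += signs * scale
    return out


x = np.sin(np.linspace(0, 8 * np.pi, 1000))
y = impulse_noise_sketch(x, 0.3, np.random.default_rng(5))
print(np.count_nonzero(y != x))  # 20
```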
- vital_sqi.calibration.noise_injector.inject_noise(signal, noise_type, amplitude, rng, fs=100)
Apply a named noise type to a clean signal.
- Parameters:
signal (np.ndarray) – Clean input signal.
noise_type (str) – One of the keys in NOISE_PROFILES or a raw noise name: 'gaussian', 'baseline_wander', 'motion', 'harmonic', 'powerline', 'polynomial', 'impulse', 'pink', 'brown', 'combined_mild', 'combined_severe'.
amplitude (float) – Noise amplitude as a fraction of signal peak-to-peak.
rng (np.random.Generator) – Random number generator.
fs (int) – Sampling frequency in Hz (required by time-dependent noise types).
- Returns:
Degraded signal (same length as input).
- Return type:
np.ndarray
- vital_sqi.calibration.noise_injector.motion_artifact(signal, amplitude, rng, fs=100, n_bursts=3)
Random burst artifacts: short high-amplitude segments injected at random positions, mimicking movement in wearable recordings.
- vital_sqi.calibration.noise_injector.polynomial_trend(signal, amplitude, rng)
Polynomial trend of degree 2–4 added across the segment. Mimics slow non-stationary drift or temperature effects.
- vital_sqi.calibration.noise_injector.powerline_noise(signal, amplitude, rng, fs=100)
50 Hz or 60 Hz sinusoidal powerline interference with a random phase.
- vital_sqi.calibration.noise_injector.sample_dropout(signal, amplitude, rng, fs=100)
Randomly drop and duplicate samples to simulate packet loss and jitter.
For each dropped sample, a neighbouring sample is duplicated to keep the signal length constant.
amplitude controls the fraction of samples affected (e.g. 0.05 → 5% dropout rate).
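One plausible reading of drop-and-duplicate is a hold-last-value replacement: each dropped sample is overwritten by its left neighbour, so the length never changes. This is an assumption, not the library's exact scheme; sample_dropout_sketch is hypothetical.

```python
import numpy as np


def sample_dropout_sketch(signal, amplitude, rng):
    """Overwrite an `amplitude` fraction of samples with their left neighbour."""
    out = np.array(signal, dtype=float, copy=True)
    n = out.size
    n_drop = int(round(amplitude * n))
    if n_drop == 0 or n < 2:
        return out
    # Distinct positions, never index 0 (it has no left neighbour)
    idx = np.sort(rng.choice(np.arange(1, n), size=min(n_drop, n - 1), replace=False))
    for i in idx:                    # ascending order lets consecutive holds chain
        out[i] = out[i - 1]
    return out


x = np.arange(1000, dtype=float)
y = sample_dropout_sketch(x, 0.05, np.random.default_rng(7))
print(y.shape, np.count_nonzero(y != x))  # (1000,) 50
```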
- vital_sqi.calibration.noise_injector.time_shift(signal, amplitude, rng, fs=100)
Circularly shift the signal by a random number of samples.
Simulates late electrode attachment, trigger jitter, or sync misalignment. The shift magnitude is up to amplitude * fs samples (e.g. amplitude=0.05 at fs=100 allows a shift of up to 5 samples, i.e. 50 ms). Circular wrapping preserves signal length without introducing zero-padding edge artifacts.
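A sketch of the circular shift with np.roll, assuming the shift is drawn uniformly from -amplitude·fs to +amplitude·fs samples (the direction and distribution are not documented). time_shift_sketch is hypothetical.

```python
import numpy as np


def time_shift_sketch(signal, amplitude, rng, fs=100):
    """Circularly shift by a random integer in [-k, k], where k = amplitude * fs."""
    signal = np.asarray(signal)
    max_shift = int(round(amplitude * fs))
    shift = int(rng.integers(-max_shift, max_shift + 1))  # upper bound is exclusive
    return np.roll(signal, shift)                          # samples wrap around


x = np.arange(200)
y = time_shift_sketch(x, 0.05, np.random.default_rng(11), fs=100)
print(y.shape)  # (200,)
```

Because np.roll wraps samples around instead of padding, the shifted segment keeps every original value, which is the property the text attributes to circular wrapping.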
Batch SQI computation
Batch SQI computation over synthetic signal segments.
compute_sqi_distributions is the main entry point. It takes a list of
(signal, fs) tuples, wraps each into the DataFrame format expected by
extract_segment_sqi, runs all configured SQIs, and returns the raw
per-segment SQI values as a DataFrame — one row per segment.
The function handles three special cases transparently:
- dict-returning SQIs (e.g. poincare_sqi) are flattened to multiple columns
- SQIs that raise exceptions emit NaN for that segment (logged, not re-raised)
- nn_intervals-based SQIs are detected and handled by the existing pipeline logic
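The first two special cases (dict flattening and NaN on error) can be sketched with plain callables and pandas. run_sqis_on_segment, the name_key column naming, and the toy SQIs are all hypothetical; the nn_intervals handling is left to the real pipeline and is not modelled here.

```python
import logging
import math

import pandas as pd

logger = logging.getLogger("sqi_runner_sketch")


def run_sqis_on_segment(signal, sqi_funcs):
    """Return one flat {column: value} row for a single segment."""
    row = {}
    for name, func in sqi_funcs.items():
        try:
            value = func(signal)
        except Exception as exc:                 # log and emit NaN, never re-raise
            logger.warning("SQI %s failed: %s", name, exc)
            row[name] = math.nan
            continue
        if isinstance(value, dict):              # flatten poincare-style outputs
            for key, sub_value in value.items():
                row[f"{name}_{key}"] = sub_value
        else:
            row[name] = value
    return row


# Toy SQIs standing in for real ones
toy_sqis = {
    "mean_sqi": lambda s: sum(s) / len(s),
    "poincare_like": lambda s: {"sd1": 0.1, "sd2": 0.2},
    "broken_sqi": lambda s: 1 / 0,
}
segments = [([1.0, 2.0, 3.0], 100), ([2.0, 4.0], 100)]
df = pd.DataFrame([run_sqis_on_segment(sig, toy_sqis) for sig, _fs in segments])
print(df.columns.tolist())
# ['mean_sqi', 'poincare_like_sd1', 'poincare_like_sd2', 'broken_sqi']
```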
- vital_sqi.calibration.sqi_runner.compute_sqi_distributions(segments, wave_type='PPG', sqi_names=None, sqi_arg_list=None, show_progress=True, n_jobs=1)
Compute SQIs for every segment and return a DataFrame of raw values.
- Parameters:
segments (list of tuple) – Each element is (signal_array: np.ndarray, fs: int) as produced by generate_clean_ppg() or generate_clean_ecg().
wave_type (str, optional) – 'PPG' or 'ECG' (default 'PPG').
sqi_names (list of str, optional) – Subset of SQI names to compute. Defaults to all keys in DEFAULT_SQI_ARG_LIST.
sqi_arg_list (dict, optional) – Custom argument dict keyed by SQI name. Any key not present falls back to DEFAULT_SQI_ARG_LIST.
show_progress (bool) – Show tqdm progress bar (default True).
n_jobs (int, optional) – Number of parallel workers for joblib. 1 (default) runs sequentially; -1 uses all available CPU cores.
- Returns:
One row per segment, one column per SQI output. Multi-value SQIs (poincare) produce multiple columns. Failed/errored SQIs are NaN.
- Return type:
pd.DataFrame
Threshold estimator
Derive accept/reject thresholds from empirical SQI distributions.
The core algorithm
- accept_df – SQI values from clean signals (noise_floor ≈ 0)
- reject_df – SQI values from heavily degraded signals
For each SQI column:
    lower = percentile(accept_df[col], lower_pct)   # e.g. p5
    upper = percentile(accept_df[col], upper_pct)   # e.g. p95
The accept region is the open interval (lower, upper).
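The two percentile lines translate directly to NumPy and pandas. Using np.nanpercentile, so that NaN rows from failed SQIs are ignored, is an assumption; percentile_band and is_accepted are hypothetical names. is_accepted implements the open interval, so a value exactly on a bound is rejected.

```python
import numpy as np
import pandas as pd


def percentile_band(accept_df, lower_pct=5.0, upper_pct=95.0):
    """Return {column: (lower, upper)} from the accept distribution."""
    bands = {}
    for col in accept_df.columns:
        values = accept_df[col].to_numpy(dtype=float)
        lower = np.nanpercentile(values, lower_pct)   # NaN rows are ignored
        upper = np.nanpercentile(values, upper_pct)
        bands[col] = (float(lower), float(upper))
    return bands


def is_accepted(value, lower, upper):
    """Open interval (lower, upper): values on a bound are rejected."""
    return lower < value < upper


accept_df = pd.DataFrame({"toy_sqi": list(range(101)) + [np.nan]})
bands = percentile_band(accept_df)
print(is_accepted(5.0, *bands["toy_sqi"]), is_accepted(50.0, *bands["toy_sqi"]))  # False True
```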
Edge cases
- All-NaN column → SQI is skipped (not calibratable).
- Constant column → warns and widens the band by a small epsilon.
- Accept and reject distributions overlap → the reject distribution is stored for reference but does not change the thresholds (the clean-signal distribution is authoritative).
- Very small range → an epsilon guard prevents a zero-width rule.
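The NaN, constant-column and small-range guards can be sketched as follows. EPSILON and the relative widening rule are hypothetical choices, since the actual epsilon is not documented; guarded_band is likewise a hypothetical name.

```python
import warnings

import numpy as np

EPSILON = 1e-6  # hypothetical guard width; the library's value is not documented


def guarded_band(values, lower_pct=5.0, upper_pct=95.0, eps=EPSILON):
    """Return (lower, upper), or None when the column cannot be calibrated."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return None                                   # all-NaN column: skip
    lower = float(np.percentile(values, lower_pct))
    upper = float(np.percentile(values, upper_pct))
    if upper - lower < eps:                           # zero or near-zero width
        if np.ptp(values) == 0:
            warnings.warn("constant SQI column; widening accept band by epsilon")
        center = 0.5 * (lower + upper)
        half = max(eps, abs(center) * eps)            # relative guard for large values
        lower, upper = center - half, center + half
    return lower, upper


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    band = guarded_band([3.0] * 10)
print(len(caught), band[0] < 3.0 < band[1])  # 1 True
```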
- class vital_sqi.calibration.threshold_estimator.SQIThreshold(sqi_name, lower=nan, upper=nan, accept_median=nan, accept_std=nan, reject_median=nan, n_accept=0, n_reject=0, calibrated=False, note='')
Bases: object
Calibrated threshold for a single SQI column.
- Parameters:
- vital_sqi.calibration.threshold_estimator.estimate_thresholds(accept_df, reject_df=None, lower_pct=5.0, upper_pct=95.0)
Derive accept/reject bounds for every SQI column.
- Parameters:
accept_df (pd.DataFrame) – SQI values from clean segments. One row per segment, one column per SQI. All-NaN columns are skipped.
reject_df (pd.DataFrame, optional) – SQI values from degraded segments. Used only for diagnostics (the reject distribution does not change the threshold values).
lower_pct (float) – Lower percentile of the accept distribution that defines the accept lower bound (default 5).
upper_pct (float) – Upper percentile defining the accept upper bound (default 95).
- Returns:
Mapping of sqi_name → SQIThreshold. Only calibratable SQIs are included.
- Return type:
dict
- vital_sqi.calibration.threshold_estimator.thresholds_to_dataframe(thresholds)
Convert a thresholds dict to a summary DataFrame for inspection.
- Parameters:
thresholds (dict) – Output of estimate_thresholds().
- Returns:
One row per SQI with columns: sqi_name, lower, upper, accept_median, accept_std, reject_median, n_accept, n_reject, calibrated, note.
- Return type:
pd.DataFrame
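For inspection, a dataclass mirroring the documented SQIThreshold signature converts cleanly into the summary DataFrame via dataclasses.asdict. That SQIThreshold is a dataclass is an assumption (the page only shows Bases: object); SQIThresholdSketch, thresholds_to_dataframe_sketch and the toy values are hypothetical, not calibration results.

```python
import math
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class SQIThresholdSketch:
    """Fields and defaults copied from the documented SQIThreshold signature."""
    sqi_name: str
    lower: float = math.nan
    upper: float = math.nan
    accept_median: float = math.nan
    accept_std: float = math.nan
    reject_median: float = math.nan
    n_accept: int = 0
    n_reject: int = 0
    calibrated: bool = False
    note: str = ""


def thresholds_to_dataframe_sketch(thresholds):
    """One row per threshold object, columns in field order."""
    return pd.DataFrame([asdict(t) for t in thresholds.values()])


# Toy values for illustration only
thresholds = {
    "toy_sqi_a": SQIThresholdSketch("toy_sqi_a", lower=0.2, upper=0.9, calibrated=True),
    "toy_sqi_b": SQIThresholdSketch("toy_sqi_b", lower=-1.0, upper=1.0, calibrated=True),
}
summary = thresholds_to_dataframe_sketch(thresholds)
print(summary[["sqi_name", "lower", "upper", "calibrated"]])
```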
Exporter
Export calibrated thresholds to rule_dict.json and sqi_dict.json.
rule_dict format (one entry per SQI)
Each entry uses the paired operator structure required by the rule engine:
    {
      "sqi_name": {
        "name": "sqi_name",
        "def": [
          {"op": ">",  "value": "<lower>", "label": "accept"},
          {"op": "<=", "value": "<lower>", "label": "reject"},
          {"op": ">=", "value": "<upper>", "label": "reject"},
          {"op": "<",  "value": "<upper>", "label": "accept"}
        ],
        "desc": "Calibrated from synthetic <wave_type> signals. Accept: p<lo>-p<hi>.",
        "ref": "vital_sqi.calibration"
      }
    }
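A sketch that builds one entry in this format and evaluates it. Storing value as a float (rather than the string placeholder shown) and the rule-combination logic in classify, where any matching reject rule wins, are assumptions about the rule engine; make_rule_entry and classify are hypothetical.

```python
import json
import operator

# Map the rule_dict "op" strings onto Python comparison functions
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def make_rule_entry(sqi_name, lower, upper, wave_type="PPG",
                    lower_pct=5.0, upper_pct=95.0):
    """Build one rule_dict entry with the paired accept/reject operators."""
    return {
        sqi_name: {
            "name": sqi_name,
            "def": [
                {"op": ">", "value": lower, "label": "accept"},
                {"op": "<=", "value": lower, "label": "reject"},
                {"op": ">=", "value": upper, "label": "reject"},
                {"op": "<", "value": upper, "label": "accept"},
            ],
            "desc": (f"Calibrated from synthetic {wave_type} signals. "
                     f"Accept: p{lower_pct:g}-p{upper_pct:g}."),
            "ref": "vital_sqi.calibration",
        }
    }


def classify(value, rules):
    """Hypothetical evaluator: any matching reject rule wins, otherwise accept."""
    for rule in rules:
        if rule["label"] == "reject" and OPS[rule["op"]](value, rule["value"]):
            return "reject"
    return "accept"


entry = make_rule_entry("toy_sqi", 0.2, 0.9)
rules = entry["toy_sqi"]["def"]
serialised = json.dumps(entry, indent=2)
print([classify(v, rules) for v in (0.1, 0.2, 0.5, 0.9)])
# ['reject', 'reject', 'accept', 'reject']
```

Values exactly on lower or upper are rejected, matching the open accept interval described under Threshold estimator.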
The exporter merges new calibrated entries INTO the existing rule_dict rather than replacing it, so manually curated entries (for SQIs the calibrator cannot reach) are preserved.
- vital_sqi.calibration.exporter.export_diagnostics(thresholds, output_path)
Write a human-readable CSV report of all calibration results.
- Parameters:
thresholds (dict) – Output of estimate_thresholds().
output_path (str) – Destination CSV file path.
- Return type:
None
- vital_sqi.calibration.exporter.export_rule_dict(thresholds, output_path, wave_type='PPG', lower_pct=5.0, upper_pct=95.0, backup=True)
Write or update a rule_dict.json file with calibrated thresholds.
Existing entries whose SQI was NOT calibrated are preserved unchanged. Calibrated entries overwrite their previous counterparts.
- Parameters:
thresholds (dict) – Output of estimate_thresholds(). Only entries with calibrated=True are written.
output_path (str) – Destination file path (created if absent).
wave_type (str) – Used in the desc field for documentation purposes.
lower_pct (float) – Lower percentile used during estimation (for documentation).
upper_pct (float) – Upper percentile used during estimation (for documentation).
backup (bool) – If True and the file already exists, a timestamped backup is made before overwriting (default True).
- Return type:
None
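The merge-and-backup behaviour can be sketched with the standard library: a dict union keeps every existing key and lets calibrated entries overwrite theirs. merge_rule_dict and the .<timestamp>.bak naming are hypothetical; the real exporter builds the entries from the thresholds dict and writes only calibrated=True SQIs.

```python
import json
import os
import shutil
import tempfile
from datetime import datetime


def merge_rule_dict(output_path, calibrated_entries, backup=True):
    """Keep existing entries, overwrite calibrated ones, back up the old file."""
    existing = {}
    if os.path.exists(output_path):
        with open(output_path, encoding="utf-8") as fh:
            existing = json.load(fh)
        if backup:  # timestamped copy before overwriting (naming is hypothetical)
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            shutil.copy2(output_path, f"{output_path}.{stamp}.bak")
    merged = {**existing, **calibrated_entries}  # calibrated keys replace old ones
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, indent=2)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rule_dict.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"manual_sqi": {"name": "manual_sqi"},
                   "toy_sqi": {"name": "toy_sqi", "def": []}}, fh)
    merged = merge_rule_dict(path, {"toy_sqi": {"name": "toy_sqi", "def": ["new"]}})
    with open(path, encoding="utf-8") as fh:
        on_disk = json.load(fh)
    backups = [f for f in os.listdir(tmp) if f.endswith(".bak")]
```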
- vital_sqi.calibration.exporter.export_sqi_dict(thresholds, output_path, backup=True)
Write a sqi_dict.json containing only successfully calibrated SQIs.
Entries for SQIs that could not be calibrated are excluded, so the output template is safe to use directly in extract_sqi.
- Parameters:
thresholds (dict) – Output of estimate_thresholds().
output_path (str) – Destination file path.
backup (bool) – If the file exists, make a timestamped backup before overwriting.
- Return type:
None