Calibration (vital_sqi.calibration)

Automated derivation of SQI accept/reject thresholds. The calibration workflow generates clean synthetic signals, injects parameterised noise, computes SQIs over both pools, and writes a rule_dict.json and sqi_dict.json that can be loaded by the pipeline.

Top-level runner

Top-level calibration experiment runner.

Usage (from repo root)

    python -m vital_sqi.calibration.run_calibration --wave_type PPG
    python -m vital_sqi.calibration.run_calibration --wave_type ECG
    python -m vital_sqi.calibration.run_calibration --wave_type PPG --n_segments 500 --dry_run

What it does

  1. Generate n_segments clean signals (noise_floor=0).

  2. For every noise profile in NOISE_PROFILES that is labelled as clean, accumulate into the accept pool.

  3. For every noise profile labelled as reject-level, generate n_segments degraded signals and accumulate into the reject pool.

  4. Compute SQIs over both pools via compute_sqi_distributions().

  5. Estimate p5/p95 thresholds from the accept pool.

  6. Export rule_dict.json and sqi_dict.json to output_dir.

  7. Write a diagnostics CSV alongside the outputs.

The script prints a summary table on completion showing which SQIs were calibrated and their derived thresholds.
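The steps above can be illustrated with a toy, self-contained sketch. It does not call vital_sqi: the sine-wave "signals", the toy_sqi metric, and every helper name here are assumptions made purely for illustration, and only steps 1 to 5 are mirrored (nothing is written to disk).

```python
import numpy as np

rng = np.random.default_rng(42)
fs, duration = 100, 30.0
t = np.arange(int(fs * duration)) / fs

def toy_segment(hr_bpm):
    # Stand-in for a clean synthetic segment: a sine at the heart-rate frequency.
    return np.sin(2 * np.pi * (hr_bpm / 60.0) * t)

def add_gaussian(signal, amplitude):
    # Noise std is a fraction of the signal's peak-to-peak range.
    return signal + rng.normal(0.0, amplitude * np.ptp(signal), signal.size)

def toy_sqi(signal):
    # Hypothetical SQI: lag-one autocorrelation (close to 1 for smooth signals).
    return float(np.corrcoef(signal[:-1], signal[1:])[0, 1])

# Step 1: clean segments with a heart rate drawn per segment.
clean = [toy_segment(hr) for hr in rng.uniform(50, 110, size=200)]
# Step 2: "clean enough" profiles feed the accept pool.
accept_pool = clean + [add_gaussian(s, 0.02) for s in clean]
# Step 3: a reject-level profile feeds the (smaller) reject pool.
reject_pool = [add_gaussian(s, 0.8) for s in clean[:50]]

# Step 4: SQI values for both pools.
accept_sqi = np.array([toy_sqi(s) for s in accept_pool])
reject_sqi = np.array([toy_sqi(s) for s in reject_pool])

# Step 5: the accept band comes from the accept pool only.
lower, upper = np.percentile(accept_sqi, [5.0, 95.0])
print(f"accept band: ({lower:.4f}, {upper:.4f})")
print(f"reject median: {np.median(reject_sqi):.4f}")
```

In this toy setting the reject median falls well below the accept band, which is the kind of separation the diagnostics output is meant to make visible.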

vital_sqi.calibration.run_calibration.calibrate(wave_type='PPG', n_segments=200, n_reject_segments=50, duration=30.0, lower_pct=5.0, upper_pct=95.0, output_dir=None, dry_run=False, seed=42, show_progress=True)[source]

Run the full calibration experiment and (optionally) export results.

Parameters:
  • wave_type (str) – 'PPG' or 'ECG'.

  • n_segments (int) – Number of clean segments (and accept-pool segments per noise condition).

  • n_reject_segments (int) – Number of segments per reject-pool noise condition (default 50). This can be smaller than n_segments because the reject pool is used only for diagnostics and overlap detection, not for the p5/p95 accept-band thresholds.

  • duration (float) – Segment duration in seconds.

  • lower_pct (float) – Lower percentile for accept band (default 5).

  • upper_pct (float) – Upper percentile for accept band (default 95).

  • output_dir (str, optional) – Directory to write rule_dict.json, sqi_dict.json, and the diagnostics CSV. Defaults to vital_sqi/resource/.

  • dry_run (bool) – If True, compute thresholds but do NOT write any files.

  • seed (int) – Random seed for reproducibility (default 42).

  • show_progress (bool) – Show tqdm progress bars.

Returns:

Calibrated thresholds as returned by estimate_thresholds().

Return type:

dict

Synthetic signal generation

Generate clean synthetic ECG and PPG segments for calibration.

Each function returns a list of (signal_array, fs) tuples, one per segment. Segments vary heart rate across the physiological range so the SQI distribution covers realistic within-subject variability.
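A minimal sketch of the return shape described above, assuming a crude two-harmonic sine as the pulse model. toy_clean_segments is a hypothetical helper, not part of vital_sqi, and the real generators rely on vitalDSP rather than this waveform.

```python
import numpy as np

def toy_clean_segments(n_segments=5, duration=30.0, sampling_rate=100,
                       hr_range=(50, 110), rng=None):
    """Return a list of (signal_array, fs) tuples, one per segment."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(duration * sampling_rate)) / sampling_rate
    segments = []
    for _ in range(n_segments):
        hr = rng.uniform(*hr_range)        # bpm, drawn independently per segment
        beat_hz = hr / 60.0
        # Crude pulse shape: fundamental plus a weaker second harmonic.
        signal = (np.sin(2 * np.pi * beat_hz * t)
                  + 0.3 * np.sin(4 * np.pi * beat_hz * t))
        segments.append((signal, sampling_rate))
    return segments

segments = toy_clean_segments(rng=np.random.default_rng(42))
signal, fs = segments[0]
print(len(segments), signal.shape, fs)
```

Passing an explicit np.random.Generator, as the documented rng parameter does, keeps the drawn heart rates reproducible across runs.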

vital_sqi.calibration.signal_generator.generate_clean_ecg(n_segments=200, duration=30.0, sampling_rate=256, hr_range=(50, 110), noise_floor=0.0, rng=None)[source]

Generate clean synthetic ECG segments.

Parameters:
  • n_segments (int) – Number of independent segments to generate.

  • duration (float) – Length of each segment in seconds.

  • sampling_rate (int) – Samples per second (vitalDSP ECG uses sfecg).

  • hr_range (tuple of int) – (min_hr, max_hr) in bpm.

  • noise_floor (float) – Additive noise amplitude (Anoise in vitalDSP units). Use 0.0 for clean signals.

  • rng (np.random.Generator, optional) – Random number generator for reproducibility.

Returns:

Each element is (signal_array: np.ndarray, fs: int).

Return type:

list of tuple

vital_sqi.calibration.signal_generator.generate_clean_ppg(n_segments=200, duration=30.0, sampling_rate=100, hr_range=(50, 110), noise_floor=0.0, rng=None)[source]

Generate clean synthetic PPG segments.

Parameters:
  • n_segments (int) – Number of independent segments to generate.

  • duration (float) – Length of each segment in seconds.

  • sampling_rate (int) – Samples per second.

  • hr_range (tuple of int) – (min_hr, max_hr) in bpm; each segment gets a randomly drawn HR.

  • noise_floor (float) – Baseline noise amplitude (fraction of peak-to-peak). Use 0.0 for truly clean signals.

  • rng (np.random.Generator, optional) – Random number generator for reproducibility.

Returns:

Each element is (signal_array: np.ndarray, fs: int).

Return type:

list of tuple

Noise injection

Noise injection for calibration experiments.

Each noise function takes a clean signal array and returns a degraded copy. All amplitudes are expressed as a fraction of the signal’s peak-to-peak range so they scale correctly regardless of the raw signal magnitude.
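The peak-to-peak scaling can be sketched as follows. toy_gaussian_noise is a hypothetical stand-in, and treating amplitude × peak-to-peak as the noise standard deviation is an assumption about how the scaling is applied.

```python
import numpy as np

def toy_gaussian_noise(signal, amplitude, rng):
    """Return a degraded copy; noise std is amplitude x peak-to-peak (assumed)."""
    scale = amplitude * np.ptp(signal)
    return signal + rng.normal(0.0, scale, size=signal.shape)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1000, endpoint=False)
small = np.sin(2 * np.pi * t)       # peak-to-peak of 2
large = 1000.0 * small              # same shape, 1000x the magnitude

for name, sig in [("small", small), ("large", large)]:
    noisy = toy_gaussian_noise(sig, 0.05, rng)
    # Noise level relative to the signal range is about the same for both.
    ratio = np.std(noisy - sig) / np.ptp(sig)
    print(name, round(float(ratio), 3))
```

Because the noise is expressed relative to the signal range, the same amplitude value degrades a millivolt-scale ECG and a raw ADC-count PPG to a comparable degree.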

Available noise types

  • gaussian : additive white Gaussian noise

  • baseline_wander : slow sinusoidal drift (0.05–0.5 Hz)

  • motion : burst-style motion artifacts (random amplitude spikes)

  • harmonic : sum of sine/cosine harmonics at multiples of a base freq

  • powerline : 50 Hz or 60 Hz sinusoidal interference

  • polynomial : low-degree polynomial trend added across the segment

  • impulse : random isolated spike impulses

  • colored : pink (1/f) or brown (1/f²) noise

NOISE_PROFILES is a list of (label, noise_type, amplitude) triples that define the experiments run by the calibration pipeline.

vital_sqi.calibration.noise_injector.CLEAN_PROFILE_LABELS = {'clean', 'gaussian_very_mild'}

Profiles considered “clean enough” to contribute to the accept distribution. Any profile NOT in this set contributes to the reject distribution.

vital_sqi.calibration.noise_injector.NOISE_PROFILES = [('clean', 'gaussian', 0.0), ('gaussian_very_mild', 'gaussian', 0.02), ('gaussian_mild', 'gaussian', 0.05), ('gaussian_moderate', 'gaussian', 0.15), ('gaussian_severe', 'gaussian', 0.4), ('gaussian_extreme', 'gaussian', 0.8), ('bw_mild', 'baseline_wander', 0.05), ('bw_moderate', 'baseline_wander', 0.2), ('bw_severe', 'baseline_wander', 0.5), ('motion_mild', 'motion', 0.1), ('motion_moderate', 'motion', 0.4), ('motion_severe', 'motion', 0.8), ('harmonic_mild', 'harmonic', 0.05), ('harmonic_moderate', 'harmonic', 0.2), ('harmonic_severe', 'harmonic', 0.5), ('powerline_mild', 'powerline', 0.05), ('powerline_severe', 'powerline', 0.3), ('poly_mild', 'polynomial', 0.1), ('poly_severe', 'polynomial', 0.5), ('impulse_mild', 'impulse', 0.05), ('impulse_severe', 'impulse', 0.3), ('pink_mild', 'pink', 0.05), ('pink_severe', 'pink', 0.3), ('brown_mild', 'brown', 0.05), ('brown_severe', 'brown', 0.3), ('combined_mild', 'combined_mild', 0.1), ('combined_severe', 'combined_severe', 0.5), ('shift_mild', 'time_shift', 0.05), ('shift_severe', 'time_shift', 0.2), ('drift_mild', 'clock_drift', 0.02), ('drift_severe', 'clock_drift', 0.05), ('dropout_mild', 'dropout', 0.02), ('dropout_severe', 'dropout', 0.1)]

Full set of noise conditions used in calibration experiments. Each entry is (label, noise_type, amplitude) where amplitude is a fraction of the signal peak-to-peak range.
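How the two constants combine can be sketched in a few lines. The entries below are a small subset copied from NOISE_PROFILES; the splitting logic is an illustration of the rule stated above, not the module's actual code.

```python
CLEAN_PROFILE_LABELS = {"clean", "gaussian_very_mild"}

# A few entries copied from NOISE_PROFILES: (label, noise_type, amplitude).
profiles = [
    ("clean", "gaussian", 0.0),
    ("gaussian_very_mild", "gaussian", 0.02),
    ("gaussian_mild", "gaussian", 0.05),
    ("bw_severe", "baseline_wander", 0.5),
    ("dropout_severe", "dropout", 0.1),
]

# Labels in CLEAN_PROFILE_LABELS feed the accept pool; everything else is reject.
accept_profiles = [p for p in profiles if p[0] in CLEAN_PROFILE_LABELS]
reject_profiles = [p for p in profiles if p[0] not in CLEAN_PROFILE_LABELS]

print([label for label, _, _ in accept_profiles])
print([label for label, _, _ in reject_profiles])
```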

vital_sqi.calibration.noise_injector.baseline_wander(signal, amplitude, rng, fs=100)[source]

Slow sinusoidal drift at a random frequency in [0.05, 0.5] Hz, mimicking respiration-induced baseline wander.

Return type:

ndarray
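A minimal sketch of this kind of drift, assuming a single sinusoid with a random phase and an amplitude scaled by the peak-to-peak range. toy_baseline_wander is a hypothetical name and the random phase is an assumption.

```python
import numpy as np

def toy_baseline_wander(signal, amplitude, rng, fs=100):
    """Add a slow sinusoid with a random frequency in [0.05, 0.5] Hz."""
    t = np.arange(signal.size) / fs
    freq = rng.uniform(0.05, 0.5)
    phase = rng.uniform(0.0, 2 * np.pi)      # random phase is an assumption
    drift = amplitude * np.ptp(signal) * np.sin(2 * np.pi * freq * t + phase)
    return signal + drift

rng = np.random.default_rng(42)
clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
wandering = toy_baseline_wander(clean, 0.2, rng, fs=100)
print(wandering.shape)
```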

vital_sqi.calibration.noise_injector.clock_drift(signal, amplitude, rng, fs=100)[source]

Simulate clock drift by resampling the signal at a perturbed rate.

The effective sampling rate is stretched or compressed by amplitude fraction (e.g. amplitude=0.02 → ±2% clock error). The signal is resampled to a perturbed length and then trimmed / zero-padded back to the original length, simulating what happens when data is labelled with the wrong fs.

Parameters:
  • signal (np.ndarray)

  • amplitude (float) – Maximum fractional clock error (e.g. 0.02 = ±2%).

  • rng (np.random.Generator)

  • fs (int) – Unused; kept for API uniformity.

Return type:

ndarray
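The resample-then-trim-or-pad idea can be sketched with linear interpolation. toy_clock_drift is a hypothetical helper; the actual resampling method used by clock_drift is not stated here, so np.interp is an assumption.

```python
import numpy as np

def toy_clock_drift(signal, amplitude, rng):
    """Resample at a perturbed rate, then trim or zero-pad to the original length."""
    n = signal.size
    error = rng.uniform(-amplitude, amplitude)       # fractional clock error
    new_n = max(2, int(round(n * (1.0 + error))))
    # Linear interpolation onto a stretched or compressed sample grid.
    resampled = np.interp(np.linspace(0, n - 1, new_n), np.arange(n), signal)
    out = np.zeros(n, dtype=float)
    m = min(n, new_n)
    out[:m] = resampled[:m]                          # trim, or leave zeros as padding
    return out

rng = np.random.default_rng(42)
x = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
y = toy_clock_drift(x, 0.02, rng)
print(y.shape == x.shape)
```

With amplitude=0.02 and a 3000-sample segment, the perturbed length differs from the original by at most 60 samples in either direction.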

vital_sqi.calibration.noise_injector.colored_noise(signal, amplitude, rng, exponent=1.0)[source]

Colored noise with power spectrum ∝ 1/f^exponent.

exponent=1 → pink noise, exponent=2 → brown noise. Generated via FFT shaping of white noise.

Return type:

ndarray
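FFT shaping of white noise can be sketched as below. toy_colored_noise is a hypothetical helper; dropping the DC bin and normalising to unit variance are assumptions of this sketch, not documented behaviour of colored_noise.

```python
import numpy as np

def toy_colored_noise(n, exponent, rng):
    """White noise shaped so that power is proportional to 1/f**exponent."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)                     # scale[0] = 0 drops the DC bin
    scale[1:] = freqs[1:] ** (-exponent / 2.0)       # amplitude goes as f^(-exponent/2)
    noise = np.fft.irfft(spectrum * scale, n=n)
    return noise / np.std(noise)                     # unit variance; caller scales it

rng = np.random.default_rng(42)
signal = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
pink = toy_colored_noise(signal.size, 1.0, rng)      # exponent=1: pink
degraded = signal + 0.05 * np.ptp(signal) * pink
print(degraded.shape)
```

The exponent is halved when applied to the spectrum because power is the square of the spectral amplitude.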

vital_sqi.calibration.noise_injector.gaussian_noise(signal, amplitude, rng)[source]

Additive white Gaussian noise scaled to amplitude × peak-to-peak.

Return type:

ndarray

vital_sqi.calibration.noise_injector.harmonic_interference(signal, amplitude, rng, fs=100)[source]

Sum of sine and cosine harmonics of a random base frequency in the range 1–15 Hz, with up to 5 harmonics. Mimics periodic electrical interference or muscle noise.

Return type:

ndarray

vital_sqi.calibration.noise_injector.impulse_noise(signal, amplitude, rng, rate=0.02)[source]

Random isolated spike impulses affecting a fraction rate of the samples (default 0.02, i.e. 2%). Positive and negative spikes occur with equal probability.

Return type:

ndarray

vital_sqi.calibration.noise_injector.inject_noise(signal, noise_type, amplitude, rng, fs=100)[source]

Apply a named noise type to a clean signal.

Parameters:
  • signal (np.ndarray) – Clean input signal.

  • noise_type (str) – One of the keys in NOISE_PROFILES or a raw noise name: 'gaussian', 'baseline_wander', 'motion', 'harmonic', 'powerline', 'polynomial', 'impulse', 'pink', 'brown', 'combined_mild', 'combined_severe'.

  • amplitude (float) – Noise amplitude as a fraction of signal peak-to-peak.

  • rng (np.random.Generator) – Random number generator.

  • fs (int) – Sampling frequency in Hz (required by time-dependent noise types).

Returns:

Degraded signal (same length as input).

Return type:

np.ndarray

vital_sqi.calibration.noise_injector.motion_artifact(signal, amplitude, rng, fs=100, n_bursts=3)[source]

Random burst artifacts: short high-amplitude segments injected at random positions, mimicking movement in wearable recordings.

Return type:

ndarray

vital_sqi.calibration.noise_injector.polynomial_trend(signal, amplitude, rng)[source]

Polynomial trend of degree 2–4 added across the segment. Mimics slow non-stationary drift or temperature effects.

Return type:

ndarray

vital_sqi.calibration.noise_injector.powerline_noise(signal, amplitude, rng, fs=100)[source]

50 Hz or 60 Hz sinusoidal powerline interference with a random phase.

Return type:

ndarray

vital_sqi.calibration.noise_injector.sample_dropout(signal, amplitude, rng, fs=100)[source]

Randomly drop and duplicate samples to simulate packet loss and jitter.

For each dropped sample, a neighbouring sample is duplicated to keep the signal length constant. amplitude controls the fraction of samples affected (e.g. 0.05 → 5% dropout rate).

Parameters:
  • signal (np.ndarray)

  • amplitude (float) – Fraction of samples to drop (and replace by duplication).

  • rng (np.random.Generator)

  • fs (int) – Unused.

Return type:

ndarray
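A minimal sketch of drop-and-duplicate, assuming each dropped sample is replaced by its left neighbour. toy_sample_dropout is a hypothetical helper, and the choice of neighbour is an assumption.

```python
import numpy as np

def toy_sample_dropout(signal, amplitude, rng):
    """Replace a random fraction of samples with a copy of the previous sample."""
    out = signal.copy()
    n_drop = int(round(amplitude * out.size))
    # Index 0 has no left neighbour, so it is never dropped in this sketch.
    idx = np.sort(rng.choice(np.arange(1, out.size), size=n_drop, replace=False))
    for i in idx:            # ascending order, so a run of drops repeats one value
        out[i] = out[i - 1]
    return out

rng = np.random.default_rng(42)
ramp = np.arange(1000.0)                 # strictly increasing, so changes are easy to count
dropped = toy_sample_dropout(ramp, 0.05, rng)
print(int(np.sum(dropped != ramp)))      # number of samples that were replaced
```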

vital_sqi.calibration.noise_injector.time_shift(signal, amplitude, rng, fs=100)[source]

Circularly shift the signal by a random number of samples.

Simulates late electrode attachment, trigger jitter, or sync misalignment. The shift magnitude is amplitude * fs samples (e.g. amplitude=0.05 at fs=100 means up to 5-sample / 50 ms shift). Circular padding preserves signal length without introducing zero-padding edge artifacts.

Parameters:
  • signal (np.ndarray)

  • amplitude (float) – Maximum shift in seconds; the shift in samples is at most amplitude * fs.

  • rng (np.random.Generator)

  • fs (int)

Return type:

ndarray
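A circular shift of this kind can be sketched with np.roll. toy_time_shift is a hypothetical helper; drawing the offset uniformly from both directions is an assumption.

```python
import numpy as np

def toy_time_shift(signal, amplitude, rng, fs=100):
    """Circularly shift by a random offset of up to amplitude * fs samples."""
    max_shift = int(round(amplitude * fs))
    # Either direction is allowed here; the real sign convention is not documented.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)

rng = np.random.default_rng(42)
ramp = np.arange(3000.0)
shifted = toy_time_shift(ramp, 0.05, rng, fs=100)   # at most 5 samples (50 ms)
print(shifted.shape)
```

np.roll wraps samples from one end to the other, so the output keeps every original value and the original length.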

Batch SQI computation

Batch SQI computation over synthetic signal segments.

compute_sqi_distributions is the main entry point. It takes a list of (signal, fs) tuples, wraps each into the DataFrame format expected by extract_segment_sqi, runs all configured SQIs, and returns the raw per-segment SQI values as a DataFrame — one row per segment.

The function handles three special cases transparently:

  • dict-returning SQIs (e.g. poincare_sqi) are flattened to multiple columns

  • SQIs that raise exceptions emit NaN for that segment (logged, not re-raised)

  • nn_intervals-based SQIs are detected and handled by the existing pipeline logic
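The first two special cases can be sketched with plain Python. All three toy SQI functions and run_sqis are hypothetical, and the name_key column naming is an assumption; the real runner delegates to extract_segment_sqi.

```python
import math

def toy_scalar_sqi(signal):
    return sum(signal) / len(signal)

def toy_dict_sqi(signal):
    # Stand-in for a dict-returning SQI such as poincare_sqi.
    return {"sd1": min(signal), "sd2": max(signal)}

def toy_failing_sqi(signal):
    raise ValueError("not enough peaks")

def run_sqis(signal, sqi_funcs):
    """Return one flat row (dict) of SQI values for a single segment."""
    row = {}
    for name, func in sqi_funcs.items():
        try:
            value = func(signal)
        except Exception as exc:
            print(f"{name} failed: {exc}")     # stand-in for a log message
            value = math.nan                   # the segment still gets a row
        if isinstance(value, dict):
            for key, sub_value in value.items():
                row[f"{name}_{key}"] = sub_value   # column naming is an assumption
        else:
            row[name] = value
    return row

sqi_funcs = {"mean": toy_scalar_sqi, "poincare": toy_dict_sqi, "broken": toy_failing_sqi}
rows = [run_sqis(seg, sqi_funcs) for seg in ([1.0, 2.0, 3.0], [4.0, 6.0])]
print(sorted(rows[0]))
```

A list of such rows converts directly to a one-row-per-segment table, for example with pd.DataFrame(rows).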

vital_sqi.calibration.sqi_runner.compute_sqi_distributions(segments, wave_type='PPG', sqi_names=None, sqi_arg_list=None, show_progress=True, n_jobs=1)[source]

Compute SQIs for every segment and return a DataFrame of raw values.

Parameters:
  • segments (list of tuple) – Each element is (signal_array: np.ndarray, fs: int) as produced by generate_clean_ppg() or generate_clean_ecg().

  • wave_type (str, optional) – 'PPG' or 'ECG' (default 'PPG').

  • sqi_names (list of str, optional) – Subset of SQI names to compute. Defaults to all keys in DEFAULT_SQI_ARG_LIST.

  • sqi_arg_list (dict, optional) – Custom argument dict keyed by SQI name. Any key not present falls back to DEFAULT_SQI_ARG_LIST.

  • show_progress (bool) – Show tqdm progress bar (default True).

  • n_jobs (int, optional) – Number of parallel workers for joblib. 1 (default) runs sequentially; -1 uses all available CPU cores.

Returns:

One row per segment, one column per SQI output. Multi-value SQIs (poincare) produce multiple columns. Failed/errored SQIs are NaN.

Return type:

pd.DataFrame

Threshold estimator

Derive accept/reject thresholds from empirical SQI distributions.

The core algorithm

  1. accept_df — SQI values from clean signals (noise_floor ≈ 0)

  2. reject_df — SQI values from heavily degraded signals

For each SQI column:

    lower = percentile(accept_df[col], lower_pct)   # e.g. p5
    upper = percentile(accept_df[col], upper_pct)   # e.g. p95

The accept region is the open interval (lower, upper).

Edge cases

  • All-NaN column → SQI is skipped (not calibratable)

  • Constant column → warns and widens the band by a small epsilon

  • Accept and reject distributions overlap → the reject distribution is stored for reference but does not change the thresholds (the clean-signal distribution is authoritative)

  • Very small range → epsilon guard prevents a zero-width rule
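The percentile rule and the edge cases above can be sketched for a single SQI column. toy_estimate_band is a hypothetical helper, and the epsilon value of 1e-6 is an assumption; the actual guard used by estimate_thresholds is not stated here.

```python
import numpy as np

def toy_estimate_band(values, lower_pct=5.0, upper_pct=95.0, eps=1e-6):
    """Return (lower, upper) for one SQI column, or None if it is all-NaN."""
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]
    if valid.size == 0:
        return None                              # not calibratable: skip this SQI
    lower, upper = np.percentile(valid, [lower_pct, upper_pct])
    if upper - lower < eps:
        # Constant or near-constant column: widen so the rule has non-zero width.
        lower, upper = lower - eps, upper + eps
    return float(lower), float(upper)

print(toy_estimate_band([np.nan, np.nan]))       # all-NaN column
print(toy_estimate_band([1.0] * 10))             # constant column, widened band
print(toy_estimate_band(np.arange(101.0)))       # ordinary column
```

Only the accept values enter this calculation, which matches the rule that the reject distribution never moves the thresholds.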

class vital_sqi.calibration.threshold_estimator.SQIThreshold(sqi_name, lower=nan, upper=nan, accept_median=nan, accept_std=nan, reject_median=nan, n_accept=0, n_reject=0, calibrated=False, note='')[source]

Bases: object

Calibrated threshold for a single SQI column.

Parameters:
sqi_name

Column name as it appears in the SQI DataFrame.

Type:

str

lower

Lower bound of the accept region (exclusive, > operator).

Type:

float

upper

Upper bound of the accept region (exclusive, < operator).

Type:

float

accept_median

Median of the accept (clean) distribution.

Type:

float

accept_std

Standard deviation of the accept distribution.

Type:

float

reject_median

Median of the reject distribution (NaN if not available).

Type:

float or None

n_accept

Number of valid (non-NaN) accept samples used.

Type:

int

n_reject

Number of valid (non-NaN) reject samples used.

Type:

int

calibrated

False if the SQI could not be calibrated (all-NaN, constant, etc.).

Type:

bool

note

Human-readable note about any special handling applied.

Type:

str

accept_median: float = nan
accept_std: float = nan
calibrated: bool = False
lower: float = nan
n_accept: int = 0
n_reject: int = 0
note: str = ''
reject_median: float = nan
sqi_name: str
upper: float = nan
vital_sqi.calibration.threshold_estimator.estimate_thresholds(accept_df, reject_df=None, lower_pct=5.0, upper_pct=95.0)[source]

Derive accept/reject bounds for every SQI column.

Parameters:
  • accept_df (pd.DataFrame) – SQI values from clean segments. One row per segment, one column per SQI. All-NaN columns are skipped.

  • reject_df (pd.DataFrame, optional) – SQI values from degraded segments. Used only for diagnostics (the reject distribution does not change the threshold values).

  • lower_pct (float) – Lower percentile of the accept distribution that defines the accept lower bound (default 5).

  • upper_pct (float) – Upper percentile defining the accept upper bound (default 95).

Returns:

Mapping of sqi_name → SQIThreshold. Only calibratable SQIs are included.

Return type:

dict

vital_sqi.calibration.threshold_estimator.thresholds_to_dataframe(thresholds)[source]

Convert a thresholds dict to a summary DataFrame for inspection.

Parameters:

thresholds (dict) – Output of estimate_thresholds().

Returns:

One row per SQI with columns: sqi_name, lower, upper, accept_median, accept_std, reject_median, n_accept, n_reject, calibrated, note.

Return type:

pd.DataFrame
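The conversion amounts to turning each threshold record into one row. A minimal sketch, assuming a dataclass with a subset of the SQIThreshold fields; ToyThreshold and the example values are hypothetical.

```python
import math
from dataclasses import dataclass, asdict

@dataclass
class ToyThreshold:
    # Mirrors a subset of the SQIThreshold fields listed above.
    sqi_name: str
    lower: float = math.nan
    upper: float = math.nan
    calibrated: bool = False
    note: str = ""

thresholds = {
    "toy_sqi": ToyThreshold("toy_sqi", 0.2, 0.9, calibrated=True),
    "flat_sqi": ToyThreshold("flat_sqi", 0.999999, 1.000001, calibrated=True,
                             note="constant column; band widened"),
}

# One dict per SQI; pd.DataFrame(rows) would give the summary table.
rows = [asdict(th) for th in thresholds.values()]
print(list(rows[0]))
```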

Exporter

Export calibrated thresholds to rule_dict.json and sqi_dict.json.

rule_dict format (one entry per SQI)

Each entry uses the paired operator structure required by the rule engine:

    {
      "sqi_name": {
        "name": "sqi_name",
        "def": [
          {"op": ">",  "value": "<lower>", "label": "accept"},
          {"op": "<=", "value": "<lower>", "label": "reject"},
          {"op": ">=", "value": "<upper>", "label": "reject"},
          {"op": "<",  "value": "<upper>", "label": "accept"}
        ],
        "desc": "Calibrated from synthetic <wave_type> signals. Accept: p<lo>-p<hi>.",
        "ref": "vital_sqi.calibration"
      }
    }

The exporter merges new calibrated entries INTO the existing rule_dict rather than replacing it, so manually curated entries (for SQIs the calibrator cannot reach) are preserved.
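Building one entry and merging it into an existing rule_dict can be sketched as follows. toy_rule_entry is a hypothetical helper, storing the bounds as strings is an assumption based on the quoted placeholders in the format above, and the dict-unpacking merge only illustrates the described behaviour.

```python
import json

def toy_rule_entry(sqi_name, lower, upper, wave_type="PPG", lo_pct=5, hi_pct=95):
    """Build one rule_dict entry in the paired-operator layout."""
    return {
        "name": sqi_name,
        "def": [
            {"op": ">", "value": str(lower), "label": "accept"},
            {"op": "<=", "value": str(lower), "label": "reject"},
            {"op": ">=", "value": str(upper), "label": "reject"},
            {"op": "<", "value": str(upper), "label": "accept"},
        ],
        "desc": f"Calibrated from synthetic {wave_type} signals. "
                f"Accept: p{lo_pct}-p{hi_pct}.",
        "ref": "vital_sqi.calibration",
    }

# A manually curated entry that the calibrator does not touch.
existing = {"manual_sqi": {"name": "manual_sqi", "def": [], "desc": "curated", "ref": ""}}
calibrated = {"toy_sqi": toy_rule_entry("toy_sqi", 0.2, 0.9)}

merged = {**existing, **calibrated}      # calibrated entries win on key clashes
print(json.dumps(merged["toy_sqi"]["def"][0]))
```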

vital_sqi.calibration.exporter.export_diagnostics(thresholds, output_path)[source]

Write a human-readable CSV report of all calibration results.

Return type:

None

vital_sqi.calibration.exporter.export_rule_dict(thresholds, output_path, wave_type='PPG', lower_pct=5.0, upper_pct=95.0, backup=True)[source]

Write or update a rule_dict.json file with calibrated thresholds.

Existing entries whose SQI was NOT calibrated are preserved unchanged. Calibrated entries overwrite their previous counterparts.

Parameters:
  • thresholds (dict) – Output of estimate_thresholds(). Only entries with calibrated=True are written.

  • output_path (str) – Destination file path (created if absent).

  • wave_type (str) – Used in the desc field for documentation purposes.

  • lower_pct (float) – Percentiles used during estimation (for documentation).

  • upper_pct (float) – Percentiles used during estimation (for documentation).

  • backup (bool) – If True and the file already exists, a timestamped backup is made before overwriting (default True).

Return type:

None
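The backup-before-overwrite behaviour can be sketched with the standard library. toy_write_with_backup is a hypothetical helper, and the backup file naming scheme is an assumption; the exporter's actual naming is not documented here.

```python
import json
import os
import shutil
import tempfile
import time

def toy_write_with_backup(data, output_path, backup=True):
    """Write JSON to output_path, first copying any existing file to a backup."""
    backup_path = None
    if backup and os.path.exists(output_path):
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup_path = f"{output_path}.{stamp}.bak"    # naming scheme is an assumption
        shutil.copy2(output_path, backup_path)
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return backup_path

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rule_dict.json")
    first = toy_write_with_backup({"a": 1}, path)     # no file yet, so no backup
    second = toy_write_with_backup({"a": 2}, path)    # existing file is backed up
    with open(second, encoding="utf-8") as fh:
        backed_up = json.load(fh)
    with open(path, encoding="utf-8") as fh:
        current = json.load(fh)
```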

vital_sqi.calibration.exporter.export_sqi_dict(thresholds, output_path, backup=True)[source]

Write a sqi_dict.json containing only successfully calibrated SQIs.

Entries for SQIs that could not be calibrated are excluded so the output template is safe to use directly in extract_sqi.

Parameters:
  • thresholds (dict) – Output of estimate_thresholds().

  • output_path (str) – Destination file path.

  • backup (bool) – If the file exists, make a timestamped backup before overwriting.

Return type:

None