Calibration (`vital_sqi.calibration`)

Automated derivation of SQI accept/reject thresholds. The calibration workflow generates clean synthetic signals, injects parameterised noise, computes SQIs over both pools, and writes a rule_dict.json and sqi_dict.json that can be loaded by the pipeline.

Top-level runner 

Top-level calibration experiment runner.

Usage (from repo root)

    python -m vital_sqi.calibration.run_calibration --wave_type PPG
    python -m vital_sqi.calibration.run_calibration --wave_type ECG
    python -m vital_sqi.calibration.run_calibration --wave_type PPG --n_segments 500 --dry_run

What it does

Generate n_segments clean signals (noise_floor=0).
For every noise profile in NOISE_PROFILES whose label is in CLEAN_PROFILE_LABELS, add the resulting segments to the accept pool.
For every other (reject-level) noise profile, generate n_reject_segments degraded signals and add them to the reject pool.
Compute SQIs over both pools via compute_sqi_distributions().
Estimate p5/p95 thresholds from the accept pool.
Export rule_dict.json and sqi_dict.json to output_dir.
Write a diagnostics CSV alongside the outputs.

The script prints a summary table on completion showing which SQIs were calibrated and their derived thresholds.

vital_sqi.calibration.run_calibration.calibrate(wave_type='PPG', n_segments=200, n_reject_segments=50, duration=30.0, lower_pct=5.0, upper_pct=95.0, output_dir=None, dry_run=False, seed=42, show_progress=True)

Run the full calibration experiment and (optionally) export results.

Parameters:

wave_type (str) – 'PPG' or 'ECG'.
n_segments (int) – Number of clean segments (and accept-pool segments per noise condition).
n_reject_segments (int) – Number of segments per reject-pool noise condition (default 50). A value smaller than n_segments is fine: the reject pool is only used for diagnostics and overlap detection, not for the p5/p95 accept-band thresholds.
duration (float) – Segment duration in seconds.
lower_pct (float) – Lower percentile for accept band (default 5).
upper_pct (float) – Upper percentile for accept band (default 95).
output_dir (str, optional) – Directory to write rule_dict.json, sqi_dict.json, and the diagnostics CSV. Defaults to vital_sqi/resource/.
dry_run (bool) – If True, compute thresholds but do NOT write any files.
seed (int) – Random seed for reproducibility (default 42).
show_progress (bool) – Show tqdm progress bars.

Returns:

Calibrated thresholds as returned by estimate_thresholds().

Return type:

dict

Synthetic signal generation 

Generate clean synthetic ECG and PPG segments for calibration.

Each function returns a list of (signal_array, fs) tuples, one per segment. Segments vary heart rate across the physiological range so the SQI distribution covers realistic within-subject variability.

vital_sqi.calibration.signal_generator.generate_clean_ecg(n_segments=200, duration=30.0, sampling_rate=256, hr_range=(50, 110), noise_floor=0.0, rng=None)

Generate clean synthetic ECG segments.

Parameters:

n_segments (int) – Number of independent segments to generate.
duration (float) – Length of each segment in seconds.
sampling_rate (int) – Samples per second (vitalDSP ECG uses sfecg).
hr_range (tuple of int) – (min_hr, max_hr) in bpm.
noise_floor (float) – Additive noise amplitude (Anoise in vitalDSP units). Use 0.0 for clean signals.
rng (np.random.Generator, optional) – Random number generator for reproducibility.

Returns:

Each element is (signal_array: np.ndarray, fs: int).

Return type:

list of tuple

vital_sqi.calibration.signal_generator.generate_clean_ppg(n_segments=200, duration=30.0, sampling_rate=100, hr_range=(50, 110), noise_floor=0.0, rng=None)

Generate clean synthetic PPG segments.

Parameters:

n_segments (int) – Number of independent segments to generate.
duration (float) – Length of each segment in seconds.
sampling_rate (int) – Samples per second.
hr_range (tuple of int) – (min_hr, max_hr) in bpm; each segment gets a randomly drawn HR.
noise_floor (float) – Baseline noise amplitude (fraction of peak-to-peak). Use 0.0 for truly clean signals.
rng (np.random.Generator, optional) – Random number generator for reproducibility.

Returns:

Each element is (signal_array: np.ndarray, fs: int).

Return type:

list of tuple
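The return shape described above can be illustrated with a minimal sketch. It assumes a plain sinusoid as a stand-in waveform, which is not the realistic ECG/PPG morphology the library's generators produce, and toy_segments is a hypothetical name used only here.

```python
import numpy as np

def toy_segments(n_segments=5, duration=30.0, sampling_rate=100,
                 hr_range=(50, 110), rng=None):
    """Return a list of (signal_array, fs) tuples, one random HR per segment."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(duration * sampling_rate)) / sampling_rate
    segments = []
    for _ in range(n_segments):
        hr_bpm = rng.uniform(*hr_range)            # heart rate drawn per segment
        beat_hz = hr_bpm / 60.0                    # bpm -> beats per second
        signal = np.sin(2 * np.pi * beat_hz * t)   # sinusoidal stand-in (assumption)
        segments.append((signal, sampling_rate))
    return segments

segments = toy_segments(n_segments=3, rng=np.random.default_rng(42))
```

Passing a seeded np.random.Generator, as the rng parameter above suggests, makes the drawn heart rates reproducible across runs.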

Noise injection 

Noise injection for calibration experiments.

Each noise function takes a clean signal array and returns a degraded copy. All amplitudes are expressed as a fraction of the signal’s peak-to-peak range so they scale correctly regardless of the raw signal magnitude.

Available noise types

gaussian : additive white Gaussian noise
baseline_wander : slow sinusoidal drift (0.05–0.5 Hz)
motion : burst-style motion artifacts (random amplitude spikes)
harmonic : sum of sine/cosine harmonics at multiples of a base freq
powerline : 50 Hz or 60 Hz sinusoidal interference
polynomial : low-degree polynomial trend added across the segment
impulse : random isolated spike impulses
colored : pink (1/f) or brown (1/f²) noise

NOISE_PROFILES is a list of (label, noise_type, amplitude) triples that define the experiments run by the calibration pipeline.

vital_sqi.calibration.noise_injector.CLEAN_PROFILE_LABELS = {'clean', 'gaussian_very_mild'}: Profiles considered “clean enough” to contribute to the accept distribution. Any profile NOT in this set contributes to the reject distribution.

vital_sqi.calibration.noise_injector.NOISE_PROFILES = [('clean', 'gaussian', 0.0), ('gaussian_very_mild', 'gaussian', 0.02), ('gaussian_mild', 'gaussian', 0.05), ('gaussian_moderate', 'gaussian', 0.15), ('gaussian_severe', 'gaussian', 0.4), ('gaussian_extreme', 'gaussian', 0.8), ('bw_mild', 'baseline_wander', 0.05), ('bw_moderate', 'baseline_wander', 0.2), ('bw_severe', 'baseline_wander', 0.5), ('motion_mild', 'motion', 0.1), ('motion_moderate', 'motion', 0.4), ('motion_severe', 'motion', 0.8), ('harmonic_mild', 'harmonic', 0.05), ('harmonic_moderate', 'harmonic', 0.2), ('harmonic_severe', 'harmonic', 0.5), ('powerline_mild', 'powerline', 0.05), ('powerline_severe', 'powerline', 0.3), ('poly_mild', 'polynomial', 0.1), ('poly_severe', 'polynomial', 0.5), ('impulse_mild', 'impulse', 0.05), ('impulse_severe', 'impulse', 0.3), ('pink_mild', 'pink', 0.05), ('pink_severe', 'pink', 0.3), ('brown_mild', 'brown', 0.05), ('brown_severe', 'brown', 0.3), ('combined_mild', 'combined_mild', 0.1), ('combined_severe', 'combined_severe', 0.5), ('shift_mild', 'time_shift', 0.05), ('shift_severe', 'time_shift', 0.2), ('drift_mild', 'clock_drift', 0.02), ('drift_severe', 'clock_drift', 0.05), ('dropout_mild', 'dropout', 0.02), ('dropout_severe', 'dropout', 0.1)]: Full set of noise conditions used in calibration experiments. Each entry is (label, noise_type, amplitude) where amplitude is a fraction of the signal peak-to-peak range.
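The split between accept and reject conditions follows directly from the two constants above. The sketch below shows one way to partition the profiles by label; it uses a shortened copy of NOISE_PROFILES, and the variable names are illustrative rather than taken from the library.

```python
CLEAN_PROFILE_LABELS = {"clean", "gaussian_very_mild"}

# Shortened copy of NOISE_PROFILES, for illustration only.
profiles = [
    ("clean", "gaussian", 0.0),
    ("gaussian_very_mild", "gaussian", 0.02),
    ("gaussian_mild", "gaussian", 0.05),
    ("bw_severe", "baseline_wander", 0.5),
]

# Labels in CLEAN_PROFILE_LABELS feed the accept pool; everything else is reject.
accept_profiles = [p for p in profiles if p[0] in CLEAN_PROFILE_LABELS]
reject_profiles = [p for p in profiles if p[0] not in CLEAN_PROFILE_LABELS]
```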

vital_sqi.calibration.noise_injector.baseline_wander(signal, amplitude, rng, fs=100)

Slow sinusoidal drift at a random frequency in [0.05, 0.5] Hz, mimicking respiration-induced baseline wander.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
fs (int)

Return type:

ndarray
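A minimal sketch of this kind of drift, assuming a single sinusoid with a random phase and a peak equal to amplitude × peak-to-peak. The random phase and the name toy_baseline_wander are assumptions, not the library's implementation.

```python
import numpy as np

def toy_baseline_wander(signal, amplitude, rng, fs=100):
    """Add a slow sinusoidal drift scaled to the signal's peak-to-peak range."""
    signal = np.asarray(signal, dtype=float)
    freq = rng.uniform(0.05, 0.5)              # drift frequency in Hz
    phase = rng.uniform(0.0, 2.0 * np.pi)      # random phase (assumption)
    t = np.arange(signal.size) / fs
    drift = amplitude * np.ptp(signal) * np.sin(2.0 * np.pi * freq * t + phase)
    return signal + drift

clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
wandered = toy_baseline_wander(clean, 0.2, np.random.default_rng(0))
```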

vital_sqi.calibration.noise_injector.clock_drift(signal, amplitude, rng, fs=100)

Simulate clock drift by resampling the signal at a perturbed rate.

The effective sampling rate is stretched or compressed by amplitude fraction (e.g. amplitude=0.02 → ±2% clock error). The signal is resampled to a perturbed length and then trimmed / zero-padded back to the original length, simulating what happens when data is labelled with the wrong fs.

Parameters:

signal (np.ndarray)
amplitude (float) – Maximum fractional clock error (e.g. 0.02 = ±2%).
rng (np.random.Generator)
fs (int) – Unused; kept for API uniformity.

Return type:

ndarray
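The resample-then-trim/pad idea can be sketched as follows. This version assumes linear interpolation via np.interp and a uniformly drawn clock error; the library may use a different resampling method, and toy_clock_drift is a hypothetical name.

```python
import numpy as np

def toy_clock_drift(signal, amplitude, rng):
    """Resample to a perturbed length, then trim or zero-pad to the original length."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    error = rng.uniform(-amplitude, amplitude)      # fractional clock error
    new_len = max(2, int(round(n * (1.0 + error))))
    # Linear interpolation onto new_len points (assumption about the method).
    positions = np.linspace(0, n - 1, new_len)
    resampled = np.interp(positions, np.arange(n), signal)
    out = np.zeros(n)
    m = min(n, new_len)
    out[:m] = resampled[:m]                         # trim if longer, zero-pad if shorter
    return out

clean = np.sin(np.linspace(0, 20, 500))
drifted = toy_clock_drift(clean, 0.02, np.random.default_rng(0))
```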

vital_sqi.calibration.noise_injector.colored_noise(signal, amplitude, rng, exponent=1.0)

Colored noise with power spectrum ∝ 1/f^exponent.

exponent=1 → pink noise, exponent=2 → brown noise. Generated via FFT shaping of white noise.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
exponent (float)

Return type:

ndarray
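FFT shaping of white noise can be sketched as below. Scaling each frequency bin's amplitude by f^(-exponent/2) gives a power spectrum proportional to 1/f^exponent. Dropping the DC bin and normalising the noise to unit peak are assumptions made for this illustration, and toy_colored_noise is a hypothetical name.

```python
import numpy as np

def toy_colored_noise(signal, amplitude, rng, exponent=1.0):
    """Add 1/f^exponent noise built by shaping the spectrum of white noise."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)                    # DC bin stays at zero (assumption)
    scale[1:] = freqs[1:] ** (-exponent / 2.0)      # amplitude scaling -> power ~ 1/f^exponent
    noise = np.fft.irfft(spectrum * scale, n=n)
    noise /= np.max(np.abs(noise))                  # normalise to unit peak (assumption)
    return signal + amplitude * np.ptp(signal) * noise

clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
pink = toy_colored_noise(clean, 0.3, np.random.default_rng(0), exponent=1.0)
brown = toy_colored_noise(clean, 0.3, np.random.default_rng(0), exponent=2.0)
```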

vital_sqi.calibration.noise_injector.gaussian_noise(signal, amplitude, rng)

Additive white Gaussian noise scaled to amplitude × peak-to-peak.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)

Return type:

ndarray
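The peak-to-peak scaling can be sketched in a few lines. This version assumes that amplitude × peak-to-peak is used as the noise standard deviation, which the text above does not state explicitly; toy_gaussian_noise is a hypothetical name.

```python
import numpy as np

def toy_gaussian_noise(signal, amplitude, rng):
    """Add white Gaussian noise whose std is amplitude x peak-to-peak (assumption)."""
    signal = np.asarray(signal, dtype=float)
    sigma = amplitude * np.ptp(signal)
    return signal + rng.normal(0.0, sigma, size=signal.shape)

clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
noise_small = toy_gaussian_noise(clean, 0.05, np.random.default_rng(0)) - clean
noise_large = toy_gaussian_noise(1000 * clean, 0.05, np.random.default_rng(0)) - 1000 * clean
```

With the same seed, the noise added to the signal scaled by 1000 is 1000 times larger, so the relative degradation does not depend on the raw signal magnitude.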

vital_sqi.calibration.noise_injector.harmonic_interference(signal, amplitude, rng, fs=100)

Sum of up to 5 sine and cosine harmonics of a random base frequency in the range 1–15 Hz. Mimics periodic electrical interference or muscle noise.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
fs (int)

Return type:

ndarray

vital_sqi.calibration.noise_injector.impulse_noise(signal, amplitude, rng, rate=0.02)

Random isolated spike impulses; rate is the fraction of samples affected. Positive and negative spikes occur with equal probability.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
rate (float)

Return type:

ndarray
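A minimal sketch of such spikes, assuming a fixed spike height of amplitude × peak-to-peak and distinct spike positions. Both choices, and the name toy_impulse_noise, are assumptions for illustration.

```python
import numpy as np

def toy_impulse_noise(signal, amplitude, rng, rate=0.02):
    """Add isolated +/- spikes to a fraction `rate` of the samples."""
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    n_spikes = int(round(rate * signal.size))
    idx = rng.choice(signal.size, size=n_spikes, replace=False)  # distinct positions
    signs = rng.choice([-1.0, 1.0], size=n_spikes)               # equal probability
    out[idx] += signs * amplitude * np.ptp(signal)               # fixed height (assumption)
    return out

clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
spiky = toy_impulse_noise(clean, 0.3, np.random.default_rng(0))
```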

vital_sqi.calibration.noise_injector.inject_noise(signal, noise_type, amplitude, rng, fs=100)

Apply a named noise type to a clean signal.

Parameters:

signal (np.ndarray) – Clean input signal.
noise_type (str) – One of the labels in NOISE_PROFILES or a raw noise name: 'gaussian', 'baseline_wander', 'motion', 'harmonic', 'powerline', 'polynomial', 'impulse', 'pink', 'brown', 'combined_mild', 'combined_severe'.
amplitude (float) – Noise amplitude as a fraction of signal peak-to-peak.
rng (np.random.Generator) – Random number generator.
fs (int) – Sampling frequency in Hz (required by time-dependent noise types).

Returns:

Degraded signal (same length as input).

Return type:

np.ndarray

vital_sqi.calibration.noise_injector.motion_artifact(signal, amplitude, rng, fs=100, n_bursts=3)

Random burst artifacts: short high-amplitude segments injected at random positions, mimicking movement in wearable recordings.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
fs (int)
n_bursts (int)

Return type:

ndarray

vital_sqi.calibration.noise_injector.polynomial_trend(signal, amplitude, rng)

Polynomial trend of degree 2–4 added across the segment. Mimics slow non-stationary drift or temperature effects.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)

Return type:

ndarray

vital_sqi.calibration.noise_injector.powerline_noise(signal, amplitude, rng, fs=100)

50 Hz or 60 Hz sinusoidal powerline interference with a random phase.

Parameters:

signal (ndarray)
amplitude (float)
rng (Generator)
fs (int)

Return type:

ndarray

vital_sqi.calibration.noise_injector.sample_dropout(signal, amplitude, rng, fs=100)

Randomly drop and duplicate samples to simulate packet loss and jitter.

For each dropped sample, a neighbouring sample is duplicated to keep the signal length constant. amplitude controls the fraction of samples affected (e.g. 0.05 → 5% dropout rate).

Parameters:

signal (np.ndarray)
amplitude (float) – Fraction of samples to drop (and replace by duplication).
rng (np.random.Generator)
fs (int) – Unused.

Return type:

ndarray
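Drop-and-duplicate can be sketched as a sample-and-hold over randomly chosen positions. Using the left neighbour as the duplicate, and never dropping the first sample, are assumptions of this sketch; toy_sample_dropout is a hypothetical name.

```python
import numpy as np

def toy_sample_dropout(signal, amplitude, rng):
    """Replace a fraction `amplitude` of samples with their left neighbour."""
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    n_drop = int(round(amplitude * signal.size))
    # Index 0 is excluded so every dropped sample has a left neighbour.
    idx = np.sort(rng.choice(np.arange(1, signal.size), size=n_drop, replace=False))
    for i in idx:
        out[i] = out[i - 1]      # hold the previous (possibly already held) value
    return out

ramp = np.arange(1000.0)
dropped = toy_sample_dropout(ramp, 0.05, np.random.default_rng(0))
```

Processing the indices in ascending order means a run of consecutive dropouts repeats the last surviving sample, which resembles how a receiver might hold a value during packet loss.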

vital_sqi.calibration.noise_injector.time_shift(signal, amplitude, rng, fs=100)

Circularly shift the signal by a random number of samples.

Simulates late electrode attachment, trigger jitter, or sync misalignment. The shift is at most amplitude * fs samples (e.g. amplitude=0.05 at fs=100 allows a shift of up to 5 samples, or 50 ms). Because the shift is circular, the signal length is preserved without introducing zero-padding edge artifacts.

Parameters:

signal (np.ndarray)
amplitude (float) – Maximum shift in seconds; multiplied by fs to give the maximum shift in samples.
rng (np.random.Generator)
fs (int)

Return type:

ndarray
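A circular shift of this kind maps naturally onto np.roll. The sketch below assumes the shift direction is random (either sign), which the description above does not specify; toy_time_shift is a hypothetical name.

```python
import numpy as np

def toy_time_shift(signal, amplitude, rng, fs=100):
    """Circularly shift the signal by up to amplitude * fs samples."""
    signal = np.asarray(signal)
    max_shift = int(round(amplitude * fs))
    # Shift drawn from [-max_shift, max_shift]; both directions is an assumption.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)

clean = np.sin(2 * np.pi * 1.2 * np.arange(3000) / 100)
shifted = toy_time_shift(clean, 0.05, np.random.default_rng(0), fs=100)
```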

Batch SQI computation 

Batch SQI computation over synthetic signal segments.

compute_sqi_distributions is the main entry point. It takes a list of (signal, fs) tuples, wraps each into the DataFrame format expected by extract_segment_sqi, runs all configured SQIs, and returns the raw per-segment SQI values as a DataFrame — one row per segment.

The function handles three special cases transparently:

- dict-returning SQIs (e.g. poincare_sqi) are flattened to multiple columns
- SQIs that raise exceptions emit NaN for that segment (logged, not re-raised)
- nn_intervals-based SQIs are detected and handled by the existing pipeline logic

vital_sqi.calibration.sqi_runner.compute_sqi_distributions(segments, wave_type='PPG', sqi_names=None, sqi_arg_list=None, show_progress=True, n_jobs=1)

Compute SQIs for every segment and return a DataFrame of raw values.

Parameters:

segments (list of tuple) – Each element is (signal_array: np.ndarray, fs: int) as produced by generate_clean_ppg() or generate_clean_ecg().
wave_type (str, optional) – 'PPG' or 'ECG' (default 'PPG').
sqi_names (list of str, optional) – Subset of SQI names to compute. Defaults to all keys in DEFAULT_SQI_ARG_LIST.
sqi_arg_list (dict, optional) – Custom argument dict keyed by SQI name. Any key not present falls back to DEFAULT_SQI_ARG_LIST.
show_progress (bool) – Show tqdm progress bar (default True).
n_jobs (int, optional) – Number of parallel workers for joblib. 1 (default) runs sequentially; -1 uses all available CPU cores.

Returns:

One row per segment, one column per SQI output. Multi-value SQIs (poincare) produce multiple columns. Failed/errored SQIs are NaN.

Return type:

pd.DataFrame
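The flattening and NaN-on-error behaviour described for this function can be sketched for a single segment as follows. The SQI functions, the name run_sqis, and the name_key column naming scheme are all hypothetical; the library's actual column names may differ.

```python
import logging
import math

logger = logging.getLogger(__name__)

def run_sqis(signal, sqi_funcs):
    """Return one flat row (dict) of SQI values for a single segment."""
    row = {}
    for name, func in sqi_funcs.items():
        try:
            value = func(signal)
        except Exception:
            logger.warning("SQI %s failed; storing NaN", name)
            row[name] = math.nan                 # failure is logged, not re-raised
            continue
        if isinstance(value, dict):
            for key, sub in value.items():       # dict result -> several columns
                row[f"{name}_{key}"] = sub       # naming scheme is an assumption
        else:
            row[name] = value
    return row

# Hypothetical SQI functions, for illustration only.
def mean_sqi(x):
    return sum(x) / len(x)

def range_sqi(x):
    return {"min": min(x), "max": max(x)}

def broken_sqi(x):
    raise ValueError("cannot compute")

row = run_sqis([1.0, 2.0, 3.0],
               {"mean": mean_sqi, "range": range_sqi, "broken": broken_sqi})
```

Collecting one such row per segment and passing the list to a DataFrame constructor would give the one-row-per-segment layout described above.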

Threshold estimator 

Derive accept/reject thresholds from empirical SQI distributions.

The core algorithm

accept_df — SQI values from clean signals (noise_floor ≈ 0)
reject_df — SQI values from heavily degraded signals

For each SQI column:

    lower = percentile(accept_df[col], lower_pct)  # e.g. p5
    upper = percentile(accept_df[col], upper_pct)  # e.g. p95

The accept region is the open interval (lower, upper).

Edge cases

All-NaN column → SQI is skipped (not calibratable)
Constant column → warns and widens the band by a small epsilon
Accept and reject distributions overlap → the reject distribution is stored for reference but does not change the thresholds (clean-signal distribution is authoritative)
Very small range → epsilon guard prevents a zero-width rule
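The percentile rule and the edge cases listed above can be sketched for a single SQI column. The epsilon value of 1e-6 and the name toy_thresholds are assumptions; the library's guard may use a different width or criterion.

```python
import numpy as np

def toy_thresholds(values, lower_pct=5.0, upper_pct=95.0, eps=1e-6):
    """Return (lower, upper) for one SQI column, or None if not calibratable."""
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]
    if valid.size == 0:
        return None                               # all-NaN column is skipped
    lower, upper = np.percentile(valid, [lower_pct, upper_pct])
    if upper - lower < eps:
        # Constant or near-constant column: widen to avoid a zero-width rule.
        lower, upper = lower - eps, upper + eps
    return float(lower), float(upper)

spread = toy_thresholds(np.arange(101.0))         # values 0..100
constant = toy_thresholds([0.5] * 10)
empty = toy_thresholds([np.nan, np.nan])
```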

class vital_sqi.calibration.threshold_estimator.SQIThreshold(sqi_name, lower=nan, upper=nan, accept_median=nan, accept_std=nan, reject_median=nan, n_accept=0, n_reject=0, calibrated=False, note='')

Bases: object

Calibrated threshold for a single SQI column.

Parameters:

sqi_name (str)
lower (float)
upper (float)
accept_median (float)
accept_std (float)
reject_median (float)
n_accept (int)
n_reject (int)
calibrated (bool)
note (str)

sqi_name

Column name as it appears in the SQI DataFrame.

Type: str

lower

Lower bound of the accept region (exclusive, > operator).

Type: float

upper

Upper bound of the accept region (exclusive, < operator).

Type: float

accept_median

Median of the accept (clean) distribution.

Type: float

accept_std

Standard deviation of the accept distribution.

Type: float

reject_median

Median of the reject distribution (NaN if not available).

Type: float or None

n_accept

Number of valid (non-NaN) accept samples used.

Type: int

n_reject

Number of valid (non-NaN) reject samples used.

Type: int

calibrated

False if the SQI could not be calibrated (all-NaN, constant, etc.).

Type: bool

note

Human-readable note about any special handling applied.

Type: str

accept_median: float = nan

accept_std: float = nan

calibrated: bool = False

lower: float = nan

n_accept: int = 0

n_reject: int = 0

note: str = ''

reject_median: float = nan

sqi_name: str

upper: float = nan

vital_sqi.calibration.threshold_estimator.estimate_thresholds(accept_df, reject_df=None, lower_pct=5.0, upper_pct=95.0)

Derive accept/reject bounds for every SQI column.

Parameters:

accept_df (pd.DataFrame) – SQI values from clean segments. One row per segment, one column per SQI. All-NaN columns are skipped.
reject_df (pd.DataFrame, optional) – SQI values from degraded segments. Used only for diagnostics (the reject distribution does not change the threshold values).
lower_pct (float) – Lower percentile of the accept distribution that defines the accept lower bound (default 5).
upper_pct (float) – Upper percentile defining the accept upper bound (default 95).

Returns:

Mapping of sqi_name → SQIThreshold. Only calibratable SQIs are included.

Return type:

dict

vital_sqi.calibration.threshold_estimator.thresholds_to_dataframe(thresholds)

Convert a thresholds dict to a summary DataFrame for inspection.

Parameters:

thresholds (dict) – Output of estimate_thresholds().

Returns:

One row per SQI with columns: sqi_name, lower, upper, accept_median, accept_std, reject_median, n_accept, n_reject, calibrated, note.

Return type:

pd.DataFrame

Exporter 

Export calibrated thresholds to rule_dict.json and sqi_dict.json.

rule_dict format (one entry per SQI)

Each entry uses the paired operator structure required by the rule engine:

    {
        "sqi_name": {
            "name": "sqi_name",
            "def": [
                {"op": ">", "value": "<lower>", "label": "accept"},
                {"op": "<=", "value": "<lower>", "label": "reject"},
                {"op": ">=", "value": "<upper>", "label": "reject"},
                {"op": "<", "value": "<upper>", "label": "accept"}
            ],
            "desc": "Calibrated from synthetic <wave_type> signals. Accept: p<lo>-p<hi>.",
            "ref": "vital_sqi.calibration"
        }
    }

The exporter merges new calibrated entries INTO the existing rule_dict rather than replacing it, so manually curated entries (for SQIs the calibrator cannot reach) are preserved.
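The entry layout and the merge behaviour can be sketched with plain dictionaries. The threshold values, the manual_sqi entry, and the helper name make_rule_entry are hypothetical, and storing the bounds as strings is an assumption that mirrors the quoted placeholders in the format above.

```python
import json

def make_rule_entry(name, lower, upper, wave_type="PPG", lo_pct=5, hi_pct=95):
    """Build one rule_dict entry using the paired operator structure."""
    return {
        "name": name,
        "def": [
            {"op": ">", "value": str(lower), "label": "accept"},
            {"op": "<=", "value": str(lower), "label": "reject"},
            {"op": ">=", "value": str(upper), "label": "reject"},
            {"op": "<", "value": str(upper), "label": "accept"},
        ],
        "desc": (f"Calibrated from synthetic {wave_type} signals. "
                 f"Accept: p{lo_pct}-p{hi_pct}."),
        "ref": "vital_sqi.calibration",
    }

# A manually curated entry that the calibrator does not touch (hypothetical).
existing = {
    "manual_sqi": {"name": "manual_sqi", "def": [], "desc": "Hand-curated.", "ref": ""},
}
# A newly calibrated entry with made-up bounds.
calibrated = {"kurtosis_sqi": make_rule_entry("kurtosis_sqi", 1.8, 4.2)}

# Merge: calibrated entries overwrite same-named ones, all others are preserved.
merged = {**existing, **calibrated}
print(json.dumps(merged, indent=2))
```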

vital_sqi.calibration.exporter.export_diagnostics(thresholds, output_path)

Write a human-readable CSV report of all calibration results.

Parameters:

thresholds (dict) – Output of estimate_thresholds().
output_path (str) – Destination CSV file path.

Return type:

None

vital_sqi.calibration.exporter.export_rule_dict(thresholds, output_path, wave_type='PPG', lower_pct=5.0, upper_pct=95.0, backup=True)

Write or update a rule_dict.json file with calibrated thresholds.

Existing entries whose SQI was NOT calibrated are preserved unchanged. Calibrated entries overwrite their previous counterparts.

Parameters:

thresholds (dict) – Output of estimate_thresholds(). Only entries with calibrated=True are written.
output_path (str) – Destination file path (created if absent).
wave_type (str) – Used in the desc field for documentation purposes.
lower_pct (float) – Percentiles used during estimation (for documentation).
upper_pct (float) – Percentiles used during estimation (for documentation).
backup (bool) – If True and the file already exists, a timestamped backup is made before overwriting (default True).

Return type:

None

vital_sqi.calibration.exporter.export_sqi_dict(thresholds, output_path, backup=True)

Write a sqi_dict.json containing only successfully calibrated SQIs.

Entries for SQIs that could not be calibrated are excluded so the output template is safe to use directly in extract_sqi.

Parameters:

thresholds (dict) – Output of estimate_thresholds().
output_path (str) – Destination file path.
backup (bool) – If the file exists, make a timestamped backup before overwriting.

Return type:

None
