Preprocess (vital_sqi.preprocess)

Signal preprocessing utilities: tapering and smoothing, removal of flat/constant regions, and time- or beat-based segmentation.

Tapering & smoothing

vital_sqi.preprocess.preprocess_signal.scale_pattern(s, window_size)

Scales or resamples the signal to a specified window size for comparison with a template.

Parameters:
  • s (np.ndarray) – Input signal as a 1D array of floats.

  • window_size (int) – The desired size of the output signal.

Returns:

Signal resampled and smoothed to the desired window size.

Return type:

np.ndarray
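The resampling step can be pictured with plain linear interpolation. This is a minimal sketch, not the library's implementation: the helper name resample_to_length is hypothetical, the interpolation method is an assumption, and the smoothing that scale_pattern also applies is left out.

```python
import numpy as np

def resample_to_length(s, window_size):
    """Linearly resample a 1D signal to `window_size` samples (hypothetical helper)."""
    s = np.asarray(s, dtype=float)
    # Place the old and new samples on a common 0..1 axis.
    old_x = np.linspace(0.0, 1.0, num=len(s))
    new_x = np.linspace(0.0, 1.0, num=window_size)
    return np.interp(new_x, old_x, s)

# One synthetic "beat" of 50 samples, stretched to 100 samples.
beat = np.sin(np.linspace(0.0, np.pi, 50))
scaled = resample_to_length(beat, 100)
```

After resampling, the pattern has the same number of samples as the template, so the two can be compared point by point.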

vital_sqi.preprocess.preprocess_signal.smooth_signal(s, window_len=5, window='flat')

Smooths the signal using a specified window.

Parameters:
  • s (np.ndarray) – 1D signal array.

  • window_len (int, optional) – Size of the smoothing window; must be greater than 2 (default 5).

  • window (str, optional) – Window type for smoothing (default ‘flat’). Options: ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’.

Returns:

Smoothed signal.

Return type:

np.ndarray
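Window-based smoothing of this kind is commonly done by convolving the signal with a normalised kernel. The sketch below assumes that approach. The helper name smooth is hypothetical, and the edge handling and the error raised for a short window are choices made for this sketch, not documented behaviour of smooth_signal. The non-flat window names match NumPy's own window functions.

```python
import numpy as np

WINDOWS = ("flat", "hanning", "hamming", "bartlett", "blackman")

def smooth(s, window_len=5, window="flat"):
    """Moving-window smoothing by convolution (hypothetical helper)."""
    s = np.asarray(s, dtype=float)
    if window_len < 3:
        # Choice made for this sketch; the library may handle this differently.
        raise ValueError("window_len must be greater than 2")
    if window not in WINDOWS:
        raise ValueError(f"window must be one of {WINDOWS}")
    if window == "flat":
        w = np.ones(window_len)  # plain moving average
    else:
        # np.hanning, np.hamming, np.bartlett and np.blackman all take the length.
        w = getattr(np, window)(window_len)
    # Normalise the kernel so the smoothed signal keeps its scale.
    return np.convolve(s, w / w.sum(), mode="same")

noisy = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
smoothed = smooth(noisy, window_len=5, window="flat")
```

With the flat window, the single spike of height 5 is spread evenly over five neighbouring samples.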

vital_sqi.preprocess.preprocess_signal.taper_signal(s, window=None, shift_min_to_zero=True)

Applies a tapering window to the signal and optionally shifts the minimum value to zero.

Parameters:
  • s (np.ndarray) – Input signal as a 1D array of floats.

  • window (np.ndarray, optional) – Window to apply; defaults to a Tukey window if None.

  • shift_min_to_zero (bool, optional) – If True, shifts the signal minimum value to zero.

Returns:

Tapered and optionally shifted signal.

Return type:

np.ndarray
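Tapering multiplies the signal by a window that falls to zero at both ends. The sketch below is an illustration under assumptions: the helper names, the Tukey shape parameter (alpha=0.5) and the order of operations (shift first, then apply the window) are not taken from the library.

```python
import numpy as np

def tukey_window(n, alpha=0.5):
    """Tukey (tapered cosine) window; `alpha` is the fraction of samples tapered."""
    alpha = min(alpha, 1.0)
    if n < 2 or alpha <= 0:
        return np.ones(n)
    x = np.linspace(0.0, 1.0, num=n)
    w = np.ones(n)
    edge = alpha / 2.0
    rising = x < edge
    falling = x > 1.0 - edge
    # Half-cosine ramps at both ends, flat top in between.
    w[rising] = 0.5 * (1.0 - np.cos(np.pi * x[rising] / edge))
    w[falling] = 0.5 * (1.0 - np.cos(np.pi * (1.0 - x[falling]) / edge))
    return w

def taper(s, window=None, shift_min_to_zero=True):
    """Shift the minimum to zero (optional), then apply the window (hypothetical helper)."""
    s = np.asarray(s, dtype=float)
    if shift_min_to_zero:
        s = s - s.min()
    if window is None:
        window = tukey_window(len(s))
    return s * np.asarray(window, dtype=float)

segment = 2.0 + np.sin(np.linspace(0.0, 2.0 * np.pi, 101))
tapered = taper(segment)
```

Shifting before windowing means the tapered signal starts and ends at zero and never goes negative.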

Removal utilities

Signal processing utilities for removing noise and interpolating missing data.

vital_sqi.preprocess.removal_utilities.get_start_end_points(start_cut_pivot, end_cut_pivot, length_df)

Determines the start and end points for each retained signal segment.

Parameters:
  • start_cut_pivot (array-like) – Array of starting points of removed segments.

  • end_cut_pivot (array-like) – Array of corresponding ending points of removed segments.

  • length_df (int) – Length of the original signal.

Returns:

Arrays of start and end milestones for retained segments.

Return type:

tuple
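The relationship between removed and retained segments can be sketched as follows. The helper name retained_segments is hypothetical, and the sketch assumes the cut indices are inclusive, sorted and non-overlapping. The library's exact indexing convention may differ.

```python
def retained_segments(start_cut, end_cut, length):
    """Return (starts, ends) of the stretches left after removing the cuts.

    Assumes inclusive, sorted, non-overlapping cut indices (hypothetical convention).
    """
    starts, ends = [], []
    cursor = 0  # first index not yet assigned to a cut or a retained stretch
    for cut_start, cut_end in zip(start_cut, end_cut):
        if cut_start > cursor:
            # Everything between the previous cut and this one is kept.
            starts.append(cursor)
            ends.append(cut_start - 1)
        cursor = cut_end + 1
    if cursor < length:
        # Keep the tail after the last cut.
        starts.append(cursor)
        ends.append(length - 1)
    return starts, ends

# Remove samples 10..19 and 50..59 from a 100-sample signal.
starts, ends = retained_segments([10, 50], [19, 59], 100)
# starts == [0, 20, 60], ends == [9, 49, 99]
```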

vital_sqi.preprocess.removal_utilities.interpolate_signal(s, missing_index, missing_len, method='arima', lag_ratio=10)

Interpolates missing signal segments using ARIMA.

Parameters:
  • s (pd.DataFrame) – Signal with first column as pd.Timestamp and second as float.

  • missing_index (list or array-like) – Starting indices of missing segments.

  • missing_len (list or array-like) – Lengths of missing segments corresponding to each starting index.

  • method (str, optional) – Interpolation method; only ‘arima’ is supported (default).

  • lag_ratio (int, optional) – Multiplier for the ARIMA lag window size (default 10).

Returns:

Signal with interpolated segments.

Return type:

pd.DataFrame

vital_sqi.preprocess.removal_utilities.remove_invalid_smartcare(s, info, output_signal=True)

Filters out invalid signal samples based on Smartcare oximeter data.

Parameters:
  • s (pd.DataFrame) – Signal with first column as pd.Timestamp and second as float.

  • info (pd.DataFrame) – Info containing “SPO2_PCT”, “PERFUSION_INDEX”, and “PULSE_BPM” columns.

  • output_signal (bool, optional) – If True, returns processed signal along with milestones.

Returns:

Processed signal (only if output_signal is True) and a DataFrame of retained segment milestones.

Return type:

tuple

vital_sqi.preprocess.removal_utilities.remove_unchanged(s, sampling_rate, duration=10, output_signal=True)

Removes flat (unchanged) segments of the signal considered as noise.

Parameters:
  • s (pd.DataFrame) – Signal with first column as pd.Timestamp and second as float.

  • sampling_rate (int or float) – Sampling rate of the signal.

  • duration (int or float, optional) – Duration, in seconds, of a flat stretch for it to be considered noise (default 10).

  • output_signal (bool, optional) – If True, returns processed signal along with milestones.

Returns:

Processed signal (only if output_signal is True) and a DataFrame of retained segment milestones.

Return type:

tuple
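Detecting flat stretches amounts to finding runs of identical consecutive samples that last at least duration × sampling_rate samples. The sketch below illustrates that idea on a plain array. The helper name flat_runs is hypothetical, and treating "at least" (rather than "more than") the duration as flat is an assumption.

```python
import numpy as np

def flat_runs(values, sampling_rate, duration=10):
    """Return inclusive (start, end) index pairs of runs that stay constant
    for at least `duration` seconds (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    min_len = int(duration * sampling_rate)
    # True wherever a sample differs from its predecessor.
    changed = np.diff(values) != 0
    # Run boundaries: index 0, every change point, and the end of the array.
    bounds = np.concatenate(([0], np.flatnonzero(changed) + 1, [len(values)]))
    runs = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        if stop - start >= min_len:
            runs.append((int(start), int(stop) - 1))
    return runs

# 5 varying samples, 20 constant samples, 5 varying samples, sampled at 2 Hz.
signal = np.concatenate((np.arange(5.0), np.full(20, 7.0), np.arange(5.0)))
runs = flat_runs(signal, sampling_rate=2, duration=10)
# runs == [(5, 24)]
```

The resulting start and end indices are the kind of "removed segment" pivots that get_start_end_points takes as input.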

vital_sqi.preprocess.removal_utilities.trim_signal(s, sampling_rate, duration_left=300, duration_right=300)

Trims noise from the beginning and end of the signal.

Parameters:
  • s (pd.DataFrame) – Signal with first column as pd.Timestamp and second as float.

  • sampling_rate (int or float) – Sampling rate of the signal.

  • duration_left (int or float, optional) – Seconds to trim from the start (default 300).

  • duration_right (int or float, optional) – Seconds to trim from the end (default 300).

Returns:

Trimmed signal.

Return type:

pd.DataFrame
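Because the durations are given in seconds, the number of samples dropped from each end is duration × sampling_rate. A minimal sketch of that arithmetic follows. The helper name trim_indices is hypothetical, and returning an empty range when the recording is shorter than the trim is a choice made for this sketch.

```python
def trim_indices(n_samples, sampling_rate, duration_left=300, duration_right=300):
    """Return the (start, stop) slice bounds that survive trimming (hypothetical helper)."""
    left = int(duration_left * sampling_rate)
    right = int(duration_right * sampling_rate)
    start = min(left, n_samples)
    # Never let stop fall before start, so short recordings give an empty slice.
    stop = max(start, n_samples - right)
    return start, stop

# A one-hour recording at 100 Hz loses 5 minutes from each end.
start, stop = trim_indices(360_000, sampling_rate=100)
# start == 30000, stop == 330000
```

For a DataFrame s, the same bounds could be applied with s.iloc[start:stop].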

Segment splitter

Splits long recordings into segments, with optional overlap:
  • By duration

  • By beat

vital_sqi.preprocess.segment_split.save_segment(segment_list, segment_name='segment', save_file_folder=None, save_image=False, save_img_folder=None)

Saves segments of waveforms to .csv files and optionally plots them to image files.

Parameters:
  • segment_list (list) – List of segments (arrays or DataFrames).

  • segment_name (str, optional) – Base filename for saved files (default is “segment”).

  • save_file_folder (str, optional) – Directory to save .csv files (default is current working directory).

  • save_image (bool, optional) – If True, saves images of each segment (default is False).

  • save_img_folder (str, optional) – Directory to save image files (default is current working directory).

Return type:

None

vital_sqi.preprocess.segment_split.split_segment(s, sampling_rate, split_type=0, duration=30.0, overlapping=0, peak_detector=6, wave_type='PPG')

Splits a long signal into segments based on time or beat, with optional overlap.

Parameters:
  • s (pd.DataFrame) – Signal data with timestamps as the first column and signal values as the second.

  • sampling_rate (float or int) – Sampling rate of the signal.

  • split_type (int, optional) – 0: split by time; 1: split by beat (default is 0).

  • duration (float, optional) – Segment length in seconds (if split_type=0) or in beats (if split_type=1). Default is 30.

  • overlapping (float or int, optional) – Overlap in seconds. Only used when split_type=0; ignored for beat-based splitting. Default is 0.

  • peak_detector (int, optional) – Type of peak detector for beat-based segmentation, 1–7 (default is 6, the vitalDSP detector).

  • wave_type (str, optional) – Type of signal, either ‘PPG’ or ‘ECG’ (default is ‘PPG’).

Returns:

  • segments (list) – List of segmented DataFrames.

  • milestones (pd.DataFrame) – DataFrame containing start and end indices of each segment.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from vital_sqi.common.utils import generate_timestamp
>>> from vital_sqi.preprocess.segment_split import split_segment
>>> s = np.arange(100000)
>>> timestamps = generate_timestamp(None, 100, len(s))
>>> df = pd.DataFrame({'time': timestamps, 'signal': s})
>>> segments, milestones = split_segment(df, sampling_rate=100, duration=5)
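
The time-based mode (split_type=0) can be understood as index arithmetic on the sample count. The sketch below is an illustration only. The helper name time_split_bounds is hypothetical, and keeping a shorter final segment is an assumption, since the library's handling of a trailing partial segment is not stated here.

```python
def time_split_bounds(n_samples, sampling_rate, duration=30.0, overlapping=0):
    """Return (start, end) index pairs, end exclusive, for fixed-length windows
    (hypothetical helper)."""
    window = int(duration * sampling_rate)
    # Consecutive windows start `step` samples apart; overlap shortens the step.
    step = window - int(overlapping * sampling_rate)
    if window <= 0 or step <= 0:
        raise ValueError("duration must be positive and larger than overlapping")
    bounds = []
    for start in range(0, n_samples, step):
        # The last window is clipped to the end of the signal (assumption).
        bounds.append((start, min(start + window, n_samples)))
    return bounds

# 100000 samples at 100 Hz in 5-second windows: this sketch yields 200 windows.
bounds = time_split_bounds(100_000, sampling_rate=100, duration=5)

# With a 1-second overlap, each window starts 4 seconds after the previous one.
overlapped = time_split_bounds(1_000, sampling_rate=100, duration=5, overlapping=1)
# overlapped == [(0, 500), (400, 900), (800, 1000)]
```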