chromatography: chromatographic trace postprocessing library

Code author: Peter Kraus

Includes functions to integrate a chromatographic trace, and post-process an integrated trace using calibration.

Functions

integrate_trace(time, signal, species[, ...])

Chromatographic trace integration.

apply_calibration(areas, calibration[, output])

Apply calibration to an integrated chromatographic trace.

dgpost.transform.chromatography.integrate_trace(time, signal, species, polyorder=3, window=7, prominence=0.0001, threshold=1.0, output='trace')

Chromatographic trace integration.

Function which integrates peaks found in the chromatographic trace, which is itself defined as a set of time, signal arrays. The procedure is as follows:

  1. The signal is smoothed with a Savitzky-Golay filter, via scipy.signal.savgol_filter(). The arguments polyorder and window are used for this.

  2. Peak maxima are found using scipy.signal.find_peaks(). For this, the argument prominence is used, scaled by max(abs(signal)).

  3. The edges of every found peak are determined, either from the nearest minima, or from the nearest inflection point at which the gradient is below the threshold.

  4. Peak maxima are matched against known peaks, provided in the species argument. The peak is considered matching a species when its maximum is between the left and right limits defined in species.

  5. A baseline is constructed by copying the signal data and interpolating between the ends of all matched peaks. If consecutive peaks are found, the interpolation spans the whole domain.

  6. The baseline is subtracted from the signal and the peak areas are integrated using the numpy.trapz() function.

  7. The peak height is taken from the original signal data.
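The steps above can be illustrated with a simplified sketch. This is not the dgpost implementation: the function name integrate_sketch is hypothetical, peak edges are taken from the nearest local minima only (the inflection-point criterion and the threshold argument are omitted), species matching is left out, and plain NumPy arrays stand in for pint.Quantity objects.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

# numpy.trapz was renamed to numpy.trapezoid in NumPy 2.0; use whichever exists.
trapezoid = getattr(np, "trapezoid", None) or np.trapz


def integrate_sketch(time, signal, polyorder=3, window=7, prominence=1e-4):
    """Simplified illustration: smooth, pick peaks, cut at minima, integrate."""
    # Step 1: Savitzky-Golay smoothing.
    smooth = savgol_filter(signal, window_length=window, polyorder=polyorder)
    # Step 2: peak maxima, with the prominence scaled by max(abs(signal)).
    scaled = prominence * np.max(np.abs(signal))
    maxima, _ = find_peaks(smooth, prominence=scaled)
    # Step 3 (simplified assumption): edges are the nearest local minima.
    minima, _ = find_peaks(-smooth)
    results = []
    for p in maxima:
        left = minima[minima < p]
        right = minima[minima > p]
        lo = left[-1] if left.size else 0
        hi = right[0] if right.size else len(signal) - 1
        t = time[lo : hi + 1]
        # Step 5 (simplified): linear baseline between the two peak ends.
        baseline = np.interp(t, [time[lo], time[hi]], [signal[lo], signal[hi]])
        # Step 6: integrate the baseline-subtracted signal.
        area = trapezoid(signal[lo : hi + 1] - baseline, t)
        # Step 7: the height is read from the original, unsmoothed signal.
        results.append({"t_max": time[p], "area": area, "height": signal[p]})
    return results
```

Applied to a single synthetic Gaussian peak on a flat baseline, the sketch should return one entry whose area is close to the analytical area of the Gaussian.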

The format of the species specification, used for peak matching, is as follows:

"{{ species_name }}" :
    l:  pint.Quantity
    r:  pint.Quantity

with the keys "l" and "r" corresponding to the left and right limit for the maximum of the peak. The limits can be either pint.Quantity, or str with the same dimensionality as time, or a float in which case the units of time are assumed.
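As an illustration of this format, the following sketch defines a species specification with plain float limits (so the units of time are assumed) and matches a peak maximum against it. The species names, the limit values, and the helper match_species are hypothetical; treating the limits as inclusive is also an assumption.

```python
# Hypothetical species specification; float limits assume the units of `time`.
species = {
    "H2": {"l": 60.0, "r": 90.0},
    "CH4": {"l": 120.0, "r": 150.0},
}


def match_species(t_max, species):
    """Return the first species whose [l, r] window contains t_max, else None."""
    for name, limits in species.items():
        # Inclusive comparison is an assumption made for this sketch.
        if limits["l"] <= t_max <= limits["r"]:
            return name
    return None
```

A peak with its maximum at 75.0 would be assigned to "H2" here, while a maximum at 100.0 falls outside both windows and stays unmatched.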

Parameters:
  • time (Quantity) – A pint.Quantity array object determining the X-axis of the trace. By default in seconds.

  • signal (Quantity) – A pint.Quantity array object containing the Y-axis of the trace. By default dimensionless.

  • species (dict[str, dict]) – A dict[str, dict], where the keys are species names and the values define the left and right limits for matching the peak maximum.

  • polyorder (int) – An int defining the order of the polynomial for the Savitzky-Golay filter. Defaults to 3. Must be less than window.

  • window (int) – An int defining the smoothing window for the Savitzky-Golay filter. Defaults to 7. Must be odd and greater than polyorder.

  • prominence (float) – A float used to calculate the prominence of the peaks in signal by scaling the max(abs(signal)). Used in the peak picking process. Defaults to 0.0001.

  • threshold (float) – A float used to find ends of peaks by comparing to the gradient of signal at the nearest inflection points. Defaults to 1.0.

  • output (str) – A str prefix for the output namespace. The results are collated in the f"{output}->area" namespace for peak areas and the f"{output}->height" namespace for peak heights. Defaults to "trace".

Returns:

retvals – A dictionary containing the peak areas and peak heights of matched peaks stored in namespaced pint.Quantities.

Return type:

dict[str, dict[str, pint.Quantity]]

dgpost.transform.chromatography.apply_calibration(areas, calibration, output='x')

Apply calibration to an integrated chromatographic trace.

Function which applies calibration information, provided in a dict, to an integrated chromatographic trace. Elements in the calibration dict are treated as chemicals, matched against the chromatographic data using SMILES.

The format of the calibration is as follows:

"{{ species_name }}" :
    function: Literal["inverse", "linear"]
    m:  float
    c:  Optional[float]

Two calibration functions are provided. Either the output value $x$ is calculated as $x = (A - c) / m$, i.e. an “inverse” relationship, or using $x = m \times A + c$, a “linear” relationship. The offset $c$ is optional. Both $m$ and $c$ are internally converted to pint.Quantity, therefore they can be specified with uncertainty, but have to be annotated by appropriate units to convert the units of the peak areas to the desired output.
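The two relationships can be written out as a minimal sketch using plain floats. The helper name calibrate_sketch is hypothetical, units and uncertainties are ignored, and treating an omitted offset $c$ as zero is an assumption rather than documented behaviour.

```python
def calibrate_sketch(area, function, m, c=0.0):
    """Convert a peak area A into x using one of the two calibration forms."""
    if function == "inverse":
        # x = (A - c) / m
        return (area - c) / m
    if function == "linear":
        # x = m * A + c
        return m * area + c
    raise ValueError(f"unknown calibration function: {function!r}")
```

For example, with A = 10, m = 2 and c = 4, the “inverse” form gives (10 - 4) / 2 = 3, while the “linear” form gives 2 × 10 + 4 = 24.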

Parameters:
  • areas (Quantity) – A dict containing a namespace of pint.Quantity containing the integrated peak areas $A$, with their keys corresponding to chemicals.

  • calibration (dict[str, dict]) – A dict containing the calibration information for processing the above peak areas into the resulting pint.Quantity.

  • output (str) – The str prefix for the output namespace. Defaults to "x".

Return type:

dict[str, float]