chromatography: chromatographic trace postprocessing library
Code author: Peter Kraus
Includes functions to integrate a chromatographic trace, and post-process an integrated trace using calibration.
Functions
integrate_trace() – Chromatographic trace integration.
apply_calibration() – Apply calibration to an integrated chromatographic trace.
- dgpost.transform.chromatography.integrate_trace(time: Quantity, signal: Quantity, species: dict[str, dict], polyorder: int = 3, window: int = 7, prominence: float = 0.0001, threshold: float = 1.0, output: str = 'trace') -> dict[str, float]
Chromatographic trace integration.
Function which integrates peaks found in the chromatographic trace, which is itself defined as a set of time, signal arrays. The procedure is as follows:
- The signal is smoothed using the Savitzky-Golay filter, via scipy.signal.savgol_filter(). For this, the arguments polyorder and window are used.
- Peak maxima are found using scipy.signal.find_peaks(). For this, the argument prominence is used, scaled by max(abs(signal)).
- Peak edges of every found peak are determined. The peak ends are either determined from the nearest minima, or from the nearest inflection point at which the gradient is below the threshold.
- Peak maxima are matched against known peaks, provided in the species argument. A peak is considered to match a species when its maximum lies between the left and right limits defined in species.
- A baseline is constructed by copying the signal data and interpolating between the ends of all matched peaks. If consecutive peaks are found, the interpolation spans the whole domain.
- The baseline is subtracted from the signal and the peak areas are integrated using the numpy.trapz() function.
- The peak height is taken from the original signal data.
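The procedure above can be illustrated with a minimal sketch built directly on NumPy and SciPy. It is not the dgpost implementation: the synthetic trace, the helper walk_downhill and the peaks list are assumptions made for illustration, peak edges are found only from the nearest minima (the inflection-point and threshold branch is omitted), and species matching is left out.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

# np.trapezoid supersedes np.trapz in NumPy 2.x; use whichever is available.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Synthetic trace (assumption): one Gaussian peak on a constant offset of 0.5.
time = np.linspace(0.0, 100.0, 1001)
signal = 0.5 + 5.0 * np.exp(-0.5 * ((time - 40.0) / 2.0) ** 2)

# Smooth the signal with the documented defaults (window=7, polyorder=3).
smooth = savgol_filter(signal, window_length=7, polyorder=3)

# Pick peak maxima; the relative prominence is scaled by max(abs(signal)).
maxima, _ = find_peaks(smooth, prominence=0.0001 * np.max(np.abs(signal)))


def walk_downhill(values, start, step):
    """Move from `start` in direction `step` while the values keep decreasing."""
    i = start
    while 0 <= i + step < len(values) and values[i + step] < values[i]:
        i += step
    return i


peaks = []
for apex in maxima:
    # Simplified edge detection: nearest minimum on either side of the maximum.
    left = walk_downhill(smooth, apex, -1)
    right = walk_downhill(smooth, apex, +1)
    window = slice(left, right + 1)
    # Baseline: linear interpolation between the two peak ends.
    baseline = np.interp(
        time[window], [time[left], time[right]], [signal[left], signal[right]]
    )
    # Area of the baseline-subtracted signal by the trapezoidal rule.
    area = trapezoid(signal[window] - baseline, time[window])
    peaks.append(
        {
            "t_max": float(time[apex]),
            "area": float(area),
            # Raw, unsmoothed value at the maximum; whether the baseline is
            # subtracted from the height is not specified here.
            "height": float(signal[apex]),
        }
    )

print(peaks)
```

For this noise-free trace the integrated area should come out close to the analytical Gaussian area, 5.0 * 2.0 * sqrt(2 * pi). Real traces with noise would need the threshold-based edge detection described above.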
The format of the species specification, used for peak matching, is as follows:
"{{ species_name }}" :
    l: pint.Quantity
    r: pint.Quantity
with the keys "l" and "r" corresponding to the left and right limit for the maximum of the peak. The limits can be either pint.Quantity, or str with the same dimensionality as time, or a float, in which case the units of time are assumed.
- Parameters:
time – A pint.Quantity array object determining the X-axis of the trace. By default in seconds.
signal – A pint.Quantity array object containing the Y-axis of the trace. By default dimensionless.
species – A dict[str, dict], where the keys are species names and the values define the left and right limits for matching the peak maximum.
polyorder – An int defining the order of the polynomial for the Savitzky-Golay filter. Defaults to 3. The polyorder must be less than window.
window – An int defining the smoothing window for the Savitzky-Golay filter. Defaults to 7. Must be odd. The polyorder must be less than window.
prominence – A float used to calculate the prominence of the peaks in signal by scaling max(abs(signal)). Used in the peak picking process. Defaults to 0.0001.
threshold – A float used to find ends of peaks by comparing to the gradient of signal at the nearest inflection points. Defaults to 1.0.
output – A str prefix for the output namespace. The results are collated in the f"{output}->area" namespace for peak areas and the f"{output}->height" namespace for peak heights.
- Returns:
retvals – A dictionary containing the peak areas and peak heights of matched peaks stored in namespaced pint.Quantities.
- Return type:
dict[str, dict[str, pint.Quantity]]
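The species matching used by integrate_trace can be sketched in a few lines. This is an illustration only: the species names and limits are hypothetical, the limits are given as plain floats (so the units of time, here seconds, are assumed), the helper match_species is not part of dgpost, and treating the limits as inclusive is an assumption.

```python
# Hypothetical species limits, as plain floats in the units of `time`.
species = {
    "H2": {"l": 30.0, "r": 50.0},
    "CO": {"l": 60.0, "r": 80.0},
}


def match_species(t_max, species):
    """Return the first species whose limits bracket t_max, or None."""
    for name, limits in species.items():
        # Assumption: both limits are inclusive.
        if limits["l"] <= t_max <= limits["r"]:
            return name
    return None


print(match_species(40.0, species))
print(match_species(55.0, species))
```

A peak maximum at 40.0 falls inside the H2 window, while a maximum at 55.0 lies between the two windows and stays unmatched.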
- dgpost.transform.chromatography.apply_calibration(areas: Quantity, calibration: dict[str, dict], output: str = 'x') -> dict[str, float]
Apply calibration to an integrated chromatographic trace.
Function which applies calibration information, provided in a dict, to an integrated chromatographic trace. Elements in the calibration dict are treated as chemicals, matched against the chromatographic data using SMILES.
The format of the calibration is as follows:
"{{ species_name }}" :
    function: Literal["inverse", "linear"]
    m: float
    c: Optional[float]
Two calibration functions are provided. Either the output value x is calculated as $x = (A - c) / m$, i.e. an “inverse” relationship, or using $x = m \times A + c$, a “linear” relationship. The offset $c$ is optional. Both $m$ and $c$ are internally converted to pint.Quantity, therefore they can be specified with uncertainty, but have to be annotated by appropriate units to convert the units of the peak areas to the desired output.
- Parameters:
areas – A dict containing a namespace of pint.Quantity containing the integrated peak areas $A$, with their keys corresponding to chemicals.
calibration – A dict containing the calibration information for processing the above peak areas into the resulting pint.Quantity.
output – The str prefix for the output namespace.
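The two calibration relationships can be checked with a short sketch. It is not the dgpost implementation: apply_calibration_sketch and the example values are hypothetical, plain floats stand in for pint.Quantity, species are matched by exact name rather than by SMILES, and a missing offset is assumed to mean c = 0.

```python
def apply_calibration_sketch(areas, calibration):
    """Convert peak areas to output values with per-species calibrations."""
    results = {}
    for name, spec in calibration.items():
        if name not in areas:
            continue  # no integrated peak for this species
        area = areas[name]
        m = spec["m"]
        c = spec.get("c") or 0.0  # assumption: a missing offset means c = 0
        if spec["function"] == "inverse":
            results[name] = (area - c) / m  # x = (A - c) / m
        elif spec["function"] == "linear":
            results[name] = m * area + c  # x = m * A + c
        else:
            raise ValueError(f"unknown calibration function: {spec['function']!r}")
    return results


# Hypothetical peak areas and calibration entries.
areas = {"H2": 10.0, "CO": 8.0}
calibration = {
    "H2": {"function": "inverse", "m": 2.0, "c": 1.0},
    "CO": {"function": "linear", "m": 0.5},
}
x = apply_calibration_sketch(areas, calibration)
print(x)
```

With these numbers the inverse calibration gives (10.0 - 1.0) / 2.0 = 4.5 for H2, and the linear calibration gives 0.5 * 8.0 + 0 = 4.0 for CO.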