yadg.parsers.chromtrace package
List of supported file formats:
EZ-Chrom ASCII export (dat.asc)
yadg.parsers.chromtrace.ezchromasc
Agilent Chemstation Chromtab (csv)
yadg.parsers.chromtrace.agilentcsv
Agilent OpenLab binary signal (ch)
yadg.parsers.chromtrace.agilentch
Agilent OpenLab data archive (dx)
yadg.parsers.chromtrace.agilentdx
Fusion JSON format (json)
yadg.parsers.chromtrace.fusionjson
Submodules
yadg.parsers.chromtrace.agilentch module
File parser for Agilent OpenLab binary signal trace files (CH and IT).
Currently supports version “179” of the files. Version information is defined in the magic_values (parameters & metadata) and data_dtypes (data) dictionaries.
Adapted from ImportAgilent.m and aston.
Exposed metadata:
params:
method: !!str
sampleid: !!str
username: !!str
version: !!str
valve: None
datafile: None
File Structure of .ch files
0x0000 "version magic"
0x0108 "data offset"
0x011a "x-axis minimum (ms)"
0x011e "x-axis maximum (ms)"
0x035a "sample ID"
0x0559 "description"
0x0758 "username"
0x0957 "timestamp"
0x09e5 "instrument name"
0x09bc "inlet"
0x0a0e "method"
0x104c "y-axis unit"
0x1075 "detector name"
0x1274 "y-axis intercept"
0x127c "y-axis slope"
Data is stored as a consecutive set of <f8 values (little-endian 64-bit floats), starting at the offset (calculated as offset = ("data offset" - 1) * 512) and continuing until the end of the file.
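As a rough illustration of the offset rule above, the following sketch reads the trailing run of <f8 values from an in-memory buffer. The helper name read_trace and the synthetic buffer are made up for this example. It assumes the "data offset" value has already been decoded from the header, and it does not apply the y-axis slope or intercept.

```python
import struct

BLOCK_SIZE = 512  # multiplier used in the offset formula above


def read_trace(raw: bytes, data_offset: int) -> list:
    """Return all <f8 values from the data start until the end of the buffer."""
    start = (data_offset - 1) * BLOCK_SIZE
    count = (len(raw) - start) // 8  # each <f8 value takes 8 bytes
    return list(struct.unpack_from(f"<{count}d", raw, start))


# Synthetic stand-in for a .ch file: two empty 512-byte blocks, then three values.
raw = bytes(2 * BLOCK_SIZE) + struct.pack("<3d", 0.5, 1.5, 2.5)
print(read_trace(raw, data_offset=3))
```

With a "data offset" of 3 the data starts at byte 1024, so only the three packed values are returned.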
- yadg.parsers.chromtrace.agilentch.process(fn, encoding, timezone)
Agilent OpenLAB signal trace parser
One chromatogram per file with a single trace. Binary data format.
- Parameters
fn (str) – Filename to process.
encoding (str) – Not used as the file is binary.
timezone (str) – Timezone information. This should be "localtime".
- Returns
([chrom], metadata) – Standard timesteps & metadata tuple.
- Return type
tuple[list, dict]
yadg.parsers.chromtrace.agilentcsv module
File parser for Agilent Chemstation Chromtab tabulated data files (csv).
This file format may include more than one timestep in each CSV file. It contains a header section for each timestep, followed by a detector name, and a sequence of [X, Y] datapoints.
Exposed metadata:
params:
method: None
sampleid: !!str
username: None
version: None
valve: None
datafile: !!str
Unfortunately, neither method nor version is exposed, which is a big weakness of this file format.
- yadg.parsers.chromtrace.agilentcsv.process(fn, encoding, timezone)
Agilent Chemstation CSV (Chromtab) file parser
Each file may contain multiple chromatograms, each with multiple traces. Each chromatogram starts with a header section, which is followed by the traces; each trace consists of a header line and x,y-data.
- Parameters
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (str) – Timezone information. This should be "localtime".
- Returns
(chroms, metadata) – Standard timesteps & metadata tuple.
- Return type
tuple[list, dict]
yadg.parsers.chromtrace.agilentdx module
File parser for Agilent OpenLab data archive files (DX).
This is a wrapper parser which unzips the provided DX file, and then uses the yadg.parsers.chromtrace.agilentch parser to parse every CH file present in the archive. The IT files in the archive are currently ignored.
Exposed metadata:
params:
method: !!str
sampleid: !!str
username: !!str
version: !!str
valve: None
datafile: !!str
In addition to the metadata exposed by the CH parser, the datafile entry is populated with the corresponding name of the CH file. The fn entry in each timestep contains the parent DX file.
Note
Currently the timesteps from multiple CH files (if present) are appended in the timesteps array without any further sorting.
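The wrapper behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the parser's actual code: process_dx and the parse_ch callback are hypothetical names, the case-insensitive ".ch" match is an assumption, and returning the CH file names as a separate list is a simplification of how the datafile entry is stored.

```python
import os
import tempfile
import zipfile


def process_dx(fn, parse_ch):
    """Unzip `fn`, run `parse_ch` on every CH file, and concatenate the timesteps."""
    timesteps, datafiles = [], []
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(fn) as archive:
            archive.extractall(tmpdir)
        for root, _dirs, files in os.walk(tmpdir):
            for name in sorted(files):
                if not name.lower().endswith(".ch"):
                    continue  # IT files and anything else are skipped
                for ts in parse_ch(os.path.join(root, name)):
                    ts["fn"] = fn  # each timestep points at the parent DX file
                    timesteps.append(ts)
                datafiles.append(name)
    return timesteps, datafiles


# Demo with a synthetic archive and a stand-in parser that reports the file size.
with tempfile.TemporaryDirectory() as workdir:
    dx_path = os.path.join(workdir, "run.dx")
    with zipfile.ZipFile(dx_path, "w") as archive:
        archive.writestr("signal1.ch", b"\x00" * 8)
        archive.writestr("signal2.ch", b"\x00" * 16)
        archive.writestr("trace.it", b"\x00" * 4)
    timesteps, datafiles = process_dx(
        dx_path, lambda path: [{"size": os.path.getsize(path)}]
    )

print(datafiles, [ts["size"] for ts in timesteps])
```

The timesteps are appended in the order the CH files are visited, without any sorting by timestamp, which mirrors the note above.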
- yadg.parsers.chromtrace.agilentdx.process(fn, encoding, timezone)
Agilent OpenLab DX archive parser.
This is a simple wrapper around the Agilent OpenLab signal trace parser in yadg.parsers.chromtrace.agilentch. This wrapper first unzips the DX file into a temporary directory, and then processes all CH files found within the archive, concatenating timesteps from multiple files.
- Parameters
fn (str) – Filename to process.
encoding (str) – Not used as the file is binary.
timezone (str) – Timezone information. This should be "localtime".
- Returns
(chroms, metadata) – Standard timesteps & metadata tuple.
- Return type
tuple[list, dict]
yadg.parsers.chromtrace.ezchromasc module
File parser for EZ-Chrom ASCII export files (dat.asc).
This file format includes one timestep with multiple traces in each ASCII file. It contains a header section, and a sequence of Y datapoints for each detector. The X axis is uniform between traces, and its units have to be deduced from the header.
Exposed metadata:
params:
method: !!str
sampleid: !!str
username: !!str
version: !!str
valve: None
datafile: !!str
- yadg.parsers.chromtrace.ezchromasc.process(fn, encoding, timezone)
EZ-Chrom ASCII export file parser.
One chromatogram per file with multiple traces. A header section is followed by y-values for each trace. The x-values have to be deduced using the number of points, frequency, and x-multiplier. The method name is available, but detector names are not; each detector is instead assigned its numerical index in the file.
- Parameters
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (str) – Timezone information. This should be "localtime".
- Returns
([chrom], metadata) – Standard timesteps & metadata tuple.
- Return type
tuple[list, dict]
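Since the file stores only y-values, the x-axis has to be rebuilt from the header. The sketch below shows one plausible way to do this from the number of points, frequency, and x-multiplier. The formula (one point every xmultiplier / frequency units) and the function name build_x_axis are assumptions made for illustration; the text above does not spell out the exact expression or units.

```python
def build_x_axis(npoints, frequency, xmultiplier):
    """Uniform x-axis, assuming one point every (xmultiplier / frequency) units."""
    step = xmultiplier / frequency
    return [i * step for i in range(npoints)]


# Hypothetical header values: 5 points sampled at 4 Hz with a multiplier of 1.
xs = build_x_axis(npoints=5, frequency=4.0, xmultiplier=1.0)
print(xs)
```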
yadg.parsers.chromtrace.fusionjson module
File parser for Fusion json data format (json).
This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas. If the peak areas are present, they are returned in the list of timesteps as a "peaks" entry.
Note
The detectors in the trace data are not necessarily in a consistent order, which may change between different files. Hence, the keys are sorted.
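A minimal sketch of why sorting helps: two files listing the same detectors in a different order yield the same key order once sorted. The detector names and the sorted_traces helper are hypothetical and do not reflect the actual layout of the Fusion json file.

```python
def sorted_traces(traces):
    """Return a copy of `traces` with detector names in sorted order."""
    return {name: traces[name] for name in sorted(traces)}


# The same detectors, listed in a different order in two hypothetical files.
file_a = {"TCD2": [1, 2], "FID": [3, 4], "TCD1": [5, 6]}
file_b = {"FID": [3, 4], "TCD1": [5, 6], "TCD2": [1, 2]}

print(list(sorted_traces(file_a)), list(sorted_traces(file_b)))
```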
Exposed metadata:
params:
method: !!str
sampleid: !!str
username: None
version: !!str
valve: !!int
datafile: !!str
- yadg.parsers.chromtrace.fusionjson.process(fn, encoding, timezone)
Fusion json format.
One chromatogram per file with multiple traces, and pre-analysed results. Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.
- Parameters
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (str) – Timezone information. This should be "localtime".
- Returns
([chrom], metadata) – Standard timesteps & metadata tuple.
- Return type
tuple[list, dict]
yadg.parsers.chromtrace.integration module
This module contains the yadg.parsers.chromtrace.integration.integrate_trace() function, as well as several helper functions to smooth, peak-pick, determine edges, and integrate the supplied traces.
Smoothing
Smoothing can be optionally performed on the Y-values of each trace, using a Savitzky-Golay filter. The default smoothing is performed using a cubic fit to a window length of 7; if either the polyorder or the window length is not specified, smoothing is not used.
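To make the filter concrete, here is a dependency-free sketch of a 7-point cubic Savitzky-Golay smoother using its well-known convolution weights. It is not the module's implementation: the function name smooth_sg7 is made up, and leaving the three points at each end untouched is a simplification (library routines such as scipy.signal.savgol_filter handle the edges differently).

```python
# Convolution weights of a 7-point, cubic (polyorder 3) Savitzky-Golay smoother.
SG7_CUBIC = [w / 21 for w in (-2, 3, 6, 7, 6, 3, -2)]


def smooth_sg7(ys):
    """Smooth interior points; the three points at each end are left untouched."""
    half = len(SG7_CUBIC) // 2
    out = list(ys)
    for i in range(half, len(ys) - half):
        window = ys[i - half : i + half + 1]
        out[i] = sum(w * y for w, y in zip(SG7_CUBIC, window))
    return out


# A flat, slightly noisy signal with a single spike at index 4.
noisy = [0.0, 0.1, -0.1, 0.2, 5.0, 0.1, -0.2, 0.1, 0.0, 0.1]
print(smooth_sg7(noisy))
```

Because the weights come from a local cubic fit, any cubic polynomial passes through the interior of the filter unchanged, while isolated spikes are damped.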
Peak-picking and edge-finding
Peak-picking is performed on the smoothed Y-data to find peaks, as well as on the mirror image of the data to find bands. Only peaks are further processed. Additionally, the 1st and 2nd derivatives of the Y-data are evaluated, and the zero-points are found using numpy routines.
The peak edges are taken as either the nearest minima adjacent to the peak maximum, or as the inflection points at which the gradient falls below a prescribed threshold, whichever is closer to the peak maximum.
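The following sketch illustrates the simpler of the two edge criteria: picking local maxima and walking outwards to the nearest minima. It is a simplified stand-in, not the module's routine. The function names are hypothetical, there is no prominence filtering, and the gradient-threshold criterion is omitted entirely.

```python
def find_peaks_simple(ys):
    """Indices of local maxima (no prominence filtering)."""
    return [i for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]


def nearest_minima(ys, peak):
    """Walk downhill from `peak` in both directions until the signal rises again."""
    left = peak
    while left > 0 and ys[left - 1] < ys[left]:
        left -= 1
    right = peak
    while right < len(ys) - 1 and ys[right + 1] < ys[right]:
        right += 1
    return left, right


ys = [0, 1, 3, 7, 3, 1, 2, 5, 2, 0]
peaks = find_peaks_simple(ys)
edges = [nearest_minima(ys, p) for p in peaks]
print(peaks, edges)
```

In this example the two peaks share the minimum at index 5 as a common edge, which is the "adjacent peaks without a gap" situation.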
Baseline correction
Using the determined peak-edges, the baseline is linearly interpolated in sections of Y-data which belong to a peak. The interpolation is performed using the raw (not smoothed) Y-data.
If multiple peaks are adjacent to each other without a gap, the interpolation begins at the left limit of the leftmost peak and continues uninterrupted to the right limit of the rightmost peak. The points which belong to the interpolated areas are assumed to have an uncertainty of zero.
The corrected baseline is then obtained by subtracting the interpolated baseline from the original raw (not smoothed) data.
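A minimal sketch of the baseline step for a single peak region, assuming the left and right edge indices are already known. The function name subtract_baseline is hypothetical, uncertainties are ignored, and merging of adjacent peaks into one interpolated section is not handled here.

```python
def subtract_baseline(xs, ys, left, right):
    """Subtract a straight line drawn between the two peak edges from the raw data."""
    x0, y0 = xs[left], ys[left]
    x1, y1 = xs[right], ys[right]
    slope = (y1 - y0) / (x1 - x0)
    corrected = []
    for i in range(left, right + 1):
        baseline = y0 + slope * (xs[i] - x0)  # linear interpolation between edges
        corrected.append(ys[i] - baseline)
    return corrected


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 6.0, 5.0, 3.0]  # a peak sitting on a rising baseline
print(subtract_baseline(xs, ys, left=0, right=4))
```

By construction the corrected signal is zero at both peak edges.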
Peak integration
Peak integration is performed on the corrected baseline and the matching X-data using the trapezoidal method as implemented in np.trapz.
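For reference, the trapezoidal rule can be written out in a few lines. This pure-Python version is only a sketch of the arithmetic that np.trapz performs for 1-D data (recent NumPy releases also expose the routine as np.trapezoid); the function name trapezoid_area and the sample values are made up for the example.

```python
def trapezoid_area(xs, ys):
    """Integrate ys over xs using the trapezoidal rule."""
    area = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        area += width * (ys[i] + ys[i + 1]) / 2
    return area


# Baseline-corrected peak sampled on a uniform x-axis.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
corrected = [0.0, 1.5, 4.0, 2.5, 0.0]
print(trapezoid_area(xs, corrected))
```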
- yadg.parsers.chromtrace.integration.integrate_trace(traces, chromspec)
Integration, calibration, and normalisation handling function. Used to process all chromatographic data for which a calibration has been provided.
- Parameters
traces (dict) – A dictionary of trace data, with keys being the “raw” name of the detector, and the values containing the "id" for specification matching, and a "data" tuple containing the (xs, ys) where each element is a pair of (np.ndarray) with the nominal values and standard deviations.
chromspec (dict) – Parsed calibration information, with keys being the detector names in the calibration file, and values containing the "id" for detector matching, "peakdetect" dictionary with peak-picking and edge-finding settings, and "species" dictionary with names of species as keys and the left, right limits and calibration information as values.
- Returns
(peaks, xout) – A tuple containing a dictionary with the peak picking information (name, maximum, limits, height, area) as well as a dictionary containing the normalised molar fractions of the assigned and detected species.
- Return type
tuple[dict, dict]
yadg.parsers.chromtrace.main module
- yadg.parsers.chromtrace.main.parse_detector_spec(calfile=None, detectors=None, species=None)
Chromatography detector parser.
Combines the specification provided in calfile with that provided in detectors and species.
The format of calfile is as follows:
    "{{ detector_name }}":          # name of the detector
      id:             !!int         # ID of the detector used for matching
      prefer:         !!bool        # whether to prefer this detector for xout calc
      peakdetect:
        window:       !!int         # Savitzky-Golay window_length = 2*window + 1
        polyorder:    !!int         # Savitzky-Golay polyorder
        prominence:   !!float       # peak picking prominence parameter
        threshold:    !!float       # peak edge detection threshold
      species:
        "{{ species_name }}":       # name of the analyte
          l:          !!float       # peak picking left limit [s]
          r:          !!float       # peak picking right limit [s]
          calib:      {}            # calibration specification
Note
The syntax of the calibration specification is detailed in yadg.dgutils.calib.calib_handler().
The format of detectors is as follows:
    "{{ detector_name }}":          # name of the detector
      id:             !!int         # ID of the detector used for matching
      prefer:         !!bool        # whether to prefer this detector for xout calc
      peakdetect:
        window:       !!int         # Savitzky-Golay window_length = 2*window + 1
        polyorder:    !!int         # Savitzky-Golay polyorder
        prominence:   !!float       # peak picking prominence parameter
        threshold:    !!float       # peak edge detection threshold
The format of species is as follows:
    "{{ detector_name }}":          # name of the detector
      species:
        "{{ species_name }}":       # name of the analyte
          l:          !!float       # peak picking left limit [s]
          r:          !!float       # peak picking right limit [s]
          calib:      !!calib       # calibration specification
Note
The syntax of the calibration specification is detailed in yadg.dgutils.calib.calib_handler().
- Parameters
calfile (Optional[str]) – A json file containing the calibration data in the format prescribed above.
detectors (Optional[dict]) – A dictionary containing the "id", "peakdetect" and "prefer" keys for each detector, in the format shown above.
species (Optional[dict]) – A dictionary containing the species names as keys and their specification as dictionaries, in the format shown above.
- Returns
calib – The combined calibration specification.
- Return type
dict
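The merging of the three inputs can be sketched as follows. This is an illustration only: combine_spec is a hypothetical helper, the detector and species names and all numeric values are invented, the calib entry is left empty because its syntax is documented elsewhere, and replacing the whole "species" entry per detector is an assumption about how the override works.

```python
import json


def combine_spec(calfile_spec, detectors=None, species=None):
    """Start from the calfile content, then let detectors and species override it."""
    combined = {name: dict(spec) for name, spec in (calfile_spec or {}).items()}
    for name, spec in (detectors or {}).items():
        combined.setdefault(name, {}).update(spec)
    for name, spec in (species or {}).items():
        combined.setdefault(name, {})["species"] = spec["species"]
    return combined


# Stand-in for the parsed content of a json calfile.
calfile_spec = json.loads("""
{
  "TCD": {
    "id": 0,
    "prefer": true,
    "peakdetect": {"window": 3, "polyorder": 3, "prominence": 0.001, "threshold": 1.0},
    "species": {"H2": {"l": 30.0, "r": 45.0, "calib": {}}}
  }
}
""")

# A detectors argument that overrides the id and peak detection settings.
detectors = {
    "TCD": {
        "id": 1,
        "prefer": False,
        "peakdetect": {"window": 2, "polyorder": 2, "prominence": 0.01, "threshold": 0.5},
    }
}

combined = combine_spec(calfile_spec, detectors=detectors)
print(combined["TCD"]["id"], list(combined["TCD"]["species"]))
```

Here the detector settings come from the detectors argument, while the species block is still taken from the calfile because no species argument was supplied.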
- yadg.parsers.chromtrace.main.process(fn, encoding='utf-8', timezone='localtime', tracetype='ezchrom.asc', detectors=None, species=None, calfile=None)
Unified chromatogram parser.
This parser processes GC and LC chromatograms in signal(time) format. When provided with a calibration file, this tool will integrate the trace, and provide the peak areas, retention times, and concentrations of the detected species.
- Parameters
fn (str) – The file containing the trace(s) to parse.
encoding (str) – Encoding of fn, by default “utf-8”.
timezone (str) – A string description of the timezone. Default is “localtime”.
tracetype (str) – Determines the format of the input file. Currently supported formats are:
    "ezchrom.asc" (EZ-Chrom ASCII export),
    "agilent.csv" (Agilent Chemstation chromtab csv format),
    "agilent.ch" (Agilent OpenLab binary signal file),
    "agilent.dx" (Agilent OpenLab binary data archive),
    "fusion.json" (Fusion json file).
The default is "ezchrom.asc".
detectors (Optional[dict]) – Detector specification. Matches and identifies a trace in the fn file. If provided, overrides data provided in calfile, below.
species (Optional[dict]) – Species specification. Per-detector species can be listed here, providing an expected retention time range for the peak maximum. Additionally, calibration data can be supplied here. Overrides data provided in calfile, below.
calfile (Optional[str]) – Path to a json file containing the detectors and species spec. Either calfile and/or species and detectors have to be provided.
- Returns
(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag. All currently supported file formats return full date.
- Return type
tuple[list, dict, bool]
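The relationship between tracetype values and the submodules listed at the top of this page can be summarised as a lookup table. The sketch below is not how the package itself dispatches; PARSER_MODULES and resolve_parser are hypothetical names, and only the strings and module paths are taken from this page.

```python
# Mapping of tracetype strings to the submodule that documents each format.
PARSER_MODULES = {
    "ezchrom.asc": "yadg.parsers.chromtrace.ezchromasc",
    "agilent.csv": "yadg.parsers.chromtrace.agilentcsv",
    "agilent.ch": "yadg.parsers.chromtrace.agilentch",
    "agilent.dx": "yadg.parsers.chromtrace.agilentdx",
    "fusion.json": "yadg.parsers.chromtrace.fusionjson",
}


def resolve_parser(tracetype="ezchrom.asc"):
    """Return the module path for a tracetype, or raise on an unknown value."""
    try:
        return PARSER_MODULES[tracetype]
    except KeyError:
        supported = ", ".join(sorted(PARSER_MODULES))
        raise ValueError(f"unknown tracetype {tracetype!r}; expected one of: {supported}")


print(resolve_parser())
print(resolve_parser("agilent.dx"))
```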