yadg.parsers.chromtrace package

List of supported file formats:

Submodules

yadg.parsers.chromtrace.agilentch module

File parser for Agilent OpenLab binary signal trace files (CH and IT).

Currently supports version “179” of the files. Version information is defined in the magic_values (parameters & metadata) and data_dtypes (data) dictionaries.

Adapted from ImportAgilent.m and aston.

Exposed metadata:

params:
  method:   !!str
  sampleid: !!str
  username: !!str
  version:  !!str
  valve:    None
  datafile: None

File Structure of .ch files

0x0000 "version magic"
0x0108 "data offset"
0x011a "x-axis minimum (ms)"
0x011e "x-axis maximum (ms)"
0x035a "sample ID"
0x0559 "description"
0x0758 "username"
0x0957 "timestamp"
0x09bc "inlet"
0x09e5 "instrument name"
0x0a0e "method"
0x104c "y-axis unit"
0x1075 "detector name"
0x1274 "y-axis intercept"
0x127c "y-axis slope"

Data is stored as a consecutive sequence of little-endian 64-bit floats (<f8), starting at the byte offset calculated as offset = ("data offset" - 1) * 512 and continuing until the end of the file.
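The offset arithmetic and the <f8 layout described above can be sketched as follows. This is a minimal illustration rather than the parser's actual code: the helper names data_byte_offset and read_trace_values are hypothetical, the "data offset" value is assumed to have been decoded from the header already, and the input is a synthetic byte string instead of a real CH file.

```python
import numpy as np

BLOCK_SIZE = 512  # bytes per block, taken from the offset formula above


def data_byte_offset(data_offset: int) -> int:
    """Convert the "data offset" header value into a byte offset."""
    return (data_offset - 1) * BLOCK_SIZE


def read_trace_values(raw: bytes, data_offset: int) -> np.ndarray:
    """Read every little-endian float64 from the data start to the end."""
    start = data_byte_offset(data_offset)
    # np.frombuffer raises ValueError when the remaining byte count is not
    # a multiple of 8; handling truncated files is left out of this sketch.
    return np.frombuffer(raw, dtype="<f8", offset=start)


# Synthetic stand-in for a file: two empty 512-byte blocks, then 3 values.
payload = np.array([1.0, 2.5, -3.0], dtype="<f8").tobytes()
fake_file = bytes(2 * BLOCK_SIZE) + payload
values = read_trace_values(fake_file, data_offset=3)
```

Scaling the raw values with the "y-axis slope" and "y-axis intercept" fields, and building the x-axis from the stored minimum and maximum, are deliberately not shown here.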

yadg.parsers.chromtrace.agilentch.process(fn, encoding, timezone)

Agilent OpenLab signal trace parser.

One chromatogram per file with a single trace. Binary data format.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Not used as the file is binary.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromtrace.agilentcsv module

File parser for Agilent Chemstation Chromtab tabulated data files (csv).

This file format may include more than one timestep in each CSV file. It contains a header section for each timestep, followed by a detector name, and a sequence of [X, Y] datapoints.

Exposed metadata:

params:
  method:   None
  sampleid: !!str
  username: None
  version:  None
  valve:    None
  datafile: !!str

Unfortunately, neither method nor version is exposed, which is a significant weakness of this file format.

yadg.parsers.chromtrace.agilentcsv.process(fn, encoding, timezone)

Agilent Chemstation CSV (Chromtab) file parser.

Each file may contain multiple chromatograms, each with multiple traces. Each chromatogram starts with a header section, followed by its traces; each trace consists of a header line and x,y-data.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

(chroms, metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromtrace.agilentdx module

File parser for Agilent OpenLab data archive files (DX).

This is a wrapper parser which unzips the provided DX file, and then uses the yadg.parsers.chromtrace.agilentch parser to parse every CH file present in the archive. The IT files in the archive are currently ignored.

Exposed metadata:

params:
  method:   !!str
  sampleid: !!str
  username: !!str
  version:  !!str
  valve:    None
  datafile: !!str

In addition to the metadata exposed by the CH parser, the datafile entry is populated with the corresponding name of the CH file. The fn entry in each timestep contains the parent DX file.

Note

Currently the timesteps from multiple CH files (if present) are appended in the timesteps array without any further sorting.

yadg.parsers.chromtrace.agilentdx.process(fn, encoding, timezone)

Agilent OpenLab DX archive parser.

This is a simple wrapper around the Agilent OpenLab signal trace parser in yadg.parsers.chromtrace.agilentch. This wrapper first unzips the DX file into a temporary directory, and then processes all CH files found within the archive, concatenating timesteps from multiple files.
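The unzip-and-collect step described above might look like the sketch below, using only the standard library. The helper name list_ch_files and the demo file names are hypothetical; the real wrapper then hands each CH file to the agilentch parser, which is not reproduced here.

```python
import os
import tempfile
import zipfile


def list_ch_files(dx_path: str, workdir: str) -> list:
    """Extract a DX (zip) archive into workdir and return its CH file paths."""
    os.makedirs(workdir, exist_ok=True)
    with zipfile.ZipFile(dx_path) as archive:
        archive.extractall(workdir)
    found = []
    for root, _dirs, files in os.walk(workdir):
        for name in files:
            # IT files and anything else in the archive are skipped.
            if name.lower().endswith(".ch"):
                found.append(os.path.join(root, name))
    # Sorting is a choice made for this sketch only, to keep it reproducible.
    return sorted(found)


# Build a tiny archive with one CH and one IT member, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    dx_path = os.path.join(tmp, "demo.dx")
    with zipfile.ZipFile(dx_path, "w") as archive:
        archive.writestr("signal_a.ch", b"")
        archive.writestr("trace.it", b"")
    outdir = os.path.join(tmp, "unzipped")
    ch_names = [os.path.basename(p) for p in list_ch_files(dx_path, outdir)]
```

Using a temporary directory as a context manager means the extracted files are removed once parsing is finished.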

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Not used as the file is binary.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

(chroms, metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromtrace.ezchromasc module

File parser for EZ-Chrom ASCII export files (dat.asc).

This file format includes one timestep with multiple traces in each ASCII file. It contains a header section, and a sequence of Y datapoints for each detector. The X axis is uniform between traces, and its units have to be deduced from the header.

Exposed metadata:

params:
  method:   !!str
  sampleid: !!str
  username: !!str
  version:  !!str
  valve:    None
  datafile: !!str

yadg.parsers.chromtrace.ezchromasc.process(fn, encoding, timezone)

EZ-Chrom ASCII export file parser.

One chromatogram per file with multiple traces. A header section is followed by y-values for each trace. x-values have to be deduced using number of points, frequency, and x-multiplier. Method name is available, but detector names are not, so each detector is named by its numerical index in the file.
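One plausible way to rebuild a uniform x-axis from the three header quantities named above is sketched below. The exact formula is an assumption: it treats the frequency as points per unit time, starts the axis at zero, and applies the x-multiplier as a plain scale factor. The function name deduce_x_values is hypothetical.

```python
import numpy as np


def deduce_x_values(npoints: int, frequency: float, xmultiplier: float = 1.0) -> np.ndarray:
    """Assumed reconstruction: point index / sampling frequency * multiplier."""
    return np.arange(npoints) / frequency * xmultiplier


# Five points sampled at 10 points per time unit, with a unit multiplier.
xs = deduce_x_values(npoints=5, frequency=10.0, xmultiplier=1.0)
```

Because the X axis is shared between traces, such an array would only need to be computed once per file.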

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromtrace.fusionjson module

File parser for Fusion json data format (json).

This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas. If the peak areas are present, this is returned in the list of timesteps as a "peaks" entry.

Note

The detectors in the trace data are not necessarily in a consistent order, which may change between different files. Hence, the keys are sorted.

Exposed metadata:

params:
  method:   !!str
  sampleid: !!str
  username: None
  version:  !!str
  valve:    !!int
  datafile: !!str

yadg.parsers.chromtrace.fusionjson.process(fn, encoding, timezone)

Fusion json format.

One chromatogram per file with multiple traces, and pre-analysed results. Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromtrace.integration module

This module contains the yadg.parsers.chromtrace.integration.integrate_trace() function, as well as several helper functions to smooth, peak-pick, determine edges, and integrate the supplied traces.

Smoothing

Smoothing can optionally be applied to the Y-values of each trace, using a Savitzky-Golay filter. The default smoothing is performed using a cubic fit to a window length of 7; if either the polyorder or the window length is not specified, no smoothing is applied.
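As an illustration of the optional smoothing step, the sketch below uses scipy.signal.savgol_filter with the cubic fit and window length of 7 mentioned above. The wrapper name smooth_trace is hypothetical, and this is not necessarily how the module itself calls the filter.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_trace(ys, window_length=None, polyorder=None):
    """Return smoothed Y-values, or the raw values if a setting is missing."""
    if window_length is None or polyorder is None:
        return np.asarray(ys, dtype=float)
    return savgol_filter(ys, window_length=window_length, polyorder=polyorder)


xs = np.linspace(-1.0, 1.0, 21)
ys = xs**3 - 2.0 * xs  # a cubic, which a cubic fit reproduces unchanged
smoothed = smooth_trace(ys, window_length=7, polyorder=3)
untouched = smooth_trace(ys)
```

A cubic test signal is convenient because a third-order polynomial fit leaves it unchanged, so any deviation would point to a mistake in the call rather than to genuine smoothing.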

Peak-picking and edge-finding

Peak-picking is performed on the smoothed Y-data to find peaks, as well as on the mirror image of the data to find bands. Only peaks are further processed. Additionally, the 1st and 2nd derivatives of the Y-data are evaluated, and the zero-points are found using numpy routines.

The peak edges are taken as either the nearest minima adjacent to the peak maximum, or the inflection points at which the gradient falls below a prescribed threshold, whichever is closer to the peak maximum.
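The "nearest minima" variant of edge-finding can be sketched with plain numpy, as below. This is a simplified illustration: the function names are hypothetical, extrema are located from sign changes of the first difference, and the gradient-threshold criterion based on inflection points is left out entirely.

```python
import numpy as np


def local_extrema(ys):
    """Indices of local maxima and minima from sign changes of the slope."""
    dy = np.diff(np.asarray(ys, dtype=float))
    maxima = np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0] + 1
    minima = np.where((dy[:-1] < 0) & (dy[1:] >= 0))[0] + 1
    return maxima, minima


def nearest_minima_edges(ys, peak_index):
    """Closest minimum on each side of a peak, or the trace ends if none."""
    _, minima = local_extrema(ys)
    left = minima[minima < peak_index]
    right = minima[minima > peak_index]
    lo = int(left[-1]) if left.size else 0
    hi = int(right[0]) if right.size else len(ys) - 1
    return lo, hi


ys = [0.0, 1.0, 3.0, 1.0, 0.5, 2.0, 5.0, 2.0, 1.0, 1.5]
maxima, minima = local_extrema(ys)
edges = [nearest_minima_edges(ys, int(p)) for p in maxima]
```

In this example the two peaks share the minimum at index 4 as a common edge, which is the "adjacent without a gap" situation relevant to baseline correction.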

Baseline correction

Using the determined peak edges, the baseline is linearly interpolated in sections of Y-data which belong to a peak. The interpolation is performed using the raw (not smoothed) Y-data.

If multiple peaks are adjacent to each other without a gap, the interpolation begins at the left limit of the leftmost peak and continues uninterrupted to the right limit of the rightmost peak. The points which belong to the interpolated areas are assumed to have an uncertainty of zero.

The corrected baseline is then obtained by subtracting the interpolated baseline from the original raw (not smoothed) data.
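The baseline procedure described in this section could be sketched as follows, assuming the peak edges are available as (left, right) index pairs. The names merge_adjacent and correct_baseline are hypothetical, spans are treated as adjacent when they share or overlap an index, and the uncertainty handling is not modelled.

```python
import numpy as np


def merge_adjacent(spans):
    """Merge index spans that touch or overlap into single sections."""
    merged = []
    for lo, hi in sorted(spans):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(span) for span in merged]


def correct_baseline(xs, ys, spans):
    """Subtract a baseline that is linearly interpolated under each section."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    baseline = ys.copy()  # outside peaks, the baseline equals the raw data
    for lo, hi in merge_adjacent(spans):
        baseline[lo:hi + 1] = np.interp(
            xs[lo:hi + 1], [xs[lo], xs[hi]], [ys[lo], ys[hi]]
        )
    return ys - baseline


xs = np.arange(7.0)
ys = np.array([1.0, 1.0, 2.0, 4.0, 2.0, 1.0, 1.0])
corrected = correct_baseline(xs, ys, [(3, 5), (1, 3)])
```

With this construction the corrected data is zero everywhere outside the peak sections, so only points inside a peak contribute to a later integration.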

Peak integration

Peak integration is performed on the corrected baseline and the matching X-data using the trapezoidal method as implemented in np.trapz.
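A minimal sketch of the trapezoidal integration over one peak is given below. The helper name integrate_peak and the example data are hypothetical. Newer NumPy releases provide the same routine as np.trapezoid and deprecate np.trapz, so the sketch picks whichever is available.

```python
import numpy as np

# Prefer np.trapezoid where it exists; fall back to np.trapz otherwise.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


def integrate_peak(xs, corrected, lo, hi):
    """Trapezoidal area of the baseline-corrected data between two edges."""
    return float(_trapezoid(corrected[lo:hi + 1], xs[lo:hi + 1]))


xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
corrected = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
area = integrate_peak(xs, corrected, 0, 4)
```

For this triangular-looking peak the four trapezoids contribute 0.5, 2.0, 2.0 and 0.5, giving an area of 5.0 in units of Y times X.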

yadg.parsers.chromtrace.integration.integrate_trace(traces, chromspec)

Integration, calibration, and normalisation handling function. Used to process all chromatographic data for which a calibration has been provided.

Parameters
  • traces (dict) – A dictionary of trace data, keyed by the “raw” name of each detector. Each value contains the "id" used for specification matching and a "data" tuple of (xs, ys), where each element is a pair of np.ndarray holding the nominal values and the standard deviations.

  • chromspec (dict) – Parsed calibration information, with keys being the detector names in the calibration file, and values containing the "id" for detector matching, "peakdetect" dictionary with peak-picking and edge-finding settings, and "species" dictionary with names of species as keys and the left, right limits and calibration information as values.

Returns

(peaks, xout) – A tuple containing a dictionary with the peak picking information (name, maximum, limits, height, area) as well as a dictionary containing the normalised molar fractions of the assigned and detected species.

Return type

tuple[dict, dict]

yadg.parsers.chromtrace.main module

yadg.parsers.chromtrace.main.parse_detector_spec(calfile=None, detectors=None, species=None)

Chromatography detector parser.

Combines the specification provided in calfile with that provided in detectors and species.

The format of calfile is as follows:

"{{ detector_name }}":    # name of the detector
  id:           !!int     # ID of the detector used for matching
  prefer:       !!bool    # whether to prefer this detector for xout calc
  peakdetect:
    window:     !!int     # Savitzky-Golay window_length = 2*window + 1
    polyorder:  !!int     # Savitzky-Golay polyorder
    prominence: !!float   # peak picking prominence parameter
    threshold:  !!float   # peak edge detection threshold
  species:
    "{{ species_name }}": # name of the analyte
      l:        !!float   # peak picking left limit [s]
      r:        !!float   # peak picking right limit [s]
      calib:    {}        # calibration specification

Note

The syntax of the calibration specification is detailed in yadg.dgutils.calib.calib_handler().

The format of detectors is as follows:

"{{ detector_name }}":  # name of the detector
  id:           !!int   # ID of the detector used for matching
  prefer:       !!bool  # whether to prefer this detector for xout calc
  peakdetect:
    window:     !!int   # Savitzky-Golay window_length = 2*window + 1
    polyorder:  !!int   # Savitzky-Golay polyorder
    prominence: !!float # peak picking prominence parameter
    threshold:  !!float # peak edge detection threshold

The format of species is as follows:

"{{ detector_name }}":    # name of the detector
  species:
    "{{ species_name }}": # name of the analyte
      l:        !!float   # peak picking left limit [s]
      r:        !!float   # peak picking right limit [s]
      calib:    !!calib   # calibration specification

Note

The syntax of the calibration specification is detailed in yadg.dgutils.calib.calib_handler().

Parameters
  • calfile (Optional[str]) – A json file containing the calibration data in the format prescribed above.

  • detectors (Optional[dict]) – A dictionary containing the "id", "peakdetect" and "prefer" keys for each detector, in the detectors format shown above.

  • species (Optional[dict]) – A dictionary containing the species names as keys and their specification as dictionaries, in the species format shown above.

Returns

calib – The combined calibration specification.

Return type

dict
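How the three inputs might be combined can be sketched as below, following the formats given above and the rule that detectors and species override the contents of calfile. This is an assumption-laden illustration, not the actual implementation: combine_spec is a hypothetical name, the calfile is assumed to have been loaded into a dict already, a supplied species entry is assumed to replace the calfile's species for that detector wholesale, and the detector and species names are made up.

```python
import copy


def combine_spec(calfile_spec=None, detectors=None, species=None):
    """Overlay detectors and species on top of a loaded calfile dictionary."""
    combined = copy.deepcopy(calfile_spec) if calfile_spec else {}
    for name, det in (detectors or {}).items():
        entry = combined.setdefault(name, {})
        for key in ("id", "prefer", "peakdetect"):
            if key in det:
                entry[key] = copy.deepcopy(det[key])
    for name, spec in (species or {}).items():
        entry = combined.setdefault(name, {})
        # Assumed behaviour: replace the species block for this detector.
        entry["species"] = copy.deepcopy(spec.get("species", {}))
    return combined


calfile_spec = {
    "TCD": {
        "id": 0,
        "prefer": False,
        "peakdetect": {"window": 3, "polyorder": 3,
                       "prominence": 1.0, "threshold": 0.1},
        "species": {"H2": {"l": 10.0, "r": 20.0, "calib": {}}},
    }
}
detectors = {"TCD": {"id": 0, "prefer": True}}
species = {"TCD": {"species": {"CO2": {"l": 30.0, "r": 40.0, "calib": {}}}}}
combined = combine_spec(calfile_spec, detectors, species)
```

Deep copies keep the loaded calfile untouched, so the same calibration data can be reused with different overrides.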

yadg.parsers.chromtrace.main.process(fn, encoding='utf-8', timezone='localtime', tracetype='ezchrom.asc', detectors=None, species=None, calfile=None)

Unified chromatogram parser.

This parser processes GC and LC chromatograms in signal(time) format. When provided with a calibration file, this tool will integrate the trace, and provide the peak areas, retention times, and concentrations of the detected species.

Parameters
  • fn (str) – The file containing the trace(s) to parse.

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

  • tracetype (str) –

    Determines the input file format. Currently supported formats are:

    • "ezchrom.asc" (EZ-Chrom ASCII export),

    • "agilent.csv" (Agilent Chemstation chromtab csv format),

    • "agilent.ch" (Agilent OpenLab binary signal file),

    • "agilent.dx" (Agilent OpenLab binary data archive),

    • "fusion.json" (Fusion json file).

    The default is "ezchrom.asc".

  • detectors (Optional[dict]) – Detector specification. Matches and identifies a trace in the fn file. If provided, overrides data provided in calfile, below.

  • species (Optional[dict]) – Species specification. Per-detector species can be listed here, providing an expected retention time range for the peak maximum. Additionally, calibration data can be supplied here. Overrides data provided in calfile, below.

  • calfile (Optional[str]) – Path to a json file containing the detectors and species spec. At least one of the following has to be provided: calfile, or species together with detectors.

Returns

(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag. All currently supported file formats return full date.

Return type

tuple[list, dict, bool]