chromdata: Post-processed chromatography data parser

This module handles the reading of post-processed chromatography data, i.e. files containing peak areas, concentrations, or mole fractions.

chromdata loads the processed chromatographic data from the specified file, including the peak heights, areas, and retention times, as well as the concentrations and mole fractions (normalised, unitless concentrations).

Note

To parse trace data as present in raw chromatograms, use the chromtrace parser.

Usage

Available since yadg-4.2. The parser supports the following parameters:

pydantic model dgbowl_schemas.yadg.dataschema_4_2.step.ChromData.Params

JSON schema:
{
   "title": "Params",
   "type": "object",
   "properties": {
      "filetype": {
         "title": "Filetype",
         "default": "fusion.json",
         "enum": [
            "fusion.json",
            "fusion.zip",
            "fusion.csv",
            "empalc.csv",
            "empalc.xlsx"
         ],
         "type": "string"
      }
   },
   "additionalProperties": false
}

field filetype: Literal['fusion.json', 'fusion.zip', 'fusion.csv', 'empalc.csv', 'empalc.xlsx'] = 'fusion.json'
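
The three rules encoded in the schema (a default of fusion.json, a fixed list of allowed values, and no additional keys) can be illustrated with a minimal plain-Python sketch. The helper validate_params is hypothetical and is not part of yadg or dgbowl_schemas; in practice the pydantic model performs this validation.

```python
# Allowed values, copied from the "enum" entry of the JSON schema.
ALLOWED_FILETYPES = (
    "fusion.json",
    "fusion.zip",
    "fusion.csv",
    "empalc.csv",
    "empalc.xlsx",
)


def validate_params(params=None):
    """Hypothetical helper: return the parameters with the default applied."""
    params = dict(params or {})
    # "additionalProperties": false -> reject any key other than filetype.
    extra = set(params) - {"filetype"}
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    # "default": "fusion.json" -> used when filetype is not given.
    filetype = params.get("filetype", "fusion.json")
    if filetype not in ALLOWED_FILETYPES:
        raise ValueError(f"unsupported filetype: {filetype!r}")
    return {"filetype": filetype}
```

Calling validate_params() with no arguments yields the default filetype, while an unknown filetype or an extra key raises a ValueError.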

Formats

The filetypes currently supported by the parser are:

  • Inficon Fusion JSON format (fusion.json): see fusionjson

  • Inficon Fusion zip archive (fusion.zip): see fusionzip

  • Inficon Fusion csv export (fusion.csv): see fusioncsv

  • Empa’s Agilent LC csv export (empalc.csv): see empalccsv

  • Empa’s Agilent LC excel export (empalc.xlsx): see empalcxlsx
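
The correspondence between filetype values and submodules listed above can be written as a lookup table. This is only a sketch: the dictionary and the function submodule_path are illustrative names, and the way yadg itself selects the submodule may differ.

```python
# Mapping copied from the list of supported filetypes; the names of this
# dictionary and of the function below are illustrative, not yadg API.
FILETYPE_TO_SUBMODULE = {
    "fusion.json": "fusionjson",
    "fusion.zip": "fusionzip",
    "fusion.csv": "fusioncsv",
    "empalc.csv": "empalccsv",
    "empalc.xlsx": "empalcxlsx",
}


def submodule_path(filetype):
    """Return the dotted path of the submodule documented for a filetype."""
    try:
        name = FILETYPE_TO_SUBMODULE[filetype]
    except KeyError:
        raise ValueError(f"unsupported filetype: {filetype!r}") from None
    return f"yadg.parsers.chromdata.{name}"
```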

Provides

The parsed data is stored, for each timestep, under the raw key using the following format:

- uts: !!float
  fn:  !!str
  raw:
    sampleid: !!str             # sample name or valve ID
    height:                     # heights of the peak maxima
      "{{ species_name }}":
          {n: !!float, s: !!float, u: !!str}
    area:                       # integrated areas of the peaks
      "{{ species_name }}":
          {n: !!float, s: !!float, u: !!str}
    concentration:
      "{{ species_name }}":
          {n: !!float, s: !!float, u: !!str}
    xout:                       # mole fractions (normalised concentrations)
      "{{ species_name }}":
          {n: !!float, s: !!float, u: " "}
    retention time:
      "{{ species_name }}":
          {n: !!float, s: !!float, u: " "}
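
As a sketch of how this structure might be consumed, the snippet below collects one quantity for one species across a list of timesteps. The sample timesteps are hand-written with made-up values and only some of the keys, the function column is a hypothetical helper, and it is assumed that n holds the value, s its uncertainty, and u the unit.

```python
# Hand-written timesteps following the layout above; all values are made up
# and only the area and xout entries are filled in to keep the sample short.
timesteps = [
    {
        "uts": 1650000000.0,
        "fn": "run-001.json",
        "raw": {
            "sampleid": "valve 1",
            "area": {"H2": {"n": 1200.0, "s": 12.0, "u": "a.u."}},
            "xout": {
                "H2": {"n": 0.25, "s": 0.01, "u": " "},
                "CO": {"n": 0.75, "s": 0.02, "u": " "},
            },
        },
    },
    {
        "uts": 1650000600.0,
        "fn": "run-002.json",
        "raw": {
            "sampleid": "valve 1",
            "area": {"H2": {"n": 1500.0, "s": 15.0, "u": "a.u."}},
            "xout": {"H2": {"n": 0.5, "s": 0.01, "u": " "}},
        },
    },
]


def column(steps, quantity, species):
    """Collect (uts, value, uncertainty) tuples for one species."""
    rows = []
    for step in steps:
        entry = step["raw"].get(quantity, {}).get(species)
        if entry is not None:  # skip timesteps where the species is absent
            rows.append((step["uts"], entry["n"], entry["s"]))
    return rows


h2_xout = column(timesteps, "xout", "H2")
co_xout = column(timesteps, "xout", "CO")
```

Here h2_xout contains one tuple per timestep, while co_xout contains a single tuple, because CO is only present in the first timestep.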

Note

The mole fractions in xout are normalised so that they always sum to unity. If there is more than one outlet stream, or if some analytes remain unidentified, the values in xout will therefore not be accurate.
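
The effect of an unidentified analyte can be shown with a small worked example. It assumes that xout is obtained by dividing each concentration by the sum of all identified concentrations; the function mole_fractions and the numbers are illustrative only.

```python
def mole_fractions(concentrations):
    """Normalise concentrations so that the results sum to one."""
    total = sum(concentrations.values())
    if total == 0:
        raise ValueError("cannot normalise: concentrations sum to zero")
    return {name: value / total for name, value in concentrations.items()}


# Made-up outlet stream: 30 % H2, 60 % CO and 10 % of an analyte that the
# method does not identify, so only H2 and CO are reported.
identified = {"H2": 30.0, "CO": 60.0}
xout = mole_fractions(identified)

# The true H2 mole fraction is 0.30, but normalising over the identified
# species only gives 30 / 90, i.e. an overestimate.
true_h2 = 30.0 / (30.0 + 60.0 + 10.0)
```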

Submodules

empalccsv: Processing Empa’s online LC exported data (csv)

This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:

  • metadata section,

  • table containing sampling information,

  • table containing analysed chromatography data.

Exposed metadata:

params:
  method:   !!str
  username: !!str
  version:  !!int
  datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromdata.empalccsv.process(fn, encoding, timezone)

Empa’s online LC csv export format.

Multiple chromatograms per file, with multiple detectors.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata, fulldate) – Standard timesteps, metadata, and date tuple.

Return type

tuple[list, dict, bool]

empalcxlsx: Processing Empa’s online LC exported data (xlsx)

This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:

  • metadata section,

  • table containing sampling information,

  • table containing analysed chromatography data.

Exposed metadata:

params:
  method:   !!str
  username: !!str
  version:  !!int
  datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromdata.empalcxlsx.process(fn, encoding, timezone)

Empa’s online LC xlsx export format.

Multiple chromatograms per file, with multiple detectors.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata, fulldate) – Standard timesteps, metadata, and date tuple.

Return type

tuple[list, dict, bool]

fusioncsv: Processing Inficon Fusion csv export format (csv).

This is a tabulated format, including the concentrations, mole fractions, peak areas, and retention times. The retention times are ignored by this parser.

Warning

As also mentioned in the csv files themselves, the use of this filetype is discouraged, and the json files (or a zipped archive of them) should be parsed instead.

Exposed metadata:

params:
  method:   !!str
  username: None
  version:  None
  datafile: None

Code author: Peter Kraus

yadg.parsers.chromdata.fusioncsv.process(fn, encoding, timezone)

Fusion csv export format.

Multiple chromatograms per file, with multiple detectors.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata, fulldate) – Standard timesteps, metadata, and date tuple.

Return type

tuple[list, dict, bool]

fusionjson: Processing Inficon Fusion json data format (json).

This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas and other processed information, which are parsed by this module.

Note

To parse the raw trace data, use the chromtrace module.

Warning

The detectors in the json files are not necessarily in a consistent order. To avoid inconsistent parsing of species which appear in both detectors, the detector keys are sorted before parsing. For a species present in both detectors, the values from the detector that comes last in alphabetical order overwrite those from the earlier one.
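
The overwrite behaviour described in this warning can be sketched with plain dictionaries. The detector names, the values, and the function merge_detectors are hypothetical; the sketch only shows why sorting the keys makes the result independent of the order in the file.

```python
def merge_detectors(detectors):
    """Merge per-detector results, iterating over sorted detector names."""
    merged = {}
    for name in sorted(detectors):
        # A species already present is overwritten by the later detector.
        merged.update(detectors[name])
    return merged


# Hypothetical detectors, listed here in non-alphabetical order; CO is
# reported by both of them.
detectors = {
    "moduleB:TCD": {"CO": 2.0, "CH4": 3.0},
    "moduleA:TCD": {"H2": 1.0, "CO": 1.5},
}
merged = merge_detectors(detectors)
```

In this example the CO value in merged comes from moduleB:TCD, whichever detector appears first in the input.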

Exposed metadata:

params:
  method:   !!str
  username: None
  version:  !!str
  datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromdata.fusionjson.process(fn, encoding, timezone)

Fusion json format.

One chromatogram per file with multiple traces, and pre-analysed results. Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

([chrom], metadata, fulldate) – Standard timesteps, metadata, and date tuple.

Return type

tuple[list, dict, bool]

fusionzip: Processing Inficon Fusion zipped data format (zip).

This is a wrapper parser which unzips the provided zip file, and then uses the yadg.parsers.chromdata.fusionjson parser to parse every data file present in the archive.
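
A wrapper of this kind can be sketched with the standard library alone. This is not the yadg implementation: parse_json_file is a hypothetical stand-in for the fusionjson parser that merely loads each file with json.load, and process_zip is an illustrative name.

```python
import json
import os
import tempfile
import zipfile


def parse_json_file(path, encoding="utf-8"):
    """Hypothetical stand-in for the fusionjson parser: load one file."""
    with open(path, encoding=encoding) as handle:
        return json.load(handle)


def process_zip(fn, encoding="utf-8"):
    """Unzip fn into a temporary folder and parse every file found in it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(fn) as archive:
            archive.extractall(tmpdir)
        # Sort the paths so that the result does not depend on the order
        # in which the files were stored in the archive.
        paths = sorted(
            os.path.join(root, name)
            for root, _dirs, files in os.walk(tmpdir)
            for name in files
        )
        return [parse_json_file(path, encoding) for path in paths]
```

The temporary folder is removed once parsing has finished, so the user does not have to unzip or clean up the data by hand.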

Exposed metadata:

params:
  method:   !!str
  username: None
  version:  !!str
  datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromdata.fusionzip.process(fn, encoding, timezone)

Fusion zip file format.

The Fusion GCs can export their json files as a zip archive of a folder of jsons. This parser allows this zip archive to be parsed directly, without the user having to unzip and move the data.

Parameters
  • fn (str) – Filename to process.

  • encoding (str) – Not used as the file is binary.

  • timezone (str) – Timezone information. This should be "localtime".

Returns

(chroms, metadata) – Standard timesteps & metadata tuple.

Return type

tuple[list, dict]

yadg.parsers.chromdata.main.process(fn, encoding='utf-8', timezone='localtime', parameters=None)

Unified chromatographic data parser.

Parameters
  • fn (str) – The file containing the processed chromatography data to parse.

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

  • parameters (Optional[BaseModel]) – Parameters for ChromData.

Returns

(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag. All currently supported file formats return full date.

Return type

tuple[list, dict, bool]