chromtrace: Raw chromatogram trace file parser

Handles the parsing of raw traces present in chromatography files, whether the source is a liquid chromatograph (LC) or a gas chromatograph (GC). The basic function of the parser is to:

  1. read in the raw data and create timestamped traces

  2. collect metadata such as the method information, sample ID, etc.

chromtrace loads the chromatographic data from the specified file, determines the uncertainties of the signal (y-axis), and explicitly populates the points in the time axis (x-axis), when required.

Usage

Available since yadg-4.0. The parser supports the following parameters:

pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.ChromTrace

Parser for raw chromatography traces.

Note

For parsing processed (integrated) chromatographic data, use the ChromData parser.

JSON schema:
{
   "title": "ChromTrace",
   "description": "Parser for raw chromatography traces.\n\n.. note::\n\n    For parsing processed (integrated) chromatographic data, use the\n    :class:`ChromData` parser.",
   "type": "object",
   "properties": {
      "tag": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Tag"
      },
      "parser": {
         "const": "chromtrace",
         "title": "Parser"
      },
      "input": {
         "$ref": "#/$defs/Input"
      },
      "extractor": {
         "discriminator": {
            "mapping": {
               "agilent.ch": "#/$defs/Agilent_ch",
               "agilent.csv": "#/$defs/Agilent_csv",
               "agilent.dx": "#/$defs/Agilent_dx",
               "ezchrom.asc": "#/$defs/EZChrom_asc",
               "fusion.json": "#/$defs/Fusion_json",
               "fusion.zip": "#/$defs/Fusion_zip",
               "marda:agilent-ch": "#/$defs/Agilent_ch",
               "marda:agilent-dx": "#/$defs/Agilent_dx"
            },
            "propertyName": "filetype"
         },
         "oneOf": [
            {
               "$ref": "#/$defs/EZChrom_asc"
            },
            {
               "$ref": "#/$defs/Fusion_json"
            },
            {
               "$ref": "#/$defs/Fusion_zip"
            },
            {
               "$ref": "#/$defs/Agilent_ch"
            },
            {
               "$ref": "#/$defs/Agilent_dx"
            },
            {
               "$ref": "#/$defs/Agilent_csv"
            }
         ],
         "title": "Extractor"
      },
      "parameters": {
         "anyOf": [
            {
               "$ref": "#/$defs/Parameters"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "externaldate": {
         "anyOf": [
            {
               "$ref": "#/$defs/ExternalDate"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      }
   },
   "$defs": {
      "Agilent_ch": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "enum": [
                  "agilent.ch",
                  "marda:agilent-ch"
               ],
               "title": "Filetype",
               "type": "string"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "Agilent_ch",
         "type": "object"
      },
      "Agilent_csv": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "const": "agilent.csv",
               "title": "Filetype"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "Agilent_csv",
         "type": "object"
      },
      "Agilent_dx": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "enum": [
                  "agilent.dx",
                  "marda:agilent-dx"
               ],
               "title": "Filetype",
               "type": "string"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "Agilent_dx",
         "type": "object"
      },
      "EZChrom_asc": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "const": "ezchrom.asc",
               "title": "Filetype"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "EZChrom_asc",
         "type": "object"
      },
      "ExternalDate": {
         "additionalProperties": false,
         "description": "Supply timestamping information that are external to the processed file.",
         "properties": {
            "using": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/ExternalDateFile"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateFilename"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateISOString"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateUTSOffset"
                  }
               ],
               "title": "Using"
            },
            "mode": {
               "default": "add",
               "enum": [
                  "add",
                  "replace"
               ],
               "title": "Mode",
               "type": "string"
            }
         },
         "required": [
            "using"
         ],
         "title": "ExternalDate",
         "type": "object"
      },
      "ExternalDateFile": {
         "additionalProperties": false,
         "description": "Read external date information from file.",
         "properties": {
            "file": {
               "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content"
            }
         },
         "required": [
            "file"
         ],
         "title": "ExternalDateFile",
         "type": "object"
      },
      "ExternalDateFilename": {
         "additionalProperties": false,
         "description": "Read external date information from the file name.",
         "properties": {
            "filename": {
               "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content"
            }
         },
         "required": [
            "filename"
         ],
         "title": "ExternalDateFilename",
         "type": "object"
      },
      "ExternalDateISOString": {
         "additionalProperties": false,
         "description": "Read a constant external date using an ISO-formatted string.",
         "properties": {
            "isostring": {
               "title": "Isostring",
               "type": "string"
            }
         },
         "required": [
            "isostring"
         ],
         "title": "ExternalDateISOString",
         "type": "object"
      },
      "ExternalDateUTSOffset": {
         "additionalProperties": false,
         "description": "Read a constant external date using a Unix timestamp offset.",
         "properties": {
            "utsoffset": {
               "title": "Utsoffset",
               "type": "number"
            }
         },
         "required": [
            "utsoffset"
         ],
         "title": "ExternalDateUTSOffset",
         "type": "object"
      },
      "Fusion_json": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "const": "fusion.json",
               "title": "Filetype"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "Fusion_json",
         "type": "object"
      },
      "Fusion_zip": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "const": "fusion.zip",
               "title": "Filetype"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "required": [
            "filetype"
         ],
         "title": "Fusion_zip",
         "type": "object"
      },
      "Input": {
         "additionalProperties": false,
         "description": "Specification of input files/folders to be processed by the :class:`Step`.",
         "properties": {
            "folders": {
               "items": {
                  "type": "string"
               },
               "title": "Folders",
               "type": "array"
            },
            "prefix": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Prefix"
            },
            "suffix": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Suffix"
            },
            "contains": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Contains"
            },
            "exclude": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Exclude"
            }
         },
         "required": [
            "folders"
         ],
         "title": "Input",
         "type": "object"
      },
      "Parameters": {
         "additionalProperties": false,
         "description": "Empty parameters specification with no extras allowed.",
         "properties": {},
         "title": "Parameters",
         "type": "object"
      },
      "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": {
         "additionalProperties": false,
         "properties": {
            "path": {
               "title": "Path",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "type": "string"
            },
            "match": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Match"
            }
         },
         "required": [
            "path",
            "type"
         ],
         "title": "Content",
         "type": "object"
      },
      "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": {
         "additionalProperties": false,
         "properties": {
            "format": {
               "title": "Format",
               "type": "string"
            },
            "len": {
               "title": "Len",
               "type": "integer"
            }
         },
         "required": [
            "format",
            "len"
         ],
         "title": "Content",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "parser",
      "input",
      "extractor"
   ]
}

Config:
  • extra: str = forbid

field parser: Literal['chromtrace'] [Required]
field extractor: EZChrom_asc | Fusion_json | Fusion_zip | Agilent_ch | Agilent_dx | Agilent_csv [Required]
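
As a minimal sketch, a step definition that satisfies the schema above could be assembled as a plain dictionary. The folder name, the suffix, and the timezone value below are hypothetical placeholders; the schema itself only requires parser, input (with folders), and extractor (with filetype).

```python
import json

# Hypothetical step definition: "data/gc", ".CH" and "Europe/Berlin"
# are placeholders chosen for illustration only.
step = {
    "parser": "chromtrace",
    "input": {"folders": ["data/gc"], "suffix": ".CH"},
    "extractor": {"filetype": "agilent.ch", "timezone": "Europe/Berlin"},
}

# The three top-level keys listed under "required" in the JSON schema.
required = {"parser", "input", "extractor"}
missing = required - step.keys()

# Serialise the step, e.g. for inclusion in a dataschema file.
print(json.dumps(step, indent=2))
```

Because the schema sets additionalProperties to false, any key not listed in the schema would be rejected during validation.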

Formats

The filetypes currently supported by the parser are:

  • EZ-Chrom ASCII export (ezchrom.asc): see ezchromasc

  • Agilent Chemstation Chromtab (agilent.csv): see agilentcsv

  • Agilent OpenLab binary signal (agilent.ch): see agilentch

  • Agilent OpenLab data archive (agilent.dx): see agilentdx

  • Inficon Fusion JSON format (fusion.json): see fusionjson

  • Inficon Fusion zip archive (fusion.zip): see fusionzip

Schema

The data is returned as a datatree.DataTree, containing an xarray.Dataset for each trace / detector name:

datatree.DataTree:
  {{ detector_name }}  !!xr.Dataset
    coords:
      uts:             !!float               # Timestamp of the chromatogram
      elution_time:    !!float               # The time axis of the chromatogram (s)
    data_vars:
      signal:          (uts, elution_time)   # The ordinate axis of the chromatogram

When multiple chromatograms are parsed, they are concatenated separately per detector name. An error might occur during this concatenation if the elution_time axis changes dimensions or coordinates between different timesteps.
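
The concatenation constraint can be illustrated with a short sketch. This is not yadg's implementation: plain lists stand in for the xarray objects, and the function name stack_traces is hypothetical. It only shows why a changing elution_time axis prevents stacking along uts.

```python
def stack_traces(chromatograms):
    """Stack (uts, elution_time, signal) tuples of one detector along uts.

    Hypothetical helper: raises ValueError when the elution_time axis
    differs between timesteps, mirroring the failure mode described above.
    """
    if not chromatograms:
        raise ValueError("no chromatograms supplied")
    reference = chromatograms[0][1]
    uts, signal = [], []
    for timestamp, elution_time, values in chromatograms:
        if elution_time != reference:
            raise ValueError(f"elution_time axis differs at uts={timestamp}")
        uts.append(timestamp)
        signal.append(values)
    return {"uts": uts, "elution_time": reference, "signal": signal}


# Two timesteps sharing the same elution_time axis stack without error.
stacked = stack_traces([
    (0.0, [0.0, 0.5, 1.0], [1.0, 2.0, 1.5]),
    (60.0, [0.0, 0.5, 1.0], [1.1, 2.2, 1.4]),
])
```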

Note

To parse processed data in the raw data files, such as integrated peak areas or concentrations, use the chromdata parser instead.

Module Functions

yadg.parsers.chromtrace.process(*, filetype, **kwargs)

Unified raw chromatogram parser. Forwards kwargs to the worker functions based on the supplied filetype.

Parameters:

filetype (str) – Discriminator used to select the appropriate worker function.

Return type:

datatree.DataTree
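
The dispatch on filetype can be sketched as a lookup table. The worker functions below are hypothetical stubs that only echo their arguments, and the error handling is an assumption; the mapping of "marda:agilent-ch" onto the same worker as "agilent.ch" follows the discriminator mapping in the JSON schema.

```python
def parse_ezchrom(**kwargs):
    # Hypothetical stub standing in for the ezchromasc worker.
    return ("ezchrom.asc", kwargs)


def parse_agilent_ch(**kwargs):
    # Hypothetical stub standing in for the agilentch worker.
    return ("agilent.ch", kwargs)


# Lookup table from the filetype discriminator to a worker function.
WORKERS = {
    "ezchrom.asc": parse_ezchrom,
    "agilent.ch": parse_agilent_ch,
    "marda:agilent-ch": parse_agilent_ch,
}


def process(*, filetype, **kwargs):
    """Forward kwargs to the worker selected by filetype."""
    try:
        worker = WORKERS[filetype]
    except KeyError:
        raise ValueError(f"unsupported filetype: {filetype!r}") from None
    return worker(**kwargs)


result = process(filetype="marda:agilent-ch", fn="trace.ch", timezone="localtime")
```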

Submodules

agilentch: Processing Agilent OpenLab binary signal trace files (CH and IT).

Currently supports version “179” of the files. Version information is defined in the magic_values (parameters & metadata) and data_dtypes (data) dictionaries.

Adapted from ImportAgilent.m and aston.

File Structure of .ch files

0x0000 "version magic"
0x0108 "data offset"
0x011a "x-axis minimum (ms)"
0x011e "x-axis maximum (ms)"
0x035a "sample ID"
0x0559 "description"
0x0758 "username"
0x0957 "timestamp"
0x09e5 "instrument name"
0x09bc "inlet"
0x0a0e "method"
0x104c "y-axis unit"
0x1075 "detector name"
0x1274 "y-axis intercept"
0x127c "y-axis slope"

Data is stored as a consecutive sequence of <f8 values (little-endian 64-bit floats), starting at the byte offset calculated as offset = ("data offset" - 1) * 512 and continuing until the end of the file.
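
The offset arithmetic and the data read can be sketched with the standard struct module. This is an illustration on synthetic bytes, not the parser's code: the function names are hypothetical, and the "data offset" value is passed in directly because the encoding of that header field is not stated here.

```python
import struct

BLOCK_SIZE = 512  # multiplier from the offset formula above


def data_start(data_offset: int) -> int:
    """Convert the "data offset" header value into a byte offset."""
    return (data_offset - 1) * BLOCK_SIZE


def read_trace(blob: bytes, data_offset: int) -> list[float]:
    """Read little-endian float64 values from the offset to the end of blob."""
    start = data_start(data_offset)
    count = (len(blob) - start) // 8  # 8 bytes per <f8 value
    return list(struct.unpack_from(f"<{count}d", blob, start))


# Synthetic file: 1024 zeroed header bytes followed by three data points,
# so a "data offset" of 3 points at the first data byte.
header = bytes(2 * BLOCK_SIZE)
payload = struct.pack("<3d", 0.5, 1.5, 2.5)
values = read_trace(header + payload, data_offset=3)
```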

Code author: Peter Kraus

yadg.parsers.chromtrace.agilentch.process(*, fn, timezone, **kwargs)

Agilent OpenLab signal trace parser.

One chromatogram per file with a single trace. Binary data format.

Parameters:
  • fn (str) – Filename to process.

  • encoding – Not used as the file is binary.

  • timezone (ZoneInfo) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector. As each CH file contains data from only one detector, this nesting is only for consistency with other filetypes.

Return type:

datatree.DataTree

agilentcsv: Processing Agilent Chemstation Chromtab tabulated data files (csv).

This file format may include multiple timesteps consisting of several traces each in a single CSV file. It contains a header section for each timestep, followed by a detector name, and a sequence of “X, Y” datapoints, which are stored as elution_time and signal.

Warning

It is not guaranteed that the X-axis of the chromatogram (i.e. elution_time) is consistent between the timesteps of the same trace. The traces are expanded to the length of the longest trace, and the shorter traces are padded with NaNs.
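
The NaN padding described in this warning can be sketched in a few lines. Plain lists stand in for the arrays used by the parser, and the helper name pad_traces is hypothetical.

```python
import math


def pad_traces(traces):
    """Pad every trace with NaN up to the length of the longest trace."""
    longest = max(len(trace) for trace in traces)
    return [trace + [math.nan] * (longest - len(trace)) for trace in traces]


# The second trace is one point shorter and receives one trailing NaN.
padded = pad_traces([[1.0, 2.0, 3.0], [4.0, 5.0]])
```

Padded points carry no signal information, so downstream code should skip NaN values, for example by filtering with math.isnan.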

Warning

Unfortunately, the chromatographic method is not exposed in this file format.

Code author: Peter Kraus

yadg.parsers.chromtrace.agilentcsv.process(*, fn, encoding, timezone, **kwargs)

Agilent Chemstation CSV (Chromtab) file parser

Each file may contain multiple chromatograms, each with multiple traces. Each chromatogram starts with a header section, followed by its traces, each of which consists of a header line and x,y-data.

Parameters:
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (str) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector. When multiple timesteps are present in the file, the traces of each detector are expanded to match the longest trace, and collated along the uts dimension.

Return type:

datatree.DataTree

agilentdx: Processing Agilent OpenLab data archive files (DX).

This is a wrapper parser which unzips the provided DX file, and then uses the yadg.parsers.chromtrace.agilentch parser to parse every CH file present in the archive. The IT files in the archive are currently ignored.

In addition to the metadata exposed by the CH parser, the datafile entry is populated with the corresponding name of the CH file. The fn entry in each timestep contains the parent DX file.

Note

Currently the timesteps from multiple CH files (if present) are appended in the timesteps array without any further sorting.

Code author: Peter Kraus

yadg.parsers.chromtrace.agilentdx.process(*, fn, encoding, timezone, **kwargs)

Agilent OpenLab DX archive parser.

This is a simple wrapper around the Agilent OpenLab signal trace parser in yadg.parsers.chromtrace.agilentch. This wrapper first unzips the DX file into a temporary directory, and then processes all CH files found within the archive, concatenating timesteps from multiple files.
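
The unzip-and-collect step can be sketched with the standard library. This is an assumption-laden illustration rather than the wrapper's code: the helper name find_ch_files is hypothetical, the case-insensitive match on the .ch extension is an assumption, and the results are sorted here only to make the sketch deterministic.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def find_ch_files(archive, workdir):
    """Extract the archive into workdir and return the CH files found."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workdir)
    return sorted(
        path
        for path in Path(workdir).rglob("*")
        if path.is_file() and path.suffix.lower() == ".ch"
    )


# Build a synthetic archive in memory: two CH files and one IT file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("run/FID1A.ch", b"")
    zf.writestr("run/TCD2B.ch", b"")
    zf.writestr("run/pump.it", b"")
buffer.seek(0)

# The IT file is extracted but not returned, so it would not be parsed.
with tempfile.TemporaryDirectory() as tmp:
    names = [path.name for path in find_ch_files(buffer, tmp)]
```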

Parameters:
  • fn (str) – Filename to process.

  • encoding (str) – Not used as the file is binary.

  • timezone (str) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector. If multiple timesteps are found in the zip archive, the datatree.DataTrees are collated along the uts dimension.

Return type:

datatree.DataTree

ezchromasc: Processing EZ-Chrom ASCII export files (dat.asc).

This file format includes one timestep with multiple traces in each ASCII file. It contains a header section, and a sequence of Y datapoints (signal) for each detector. The X-axis (elution_time) is assumed to be uniform between traces, and its units have to be deduced from the header.

Code author: Peter Kraus

yadg.parsers.chromtrace.ezchromasc.process(*, fn, encoding, timezone, **kwargs)

EZ-Chrom ASCII export file parser.

One chromatogram per file with multiple traces. A header section is followed by y-values for each trace. The x-values have to be deduced using the number of points, the frequency, and the x-multiplier. The method name is available, but detector names are not; the detectors are instead assigned their numerical index in the file.
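
One plausible way of deducing the x-values is sketched below. The formula is an assumption made for illustration (point index divided by the sampling frequency, scaled by the multiplier) and may differ from how the parser combines the header values; the function name and its parameters are hypothetical.

```python
def elution_time(npoints, frequency, multiplier=1.0):
    """Build a uniform time axis from header values.

    Assumed relation: x_i = i / frequency * multiplier, with frequency
    in points per unit time and multiplier as a unit conversion factor.
    """
    return [i / frequency * multiplier for i in range(npoints)]


# Five points sampled at 10 points per unit time.
xs = elution_time(npoints=5, frequency=10.0)
```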

Parameters:
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (ZoneInfo) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector.

Return type:

datatree.DataTree

fusionjson: Processing Inficon Fusion json data format (json).

This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas. If the peak areas are present, they are returned in the list of timesteps as a "peaks" entry.

Exposed metadata:

method:   !!str
sampleid: !!str
version:  !!str
datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromtrace.fusionjson.process(*, fn, encoding, timezone, **kwargs)

Fusion json format.

One chromatogram per file with multiple traces, and integrated peak areas.

Warning

To parse the integrated data present in these files, use the chromdata parser.

Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.

Parameters:
  • fn (str) – Filename to process.

  • encoding (str) – Encoding used to open the file.

  • timezone (ZoneInfo) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector.

Return type:

datatree.DataTree

fusionzip: Processing Inficon Fusion zipped data format (zip).

This is a wrapper parser which unzips the provided zip file, and then uses the yadg.parsers.chromtrace.fusionjson parser to parse every data file present in the archive.

Exposed metadata:

method:   !!str
sampleid: !!str
version:  !!str
datafile: !!str

Code author: Peter Kraus

yadg.parsers.chromtrace.fusionzip.process(*, fn, encoding, timezone, **kwargs)

Fusion zip file format.

Fusion GCs can export their json files as a zip archive of a folder of jsons. This parser allows this zip archive to be parsed directly, without the user having to unzip and move the data.

Parameters:
  • fn (str) – Filename to process.

  • encoding (str) – Not used as the file is binary.

  • timezone (str) – Timezone information. This should be "localtime".

Returns:

A datatree.DataTree containing one xarray.Dataset per detector. If multiple timesteps are found in the zip archive, the datatree.DataTrees are collated along the uts dimension.

Return type:

datatree.DataTree