chromtrace: Raw chromatogram trace file parser
Handles the parsing of raw traces present in chromatography files, whether the source is a liquid chromatograph (LC) or a gas chromatograph (GC). The basic function of the parser is to:
- read in the raw data and create timestamped traces
- collect metadata such as the method information, sample ID, etc.
The chromtrace parser loads the chromatographic data from the specified file, determines the uncertainties of the signal (y-axis), and explicitly populates the points in the time axis (x-axis) when required.
Usage
Available since yadg-4.0. The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.ChromTrace
Parser for raw chromatography traces.
Note
For parsing processed (integrated) chromatographic data, use the ChromData parser.
JSON schema:
{ "title": "ChromTrace", "description": "Parser for raw chromatography traces.\n\n.. note::\n\n For parsing processed (integrated) chromatographic data, use the\n :class:`ChromData` parser.", "type": "object", "properties": { "tag": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Tag" }, "parser": { "const": "chromtrace", "title": "Parser" }, "input": { "$ref": "#/$defs/Input" }, "extractor": { "discriminator": { "mapping": { "agilent.ch": "#/$defs/Agilent_ch", "agilent.csv": "#/$defs/Agilent_csv", "agilent.dx": "#/$defs/Agilent_dx", "ezchrom.asc": "#/$defs/EZChrom_asc", "fusion.json": "#/$defs/Fusion_json", "fusion.zip": "#/$defs/Fusion_zip", "marda:agilent-ch": "#/$defs/Agilent_ch", "marda:agilent-dx": "#/$defs/Agilent_dx" }, "propertyName": "filetype" }, "oneOf": [ { "$ref": "#/$defs/EZChrom_asc" }, { "$ref": "#/$defs/Fusion_json" }, { "$ref": "#/$defs/Fusion_zip" }, { "$ref": "#/$defs/Agilent_ch" }, { "$ref": "#/$defs/Agilent_dx" }, { "$ref": "#/$defs/Agilent_csv" } ], "title": "Extractor" }, "parameters": { "anyOf": [ { "$ref": "#/$defs/Parameters" }, { "type": "null" } ], "default": null }, "externaldate": { "anyOf": [ { "$ref": "#/$defs/ExternalDate" }, { "type": "null" } ], "default": null } }, "$defs": { "Agilent_ch": { "additionalProperties": false, "properties": { "filetype": { "enum": [ "agilent.ch", "marda:agilent-ch" ], "title": "Filetype", "type": "string" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Agilent_ch", "type": "object" }, "Agilent_csv": { "additionalProperties": false, "properties": { "filetype": { "const": "agilent.csv", "title": "Filetype" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Agilent_csv", "type": "object" }, "Agilent_dx": { "additionalProperties": false, "properties": { "filetype": { "enum": [ "agilent.dx", "marda:agilent-dx" ], "title": "Filetype", "type": "string" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Agilent_dx", "type": "object" }, "EZChrom_asc": { "additionalProperties": false, "properties": { "filetype": { "const": "ezchrom.asc", "title": "Filetype" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "EZChrom_asc", "type": "object" }, "ExternalDate": { "additionalProperties": false, "description": "Supply timestamping information that are external to the 
processed file.", "properties": { "using": { "anyOf": [ { "$ref": "#/$defs/ExternalDateFile" }, { "$ref": "#/$defs/ExternalDateFilename" }, { "$ref": "#/$defs/ExternalDateISOString" }, { "$ref": "#/$defs/ExternalDateUTSOffset" } ], "title": "Using" }, "mode": { "default": "add", "enum": [ "add", "replace" ], "title": "Mode", "type": "string" } }, "required": [ "using" ], "title": "ExternalDate", "type": "object" }, "ExternalDateFile": { "additionalProperties": false, "description": "Read external date information from file.", "properties": { "file": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content" } }, "required": [ "file" ], "title": "ExternalDateFile", "type": "object" }, "ExternalDateFilename": { "additionalProperties": false, "description": "Read external date information from the file name.", "properties": { "filename": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content" } }, "required": [ "filename" ], "title": "ExternalDateFilename", "type": "object" }, "ExternalDateISOString": { "additionalProperties": false, "description": "Read a constant external date using an ISO-formatted string.", "properties": { "isostring": { "title": "Isostring", "type": "string" } }, "required": [ "isostring" ], "title": "ExternalDateISOString", "type": "object" }, "ExternalDateUTSOffset": { "additionalProperties": false, "description": "Read a constant external date using a Unix timestamp offset.", "properties": { "utsoffset": { "title": "Utsoffset", "type": "number" } }, "required": [ "utsoffset" ], "title": "ExternalDateUTSOffset", "type": "object" }, "Fusion_json": { "additionalProperties": false, "properties": { "filetype": { "const": "fusion.json", "title": "Filetype" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Fusion_json", "type": "object" }, "Fusion_zip": { "additionalProperties": false, "properties": { "filetype": { "const": "fusion.zip", "title": "Filetype" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Fusion_zip", "type": "object" }, "Input": { "additionalProperties": false, "description": "Specification of input files/folders to be processed by the :class:`Step`.", "properties": { "folders": { "items": { "type": "string" }, "title": "Folders", "type": "array" }, "prefix": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Prefix" }, "suffix": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Suffix" }, "contains": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Contains" }, "exclude": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Exclude" } }, "required": [ "folders" ], "title": "Input", "type": "object" }, "Parameters": { "additionalProperties": false, "description": "Empty parameters specification with no extras allowed.", 
"properties": {}, "title": "Parameters", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": { "additionalProperties": false, "properties": { "path": { "title": "Path", "type": "string" }, "type": { "title": "Type", "type": "string" }, "match": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Match" } }, "required": [ "path", "type" ], "title": "Content", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": { "additionalProperties": false, "properties": { "format": { "title": "Format", "type": "string" }, "len": { "title": "Len", "type": "integer" } }, "required": [ "format", "len" ], "title": "Content", "type": "object" } }, "additionalProperties": false, "required": [ "parser", "input", "extractor" ] }
- Config:
extra: str = forbid
- field parser: Literal['chromtrace'] [Required]
- field extractor: EZChrom_asc | Fusion_json | Fusion_zip | Agilent_ch | Agilent_dx | Agilent_csv [Required]
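As an illustration, the following is a minimal sketch of a chromtrace step written as a Python dict that follows the JSON schema above. The folder path, timezone, and externaldate values are placeholders, and the step is assumed to sit inside the steps list of a complete dataschema document.

step = {
    "parser": "chromtrace",
    "tag": "GC",                                   # optional free-form label
    "input": {"folders": ["data/2023-06-01"]},     # placeholder folder with raw files
    "extractor": {
        "filetype": "fusion.json",                 # one of the filetypes listed under Formats
        "timezone": "Europe/Berlin",               # optional; placeholder value
    },
    "externaldate": {                              # optional; here a constant ISO date
        "using": {"isostring": "2023-06-01T12:00:00"},
        "mode": "add",
    },
}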
Formats
The filetypes currently supported by the parser are:
- EZ-Chrom ASCII export (ezchrom.asc): see ezchromasc
- Agilent Chemstation Chromtab (agilent.csv): see agilentcsv
- Agilent OpenLab binary signal (agilent.ch): see agilentch
- Agilent OpenLab data archive (agilent.dx): see agilentdx
- Inficon Fusion JSON format (fusion.json): see fusionjson
- Inficon Fusion zip archive (fusion.zip): see fusionzip
Schema
The data is returned as a datatree.DataTree, containing an xarray.Dataset for each trace / detector name:
datatree.DataTree:
  {{ detector_name }}  !!xr.Dataset
    coords:
      uts:           !!float               # Timestamp of the chromatogram
      elution_time:  !!float               # The time axis of the chromatogram (s)
    data_vars:
      signal:        (uts, elution_time)   # The ordinate axis of the chromatogram
When multiple chromatograms are parsed, they are concatenated separately per detector
name. An error might occur during this concatenation if the elution_time
axis changes
dimensions or coordinates between different timesteps.
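As an illustration of this layout, the following sketch shows how the parsed output could be inspected; dt stands for the returned datatree.DataTree, and the detector name "TCD" is a placeholder.

def inspect_first_trace(dt, detector="TCD"):
    # dt is the datatree.DataTree returned by the parser; "TCD" is a placeholder name
    ds = dt[detector].ds                      # the xarray.Dataset of one detector
    print(ds.coords["uts"].values)            # timestamps of the individual chromatograms
    print(ds.coords["elution_time"].values)   # the elution time axis (s)
    return ds["signal"].isel(uts=0)           # the first chromatogram of this detector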
Note
To parse processed data in the raw data files, such as integrated peak areas or concentrations, use the chromdata parser instead.
Module Functions
- yadg.parsers.chromtrace.process(*, filetype, **kwargs)
Unified raw chromatogram parser. Forwards kwargs to the worker functions based on the supplied filetype.
- Parameters:
filetype (str) – Discriminator used to select the appropriate worker function.
- Return type:
datatree.DataTree
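A hedged usage sketch of this dispatcher, assuming it is called directly with the keyword arguments expected by the worker functions documented below; the filename is a placeholder.

from yadg.parsers import chromtrace

# filetype selects the worker function; fn, encoding and timezone are forwarded to it
dt = chromtrace.process(
    filetype="ezchrom.asc",      # placeholder choice from the Formats section
    fn="run01.dat.asc",          # placeholder filename
    encoding="utf-8",
    timezone="localtime",
)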
Submodules
agilentch: Processing Agilent OpenLab binary signal trace files (CH and IT).
Currently supports version “179” of the files. Version information is defined in the magic_values (parameters & metadata) and data_dtypes (data) dictionaries.
Adapted from ImportAgilent.m and aston.
File Structure of .ch files
0x0000 "version magic"
0x0108 "data offset"
0x011a "x-axis minimum (ms)"
0x011e "x-axis maximum (ms)"
0x035a "sample ID"
0x0559 "description"
0x0758 "username"
0x0957 "timestamp"
0x09bc "inlet"
0x09e5 "instrument name"
0x0a0e "method"
0x104c "y-axis unit"
0x1075 "detector name"
0x1274 "y-axis intercept"
0x127c "y-axis slope"
Data is stored in a consecutive set of <f8, starting at the offset (calculated as offset = ("data offset" - 1) * 512) until the end of the file.
Code author: Peter Kraus
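The offset arithmetic above can be illustrated with a short numpy sketch. Note that the width and endianness of the "data offset" header field are assumptions here (the authoritative definitions live in the magic_values and data_dtypes dictionaries mentioned above), and the filename is a placeholder.

import numpy as np

with open("signal.ch", "rb") as f:                        # placeholder filename
    buf = f.read()

# assumption: the "data offset" field at 0x0108 is a big-endian 32-bit integer
data_offset = int(np.frombuffer(buf, dtype=">i4", count=1, offset=0x0108)[0])
start = (data_offset - 1) * 512                           # offset formula from above
signal = np.frombuffer(buf, dtype="<f8", offset=start)    # consecutive <f8 values until EOF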
- yadg.parsers.chromtrace.agilentch.process(*, fn, timezone, **kwargs)
Agilent OpenLab signal trace parser.
One chromatogram per file with a single trace. Binary data format.
- Parameters:
fn (str) – Filename to process.
encoding – Not used as the file is binary.
timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector. As there is only one detector's data in each CH file, this nesting is only for consistency with other filetypes.
- Return type:
datatree.DataTree
agilentcsv: Processing Agilent Chemstation Chromtab tabulated data files (csv).
This file format may include multiple timesteps, each consisting of several traces, in a single CSV file. It contains a header section for each timestep, followed by a detector name and a sequence of "X, Y" datapoints, which are stored as elution_time and signal.
Warning
It is not guaranteed that the X-axis of the chromatogram (i.e. elution_time) is consistent between the timesteps of the same trace. The traces are expanded to the length of the longest trace, and the shorter traces are padded with NaNs.
Warning
Unfortunately, the chromatographic method is not exposed in this file format.
Code author: Peter Kraus
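The padding behaviour mentioned in the first warning can be pictured with a small numpy sketch; this is illustrative only, not the parser's own routine.

import numpy as np

def pad_to_longest(traces):
    # expand each trace to the length of the longest one, padding with NaN
    nmax = max(len(t) for t in traces)
    return np.vstack([
        np.pad(np.asarray(t, dtype=float), (0, nmax - len(t)), constant_values=np.nan)
        for t in traces
    ])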
- yadg.parsers.chromtrace.agilentcsv.process(*, fn, encoding, timezone, **kwargs)
Agilent Chemstation CSV (Chromtab) file parser
Each file may contain multiple chromatograms, each with multiple traces. Each chromatogram starts with a header section, and is followed by the traces, each of which includes a header line and x,y-data.
- Parameters:
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (str) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector. When multiple timesteps are present in the file, the traces of each detector are expanded to match the longest trace, and collated along the uts dimension.
- Return type:
datatree.DataTree
agilentdx: Processing Agilent OpenLab data archive files (DX).
This is a wrapper parser which unzips the provided DX file, and then uses the yadg.parsers.chromtrace.agilentch parser to parse every CH file present in the archive. The IT files in the archive are currently ignored.
In addition to the metadata exposed by the CH parser, the datafile entry is populated with the corresponding name of the CH file. The fn entry in each timestep contains the parent DX file.
Note
Currently the timesteps from multiple CH files (if present) are appended in the timesteps array without any further sorting.
Code author: Peter Kraus
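A rough sketch of the wrapping logic described above, using the standard library together with the agilentch parser documented below; it mirrors the described behaviour but is not the parser's actual implementation.

import os
import tempfile
import zipfile

from yadg.parsers.chromtrace import agilentch

def process_dx_sketch(fn, timezone="localtime"):
    trees = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(fn) as zf:
            zf.extractall(tmp)
        for root, _dirs, files in os.walk(tmp):
            for name in sorted(files):
                if name.lower().endswith(".ch"):          # IT files are ignored
                    path = os.path.join(root, name)
                    trees.append(agilentch.process(fn=path, timezone=timezone))
    return trees   # the real parser concatenates these along the uts dimension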
- yadg.parsers.chromtrace.agilentdx.process(*, fn, encoding, timezone, **kwargs)
Agilent OpenLab DX archive parser.
This is a simple wrapper around the Agilent OpenLab signal trace parser in yadg.parsers.chromtrace.agilentch. This wrapper first unzips the DX file into a temporary directory, and then processes all CH files found within the archive, concatenating timesteps from multiple files.
- Parameters:
fn (str) – Filename to process.
encoding (str) – Not used as the file is binary.
timezone (str) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector. If multiple timesteps are found in the zip archive, the datatree.DataTrees are collated along the uts dimension.
- Return type:
datatree.DataTree
ezchromasc: Processing EZ-Chrom ASCII export files (dat.asc).
This file format includes one timestep with multiple traces in each ASCII file. It contains a header section, and a sequence of Y datapoints (signal) for each detector. The X-axis (elution_time) is assumed to be uniform between traces, and its units have to be deduced from the header.
Code author: Peter Kraus
- yadg.parsers.chromtrace.ezchromasc.process(*, fn, encoding, timezone, **kwargs)
EZ-Chrom ASCII export file parser.
One chromatogram per file with multiple traces. A header section is followed by y-values for each trace. x-values have to be deduced using the number of points, frequency, and x-multiplier. The method name is available, but detector names are not; they are assigned their numerical index in the file.
- Parameters:
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector.
- Return type:
datatree.DataTree
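The x-value deduction described above can be sketched as follows; the argument names are illustrative and do not correspond to the header fields the parser actually reads.

import numpy as np

def rebuild_elution_time(npoints, sampling_rate, x_multiplier=1.0):
    # a uniform axis shared by all traces; units must be deduced from the header
    return np.arange(npoints) / sampling_rate * x_multiplier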
fusionjson: Processing Inficon Fusion json data format (json).
This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas. If peak areas are present, they are returned in the list of timesteps as a "peaks" entry.
Exposed metadata:
method: !!str
sampleid: !!str
version: !!str
datafile: !!str
Code author: Peter Kraus
- yadg.parsers.chromtrace.fusionjson.process(*, fn, encoding, timezone, **kwargs)
Fusion json format.
One chromatogram per file with multiple traces, and integrated peak areas.
Warning
To parse the integrated data present in these files, use the chromdata parser.
Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.
- Parameters:
fn (str) – Filename to process.
encoding (str) – Encoding used to open the file.
timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector.
- Return type:
datatree.DataTree
fusionzip: Processing Inficon Fusion zipped data format (zip).
This is a wrapper parser which unzips the provided zip file, and then uses the yadg.parsers.chromtrace.fusionjson parser to parse every data file present in the archive.
Exposed metadata:
method: !!str
sampleid: !!str
version: !!str
datafile: !!str
Code author: Peter Kraus
- yadg.parsers.chromtrace.fusionzip.process(*, fn, encoding, timezone, **kwargs)
Fusion zip file format.
Fusion GCs can export their json data as a zip archive of a folder of jsons. This parser allows parsing of such a zip archive directly, without the user having to unzip & move the data.
- Parameters:
fn (str) – Filename to process.
encoding (str) – Not used as the file is binary.
timezone (str) – Timezone information. This should be "localtime".
- Returns:
A datatree.DataTree containing one xarray.Dataset per detector. If multiple timesteps are found in the zip archive, the datatree.DataTrees are collated along the uts dimension.
- Return type:
datatree.DataTree