chromdata: Post-processed chromatography data parser
Handles the reading of post-processed chromatography data, i.e. files containing peak areas, concentrations, or mole fractions.
Note
To parse trace data as present in raw chromatograms, use the chromtrace parser.
Usage
Available since yadg-4.2. The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.ChromData
Parser for processed chromatography data.
- Config: extra: str = forbid
- field parser: Literal['chromdata'] [Required]
- field extractor: Fusion_json | Fusion_zip | Fusion_csv | EmpaLC_csv | EmpaLC_xlsx [Required]
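As a sketch of how this model is used, the snippet below validates a minimal chromdata step directly against the pydantic class named above; the tag, folder path, and timezone are placeholder values, and construction from nested dicts relies on standard pydantic coercion.

from dgbowl_schemas.yadg.dataschema_5_0.step import ChromData

# Minimal sketch: validate a chromdata step against the model above.
# The tag, folder path and timezone are placeholders.
step = ChromData(
    parser="chromdata",
    tag="GC",
    input={"folders": ["./data/gc"], "suffix": "json"},
    extractor={"filetype": "fusion.json", "timezone": "Europe/Zurich"},
)
print(step.extractor.filetype)  # -> "fusion.json"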
Formats
The filetypes currently supported by the parser are:
- Inficon Fusion JSON format (fusion.json): see fusionjson
- Inficon Fusion zip archive (fusion.zip): see fusionzip
- Inficon Fusion csv export (fusion.csv): see fusioncsv
- Empa's Agilent LC csv export (empalc.csv): see empalccsv
- Empa's Agilent LC excel export (empalc.xlsx): see empalcxlsx
Schema
Each file is processed into a single xarray.Dataset, containing the following coords and data_vars (if present in the file):
xr.Dataset:
  coords:
    uts:            !!float         # Unix timestamp
    species:        !!str           # Species names
  data_vars:
    height:         (uts, species)  # Peak height maximum
    area:           (uts, species)  # Integrated peak area
    retention time: (uts, species)  # Peak retention time
    concentration:  (uts, species)  # Species concentration (mol/l)
    xout:           (uts, species)  # Species mole fraction (-)
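For illustration, the sketch below shows how such a Dataset could be inspected once obtained from the parser; the species name is hypothetical, and which data_vars exist depends on the source file.

import xarray as xr

def summarize(ds: xr.Dataset) -> None:
    # Peak areas of every species in the first chromatogram:
    print(ds["area"].isel(uts=0))
    # Mole fraction of one (hypothetical) species over time,
    # if "xout" was present in the source file:
    if "xout" in ds:
        print(ds["xout"].sel(species="CO2"))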
Module Functions
- yadg.parsers.chromdata.process(*, filetype, **kwargs)
Unified chromatographic data parser. Forwards kwargs to the worker functions based on the supplied filetype.
- Parameters:
  - filetype (str) – Discriminator used to select the appropriate worker function.
- Return type:
  xarray.Dataset
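A hedged usage sketch, assuming the keyword arguments are forwarded to the worker as documented below; the file name is a placeholder, and an explicit timezone is used instead of "localtime".

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata import process

# Dispatch on filetype; fn, encoding and timezone are forwarded to the
# fusionjson worker. The file name and timezone are placeholders.
ds = process(
    filetype="fusion.json",
    fn="data/gc/sample.fusion-data",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)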
Submodules
empalccsv: Processing Empa’s online LC exported data (csv)
This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:
- metadata section,
- table containing sampling information,
- table containing analysed chromatography data.
Code author: Peter Kraus
- yadg.parsers.chromdata.empalccsv.process(*, fn, encoding, **kwargs)
Custom Agilent Online LC csv export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
- Return type:
  xarray.Dataset
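A minimal usage sketch; the file name is a placeholder.

from yadg.parsers.chromdata.empalccsv import process

# Parse one exported Online LC csv file.
ds = process(fn="2023-01-17_LC_export.csv", encoding="utf-8")
print(ds.coords["species"].values)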
empalcxlsx: Processing Empa’s online LC exported data (xlsx)
This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:
- metadata section,
- table containing sampling information,
- table containing analysed chromatography data.
Code author: Peter Kraus
- yadg.parsers.chromdata.empalcxlsx.process(*, fn, **kwargs)
Custom Agilent Online LC xlsx export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
- Return type:
  xarray.Dataset
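A minimal usage sketch; the file name is a placeholder.

from yadg.parsers.chromdata.empalcxlsx import process

# Parse one exported Online LC xlsx file.
ds = process(fn="2023-01-17_LC_export.xlsx")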
fusioncsv: Processing Inficon Fusion csv export format (csv).
This is a tabulated format, including the concentrations, mole fractions, peak areas, and retention times. The latter is ignored by this parser.
Warning
As also mentioned in the csv files themselves, the use of this filetype is discouraged, and the json files (or a zipped archive of them) should be parsed instead.
Code author: Peter Kraus
- yadg.parsers.chromdata.fusioncsv.process(*, fn, encoding, timezone, **kwargs)
Fusion csv export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
  - timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Return type:
  xarray.Dataset
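If the csv route is used despite the warning above, a call could look like the following sketch; the file name and timezone are placeholders.

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata.fusioncsv import process

ds = process(
    fn="batch_export.csv",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)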
fusionjson: Processing Inficon Fusion json data format (json).
This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas and other processed information, which are parsed by this module.
Note
To parse the raw trace data, use the chromtrace module.
Warning
The detectors in the json files are not necessarily in a consistent order. To avoid inconsistent parsing of species which appear in both detectors, the detector keys are sorted. Species present in both detectors will be overwritten by the last detector in alphabetical order.
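The toy snippet below (not yadg code) illustrates the merge order implied by this warning: detector keys are iterated alphabetically, so a species reported by several detectors ends up with the value from the alphabetically last one.

detectors = {
    "TCD": {"H2": 0.40, "CO2": 0.10},
    "FID": {"CH4": 0.05, "CO2": 0.12},
}
merged = {}
for name in sorted(detectors):   # "FID" first, then "TCD"
    merged.update(detectors[name])
print(merged["CO2"])             # 0.10, i.e. the value from "TCD"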
Exposed metadata:
params:
  method:   !!str
  username: None
  version:  !!str
  datafile: !!str
Code author: Peter Kraus
- yadg.parsers.chromdata.fusionjson.process(*, fn, encoding, timezone, **kwargs)
Fusion json format.
One chromatogram per file with multiple traces, and pre-analysed results. Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
  - timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Return type:
  xarray.Dataset
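A minimal usage sketch; the file name and timezone are placeholders.

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata.fusionjson import process

ds = process(
    fn="sample.fusion-data",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)
print(ds)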
fusionzip: Processing Inficon Fusion zipped data format (zip).
This is a wrapper parser which unzips the provided zip file, and then uses the yadg.parsers.chromdata.fusionjson parser to parse every data file present in the archive.
Code author: Peter Kraus
- yadg.parsers.chromdata.fusionzip.process(*, fn, encoding, timezone, **kwargs)
Fusion zip file format.
The Fusion GCs can export their json formats as a zip archive of a folder of jsons. This parser allows for parsing of this zip archive directly, without the user having to unzip & move the data.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Not used as the file is binary.
  - timezone (str) – Timezone information. This should be "localtime".
- Returns:
  The data from the individual json files contained in the zip archive are concatenated into a single xarray.Dataset. This might fail if the metadata in the json files differs, or if the dimensions are not easily concatenable.
- Return type:
  xarray.Dataset
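A minimal usage sketch; the archive name is a placeholder, and an explicit timezone string is used instead of "localtime".

from yadg.parsers.chromdata.fusionzip import process

# Parse all json entries of a Fusion zip archive in one call; the entries
# are combined into a single Dataset (see Returns above).
ds = process(
    fn="fusion_batch.zip",
    encoding="utf-8",
    timezone="Europe/Zurich",
)
print(ds.sizes["uts"])  # one entry per chromatogram found in the archive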