chromdata: Post-processed chromatography data parser
Handles the reading of post-processed chromatography data, i.e. files containing peak areas, concentrations, or mole fractions.
Note
To parse trace data as present in raw chromatograms, use the chromtrace parser.
Usage
Available since yadg-4.2. The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.ChromData
Parser for processed chromatography data.
- Config: extra: str = forbid
- field parser: Literal['chromdata'] [Required]
- field extractor: Fusion_json | Fusion_zip | Fusion_csv | EmpaLC_csv | EmpaLC_xlsx [Required]
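As a sketch of how this model is used, the snippet below validates a minimal chromdata step directly against the pydantic class named above; the tag, folder path, and timezone are placeholder values, and construction from nested dicts relies on standard pydantic coercion.

from dgbowl_schemas.yadg.dataschema_5_0.step import ChromData

# Minimal sketch: validate a chromdata step against the model above.
# The tag, folder path and timezone are placeholders.
step = ChromData(
    parser="chromdata",
    tag="GC",
    input={"folders": ["./data/gc"], "suffix": "json"},
    extractor={"filetype": "fusion.json", "timezone": "Europe/Zurich"},
)
print(step.extractor.filetype)  # -> "fusion.json"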
Formats
The filetypes currently supported by the parser are:
- Inficon Fusion JSON format (fusion.json): see fusionjson
- Inficon Fusion zip archive (fusion.zip): see fusionzip
- Inficon Fusion csv export (fusion.csv): see fusioncsv
- Empa's Agilent LC csv export (empalc.csv): see empalccsv
- Empa's Agilent LC excel export (empalc.xlsx): see empalcxlsx
Schema
Each file is processed into a single xarray.Dataset, containing the following coords and data_vars (if present in the file):
xr.Dataset:
  coords:
    uts:            !!float         # Unix timestamp
    species:        !!str           # Species names
  data_vars:
    height:         (uts, species)  # Peak height maximum
    area:           (uts, species)  # Integrated peak area
    retention time: (uts, species)  # Peak retention time
    concentration:  (uts, species)  # Species concentration (mol/l)
    xout:           (uts, species)  # Species mole fraction (-)
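For illustration, the sketch below shows how such a Dataset could be inspected once obtained from the parser; the species name is hypothetical, and which data_vars exist depends on the source file.

import xarray as xr

def summarize(ds: xr.Dataset) -> None:
    # Peak areas of every species in the first chromatogram:
    print(ds["area"].isel(uts=0))
    # Mole fraction of one (hypothetical) species over time,
    # if "xout" was present in the source file:
    if "xout" in ds:
        print(ds["xout"].sel(species="CO2"))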
Module Functions
- yadg.parsers.chromdata.process(*, filetype, **kwargs)
Unified chromatographic data parser. Forwards kwargs to the worker functions based on the supplied filetype.
- Parameters:
  - filetype (str) – Discriminator used to select the appropriate worker function.
- Return type:
  xarray.Dataset
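A hedged usage sketch, assuming the keyword arguments are forwarded to the worker as documented below; the file name is a placeholder, and an explicit timezone is used instead of "localtime".

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata import process

# Dispatch on filetype; fn, encoding and timezone are forwarded to the
# fusionjson worker. The file name and timezone are placeholders.
ds = process(
    filetype="fusion.json",
    fn="data/gc/sample.fusion-data",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)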
Submodules
empalccsv: Processing Empa’s online LC exported data (csv)
This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:
- metadata section,
- table containing sampling information,
- table containing analysed chromatography data.
Code author: Peter Kraus
- yadg.parsers.chromdata.empalccsv.process(*, fn, encoding, **kwargs)
Custom Agilent Online LC csv export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
- Return type:
  xarray.Dataset
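A minimal usage sketch; the file name is a placeholder.

from yadg.parsers.chromdata.empalccsv import process

# Parse one exported Online LC csv file.
ds = process(fn="2023-01-17_LC_export.csv", encoding="utf-8")
print(ds.coords["species"].values)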
empalcxlsx: Processing Empa’s online LC exported data (xlsx)
This is a structured format produced by the export from Agilent’s Online LC device at Empa. It contains three sections:
- metadata section,
- table containing sampling information,
- table containing analysed chromatography data.
Code author: Peter Kraus
- yadg.parsers.chromdata.empalcxlsx.process(*, fn, **kwargs)
Custom Agilent Online LC xlsx export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
- Return type:
  xarray.Dataset
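A minimal usage sketch; the file name is a placeholder.

from yadg.parsers.chromdata.empalcxlsx import process

# Parse one exported Online LC xlsx file.
ds = process(fn="2023-01-17_LC_export.xlsx")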
fusioncsv: Processing Inficon Fusion csv export format (csv).
This is a tabulated format, including the concentrations, mole fractions, peak areas, and retention times. The latter is ignored by this parser.
Warning
As also mentioned in the csv files themselves, the use of this filetype is discouraged, and the json files (or a zipped archive of them) should be parsed instead.
Code author: Peter Kraus
- yadg.parsers.chromdata.fusioncsv.process(*, fn, encoding, timezone, **kwargs)
Fusion csv export format.
Multiple chromatograms per file, with multiple detectors.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
  - timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Return type:
  xarray.Dataset
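If the csv route is used despite the warning above, a call could look like the following sketch; the file name and timezone are placeholders.

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata.fusioncsv import process

ds = process(
    fn="batch_export.csv",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)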
fusionjson: Processing Inficon Fusion json data format (json).
This is a fairly detailed data format, including the traces, the calibration applied, and also the integrated peak areas and other processed information, which are parsed by this module.
Note
To parse the raw trace data, use the chromtrace module.
Warning
The detectors in the json files are not necessarily in a consistent order. To avoid inconsistent parsing of species which appear in both detectors, the detector keys are sorted. Species present in both detectors will be overwritten by the last detector in alphabetical order.
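The toy snippet below (not yadg code) illustrates the merge order implied by this warning: detector keys are iterated alphabetically, so a species reported by several detectors ends up with the value from the alphabetically last one.

detectors = {
    "TCD": {"H2": 0.40, "CO2": 0.10},
    "FID": {"CH4": 0.05, "CO2": 0.12},
}
merged = {}
for name in sorted(detectors):   # "FID" first, then "TCD"
    merged.update(detectors[name])
print(merged["CO2"])             # 0.10, i.e. the value from "TCD"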
Exposed metadata:
params:
  method:   !!str
  username: None
  version:  !!str
  datafile: !!str
Code author: Peter Kraus
- yadg.parsers.chromdata.fusionjson.process(*, fn, encoding, timezone, **kwargs)
Fusion json format.
One chromatogram per file with multiple traces, and pre-analysed results. Only a subset of the metadata is retained, including the method name, detector names, and information about assigned peaks.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Encoding used to open the file.
  - timezone (ZoneInfo) – Timezone information. This should be "localtime".
- Return type:
  xarray.Dataset
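A minimal usage sketch; the file name and timezone are placeholders.

from zoneinfo import ZoneInfo
from yadg.parsers.chromdata.fusionjson import process

ds = process(
    fn="sample.fusion-data",
    encoding="utf-8",
    timezone=ZoneInfo("Europe/Zurich"),
)
print(ds)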
fusionzip: Processing Inficon Fusion zipped data format (zip).
This is a wrapper parser which unzips the provided zip file, and then uses the yadg.parsers.chromdata.fusionjson parser to parse every data file present in the archive.
Code author: Peter Kraus
- yadg.parsers.chromdata.fusionzip.process(*, fn, encoding, timezone, **kwargs)
Fusion zip file format.
The Fusion GCs can export their json formats as a zip archive of a folder of jsons. This parser allows for parsing of this zip archive directly, without the user having to unzip & move the data.
- Parameters:
  - fn (str) – Filename to process.
  - encoding (str) – Not used as the file is binary.
  - timezone (str) – Timezone information. This should be "localtime".
- Returns:
  The data from the individual json files contained in the zip archive are concatenated into a single xarray.Dataset. This might fail if the metadata in the json files differs, or if the dimensions are not easily concatenable.
- Return type:
  xarray.Dataset
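A minimal usage sketch; the archive name is a placeholder, and an explicit timezone string is used instead of "localtime".

from yadg.parsers.chromdata.fusionzip import process

# Parse all json entries of a Fusion zip archive in one call; the entries
# are combined into a single Dataset (see Returns above).
ds = process(
    fn="fusion_batch.zip",
    encoding="utf-8",
    timezone="Europe/Zurich",
)
print(ds.sizes["uts"])  # one entry per chromatogram found in the archive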