masstrace: Mass spectroscopy trace file parser
Handles the reading and processing of mass spectrometry files. The basic function of the parser is to:
- read in the raw data and create timestamped traces with one xarray.Dataset per trace
- collect metadata such as the software version, author, etc.
Usage
Select masstrace by supplying it to the parser keyword in the dataschema. The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.MassTrace
Parser for mass spectroscopy traces.
JSON schema:
{ "title": "MassTrace", "description": "Parser for mass spectroscopy traces.", "type": "object", "properties": { "tag": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Tag" }, "parser": { "const": "masstrace", "title": "Parser" }, "input": { "$ref": "#/$defs/Input" }, "extractor": { "$ref": "#/$defs/Quadstar_sac" }, "parameters": { "anyOf": [ { "$ref": "#/$defs/Parameters" }, { "type": "null" } ], "default": null }, "externaldate": { "anyOf": [ { "$ref": "#/$defs/ExternalDate" }, { "type": "null" } ], "default": null } }, "$defs": { "ExternalDate": { "additionalProperties": false, "description": "Supply timestamping information that are external to the processed file.", "properties": { "using": { "anyOf": [ { "$ref": "#/$defs/ExternalDateFile" }, { "$ref": "#/$defs/ExternalDateFilename" }, { "$ref": "#/$defs/ExternalDateISOString" }, { "$ref": "#/$defs/ExternalDateUTSOffset" } ], "title": "Using" }, "mode": { "default": "add", "enum": [ "add", "replace" ], "title": "Mode", "type": "string" } }, "required": [ "using" ], "title": "ExternalDate", "type": "object" }, "ExternalDateFile": { "additionalProperties": false, "description": "Read external date information from file.", "properties": { "file": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content" } }, "required": [ "file" ], "title": "ExternalDateFile", "type": "object" }, "ExternalDateFilename": { "additionalProperties": false, "description": "Read external date information from the file name.", "properties": { "filename": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content" } }, "required": [ "filename" ], "title": "ExternalDateFilename", "type": "object" }, "ExternalDateISOString": { "additionalProperties": false, "description": "Read a constant external date using an ISO-formatted string.", "properties": { "isostring": { "title": "Isostring", "type": "string" } }, "required": [ 
"isostring" ], "title": "ExternalDateISOString", "type": "object" }, "ExternalDateUTSOffset": { "additionalProperties": false, "description": "Read a constant external date using a Unix timestamp offset.", "properties": { "utsoffset": { "title": "Utsoffset", "type": "number" } }, "required": [ "utsoffset" ], "title": "ExternalDateUTSOffset", "type": "object" }, "Input": { "additionalProperties": false, "description": "Specification of input files/folders to be processed by the :class:`Step`.", "properties": { "folders": { "items": { "type": "string" }, "title": "Folders", "type": "array" }, "prefix": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Prefix" }, "suffix": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Suffix" }, "contains": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Contains" }, "exclude": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Exclude" } }, "required": [ "folders" ], "title": "Input", "type": "object" }, "Parameters": { "additionalProperties": false, "description": "Empty parameters specification with no extras allowed.", "properties": {}, "title": "Parameters", "type": "object" }, "Quadstar_sac": { "additionalProperties": false, "properties": { "filetype": { "const": "quadstar.sac", "title": "Filetype" }, "timezone": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timezone" }, "locale": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Locale" }, "encoding": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Encoding" } }, "required": [ "filetype" ], "title": "Quadstar_sac", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": { "additionalProperties": false, "properties": { "path": { "title": "Path", "type": "string" }, "type": { "title": 
"Type", "type": "string" }, "match": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Match" } }, "required": [ "path", "type" ], "title": "Content", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": { "additionalProperties": false, "properties": { "format": { "title": "Format", "type": "string" }, "len": { "title": "Len", "type": "integer" } }, "required": [ "format", "len" ], "title": "Content", "type": "object" } }, "additionalProperties": false, "required": [ "parser", "input", "extractor" ] }
- Config:
extra: str = forbid
- field parser: Literal['masstrace'] [Required]
- field extractor: Quadstar_sac [Required]
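As a rough illustration of the fields listed above, a single dataschema step selecting this parser could look like the following sketch. The folder name and suffix are hypothetical placeholders, and the small `missing_keys` helper is not part of yadg; it only mirrors the "required" list of the JSON schema.

```python
import json

# One step of a dataschema that selects the masstrace parser.
# "data/ms" and "sac" are hypothetical placeholders for this sketch.
step = {
    "parser": "masstrace",
    "input": {"folders": ["data/ms"], "suffix": "sac"},
    "extractor": {"filetype": "quadstar.sac"},
}

# Keys listed under "required" in the MassTrace JSON schema.
REQUIRED = ("parser", "input", "extractor")


def missing_keys(candidate: dict) -> list:
    """Return the required keys that are absent from the step (illustrative helper)."""
    return [key for key in REQUIRED if key not in candidate]


print(json.dumps(step, indent=2))
print("missing:", missing_keys(step))
```

The optional fields (tag, parameters, externaldate) default to null in the schema, so they are omitted here.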
Formats
The filetypes currently supported by the parser are:
- Pfeiffer Quadstar 32-bit scan analog data (quadstar.sac), see quadstarsac
Schema
The raw data, loaded from the supplied files, is stored using the following format:
datatree.DataTree:
  {{ detector_name }} !!xr.Dataset
    coords:
      uts:            !!float
      mass_to_charge: !!float               # m/z (amu)
    data_vars:
      y:              (uts, mass_to_charge) # Detected signal (counts)
The uncertainties in mass_to_charge are taken as the step-width of the linearly spaced mass values.
The uncertainties in y are the largest value between:
- The quantization error from the ADC, its resolution assumed to be 32 bit. Dividing the full-scale range (F.S.R.) by 2 ** 32 gives an error in the order of magnitude of the smallest data value in y.
- The contribution from neighboring masses. In the operating manual of the QMS 200 (see 2.8 QMS 200 F & 2.9 QMS 200 M), a maximum contribution from the neighboring mass of 50 ppm is noted.
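The two uncertainty rules above can be sketched in plain Python as follows. This is a minimal illustration, not the yadg implementation: the function names, the `points_per_mass` argument (the number of samples between neighboring integer masses) and the use of the absolute neighbor signal are assumptions made for this example.

```python
import math


def mass_uncertainty(masses):
    """Step-width of a linearly spaced mass axis (assumes at least two points)."""
    return masses[1] - masses[0]


def y_uncertainty(y, fsr, points_per_mass, adc_bits=32, neighbor_ppm=50.0):
    """Per-point uncertainty: the larger of ADC quantization and neighbor contribution."""
    # Quantization error: full-scale range divided by the assumed ADC resolution.
    sigma_adc = fsr / 2 ** adc_bits
    sigmas = []
    for i in range(len(y)):
        # Signals one integer mass below and above the current point, if present.
        neighbors = [
            abs(y[j])
            for j in (i - points_per_mass, i + points_per_mass)
            if 0 <= j < len(y) and not math.isnan(y[j])
        ]
        sigma_neighbor = max(neighbors, default=0.0) * neighbor_ppm * 1e-6
        sigmas.append(max(sigma_adc, sigma_neighbor))
    return sigmas


masses = [1.0, 2.0, 3.0]
y = [1e-12, 2e-10, 5e-13]
print(mass_uncertainty(masses))
print(y_uncertainty(y, fsr=1e-9, points_per_mass=1))
```

With a strong peak next to a weak one, the 50 ppm neighbor term dominates; for an empty spectrum the result falls back to the quantization error.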
Note
The data in y may contain NaNs. The measured ion count/current value will occasionally exceed the specified detector F.S.R. (e.g. 1e-9), and will then flip directly to the maximum value of a float32. These values are set to float("NaN").
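The over-range handling described in the note can be illustrated with a short sketch. The function name `mask_overrange` is hypothetical, and the comparison against the F.S.R. is an assumption about how such values could be detected, not a description of the parser's internals.

```python
import math

# Largest finite float32 value, which over-range readings flip to.
FLOAT32_MAX = 3.4028234663852886e38


def mask_overrange(y, fsr):
    """Replace readings above the detector full-scale range with NaN."""
    return [float("nan") if value > fsr else value for value in y]


raw = [2e-10, FLOAT32_MAX, 5e-11]
masked = mask_overrange(raw, fsr=1e-9)
print(masked)
```

Downstream code should therefore use NaN-aware reductions when aggregating y.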
Module Functions
- yadg.parsers.masstrace.process(*, filetype, **kwargs)
Unified mass spectrometry data parser. Forwards kwargs to the worker functions based on the supplied filetype.
- Parameters:
filetype (str) – Discriminator used to select the appropriate worker function.
- Return type:
Submodules
quadstarsac: Processing of Quadstar 32-bit scan analog data.
The sac2dat.c code from Dr. Moritz Bubek was a really useful stepping stone for this Python file parser.
Pretty much the entire file format has been reverse engineered. There are still one or two unknown fields.
File Structure of .sac Files
0x00 "data_index"
0x02 "software_id"
0x06 "version_major"
0x07 "version_minor"
0x08 "second"
0x09 "minute"
0x0a "hour"
0x0b "day"
0x0c "month"
0x0d "year"
0x0f "author"
0x64 "n_timesteps"
0x68 "n_traces"
0x6a "timestep_length"
...
# Not sure what sits from 0x6e to 0xc2.
...
0xc2 "uts_base_s"
0xc6 "uts_base_ms"
# Trace header. Read these 9 bytes for every trace (n_traces).
0xc8 + (n * 0x09) "type"
0xc9 + (n * 0x09) "info_position"
0xcd + (n * 0x09) "data_position"
...
# Trace info. Read these 137 bytes for every trace where type != 0x11.
info_position + 0x00 "data_format"
info_position + 0x02 "y_title"
info_position + 0x0f "y_unit"
info_position + 0x1d "x_title"
info_position + 0x2a "x_unit"
info_position + 0x38 "comment"
info_position + 0x7a "first_mass"
info_position + 0x7e "scan_width"
info_position + 0x80 "values_per_mass"
info_position + 0x81 "zoom_start"
info_position + 0x85 "zoom_end"
...
# UTS offset. Read these 6 bytes for every timestep (n_timesteps).
0xc2 + (n * timestep_length) "uts_offset_s"
0xc6 + (n * timestep_length) "uts_offset_ms"
# Read everything remaining below for every timestep and every trace
# where type != 0x11.
data_position + (n * timestep_length) + 0x00 "n_datapoints"
data_position + (n * timestep_length) + 0x04 "data_range"
# Datapoints. Read these 4 bytes (scan_width * values_per_mass)
# times.
data_position + (n * timestep_length) + 0x06 "datapoints"
...
Code author: Nicolas Vetsch
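To make the offsets in the file structure listing above more concrete, the following sketch reads the trace bookkeeping fields from a byte buffer with the standard struct module. It is not the yadg code: the field widths are inferred from the gaps between the listed offsets, and the little-endian, unsigned layout as well as the helper names are assumptions of this example. A synthetic buffer stands in for a real .sac file.

```python
import struct


def read_counts(buf):
    """Read n_timesteps (0x64), n_traces (0x68) and timestep_length (0x6a)."""
    # Assumed layout: 4-byte, 2-byte and 4-byte little-endian unsigned integers.
    n_timesteps, n_traces, timestep_length = struct.unpack_from("<IHI", buf, 0x64)
    return n_timesteps, n_traces, timestep_length


def read_trace_headers(buf, n_traces):
    """Read the 9-byte trace header of every trace, starting at 0xc8."""
    headers = []
    for n in range(n_traces):
        trace_type, info_position, data_position = struct.unpack_from(
            "<BII", buf, 0xC8 + n * 0x09
        )
        headers.append(
            {"type": trace_type, "info_position": info_position, "data_position": data_position}
        )
    return headers


def timestep_offsets(n, timestep_length, data_position):
    """Absolute offsets of the per-timestep fields for timestep n of one trace."""
    shift = n * timestep_length
    return {
        "uts_offset_s": 0xC2 + shift,
        "uts_offset_ms": 0xC6 + shift,
        "n_datapoints": data_position + shift + 0x00,
        "data_range": data_position + shift + 0x04,
        "datapoints": data_position + shift + 0x06,
    }


# Synthetic buffer with two traces; the second has type 0x11 and carries no info block.
demo = bytearray(0xC8 + 2 * 0x09)
struct.pack_into("<IHI", demo, 0x64, 3, 2, 0x40)
struct.pack_into("<BII", demo, 0xC8, 0x0F, 0x200, 0x300)
struct.pack_into("<BII", demo, 0xC8 + 0x09, 0x11, 0, 0)

n_timesteps, n_traces, timestep_length = read_counts(bytes(demo))
headers = read_trace_headers(bytes(demo), n_traces)
print(headers)
print(timestep_offsets(1, timestep_length, headers[0]["data_position"]))
```

Traces with type 0x11 would be skipped when reading the trace info and datapoints, as the comments in the listing state.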
- yadg.parsers.masstrace.quadstarsac.process(*, fn, **kwargs)
Processes a Quadstar 32-bit analog data .sac file.
- Parameters:
fn (str) – The file containing the trace(s) to parse.
- Returns:
A datatree.DataTree containing one xarray.Dataset per mass trace. The traces in the Quadstar .sac files are not named, therefore their index is used as the xarray.Dataset name.
- Return type: