masstrace: Mass spectrometry trace file parser

This module handles the reading and processing of mass spectrometry files. The basic functions of the parser are to:

  1. read in the raw data and create timestamped traces

  2. collect metadata such as the software version, author, etc.

Usage

Select masstrace by supplying it to the parser keyword in the dataschema. The parser supports the following parameters:

pydantic model dgbowl_schemas.yadg.dataschema_4_2.step.MassTrace.Params

{
   "title": "Params",
   "type": "object",
   "properties": {
      "filetype": {
         "title": "Filetype",
         "default": "quadstar.sac",
         "enum": [
            "quadstar.sac"
         ],
         "type": "string"
      }
   },
   "additionalProperties": false
}
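To illustrate what the schema above accepts, the sketch below restates its rules in plain Python. A missing `filetype` falls back to "quadstar.sac", only enumerated values are allowed, and any other key is rejected because of `"additionalProperties": false`. The function `validate_masstrace_params` is hypothetical. In yadg this validation is done by the pydantic model itself.

```python
from typing import Any, Dict, Optional

# Values copied from the JSON schema above.
ALLOWED_FILETYPES = ("quadstar.sac",)
DEFAULT_FILETYPE = "quadstar.sac"


def validate_masstrace_params(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for MassTrace.Params validation."""
    params = dict(params or {})
    # "additionalProperties": false -> unknown keys are an error.
    extra = set(params) - {"filetype"}
    if extra:
        raise ValueError(f"unexpected parameter(s): {sorted(extra)}")
    # "default": "quadstar.sac" -> fill in a missing filetype.
    filetype = params.get("filetype", DEFAULT_FILETYPE)
    # "enum": ["quadstar.sac"] -> only listed filetypes are accepted.
    if filetype not in ALLOWED_FILETYPES:
        raise ValueError(f"unsupported filetype: {filetype!r}")
    return {"filetype": filetype}
```

For example, `validate_masstrace_params({})` returns `{"filetype": "quadstar.sac"}`, while `{"filetype": "quadstar.sac", "foo": 1}` raises a `ValueError`.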

field filetype: Literal['quadstar.sac'] = 'quadstar.sac'

Formats

The filetypes currently supported by the parser are:

  • Pfeiffer Quadstar 32-bit scan analog data (quadstar.sac), see quadstarsac

Provides

The raw data, loaded from the supplied files, is stored using the following format:

- raw:
    traces:
      "{{ trace_number }}":  # number of the trace
        y_title: !!str       # y-axis label from file
        comment: !!str       # comment
        fsr:     !!str       # full scale range of the detector
        m/z:                 # masses are always in amu
          {n: [!!float, ...], s: [!!float, ...], u: "amu"}
        y:                   # y-axis units from file
          {n: [!!float, ...], s: [!!float, ...], u: !!str}

The uncertainties "s" of m/z are taken as the step width of the linearly spaced mass values.
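The following is a minimal sketch of how such an m/z entry could be built. It goes beyond what is documented here in three assumptions. The masses start at the trace's `first_mass`. They are spaced `1 / values_per_mass` amu apart. There are `scan_width * values_per_mass` of them, which matches the number of datapoints per trace. The helper name `mass_axis` is hypothetical.

```python
from typing import Any, Dict


def mass_axis(first_mass: float, scan_width: int, values_per_mass: int) -> Dict[str, Any]:
    """Build the {n, s, u} entry for m/z of one trace (hypothetical helper)."""
    step = 1.0 / values_per_mass  # assumed spacing between consecutive masses, in amu
    n_points = scan_width * values_per_mass
    masses = [first_mass + i * step for i in range(n_points)]
    # Every mass value gets the step width as its uncertainty.
    return {"n": masses, "s": [step] * n_points, "u": "amu"}
```

With `first_mass=1.0`, `scan_width=2` and `values_per_mass=4`, this gives eight masses from 1.0 to 2.75 in steps of 0.25 amu, each with an uncertainty of 0.25.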

The uncertainties "s" of y are taken as the larger of:

  1. The quantization error of the ADC, whose resolution is assumed to be 32 bits. Dividing the F.S.R. by 2 ** 32 gives an error of the same order of magnitude as the smallest data value in y.

  2. The contribution from neighboring masses. The operating manual of the QMS 200 (see 2.8 QMS 200 F and 2.9 QMS 200 M) specifies a maximum contribution of 50 ppm from a neighboring mass.
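The rule above, taking the larger of the two error sources, can be sketched as follows. The text does not say which signal the 50 ppm applies to. This sketch assumes it is 50 ppm of each point's own magnitude. It also chooses to leave NaN points (see the note below) as NaN instead of assigning them a finite uncertainty. `y_uncertainties` is a hypothetical name.

```python
import math
from typing import List


def y_uncertainties(y: List[float], fsr: float, neighbor_ppm: float = 50.0) -> List[float]:
    """Per-point uncertainty of y as the larger of two error sources (sketch)."""
    # 1. ADC quantization: the full scale range split into 2**32 steps.
    quantization = fsr / 2**32
    sigmas = []
    for value in y:
        if math.isnan(value):
            # No meaningful value, so no meaningful uncertainty either.
            sigmas.append(math.nan)
            continue
        # 2. Neighboring-mass contribution, assumed relative to this point's magnitude.
        crosstalk = abs(value) * neighbor_ppm * 1e-6
        sigmas.append(max(quantization, crosstalk))
    return sigmas
```

For an F.S.R. of 1e-9, a zero reading gets the quantization error of about 2.3e-19. A reading of 1e-12 gets the larger 50 ppm term, 5e-17.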

Note

The data in y may contain NaNs. The measured ion count or current occasionally exceeds the specified detector F.S.R. (e.g. 1e-9), in which case the stored value jumps directly to the maximum value of a float32. Such values are set to float("NaN").
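A minimal sketch of this replacement is shown below. It assumes that the datapoints have already been decoded from little-endian float32, and that a saturated point compares exactly equal to the largest finite float32. The name `mask_overflow` is hypothetical.

```python
import math
import struct
from typing import List

# Largest finite float32 (bit pattern 0x7F7FFFFF), decoded from little-endian bytes.
FLOAT32_MAX = struct.unpack("<f", b"\xff\xff\x7f\x7f")[0]


def mask_overflow(values: List[float]) -> List[float]:
    """Replace values that saturated at the float32 maximum with NaN."""
    return [math.nan if v == FLOAT32_MAX else v for v in values]
```

Because Python floats are 64-bit, a float32 value decoded with `struct` keeps its exact value. The equality test therefore matches only points that actually hit the float32 maximum.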

Submodules

yadg.parsers.masstrace.main.process(fn, encoding='utf-8', timezone='localtime', parameters=None)

Unified mass spectrometry data parser.

This parser processes mass spectrometry scans in signal(mass) format.

Parameters
  • fn (str) – The file containing the trace(s) to parse.

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

  • parameters (Optional[BaseModel]) – Parameters for MassTrace.

Returns

(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag.

Return type

tuple[list, dict, bool]
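One way to picture a "unified" parser is as a lookup from the `filetype` parameter to a format-specific function such as `quadstarsac.process`. The sketch below is hypothetical throughout. It takes `filetype` as a plain string instead of the `parameters` model and uses a stand-in `parse_quadstar_sac` that decodes nothing. It also assumes that a .sac file always provides a full date, so it reports `fulldate=True`. The docs above do not confirm this.

```python
from typing import Callable, Dict, Optional, Tuple

ParserResult = Tuple[list, dict, None]


def parse_quadstar_sac(fn: str, encoding: str, timezone: str) -> ParserResult:
    """Hypothetical stand-in for quadstarsac.process; a real one would decode fn."""
    return [], {"fn": fn}, None


# Hypothetical registry: one entry per value allowed for "filetype".
PARSERS: Dict[str, Callable[[str, str, str], ParserResult]] = {
    "quadstar.sac": parse_quadstar_sac,
}


def process(
    fn: str,
    encoding: str = "utf-8",
    timezone: str = "localtime",
    filetype: Optional[str] = None,
) -> Tuple[list, dict, bool]:
    filetype = filetype or "quadstar.sac"  # mirrors the schema default
    if filetype not in PARSERS:
        raise ValueError(f"unsupported filetype: {filetype!r}")
    data, metadata, _common = PARSERS[filetype](fn, encoding, timezone)
    # Assumption: the .sac header carries a complete date.
    return data, metadata, True
```

With this design, supporting a new format means adding one registry entry and one enum value to the schema.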

quadstarsac: Processing of Quadstar 32-bit scan analog data.

The sac2dat.c code from Dr. Moritz Bubek was a really useful stepping stone for this Python file parser.

Almost the entire file format has been reverse-engineered; one or two fields are still unknown.

File Structure of .sac Files

0x00 "data_index"
0x02 "software_id"
0x06 "version_major"
0x07 "version_minor"
0x08 "second"
0x09 "minute"
0x0a "hour"
0x0b "day"
0x0c "month"
0x0d "year"
0x0f "author"
0x64 "n_timesteps"
0x68 "n_traces"
0x6a "timestep_length"
...
# Not sure what sits from 0x6e to 0xc2.
...
0xc2 "uts_base_s"
0xc6 "uts_base_ms"
# Trace header. Read these 9 bytes for every trace (n_traces).
0xc8 + (n * 0x09) "type"
0xc9 + (n * 0x09) "info_position"
0xcd + (n * 0x09) "data_position"
...
# Trace info. Read these 137 bytes for every trace where type != 0x11.
info_position + 0x00 "data_format"
info_position + 0x02 "y_title"
info_position + 0x0f "y_unit"
info_position + 0x1d "x_title"
info_position + 0x2a "x_unit"
info_position + 0x38 "comment"
info_position + 0x7a "first_mass"
info_position + 0x7e "scan_width"
info_position + 0x80 "values_per_mass"
info_position + 0x81 "zoom_start"
info_position + 0x85 "zoom_end"
...
# UTS offset. Read these 6 bytes for every timestep (n_timesteps).
0xc2 + (n * timestep_length) "uts_offset_s"
0xc6 + (n * timestep_length) "uts_offset_ms"
# Read everything remaining below for every timestep and every trace
# where type != 0x11.
data_position + (n * timestep_length) + 0x00 "n_datapoints"
data_position + (n * timestep_length) + 0x04 "data_range"
# Datapoints. Read these 4 bytes (scan_width * values_per_mass)
# times.
data_position + (n * timestep_length) + 0x06 "datapoints"
...
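The layout above can be decoded with Python's `struct` module. The sketch below is not yadg's implementation and rests on several assumptions:

  • byte order is little-endian;
  • each integer field's width is the gap to the next documented offset (e.g. 4 bytes for n_timesteps, 2 for n_traces, 1 + 4 + 4 for the 9-byte trace header);
  • `first_mass` and the datapoints are 32-bit floats;
  • `data_range` is a signed 16-bit integer;
  • text fields are NUL-padded.

Only the fields needed elsewhere are decoded, and all function names are hypothetical.

```python
import struct
from typing import Any, Dict, List, Tuple


def _cstr(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a fixed-width text field, dropping everything from the first NUL byte."""
    return raw.split(b"\x00", 1)[0].decode(encoding, errors="replace")


def read_sac_header(buf: bytes) -> Dict[str, Any]:
    """Read the general header and the 9-byte trace headers (sketch)."""
    (n_timesteps,) = struct.unpack_from("<I", buf, 0x64)
    (n_traces,) = struct.unpack_from("<H", buf, 0x68)
    (timestep_length,) = struct.unpack_from("<I", buf, 0x6A)
    uts_base_s, uts_base_ms = struct.unpack_from("<IH", buf, 0xC2)
    traces = []
    for n in range(n_traces):
        # type (1 byte), info_position (4 bytes), data_position (4 bytes).
        # Traces with type 0x11 have no info/data block; callers should skip them.
        ttype, info_pos, data_pos = struct.unpack_from("<BII", buf, 0xC8 + n * 0x09)
        traces.append({"type": ttype, "info_position": info_pos, "data_position": data_pos})
    return {
        "n_timesteps": n_timesteps,
        "n_traces": n_traces,
        "timestep_length": timestep_length,
        "uts_base_s": uts_base_s,
        "uts_base_ms": uts_base_ms,
        "traces": traces,
    }


def read_trace_info(buf: bytes, info_pos: int, encoding: str = "utf-8") -> Dict[str, Any]:
    """Read part of the 137-byte trace info block (sketch)."""
    # first_mass (float32), scan_width (2 bytes), values_per_mass (1 byte).
    first_mass, scan_width, values_per_mass = struct.unpack_from("<fHB", buf, info_pos + 0x7A)
    return {
        "y_title": _cstr(buf[info_pos + 0x02 : info_pos + 0x0F], encoding),
        "y_unit": _cstr(buf[info_pos + 0x0F : info_pos + 0x1D], encoding),
        "comment": _cstr(buf[info_pos + 0x38 : info_pos + 0x7A], encoding),
        "first_mass": first_mass,
        "scan_width": scan_width,
        "values_per_mass": values_per_mass,
    }


def read_datapoints(
    buf: bytes, data_pos: int, n: int, timestep_length: int, n_values: int
) -> Tuple[int, int, List[float]]:
    """Read n_datapoints, data_range and n_values float32 points for timestep n."""
    base = data_pos + n * timestep_length
    n_datapoints, data_range = struct.unpack_from("<Ih", buf, base)
    values = list(struct.unpack_from(f"<{n_values}f", buf, base + 0x06))
    return n_datapoints, data_range, values
```

Here `n_values` is `scan_width * values_per_mass`, as given in the layout. Using `unpack_from` with absolute offsets mirrors the offset table directly, so no intermediate slicing is needed.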

Structure of Parsed Timesteps

- fn:  !!str
- uts: !!float
- raw:
    traces:
      "{{ trace_number }}":  # number of the trace
        y_title:  !!str      # y-axis label from file
        comment:  !!str      # comment
        fsr:      !!str      # full scale range of detector
        m/z:                 # masses are always in amu
          {n: [!!float, ...], s: [!!float, ...], u: "amu"}
        y:                   # y-axis units from file
          {n: [!!float, ...], s: [!!float, ...], u: !!str}
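The sketch below arranges already-decoded values into the timestep structure above. It assumes that a seconds field and a milliseconds field combine into a Unix timestamp as `seconds + milliseconds / 1000`. How the base and per-timestep offsets are combined is not documented here, so the sketch leaves that to the caller. The function names and the input keys (`mz`, `mz_s`, `y`, `y_s`, `y_unit`) are hypothetical.

```python
from typing import Any, Dict


def uts_from_parts(seconds: int, milliseconds: int) -> float:
    """Combine a seconds and a milliseconds field into one timestamp (assumed encoding)."""
    return seconds + milliseconds / 1000.0


def build_timestep(fn: str, uts: float, traces: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Arrange decoded trace data into the documented timestep layout (sketch)."""
    out: Dict[str, Any] = {}
    for number, tr in traces.items():
        # Trace numbers become string keys, matching the quoted "{{ trace_number }}".
        out[str(number)] = {
            "y_title": tr["y_title"],
            "comment": tr["comment"],
            "fsr": tr["fsr"],  # kept as whatever string the caller supplies
            "m/z": {"n": tr["mz"], "s": tr["mz_s"], "u": "amu"},
            "y": {"n": tr["y"], "s": tr["y_s"], "u": tr["y_unit"]},
        }
    return {"fn": fn, "uts": float(uts), "raw": {"traces": out}}
```

Keying the input by trace number leaves the numbering (for example, how skipped type 0x11 traces are counted) to the caller. The sketch does not assume a scheme.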

Code author: Nicolas Vetsch <vetschnicolas@gmail.com>

yadg.parsers.masstrace.quadstarsac.process(fn, encoding='utf-8', timezone='localtime')

Processes a Quadstar 32-bit analog data .sac file.

Parameters
  • fn (str) – The file containing the trace(s) to parse.

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

Returns

(data, metadata, common) – Tuple containing the timesteps, metadata, and common data.

Return type

tuple[list, dict, None]