electrochem: Electrochemistry data parser
This module handles the reading and processing of files containing electrochemical data, including BioLogic’s EC-Lab file formats.
Note
This interface is not yet final and will change with version 5.0.0.
Usage
The electrochem parser is selected by supplying electrochem as the argument to the parser keyword of the dataschema.
The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_4_1.step.ElectroChem.Params
{ "title": "Params", "type": "object", "properties": { "filetype": { "title": "Filetype", "default": "eclab.mpr", "enum": [ "eclab.mpt", "eclab.mpr", "tomato.json" ], "type": "string" } }, "additionalProperties": false }
- field filetype: Literal['eclab.mpt', 'eclab.mpr', 'tomato.json'] = 'eclab.mpr'
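As an illustration of how the Params model constrains the input, the following minimal sketch checks a parameter dictionary against the JSON schema quoted above. The helper `check_params` and the `"parameters"` key of the example step are assumptions made for this example; only the `parser` keyword and the `filetype` field come from the text.

```python
import json

# Schema copied from the JSON schema of the Params model shown above.
PARAMS_SCHEMA = json.loads("""
{
  "title": "Params",
  "type": "object",
  "properties": {
    "filetype": {
      "title": "Filetype",
      "default": "eclab.mpr",
      "enum": ["eclab.mpt", "eclab.mpr", "tomato.json"],
      "type": "string"
    }
  },
  "additionalProperties": false
}
""")


def check_params(params: dict) -> dict:
    """Hand-rolled check of `params` against PARAMS_SCHEMA (illustrative only)."""
    props = PARAMS_SCHEMA["properties"]
    unknown = set(params) - set(props)
    if unknown:
        # "additionalProperties": false forbids any other key.
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    filetype = params.get("filetype", props["filetype"]["default"])
    if filetype not in props["filetype"]["enum"]:
        raise ValueError(f"unsupported filetype: {filetype!r}")
    return {"filetype": filetype}


# Hypothetical step entry: the "parameters" key name is an assumption.
step = {"parser": "electrochem", "parameters": {"filetype": "eclab.mpt"}}
print(check_params(step["parameters"]))
print(check_params({}))  # falls back to the schema default
```

In practice the pydantic model performs this validation; the sketch only makes the rules of the schema explicit.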
Formats
The currently supported file formats are:
- EC-Lab raw data binary file and parameter settings: eclabmpr
- EC-Lab human-readable text export of data: eclabmpt
- tomato’s structured json output: tomatojson
Provides
The basic function of the parser is to:
- Read in the technique data and create timesteps.
- Collect metadata, such as the measurement settings and the loops contained in a given file.
- Collect data describing the technique parameter sequences.
Note
.mpt files can contain more data than the corresponding binary .mpr file.
Most techniques write data that can be understood as a series of timesteps. Each timestep provided by the parser has the following format:
- fn   !!str
- uts  !!float
- raw:
    "{{ col1 }}":  !!int
    "{{ col2 }}":
        {n: !!float, s: !!float, u: !!str}
For impedance spectroscopy techniques (PEIS, GEIS), the data is made up of spectroscopy traces. The data is thus split into traces by the column "cycle number" and each trace is cast into a single timestep. Each trace corresponds to one spectroscopy scan, indexed by the technique name (PEIS or GEIS). The timestep takes the following format:
- fn   !!str
- uts  !!float
- raw:
    traces:
      "{{ technique }}":
        "{{ col1 }}":
            [!!int, ...]
        "{{ col2 }}":
            {n: [!!float, ...], s: [!!float, ...], u: !!str}
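The splitting of impedance data into traces can be sketched as follows. This is a minimal illustration, not yadg's implementation: the function `split_into_traces`, the column names, and the constant per-column uncertainties are all assumptions made for the example, and the fn and uts entries of the timestep are omitted.

```python
from itertools import groupby


def split_into_traces(rows, units, sigmas, technique):
    """Group consecutive rows with the same "cycle number" into one trace each."""
    timesteps = []
    for _, group in groupby(rows, key=lambda row: row["cycle number"]):
        points = list(group)
        trace = {}
        for col, unit in units.items():
            trace[col] = {
                "n": [point[col] for point in points],
                "s": [sigmas[col]] * len(points),  # placeholder uncertainties
                "u": unit,
            }
        timesteps.append({"raw": {"traces": {technique: trace}}})
    return timesteps


# Hypothetical rows: two scans distinguished by "cycle number".
rows = [
    {"cycle number": 1, "freq": 1000.0, "Re(Z)": 10.0},
    {"cycle number": 1, "freq": 100.0, "Re(Z)": 12.0},
    {"cycle number": 2, "freq": 1000.0, "Re(Z)": 10.5},
]
units = {"freq": "Hz", "Re(Z)": "Ω"}
sigmas = {"freq": 1.0, "Re(Z)": 0.1}

timesteps = split_into_traces(rows, units, sigmas, "PEIS")
print(len(timesteps))
print(timesteps[0]["raw"]["traces"]["PEIS"]["freq"])
```

Each returned timestep holds exactly one scan, with list-valued n and s entries and a single unit string per column.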
Note
The parsed data may contain infinities (i.e. float("inf") / float("-inf")) or NaNs (i.e. float("nan")). While datagrams containing NaN and Inf can be exported and read back using Python’s json module, they are not strictly valid JSON.
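The behaviour described in this note can be reproduced with the standard library alone. The small dictionary below is a stand-in for a datagram, not a real one.

```python
import json
import math

# Stand-in for a value/uncertainty pair containing non-finite numbers.
datagram = {"n": float("inf"), "s": float("nan")}

# The default encoder emits the non-standard tokens Infinity and NaN.
text = json.dumps(datagram)
print(text)

# Python's own decoder accepts these tokens again.
restored = json.loads(text)
print(math.isinf(restored["n"]), math.isnan(restored["s"]))

# A strict encoder refuses such values, as would a strict JSON consumer.
try:
    json.dumps(datagram, allow_nan=False)
except ValueError as err:
    print("strict JSON rejected the datagram:", err)
```

Consumers other than Python may reject such files outright, so it is worth checking for non-finite values before exchanging datagrams with other tools.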
TODO
https://github.com/dgbowl/yadg/issues/10
Current values of the uncertainties "s" are hard-coded from VMP-3 values of resolutions and accuracies, with math.ulp(n) as fallback. The values should be device-specific, and the fallback should be eliminated.
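The fallback mentioned in this TODO can be pictured with the sketch below. The table and the function `uncertainty` are hypothetical, and the numbers in the table are placeholders rather than real VMP-3 specifications; only the use of math.ulp(n) as the fallback comes from the text.

```python
import math

# Placeholder table: illustrative numbers, NOT real VMP-3 specifications.
PLACEHOLDER_RESOLUTIONS = {"Ewe": 1e-4, "I": 1e-6}


def uncertainty(name: str, value: float, table: dict = PLACEHOLDER_RESOLUTIONS) -> float:
    """Return the tabulated resolution, or one unit in the last place of `value`."""
    if name in table:
        return table[name]
    # Fallback: spacing between `value` and the next representable float.
    return math.ulp(value)


print(uncertainty("Ewe", 3.7))
print(uncertainty("unknown column", 1.0))
```

The fallback only reflects floating-point precision, which is usually far smaller than any instrument resolution. This is why the TODO proposes to eliminate it.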
TODO
https://github.com/dgbowl/yadg/issues/11
The “raw” data in electrochemistry files should only contain the raw quantities, that is the control_I or control_V and the measured potentials Ewe, Ece or the measured current I. Analogous quantities should be recorded for PEIS/GEIS. All other columns should be computed by yadg.
Metadata
The metadata collected from the raw file will depend on the filetype. Currently, no metadata is recorded for the tomato.json filetype. For the eclab.mpt and eclab.mpr filetypes, the metadata will contain a settings and a params field:
The settings field for parsed .mpt files contains the technique name, a posix timestamp and the raw header lines as found in the file. The settings from parsed .mpr files contain the technique and more explicitly parsed information than from .mpt files, like the “cell characteristics” specified in EC-Lab.
The params field will contain the technique parameter sequences. The keys in each sequence will be the same independent of filetype, but an int value in the .mpr file may be a str when parsed from the corresponding .mpt file, since the mapping has not yet been reverse engineered.
TODO
https://github.com/dgbowl/yadg/issues/12
In .mpr files, some technique parameters in the settings module correspond to entries in drop-down lists in EC-Lab. These values are stored as single-byte values in .mpr files.
The metadata from parsed ".mpr" files also provides the "log" which contains more general parameters, like software, firmware and server versions, channel number, host address and an acquisition start timestamp in Microsoft OLE format.
Note
If the .mpr file contains an ExtDev module (containing parameters of any external sensors plugged into the device), the log is usually not present and therefore the full timestamp cannot be calculated.
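A Microsoft OLE timestamp counts days, including the fractional part, since 1899-12-30 00:00. A minimal conversion to a POSIX timestamp could look like the sketch below. The function name `ole_to_uts` and the choice of UTC as the default timezone are assumptions of this example; the acquisition start recorded by EC-Lab may well be in local time.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Day zero of the OLE automation date format.
OLE_EPOCH = datetime(1899, 12, 30)


def ole_to_uts(ole_days: float, tz: tzinfo = timezone.utc) -> float:
    """Convert an OLE date (days since 1899-12-30) to a POSIX timestamp."""
    naive = OLE_EPOCH + timedelta(days=ole_days)
    # Interpret the wall-clock time in the given timezone (UTC is an assumption).
    return naive.replace(tzinfo=tz).timestamp()


# 25569 days after the OLE epoch is 1970-01-01, the Unix epoch.
print(ole_to_uts(25569.0))
# Half a day later corresponds to 12 hours, i.e. 43200 seconds.
print(ole_to_uts(25569.5))
```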
Submodules
eclabmpr: Processing of BioLogic’s EC-Lab binary modular files.
.mpr files are structured in a set of “modules”, one concerning settings, one for actual data, one for logs, and an optional loops module. The parameter sequences can be found in the settings module.
This code is partly an adaptation of the galvani module by Chris Kerr, and builds on the work done by the previous civilian service member working on the project, Jonas Krieger.
These are the implemented techniques for which the technique parameter sequences can be parsed:
CA | Chronoamperometry / Chronocoulometry
CP | Chronopotentiometry
CV | Cyclic Voltammetry
GCPL | Galvanostatic Cycling with Potential Limitation
GEIS | Galvano Electrochemical Impedance Spectroscopy
LOOP | Loop
LSV | Linear Sweep Voltammetry
MB | Modulo Bat
OCV | Open Circuit Voltage
PEIS | Potentio Electrochemical Impedance Spectroscopy
WAIT | Wait
ZIR | IR compensation (PEIS)
File Structure of .mpr Files
At a top level, .mpr files are made up of a number of modules, separated by the MODULE keyword. In all the files I have seen, the first module is the settings module, followed by the data module, the log module and then an optional loop module.
0x0000 BIO-LOGIC MODULAR FILE # File magic.
0x0034 MODULE # Module magic.
... # Module 1.
0x???? MODULE # Module magic.
... # Module 2.
0x???? MODULE # Module magic.
... # Module 3.
0x???? MODULE # Module magic.
... # Module 4.
After splitting the entire file on MODULE, each module starts with a header that is structured like this (offsets from start of module):
0x0000 short_name # Short name, e.g. VMP Set.
0x000A long_name # Longer name, e.g. VMP settings.
0x0023 length # Number of bytes in module data.
0x0027 version # Module version.
0x002B date # Acquisition date in ASCII, e.g. 08/10/21.
... # Module data.
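The splitting and header layout described above can be sketched with the standard struct module. The field widths follow from the listed offsets; the little-endian byte order, the function name `split_modules`, and the synthetic test file are assumptions of this example, which is not the parser's actual implementation.

```python
import struct

FILE_MAGIC = b"BIO-LOGIC MODULAR FILE"
MODULE_MAGIC = b"MODULE"
# short_name (10 bytes), long_name (25), length (uint32), version (uint32), date (8).
# Widths follow from the offsets above; little-endian order is an assumption.
MODULE_HEADER = struct.Struct("<10s25sII8s")


def split_modules(raw: bytes) -> list:
    """Split a modular file on the MODULE keyword and decode each header."""
    if not raw.startswith(FILE_MAGIC):
        raise ValueError("missing BIO-LOGIC MODULAR FILE magic")
    modules = []
    # Naive split: module data that happens to contain b"MODULE" would break it.
    for chunk in raw.split(MODULE_MAGIC)[1:]:
        short, long_, length, version, date = MODULE_HEADER.unpack_from(chunk)
        start = MODULE_HEADER.size
        modules.append({
            "short_name": short.decode("windows-1252").strip(),
            "long_name": long_.decode("windows-1252").strip(),
            "length": length,
            "version": version,
            "date": date.decode("ascii"),
            "data": chunk[start:start + length],
        })
    return modules


# Synthetic file with a single module carrying three bytes of data.
header = MODULE_HEADER.pack(
    b"VMP Set".ljust(10), b"VMP settings".ljust(25), 3, 0, b"08/10/21"
)
raw = FILE_MAGIC.ljust(0x34, b" ") + MODULE_MAGIC + header + b"\x01\x02\x03"
modules = split_modules(raw)
print(modules[0]["short_name"], modules[0]["date"], modules[0]["data"])
```

The header occupies 0x33 bytes, so the module data starts directly after the date field.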
The contents of each module’s data vary wildly depending on the technique used, the module and perhaps the software version, the settings in EC-Lab, etc. Here is a quick overview (offsets from start of module data).
Settings Module
0x0000 technique_id # Unique technique ID.
... # ???
0x0007 comments # Pascal string.
... # Zero padding.
# Cell Characteristics.
0x0107 active_material_mass # Mass of active material
0x010B at_x # at x =
0x010F molecular_weight # Molecular weight of active material
0x0113 atomic_weight # Atomic weight of intercalated ion
0x0117 acquisition_start # Acquisition started at: xo =
0x011B e_transferred # Number of e- transferred
0x011E electrode_material # Pascal string.
... # Zero Padding
0x01C0 electrolyte # Pascal string.
... # Zero Padding, ???.
0x0211 electrode_area # Electrode surface area
0x0215 reference_electrode # Pascal string
... # Zero padding
0x024C characteristic_mass # Characteristic mass
... # ???
0x025C battery_capacity # Battery capacity C =
0x0260 battery_capacity_unit # Unit of the battery capacity.
... # ???
# Technique parameters can randomly be found at 0x0572, 0x1845 or
# 0x1846. All you can do is guess and try until it fits.
0x1845 ns # Number of sequences.
0x1847 n_params # Number of technique parameters.
0x1849 params # ns sets of n_params parameters.
... # ???
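Several fields in this layout are Pascal strings, that is, a single length byte followed by that many characters. A minimal reader is sketched below; the function name `read_pascal_string` and the synthetic buffer are assumptions of this example, while the offset 0x0007 of the comments field and the windows-1252 encoding are taken from this page.

```python
def read_pascal_string(buf: bytes, offset: int, encoding: str = "windows-1252") -> str:
    """Read a length-prefixed (Pascal) string starting at `offset`."""
    length = buf[offset]  # the first byte holds the number of characters
    start = offset + 1
    return buf[start:start + length].decode(encoding)


# Synthetic settings data: zero padding with a comment placed at 0x0007.
comment = b"test cell"
settings = bytearray(0x0200)
settings[0x0007:0x0007 + 1 + len(comment)] = bytes([len(comment)]) + comment

print(read_pascal_string(bytes(settings), 0x0007))
```

The zero padding that follows each string in the layout is simply ignored, because the length byte determines where the string ends.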
Data Module
0x0000 n_datapoints # Number of datapoints.
0x0004 n_columns # Number of values per datapoint.
0x0005 column_ids # n_columns unique column IDs.
...
# Depending on module version, datapoints start 0x0195 or 0x0196.
# Length of each datapoint depends on number and IDs of columns.
0x0195 datapoints # n_datapoints points of data.
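The start of the data module can be decoded along the following lines. The widths of n_datapoints (4 bytes) and n_columns (1 byte) follow from the offsets above; the 2-byte width of each column ID, the little-endian byte order, and the function name `read_data_header` are assumptions of this sketch.

```python
import struct


def read_data_header(data: bytes):
    """Decode n_datapoints, n_columns and the column IDs of a data module."""
    # uint32 at 0x0000 and uint8 at 0x0004; little-endian is an assumption.
    n_datapoints, n_columns = struct.unpack_from("<IB", data, 0x0000)
    # Column IDs are assumed to be uint16 values starting at 0x0005.
    column_ids = struct.unpack_from(f"<{n_columns}H", data, 0x0005)
    return n_datapoints, n_columns, list(column_ids)


# Synthetic header announcing 100 datapoints with three made-up column IDs.
data = struct.pack("<IB3H", 100, 3, 4, 6, 8)
print(read_data_header(data))
```

The length of each datapoint would then be derived from the data types associated with the column IDs, which this sketch does not attempt to model.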
Log Module
... # ???
0x0009 channel_number # Zero-based channel number.
... # ???
0x00AB channel_sn # Channel serial number.
... # ???
0x01F8 Ewe_ctrl_min # Ewe ctrl range min.
0x01FC Ewe_ctrl_max # Ewe ctrl range max.
... # ???
0x0249 ole_timestamp # Timestamp in OLE format.
0x0251 filename # Pascal String.
... # Zero padding, ???.
0x0351 host # IP address of host, Pascal string.
... # Zero padding.
0x0384 address # IP address / COM port of potentiostat.
... # Zero padding.
0x03B7 ec_lab_version # EC-Lab version (software)
... # Zero padding.
0x03BE server_version # Internet server version (firmware)
... # Zero padding.
0x03C5 interpreter_version # Command interpreter version (firmware)
... # Zero padding.
0x03CF device_sn # Device serial number.
... # Zero padding.
0x0922 averaging_points # Smooth data on ... points.
... # ???
Loop Module
0x0000 n_indexes # Number of loop indexes.
0x0004 indexes # n_indexes indexes at which loops start in data.
... # ???
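The loop indexes can be used to cut the datapoints into loops, as in the sketch below. The 4-byte width of n_indexes follows from the offsets above; the 4-byte width of each index, the little-endian byte order, both function names, and the convention that every index marks the first datapoint of a loop are assumptions of this example.

```python
import struct


def read_loop_starts(loop_data: bytes) -> list:
    """Read n_indexes and the list of loop start indexes."""
    (n_indexes,) = struct.unpack_from("<I", loop_data, 0x0000)
    # Each index is assumed to be a little-endian uint32.
    return list(struct.unpack_from(f"<{n_indexes}I", loop_data, 0x0004))


def split_by_loops(points: list, starts: list) -> list:
    """Cut `points` into consecutive loops beginning at each start index."""
    bounds = list(starts) + [len(points)]
    return [points[a:b] for a, b in zip(bounds, bounds[1:])]


# Synthetic loop module with two loops, starting at datapoints 0 and 3.
loop_data = struct.pack("<I2I", 2, 0, 3)
starts = read_loop_starts(loop_data)
loops = split_by_loops(["p0", "p1", "p2", "p3", "p4"], starts)
print(starts, loops)
```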
Structure of Parsed Data
EIS Techniques (PEIS/GEIS)
- fn   !!str
- uts  !!float
- raw:
    traces:
      "{{ technique name }}":
        "{{ col1 }}":
            [!!int, ...]
        "{{ col2 }}":
            {n: [!!float, ...], s: [!!float, ...], u: !!str}
All Other Techniques
- fn   !!str
- uts  !!float
- raw:
    "{{ col1 }}":  !!int
    "{{ col2 }}":
        {n: !!float, s: !!float, u: !!str}
- yadg.parsers.electrochem.eclabmpr.process(fn, encoding='windows-1252', timezone='localtime')
Processes EC-Lab raw data binary files.
- Parameters
fn (str) – The file containing the data to parse.
encoding (str) – Encoding of fn, by default “windows-1252”.
timezone (str) – A string description of the timezone. Default is “localtime”.
- Returns
(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and the full date tag. For mpr files, the full date is specified if the “LOG” module is present.
- Return type
tuple[list, dict, bool]
eclabmpt: Processing of BioLogic’s EC-Lab ASCII export files.
.mpt files are made up of a header portion (with the technique parameter sequences and an optional loops section) and a tab-separated data table.
A list of techniques supported by this parser is shown in the techniques table.
File Structure of .mpt Files
These human-readable files are sectioned into headerlines and datalines. The header part is made up of information that can be found in the settings, log and loop modules of the binary .mpr file.
If no header is present, the timestamps will instead be calculated from the file’s ctime.
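The ctime fallback can be pictured as follows. This is a sketch under the assumption that the relative times in the data table are simply added to the file's ctime; the function name `fallback_uts` is hypothetical. Note that os.path.getctime returns the time of the last metadata change on Unix-like systems and the creation time on Windows.

```python
import os
import tempfile


def fallback_uts(fn: str, relative_times: list) -> list:
    """Offset relative times (in seconds) by the ctime of the file `fn`."""
    t0 = os.path.getctime(fn)
    return [t0 + t for t in relative_times]


# Demonstration on a temporary file standing in for a header-less .mpt file.
with tempfile.NamedTemporaryFile(suffix=".mpt", delete=False) as handle:
    path = handle.name
try:
    uts = fallback_uts(path, [0.0, 0.5, 1.0])
    print(uts)
finally:
    os.remove(path)
```

Timestamps obtained this way depend on when the file was copied or modified, so they should be treated as approximate.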
Structure of Parsed Data
EIS Techniques (PEIS/GEIS)
- fn   !!str
- uts  !!float
- raw:
    traces:
      "{{ technique name }}":
        "{{ col1 }}":
            [!!int, ...]
        "{{ col2 }}":
            {n: [!!float, ...], s: [!!float, ...], u: !!str}
All Other Techniques
- fn   !!str
- uts  !!float
- raw:
    "{{ col1 }}":  !!int
    "{{ col2 }}":
        {n: !!float, s: !!float, u: !!str}
- yadg.parsers.electrochem.eclabmpt.process(fn, encoding='windows-1252', timezone='UTC')
Processes EC-Lab human-readable text export files.
- Parameters
fn (str) – The file containing the data to parse.
encoding (str) – Encoding of fn, by default “windows-1252”.
timezone (str) – A string description of the timezone. Default is “UTC”.
- Returns
(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and the full date tag. For mpt files, the full date might not be specified if header is not present.
- Return type
tuple[list, dict, bool]
tomatojson: Processing of tomato electrochemistry outputs.
This module parses the electrochemistry json files generated by tomato.
Warning
This parser is brand-new in yadg-4.1 and the interface is unstable.
Four sections are expected in each tomato data file:
- technique section, describing the current technique,
- previous section, containing status information of the previous file,
- current section, containing status information of the current file,
- data section, containing the timesteps.
The reason why both previous and current are required is that the device status is recorded at the time of data polling, which means the values in current might be invalid (after the run has finished) or not in sync with the data (if a technique change happened). However, previous may not be present in the first data file of an experiment.
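One way to handle the four sections is sketched below. The function `pick_status`, the preference for previous over current, and the contents of the example document are assumptions of this illustration, not a description of the parser's actual logic.

```python
import json


def pick_status(doc: dict) -> dict:
    """Return the status section to use for the data in a tomato file."""
    missing = [key for key in ("technique", "current", "data") if key not in doc]
    if missing:
        raise KeyError(f"missing sections: {missing}")
    # Assumption: prefer "previous", since "current" may be invalid or out of
    # sync; fall back to "current" for the first file of an experiment.
    previous = doc.get("previous")
    return previous if previous is not None else doc["current"]


# Hypothetical file contents; the keys inside each section are made up.
first_file = json.loads(
    '{"technique": {}, "previous": null, "current": {"tag": "current"}, "data": []}'
)
later_file = json.loads(
    '{"technique": {}, "previous": {"tag": "previous"},'
    ' "current": {"tag": "current"}, "data": []}'
)
print(pick_status(first_file))
print(pick_status(later_file))
```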
To determine the measurement errors, the values from BioLogic manual are used: for measured voltages (\(E_{\text{we}}\) and \(E_{\text{ce}}\)) this corresponds to a constant uncertainty of 0.004% of the applied E-range with a maximum of 75 uV, while for currents (\(I\)) this is a constant uncertainty of 0.0015% of the applied I-range with a maximum of 0.76 uA.
- yadg.parsers.electrochem.tomatojson.process(fn, encoding='UTF-8', timezone='UTC')
- Return type
tuple[list, dict, bool]
eclabtechniques: Parameters for implemented techniques.
- Implemented techniques:
CA - Chronoamperometry / Chronocoulometry
CP - Chronopotentiometry
CV - Cyclic Voltammetry
GCPL - Galvanostatic Cycling with Potential Limitation
GEIS - Galvano Electrochemical Impedance Spectroscopy
LOOP - Loop
LSV - Linear Sweep Voltammetry
MB - Modulo Bat
OCV - Open Circuit Voltage
PEIS - Potentio Electrochemical Impedance Spectroscopy
WAIT - Wait
ZIR - IR compensation (PEIS)
The module also implements resolution determination for parameters of techniques, in get_resolution().
TODO
https://github.com/dgbowl/yadg/issues/10
Current values of the uncertainties "s" are hard-coded from VMP-3 values of resolutions and accuracies, with math.ulp(n) as fallback. The values should be device-specific, and the fallback should be eliminated.
- yadg.parsers.electrochem.eclabtechniques.get_resolution(name, value, Erange, Irange)
Function that returns the resolution of a property based on its name, value, E-range and I-range.
The values used here are hard-coded from VMP-3 potentiostats. Generally, the resolution is returned, however in some cases only the accuracy is specified (currently freq and Phase).
- Return type
float
- yadg.parsers.electrochem.eclabtechniques.param_from_key(param, key, to_str=True)
Convert a supplied key of a certain parameter to its string or float value.
The function uses the map defined in param_map to convert between the entries in the tuples, which contain the str value of the parameter (present in .mpt files), the int value of the parameter (present in .mpr files), and the corresponding float value in SI units.
- Parameters
param (str) – The name of the parameter, a key within the param_map. If param is not present in param_map, the supplied key is returned back.
key (Union[int, str]) – The key of the parameter that is to be converted to a different representation.
to_str (bool) – A switch between str and float output.
- Returns
key – The key converted to the requested format.
- Return type
Union[str, float, int]
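The lookup described for param_from_key can be illustrated with a stand-in. The map below and the function `param_from_key_sketch` are hypothetical: the parameter name and its entries are placeholders rather than real EC-Lab values, and raising an error for an unknown key is a choice made for this example.

```python
# Placeholder map: tuples of (str value, int value, float value in SI units).
DEMO_PARAM_MAP = {
    "demo_range": (("10 mA", 1, 0.01), ("100 mA", 2, 0.1)),
}


def param_from_key_sketch(param, key, to_str=True, param_map=DEMO_PARAM_MAP):
    """Convert `key` of `param` to its str or float representation."""
    if param not in param_map:
        return key  # unknown parameters are returned unchanged
    # A str key is matched against the .mpt form, an int key against the .mpr form.
    column = 0 if isinstance(key, str) else 1
    for entry in param_map[param]:
        if entry[column] == key:
            return entry[0] if to_str else entry[2]
    raise ValueError(f"key {key!r} not found for parameter {param!r}")


print(param_from_key_sketch("demo_range", 2))
print(param_from_key_sketch("demo_range", "10 mA", to_str=False))
print(param_from_key_sketch("not_in_map", 42))
```

Keeping all three representations in one tuple lets the same table serve both file formats.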
- yadg.parsers.electrochem.eclabtechniques.technique_params(technique, settings)
Constructs the parameter names for different techniques.
- Parameters
technique (str) – The full name of the technique.
settings (list[str]) – The list of settings from the start of an .mpt or .mps file.
- Returns
The short technique name and a full list of technique parameter names depending on what is present in the file.
- Return type
tuple[str, list]
- yadg.parsers.electrochem.main.process(fn, encoding='windows-1252', timezone='localtime', parameters=None)
Unified parser for electrochemistry data.
- Parameters
fn (str) – The file containing the data to parse.
encoding (str) – Encoding of fn, by default “windows-1252”.
timezone (str) – A string description of the timezone. Default is “localtime”.
parameters (Optional[BaseModel]) – Parameters for ElectroChem.
- Returns
(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag. The currently implemented parsers all return full date.
- Return type
tuple[list, dict, bool]
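A unified parser of this kind typically dispatches on the filetype parameter. The sketch below shows one possible shape of such a dispatch; the handlers are stubs standing in for the eclabmpr, eclabmpt and tomatojson submodules, and `process_sketch` is a hypothetical name, not the actual implementation.

```python
def _stub(name):
    """Create a stand-in handler returning (data, metadata, fulldate)."""
    def handler(fn, encoding, timezone):
        return [], {"handled_by": name, "fn": fn}, True
    return handler


# Mapping of the filetype values from Params to the (stubbed) submodules.
HANDLERS = {
    "eclab.mpr": _stub("eclabmpr"),
    "eclab.mpt": _stub("eclabmpt"),
    "tomato.json": _stub("tomatojson"),
}


def process_sketch(fn, encoding="windows-1252", timezone="localtime",
                   filetype="eclab.mpr"):
    """Dispatch `fn` to the handler registered for `filetype`."""
    try:
        handler = HANDLERS[filetype]
    except KeyError:
        raise ValueError(f"unsupported filetype: {filetype!r}") from None
    return handler(fn, encoding, timezone)


data, metadata, fulldate = process_sketch("example.mpt", filetype="eclab.mpt")
print(metadata, fulldate)
```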