Key features of yadg

Units and uncertainties

One of the key features of yadg is the enforced association of units and uncertainties with measured properties. This means that all floating-point values are stored in the format {"n": float, "s": float, "u": str}, where "n" is the nominal value, "s" is the uncertainty / error estimate, and "u" is the unit.
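As an illustration of this format, the sketch below builds one such value in plain Python. The `Value` type and the `temperature` variable are hypothetical names introduced only for this example; they are not part of yadg.

```python
from typing import TypedDict


class Value(TypedDict):
    """Hypothetical type describing one stored floating-point value."""

    n: float  # nominal value
    s: float  # uncertainty / error estimate
    u: str  # unit


# A temperature reading of 298.15 K with an uncertainty of 0.01 K.
temperature: Value = {"n": 298.15, "s": 0.01, "u": "K"}
```

All three keys are present even when the uncertainty is small, which is what allows downstream code to rely on them.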

Units

yadg uses the pint package to validate units in the created datagrams. For this, an extended pint.UnitRegistry is exposed in yadg, containing definitions of some quantities present in the raw data files in addition to pint’s standard unit registry. This pint.UnitRegistry should be used in downstream packages which depend on yadg. An arbitrary unit is denoted as " ". See yadg.dgutils.pintutils for more info.

Uncertainties

In many cases it is possible to define more than one uncertainty: for example, accuracy, precision, and instrument resolution may all be available. The convention in yadg is that when both a measure of within-measurement uncertainty (resolution) and a cross-measurement error (accuracy) are available, "s" corresponds to the instrumental resolution associated with each datapoint. The accuracy of the measurement, which is normally a higher value than the resolution, should be noted in the step metadata.

Unless more information is available, when converting str data to float, the uncertainty is determined from the last significant digit specified in the str. This conversion uses functionality from the uncertainties package.
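The idea of taking the uncertainty from the last digit of a string can be sketched with the standard library alone. This is not yadg's implementation, which relies on the uncertainties package; the helper name `str_to_value` is hypothetical, and the sketch assumes the uncertainty is one unit in the last digit written.

```python
from decimal import Decimal
from typing import Tuple


def str_to_value(text: str) -> Tuple[float, float]:
    """Return (nominal, uncertainty) for a finite decimal string.

    Assumption: the uncertainty is one unit in the last digit written.
    Non-finite inputs such as "nan" or "inf" are not handled here.
    """
    dec = Decimal(text)
    # The exponent of the last stored digit, e.g. -3 for "1.234".
    exponent = dec.as_tuple().exponent
    uncertainty = Decimal(1).scaleb(exponent)
    return float(dec), float(uncertainty)


print(str_to_value("1.234"))  # (1.234, 0.001)
print(str_to_value("5.0"))    # (5.0, 0.1)
print(str_to_value("12"))     # (12.0, 1.0)
```

Note that trailing zeros matter: "5.0" and "5.00" carry different implied uncertainties, which is why the conversion has to start from the string rather than from an already parsed float.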

When derived data is generated by yadg, error propagation is handled using the linear error propagation functionality as implemented in the uncertainties package.
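To show what linear error propagation means in practice, here is a minimal standard-library sketch for the product of two values. It assumes the two inputs are uncorrelated, and the function name `propagate_product` is hypothetical. In yadg itself the propagation is delegated to the uncertainties package rather than written out by hand.

```python
import math
from typing import Tuple


def propagate_product(a: float, s_a: float, b: float, s_b: float) -> Tuple[float, float]:
    """Return (nominal, uncertainty) of a * b to first order.

    Assumption: a and b are uncorrelated, so the contributions
    (df/da * s_a) and (df/db * s_b) add in quadrature.
    """
    nominal = a * b
    # For f = a * b, the partial derivatives are df/da = b and df/db = a.
    sigma = math.hypot(b * s_a, a * s_b)
    return nominal, sigma


# Example: a voltage of 4.0 +/- 0.5 V and a current of 6.0 +/- 1.0 A.
power, s_power = propagate_product(4.0, 0.5, 6.0, 1.0)
print(power, s_power)  # 24.0 5.0
```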

Timestamping

Another key feature in yadg is the timestamping of all datapoints. The Unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) is used, as it is the natural timestamp format in Python, and because it counts seconds it can be easily converted to minutes or hours.
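A short standard-library example of working with Unix timestamps follows. The two datetimes are made-up sample values, and they are given explicitly in UTC so the result does not depend on the local timezone.

```python
from datetime import datetime, timezone

# Two sample datapoints, 90 minutes apart, explicitly in UTC.
start = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 1, 1, 13, 30, 0, tzinfo=timezone.utc)

t0 = start.timestamp()  # seconds since 1970-01-01 00:00:00 UTC
t1 = end.timestamp()

elapsed_s = t1 - t0
elapsed_min = elapsed_s / 60
elapsed_h = elapsed_s / 3600

print(t0)           # 1609502400.0
print(elapsed_min)  # 90.0
print(elapsed_h)    # 1.5
```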

Most of the supported file formats contain a timestamp of some kind. However, several file formats may not define both the date and the time of each datapoint, or may define neither. For such cases, yadg includes a powerful “external date” interface; see yadg.dgutils.dateutils.complete_timestamps().
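The sketch below illustrates the general idea of completing a time-only record with a date supplied from elsewhere. It does not reproduce the signature or behaviour of yadg.dgutils.dateutils.complete_timestamps(); the helper `with_external_date` is hypothetical, and treating the times as UTC is an assumption made only for this example.

```python
from datetime import date, datetime, time, timezone


def with_external_date(time_str: str, external_date: date) -> float:
    """Combine an ISO time of day with an externally supplied date.

    Assumption: the time of day is expressed in UTC.
    Returns a Unix timestamp in seconds.
    """
    clock = time.fromisoformat(time_str)
    moment = datetime.combine(external_date, clock, tzinfo=timezone.utc)
    return moment.timestamp()


# A file that only records "12:00:00", with the date known from elsewhere.
stamp = with_external_date("12:00:00", date(2021, 1, 1))
print(stamp)  # 1609502400.0
```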

Object validation

Additionally, yadg provides dataschema and datagram validation functionality. The validation of dataschema is handled using a Pydantic model implemented in the dgbowl_schemas.yadg_dataschema package, developed in lockstep with yadg. This Pydantic-based validator class should be used to ensure that the incoming dataschema is valid.

The validation of the created datagram is handled by yadg.core.validators. By default, yadg checks that the datagram conforms to the specification. Among other things, the validator ensures that provenance data is included for every operation, and that uncertainties and units are specified for each measurement.
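The kind of per-measurement check described above could look roughly like the following. This is a hypothetical, standard-library sketch and not the code in yadg.core.validators; the function name `is_valid_value` and the exact rules it applies are assumptions chosen for illustration.

```python
REQUIRED_KEYS = {"n", "s", "u"}


def is_valid_value(entry: object) -> bool:
    """Check that an entry carries a nominal value, an uncertainty and a unit.

    Assumption: "n" and "s" must be floats and "u" must be a string.
    """
    return (
        isinstance(entry, dict)
        and REQUIRED_KEYS <= set(entry)
        and isinstance(entry["n"], float)
        and isinstance(entry["s"], float)
        and isinstance(entry["u"], str)
    )


print(is_valid_value({"n": 1.5, "s": 0.1, "u": "V"}))  # True
print(is_valid_value({"n": 1.5, "u": "V"}))            # False
```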