dgpost features
Note
For an overview of the data-processing features within dgpost, see the documentation of the dgpost.transform module.
Pandas compatibility
One of the design goals of dgpost was to develop a library that can be used with datagrams, the pd.DataFrames created by dgpost, as well as with any other pd.DataFrames, created e.g. by parsing an xlsx or csv file.
This is achieved by placing a few necessary requirements on the functions in the dgpost.transform module. The key requirements are that the function must process pint.Quantity objects, and that it must return data in a dict[str, pint.Quantity] format.
If these requirements are met, the decorator function load_data() can be used to either extract data from the supplied pd.DataFrame, or wrap directly supplied data into pint.Quantity objects, and pass those to the called transform function transparently to the user.
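As a rough illustration, a transform-style function that satisfies these requirements might look as follows; the function name and the quantities involved are invented for this sketch, and the actual functions in dgpost.transform are additionally wrapped with load_data() so that they can consume DataFrame columns directly:

    import pint

    ureg = pint.UnitRegistry()

    def molar_flow(flow: pint.Quantity, conc: pint.Quantity) -> dict[str, pint.Quantity]:
        # Both inputs are pint.Quantity objects; multiplication propagates
        # the units automatically.
        n = (flow * conc).to("mol/s")
        # Results are returned as a dict of quantities, keyed by the name
        # of the output column.
        return {"molar_flow": n}

    # Called with explicitly supplied quantities, i.e. without a DataFrame:
    out = molar_flow(ureg.Quantity(10.0, "ml/min"), ureg.Quantity(0.1, "mol/l"))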
Units and uncertainties
Another key objective of dgpost is to allow and encourage annotating data with units as well as error estimates / uncertainties. The design philosophy here is that by building unit- and uncertainty-awareness into the toolchain, users will be encouraged to use it and, in the case of uncertainties, be more thoughtful about the limitations of their data.
As discussed in the documentation of yadg, when experimental data is loaded from datagrams, it is annotated with units by default. In dgpost, the units for the data in each column of each table are stored as a dict[str, str] in the "units" key of the df.attrs attribute, and they are extracted and exported appropriately when the table is saved. If the df.attrs attribute does not contain a "units" entry, dgpost assumes the underlying data is unitless, and the default units chosen by the developers for each function in the dgpost.transform library are applied to the data.
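For a table that was not created by dgpost, the same annotation can be added by hand. A minimal sketch, with placeholder column names and units:

    import pandas as pd

    df = pd.DataFrame({"flow": [1.2, 1.5, 1.4], "T": [298.0, 300.0, 305.0]})
    # Annotate each column with its unit; without this entry, dgpost would
    # treat the data as unitless and apply the functions' default units.
    df.attrs["units"] = {"flow": "ml/min", "T": "K"}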
Internally, all units are handled using yadg’s custom pint.UnitRegistry, via the pint library.
Uncertainties are handled using the linear uncertainty propagation library, uncertainties. As the input data for the functions in the dgpost.transform module is passed using pint.Quantity objects, which support uncertainties.unumpy arrays, uncertainty handling is generally transparent to both the user and the developer. The notable exceptions here are transformations using fitting functions from the scipy library, where arrays containing floats are expected; this has to be handled explicitly by the developer.
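A minimal sketch of such unit- and uncertainty-annotated data, and of extracting the plain-float nominal values when a library such as scipy requires them; the values shown are invented:

    import pint
    from uncertainties import unumpy as unp

    ureg = pint.UnitRegistry()

    # A quantity wrapping a unumpy array: each element carries a nominal
    # value and a standard deviation.
    conc = ureg.Quantity(unp.uarray([0.10, 0.20, 0.31], [0.01, 0.01, 0.02]), "mol/l")

    # Arithmetic propagates both the units and the uncertainties.
    diluted = conc / 2

    # Fitting functions expecting plain floats need the nominal values
    # (and uncertainties) extracted explicitly by the developer.
    nominal = unp.nominal_values(diluted.magnitude)
    stddevs = unp.std_devs(diluted.magnitude)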
When saving tables created in dgpost, the units are appended to the column names (csv/xlsx) or stored in the table (pkl/json), while the uncertainties may optionally be dropped from the exported table; see dgpost.utils.save.
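As a sketch, saving a table from Python might look as follows; both the import path and the keyword argument shown for dropping uncertainties are assumptions here, and should be checked against the documentation of dgpost.utils.save:

    import dgpost.utils

    # Export to pickle: units and metadata remain stored in df.attrs.
    # The keyword controlling whether uncertainties are kept is assumed
    # to be called "sigma"; see dgpost.utils.save for the exact signature.
    dgpost.utils.save(df, "table.pkl", sigma=False)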
Provenance
Provenance tracking is implemented in dgpost using the "meta" entry of the df.attrs attribute of the created pd.DataFrame. This entry is exported when the pd.DataFrame is saved as pkl/json, and contains dgpost version information as well as a copy of the recipe used to create the saved object.
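Because the pkl export preserves the full DataFrame object, the provenance metadata can be inspected after re-loading the table; a minimal sketch:

    import pandas as pd

    df = pd.read_pickle("table.pkl")
    # The "meta" entry contains dgpost version information and a copy of
    # the recipe used to create the table; "units" holds the column units.
    print(df.attrs["meta"])
    print(df.attrs["units"])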