dgpost features
Note
For an overview of the data-processing features within dgpost, see the documentation of the dgpost.transform module.
Pandas compatibility
One of the design goals of dgpost was to develop a library that can be used with datagrams, the pd.DataFrames created by dgpost, as well as with any other pd.DataFrames, created e.g. by parsing an xlsx or csv file.
This is achieved by placing a few necessary requirements on the functions in the dgpost.transform module. The key requirements are that the function must process pint.Quantity objects, and that it must return data in a dict[str, pint.Quantity] format.
If these requirements are met, the decorator function load_data() can be used to either extract data from the supplied pd.DataFrame, or wrap directly supplied data into pint.Quantity objects, and pass those to the called transform function transparently to the user.
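As a rough illustration, a transform-style function that satisfies these requirements might look as follows; the function name and the quantities involved are invented for this sketch, and the actual functions in dgpost.transform are additionally wrapped with load_data() so that they can consume DataFrame columns directly:

    import pint

    ureg = pint.UnitRegistry()

    def molar_flow(flow: pint.Quantity, conc: pint.Quantity) -> dict[str, pint.Quantity]:
        # Both inputs are pint.Quantity objects; multiplication propagates
        # the units automatically.
        n = (flow * conc).to("mol/s")
        # Results are returned as a dict of quantities, keyed by the name
        # of the output column.
        return {"molar_flow": n}

    # Called with explicitly supplied quantities, i.e. without a DataFrame:
    out = molar_flow(ureg.Quantity(10.0, "ml/min"), ureg.Quantity(0.1, "mol/l"))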
Units and uncertainties
Another key objective of dgpost is to allow and encourage annotating data with units as well as error estimates / uncertainties. The design philosophy here is that by building unit- and uncertainty-awareness into the toolchain, users will be encouraged to use it and, in the case of uncertainties, be more thoughtful about the limitations of their data.
As discussed in the documentation of yadg, when experimental data is loaded from datagrams, it is annotated with units by default. In dgpost, the units for the data in each column of each table are stored as a dict[str, str] in the "units" key of the df.attrs attribute, and they are extracted and exported appropriately when the table is saved. If the df.attrs attribute does not contain a "units" entry, dgpost assumes the underlying data is unitless, and the default units chosen by the developers for each function in the dgpost.transform library are applied to the data.
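For a table that was not created by dgpost, the same annotation can be added by hand. A minimal sketch, with placeholder column names and units:

    import pandas as pd

    df = pd.DataFrame({"flow": [1.2, 1.5, 1.4], "T": [298.0, 300.0, 305.0]})
    # Annotate each column with its unit; without this entry, dgpost would
    # treat the data as unitless and apply the functions' default units.
    df.attrs["units"] = {"flow": "ml/min", "T": "K"}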
Internally, all units are handled using yadg’s custom pint.UnitRegistry, via the pint library.
Uncertainties are handled using the linear uncertainty propagation library, uncertainties. As the input data for the functions in the dgpost.transform module is passed using pint.Quantity objects, which support uncertainties.unumpy arrays, uncertainty handling is generally transparent to both the user and the developer. The notable exceptions here are transformations using fitting functions from the scipy library, where arrays containing floats are expected; this has to be handled explicitly by the developer.
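A minimal sketch of such unit- and uncertainty-annotated data, and of extracting the plain-float nominal values when a library such as scipy requires them; the values shown are invented:

    import pint
    from uncertainties import unumpy as unp

    ureg = pint.UnitRegistry()

    # A quantity wrapping a unumpy array: each element carries a nominal
    # value and a standard deviation.
    conc = ureg.Quantity(unp.uarray([0.10, 0.20, 0.31], [0.01, 0.01, 0.02]), "mol/l")

    # Arithmetic propagates both the units and the uncertainties.
    diluted = conc / 2

    # Fitting functions expecting plain floats need the nominal values
    # (and uncertainties) extracted explicitly by the developer.
    nominal = unp.nominal_values(diluted.magnitude)
    stddevs = unp.std_devs(diluted.magnitude)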
When saving tables created in dgpost, the units are appended to the column names (csv/xlsx) or stored in the table (pkl/json), while the uncertainties may optionally be dropped from the exported table; see dgpost.utils.save.
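As a sketch, saving a table from Python might look as follows; both the import path and the keyword argument shown for dropping uncertainties are assumptions here, and should be checked against the documentation of dgpost.utils.save:

    import dgpost.utils

    # Export to pickle: units and metadata remain stored in df.attrs.
    # The keyword controlling whether uncertainties are kept is assumed
    # to be called "sigma"; see dgpost.utils.save for the exact signature.
    dgpost.utils.save(df, "table.pkl", sigma=False)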
Provenance
Provenance tracking is implemented in dgpost using the "meta" entry of the df.attrs attribute of the created pd.DataFrame. This entry is exported when the pd.DataFrame is saved as pkl/json, and contains dgpost version information as well as a copy of the recipe used to create the saved object.
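Because the pkl export preserves the full DataFrame object, the provenance metadata can be inspected after re-loading the table; a minimal sketch:

    import pandas as pd

    df = pd.read_pickle("table.pkl")
    # The "meta" entry contains dgpost version information and a copy of
    # the recipe used to create the table; "units" holds the column units.
    print(df.attrs["meta"])
    print(df.attrs["units"])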