helpers: helper functions for the transform package
Code author: Peter Kraus, Ueli Sauter
- dgpost.utils.helpers.element_from_formula(f, el)
Given a chemical formula f, returns the number of atoms of element el in that formula.
- Return type
int
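The counting described above can be illustrated with a minimal standard-library sketch. The function count_element is a hypothetical stand-in, not the dgpost implementation: it handles only flat formulas without parentheses or charges, and returning 0 for an absent element is an assumption.

```python
import re


def count_element(formula: str, element: str) -> int:
    """Count atoms of `element` in a flat formula such as "C2H5OH".

    Hypothetical helper for illustration; not the dgpost implementation.
    """
    total = 0
    # Each match is an element symbol followed by an optional integer count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol == element:
            # A missing count means a single atom.
            total += int(count) if count else 1
    return total


ethanol = "C2H5OH"
hydrogens = count_element(ethanol, "H")
carbons = count_element(ethanol, "C")
```

Repeated symbols, such as the two hydrogen groups in C2H5OH, are summed.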
- dgpost.utils.helpers.default_element(f)
Given a formula f, return the default element for calculating conversion. The priority list is ["C", "O", "H"].
- Return type
str
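The priority lookup can be sketched as follows. The name pick_default_element is hypothetical, and the fallback for formulas containing none of C, O or H (here, the first element in the formula) is an assumption, since the text above does not say what happens in that case.

```python
import re

PRIORITY = ["C", "O", "H"]  # priority list quoted in the description above


def pick_default_element(formula: str) -> str:
    """Return the highest-priority element present in a flat formula.

    Hypothetical helper for illustration; not the dgpost implementation.
    """
    # Element symbols are one uppercase letter plus an optional lowercase one.
    symbols = re.findall(r"[A-Z][a-z]?", formula)
    for candidate in PRIORITY:
        if candidate in symbols:
            return candidate
    # Assumption: fall back to the first element found in the formula.
    return symbols[0]


propane_default = pick_default_element("C3H8")
water_default = pick_default_element("H2O")
```

Matching whole symbols rather than characters keeps "Cl" from being mistaken for carbon.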
- dgpost.utils.helpers.name_to_chem(name)
- Return type
str
- dgpost.utils.helpers.columns_to_smiles(**kwargs)
Creates a dictionary with a SMILES representation of all chemicals present among the keys in the kwargs, storing the returned chemicals.ChemicalMetadata as well as the full name within args.
- Parameters
kwargs (dict[str, dict[str, Any]]) – A dict containing dict[str, Any] values. The str keys of the inner dicts are parsed to SMILES.
- Returns
smiles – A new dict[str, dict] containing the SMILES of all prefixed chemicals as str keys, and the metadata and column specification as the dict values.
- Return type
dict
- dgpost.utils.helpers.electrons_from_smiles(smiles, ions=None)
- Return type
float
- dgpost.utils.helpers.pQ(df, col, unit=None)
Unit-aware dataframe accessor function.
Given a dataframe in df and a column name in col, the function looks through the units stored in df.attrs["units"] and returns a unit-annotated pint.Quantity containing the column data. Alternatively, the data in df[col] can be annotated by the provided unit.
Note
If df.attrs has no units, or col is not in df.attrs["units"], the returned pint.Quantity is dimensionless.
- Parameters
df (DataFrame) – A pd.DataFrame, optionally annotated with units in df.attrs.
col (Union[str, tuple[str]]) – The str name of the column to be loaded from the df.
unit (Optional[str]) – Optional override for units.
- Returns
Quantity – Unit-aware pint.Quantity object containing the data from df[col].
- Return type
pint.Quantity
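The unit lookup described above can be sketched with plain dictionaries, since df.attrs is itself a dict. The function resolve_unit is hypothetical and covers only the choice of unit for a flat str column name; wrapping the column data in a pint.Quantity is left out. That an explicit unit takes precedence over the stored one is an assumption based on the word "override".

```python
from typing import Optional


def resolve_unit(attrs: dict, col: str, unit: Optional[str] = None) -> Optional[str]:
    """Pick the unit for column `col`; None stands for dimensionless.

    Hypothetical helper for illustration; not the dgpost implementation.
    """
    # Assumption: an explicitly provided unit overrides the stored one.
    if unit is not None:
        return unit
    # A missing "units" entry, or a column absent from it, yields None.
    return attrs.get("units", {}).get(col)


attrs = {"units": {"T": "kelvin"}}  # same layout as df.attrs["units"]
stored = resolve_unit(attrs, "T")
missing = resolve_unit(attrs, "p")
overridden = resolve_unit(attrs, "T", unit="degC")
```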
- dgpost.utils.helpers.separate_data(data, unit=None)
Separates the data into values, errors and units.
- Parameters
data (Quantity) – A pint.Quantity object containing the data points. Can be either float or uc.ufloat.
unit (Optional[str]) – When specified, converts the data to this unit.
- Returns
Converted nominal values and errors, and the original unit of the data.
- Return type
(values, errors, old_unit)
- dgpost.utils.helpers.load_data(*cols)
Decorator factory for data loading.
Creates a decorator that will load the columns specified in cols and calls the wrapped function func as appropriate. The func has to accept pint.Quantity objects, return a dict[str, pint.Quantity], and handle an optional parameter "output" which prefixes (or assigns) the output data in the returned dict appropriately.
The argument of the decorator is a list[tuple], with each element being a tuple[str, str, type]. The first field in this tuple is the str name of the argument of the decorated func, the second str field denotes the default units for that argument (or None for a unitless quantity), and the type field allows the use of the decorator with functions that expect a list of points in the argument (such as trace-processing functions) or a dict of pint.Quantity objects (such as functions operating on chemical compositions).
The decorator handles the following cases:
  - the decorated func is launched directly, either with kwargs or with a mixture of args and kwargs:
    - the args are assigned into kwargs using their position in the args and cols array as provided to the decorator
    - all elements in kwargs that match the argument names in the cols list provided to the decorator are converted to pint.Quantity objects, assigning the default units using the data from the cols list, unless they are a pint.Quantity already.
  - the decorated func is launched with a pd.DataFrame as the args and other parameters in kwargs:
    - the data for the arguments listed in cols is sourced from the columns of the pd.DataFrame, using the provided str arguments to find the appropriate columns
    - if pd.Index is provided as the data type, and no column name is provided by the user, the index of the pd.DataFrame is passed into the called function
    - data from unit-aware pd.DataFrame objects is loaded using the pQ() accessor accordingly
    - data from unit-naive pd.DataFrame objects is coerced into pint.Quantity objects using the default units as specified in the cols list
- Parameters
cols (tuple[str, str, type]) – A list[tuple[str, str, type]] containing the column names used to call the func.
- Returns
loading – A wrapped version of the decorated func.
- Return type
Callable
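The direct-call case described above can be illustrated with a simplified, standard-library-only decorator factory. Everything here is a hypothetical sketch rather than the dgpost code: Q is a namedtuple standing in for pint.Quantity, load_data_sketch ignores the type field, and the pd.DataFrame case is not covered.

```python
from collections import namedtuple
from functools import wraps

# Hypothetical stand-in for pint.Quantity, so the sketch has no dependencies.
Q = namedtuple("Q", ["magnitude", "units"])


def load_data_sketch(*cols):
    """Simplified decorator factory; handles direct calls only."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Positional arguments are assigned into kwargs by their position
            # in cols.
            for (name, _unit, _type), value in zip(cols, args):
                kwargs[name] = value
            # Arguments named in cols receive their default units, unless
            # they are annotated already.
            for name, unit, _type in cols:
                if name in kwargs and not isinstance(kwargs[name], Q):
                    kwargs[name] = Q(kwargs[name], unit)
            return func(**kwargs)

        return wrapper

    return decorator


@load_data_sketch(("voltage", "V", float), ("current", "A", float))
def power(voltage, current, output="P"):
    # The wrapped function sees annotated values and returns a dict whose
    # key is set by the optional "output" parameter.
    return {output: Q(voltage.magnitude * current.magnitude, "W")}


result = power(2.0, current=Q(3.0, "A"))
renamed = power(voltage=1.0, current=4.0, output="P_el")
```

In the first call the bare 2.0 is matched to voltage by position and given the default unit "V", while current is passed through unchanged because it is already annotated.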
- dgpost.utils.helpers.combine_tables(a, b)
Combine two pd.DataFrames into a new pd.DataFrame.
Assumes the pd.DataFrames contain a pd.MultiIndex. Automatically pads the pd.MultiIndex to match the higher number of levels, if necessary. Merges units.
- Return type
DataFrame
- dgpost.utils.helpers.arrow_to_multiindex(df, warn=True)
Convert the provided pd.DataFrame to a dgpost-compatible format:
  - converts tables with pd.Index into pd.MultiIndex,
  - converts ->-separated namespaces into pd.MultiIndex,
  - processes units into nested dicts.
- Return type
DataFrame
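The namespace handling listed above can be sketched without pandas. Both split_namespace and nest_units are hypothetical helpers, and the exact layout of the nested units dict is an assumption; the resulting tuples are the kind of keys a pd.MultiIndex would be built from.

```python
def split_namespace(name: str, sep: str = "->") -> tuple:
    """Split an arrow-separated column name into a tuple of levels."""
    return tuple(name.split(sep))


def nest_units(flat_units: dict, sep: str = "->") -> dict:
    """Turn {"a->b": unit} into {"a": {"b": unit}}.

    Hypothetical helper for illustration; not the dgpost implementation.
    """
    nested = {}
    for name, unit in flat_units.items():
        *parents, leaf = split_namespace(name, sep)
        node = nested
        # Walk (and create) one dict level per namespace component.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = unit
    return nested


columns = ["T", "xin->C3H8", "xin->O2"]
tuples = [split_namespace(c) for c in columns]
units = nest_units({"T": "K", "xin->C3H8": "mol/mol", "xin->O2": "mol/mol"})
```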
- dgpost.utils.helpers.keys_in_df(key, df)
Find all columns in the provided pd.DataFrame that match key.
Returns a set of all columns in the df which are matched by key. Assumes the provided pd.DataFrame contains a pd.MultiIndex.
- Return type
set[tuple]
- dgpost.utils.helpers.key_to_tuple(key)
Convert a provided key to a tuple for use with pd.DataFrames containing a pd.MultiIndex.
- Return type
tuple
- dgpost.utils.helpers.get_units(key, df)
Given a key corresponding to a column in the df, return the units. The provided key can be either a str, for a df with pd.Index, or any other Sequence, for a df with pd.MultiIndex.
- Return type
Optional[str]
- dgpost.utils.helpers.set_units(key, unit, target)
Set the units of key to unit in the target object, which can be either a dict or a pd.DataFrame. See also get_units().
- Return type
None