helpers: helper functions for the transform package
Code author: Peter Kraus, Ueli Sauter
- dgpost.utils.helpers.element_from_formula(f: str, el: str) -> int
Given a chemical formula f, returns the number of atoms of element el in that formula.
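A small regular-expression parser shows how this counting rule works. It is a minimal sketch, not dgpost's implementation. The helper `count_atoms` is hypothetical, and it handles only flat formulas such as `"C2H5OH"`. It does not understand parentheses, hydrates or charges.

```python
import re
from collections import Counter

# Element symbol (capital + optional lowercase letter) followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula: str) -> Counter:
    """Hypothetical helper: tally atoms in a flat formula like "C2H5OH"."""
    counts: Counter = Counter()
    for symbol, number in _TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def element_from_formula(f: str, el: str) -> int:
    # A Counter returns 0 for elements that do not occur in the formula.
    return count_atoms(f)[el]


print(element_from_formula("C2H5OH", "H"))  # 6
```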
- dgpost.utils.helpers.default_element(f: str) -> str
Given a formula f, return the default element for calculating conversion. The priority list is ["C", "O", "H"].
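The sketch below applies the priority list by returning the first listed element that occurs in the formula. It is an illustration only. Returning `None` when none of the three elements is present is an assumption, and dgpost may behave differently in that case.

```python
import re
from typing import Optional

_SYMBOL = re.compile(r"[A-Z][a-z]?")  # element symbols only; counts are ignored
PRIORITY = ["C", "O", "H"]


def default_element(f: str) -> Optional[str]:
    present = set(_SYMBOL.findall(f))
    for el in PRIORITY:
        if el in present:
            return el
    # Assumption: no priority element found -> None.
    return None
```

The sketch matches whole element symbols, so cobalt (`Co`) is not mistaken for carbon.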
- dgpost.utils.helpers.name_to_chem(name: str) -> str
- dgpost.utils.helpers.columns_to_smiles(**kwargs: dict[str, dict[str, Any]]) -> dict
Creates a dictionary with a SMILES representation of all chemicals present among the keys in the kwargs. For each chemical, it stores the returned chemicals.ChemicalMetadata as well as the full name within kwargs.
- Parameters:
kwargs – A dict containing dict[str, Any] values. The str keys of the inner dicts are parsed to SMILES.
- Returns:
smiles – A new dict[str, dict] containing the SMILES of all prefixed chemicals as str keys, and the metadata and column specification as the dict values.
- Return type:
dict
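The sketch below shows one way to picture the returned structure. It groups the inner keys of every keyword argument by SMILES, so `"CH4"` and `"methane"` end up under the same entry. The `LOOKUP` table, the `"names"`/`"columns"` value layout and the name `columns_to_smiles_sketch` are assumptions made for illustration. The real function resolves names through the `chemicals` package and stores `chemicals.ChemicalMetadata`.

```python
from typing import Any, Dict

# Hypothetical name -> SMILES table standing in for the `chemicals` lookup.
LOOKUP = {"CH4": "C", "methane": "C", "H2O": "O", "water": "O", "CO2": "O=C=O"}


def columns_to_smiles_sketch(**kwargs: Dict[str, Any]) -> Dict[str, dict]:
    smiles: Dict[str, dict] = {}
    for arg, inner in kwargs.items():
        for name, value in inner.items():
            key = LOOKUP[name]  # unknown names raise KeyError in this sketch
            entry = smiles.setdefault(key, {"names": set(), "columns": {}})
            entry["names"].add(name)
            entry["columns"][arg] = value
    return smiles


out = columns_to_smiles_sketch(xin={"CH4": 1.0, "H2O": 2.0}, xout={"methane": 0.5})
print(sorted(out))  # ['C', 'O']
```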
- dgpost.utils.helpers.electrons_from_smiles(smiles: str, ions: dict | None = None) -> float
- dgpost.utils.helpers.pQ(df: DataFrame, col: str | tuple[str], unit: str | None = None) -> Quantity
Unit-aware dataframe accessor function.
Given a dataframe in df and a column name in col, the function looks through the units stored in df.attrs["units"] and returns a unit-annotated ureg.Quantity containing the column data. Alternatively, the data in df[col] can be annotated by the provided unit.
Note
If df.attrs has no units, or col is not in df.attrs["units"], the returned ureg.Quantity is dimensionless.
- Parameters:
df – A pd.DataFrame, optionally annotated with units in df.attrs.
col – The name of the column to be loaded from the df, either a str or a tuple[str].
unit – Optional override for units.
- Returns:
Quantity – Unit-aware pint.Quantity object containing the data from df[col].
- Return type:
ureg.Quantity
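The lookup order described above can be sketched without pandas or pint. An explicit `unit` comes first, then `df.attrs["units"]`, and otherwise the result is dimensionless. Here `columns` and `attrs` are plain dicts standing in for the DataFrame and its `attrs`. `Annotated` and `pq_sketch` are hypothetical stand-ins for `ureg.Quantity` and `pQ()`, and only flat (non-MultiIndex) column names are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotated:
    """Stand-in for ureg.Quantity: raw values plus a unit (None = dimensionless)."""
    values: List[float]
    unit: Optional[str]


def pq_sketch(columns: Dict[str, List[float]], attrs: dict, col: str,
              unit: Optional[str] = None) -> Annotated:
    # An explicit unit overrides whatever attrs["units"] says.
    if unit is None:
        # A missing "units" entry or a missing column both fall back to dimensionless.
        unit = attrs.get("units", {}).get(col)
    return Annotated(list(columns[col]), unit)


cols = {"T": [300.0, 310.0], "p": [1.0, 2.0]}
attrs = {"units": {"T": "K"}}
print(pq_sketch(cols, attrs, "T").unit)  # K
```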
- dgpost.utils.helpers.separate_data(data: Quantity, unit: str | None = None) -> tuple[ndarray, ndarray, str]
Separates the data into values, errors and units.
- Parameters:
data – A ureg.Quantity object containing the data points, which can be either float or uc.ufloat.
unit – When specified, converts the data to this unit.
- Returns:
Converted nominal values and errors, and the original unit of the data.
- Return type:
(values, errors, old_unit)
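The sketch below performs the same split without external dependencies. Because it avoids pint and uncertainties, it takes each point as a plain float or a `(nominal, std_dev)` tuple, and it takes the current unit as a separate `old_unit` argument. That signature, the name `separate_data_sketch`, the `FACTORS` table standing in for real unit conversion, and the zero error assigned to plain floats are all assumptions for illustration.

```python
from typing import List, Optional, Tuple, Union

Point = Union[float, Tuple[float, float]]  # plain value, or (nominal, std_dev)

# Hypothetical conversion factors standing in for pint's unit registry.
FACTORS = {("ml/min", "l/min"): 1e-3, ("l/min", "ml/min"): 1e3}


def separate_data_sketch(points: List[Point], old_unit: str, unit: Optional[str] = None
                         ) -> Tuple[List[float], List[float], str]:
    factor = 1.0 if unit is None or unit == old_unit else FACTORS[(old_unit, unit)]
    values: List[float] = []
    errors: List[float] = []
    for p in points:
        # Assumption: plain floats carry zero uncertainty.
        nominal, std = p if isinstance(p, tuple) else (p, 0.0)
        values.append(nominal * factor)
        errors.append(std * factor)
    # The original unit is returned, matching the documented return value.
    return values, errors, old_unit


vals, errs, old = separate_data_sketch([(10.0, 0.5), 20.0], "ml/min", "l/min")
```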
- dgpost.utils.helpers.load_data(*cols: tuple[str, str, type])
Decorator factory for data loading.
Creates a decorator that loads the columns specified in cols and calls the wrapped function func as appropriate. The func has to accept ureg.Quantity objects and return a dict[str, ureg.Quantity]. It must also handle an optional parameter "output", which prefixes (or assigns) the output data in the returned dict appropriately.
The argument of the decorator is a list[tuple], with each element being a tuple[str, str, type]. The first field in this tuple is the str name of the argument of the decorated func. The second str field denotes the default units for that argument (or None for a unitless quantity). The type field allows the use of the decorator with functions that expect either a list of points in the argument (such as trace-processing functions) or a dict of ureg.Quantity objects (such as functions operating on chemical compositions).
The decorator handles the following cases:
- the decorated func is launched directly, either with kwargs or with a mixture of args and kwargs:
  - the args are assigned into kwargs using their position in the args and cols array as provided to the decorator;
  - all elements in kwargs that match the argument names in the cols list provided to the decorator are converted to ureg.Quantity objects, unless they are a ureg.Quantity already. The default units are taken from the cols list.
- the decorated func is launched with a pd.DataFrame in args and the other parameters in kwargs:
  - the data for the arguments listed in cols is sourced from the columns of the pd.DataFrame, using the provided str arguments to find the appropriate columns;
  - if pd.Index is provided as the data type, and no column name is provided by the user, the index of the pd.DataFrame is passed into the called function;
  - data from unit-aware pd.DataFrame objects is loaded using the pQ() accessor;
  - data from unit-naive pd.DataFrame objects is coerced into ureg.Quantity objects using the default units as specified in the cols list.
- Parameters:
cols – A list[tuple[str, str, type]] containing the column names used to call the func.
- Returns:
loading – A wrapped version of the decorated func.
- Return type:
Callable
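The two dispatch paths listed above can be sketched with the standard library alone. `Q` and `Table` are hypothetical stand-ins for `ureg.Quantity` and a `pd.DataFrame` with `attrs["units"]`. The sketch does not model the `pd.Index` case and ignores the third (`type`) field of each `cols` tuple. It illustrates the argument handling and is not dgpost's code.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Q:
    """Stand-in for ureg.Quantity: a magnitude plus a unit (None = dimensionless)."""
    m: Any
    unit: Optional[str]


@dataclass
class Table:
    """Stand-in for pd.DataFrame: named columns plus an attrs["units"]-like dict."""
    columns: dict
    units: dict = field(default_factory=dict)


def load_data(*cols: tuple) -> Callable:
    def loading(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if args and isinstance(args[0], Table):
                # Table case: kwargs name the column to load for each argument.
                table = args[0]
                for name, default_unit, _ in cols:
                    if name in kwargs:
                        column = kwargs[name]
                        # Unit-aware tables use their own units (pQ-like lookup);
                        # unit-naive tables fall back to the default units.
                        unit = table.units.get(column) if table.units else default_unit
                        kwargs[name] = Q(table.columns[column], unit)
            else:
                # Direct case: positional args are mapped onto names by position in cols.
                for (name, _, _), value in zip(cols, args):
                    kwargs[name] = value
                for name, default_unit, _ in cols:
                    if name in kwargs and not isinstance(kwargs[name], Q):
                        kwargs[name] = Q(kwargs[name], default_unit)
            return func(**kwargs)
        return wrapper
    return loading


@load_data(("T", "K", None), ("p", "bar", None))
def state(T: Q, p: Q, output: Optional[str] = None) -> dict:
    prefix = output or "state"
    return {f"{prefix}->T": T, f"{prefix}->p": p}
```

In the table case, the sketch passes only the loaded arguments to the wrapped function and returns that function's `dict` unchanged.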
- dgpost.utils.helpers.combine_tables(a: DataFrame, b: DataFrame) -> DataFrame
Combine two pd.DataFrames into a new pd.DataFrame.
Assumes the pd.DataFrames contain a pd.MultiIndex. Automatically pads the pd.MultiIndex to match the higher number of levels, if necessary. Merges units.
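The column-level bookkeeping consists of padding the keys to the deeper index and then merging the nested unit dicts. It can be sketched on plain tuples and dicts. The helpers below are hypothetical. The empty-string padding value and the rule that `b` wins on unit conflicts are assumptions, not documented behaviour.

```python
from typing import List, Tuple

Key = Tuple[str, ...]


def pad_keys(keys: List[Key], depth: int) -> List[Key]:
    # Assumption: shorter column keys are right-padded with empty strings.
    return [k + ("",) * (depth - len(k)) for k in keys]


def merge_units(a: dict, b: dict) -> dict:
    # Recursively merge nested unit dicts; on conflict, b wins (assumption).
    out = dict(a)
    for k, v in b.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = merge_units(out[k], v)
        else:
            out[k] = v
    return out


def combine_keys(a: List[Key], b: List[Key]) -> List[Key]:
    depth = max((len(k) for k in a + b), default=0)
    combined: List[Key] = []
    for k in pad_keys(a, depth) + pad_keys(b, depth):
        if k not in combined:  # keep the first occurrence, preserving order
            combined.append(k)
    return combined
```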
- dgpost.utils.helpers.arrow_to_multiindex(df: DataFrame, warn: bool = True) -> DataFrame
Convert the provided pd.DataFrame to a dgpost-compatible format:
  - converts tables with pd.Index into pd.MultiIndex,
  - converts "->"-separated namespaces into pd.MultiIndex,
  - processes units into nested dicts.
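The first two conversions amount to splitting each column name on `"->"`. The last one nests the unit dict along the same path. The minimal sketch below works on plain Python containers, and the helper names `arrow_keys` and `nest_units` are hypothetical. It leaves out the `warn` flag and the actual `pd.MultiIndex` construction, and the empty-string padding is an assumption.

```python
from typing import Dict, List, Tuple


def arrow_keys(names: List[str]) -> List[Tuple[str, ...]]:
    # Split "a->b" namespaces into tuples and pad them to a common depth
    # (assumption: empty strings are used as padding).
    parts = [tuple(n.split("->")) for n in names]
    depth = max((len(p) for p in parts), default=0)
    return [p + ("",) * (depth - len(p)) for p in parts]


def nest_units(flat: Dict[str, str]) -> dict:
    # Turn {"flow->in": "ml/min"} into {"flow": {"in": "ml/min"}}.
    nested: dict = {}
    for name, unit in flat.items():
        *path, leaf = name.split("->")
        node = nested
        for part in path:
            node = node.setdefault(part, {})
        node[leaf] = unit
    return nested
```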
- dgpost.utils.helpers.keys_in_df(key: str | tuple, df: DataFrame) -> set[tuple]
Find all columns in the provided pd.DataFrame that match key.
Returns a set of all columns in the df which are matched by key. Assumes the provided pd.DataFrame contains a pd.MultiIndex.
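If "match" is read as prefix matching on the column tuples, the lookup reduces to a set comprehension. The prefix rule and the `"->"` splitting of `str` keys are assumptions. `keys_in_columns` is a hypothetical name, and it takes the column tuples directly instead of a `pd.DataFrame`.

```python
from typing import Iterable, Set, Tuple, Union


def keys_in_columns(key: Union[str, tuple], columns: Iterable[Tuple[str, ...]]) -> Set[tuple]:
    # Assumption: a str key is split on "->" and matched as a prefix of each column tuple.
    k = tuple(key.split("->")) if isinstance(key, str) else tuple(key)
    return {c for c in columns if c[:len(k)] == k}


cols = [("T", ""), ("flow", "in"), ("flow", "out")]
```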
- dgpost.utils.helpers.key_to_tuple(key: str | tuple) -> tuple
Convert a provided key to a tuple for use with pd.DataFrames containing a pd.MultiIndex.
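A plausible minimal version of this conversion is sketched below. It splits `str` keys on the `"->"` namespace separator used by `arrow_to_multiindex()`. The splitting rule and the `TypeError` for other key types are assumptions, not taken from dgpost's source.

```python
from typing import Tuple, Union


def key_to_tuple(key: Union[str, tuple]) -> Tuple:
    # Assumption: str keys use the "->" namespace separator.
    if isinstance(key, str):
        return tuple(key.split("->"))
    if isinstance(key, tuple):
        return key
    raise TypeError(f"unsupported key type: {type(key).__name__}")
```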
- dgpost.utils.helpers.get_units(key: str | Sequence, df: DataFrame) -> str | None
Given a key corresponding to a column in the df, return the units. The provided key can be either a str for a df with pd.Index, or any other Sequence for a df with pd.MultiIndex.
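Assume `df.attrs["units"]` has the nested layout produced by `arrow_to_multiindex()`. The lookup is then a walk down the dict along the key. `get_units_sketch` is a hypothetical name and takes that units dict directly. Skipping empty-string padding levels is an assumption, and returning `None` for missing keys mirrors the `str | None` return type.

```python
from typing import Optional, Sequence, Union


def get_units_sketch(key: Union[str, Sequence[str]], units: dict) -> Optional[str]:
    # A str addresses a flat pd.Index; any other Sequence walks the nested
    # per-level dicts of a pd.MultiIndex. Empty-string padding is skipped (assumption).
    path = [key] if isinstance(key, str) else [part for part in key if part != ""]
    node: object = units
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return None  # no units recorded for this key
        node = node[part]
    return node if isinstance(node, str) else None
```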
- dgpost.utils.helpers.set_units(key: str | Sequence, unit: str | None, target: dict | DataFrame) -> None
Set the units of key to unit in the target object, which can be either a dict or a pd.DataFrame. See also get_units().
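The sketch below covers the `dict` target case. It creates any missing intermediate levels and stores `unit` at the leaf. `set_units_sketch` is hypothetical. It stores `None` as-is, skips empty-string padding levels (an assumption), and does not show updating a `pd.DataFrame`.

```python
from typing import Optional, Sequence, Union


def set_units_sketch(key: Union[str, Sequence[str]], unit: Optional[str], units: dict) -> None:
    # Mirror image of the lookup: create intermediate levels as needed, then
    # store the unit at the leaf. Empty-string padding is skipped (assumption).
    path = [key] if isinstance(key, str) else [part for part in key if part != ""]
    node = units
    for part in path[:-1]:
        node = node.setdefault(part, {})
    node[path[-1]] = unit
```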