helpers: helper functions for the transform package

Code author: Peter Kraus, Ueli Sauter

dgpost.utils.helpers.element_from_formula(f: str, el: str) → int

Given a chemical formula f, returns the number of atoms of element el in that formula.
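The counting described above can be illustrated with a small regular-expression sketch. This is not the dgpost implementation: count_element is a hypothetical name, and the sketch assumes flat formulas without parentheses, charges or hydrates.

```python
import re


# Hypothetical helper, not part of dgpost: counts atoms of `el` in a flat
# formula such as "C2H6O". Parentheses and charges are not handled.
def count_element(f: str, el: str) -> int:
    total = 0
    # Each match is an element symbol (a capital letter plus an optional
    # lowercase letter) followed by an optional integer count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", f):
        if symbol == el:
            total += int(count) if count else 1
    return total


print(count_element("C2H6O", "C"))  # 2
print(count_element("C2H6O", "H"))  # 6
print(count_element("C2H6O", "N"))  # 0
```

Matching two-letter symbols as a unit keeps "Co" (cobalt) from being counted as carbon.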

dgpost.utils.helpers.default_element(f: str) → str

Given a formula f, return the default element for calculating conversion. The priority list is ["C", "O", "H"].
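A minimal sketch of the priority lookup follows. The name pick_default_element is hypothetical, only flat formulas are parsed, and raising a ValueError when none of the priority elements is present is an assumption; the text above does not say how dgpost handles that case.

```python
import re

PRIORITY = ["C", "O", "H"]  # the priority list quoted above


# Hypothetical helper, not part of dgpost.
def pick_default_element(f: str) -> str:
    # Collect the element symbols present in a flat formula.
    present = set(re.findall(r"[A-Z][a-z]?", f))
    for el in PRIORITY:
        if el in present:
            return el
    # Assumption: the fallback behaviour is not documented above.
    raise ValueError(f"no default element found in {f!r}")


print(pick_default_element("CO2"))  # C
print(pick_default_element("H2O"))  # O
print(pick_default_element("NH3"))  # H
```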

dgpost.utils.helpers.name_to_chem(name: str) → str
dgpost.utils.helpers.columns_to_smiles(**kwargs: dict[str, dict[str, Any]]) → dict

Creates a dictionary with a SMILES representation of all chemicals present among the keys in the kwargs, storing the returned chemicals.ChemicalMetadata as well as the full name within args.

Parameters:

kwargs – A dict containing dict[str, Any] values. The str keys of the inner dicts are parsed to SMILES.

Returns:

smiles – A new dict[str, dict] containing the SMILES of all prefixed chemicals as str keys, and the metadata and column specification as the dict values.

Return type:

dict

dgpost.utils.helpers.electrons_from_smiles(smiles: str, ions: dict | None = None) → float
dgpost.utils.helpers.pQ(df: DataFrame, col: str | tuple[str], unit: str | None = None) → Quantity

Unit-aware dataframe accessor function.

Given a dataframe in df and a column name in col, the function looks through the units stored in df.attrs["units"] and returns a unit-annotated ureg.Quantity containing the column data. Alternatively, the data in df[col] can be annotated with the provided unit.

Note

If df.attrs has no units, or col is not in df.attrs["units"], the returned ureg.Quantity is dimensionless.

Parameters:
  • df – A pd.DataFrame, optionally annotated with units in df.attrs.

  • col – The str name of the column to be loaded from the df.

  • unit – Optional override for units.

Returns:

Quantity – Unit-aware pint.Quantity object containing the data from df[col].

Return type:

ureg.Quantity

dgpost.utils.helpers.separate_data(data: Quantity, unit: str | None = None) → tuple[ndarray, ndarray, str]

Separates the data into values, errors and units.

Parameters:
  • data – A ureg.Quantity object containing the data points. Can be either float or uc.ufloat.

  • unit – When specified, converts the data to this unit.

Returns:

Converted nominal values and errors, and the original unit of the data.

Return type:

(values, errors, old_unit)

dgpost.utils.helpers.load_data(*cols: tuple[str, str, type])

Decorator factory for data loading.

Creates a decorator that will load the columns specified in cols and call the wrapped function func as appropriate. The func has to accept ureg.Quantity objects, return a dict[str, ureg.Quantity], and handle an optional parameter "output" which prefixes (or assigns) the output data in the returned dict appropriately.

The argument of the decorator is a list[tuple], with each element being a tuple[str, str, type]. The first field in this tuple is the str name of the argument of the decorated func, the second str field denotes the default units for that argument (or None for a unitless quantity), and the type field allows the use of the decorator with functions that expect a list of points in the argument (such as trace-processing functions) or a dict of ureg.Quantity objects (such as functions operating on chemical compositions).

The decorator handles the following cases:

  • the decorated func is launched directly, either with kwargs or with a mixture of args and kwargs:

    • the args are assigned into kwargs using their position in the args and cols array as provided to the decorator

    • all elements in kwargs that match the argument names in the cols list provided to the decorator are converted to ureg.Quantity objects, assigning the default units using the data from the cols list, unless they are a ureg.Quantity already.

  • the decorated func is launched with a pd.DataFrame as the args and other parameters in kwargs:

    • the data for the arguments listed in cols is sourced from the columns of the pd.DataFrame, using the provided str arguments to find the appropriate columns

    • if pd.Index is provided as the data type, and no column name is provided by the user, the index of the pd.DataFrame is passed into the called function

    • data from unit-aware pd.DataFrame objects is loaded using the pQ() accessor accordingly

    • data from unit-naive pd.DataFrame objects is coerced into ureg.Quantity objects using the default units as specified in the cols list

Parameters:

cols – A list[tuple[str, str, type]] containing the column names used to call the func.

Returns:

loading – A wrapped version of the decorated func.

Return type:

Callable
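The direct-call case described above can be sketched with a plain decorator factory. This is a simplified stand-in, not the dgpost code: load_cols and speed are hypothetical names, quantities are mimicked by (value, unit) tuples instead of ureg.Quantity objects, and the pd.DataFrame case, the type field and the "output" prefixing logic are left out.

```python
from functools import wraps


# Hypothetical stand-in for load_data, reduced to the direct-call case.
def load_cols(*cols):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Assign positional args to names using their position in cols.
            for (name, _unit, _type), value in zip(cols, args):
                kwargs[name] = value
            # Attach the default unit to bare values; values that already
            # carry a unit (here: tuples) are passed through unchanged.
            for name, unit, _type in cols:
                if name in kwargs and not isinstance(kwargs[name], tuple):
                    kwargs[name] = (kwargs[name], unit)
            return func(**kwargs)

        return wrapper

    return decorator


@load_cols(("length", "m", float), ("time", "s", float))
def speed(length, time, output="v"):
    value = length[0] / time[0]
    return {output: (value, f"{length[1]}/{time[1]}")}


print(speed(10.0, time=(2.0, "s")))  # {'v': (5.0, 'm/s')}
print(speed(10.0, 4.0, output="u"))  # {'u': (2.5, 'm/s')}
```

The outer function receives the column specification, the middle one receives func, and the innermost wrapper normalises the arguments on every call. Because of this, the wrapped function only ever sees unit-annotated inputs.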

dgpost.utils.helpers.combine_tables(a: DataFrame, b: DataFrame) → DataFrame

Combine two pd.DataFrames into a new pd.DataFrame.

Assumes the pd.DataFrames contain a pd.MultiIndex. Automatically pads the pd.MultiIndex to match the higher number of levels, if necessary. Merges units.

dgpost.utils.helpers.arrow_to_multiindex(df: DataFrame, warn: bool = True) → DataFrame

Convert the provided pd.DataFrame to a dgpost-compatible format.

  • converts tables with pd.Index into pd.MultiIndex,

  • converts ->-separated namespaces into pd.MultiIndex,

  • processes units into nested dicts.
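The namespace conversion listed above can be illustrated on plain column names. The sketch below is not the dgpost implementation: split_namespaces is a hypothetical name, and padding shorter names with the string "None" is an assumption. The resulting tuples have equal length, which is the shape accepted by pd.MultiIndex.from_tuples().

```python
# Hypothetical helper, not part of dgpost: splits "->"-separated column
# names into tuples and pads them to a common number of levels.
def split_namespaces(columns: list[str], pad: str = "None") -> list[tuple]:
    parts = [tuple(c.split("->")) for c in columns]
    depth = max((len(p) for p in parts), default=0)
    # Assumption: shorter names are padded on the right with `pad`.
    return [p + (pad,) * (depth - len(p)) for p in parts]


print(split_namespaces(["T", "xin->CO2", "xin->O2"]))
# [('T', 'None'), ('xin', 'CO2'), ('xin', 'O2')]
```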

dgpost.utils.helpers.keys_in_df(key: str | tuple, df: DataFrame) → set[tuple]

Find all columns in the provided pd.DataFrame that match key.

Returns a set of all columns in the df which are matched by key. Assumes the provided pd.DataFrame contains a pd.MultiIndex.

dgpost.utils.helpers.key_to_tuple(key: str | tuple) → tuple

Convert a provided key to a tuple for use with pd.DataFrames containing a pd.MultiIndex.
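The two lookups above, key normalisation and column matching, can be sketched on a plain list of column tuples. Both function names are hypothetical, and two assumptions are made that the text does not confirm: str keys are split on the "->" separator, and a column matches when the key is a prefix of it.

```python
# Hypothetical helpers, not part of dgpost.
def to_tuple(key):
    # Assumption: str keys use the "->" namespace separator.
    if isinstance(key, tuple):
        return key
    return tuple(key.split("->"))


def matching_columns(key, columns):
    # Assumption: a column matches when the key is a prefix of it.
    k = to_tuple(key)
    return {c for c in columns if c[: len(k)] == k}


columns = [("T", "None"), ("xin", "CO2"), ("xin", "O2")]
print(sorted(matching_columns("xin", columns)))
# [('xin', 'CO2'), ('xin', 'O2')]
print(sorted(matching_columns("xin->O2", columns)))
# [('xin', 'O2')]
```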

dgpost.utils.helpers.get_units(key: str | Sequence, df: DataFrame) → str | None

Given a key corresponding to a column in the df, return the units. The provided key can be either a str for a df with pd.Index, or any other Sequence for a df with pd.MultiIndex.

dgpost.utils.helpers.set_units(key: str | Sequence, unit: str | None, target: dict | DataFrame) → None

Set the units of key to unit in the target object, which can be either a dict or a pd.DataFrame. See also get_units().
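As a rough illustration of both accessors, the sketch below reads and writes units in a nested dict. It assumes that units for multi-level columns are stored as nested dicts keyed by level, in line with the "nested dicts" mentioned for arrow_to_multiindex(); get_unit and set_unit are hypothetical names operating on a plain dict, not on a pd.DataFrame.

```python
# Hypothetical helpers, not part of dgpost.
def get_unit(key, units: dict):
    parts = [key] if isinstance(key, str) else list(key)
    node = units
    # Walk down one dict level per key part; give up if a part is missing.
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None


def set_unit(key, unit, units: dict) -> None:
    parts = [key] if isinstance(key, str) else list(key)
    node = units
    # Create intermediate levels as needed, then store the unit at the leaf.
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = unit


units = {}
set_unit("T", "K", units)
set_unit(("xin", "CO2"), "percent", units)
print(units)  # {'T': 'K', 'xin': {'CO2': 'percent'}}
print(get_unit(("xin", "CO2"), units))  # percent
print(get_unit(("xin", "O2"), units))  # None
```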