extract: extract and interpolate data into tables
Code author: Peter Kraus
The function dgpost.utils.extract.extract() processes the specification below to extract the required data from the supplied datagram.
- pydantic model dgbowl_schemas.dgpost.recipe.Extract
Extract columns from loaded files into tables, interpolate as necessary.
JSON schema:
{ "title": "Extract", "description": "Extract columns from loaded files into tables, interpolate as necessary.", "type": "object", "properties": { "into": { "title": "Into", "type": "string" }, "from": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "From" }, "at": { "anyOf": [ { "$ref": "#/$defs/At" }, { "type": "null" } ], "default": null }, "columns": { "anyOf": [ { "items": { "$ref": "#/$defs/Column" }, "type": "array" }, { "type": "null" } ], "default": null, "title": "Columns" }, "constants": { "anyOf": [ { "items": { "$ref": "#/$defs/Constant" }, "type": "array" }, { "type": "null" } ], "default": null, "title": "Constants" } }, "$defs": { "At": { "additionalProperties": false, "properties": { "steps": { "default": null, "items": { "type": "string" }, "title": "Steps", "type": "array" }, "indices": { "default": null, "items": { "type": "integer" }, "title": "Indices", "type": "array" }, "timestamps": { "default": null, "items": { "type": "number" }, "title": "Timestamps", "type": "array" } }, "title": "At", "type": "object" }, "Column": { "additionalProperties": false, "properties": { "key": { "title": "Key", "type": "string" }, "as": { "title": "As", "type": "string" } }, "required": [ "key", "as" ], "title": "Column", "type": "object" }, "Constant": { "additionalProperties": false, "properties": { "value": { "title": "Value" }, "as": { "title": "As", "type": "string" }, "units": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Units" } }, "required": [ "value", "as" ], "title": "Constant", "type": "object" } }, "additionalProperties": false, "required": [ "into" ] }
- Config:
  - extra: str = forbid
- Validators:
  - check_one_input » all fields
- field into: str [Required]
  Name of a new or existing (loaded) table into which the extraction happens.
  - Validated by: check_one_input
- field from_: Optional[str] = None (alias 'from')
  Name of the source object for the extracted data.
  - Validated by: check_one_input
- field at: Optional[At] = None
  Specification of the steps (or data indices) from which data is to be extracted.
  - Validated by: check_one_input
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- field columns: Optional[Sequence[Column]] = None
  Specifications for the columns to be extracted, including new headers.
  - Validated by: check_one_input
- field constants: Optional[Sequence[Constant]] = None
  Specifications for additional columns containing data constants, including units.
  - Validated by: check_one_input
- validator check_one_input » all fields
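The constraints in the JSON schema above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the pydantic model from dgbowl_schemas. The function name check_extract_spec is hypothetical, and the check_one_input validator is not reproduced because its behaviour is not described here.

```python
# Hypothetical helper mirroring the JSON schema; not part of dgpost or dgbowl_schemas.
ALLOWED_KEYS = ("into", "from", "at", "columns", "constants")


def check_extract_spec(spec: dict) -> dict:
    """Return the spec with schema defaults filled in, or raise ValueError."""
    extra = set(spec) - set(ALLOWED_KEYS)
    if extra:
        # the schema sets additionalProperties to false (extra = forbid)
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if "into" not in spec:
        # "into" is the only required property
        raise ValueError("missing required key: 'into'")
    for col in spec.get("columns") or []:
        # a Column requires "key" and "as" and allows nothing else
        if set(col) != {"key", "as"}:
            raise ValueError(f"a column needs exactly 'key' and 'as': {col}")
    for const in spec.get("constants") or []:
        # a Constant requires "value" and "as"; "units" is optional
        if not {"value", "as"} <= set(const) <= {"value", "as", "units"}:
            raise ValueError(f"a constant needs 'value' and 'as': {const}")
    # every optional property defaults to null (None) in the schema
    return {key: spec.get(key) for key in ALLOWED_KEYS}


checked = check_extract_spec(
    {"into": "df", "from": "norm", "columns": [{"key": "raw->T_f", "as": "rawT"}]}
)
```

Here checked contains all five keys, with at and constants set to None.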
Note
The keys from and into are not processed by extract(); they should be used by its caller to supply the requested datagram and to assign the returned pd.DataFrame into the correct variable.
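The division of labour described in the note can be sketched as follows. This is an illustration only: run_extract_steps and fake_extract are hypothetical names, plain dictionaries stand in for datagrams and tables, and results for the same table name are collected in a list rather than merged.

```python
from typing import Any, Callable


def run_extract_steps(
    datagrams: dict[str, Any],
    steps: list[dict],
    extract_fn: Callable[[Any, dict], Any],
) -> dict[str, list]:
    """Resolve 'from' and 'into' around each call to extract_fn (hypothetical caller)."""
    tables: dict[str, list] = {}
    for spec in steps:
        # 'from' selects which loaded object is handed to the extraction
        source = datagrams[spec["from"]]
        result = extract_fn(source, spec)
        # 'into' selects the name under which the returned table is stored
        tables.setdefault(spec["into"], []).append(result)
    return tables


def fake_extract(source: dict, spec: dict) -> dict:
    # stand-in for extract(): picks and renames the requested keys of a plain dict
    return {col["as"]: source[col["key"]] for col in spec["columns"]}


datagrams = {
    "norm": {"raw->T_f": [300.0, 310.0]},
    "sparse": {"derived->xout": [0.1]},
}
steps = [
    {"into": "df", "from": "norm", "columns": [{"key": "raw->T_f", "as": "rawT"}]},
    {"into": "df", "from": "sparse", "columns": [{"key": "derived->xout", "as": "xout"}]},
]
tables = run_extract_steps(datagrams, steps, fake_extract)
```

The extraction function never reads from or into; only the surrounding loop does.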
Handling of sparse data depends on the extraction format specified:
- for direct extraction, if the value is not present at any of the timesteps specified in at, a NaN is added instead
- for interpolation, if a value is missing at any of the timesteps specified in at or in the pd.DataFrame index, that timestep is masked and the interpolation is performed from neighbouring points
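The two behaviours can be illustrated with a small standard-library sketch. It is not the dgpost implementation: the function names are hypothetical, plain lists stand in for the pd.DataFrame, the times are assumed to be sorted, and points outside the valid range yield NaN because extrapolation is not covered here.

```python
import math

NAN = float("nan")


def extract_direct(times, values, at):
    """Direct extraction: look up each requested timestep, NaN if absent."""
    lookup = dict(zip(times, values))
    return [lookup.get(t, NAN) for t in at]


def extract_interpolated(times, values, at):
    """Linear interpolation onto `at`, masking points whose value is NaN."""
    # drop (mask) every timestep with a missing value
    points = [(t, v) for t, v in zip(times, values) if not math.isnan(v)]
    out = []
    for t in at:
        # nearest valid neighbours on either side of t
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] >= t]
        if not left or not right:
            out.append(NAN)  # outside the valid range: no extrapolation in this sketch
            continue
        (t0, v0), (t1, v1) = left[-1], right[0]
        out.append(v0 if t1 == t0 else v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


times = [0.0, 1.0, 2.0, 3.0]
values = [10.0, NAN, 30.0, 40.0]  # the value at t = 1.0 is missing

direct = extract_direct(times, values, [0.0, 1.5, 3.0])
interpolated = extract_interpolated(times, values, [1.0, 2.5])
```

In this example direct has a NaN in its second position, because t = 1.5 is not a source timestep, while interpolated is [20.0, 35.0]: the value at t = 1.0 is rebuilt from its neighbours at t = 0.0 and t = 2.0.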
Interpolation of uc.ufloat values is performed separately for the nominal and error components.
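A minimal sketch of this component-wise treatment, assuming linear interpolation and strictly increasing times. Two plain lists stand in for the nominal values and uncertainties of the uc.ufloat objects, and the names lerp and interpolate_with_error are hypothetical.

```python
def lerp(times, values, t):
    """Linearly interpolate values(times) at t; times must be strictly increasing."""
    pairs = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the range of times")


def interpolate_with_error(times, nominal, error, t):
    """Interpolate the nominal values and the uncertainties independently."""
    return lerp(times, nominal, t), lerp(times, error, t)


times = [0.0, 10.0]
nominal = [1.0, 3.0]
error = [0.25, 0.75]

value, sigma = interpolate_with_error(times, nominal, error, 5.0)
```

Here value is 2.0 and sigma is 0.5: the uncertainty is treated as just another series, rather than being propagated through the interpolation formula.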
Units are added into the attrs dictionary of the pd.DataFrame on a per-column basis.
Data from multiple datagrams can be combined into one pd.DataFrame using a YAML recipe such as the following example:
    load:
      - as: norm
        path: normalized.dg.json
      - as: sparse
        path: sparse.dg.json
    extract:
      - into: df
        from: norm
        at:
          step: "a"
        columns:
          - key: raw->T_f
            as: rawT
      - into: df
        from: sparse
        at:
          steps: b1, b2, b3
        direct:
          - key: derived->xout->*
            as: xout
In this example, the pd.DataFrame is created with an index corresponding to the timestamps of step "a" of the datagram. The values specified using columns in the first section are entered directly, after the columns are renamed. The data pulled out of the datagram in the second section, using the prescription in at, are interpolated onto the index of the existing pd.DataFrame.
- dgpost.utils.extract.get_step(obj, at=None)
  - Return type: Union[DataFrame, DataTree, list[dict], None]
- dgpost.utils.extract.get_constant(spec, ts)
- dgpost.utils.extract.extract(obj, spec, index=None)
  - Return type: DataFrame
- dgpost.utils.extract.extract_obj(obj, columns)
  - Return type: list[Series]