pivot: reorder tables using columns
Code author: Peter Kraus
The function dgpost.utils.pivot.pivot()
processes the below specification,
allowing the user to “pivot” a given pd.DataFrame
using specified column(s) as
additional indices:
- pydantic model dgbowl_schemas.dgpost.recipe.Pivot
Reorder tables by grouping rows into arrays using columns as indices.
Show JSON schema
{ "title": "Pivot", "description": "Reorder tables by grouping rows into arrays using columns as indices.", "type": "object", "properties": { "table": { "title": "Table", "type": "string" }, "as": { "title": "As", "type": "string" }, "using": { "anyOf": [ { "type": "string" }, { "items": { "type": "string" }, "type": "array" } ], "title": "Using" }, "columns": { "anyOf": [ { "items": { "type": "string" }, "type": "array" }, { "type": "null" } ], "default": null, "title": "Columns" }, "timestamp": { "default": "first", "enum": [ "first", "last", "mean" ], "title": "Timestamp", "type": "string" }, "timedelta": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Timedelta" } }, "additionalProperties": false, "required": [ "table", "as", "using" ] }
- Config:
extra: str = forbid
populate_by_name: bool = True
- field table: str [Required]
The name of the table loaded in memory to be pivoted.
- field as_: str [Required] (alias 'as')
The name for the resulting table for in memory storage.
- field using: str | Sequence[str] [Required]
A column name (or their sequence) by which the pivoting is performed.
- field columns: Sequence[str] | None = None
A sequence of column names which are to be pivoted.
- field timestamp: Literal['first', 'last', 'mean'] = 'first'
Specification of the resulting timestamp for the pivoted data. For each pivoted row, the
first
orlast
timestamp can be used as index. Alternatively, themean
can be calculated and used as index.
- field timedelta: str | None = None
If provided, the corresponding time deltas for the pivoted data is computed and stored under the provided column name. By default, this data is not computed.
This feature is best illustrated using an below example. Consider an input table in the following format:
uts |
index |
frequency |
impedance |
---|---|---|---|
10000 |
1 |
1e9 |
0.5700 |
10005 |
1 |
2e9 |
0.5500 |
10010 |
1 |
4e9 |
0.5000 |
10015 |
1 |
8e9 |
0.4900 |
10020 |
2 |
1e9 |
0.5740 |
10025 |
2 |
2e9 |
0.5480 |
10030 |
2 |
4e9 |
0.5000 |
10035 |
2 |
8e9 |
0.4950 |
Here, the column index
contains a numerical index of each impedance trace,
with a pair of frequency
and impedance
data in each row. However, for
post-processing in dgpost, it might be useful to re-order the data so that
the traces are grouped in each row:
uts |
index |
frequency |
impedance |
---|---|---|---|
10000 |
1 |
[1e9, 2e9, 4e9, 8e9] |
[0.5700, 0.5500, 0.5000, 0.4900] |
10020 |
2 |
[1e9, 2e9, 4e9, 8e9] |
[0.5740, 0.5480, 0.5000, 0.4950] |
This can be achieved using the following code:
newdf = pivot(df, using="index", columns=["frequency", "impedance"], timestamp="first")
More documentation is provided in the pivot()
function definition.
- dgpost.utils.pivot.pivot(table: DataFrame, using: str | list[str], columns: list[str] | None = None, timestamp: str = 'first', timedelta: str | None = None) DataFrame