pivot: reorder tables using columns

Code author: Peter Kraus

The function dgpost.utils.pivot.pivot() processes the below specification, allowing the user to “pivot” a given pd.DataFrame using specified column(s) as additional indices:

pydantic model dgbowl_schemas.dgpost.recipe.Pivot

Reorder tables by grouping rows into arrays using columns as indices.

Show JSON schema

{
   "title": "Pivot",
   "description": "Reorder tables by grouping rows into arrays using columns as indices.",
   "type": "object",
   "properties": {
      "table": {
         "title": "Table",
         "type": "string"
      },
      "as": {
         "title": "As",
         "type": "string"
      },
      "using": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            }
         ],
         "title": "Using"
      },
      "columns": {
         "anyOf": [
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Columns"
      },
      "timestamp": {
         "default": "first",
         "enum": [
            "first",
            "last",
            "mean"
         ],
         "title": "Timestamp",
         "type": "string"
      },
      "timedelta": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Timedelta"
      }
   },
   "additionalProperties": false,
   "required": [
      "table",
      "as",
      "using"
   ]
}

Config:

extra: str = forbid
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

field table: str [Required]: The name of the table loaded in memory to be pivoted.

field as_: str [Required] (alias 'as'): The name for the resulting table for in memory storage.

field using: str | Sequence[str] [Required]: A column name (or their sequence) by which the pivoting is performed.

field columns: Sequence[str] | None = None: A sequence of column names which are to be pivoted.

field timestamp: Literal['first', 'last', 'mean'] = 'first': Specification of the resulting timestamp for the pivoted data. For each pivoted row, the first or last timestamp can be used as index. Alternatively, the mean can be calculated and used as index.

field timedelta: str | None = None: If provided, the corresponding time deltas for the pivoted data is computed and stored under the provided column name. By default, this data is not computed.

This feature is best illustrated using an below example. Consider an input table in the following format:

uts	index	frequency	impedance
10000	1	1e9	0.5700
10005	1	2e9	0.5500
10010	1	4e9	0.5000
10015	1	8e9	0.4900
10020	2	1e9	0.5740
10025	2	2e9	0.5480
10030	2	4e9	0.5000
10035	2	8e9	0.4950

Here, the column index contains a numerical index of each impedance trace, with a pair of frequency and impedance data in each row. However, for post-processing in dgpost, it might be useful to re-order the data so that the traces are grouped in each row:

uts	index	frequency	impedance
10000	1	[1e9, 2e9, 4e9, 8e9]	[0.5700, 0.5500, 0.5000, 0.4900]
10020	2	[1e9, 2e9, 4e9, 8e9]	[0.5740, 0.5480, 0.5000, 0.4950]

This can be achieved using the following code:

newdf = pivot(df, using="index", columns=["frequency", "impedance"], timestamp="first")

More documentation is provided in the pivot() function definition.

dgpost.utils.pivot.pivot(table: DataFrame, using: str | list[str], columns: list[str] | None = None, timestamp: str = 'first', timedelta: str | None = None) → DataFrame