pivot: reorder tables using columns

Code author: Peter Kraus

The function dgpost.utils.pivot.pivot() processes the below specification, allowing the user to “pivot” a given pd.DataFrame using specified column(s) as additional indices:

pydantic model dgbowl_schemas.dgpost.recipe.Pivot

Reorder tables by grouping rows into arrays using columns as indices.

Show JSON schema
{
   "title": "Pivot",
   "description": "Reorder tables by grouping rows into arrays using columns as indices.",
   "type": "object",
   "properties": {
      "table": {
         "title": "Table",
         "type": "string"
      },
      "as": {
         "title": "As",
         "type": "string"
      },
      "using": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            }
         ],
         "title": "Using"
      },
      "columns": {
         "anyOf": [
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Columns"
      },
      "timestamp": {
         "default": "first",
         "enum": [
            "first",
            "last",
            "mean"
         ],
         "title": "Timestamp",
         "type": "string"
      },
      "timedelta": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Timedelta"
      }
   },
   "additionalProperties": false,
   "required": [
      "table",
      "as",
      "using"
   ]
}

Config:
  • extra: str = forbid

  • populate_by_name: bool = True

field table: str [Required]

The name of the table loaded in memory to be pivoted.

field as_: str [Required] (alias 'as')

The name for the resulting table for in memory storage.

field using: str | Sequence[str] [Required]

A column name (or their sequence) by which the pivoting is performed.

field columns: Sequence[str] | None = None

A sequence of column names which are to be pivoted.

field timestamp: Literal['first', 'last', 'mean'] = 'first'

Specification of the resulting timestamp for the pivoted data. For each pivoted row, the first or last timestamp can be used as index. Alternatively, the mean can be calculated and used as index.

field timedelta: str | None = None

If provided, the corresponding time deltas for the pivoted data is computed and stored under the provided column name. By default, this data is not computed.

This feature is best illustrated using an below example. Consider an input table in the following format:

uts

index

frequency

impedance

10000

1

1e9

0.5700

10005

1

2e9

0.5500

10010

1

4e9

0.5000

10015

1

8e9

0.4900

10020

2

1e9

0.5740

10025

2

2e9

0.5480

10030

2

4e9

0.5000

10035

2

8e9

0.4950

Here, the column index contains a numerical index of each impedance trace, with a pair of frequency and impedance data in each row. However, for post-processing in dgpost, it might be useful to re-order the data so that the traces are grouped in each row:

uts

index

frequency

impedance

10000

1

[1e9, 2e9, 4e9, 8e9]

[0.5700, 0.5500, 0.5000, 0.4900]

10020

2

[1e9, 2e9, 4e9, 8e9]

[0.5740, 0.5480, 0.5000, 0.4950]

This can be achieved using the following code:

newdf = pivot(df, using="index", columns=["frequency", "impedance"], timestamp="first")

More documentation is provided in the pivot() function definition.

dgpost.utils.pivot.pivot(table: DataFrame, using: str | list[str], columns: list[str] | None = None, timestamp: str = 'first', timedelta: str | None = None) DataFrame