basic: For tabulated data

A basic tabulated file extractor.

yadg.extractors.basic.csv module

Handles the reading and processing of any tabular file, as long as the first line contains the column headers. The columns of the table must be separated by a separator such as a comma (,), a semicolon (;), or a tab (\t).

Note

By default, the second line of the file should contain the units. Alternatively, the units can be supplied using extractor parameters, in which case the second line is considered to be data.
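The two file layouts described in this note can be sketched with the standard library alone. This is a minimal illustration, not the extractor's actual implementation: the helper name read_table and the sample data are hypothetical, and only the header/units/data line order is taken from the text above.

```python
import csv
import io


def read_table(text, sep=",", units=None):
    """Split a table into headers, units and data rows (hypothetical helper)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=sep) if row]
    headers = [h.strip() for h in rows[0]]
    if units is None:
        # Default layout: the second line of the file holds the units.
        units = dict(zip(headers, (u.strip() for u in rows[1])))
        data = rows[2:]
    else:
        # Units supplied by the caller: the second line is already data.
        data = rows[1:]
    return headers, units, data


with_units = "time,T,flow\ns,K,ml/min\n0,298.15,10\n1,298.20,10\n"
headers, units, data = read_table(with_units)

no_units = "time;T\n0;298.15\n1;298.20\n"
headers2, units2, data2 = read_table(
    no_units, sep=";", units={"time": "s", "T": "K"}
)
```

In the first call the units come from the file; in the second they are passed in, so both remaining lines are treated as data.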

Since yadg-5.0, the basic.csv extractor handles sparse tables (i.e. tables with missing data) by back-filling empty cells with np.NaNs.
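As a rough illustration of what filling a sparse table with NaN means, the sketch below pads short rows and converts empty cells to NaN. It uses only the standard library (math.nan rather than NumPy), and the helper parse_cell and the sample rows are assumptions made for this example, not part of yadg.

```python
import math


def parse_cell(cell):
    """Convert one cell to float; an empty cell becomes NaN (hypothetical helper)."""
    cell = cell.strip()
    return float(cell) if cell else math.nan


header = ["time", "T", "flow"]
# The second row has an empty cell, the third row is missing its last column.
rows = [["0", "298.15", "10"], ["1", "", "10"], ["2", "298.30"]]

table = []
for row in rows:
    # Pad short rows so that every row has one entry per header.
    padded = row + [""] * (len(header) - len(row))
    table.append([parse_cell(c) for c in padded])
```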

The basic.csv extractor attempts to deduce the timestamps from the column headers, using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the column(s) containing the timestamp data and their format can be provided using extractor parameters.
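For orientation, converting a timestamp column with a known format into Unix timestamps might look like the sketch below. It does not reproduce yadg.dgutils.dateutils.infer_timestamp_from(); the helper to_uts, the format string, and the assumption that the values are in UTC are all illustrative choices.

```python
from datetime import datetime, timezone


def to_uts(stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string and return a Unix timestamp (hypothetical helper).

    Assumes the string is expressed in UTC.
    """
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


column = ["2021-01-01 00:00:00", "2021-01-01 00:01:00"]
uts = [to_uts(stamp) for stamp in column]
```

The first entry evaluates to 1609459200.0, and the second is 60 seconds later.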

Usage

Available since yadg-4.0.

pydantic model dgbowl_schemas.yadg.dataschema_6_0.filetype.Basic_csv
Config:
  • extra: str = forbid

pydantic model Parameters
Config:
  • extra: str = forbid

field sep: str = ','

Separator of table columns.

field strip: str | None = None

A string of characters to strip from headers and data.

field units: Mapping[str, str] | None = None

A dict of column: unit key-value pairs, mapping each column header to its unit.

field timestamp: Timestamp | TimeDate | UTS | None = None

Timestamp specification allowing calculation of Unix timestamp for each table row.

field parameters: Parameters [Optional]
field filetype: Literal['basic.csv'] [Required]
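The fields listed above could be written out as a plain Python dictionary, as in the sketch below. The field names and defaults are taken from the model listing; the helper check_parameters is hypothetical and only imitates the effect of extra = forbid, it is not the pydantic validation itself. The timestamp field is left at its default, since its internal structure is not described in this section.

```python
# Defaults as listed for the Parameters model above.
DEFAULTS = {"sep": ",", "strip": None, "units": None, "timestamp": None}


def check_parameters(params):
    """Reject unknown keys and fill in defaults (hypothetical helper)."""
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    return {**DEFAULTS, **params}


extractor = {
    "filetype": "basic.csv",
    "parameters": check_parameters(
        {"sep": ";", "units": {"T": "K", "flow": "ml/min"}}
    ),
}
```

A misspelled key such as "separator" would raise an error here, which mirrors the intent of forbidding extra fields.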

Schema

xarray.DataTree:
  coords:
    uts:            !!float               # Unix timestamp
  data_vars:
    {{ headers }}:  (uts)                 # Populated from file headers

Uncertainties

  • all values: string to float conversion
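This section does not state how the uncertainty is derived from the string to float conversion. One common convention, assumed here purely for illustration, is to take the resolution of the last digit written in the file. The helper value_and_resolution is hypothetical and may differ from what yadg actually does.

```python
def value_and_resolution(text):
    """Return (value, resolution of the last written digit); assumed convention."""
    text = text.strip().lower()
    value = float(text)
    # Separate an optional exponent, e.g. "1.5e-3" -> ("1.5", "-3").
    mantissa, _, exp = text.partition("e")
    exponent = int(exp) if exp else 0
    # Count the digits written after the decimal point.
    decimals = len(mantissa.partition(".")[2])
    return value, 10.0 ** (exponent - decimals)


temperature = value_and_resolution("298.15")
count = value_and_resolution("12")
small = value_and_resolution("1.5e-3")
```

Under this assumption, "298.15" carries a resolution of about 0.01, "12" a resolution of 1, and "1.5e-3" a resolution of about 0.0001.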

Metadata

No metadata is extracted.

Code author: Peter Kraus

yadg.extractors.basic.csv.extract(source: Path, *, encoding: str, locale: str, timezone: str, parameters: BaseModel, **kwargs: dict) → DataTree