basic: For tabulated data

A basic tabulated file extractor.

yadg.extractors.basic.csv module

Reads and processes any tabular file whose first line contains the column headers. The columns of the table must be separated by a single separator, such as a comma (,), a semicolon (;), or a tab (\t).

Note

By default, the second line of the file should contain the units. Alternatively, the units can be supplied using extractor parameters, in which case the second line is considered to be data.
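The two layouts described in this note can be illustrated with a minimal sketch. It uses only the Python standard library and is not the extractor's own code; the function name read_header_and_units and the sample table are hypothetical.

```python
import csv
import io


def read_header_and_units(text, sep=",", units=None):
    """Split tabular text into headers, units, and data rows.

    Hypothetical helper: if `units` is None, the second line is read as
    the units row; otherwise the second line is treated as data.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    headers = [h.strip() for h in rows[0]]
    if units is None:
        units = dict(zip(headers, (u.strip() for u in rows[1])))
        data = rows[2:]
    else:
        data = rows[1:]
    return headers, units, data


# Made-up sample: headers on line 1, units on line 2, then two data rows.
sample = "time;flow;temperature\ns;ml/min;K\n0;1.5;298.1\n1;1.6;298.3\n"
headers, units, data = read_header_and_units(sample, sep=";")
```

When the units are passed in explicitly, the same call returns one more data row, because the second line is no longer consumed as units.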

Since yadg-5.0, the basic.csv extractor handles sparse tables (i.e. tables with missing data) by filling empty cells with NaN values (np.nan).
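As a rough illustration of this behaviour, the sketch below replaces empty cells in a single row with NaN. It uses math.nan from the standard library instead of NumPy, and the helper name fill_missing is hypothetical; the extractor's actual handling may differ in detail.

```python
import math


def fill_missing(headers, items):
    """Map headers to float values, using NaN wherever a cell is empty."""
    row = {}
    for header, item in zip(headers, items):
        item = item.strip()
        row[header] = float(item) if item else math.nan
    return row


# The middle cell is empty, as it would be in a sparse table.
row = fill_missing(["a", "b", "c"], ["1.0", "", "3.5"])
```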

The basic.csv extractor attempts to deduce the timestamps from the column headers, using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the column(s) containing the timestamp data and their format can be provided using extractor parameters.
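The explicit case, where a column and its format are given, can be sketched as follows. This does not call yadg.dgutils.dateutils.infer_timestamp_from(), whose signature is not shown here; the helper row_to_uts, its arguments, and the assumption that the timestamps are in UTC are all illustrative.

```python
from datetime import datetime, timezone


def row_to_uts(items, index, fmt, tz=timezone.utc):
    """Convert the timestamp string in column `index` to a Unix timestamp.

    Hypothetical helper: the parsed time is assumed to be in timezone `tz`.
    """
    parsed = datetime.strptime(items[index], fmt)
    return parsed.replace(tzinfo=tz).timestamp()


# Ten seconds after midnight on 2021-01-01, interpreted as UTC.
uts = row_to_uts(["2021-01-01 00:00:10", "1.5"], index=0, fmt="%Y-%m-%d %H:%M:%S")
```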

Usage

Available since yadg-4.0.

pydantic model dgbowl_schemas.yadg.dataschema_6_0.filetype.Basic_csv

Config:

extra: str = forbid

pydantic model Parameters

Config:

extra: str = forbid

field sep: str = ',': Separator of table columns.

field strip: str | None = None: A str of characters to strip from headers and data.

field units: Mapping[str, str] | None = None: A dict containing column: unit pairs.

field timestamp: Timestamp | TimeDate | UTS | None = None: Timestamp specification allowing calculation of Unix timestamp for each table row.

field parameters: Parameters [Optional]

field filetype: Literal['basic.csv'] [Required]
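A specification following the fields above might look like the sketch below. It is written as a plain Python dict and does not import dgbowl_schemas; the helper check_parameters only mimics the effect of extra = forbid and of the listed defaults, and the position of this dict within a complete dataschema is not shown here.

```python
# Field names taken from the Parameters model above.
ALLOWED_PARAMETERS = {"sep", "strip", "units", "timestamp"}


def check_parameters(parameters):
    """Reject unknown keys and fill in the documented defaults.

    Hypothetical stand-in for the validation done by the pydantic model.
    """
    unknown = set(parameters) - ALLOWED_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    defaults = {"sep": ",", "strip": None, "units": None, "timestamp": None}
    return {**defaults, **parameters}


# Semicolon-separated file with the unit of one column supplied explicitly.
extractor = {
    "filetype": "basic.csv",
    "parameters": check_parameters({"sep": ";", "units": {"flow": "ml/min"}}),
}
```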

Schema

xarray.DataTree:
  coords:
    uts:            !!float               # Unix timestamp
  data_vars:
    {{ headers }}:  (uts)                 # Populated from file headers

Metadata

No metadata is extracted.

Code author: Peter Kraus

yadg.extractors.basic.csv.process_row(headers: list, items: list, datefunc: Callable, datecolumns: list[int], locale: str = 'en_GB') → tuple[dict, dict]

A function that processes a row of a table.

This is the main worker function of the basic.csv module, and it is also re-used by other parsers that need to process tabular data.

Parameters:

headers – A list of headers of the table.
items – A list of values corresponding to the headers. Must be the same length as headers.
datefunc – A function that will generate uts given a list of values.
datecolumns – Column indices that need to be passed to datefunc to generate uts.

Returns:

A tuple of result dictionaries, with the first element containing the values and the second element containing the uncertainties of the values.

Return type:

vals, devs
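The shape of the inputs and of the returned vals, devs pair can be illustrated with a simplified stand-in. This is not the implementation of process_row: the name process_row_sketch is hypothetical, and several details are assumptions made for the example, namely that datefunc receives a single list, that uts is stored in vals, that date columns are skipped, and that the uncertainty is one unit in the last printed decimal place.

```python
def process_row_sketch(headers, items, datefunc, datecolumns):
    """Simplified, hypothetical stand-in for a row-processing function."""
    if len(headers) != len(items):
        raise ValueError("headers and items must have the same length")
    vals, devs = {}, {}
    # Assumption: datefunc takes one list holding the date column values.
    vals["uts"] = datefunc([items[i] for i in datecolumns])
    for i, (header, item) in enumerate(zip(headers, items)):
        if i in datecolumns:
            continue
        text = item.strip()
        try:
            vals[header] = float(text)
        except ValueError:
            # Non-numeric cells are kept as strings, empty cells are skipped.
            if text:
                vals[header] = text
            continue
        # Assumption: uncertainty is one unit in the last decimal place.
        # Only plain decimal notation is handled, not exponents.
        decimals = len(text.partition(".")[2])
        devs[header] = 10.0 ** -decimals
    return vals, devs


vals, devs = process_row_sketch(
    headers=["t", "flow", "comment"],
    items=["10", "1.50", "ok"],
    datefunc=lambda values: float(values[0]),
    datecolumns=[0],
)
```

Here the first column is consumed by datefunc, the numeric column appears in both dictionaries, and the text column appears only in vals.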

yadg.extractors.basic.csv.extract(source: Path, *, encoding: str, locale: str, timezone: str, parameters: BaseModel, **kwargs: dict) → DataTree