basic: For tabulated data
A basic tabulated file extractor.
yadg.extractors.basic.csv module
Handles the reading and processing of any tabular files, as long as the first line
contains the column headers. The columns of the table must be separated using a
separator such as ",", ";", or "\t".
Note
By default, the second line of the file should contain the units. Alternatively, the units can be supplied using extractor parameters, in which case the second line is considered to be data.
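As a minimal sketch of the file layout described above (not yadg's own parser), the following uses Python's standard csv module to split a small table into headers, units, and data rows. The sample contents, the column names, and the helper name split_table are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical sample file: line 1 holds the headers, line 2 holds the units.
SAMPLE = "time,flow,temperature\ns,ml/min,K\n0.0,15.0,298.1\n1.0,15.2,298.4\n"


def split_table(text, sep=",", units=None):
    """Split tabular text into headers, a units mapping, and data rows.

    If `units` is supplied by the caller, the second line is treated as
    data, mirroring the behaviour described in the note above.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    headers = [h.strip() for h in rows[0]]
    if units is None:
        # Units are read from the second line of the file.
        units = dict(zip(headers, (u.strip() for u in rows[1])))
        data = rows[2:]
    else:
        # Units were given explicitly, so every line after the header is data.
        data = rows[1:]
    return headers, units, data


headers, units, data = split_table(SAMPLE)
print(headers)
print(units)
print(data)
```

Passing units explicitly, for example split_table(text, sep=";", units={"a": "s"}), keeps the second line as a data row.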
Since yadg-5.0, the basic.csv extractor handles sparse tables (i.e. tables with
missing data) by back-filling empty cells with np.NaNs.
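The idea of filling missing cells can be sketched as follows. This is an illustration of the concept rather than the extractor's code: it uses math.nan from the standard library in place of np.NaN, and the row data and the helper name to_columns are assumptions.

```python
import math

headers = ["a", "b", "c"]
# Hypothetical parsed rows of a sparse table; some cells are absent.
rows = [
    {"a": 1.0, "b": 2.0},  # "c" is missing in this row
    {"a": 3.0, "c": 4.0},  # "b" is missing in this row
]


def to_columns(headers, rows):
    """Build equal-length columns, using NaN wherever a row has no value."""
    return {h: [row.get(h, math.nan) for row in rows] for h in headers}


columns = to_columns(headers, rows)
print(columns)
```

Every column ends up with one entry per row, so the result can be aligned along a shared time axis.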
The basic.csv extractor attempts to deduce the timestamps from the column headers,
using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the column(s)
containing the timestamp data and their format can be provided using extractor
parameters.
Usage
Available since yadg-4.0.
- pydantic model dgbowl_schemas.yadg.dataschema_6_0.filetype.Basic_csv
- Config:
extra: str = forbid
- Validators:
- pydantic model Parameters
- Config:
extra: str = forbid
- field sep: str = ','
Separator of table columns.
- field strip: str | None = None
A str of characters to strip from headers & data.
- field units: Mapping[str, str] | None = None
A dict containing column: unit key-value pairs.
- field parameters: Parameters [Optional]
- field filetype: Literal['basic.csv'] [Required]
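For illustration, the fields listed above could be written out as a plain Python mapping. This is a sketch under assumptions: the values are made up, how the fragment is embedded in a complete dataschema is not covered in this section, and the helper unknown_parameters is hypothetical. It only mimics the effect of extra = forbid and does not replace validation by the pydantic model.

```python
# Hypothetical extractor fragment using only the fields documented above.
extractor = {
    "filetype": "basic.csv",
    "parameters": {
        "sep": ";",
        "strip": '"',
        "units": {"flow": "ml/min", "temperature": "K"},
    },
}

# Field names of the Parameters model as listed in this section.
ALLOWED_PARAMETERS = {"sep", "strip", "units"}


def unknown_parameters(extractor):
    """Return parameter names that the model would reject as extra fields."""
    return sorted(set(extractor.get("parameters", {})) - ALLOWED_PARAMETERS)


print(unknown_parameters(extractor))
```

Because extra fields are forbidden, a misspelled key such as "delimiter" would be rejected rather than silently ignored.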
Schema
xarray.DataTree:
coords:
uts: !!float # Unix timestamp
data_vars:
{{ headers }}: (uts) # Populated from file headers
Metadata
No metadata is extracted.
Code author: Peter Kraus
- yadg.extractors.basic.csv.process_row(headers: list, items: list, datefunc: Callable, datecolumns: list[int], locale: str = 'en_GB') → tuple[dict, dict]
A function that processes a row of a table.
This is the main worker function of the basic.csv module, but it is often re-used by other parsers that need to process tabular data.
- Parameters:
headers – A list of headers of the table.
items – A list of values corresponding to the headers. Must be the same length as headers.
datefunc – A function that will generate uts given a list of values.
datecolumns – Column indices that need to be passed to datefunc to generate uts.
- Returns:
A tuple of result dictionaries, with the first element containing the values and the second element containing the uncertainties of the values.
- Return type:
vals, devs
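The contract described above can be illustrated with a simplified stand-in. It is not yadg's implementation: process_row_sketch and iso_to_uts are hypothetical names, the locale argument is omitted, the way the selected values are handed to datefunc (as positional arguments) is an assumption, and so is the uncertainty rule (one unit of the last printed digit).

```python
from datetime import datetime, timezone


def process_row_sketch(headers, items, datefunc, datecolumns):
    """Simplified stand-in for process_row; not yadg's implementation."""
    if len(headers) != len(items):
        raise ValueError("items must be the same length as headers")
    columns = [item.strip() for item in items]
    # Assumption: the timestamp columns are passed as positional arguments.
    vals = {"uts": datefunc(*[columns[i] for i in datecolumns])}
    devs = {}
    for i, (header, text) in enumerate(zip(headers, columns)):
        if i in datecolumns or text == "":
            continue  # skip timestamp columns and empty cells
        vals[header] = float(text)
        # Assumed rule: uncertainty is one unit of the last printed digit.
        decimals = len(text.partition(".")[2])
        devs[header] = 10.0 ** (-decimals)
    return vals, devs


def iso_to_uts(text):
    """Hypothetical datefunc: ISO 8601 string (taken as UTC) to Unix time."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc).timestamp()


vals, devs = process_row_sketch(
    ["time", "flow", "temperature"],
    ["2021-01-01T00:00:10", "15.25", "298.1"],
    iso_to_uts,
    [0],
)
print(vals)
print(devs)
```

The first dictionary holds the values together with uts, and the second holds the matching uncertainties, keyed by the same headers.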