basiccsv: Common tabular file parser
This parser handles the reading and processing of any tabular files, as long as
the first line contains the column headers, and the second line an optional
set of units. The columns of the table must be separated using a separator
(, or ; or \t or similar).
A rudimentary column-converting functionality is also included. This allows the user to specify linear combinations of columns, and can be used to apply a calibration to the columnar data.
An attempt to deduce the timestamp from column headers is made automatically,
using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the
timestamp column(s) and format can be provided using parameters.
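As a rough illustration of what an explicitly specified timestamp amounts to, the sketch below converts one row of a table into a Unix timestamp using only the standard library. The helper uts_from_row is hypothetical and is not part of yadg; its spec argument loosely mirrors the three shapes of the timestamp parameter (uts, timestamp, or date plus time). It assumes that both date and time are given in the last case, and that naive times are in UTC.

```python
from datetime import datetime, timezone


def uts_from_row(row, spec):
    """Hypothetical helper: turn one row of strings into a Unix timestamp.

    Naive datetimes are treated as UTC, which is an assumption of this sketch.
    """
    if "uts" in spec:
        # The column already holds a Unix timestamp.
        return float(row[spec["uts"]["index"]])
    if "timestamp" in spec:
        # A single column holds both date and time.
        s = spec["timestamp"]
        dt = datetime.strptime(row[s["index"]], s["format"])
        return dt.replace(tzinfo=timezone.utc).timestamp()
    # Date and time are stored in two separate columns.
    d, t = spec["date"], spec["time"]
    text = f"{row[d['index']]} {row[t['index']]}"
    dt = datetime.strptime(text, f"{d['format']} {t['format']}")
    return dt.replace(tzinfo=timezone.utc).timestamp()


row = ["2021-01-01", "00:00:10", "1.5"]
spec = {
    "date": {"index": 0, "format": "%Y-%m-%d"},
    "time": {"index": 1, "format": "%H:%M:%S"},
}
uts = uts_from_row(row, spec)
```

The format strings follow the conventions of datetime.strptime; whether yadg accepts exactly the same format codes is not stated in this section.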
Usage
The use of basiccsv can be requested by supplying basiccsv as an argument to
the parser keyword of the dataschema.
The parser supports the following parameters:
- pydantic model dgbowl_schemas.yadg.dataschema_4_1.step.BasicCSV.Params
JSON schema:
{ "title": "Params", "type": "object", "properties": { "sep": { "title": "Sep", "default": ",", "type": "string" }, "sigma": { "title": "Sigma", "type": "object", "additionalProperties": { "$ref": "#/definitions/Tol" } }, "calfile": { "title": "Calfile", "type": "string" }, "timestamp": { "title": "Timestamp", "anyOf": [ { "$ref": "#/definitions/Timestamp" }, { "$ref": "#/definitions/TimeDate" }, { "$ref": "#/definitions/UTS" } ] }, "convert": { "title": "Convert" }, "units": { "title": "Units", "type": "object", "additionalProperties": { "type": "string" } } }, "additionalProperties": false, "definitions": { "Tol": { "title": "Tol", "type": "object", "properties": { "atol": { "title": "Atol", "type": "number" }, "rtol": { "title": "Rtol", "type": "number" } }, "additionalProperties": false }, "TimestampSpec": { "title": "TimestampSpec", "type": "object", "properties": { "index": { "title": "Index", "type": "integer" }, "format": { "title": "Format", "type": "string" } }, "additionalProperties": false }, "Timestamp": { "title": "Timestamp", "type": "object", "properties": { "timestamp": { "$ref": "#/definitions/TimestampSpec" } }, "required": [ "timestamp" ], "additionalProperties": false }, "TimeDate": { "title": "TimeDate", "type": "object", "properties": { "date": { "$ref": "#/definitions/TimestampSpec" }, "time": { "$ref": "#/definitions/TimestampSpec" } }, "additionalProperties": false }, "UTS": { "title": "UTS", "type": "object", "properties": { "uts": { "$ref": "#/definitions/TimestampSpec" } }, "required": [ "uts" ], "additionalProperties": false } } }
- field sep: str = ','
- field sigma: Optional[Mapping[str, dgbowl_schemas.yadg.dataschema_4_1.parameters.Tol]] = PydanticUndefined
- field calfile: Optional[str] = PydanticUndefined
- field timestamp: Optional[Union[dgbowl_schemas.yadg.dataschema_4_1.timestamp.Timestamp, dgbowl_schemas.yadg.dataschema_4_1.timestamp.TimeDate, dgbowl_schemas.yadg.dataschema_4_1.timestamp.UTS]] = PydanticUndefined
- field convert: Optional[Any] = PydanticUndefined
- field units: Optional[Mapping[str, str]] = PydanticUndefined
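To make the parameter list more concrete, the following sketch shows a fragment of one dataschema step written as a Python dictionary. Only the parser and parameters keys are shown; any other keys a step needs are omitted. The helper check_params is hypothetical and only mimics the "additionalProperties": false rule of the schema; the column names and values are invented for illustration.

```python
# Keys permitted by the Params schema; no other keys are accepted.
ALLOWED = {"sep", "sigma", "calfile", "timestamp", "convert", "units"}


def check_params(params):
    """Hypothetical check: reject keys the schema would not accept."""
    extra = set(params) - ALLOWED
    if extra:
        raise ValueError(f"unknown basiccsv parameters: {sorted(extra)}")
    return params


# A fragment of one dataschema step; other keys of the step are omitted.
step = {
    "parser": "basiccsv",
    "parameters": check_params(
        {
            "sep": ";",
            "timestamp": {
                "timestamp": {"index": 0, "format": "%Y-%m-%d %H:%M:%S"}
            },
            "units": {"T": "K", "flow": "ml/min"},
            "sigma": {"T": {"atol": 0.1}},
        }
    ),
}
```

A misspelled key such as separator would raise a ValueError in this sketch, in the same spirit as the schema rejecting additional properties.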
Note
The specification of the calibration dictionary that ought to be passed via
convert (or stored as JSON in calfile) is described in process_row().
Note
The calfile and convert functionalities allow for combining and
converting the raw data present in the data files into new entries, which
are stored in the derived entry of each timestep.
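The idea of combining raw columns into a derived entry can be sketched as a weighted sum. The function combine below is hypothetical and is not yadg's implementation: it applies only the fraction coefficients and ignores the per-column calib specification. The dictionary layout of convert used here (new name, then old names, plus a unit key) is an assumption based on the description in process_row().

```python
def combine(raw, convert):
    """Hypothetical sketch: derived entries as weighted sums of raw columns.

    Only the "fraction" coefficients are applied; the per-column "calib"
    specification is ignored in this simplified version.
    """
    derived = {}
    for new_name, spec in convert.items():
        total = 0.0
        for old_name, column_spec in spec.items():
            if old_name == "unit":
                continue  # "unit" labels the result, it is not a column
            total += column_spec.get("fraction", 1.0) * raw[old_name]
        derived[new_name] = {"value": total, "unit": spec.get("unit")}
    return derived


raw = {"flow_a": 10.0, "flow_b": 5.0}
convert = {
    "total_flow": {
        "flow_a": {"calib": {}, "fraction": 1.0},
        "flow_b": {"calib": {}, "fraction": 2.0},
        "unit": "ml/min",
    }
}
derived = combine(raw, convert)
```

Here total_flow is computed as 1.0 * flow_a + 2.0 * flow_b. In the real parser, each raw value would first pass through its calibration before being weighted.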
Provides
The primary functionality of basiccsv is to load the tabular
data, and determine the Unix timestamp. The headers of the tabular data are taken
verbatim from the file, and appear as raw data keys.
Metadata
The metadata section is currently empty.
Submodules
- yadg.parsers.basiccsv.main.process(fn, encoding='utf-8', timezone='localtime', parameters=None)
A basic csv parser.
This parser processes a csv file. The header of the csv file consists of one or two lines, with the column headers in the first line and the units in the second. The parser also attempts to parse column names to produce a timestamp, and save all other columns as floats or strings.
- Parameters
fn (str) – File to process.
encoding (str) – Encoding of fn, by default “utf-8”.
timezone (str) – A string description of the timezone. Default is “localtime”.
parameters (Optional[BaseModel]) – Parameters for BasicCSV.
- Returns
(data, metadata, fulldate) – Tuple containing the timesteps, metadata, and full date tag. No metadata is returned by the basiccsv parser. The full date might not be returned, e.g. when only the time is specified in the columns.
- Return type
tuple[list, dict, bool]
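The header layout described above (column headers on the first line, optional units on the second) can be read with Python's csv module. The function read_table below is a hypothetical sketch, not the code used by process(); in particular, the has_units flag is an assumption of this example, since this section does not say how the parser decides whether a units line is present.

```python
import csv
import io


def read_table(text, sep=",", has_units=True):
    """Hypothetical reader: split a table into headers, units and rows."""
    lines = list(csv.reader(io.StringIO(text), delimiter=sep))
    headers = [h.strip() for h in lines[0]]
    if has_units:
        units = dict(zip(headers, (u.strip() for u in lines[1])))
        body = lines[2:]
    else:
        units = {}
        body = lines[1:]
    rows = []
    for line in body:
        row = []
        for item in line:
            try:
                row.append(float(item))
            except ValueError:
                row.append(item.strip())  # keep non-numeric cells as strings
        rows.append(row)
    return headers, units, rows


text = "time;T;comment\ns;K;-\n0;298.15;start\n10;299.0;ramp\n"
headers, units, rows = read_table(text, sep=";")
```

Cells that cannot be converted to float are kept as strings, matching the statement that columns are saved as floats or strings.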
- yadg.parsers.basiccsv.main.process_row(headers, items, units, datefunc, datecolumns, calib={})
A function that processes a row of a table.
This is the main worker function of basiccsv, but can be re-used by any other parser that needs to process tabular data.
This function processes the "calib" parameter, which should be a (dict) in the following format:
- new_name: !!str        # derived entry name
  - old_name: !!str      # raw header name
    - calib: {}          # calibration specification
      fraction: !!float  # coefficient for linear combinations of old_name
  unit: !!str            # unit of new_name
The syntax of the calibration specification is detailed in yadg.dgutils.calib.calib_handler().
- Parameters
headers (list) – A list of headers of the table.
items (list) – A list of values corresponding to the headers. Must be the same length as headers.
units (dict) – A dict for looking up the units corresponding to a certain header.
datefunc (Callable) – A function that will generate uts given a list of values.
datecolumns (list) – Column indices that need to be passed to datefunc to generate uts.
calib (dict) – Specification for converting raw data in headers and items to other quantities. Arbitrary linear combinations of headers are possible. See the above section for the specification.
- Returns
element – A result dictionary, containing the keys "uts" with a timestamp, "raw" for all raw data present in the headers, and "derived" for any data processed via calib.
- Return type
dict
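The shape of the returned element can be illustrated with a simplified stand-in. The function sketch_row below is hypothetical and is not yadg's process_row(): it omits the calib handling (and therefore the "derived" key), assumes that datefunc accepts the selected values as a single list, assumes that timestamp columns are left out of "raw", and stores each raw value in an invented value/unit layout.

```python
def sketch_row(headers, items, units, datefunc, datecolumns):
    """Simplified stand-in for process_row(); not yadg's implementation."""
    if len(headers) != len(items):
        raise ValueError("headers and items must have the same length")
    # Assumption: datefunc takes the selected values as one list.
    element = {"uts": datefunc([items[i] for i in datecolumns]), "raw": {}}
    for index, (header, item) in enumerate(zip(headers, items)):
        if index in datecolumns:
            continue  # assumption: timestamp columns are not repeated in "raw"
        try:
            value = float(item)
        except ValueError:
            value = item  # non-numeric entries stay as strings
        element["raw"][header] = {"value": value, "unit": units.get(header)}
    return element


element = sketch_row(
    headers=["time", "T", "comment"],
    items=["10", "299.0", "ramp"],
    units={"time": "s", "T": "K"},
    datefunc=lambda values: float(values[0]),
    datecolumns=[0],
)
```

In this example the first column is treated as a ready-made Unix timestamp, so datefunc only converts it to a float.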