basiccsv: Common tabular file parser

Handles the reading and processing of any tabular file, as long as its first line contains the column headers. By default, the second line should contain the units. The columns of the table must be separated by a separator such as a comma (,), a semicolon (;), or a tab (\t).
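The expected file layout can be illustrated with a short sketch using only the Python standard library. The sample data and the helper name read_table are hypothetical and are not part of yadg; the sketch only shows how the header line, the units line, and the data rows relate to each other.

```python
import csv
import io

# Hypothetical sample: headers in the first line, units in the second.
SAMPLE = "time,T,flow\ns,K,ml/min\n0,298.1,10\n60,299.4,10\n"


def read_table(text, sep=","):
    """Split a delimited table into headers, a header-to-unit map, and data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    headers, units_row, data = rows[0], rows[1], rows[2:]
    # Pair each column header with the unit found directly below it.
    units = dict(zip(headers, units_row))
    return headers, units, data


headers, units, data = read_table(SAMPLE)
print(headers)
print(units)
print(data)
```

Passing sep=";" or sep="\t" to the helper would handle the other separators mentioned above.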

Warning

Since yadg-5.0, the parser handles sparse tables (i.e. tables with missing data) by back-filling empty cells with np.NaNs.
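As a rough illustration of what a sparse table looks like after filling, the sketch below converts one row of cells to floats and substitutes NaN for empty cells. The function name cells_to_floats is hypothetical, and math.nan is used here in place of NumPy's NaN to keep the example dependency-free; how yadg performs the filling internally is not shown.

```python
import math


def cells_to_floats(row):
    """Convert string cells to floats, using NaN for empty or blank cells."""
    return [float(cell) if cell.strip() else math.nan for cell in row]


filled = cells_to_floats(["1.0", "", "3"])
print(filled)
```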

Note

basiccsv attempts to deduce the timestamp from the column headers, using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the column(s) containing the timestamp data and their format can be provided using parameters.

Usage

Available since yadg-4.0. The parser supports the following parameters:

pydantic model dgbowl_schemas.yadg.dataschema_5_0.step.BasicCSV

Customisable tabulated file parser.

JSON schema:
{
   "title": "BasicCSV",
   "description": "Customisable tabulated file parser.",
   "type": "object",
   "properties": {
      "tag": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Tag"
      },
      "parser": {
         "const": "basiccsv",
         "title": "Parser"
      },
      "input": {
         "$ref": "#/$defs/Input"
      },
      "extractor": {
         "$ref": "#/$defs/NoFileType"
      },
      "parameters": {
         "$ref": "#/$defs/Parameters"
      },
      "externaldate": {
         "anyOf": [
            {
               "$ref": "#/$defs/ExternalDate"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      }
   },
   "$defs": {
      "ExternalDate": {
         "additionalProperties": false,
         "description": "Supply timestamping information that are external to the processed file.",
         "properties": {
            "using": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/ExternalDateFile"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateFilename"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateISOString"
                  },
                  {
                     "$ref": "#/$defs/ExternalDateUTSOffset"
                  }
               ],
               "title": "Using"
            },
            "mode": {
               "default": "add",
               "enum": [
                  "add",
                  "replace"
               ],
               "title": "Mode",
               "type": "string"
            }
         },
         "required": [
            "using"
         ],
         "title": "ExternalDate",
         "type": "object"
      },
      "ExternalDateFile": {
         "additionalProperties": false,
         "description": "Read external date information from file.",
         "properties": {
            "file": {
               "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content"
            }
         },
         "required": [
            "file"
         ],
         "title": "ExternalDateFile",
         "type": "object"
      },
      "ExternalDateFilename": {
         "additionalProperties": false,
         "description": "Read external date information from the file name.",
         "properties": {
            "filename": {
               "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content"
            }
         },
         "required": [
            "filename"
         ],
         "title": "ExternalDateFilename",
         "type": "object"
      },
      "ExternalDateISOString": {
         "additionalProperties": false,
         "description": "Read a constant external date using an ISO-formatted string.",
         "properties": {
            "isostring": {
               "title": "Isostring",
               "type": "string"
            }
         },
         "required": [
            "isostring"
         ],
         "title": "ExternalDateISOString",
         "type": "object"
      },
      "ExternalDateUTSOffset": {
         "additionalProperties": false,
         "description": "Read a constant external date using a Unix timestamp offset.",
         "properties": {
            "utsoffset": {
               "title": "Utsoffset",
               "type": "number"
            }
         },
         "required": [
            "utsoffset"
         ],
         "title": "ExternalDateUTSOffset",
         "type": "object"
      },
      "Input": {
         "additionalProperties": false,
         "description": "Specification of input files/folders to be processed by the :class:`Step`.",
         "properties": {
            "folders": {
               "items": {
                  "type": "string"
               },
               "title": "Folders",
               "type": "array"
            },
            "prefix": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Prefix"
            },
            "suffix": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Suffix"
            },
            "contains": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Contains"
            },
            "exclude": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Exclude"
            }
         },
         "required": [
            "folders"
         ],
         "title": "Input",
         "type": "object"
      },
      "NoFileType": {
         "additionalProperties": false,
         "properties": {
            "filetype": {
               "const": "None",
               "default": "None",
               "title": "Filetype"
            },
            "timezone": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timezone"
            },
            "locale": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Locale"
            },
            "encoding": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Encoding"
            }
         },
         "title": "NoFileType",
         "type": "object"
      },
      "Parameters": {
         "additionalProperties": false,
         "properties": {
            "sep": {
               "default": ",",
               "title": "Sep",
               "type": "string"
            },
            "strip": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Strip"
            },
            "units": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Units"
            },
            "timestamp": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Timestamp"
                  },
                  {
                     "$ref": "#/$defs/TimeDate"
                  },
                  {
                     "$ref": "#/$defs/UTS"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Timestamp"
            }
         },
         "title": "Parameters",
         "type": "object"
      },
      "TimeDate": {
         "additionalProperties": false,
         "description": "Timestamp from a separate date and/or time column.",
         "properties": {
            "date": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TimestampSpec"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null
            },
            "time": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TimestampSpec"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null
            }
         },
         "title": "TimeDate",
         "type": "object"
      },
      "Timestamp": {
         "additionalProperties": false,
         "description": "Timestamp from a column containing a single timestamp string.",
         "properties": {
            "timestamp": {
               "$ref": "#/$defs/TimestampSpec"
            }
         },
         "required": [
            "timestamp"
         ],
         "title": "Timestamp",
         "type": "object"
      },
      "TimestampSpec": {
         "additionalProperties": false,
         "description": "Specification of the column index and string format of the timestamp.",
         "properties": {
            "index": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Index"
            },
            "format": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Format"
            }
         },
         "title": "TimestampSpec",
         "type": "object"
      },
      "UTS": {
         "additionalProperties": false,
         "description": "Timestamp from a column containing a Unix timestamp.",
         "properties": {
            "uts": {
               "$ref": "#/$defs/TimestampSpec"
            }
         },
         "required": [
            "uts"
         ],
         "title": "UTS",
         "type": "object"
      },
      "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": {
         "additionalProperties": false,
         "properties": {
            "path": {
               "title": "Path",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "type": "string"
            },
            "match": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Match"
            }
         },
         "required": [
            "path",
            "type"
         ],
         "title": "Content",
         "type": "object"
      },
      "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": {
         "additionalProperties": false,
         "properties": {
            "format": {
               "title": "Format",
               "type": "string"
            },
            "len": {
               "title": "Len",
               "type": "integer"
            }
         },
         "required": [
            "format",
            "len"
         ],
         "title": "Content",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "parser",
      "input"
   ]
}

Config:
  • extra: str = forbid
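To make the schema above more concrete, the following sketch writes out one step as a plain Python dict and checks the two required keys. All values (folder name, suffix, units, timezone, and the strptime-style format string) are hypothetical; only the key names are taken from the schema. Treating format as a strptime-style string is an assumption.

```python
import json

# Hypothetical step definition; key names follow the BasicCSV schema above.
step = {
    "parser": "basiccsv",
    "input": {"folders": ["data"], "suffix": "csv"},
    "extractor": {"timezone": "Europe/Berlin", "encoding": "utf-8"},
    "parameters": {
        "sep": ";",
        "units": {"T": "K", "flow": "ml/min"},
        # Assumed strptime-style format for a single timestamp column.
        "timestamp": {"timestamp": {"index": 0, "format": "%Y-%m-%d %H:%M:%S"}},
    },
}

# The schema lists "parser" and "input" as required.
missing = [key for key in ("parser", "input") if key not in step]
print(missing)
print(json.dumps(step, indent=2))
```

Because the model is configured with extra = forbid, any key not listed in the schema would be rejected during validation.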

pydantic model Parameters

JSON schema:
{
   "title": "Parameters",
   "type": "object",
   "properties": {
      "sep": {
         "default": ",",
         "title": "Sep",
         "type": "string"
      },
      "strip": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Strip"
      },
      "units": {
         "anyOf": [
            {
               "additionalProperties": {
                  "type": "string"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Units"
      },
      "timestamp": {
         "anyOf": [
            {
               "$ref": "#/$defs/Timestamp"
            },
            {
               "$ref": "#/$defs/TimeDate"
            },
            {
               "$ref": "#/$defs/UTS"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Timestamp"
      }
   },
   "$defs": {
      "TimeDate": {
         "additionalProperties": false,
         "description": "Timestamp from a separate date and/or time column.",
         "properties": {
            "date": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TimestampSpec"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null
            },
            "time": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TimestampSpec"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null
            }
         },
         "title": "TimeDate",
         "type": "object"
      },
      "Timestamp": {
         "additionalProperties": false,
         "description": "Timestamp from a column containing a single timestamp string.",
         "properties": {
            "timestamp": {
               "$ref": "#/$defs/TimestampSpec"
            }
         },
         "required": [
            "timestamp"
         ],
         "title": "Timestamp",
         "type": "object"
      },
      "TimestampSpec": {
         "additionalProperties": false,
         "description": "Specification of the column index and string format of the timestamp.",
         "properties": {
            "index": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Index"
            },
            "format": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Format"
            }
         },
         "title": "TimestampSpec",
         "type": "object"
      },
      "UTS": {
         "additionalProperties": false,
         "description": "Timestamp from a column containing a Unix timestamp.",
         "properties": {
            "uts": {
               "$ref": "#/$defs/TimestampSpec"
            }
         },
         "required": [
            "uts"
         ],
         "title": "UTS",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

field sep: str = ','

Separator of table columns.

field strip: str | None = None

A string of characters to strip from headers and data.
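The effect of strip can be sketched as follows, assuming the characters are removed from both ends of every cell in the way Python's str.strip does. The helper name strip_cells is hypothetical, and whether the parser applies the parameter exactly like this is an assumption.

```python
def strip_cells(row, strip=None):
    """Remove the given characters from both ends of each cell."""
    if strip is None:
        return list(row)
    return [cell.strip(strip) for cell in row]


cleaned = strip_cells(['"T"', '"298.1"'], strip='"')
print(cleaned)
```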

field units: Mapping[str, str] | None = None

A dict containing column: unit key-value pairs.

field timestamp: Timestamp | TimeDate | UTS | None = None

Timestamp specification allowing calculation of Unix timestamp for each table row.
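The conversion that a timestamp specification describes can be sketched with the standard library. The example assumes a strptime-style format string and treats the value as UTC for simplicity; the helper name to_uts is hypothetical, and the real parser additionally takes the configured timezone into account.

```python
from datetime import datetime, timezone


def to_uts(cell, fmt):
    """Parse a timestamp string and return the Unix timestamp, assuming UTC."""
    parsed = datetime.strptime(cell, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


uts = to_uts("2021-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
print(uts)  # 1609459200.0
```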

field parser: Literal['basiccsv'] [Required]
field parameters: Parameters [Optional]
field extractor: NoFileType [Optional]

Schema

The primary functionality of basiccsv is to load the tabular data, and determine the Unix timestamp. The headers of the tabular data are taken verbatim from the file, and appear as data_vars of the xarray.Dataset. The single coord for the data_vars is the deduced Unix timestamp, uts.

xr.Dataset:
  coords:
    uts:            !!float               # Unix timestamp
  data_vars:
    {{ headers }}:  (uts)                 # Populated from file headers

Module Functions

yadg.parsers.basiccsv.process(*, fn, encoding, locale, timezone, parameters, **kwargs)

A basic csv parser.

This parser processes a csv file. The header of the csv file consists of one or two lines, with the column headers in the first line and the units in the second. The parser also attempts to parse column names to produce a timestamp, and saves all other columns as floats or strings.

Parameters:
  • fn (str) – File to process

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

  • parameters (BaseModel) – Parameters for BasicCSV.

Returns:

No metadata is returned by the basiccsv parser. The full date might not be returned, e.g. when only the time is specified in the columns.

Return type:

xarray.Dataset

Submodules

yadg.parsers.basiccsv.main.process_row(headers, items, datefunc, datecolumns)

A function that processes a row of a table.

This is the main worker function of basiccsv, and it is often reused by other parsers that need to process tabular data.

Parameters:
  • headers (list) – A list of headers of the table.

  • items (list) – A list of values corresponding to the headers. Must be the same length as headers.

  • units – A dict for looking up the units corresponding to a certain header.

  • datefunc (Callable) – A function that will generate uts given a list of values.

  • datecolumns (list) – Column indices that need to be passed to datefunc to generate uts.

Returns:

A tuple of result dictionaries, with the first element containing the values and the second element containing the deviations of the values.

Return type:

vals, devs
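A much simplified sketch of such a row processor is given below. It is not the yadg implementation: the name sketch_process_row is hypothetical, datefunc is assumed to receive the selected cells as a single list, and the deviation is assumed to be one unit in the last written decimal place of each numeric cell.

```python
def sketch_process_row(headers, items, datefunc, datecolumns):
    """Return (vals, devs) dicts for one table row; simplified illustration."""
    assert len(headers) == len(items), "headers and items must match in length"
    vals, devs = {}, {}
    # Assumption: datefunc takes the date-related cells as one list.
    vals["uts"] = datefunc([items[i] for i in datecolumns])
    for i, (header, item) in enumerate(zip(headers, items)):
        if i in datecolumns:
            continue  # already consumed by datefunc
        try:
            value = float(item)
        except ValueError:
            vals[header] = item  # keep non-numeric cells as strings
            continue
        vals[header] = value
        # Assumption: deviation is one unit in the last written decimal place.
        _, _, decimals = item.partition(".")
        devs[header] = 10.0 ** -len(decimals)
    return vals, devs


vals, devs = sketch_process_row(
    ["t", "T", "note"], ["10.0", "12.50", "ok"], lambda cells: float(cells[0]), [0]
)
print(vals)
print(devs)
```

String-valued cells get no entry in devs in this sketch, since a deviation is only meaningful for numeric values.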

yadg.parsers.basiccsv.main.append_dicts(vals, devs, data, meta, fn=None, li=0)

Return type:

None

yadg.parsers.basiccsv.main.dicts_to_dataset(data, meta, units={}, fulldate=True)

Return type:

Dataset

yadg.parsers.basiccsv.main.process(*, fn, encoding, locale, timezone, parameters, **kwargs)

A basic csv parser.

This parser processes a csv file. The header of the csv file consists of one or two lines, with the column headers in the first line and the units in the second. The parser also attempts to parse column names to produce a timestamp, and saves all other columns as floats or strings.

Parameters:
  • fn (str) – File to process

  • encoding (str) – Encoding of fn, by default “utf-8”.

  • timezone (str) – A string description of the timezone. Default is “localtime”.

  • parameters (BaseModel) – Parameters for BasicCSV.

Returns:

No metadata is returned by the basiccsv parser. The full date might not be returned, e.g. when only the time is specified in the columns.

Return type:

xarray.Dataset