yadg.dgutils package
- yadg.dgutils.get_yadg_metadata()
Returns current yadg metadata.
- Return type:
dict
- yadg.dgutils.now(asstr=False, tz=datetime.timezone.utc)
Wrapper around datetime.now()
A convenience function for returning the current time as an ISO 8601 string or as a Unix timestamp.
- Return type:
Union[float, str]
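The two return forms can be illustrated with a minimal standard-library sketch. `now_sketch` is a hypothetical stand-in rather than yadg's implementation; it assumes that `asstr=True` selects the ISO 8601 string and `asstr=False` the Unix timestamp.

```python
import datetime


def now_sketch(asstr=False, tz=datetime.timezone.utc):
    """Hypothetical stand-in mirroring the documented signature of now()."""
    dt = datetime.datetime.now(tz=tz)
    if asstr:
        # ISO 8601 string including the UTC offset of `tz`
        return dt.isoformat()
    # Unix timestamp: seconds elapsed since 1970-01-01T00:00:00 UTC
    return dt.timestamp()


uts = now_sketch()            # a float
iso = now_sketch(asstr=True)  # a str
```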
- yadg.dgutils.infer_timestamp_from(*, headers=None, spec=None, timezone)
Convenience function for timestamping
Given a set of headers, and an optional specification, return an array containing column indices from which a timestamp in a given row can be computed, as well as the function which will compute the timestamp given the returned array.
- Parameters:
headers (Optional[list]) – An array of strings. If spec is not supplied, must contain either “uts” (float) or “timestamp” (str, conforming to ISO 8601).
spec (Optional[TimestampSpec]) – A specification of timestamp elements with associated column indices and optional formats. Currently accepted combinations of keys are: “uts”; “timestamp”; “date” and / or “time”.
timezone – Timezone to use for conversion. By default, UTC is used.
- Returns:
(datecolumns, datefunc, fulldate) – A tuple containing a list of indices of columns, a Callable to which the columns have to be passed to obtain a uts timestamp, and whether the determined timestamp is full or partial.
- Return type:
tuple[list, Callable, bool]
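A rough sketch of the header-only case (no spec) may help to show how the returned tuple is meant to be used. `infer_from_headers` is a hypothetical helper, not yadg's code; it assumes the ISO 8601 column is named "timestamp", that naive strings are in UTC, and that the callable receives the selected columns as positional arguments.

```python
from datetime import datetime, timezone


def infer_from_headers(headers):
    """Hypothetical helper: handles only the header-based case (no spec)."""
    if "uts" in headers:
        # The column already holds a Unix timestamp; a float cast is enough.
        return [headers.index("uts")], float, True
    if "timestamp" in headers:
        # The column holds an ISO 8601 string; naive values are treated as UTC.
        def iso_to_uts(value):
            dt = datetime.fromisoformat(value)
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)
            return dt.timestamp()

        return [headers.index("timestamp")], iso_to_uts, True
    raise ValueError("headers must contain 'uts' or 'timestamp'")


headers = ["index", "timestamp", "voltage"]
row = ["0", "2021-12-31T13:45:00", "1.5"]
datecolumns, datefunc, fulldate = infer_from_headers(headers)
uts = datefunc(*[row[i] for i in datecolumns])  # 1640958300.0
```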
- yadg.dgutils.ole_to_uts(ole_timestamp, timezone)
Converts a Microsoft OLE timestamp into a POSIX timestamp.
The OLE automation date format is a floating point value, counting days since midnight 30 December 1899. Hours and minutes are represented as fractional days.
https://devblogs.microsoft.com/oldnewthing/20030905-02/?p=42653
- Parameters:
ole_timestamp (float) – A timestamp in Microsoft OLE format.
timezone (ZoneInfo) – String describing the timezone.
- Returns:
time – The corresponding Unix timestamp.
- Return type:
float
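The conversion described above amounts to adding a number of fractional days to the OLE epoch. The following is a sketch under stated assumptions, not yadg's implementation: `ole_to_uts_sketch` is a hypothetical name, and `datetime.timezone.utc` is used instead of a `ZoneInfo` object so that the example does not depend on a system timezone database.

```python
from datetime import datetime, timedelta, timezone


def ole_to_uts_sketch(ole_timestamp, tz=timezone.utc):
    """Hypothetical helper: OLE automation date -> Unix timestamp."""
    # OLE epoch: midnight, 30 December 1899, interpreted in timezone `tz`.
    ole_epoch = datetime(1899, 12, 30, tzinfo=tz)
    # The fractional part of the OLE value encodes the time of day.
    return (ole_epoch + timedelta(days=ole_timestamp)).timestamp()


# 25569 days after the OLE epoch is 1970-01-01; the extra 0.5 day is noon.
uts = ole_to_uts_sketch(25569.5)  # 43200.0
```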
- yadg.dgutils.complete_timestamps(*, timesteps, fn, spec, timezone)
Timestamp completing function.
This function allows for completing or overriding the uts timestamps determined by the individual parsers. yadg enters this function for any parser which does not return a full timestamp, as well as when the externaldate specification is supplied by the user.
The externaldate specification is as follows:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.externaldate.ExternalDate
Supply timestamping information that is external to the processed file.
JSON schema:
{ "title": "ExternalDate", "description": "Supply timestamping information that are external to the processed file.", "type": "object", "properties": { "using": { "anyOf": [ { "$ref": "#/$defs/ExternalDateFile" }, { "$ref": "#/$defs/ExternalDateFilename" }, { "$ref": "#/$defs/ExternalDateISOString" }, { "$ref": "#/$defs/ExternalDateUTSOffset" } ], "title": "Using" }, "mode": { "default": "add", "enum": [ "add", "replace" ], "title": "Mode", "type": "string" } }, "$defs": { "ExternalDateFile": { "additionalProperties": false, "description": "Read external date information from file.", "properties": { "file": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content" } }, "required": [ "file" ], "title": "ExternalDateFile", "type": "object" }, "ExternalDateFilename": { "additionalProperties": false, "description": "Read external date information from the file name.", "properties": { "filename": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content" } }, "required": [ "filename" ], "title": "ExternalDateFilename", "type": "object" }, "ExternalDateISOString": { "additionalProperties": false, "description": "Read a constant external date using an ISO-formatted string.", "properties": { "isostring": { "title": "Isostring", "type": "string" } }, "required": [ "isostring" ], "title": "ExternalDateISOString", "type": "object" }, "ExternalDateUTSOffset": { "additionalProperties": false, "description": "Read a constant external date using a Unix timestamp offset.", "properties": { "utsoffset": { "title": "Utsoffset", "type": "number" } }, "required": [ "utsoffset" ], "title": "ExternalDateUTSOffset", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": { "additionalProperties": false, "properties": { "path": { "title": "Path", "type": "string" }, "type": { "title": "Type", "type": "string" }, "match": { "anyOf": [ { "type": "string" }, { 
"type": "null" } ], "default": null, "title": "Match" } }, "required": [ "path", "type" ], "title": "Content", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": { "additionalProperties": false, "properties": { "format": { "title": "Format", "type": "string" }, "len": { "title": "Len", "type": "integer" } }, "required": [ "format", "len" ], "title": "Content", "type": "object" } }, "additionalProperties": false, "required": [ "using" ] }
- Config:
extra: str = forbid
- field using: ExternalDateFile | ExternalDateFilename | ExternalDateISOString | ExternalDateUTSOffset [Required]
Specification of the external date format.
- field mode: Literal['add', 'replace'] = 'add'
Whether the external timestamps should be added to or should replace the parsed data.
The using key specifies how an external timestamp is created. Only one entry in using is permitted. By default, this entry is:
using:
  filename:
    format: "%Y-%m-%d-%H-%M-%S"
    len: 19
This means the code will attempt to deduce the timestamp from the path of the processed file (fn), using the first 19 characters of the base filename according to the above format (e.g. “2021-12-31-13-45-00”).
If file is specified, the handling of timestamps is handed off to timestamps_from_file().
The mode key specifies whether the offsets determined in this function are added to the current timestamps (e.g. a date offset being added to a time) or whether they should replace the existing timestamps completely.
As a measure of last resort, the mtime of fn is used. mtime is preferred to ctime, as the former has more consistent cross-platform behaviour.
- Parameters:
timesteps (list) – A list of timesteps generated from a single file, fn.
fn (str) – Filename used to create timesteps.
spec (ExternalDate) – The externaldate specification part of the schema.
timezone (ZoneInfo) – Timezone, defaults to “UTC”.
- Return type:
list[float]
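The default filename-based behaviour and the two modes can be sketched as follows. This is a simplified illustration, not yadg's code: `uts_from_filename` and `apply_offset` are hypothetical helpers, the timesteps are reduced to plain floats, the timezone is assumed to be UTC, and the file path is invented for the example.

```python
import os
from datetime import datetime, timezone


def uts_from_filename(fn, fmt="%Y-%m-%d-%H-%M-%S", length=19, tz=timezone.utc):
    """Parse the first `length` characters of the base filename using `fmt`."""
    stem = os.path.basename(fn)[:length]
    return datetime.strptime(stem, fmt).replace(tzinfo=tz).timestamp()


def apply_offset(timestamps, offset, mode="add"):
    """Add the external offset to each timestamp, or replace each one by it."""
    if mode == "add":
        return [t + offset for t in timestamps]
    # Simplification: "replace" with a single external value.
    return [offset for _ in timestamps]


offset = uts_from_filename("data/2021-12-31-13-45-00-run1.csv")  # 1640958300.0
completed = apply_offset([0.0, 1.5, 3.0], offset, mode="add")
```

With mode set to "add", timestamps that only carry a time of day are shifted by the date taken from the filename.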
- yadg.dgutils.update_schema(object)
Yadg’s update worker function.
This is the main function called when yadg is executed as yadg update. The main idea is to allow a simple update pathway from older versions of schema and datagram files to the current latest and greatest.
Currently supports:
  - updating DataSchema version 3.1 to 4.0 using routines in yadg
  - updating DataSchema version 4.0 and above to the latest DataSchema
- Parameters:
object (Union[list, dict]) – The object to be updated.
- Returns:
newobj – The updated and validated “datagram” or “schema”.
- Return type:
dict
- yadg.dgutils.schema_from_preset(preset, folder)
- Return type:
- yadg.dgutils.read_value(data, offset, dtype, encoding='windows-1252')
Reads a single value or a set of values from a buffer at a certain offset.
Just a handy wrapper for np.frombuffer(…, count=1), with the added benefit of allowing the ‘pascal’ keyword as an indicator for a length-prefixed string.
The read value is converted to a built-in datatype using np.dtype.item().
- Parameters:
data (bytes) – An object that exposes the buffer interface. Here always bytes.
offset (int) – Start reading the buffer from this offset (in bytes).
dtype (Union[dtype, str]) – Data-type to read in.
encoding (str) – The encoding of the bytes to be converted.
- Returns:
The unpacked and converted value from the buffer.
- Return type:
Any
- yadg.dgutils.sanitize_units(units)
Unit sanitizer.
This sanitizer should be used where user-supplied units are likely to occur, such as in the yadg.parsers.basiccsv parser. Currently, only two replacements are done:
  - “Bar” is replaced with “bar”
  - “Deg C” is replaced with “degC”
Use with caution.
- Parameters:
units (Union[str, dict[str, str], list[str]]) – Object containing string units.
- Return type:
Union[str, dict[str, str], list[str]]
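A minimal sketch of such a sanitizer, covering the three accepted input shapes, could look like this. `sanitize_sketch` is a hypothetical name, and plain substring replacement is an assumption about how the two substitutions are applied.

```python
# The two documented replacements.
REPLACEMENTS = {"Bar": "bar", "Deg C": "degC"}


def sanitize_sketch(units):
    """Hypothetical helper: apply the replacements to a str, dict or list."""
    if isinstance(units, str):
        for old, new in REPLACEMENTS.items():
            units = units.replace(old, new)
        return units
    if isinstance(units, dict):
        # Only the values are unit strings; keys are left untouched.
        return {key: sanitize_sketch(value) for key, value in units.items()}
    if isinstance(units, list):
        return [sanitize_sketch(unit) for unit in units]
    raise TypeError(f"unsupported type: {type(units).__name__}")


single = sanitize_sketch("Deg C")                      # "degC"
mapping = sanitize_sketch({"p": "Bar", "T": "Deg C"})  # {"p": "bar", "T": "degC"}
listing = sanitize_sketch(["Bar", "V"])                # ["bar", "V"]
```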
Submodules
- yadg.dgutils.btools.read_pascal_string(pascal_bytes, encoding='windows-1252')
Parses a length-prefixed string given some encoding.
- Parameters:
pascal_bytes (bytes) – The bytes of the string starting at the length-prefix byte.
encoding (str) – The encoding of the string to be converted.
- Returns:
The string decoded from the input bytes.
- Return type:
str
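Decoding a length-prefixed ("Pascal") string can be sketched in a few lines. `read_pascal_string_sketch` is a hypothetical stand-in, not yadg's implementation; it assumes the length prefix is a single unsigned byte, and the input bytes are invented for the example.

```python
def read_pascal_string_sketch(pascal_bytes, encoding="windows-1252"):
    """Hypothetical helper: decode a string prefixed by a one-byte length."""
    # The first byte holds the number of bytes in the string that follows.
    length = pascal_bytes[0]
    # Anything after the declared length is ignored.
    return pascal_bytes[1 : 1 + length].decode(encoding)


raw = bytes([5]) + "yadg!".encode("windows-1252") + b"trailing"
text = read_pascal_string_sketch(raw)  # "yadg!"
```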
- yadg.dgutils.btools.read_value(data, offset, dtype, encoding='windows-1252')
Reads a single value or a set of values from a buffer at a certain offset.
Just a handy wrapper for np.frombuffer(…, count=1), with the added benefit of allowing the ‘pascal’ keyword as an indicator for a length-prefixed string.
The read value is converted to a built-in datatype using np.dtype.item().
- Parameters:
data (bytes) – An object that exposes the buffer interface. Here always bytes.
offset (int) – Start reading the buffer from this offset (in bytes).
dtype (Union[dtype, str]) – Data-type to read in.
encoding (str) – The encoding of the bytes to be converted.
- Returns:
The unpacked and converted value from the buffer.
- Return type:
Any
- yadg.dgutils.dateutils.now(asstr=False, tz=datetime.timezone.utc)
Wrapper around datetime.now()
A convenience function for returning the current time as an ISO 8601 string or as a Unix timestamp.
- Return type:
Union[float, str]
- yadg.dgutils.dateutils.ole_to_uts(ole_timestamp, timezone)
Converts a Microsoft OLE timestamp into a POSIX timestamp.
The OLE automation date format is a floating point value, counting days since midnight 30 December 1899. Hours and minutes are represented as fractional days.
https://devblogs.microsoft.com/oldnewthing/20030905-02/?p=42653
- Parameters:
ole_timestamp (float) – A timestamp in Microsoft OLE format.
timezone (ZoneInfo) – String describing the timezone.
- Returns:
time – The corresponding Unix timestamp.
- Return type:
float
- yadg.dgutils.dateutils.str_to_uts(*, timestamp, timezone, format=None, strict=True)
Converts a string to POSIX timestamp.
If the optional format is specified, the timestamp string is processed using the datetime.datetime.strptime() function; if no format is supplied, an ISO 8601 format is assumed and an attempt to parse using dateutil.parser.parse() is made.
- Parameters:
timestamp (str) – A string containing the timestamp.
format (Optional[str]) – Optional format string for parsing of the timestamp.
timezone (ZoneInfo) – Optional timezone of the timestamp. By default, “UTC”.
strict (bool) – Whether to re-raise any parsing errors.
- Returns:
uts – Returns the POSIX timestamp if successful, otherwise None.
- Return type:
Union[float, None]
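The two parsing paths and the effect of strict can be sketched with the standard library alone. `str_to_uts_sketch` is a hypothetical stand-in: it uses `datetime.fromisoformat()` in place of `dateutil.parser.parse()`, names the format argument `fmt`, and assumes that naive timestamps are interpreted in the supplied timezone.

```python
from datetime import datetime, timezone


def str_to_uts_sketch(*, timestamp, tz=timezone.utc, fmt=None, strict=True):
    """Hypothetical helper: parse a string into a Unix timestamp."""
    try:
        if fmt is not None:
            dt = datetime.strptime(timestamp, fmt)
        else:
            # Stand-in for dateutil.parser.parse(); ISO 8601 strings only.
            dt = datetime.fromisoformat(timestamp)
    except ValueError:
        if strict:
            raise
        return None
    if dt.tzinfo is None:
        # Naive timestamps are assumed to be in `tz`.
        dt = dt.replace(tzinfo=tz)
    return dt.timestamp()


a = str_to_uts_sketch(timestamp="2021-12-31T13:45:00")
b = str_to_uts_sketch(timestamp="31.12.2021 13:45", fmt="%d.%m.%Y %H:%M")
c = str_to_uts_sketch(timestamp="not a date", strict=False)  # None
```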
- yadg.dgutils.dateutils.infer_timestamp_from(*, headers=None, spec=None, timezone)
Convenience function for timestamping
Given a set of headers, and an optional specification, return an array containing column indices from which a timestamp in a given row can be computed, as well as the function which will compute the timestamp given the returned array.
- Parameters:
headers (Optional[list]) – An array of strings. If spec is not supplied, must contain either “uts” (float) or “timestamp” (str, conforming to ISO 8601).
spec (Optional[TimestampSpec]) – A specification of timestamp elements with associated column indices and optional formats. Currently accepted combinations of keys are: “uts”; “timestamp”; “date” and / or “time”.
timezone – Timezone to use for conversion. By default, UTC is used.
- Returns:
(datecolumns, datefunc, fulldate) – A tuple containing a list of indices of columns, a Callable to which the columns have to be passed to obtain a uts timestamp, and whether the determined timestamp is full or partial.
- Return type:
tuple[list, Callable, bool]
- yadg.dgutils.dateutils.complete_timestamps(*, timesteps, fn, spec, timezone)
Timestamp completing function.
This function allows for completing or overriding the uts timestamps determined by the individual parsers. yadg enters this function for any parser which does not return a full timestamp, as well as when the externaldate specification is supplied by the user.
The externaldate specification is as follows:
- pydantic model dgbowl_schemas.yadg.dataschema_5_0.externaldate.ExternalDate
Supply timestamping information that is external to the processed file.
JSON schema:
{ "title": "ExternalDate", "description": "Supply timestamping information that are external to the processed file.", "type": "object", "properties": { "using": { "anyOf": [ { "$ref": "#/$defs/ExternalDateFile" }, { "$ref": "#/$defs/ExternalDateFilename" }, { "$ref": "#/$defs/ExternalDateISOString" }, { "$ref": "#/$defs/ExternalDateUTSOffset" } ], "title": "Using" }, "mode": { "default": "add", "enum": [ "add", "replace" ], "title": "Mode", "type": "string" } }, "$defs": { "ExternalDateFile": { "additionalProperties": false, "description": "Read external date information from file.", "properties": { "file": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content" } }, "required": [ "file" ], "title": "ExternalDateFile", "type": "object" }, "ExternalDateFilename": { "additionalProperties": false, "description": "Read external date information from the file name.", "properties": { "filename": { "$ref": "#/$defs/dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content" } }, "required": [ "filename" ], "title": "ExternalDateFilename", "type": "object" }, "ExternalDateISOString": { "additionalProperties": false, "description": "Read a constant external date using an ISO-formatted string.", "properties": { "isostring": { "title": "Isostring", "type": "string" } }, "required": [ "isostring" ], "title": "ExternalDateISOString", "type": "object" }, "ExternalDateUTSOffset": { "additionalProperties": false, "description": "Read a constant external date using a Unix timestamp offset.", "properties": { "utsoffset": { "title": "Utsoffset", "type": "number" } }, "required": [ "utsoffset" ], "title": "ExternalDateUTSOffset", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFile__Content": { "additionalProperties": false, "properties": { "path": { "title": "Path", "type": "string" }, "type": { "title": "Type", "type": "string" }, "match": { "anyOf": [ { "type": "string" }, { 
"type": "null" } ], "default": null, "title": "Match" } }, "required": [ "path", "type" ], "title": "Content", "type": "object" }, "dgbowl_schemas__yadg__dataschema_5_0__externaldate__ExternalDateFilename__Content": { "additionalProperties": false, "properties": { "format": { "title": "Format", "type": "string" }, "len": { "title": "Len", "type": "integer" } }, "required": [ "format", "len" ], "title": "Content", "type": "object" } }, "additionalProperties": false, "required": [ "using" ] }
- Config:
extra: str = forbid
- field using: ExternalDateFile | ExternalDateFilename | ExternalDateISOString | ExternalDateUTSOffset [Required]
Specification of the external date format.
- field mode: Literal['add', 'replace'] = 'add'
Whether the external timestamps should be added to or should replace the parsed data.
The using key specifies how an external timestamp is created. Only one entry in using is permitted. By default, this entry is:
using:
  filename:
    format: "%Y-%m-%d-%H-%M-%S"
    len: 19
This means the code will attempt to deduce the timestamp from the path of the processed file (fn), using the first 19 characters of the base filename according to the above format (e.g. “2021-12-31-13-45-00”).
If file is specified, the handling of timestamps is handed off to timestamps_from_file().
The mode key specifies whether the offsets determined in this function are added to the current timestamps (e.g. a date offset being added to a time) or whether they should replace the existing timestamps completely.
As a measure of last resort, the mtime of fn is used. mtime is preferred to ctime, as the former has more consistent cross-platform behaviour.
- Parameters:
timesteps (list) – A list of timesteps generated from a single file, fn.
fn (str) – Filename used to create timesteps.
spec (ExternalDate) – The externaldate specification part of the schema.
timezone (ZoneInfo) – Timezone, defaults to “UTC”.
- Return type:
list[float]
- yadg.dgutils.dateutils.timestamps_from_file(path, type, match, timezone)
Load timestamps from file.
This function enables loading timestamps from the file specified by path. The currently supported file formats include json and pkl, which must contain a top-level Mapping with a key that is matched by match, or a top-level Iterable, both containing str or float-like objects that can be processed into a Unix timestamp.
- Parameters:
path (str) – Location of the external file.
type (str) – Type of the external file. Currently, "json" and "pkl" are supported.
match (str) – An optional key to match if the object in path is a Mapping.
timezone (ZoneInfo) – An optional timezone string, defaults to “UTC”.
- Returns:
parseddata – A single or a list of POSIX timestamps.
- Return type:
Union[float, list[float]]
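For the json case, the Mapping and Iterable handling can be sketched roughly as below. This is not yadg's implementation: `timestamps_from_json` and `to_uts` are hypothetical helpers, match is assumed to be compared to the keys by exact equality, naive ISO 8601 strings are assumed to be in UTC, and the file content is invented for the example.

```python
import json
import os
import tempfile
from collections.abc import Mapping
from datetime import datetime, timezone


def to_uts(value, tz=timezone.utc):
    """Convert a float-like value or an ISO 8601 string to a Unix timestamp."""
    try:
        return float(value)
    except ValueError:
        dt = datetime.fromisoformat(value)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=tz)
        return dt.timestamp()


def timestamps_from_json(path, match=None):
    """Hypothetical helper: load timestamps from a JSON file."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, Mapping):
        # Assumption: `match` selects the key by exact equality.
        data = data[match]
    if isinstance(data, (str, int, float)):
        return to_uts(data)
    return [to_uts(value) for value in data]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dates.json")
    with open(path, "w") as f:
        json.dump({"uts": [0.0, "60", "1970-01-01T00:02:00"]}, f)
    parsed = timestamps_from_json(path, match="uts")  # [0.0, 60.0, 120.0]
```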
- yadg.dgutils.helpers.get_yadg_metadata()
Returns current yadg metadata.
- Return type:
dict
- yadg.dgutils.helpers.deprecated(arg, depin='4.2', depout='5.0')
- Return type:
None
pint compatibility functions in yadg.
This package defines ureg, a pint.UnitRegistry used for validation of datagrams in yadg. The default SI pint.UnitRegistry is extended by definitions of fractional quantities (%, ppm, etc.), standard volumetric quantities (smL/min, sccm), and other dimensionless “units” present in several file types.
- yadg.dgutils.pintutils.sanitize_units(units)
Unit sanitizer.
This sanitizer should be used where user-supplied units are likely to occur, such as in the yadg.parsers.basiccsv parser. Currently, only two replacements are done:
  - “Bar” is replaced with “bar”
  - “Deg C” is replaced with “degC”
Use with caution.
- Parameters:
units (Union[str, dict[str, str], list[str]]) – Object containing string units.
- Return type:
Union[str, dict[str, str], list[str]]
- yadg.dgutils.utils.calib_3to4(oldcal, caltype)
- Return type:
dict
- yadg.dgutils.utils.schema_3to4(oldschema)
- Return type:
dict
- yadg.dgutils.utils.update_schema(object)
Yadg’s update worker function.
This is the main function called when yadg is executed as yadg update. The main idea is to allow a simple update pathway from older versions of schema and datagram files to the current latest and greatest.
Currently supports:
  - updating DataSchema version 3.1 to 4.0 using routines in yadg
  - updating DataSchema version 4.0 and above to the latest DataSchema
- Parameters:
object (Union[list, dict]) – The object to be updated.
- Returns:
newobj – The updated and validated “datagram” or “schema”.
- Return type:
dict
- yadg.dgutils.utils.schema_from_preset(preset, folder)
- Return type: