Developer documentation

The project follows fairly standard developer practices. Every new feature should be accompanied by a test, and every PR should be formatted using an automated formatter.

Testing

Tests are located in the tests folder of the repository, and are executed using pytest for every commit in every PR.

If a new test requires additional data (input files, schemas, etc.), these files can be placed in a folder named after the test module (that is, test_drycal.py has its test files in the test_drycal folder), or in the common folder for files that may be reused multiple times.
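The naming convention can be sketched as a pair of small path helpers. This is only an illustration: the names data_dir_for and common_dir_for are hypothetical and are not part of utils.py, and the sketch assumes that the common folder sits in the same directory as the test modules.

```python
from pathlib import Path


def data_dir_for(test_module: str) -> Path:
    """Return the data folder belonging to a test module.

    Hypothetical helper: tests/test_drycal.py -> tests/test_drycal
    """
    # The data folder shares the module's name, minus the ".py" suffix.
    return Path(test_module).with_suffix("")


def common_dir_for(test_module: str) -> Path:
    """Return the shared folder, assumed to sit next to the test module."""
    return Path(test_module).parent / "common"
```

For example, data_dir_for("tests/test_drycal.py") points at tests/test_drycal, while common_dir_for("tests/test_drycal.py") points at tests/common.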

Several convenience functions are provided in the utils.py module:

  • datadir, which implements the external test data functionality described above,

  • datagram_from_input, which wraps a simple dict-based test input into a schema that is then parsed into a datagram,

  • standard_datagram_test, which checks the validity of the returned datagram,

  • compare_result_dicts, which compares a reference dictionary in the {"n": float, "s": float, "u": str} format with the corresponding one in a datagram.
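To illustrate the kind of check the last helper performs, here is a minimal sketch of comparing two dictionaries in the {"n": float, "s": float, "u": str} format. It is not the utils.py implementation: the function name compare_nsu and the rel_tol parameter are hypothetical, and the sketch assumes that "n" is the nominal value, "s" its uncertainty, and "u" the unit.

```python
import math


def compare_nsu(ref: dict, ret: dict, rel_tol: float = 1e-6) -> bool:
    """Compare two {"n": float, "s": float, "u": str} dictionaries.

    Hypothetical stand-in for illustration only.
    """
    # Units are strings and must match exactly.
    if ref["u"] != ret["u"]:
        return False
    # The two float entries are compared with a relative tolerance.
    return all(
        math.isclose(ref[key], ret[key], rel_tol=rel_tol) for key in ("n", "s")
    )
```

A tolerance-based comparison is used for the float entries because parsed values rarely reproduce a hand-written reference bit for bit.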

Formatting

All files should be formatted by black. Lines containing text fields, including docstrings, should be between 80 and 88 characters in length. Imports of functions should be absolute, that is, including the yadg. prefix.

Implementing new parsers

New parsers should be implemented by:

  • adding their schema into dgbowl_schemas.yadg.DataSchema

  • adding their implementation in a separate Python package under yadg.parsers

Generally, specific filetype parsers should be kept separate from the main parser function in the module.
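One way to keep that separation is for the main parser function to do nothing more than select and call a filetype-specific function. The sketch below is an assumption about how this could look, not the layout of any existing parser: all names (process, process_csv, process_binary, FILETYPES) are hypothetical, and in a real package the filetype-specific functions would live in their own modules.

```python
def process_csv(fn: str) -> dict:
    """Hypothetical filetype-specific parser for CSV files."""
    return {"fn": fn, "filetype": "csv"}


def process_binary(fn: str) -> dict:
    """Hypothetical filetype-specific parser for a binary format."""
    return {"fn": fn, "filetype": "binary"}


# Map of supported filetypes to their dedicated parser functions.
FILETYPES = {"csv": process_csv, "binary": process_binary}


def process(fn: str, filetype: str) -> dict:
    """Main parser function: only selects and calls the filetype parser."""
    try:
        parser = FILETYPES[filetype]
    except KeyError:
        raise ValueError(f"Unsupported filetype: {filetype!r}") from None
    return parser(fn)
```

With this layout, supporting a new file type means adding one function and one dictionary entry, without touching the main function.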

Documentation

Each parser should be documented by adding a structured docstring into the __init__.py file of the parser module. This documentation should describe the application and usage of the parser, and refer to the Pydantic autodocs via DataSchema to discuss the features exposed via the parameters dictionary. Finally, it should include a short summary of the quantities provided in the "raw" and "derived" entries, and state whether any "metadata" are exposed.

Each file type of each parser should be documented in a top-level docstring in the relevant module. If the file is binary, a description of the file structure should be provided in the docstring.