Developer documentation

The project follows fairly standard developer practices. Every new feature should be associated with a test, and every PR requires linting using flake8 and formatting using black.

Testing

Tests are located in the tests folder of the repository, and are executed using pytest for every commit in every PR.

If a new test requires additional data (input files, schemas, etc.), they can be placed in a folder using the name of the test module (that is, test_drycal.py has its test files in test_drycal folder), or in the common folder for files that may be reused multiple times.

Formatting

All files should be formatted by black. Lines containing text fields, including docstrings, should be between 80-88 characters in length. Imports of functions should be absolute, that is including the yadg. prefix.

Implementing new features

New parsers should be implemented by:

adding their schema into dgbowl_schemas.yadg.dataschema.DataSchema
adding their implementation in a separate Python package under yadg.parsers

Each parser should be documented by adding a structured docstring into the __init__.py file of each new sub-module of yadg.parsers. This documentation should describe the application and usage of the parser, and refer to the Pydantic audotocs via DataSchema to discuss the features exposed via the parameters dictionary. Generally, code implementing the parsing of specific filetypes should be kept separate from the main parser function in the module.

New filetypes should be implemented as sub-modules of their parser. They should be documented using a top-level docstring in the relevant sub-module. If the filetype is binary, a description of the file structure should be provided in the docstring. Every new filetype will have to be added into the filetype module as well.

New extractors can be registered using a shim in the yadg.extractors module, referring to the filetype. The __init__.py of each extractor should expose:

an extract() function which returns an xarray.Dataset,
a set named supports, enumerating all filetypes that can be extracted by the new extractor.

Note that a new extractor requires its filetype to be added in the filetype module as well.