Developer documentation
The project follows fairly standard developer practices. Every new feature should be associated with a test, and every PR requires linting using flake8 and formatting using black.
Testing
Tests are located in the tests folder of the repository, and are executed using pytest for every commit in every PR.
If a new test requires additional data (input files, schemas, etc.), the files can be placed in a folder named after the test module (for example, test_drycal.py has its test files in the test_drycal folder), or in the common folder if they may be reused by multiple tests.
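The folder convention above can be sketched with a small path helper. This is only an illustration: the helper names data_dir_for and common_dir_for are hypothetical and not part of yadg, and the sketch assumes that the common folder sits next to the test modules.

```python
from pathlib import Path


def data_dir_for(test_module_file: str) -> Path:
    """Return the data folder named after the test module (hypothetical helper)."""
    module_path = Path(test_module_file)
    # tests/test_drycal.py -> tests/test_drycal
    return module_path.parent / module_path.stem


def common_dir_for(test_module_file: str) -> Path:
    """Return the shared data folder, assumed to sit beside the test modules."""
    return Path(test_module_file).parent / "common"
```

Inside a test module, such helpers would typically be called with __file__, so that the data is found regardless of the directory pytest is started from.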
Formatting
All files should be formatted by black. Lines containing text fields, including docstrings, should be between 80 and 88 characters in length. Imports of functions should be absolute, that is, they should include the yadg. prefix.
Implementing new features
New parsers should be implemented by:
adding their schema into dgbowl_schemas.yadg.dataschema.DataSchema,
adding their implementation in a separate Python package under yadg.parsers.
Each parser should be documented by adding a structured docstring into the __init__.py file of each new sub-module of yadg.parsers. This documentation should describe the application and usage of the parser, and refer to the Pydantic autodocs via DataSchema to discuss the features exposed via the parameters dictionary. Generally, code implementing the parsing of specific filetypes should be kept separate from the main parser function in the module.
New filetypes should be implemented as sub-modules of their parser. They should be documented using a top-level docstring in the relevant sub-module. If the filetype is binary, a description of the file structure should be provided in the docstring. Every new filetype must also be added to the filetype module.
New extractors can be registered using a shim in the yadg.extractors module, referring to the filetype. The __init__.py of each extractor should expose:
an extract() function which returns an xarray.Dataset,
a set named supports, enumerating all filetypes that can be extracted by the new extractor.
Note that a new extractor requires its filetype to be added to the filetype module as well.
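The extractor interface described above could look roughly like the following __init__.py. This is a minimal sketch under stated assumptions, not the actual yadg implementation: the extract() signature, the placeholder filetype name in supports, and the one-number-per-line input format are all hypothetical; only the names extract and supports and the xarray.Dataset return type come from the text above.

```python
# Hypothetical __init__.py of a new extractor.
import xarray as xr

# Filetypes handled by this extractor; "example.txt" is a placeholder name.
supports = {"example.txt"}


def extract(fn: str, **kwargs) -> xr.Dataset:
    """Read a text file with one number per line into an xarray.Dataset.

    The signature is an assumption made for this sketch; extra keyword
    arguments are accepted and ignored.
    """
    with open(fn, encoding="utf-8") as infile:
        values = [float(line) for line in infile if line.strip()]
    return xr.Dataset(
        data_vars={"value": (["index"], values)},
        coords={"index": list(range(len(values)))},
    )
```

Keeping supports as a plain set makes it cheap to check whether a requested filetype can be handled before extract() is called.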