What is a schema

A schema is an object defining the files and folders to be processed by yadg, as well as the types of parsers and the parser options to be applied. One can think of a schema as a representation of a single experiment, containing measurements from multiple sources (or devices) and following a succession of experimental steps.

TODO

https://gitlab.empa.ch/krpe/yadg/-/issues/8

The schema syntax will change from a JSON dictionary to a YAML file in version 5.0.

An example is a simple catalytic test with a temperature ramp. The goal of such an experiment may be to measure the catalytic conversion as a function of temperature, and then calculate the activation energy of the catalytic reaction. The monitored devices and their filetypes are:

the inlet flow and pressure meters -> csv data in foo.csv
the temperature controller -> csv data in bar.csv
the gas chromatograph -> Fusion json data in ./GC/ folder

Although these three devices measure concurrently, three separate steps have to be specified in the schema to process all relevant output files:

{
    "metadata": {"provenance": "manual", "schema_version": "1.0"},
    "steps": [{
        "parser": "basiccsv",
        "import": {"files": ["foo.csv"]},
        "tag": "flow",
        "parameters": {}
    },{
        "parser": "basiccsv",
        "import": {"files": ["bar.csv"]}
    },{
        "parser": "chromtrace",
        "import": {"folders": ["./GC/"]},
        "parameters": {"tracetype": "fusion"}
    }]
}

A valid schema is therefore a (dict) with two top-level entries: "metadata", which describes the schema provenance and version, and "steps", a (list) containing the definitions of the experimental steps. Each step within the schema is a (dict). In each step, the entries "parser" and "import" have to be specified; they tell yadg which parser to use and which files or folders to process, respectively.
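The layout described above can be checked with a few lines of Python. This is only a minimal sketch of the structural rules stated in the text; the function name check_schema_structure is hypothetical and is not part of yadg, whose own validator should be preferred in practice.

```python
def check_schema_structure(schema):
    """Return a list of problems; an empty list means the basic layout is fine.

    Hypothetical helper for illustration only, not part of yadg.
    """
    if not isinstance(schema, dict):
        return ["schema must be a dict"]
    problems = []
    # Both top-level entries are required.
    for key in ("metadata", "steps"):
        if key not in schema:
            problems.append(f"missing top-level entry {key!r}")
    steps = schema.get("steps", [])
    if not isinstance(steps, list):
        return problems + ["'steps' must be a list"]
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {i} must be a dict")
            continue
        # Every step has to name its parser and its input files or folders.
        for key in ("parser", "import"):
            if key not in step:
                problems.append(f"step {i} is missing required entry {key!r}")
    return problems


schema = {
    "metadata": {"provenance": "manual", "schema_version": "1.0"},
    "steps": [
        {"parser": "basiccsv", "import": {"files": ["foo.csv"]}, "tag": "flow"},
        {"parser": "chromtrace", "import": {"folders": ["./GC/"]},
         "parameters": {"tracetype": "fusion"}},
    ],
}
print(check_schema_structure(schema))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to fix a hand-written schema in a single pass.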

Other allowed entries are:

"tag" (str), a tag describing a certain step;
"export" (str), a path defining the location for individual export of steps; and
"parameters" (dict), an object for specifying additional parameters for the parser.

A schema can also contain more than one step with the same "parser" entry. This is useful for splitting a timeseries into smaller chunks. In the above example, if we want to determine the activation energy of a catalytic reaction, it may be helpful to ensure that a new csv file is created each time the temperature setpoint is changed. The schema might then look as follows:

 {
     "metadata": {"provenance": "manual", "schema_version": "1.0"},
     "steps": [{
         "parser": "basiccsv",
         "import": {"files": ["foo.csv"]},
         "tag": "flow",
         "parameters": {}
     },{
         "parser": "basiccsv",
         "import": {"files": ["01-temp.csv"]}
     },{
         "parser": "basiccsv",
         "import": {"files": ["02-temp.csv"]},
         "tag": "340 deg C"
     },{
         "parser": "basiccsv",
         "import": {"files": ["03-temp.csv"]},
         "tag": "320 deg C"
     },{
         "parser": "basiccsv",
         "import": {"files": ["04-temp.csv"]},
         "tag": "300 deg C"
     },{
         "parser": "basiccsv",
         "import": {"files": ["05-temp.csv"]}
     },{
         "parser": "chromtrace",
         "import": {"folders": ["./GC/"]},
         "parameters": {"tracetype": "fusion"}
     }]
 }
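Because the temperature steps differ only in their file name and tag, a schema of this kind can also be generated rather than written by hand. The following is a rough sketch using only the Python standard library; the helper csv_step and the temperature_files list are hypothetical names introduced for illustration, and the steps use the "parser" key and the "chromtrace" parser name as given in the first example.

```python
import json

# Hypothetical list of temperature-controller files and their optional tags,
# mirroring the example above; None means the step carries no tag.
temperature_files = [
    ("01-temp.csv", None),
    ("02-temp.csv", "340 deg C"),
    ("03-temp.csv", "320 deg C"),
    ("04-temp.csv", "300 deg C"),
    ("05-temp.csv", None),
]


def csv_step(filename, tag=None):
    """Build one "basiccsv" step for a single file (illustrative helper)."""
    step = {"parser": "basiccsv", "import": {"files": [filename]}}
    if tag is not None:
        step["tag"] = tag
    return step


steps = [csv_step("foo.csv", tag="flow")]
steps += [csv_step(name, tag) for name, tag in temperature_files]
steps.append({
    "parser": "chromtrace",
    "import": {"folders": ["./GC/"]},
    "parameters": {"tracetype": "fusion"},
})

schema = {
    "metadata": {"provenance": "manual", "schema_version": "1.0"},
    "steps": steps,
}
print(json.dumps(schema, indent=4))
```

The printed JSON is equivalent to the hand-written schema, except that the empty "parameters" entry of the first step is omitted.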

From this schema, the catalytic conversion can be obtained by combining the inlet flow and outlet composition (GC) data. The activation energy can then be calculated by looking up the conversion corresponding to the conditions at the end of each tagged temperature step above (340, 320, and 300 deg C), and performing an Arrhenius fit.
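The Arrhenius fit itself is a linear regression of ln(rate) against 1/T, since ln(rate) = ln(A) - Ea / (R T). The sketch below assumes that the conversion is low enough to be proportional to the reaction rate and that the inlet flow is constant. The conversion values are made-up placeholders, not measured data, and the function arrhenius_fit is a hypothetical helper that is not provided by yadg.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def arrhenius_fit(temps_celsius, rates):
    """Least-squares fit of ln(rate) = ln(A) - Ea / (R * T).

    Returns (Ea in J/mol, pre-exponential factor A in the units of `rates`).
    """
    x = [1.0 / (t + 273.15) for t in temps_celsius]  # 1/T in 1/K
    y = [math.log(r) for r in rates]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals -Ea / R
    intercept = y_mean - slope * x_mean  # equals ln(A)
    return -slope * R, math.exp(intercept)


# Placeholder conversions (fraction converted) at the three tagged setpoints.
temps = [300.0, 320.0, 340.0]
conversions = [0.021, 0.043, 0.082]

ea, prefactor = arrhenius_fit(temps, conversions)
print(f"Ea = {ea / 1000:.1f} kJ/mol")
```

With only three temperature points the fit has a single degree of freedom, so the resulting activation energy should be treated as a rough estimate.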

Note

Further information about the schema can be found in the documentation of the schema validator function, yadg.core.validators.validate_schema(). The complete schema specification is in the yadg.core.spec_schema module.