.. _object_schema:

What is a `schema`
``````````````````
A `schema` is an object defining the files and folders to be processed by 
**yadg**, as well as the types of parsers and the parser options to be applied.
One can think of a `schema` as a representation of a single experiment, 
containing measurements from multiple sources (or devices) and following a 
succession of experimental `steps`.

.. admonition:: TODO

    https://gitlab.empa.ch/krpe/yadg/-/issues/8

    The syntax of `schema` will change from a json dictionary to a YAML file 
    in version 5.0.

An example is a simple catalytic test with a temperature ramp. The goal of such 
an experiment may be to measure the catalytic conversion as a function of 
temperature, and then calculate the activation energy of the catalytic reaction. 
The monitored devices and their filetypes are:

- the inlet flow and pressure meters -> ``csv`` data in ``foo.csv``
- the temperature controller -> ``csv`` data in ``bar.csv``
- the gas chromatograph -> Fusion ``json`` data in ``./GC/`` folder

Despite these three devices measuring concurrently, we would have to specify 
three separate `steps` in the schema to process all relevant output files:

.. code-block:: json

    {
        "metadata": {"provenance": "manual", "schema_version": "1.0"},
        "steps": [{
            "parser": "basiccsv",
            "import": {"files": ["foo.csv"]},
            "tag": "flow",
            "parameters": {}
        },{
            "parser": "basiccsv",
            "import": {"files": ["bar.csv"]}
        },{
            "parser": "chromtrace",
            "import": {"folders": ["./GC/"]},
            "parameters": {"tracetype": "fusion"}
        }]
    }

A valid `schema` is therefore a :class:`(dict)`, with a top-level ``"metadata"``
entry describing the schema provenance and version; and a top-level ``"steps"``
entry, which is a :class:`(list)` containing the definitions for the experimental
`steps`. Each `step` within the `schema` is a :class:`(dict)`. In each `step`, 
the entries ``"parser"`` and ``"import"`` have to be specified, telling **yadg** 
which `parser` to use and which files or folders to process, respectively.

Other allowed entries are: 

- ``"tag"`` :class:`(str)`, a tag describing a certain `step`; 
- ``"export"`` :class:`(str)`, a path defining the location for individual 
  export of `steps`; and 
- ``"parameters"`` :class:`(dict)`, an object for specifying additional 
  parameters for the `parser`.

However, a `schema` can contain more than one `step` with the same ``"parser"``
entry. This is valuable if one wants to split a certain timeseries into smaller
chunks -- in the above example, if we want to determine the activation energy of 
a catalytic reaction, it may be helpful to ensure a new ``csv`` file is created 
each time the temperature setpoint is changed. The `schema` might the look as 
follows:


.. code-block:: json
   :emphasize-lines: 12-14,16-18,20-22

    {
        "metadata": {"provenance": "manual", "schema_version": "1.0"},
        "steps": [{
            "datagram": "basiccsv",
            "import": {"files": ["foo.csv"]},
            "tag": "flow",
            "parameters": {}
        },{
            "datagram": "basiccsv",
            "import": {"files": ["01-temp.csv"]}
        },{
            "datagram": "basiccsv",
            "import": {"files": ["02-temp.csv"]},
            "tag": "340 deg C"
        },{
            "datagram": "basiccsv",
            "import": {"files": ["03-temp.csv"]},
            "tag": "320 deg C"
        },{
            "datagram": "basiccsv",
            "import": {"files": ["04-temp.csv"]},
            "tag": "300 deg C"
        },{
            "datagram": "basiccsv",
            "import": {"files": ["05-temp.csv"]}
        },{
            "datagram": "gctrace",
            "import": {"folders": ["./GC/"]},
            "parameters": {"tracetype": "fusion"}
        }]
    }

From this `schema`, the catalytic conversion can be obtained by combining the
inlet flow and outlet composition (GC) data. The activation energy can then be 
calculated by looking up the conversion corresponding to the conditions at the 
end of each temperature ramp `step` highlighted above, and performing an
Arrhenius fit.

.. note::

    Further information about the `schema` can be found in the documentation of 
    the `schema` validator function: :func:`yadg.core.validators.validate_schema`.
    The whole `schema` specification is present in the :mod:`yadg.core.spec_schema`
    module.