**********
Definition
**********

Evaluation tools test the performance of the AI apps that answer a challenge. An evaluation tool artifact is typically stored in a file ``evaluation_tool.yml``. The schema of the file is described below.

.. jsonschema:: ../../../schemas/core/evaluation_tool.yml

**********************
Create Evaluation Tool
**********************

This guide assumes you have already done the following:

- Set up the local environment as explained in :doc:`/pages/setup`

An evaluation tool is a piece of code that is able to test the performance of an AI app running on a target hardware. To do so it sends data to the AI app using an HTTP API exposed on the target and performs various measurements. It then stores the results in an output directory. The results must minimally include a JSON file with a series of metrics but can also include, for instance, a PDF report. Evaluation tools are usually developed together with challenges to express the acceptance criteria for the AI apps that answer them.

At its core an evaluation tool consists of an executable (or script) that runs on the developer workstation and is capable of reading data, sending it to the AI app via an HTTP API, collecting the results and building a report. In order to guarantee portability this script is packaged inside a docker container so that all its dependencies are available on the system that will run it.

To create an evaluation tool three steps are necessary:

1. Create the evaluation tool script
2. Dockerize the evaluation tool script
3. Create the evaluation tool metadata

The rest of this page describes these steps.

Create the evaluation tool script
---------------------------------

The evaluation tool script must be an executable that can consume the following command line parameters:

- ``--output-dir [path to directory]``: path to a directory where the results must be stored
- ``--target-url [url]``: URL where the AI app HTTP API is exposed
- ``--dataset-dir [path to directory]`` or ``--dataset-dir [dataset name] [path to directory]`` (optional): path where the input datasets are accessible; the first syntax is used if only one dataset is passed, the second if multiple are passed
- ``--parameters [path to JSON file]`` (optional): path to a JSON file containing the parameters for the script provided by the challenge writer
- ``--cache-dir [path to directory]`` (optional): path to a directory where intermediate results can be stored

The exact process carried out by the evaluation tool and the parameters it actually requires depend on the type of evaluation tool being developed.

An evaluation tool must always output a JSON file ``benchmark.json`` that contains an object; each property of the object is one of the metrics measured by the tool and its value is the measured result.

The ``--dataset-dir`` parameter is used to pass datasets distributed with the challenge (or the output of the corresponding data tools). The ``--parameters`` parameter is used by generic evaluation tools that can be re-used in multiple challenges. The ``--cache-dir`` parameter is used by tools that can resume their execution after a transient error and can also be used during development to cache evaluation results.

The evaluation script must invoke the AI app HTTP interface. Each AI app interface has its own HTTP interface. The pre-defined interfaces (image classification, object detection, ...) expect the data to be sent as the body of a POST request and return a JSON object with the result of the inference. By setting the custom HTTP header ``x-metrics`` to ``all`` it is possible to obtain performance details of the AI app.
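As an illustration, the following is a minimal sketch of such a script for an image classification evaluation. It only handles the single-dataset form of ``--dataset-dir`` and ignores ``--parameters`` and ``--cache-dir``; the dataset layout (a ``ground_truth.json`` file next to the images) and the ``label`` field in the response are assumptions made for the example, not part of the specification.

.. code-block:: python

    #!/usr/bin/env python3
    """Minimal sketch of an image classification evaluation script."""
    import argparse
    import json
    import time
    from pathlib import Path

    import requests


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--output-dir', required=True)
        parser.add_argument('--target-url', required=True)
        parser.add_argument('--dataset-dir', required=False)
        parser.add_argument('--parameters', required=False)
        parser.add_argument('--cache-dir', required=False)
        args = parser.parse_args()

        # Hypothetical dataset layout: images plus a ground truth file
        # mapping image names to expected labels.
        dataset = Path(args.dataset_dir)
        ground_truth = json.loads((dataset / 'ground_truth.json').read_text())

        correct = 0
        latencies = []

        for image_name, expected_label in ground_truth.items():
            data = (dataset / image_name).read_bytes()

            # Send the image as the body of a POST request; 'x-metrics: all'
            # asks the AI app to include performance details in its answer.
            start = time.time()
            response = requests.post(args.target_url,
                                     data=data,
                                     headers={'x-metrics': 'all'})
            latencies.append((time.time() - start) * 1000)

            result = response.json()
            if result.get('label') == expected_label:  # hypothetical field name
                correct += 1

        # benchmark.json maps each metric name to its measured value.
        metrics = {
            'accuracy': 100.0 * correct / len(ground_truth),
            'latency': sum(latencies) / len(latencies),
        }

        output_dir = Path(args.output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / 'benchmark.json').write_text(json.dumps(metrics, indent=2))


    if __name__ == '__main__':
        main()

During development such a script can be run directly on the workstation against a reachable target before it is dockerized.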
Dockerize the evaluation tool script
------------------------------------

Once the evaluation tool script is ready it is necessary to create a docker image containing all the required dependencies. This depends on the way the tool has been written; for a python based tool the Dockerfile could look as follows:

.. code-block:: dockerfile

    FROM python

    COPY requirements.txt /app/

    RUN pip3 install -r /app/requirements.txt

    COPY evaluation.py /app/

    ENTRYPOINT [ "python3", "/app/evaluation.py" ]

It is mandatory to set the script as the entry point of the container with the ``ENTRYPOINT`` directive as this is the way the container will be invoked by the Bonseyes tooling.

The docker image then needs to be built (and potentially pushed to a remote registry where other users can download it)::

    $ docker build -t path/to/registry/and/repository/for/tool .
    $ docker push path/to/registry/and/repository/for/tool

Create the evaluation tool metadata
-----------------------------------

.. highlight:: yaml

In order to use the evaluation tool in a challenge some metadata needs to be defined. This allows the system to find the tool and provides documentation for it. The metadata must be stored in an artifact package; this can be a dedicated package or it can be stored along with the challenge. To create a new package you can follow the instructions in :doc:`/pages/dev_guides/package/create_package`.

The evaluation tool metadata consists of a main file typically called ``evaluation_tool.yml`` and an optional schema for the parameters typically named ``parameters.yml``.

The main evaluation tool file has the following structure::

    metadata:
      title: Title for the evaluation tool
      description: |
        Multiline description of the evaluation tool

    image: path/to/registry/and/repository/for/tool

    parameters: relative path to the parameters schema file (optional)

    interface: com_bonseyes/interfaces#image_classification

    input_datasets:
      test_images:
        description: Images and corresponding ground truth used to perform the accuracy test

    metrics:
      accuracy:
        title: Accuracy
        description: Percentage of images correctly classified
        unit: percentage
      latency:
        title: Latency
        description: Average time to perform the inference on an image
        unit: ms

The ``metadata`` object contains a description of the tool. The ``image`` property points to the image containing the dockerized tool script. The ``interface`` property specifies the interface that the AI app under test must implement. The ``metrics`` property specifies the list of metrics that are generated by the tool. The ``input_datasets`` property describes the datasets that the tool requires as input; corresponding entries must be present in the ``data`` section of the evaluation procedure of the challenge.
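The format of the parameters schema file is not shown above. As a minimal sketch, and assuming it is a JSON schema expressed in YAML like the other schemas referenced in this documentation, a hypothetical ``parameters.yml`` for a classification evaluation tool could look as follows (the ``top_k`` parameter name is invented for the example)::

    type: object
    properties:
      top_k:
        type: integer
        description: Number of top predictions considered when computing the accuracy
    required:
      - top_k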