**********
Definition
**********

Evaluation tools test the performance of the AI apps that answer a challenge. An evaluation tool artifact is typically stored in a file ``evaluation_tool.yml``. The schema of the file is described below.

.. jsonschema:: ../../../schemas/core/evaluation_tool.yml

**********************
Create Evaluation Tool
**********************

This guide assumes you have already done the following:

- Set up the local environment as explained in :doc:`/pages/setup`

An evaluation tool is a piece of code that is able to test the performance of an AI app running on a target hardware. To do so it sends data to the AI app using an HTTP API exposed on the target and performs various measurements. It then stores the results in an output directory. The results must minimally include a JSON file with a series of metrics but can also include, for instance, a PDF report. Evaluation tools are usually developed together with challenges to express the acceptance criteria for the AI apps that answer them.

At its core an evaluation tool consists of an executable (or script) that runs on the developer workstation and is capable of reading data, sending it to the AI app via an HTTP API, collecting the results and building a report. In order to guarantee portability this script is packaged inside a docker container so that all its dependencies are available on the system that will run it.

To create an evaluation tool three steps are necessary:

1. Create the evaluation tool script
2. Dockerize the evaluation tool script
3. Create the evaluation tool metadata

The rest of this page describes these steps.

Create the evaluation tool script
---------------------------------

The evaluation tool script must be an executable that can consume the following command line parameters:

- ``--output-dir [path to directory]``: path to a directory where the results must be stored
- ``--target-url [url]``: URL where the AI app HTTP API is exposed
- ``--dataset-dir [path to directory]`` or ``--dataset-dir [dataset name] [path to directory]`` (optional): path where the input datasets are accessible; the first syntax is used if only one dataset is passed, the second if multiple are passed
- ``--parameters [path to JSON file]`` (optional): path to a JSON file containing the parameters for the script provided by the challenge writer
- ``--cache-dir [path to directory]`` (optional): path to a directory where intermediate results can be stored

The exact process carried out by the evaluation tool and the parameters it actually requires depend on the type of evaluation tool being developed.

An evaluation tool must always output a JSON file ``benchmark.json`` that contains an object; each property of the object is one of the metrics measured by the tool and its value is the measured result.

The ``--dataset-dir`` parameter is used to pass datasets distributed with the challenge (or the output of the corresponding data tools). The ``--parameters`` parameter is used by generic evaluation tools that can be re-used in multiple challenges. The ``--cache-dir`` parameter is used by tools that can resume their execution after a transient error and can also be used during development to cache evaluation results.

The evaluation script must invoke the AI app HTTP interface. Each AI app interface has its own HTTP interface. The pre-defined interfaces (image classification, object detection, ...) expect the data to be sent as the body of a POST request and return a JSON object with the result of the inference. By setting the custom HTTP header ``x-metrics`` to ``all`` it is possible to obtain performance details of the AI app.
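As an illustration, the following is a minimal sketch of such a script for an image classification evaluation. It only handles the single-dataset form of ``--dataset-dir`` and ignores ``--parameters`` and ``--cache-dir``; the dataset layout (a ``ground_truth.json`` file next to the images) and the ``label`` field in the response are assumptions made for the example, not part of the specification.

.. code-block:: python

    #!/usr/bin/env python3
    """Minimal sketch of an image classification evaluation script."""
    import argparse
    import json
    import time
    from pathlib import Path

    import requests


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--output-dir', required=True)
        parser.add_argument('--target-url', required=True)
        parser.add_argument('--dataset-dir', required=False)
        parser.add_argument('--parameters', required=False)
        parser.add_argument('--cache-dir', required=False)
        args = parser.parse_args()

        # Hypothetical dataset layout: images plus a ground truth file
        # mapping image names to expected labels.
        dataset = Path(args.dataset_dir)
        ground_truth = json.loads((dataset / 'ground_truth.json').read_text())

        correct = 0
        latencies = []

        for image_name, expected_label in ground_truth.items():
            data = (dataset / image_name).read_bytes()

            # Send the image as the body of a POST request; 'x-metrics: all'
            # asks the AI app to include performance details in its answer.
            start = time.time()
            response = requests.post(args.target_url,
                                     data=data,
                                     headers={'x-metrics': 'all'})
            latencies.append((time.time() - start) * 1000)

            result = response.json()
            if result.get('label') == expected_label:  # hypothetical field name
                correct += 1

        # benchmark.json maps each metric name to its measured value.
        metrics = {
            'accuracy': 100.0 * correct / len(ground_truth),
            'latency': sum(latencies) / len(latencies),
        }

        output_dir = Path(args.output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / 'benchmark.json').write_text(json.dumps(metrics, indent=2))


    if __name__ == '__main__':
        main()

During development such a script can be run directly on the workstation against a reachable target before it is dockerized.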
Dockerize the evaluation tool script
------------------------------------

Once the evaluation tool script is ready it is necessary to create a docker image containing all the required dependencies. This depends on the way the tool has been written; for a python based tool the Dockerfile could look as follows:

.. code-block:: dockerfile

    FROM python

    COPY requirements.txt /app/

    RUN pip3 install -r /app/requirements.txt

    COPY evaluation.py /app/

    ENTRYPOINT [ "python3", "/app/evaluation.py" ]

It is mandatory to set the script as the entry point of the container with the ``ENTRYPOINT`` directive as this is the way the container will be invoked by the Bonseyes tooling.

The docker image then needs to be built (and potentially pushed to a remote registry where other users can download it)::

    $ docker build -t path/to/registry/and/repository/for/tool .
    $ docker push path/to/registry/and/repository/for/tool

Create the evaluation tool metadata
-----------------------------------

.. highlight:: yaml

In order to use the evaluation tool in a challenge some metadata needs to be defined. This allows the system to find the tool and provides documentation for it. The metadata must be stored in an artifact package; this can be a dedicated package or it can be stored along with the challenge. To create a new package you can follow the instructions in :doc:`/pages/dev_guides/package/create_package`.

The evaluation tool metadata consists of a main file typically called ``evaluation_tool.yml`` and an optional schema for the parameters typically named ``parameters.yml``.

The main evaluation tool file has the following structure::

    metadata:
      title: Title for the evaluation tool
      description: |
        Multiline description of the evaluation tool

    image: path/to/registry/and/repository/for/tool

    parameters: relative path to the parameters schema file (optional)

    interface: com_bonseyes/interfaces#image_classification

    input_datasets:
      test_images:
        description: Images and corresponding ground truth used to perform the accuracy test

    metrics:
      accuracy:
        title: Accuracy
        description: Percentage of images correctly classified
        unit: percentage
      latency:
        title: Latency
        description: Average time to perform the inference on an image
        unit: ms

The ``metadata`` object contains a description of the tool. The ``image`` property points to the image containing the dockerized tool script. The ``interface`` property specifies the interface that the AI app under test must implement. The ``metrics`` property specifies the list of metrics that are generated by the tool. The ``input_datasets`` property describes the datasets that the tool requires as input; corresponding entries must be present in the ``data`` section of the evaluation procedure of the challenge.
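The format of the parameters schema file is not shown above. As a minimal sketch, and assuming it is a JSON schema expressed in YAML like the other schemas referenced in this documentation, a hypothetical ``parameters.yml`` for a classification evaluation tool could look as follows (the ``top_k`` parameter name is invented for the example)::

    type: object
    properties:
      top_k:
        type: integer
        description: Number of top predictions considered when computing the accuracy
    required:
      - top_k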