********** Definition ********** Technical explanation about creating a new AI Asset. *************** Create AI Asset *************** This section describes how to setup local and server environment first and then how to create and develop new AI Assets: Setup ===== In order to start developing new AI Asset first complete the initial setup Gitlab setup ^^^^^^^^^^^^ 1. Create Gitlab Account - `GitLab Registration `_ 2. Setup your GitLab account and add ssh key - `Gitlab & SSH keys `_ 3. Create empty GitLab repository named **** in your dedicated group Local setup ^^^^^^^^^^^ Setup localhost workstation/laptop for development, ensure that you have the following software installed: 1. NVIDIA Drivers for your graphics card - `NVIDIA Autodetect driver `_ 2. Docker - `Install on Ubuntu `_ 3. NVIDIA container toolkit - `Install container-toolkit `_ 4. Git 5. Python3.6+ HPC setup ^^^^^^^^^ Setup HPC for training and image builds, ensure that you have the following software installed: 1. NVIDIA Drivers for your graphics card - `NVIDIA Autodetect driver `_ 2. Docker - `Install on Ubuntu `_ 3. NVIDIA container toolkit - `Install container-toolkit `_ 4. Docker Buildx - `Install Buildx `_ 5. Install qemu and enable aarch64 emulation: .. code-block:: bash # Install the qemu packages sudo apt-get install qemu binfmt-support qemu-user-static # Enable emulation docker run --rm --privileged multiarch/qemu-user-static --reset -p yes 6. Git 7. Python3.6+ GitLab runner setup on HPC instance ^^^^^^^^^ 1. On your GitLab repository, open Settings > CI/CD > Runners in order to disable shared and group runners and add specific runner required to support automated preconfigured CI/CD pipelines. .. image:: ../assets/ci-cd-configuration-shared.png :alt: Disable shared runners on GitLab repository. .. image:: ../assets/ci-cd-configuration-group.png :alt: Disable group runners on GitLab repository. 2. Download Gitlab Runner installation script and copy script to the HPC instance :download:`setup-runner.sh <../assets/setup-runner.sh>` 3. Find specific runner configuration credentials on your GitLab repository under Settings > CI/CD > Runners .. image:: ../assets/ci-cd-configuration-specific.png :alt: Disable group runners on GitLab repository. 4. Execute script on the HPC instance: .. code-block:: bash # Change file mod permission sudo chmod +x setup-runner.sh # Execute script providing proper values ./setup-runner.sh https://gitlab.com/ # Example run ./setup-runner.sh bonseyes_3ddfa https://gitlab.com/ HuQV-VGty-HL7vprN5Rb Start new project ^^^^^^^^^^^^^^^^^ 1. Clone `AIAssetContainerGenerator `_ on your local machine and 2. Follow AIAssetContainerGenerator `README.md `_ to create new AI Asset boilerplate project 3. Initialize git in newly created boilerplate project 4. If you plan to use existing network implementation as baseline attach it as submodule in ``/source`` directory of boilerplate root .. code-block:: bash git submodule add /source/ 5. When creating new AI Asset Bonseyes framework suggests the following Git workflow: * Use ``master`` branch for stable tested release tagged with version e.g ``v1.0, v2.0, ...`` * Use ``dev`` branch for daily development * Use ``feature/feature_name`` branch from ``dev`` to implement new features * Tag commits on ``dev`` and ``master`` branches to trigger docker image builds 6. Follow GitLab instructions in your newly created repo on how to push existing folder * Every commit on certain branch triggers GitLab runner, which executes ``.gitlab-ci.yml`` file in your project. 
``.gitlab-ci.yml`` decides which of the possible stages (build, test, push, package and pages) are executed for every platform listed in it. Which stages run depends on the current branch.
* If you encounter a git error about unsafe directories during container builds, modify ``.gitlab-ci.yml`` and include the line ``git config --global --add safe.directory /path/to/unsafe/dir``. If this does not solve the issue, try the ``--system`` git option instead of ``--global``.

Local development workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Pull the x86_64 image that was built during the CI/CD process, or build the image locally

.. code-block:: bash

   # Option 1: Pull built image (check registry tab on your GitLab project web page for url)
   docker pull

   # Option 2: Build image on your local machine
   python3 /docker/build.py \
       --platform x86_64 \
       --profile /docker/profiles/x86_64/ubuntu18.04_cuda10.2_python3.6_tensorrt7.0.yml \
       --image-name x86_64:_cuda10.2_tensorrt7.0 \
       --device cuda

* The ``build`` script calls the Dockerfile for the specified platform (x86_64, Jetson devices and RaspberryPi4) and device (GPU or CPU). In the Dockerfile you should run your ``setup.py`` script, which lists all Python packages with their versions used in your AI Asset for x86_64, Jetson devices and RaspberryPi4.
* Dockerfiles for x86_64, Jetson and RaspberryPi4 are stored in ``//docker/platforms/``
* The Pytorch, CMake, OpenCV, ONNXRuntime, ONNX, TensorRT and Python versions which will be installed during the docker build are written in ``//docker/profiles/``. These versions are sent as arguments to the Dockerfiles.
* Existing x86_64 profiles:

.. code-block:: bash

   ubuntu18.04_cuda10.2_python3.6_tensorrt7.0.yml
   ubuntu20.04_cuda11.4_python3.8_tensorrt8.0.yml
   ubuntu18.04_python3.7.yml

* Existing NVIDIA Jetson profiles:

.. code-block:: bash

   jetpack4.4.yml
   jetpack4.6.yml

* For RaspberryPi4 the available profile is:

.. code-block:: bash

   rpi_arm64v8.yml

* The result of the build script (if everything works properly) is a new docker image.

.. note:: If you want to make minor changes to a submodule (very small deviations from the official code instead of writing them in your Bonseyes AI Asset), do not commit the changes to the official source repository. Instead, create a git patch and save it to the ``/source/patch/`` directory. To apply the patch to the submodule, use this command in your container:

   .. code-block:: bash

      cd /app/source/ && git apply /app/source/patch/modification_1.patch

   You also need to add this command to the Dockerfile so that the image is built with the patch applied before setup.

2. Run the x86_64 image and mount your project root to /app

* If you are using a directory with images and annotations generated by a DataTool, you should mount the directory with datasets and annotations to the ``/data/storage`` directory when executing the docker run command. In this case you should run the built container with:
.. code-block:: bash

   # Example how to run the built container when you are using a dataset and its annotations generated by a DataTool
   cd
   docker run --name \
       --privileged --rm -it \
       --gpus 0 \
       --ipc=host \
       -p 8888:8888 \
       -v $(pwd):/app \
       -v /path/to/processed/dataset1:/app//data/dataset1 \
       -v /path/to/custom_base_types.py:/app//data/dataset1/custom_base_types.py \
       -v /path/to/custom_dataset_model.py:/app//data/dataset1/custom_dataset_model.py \
       -v /path/to/processed/dataset2:/app//data/dataset2 \
       -v /path/to/custom_base_types.py:/app//data/dataset2/custom_base_types.py \
       -v /path/to/custom_dataset_model.py:/app//data/dataset2/custom_dataset_model.py \
       -v /tmp/.X11-unix:/tmp/.X11-unix \
       --device /dev/video0 \
       -e DISPLAY=$DISPLAY \
       x86_64:_cuda10.2_tensorrt7.0

At this point you can develop in your host environment using the IDE of your choice and test the implementation inside the running docker container.

I. Data
=======

Make sure that you correctly attached the datatool requirements and mounted the generated processed datasets:

1. Confirm that you have the ``datatool_api`` submodule attached in the AI Asset ``data`` directory

.. code-block:: bash

   cd
   git submodule add -b python3.6 ../../../../../../artifacts/data_tools/apis/datatool-api.git /data/datatool_api
   git submodule update --init --recursive /data/datatool_api

**NOTE:** If your AI Asset is not in your group root, you will additionally need to change the relative path of the datatool-api submodule.

2. Confirm that you have ``custom_dataset_model.py`` and ``custom_base_types.py`` mounted for all datasets in ``data/dataset1 ... data/dataset2``

3. Confirm that when executing ``docker run`` you properly mounted the directory produced by the DataTool(s) to the ``//data/dataset1`` folder by adding ``-v /path/to/dataset1:/app//data/dataset1`` to the docker run command.

Steps to use the Datatools inside AI Assets:

1. Remove __future__ imports from the custom_base_types.py and custom_dataset_model.py scripts (only for AI Assets with Python version < 3.8)

As part of datatool development, the AI-talents created a Python-based custom data model for each datatool, defined by the "custom_base_types.py" and "custom_dataset_model.py" scripts. This data model is the interface that should be used to load and read the datatool output inside AI Assets. Since there is a mismatch between the Python versions of the datatools (using Python >= 3.9) and the AI Assets (using Python 3.6.9, to be updated in the future), the "custom_base_types.py" and "custom_dataset_model.py" scripts need to be modified to remove __future__ imports, which are not supported by Python 3.6.9. To do so, remove the line "from __future__ import annotations" from the two scripts. This line is generally found at the beginning of the file. Please refer to the example images below to locate the line.

.. image:: ../assets/aiasset-datatool-image1.png
   :alt: Image1.

|

.. image:: ../assets/aiasset-datatool-image2.png
   :alt: Image2.

2. Remove any return types from custom_dataset_model.py which are no longer supported due to the removal of the __future__ imports (only for AI Assets with Python version < 3.8)

As a consequence of removing the __future__ import from the scripts, methods inside classes cannot annotate their return type if the return type is the same as the class that contains the method. To fix it, remove any return annotation from the "custom_dataset_model.py" script where the return type is the same as the class type.
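To make this change concrete, here is a rough before/after sketch based on the ``extract`` method and ``CocoBoundingBox2D`` type referenced in the image below; the actual classes in your generated data model will differ:

.. code-block:: python

   # Before (fails on Python 3.6 once the __future__ import is removed, because
   # CocoBoundingBox2D is not yet defined when the annotation is evaluated):
   #
   #     def extract(self) -> CocoBoundingBox2D:
   #         ...

   # After: the "-> CocoBoundingBox2D" return annotation is simply dropped.
   class CocoBoundingBox2D:
       def extract(self):
           ...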
The image below shows an example where the method "extract" has "CocoBoundingBox2D" as its return type; this should be removed (hence the "-> CocoBoundingBox2D" part should be deleted) from the function signature.

.. image:: ../assets/aiasset-datatool-image3.png
   :alt: Image3.

3. Rename the "custom_base_types.py" and "custom_dataset_model.py" scripts in case you are using multiple datatools inside the AI Asset

In case your AI Asset uses multiple datatools and the datatools do not share the same data model, you need to rename the "custom_base_types.py" and "custom_dataset_model.py" scripts so that they are distinguishable for the Python interpreter at import time. For example, if you plan to use two datatools, datatool1 and datatool2, you can rename the files to ["custom_base_types_dt1.py", "custom_dataset_model_dt1.py"] and ["custom_base_types_dt2.py", "custom_dataset_model_dt2.py"] for datatool1 and datatool2 respectively.

4. Import the custom data model inside the data loader script

Once you have mounted the datatool output directories, mounted the custom_base_types.py and custom_dataset_model.py scripts for each datatool after renaming them, and added the Datatool API as a submodule inside your AI Asset by following the instructions provided in the AI Asset documentation, you can use the Datatool API and your custom data models to load the datasets inside your data loader scripts. To load the datasets using the data model classes, add the relative paths to the Datatool API directory and to the directory of each custom data model at the top of your data loader script, and then import the "DTDatasetCustom" class for each data model. For example, if you intend to load the datatool outputs of 2 datasets, you will need to do the following for the imports:

.. code-block:: python

   import sys
   sys.path.append('RELATIVE_PATH_TO_DATATOOL_API')
   sys.path.append('RELATIVE_PATH_TO_DATASET1')
   sys.path.append('RELATIVE_PATH_TO_DATASET2')

   from dataset1.custom_dataset_model_dt1 import DTDatasetCustom as Dataset_1
   from dataset2.custom_dataset_model_dt2 import DTDatasetCustom as Dataset_2

Then, inside your loader function, you can simply use the Dataset classes to load the respective datasets.

.. code-block:: python

   dt1 = Dataset_1(name='dt1', operatingMode='memory')
   dt1.load_from_json('/dataset.json')
   for k, v in dt1.annotations.items():
       print(k, v.dict())

   dt2 = Dataset_2(name='dt2', operatingMode='memory')
   dt2.load_from_json('RELATIVE_PATH_TO_DATASET2/dataset.json')
   for k, v in dt2.annotations.items():
       print(k, v.dict())

II. Train
=========

Bonseyes AI Assets provide a training package which enables running the source training scripts, if they exist, with the hyperparameters specified for different backbones in config files. The Bonseyes AI Assets training tool contains:

1. A ``config`` directory which contains config files with device and system configurations, paths to datasets and annotations, and the hyperparameter configuration.
2. The ``/train/__main__.py`` script which uses the hyperparameters and configurations from the ``config.yml`` file and runs the source training code if it exists.

Training scripts and config files can be found in the `AI Asset Container Generator `_. The Bonseyes training tool also needs training, validation and test datasets and annotations for training execution. Datasets can be downloaded and used in 2 ways:

1. If you want to download datasets with their original annotations, as used in the source repository, you need to implement the download of datasets and annotations in the ``//data/get_data.py`` script
2.
If you are using DataTool check how it can be used in `I. Data `_ section. Scripts which need to be implemented and used for data and annotations downloading without DataTool you can find in `AI Asset Container Generator `_. Get data ^^^^^^^^ Bonseyes AI Assets provide tool for downloading data with official annotations which are used in source code. It is stored in ``/data/get_data.py`` script which contains functions for downloading train, validation and test datasets with annotations. In this `link `_ you can find example how those scripts are implemented in Bonseyes Openpifpaf Wholebody AI Asset. Here is the example how you can download training dataset with annotations in Bonseyes Openpifpaf Wholebody AI Asset: .. code-block:: bash python -m bonseyes_openpifpaf_wholebody.data.get_dataset_mscoco2017 \ --download train \ --dataset wholebody Config file ^^^^^^^^^^ Configuration ``yml`` files in Bonseyes AI Asset training tool are used for storing device and system configurations and hyperparameters which are sent to train script as CLI arguments. For each backbone and each training experiment, different configuration files are created. Path to this file is set as CLI argument in ``/train/__main__.py`` script which reads all hyperparameters and sends it to train script as CLI arguments. Configuration file contains multiple sections with parameters inside: 1. device - GPU number, number of workers 2. hyperparameters - number of epochs, learning rate, backbone, checkpoint and batch size 3. data - paths to datasets and annotations files 4. system - output where log is written .. note:: You should name config ``yml`` files by our naming convention. Example how config file can be named is following: .. code-block:: bash v3.0_shufflenetv2k16_default_641x641_fp32_config.yml where v3.0 refers to tag version, shufflenetv2k16 is backbone name, flag default is for pretrained model from official repository, 641x641 is training input size and fp32 is model precision. In this `link `_ you can find config file examples in Bonseyes Openpifpaf Wholebody AI Asset. Here is the example of one config file: .. code-block:: bash device: loader-workers: 16 gpu_num: 4 hyperparams: lr: 0.0001 momentum: 0.95 b-scale: 3.0 epochs: 250 lr-decay: [130, 140] lr-decay-epochs: 10 batch-size: 16 weight-decay: 1e-5 basenet: "shufflenetv2k16" data: dataset: "wholebody" wholebody-upsample: 2 wholebody-train-annotations: /app/source/data-mscoco/annotations/person_keypoints_train2017_wholebody_pifpaf_style.json wholebody-val-annotations: /app/source/data-mscoco/annotations/person_keypoints_val2017_wholebody_pifpaf_style.json wholebody-train-image-dir: /app/source/data-mscoco/images/train2017 wholebody-val-image-dir: /app/source/data-mscoco/images/val2017 system: output: "/app/bonseyes_openpifpaf_wholebody/train/outputs/openpifpaf_shufflenetv2k16_v13.pth" train script ^^^^^^^^^^^^ ``/train/__main__.py`` script loads ``yml`` config file, converts all hyperparameters from ``yml`` file to CLI arguments and runs source training code with extracted CLI arguments from file. In this `link `_ you can find the example of training script in Bonseyes Openpifpaf Wholebody AI Asset. Here is the example how ``/train/__main__.py`` script is called in Bonseyes Openpifpaf Wholebody AI Asset: .. code-block:: bash python3 -m bonseyes_openpifpaf_wholebody.train --config /app/bonseyes_openpifpaf_wholebody/train/configs/v3.0_shufflenetv2k16_default_641x641_fp32_config.yml III. 
Model Catalog
=============

Bonseyes AI Assets define a specific model nomenclature and directories where pretrained models should be stored using Git LFS (Large File Storage). Bonseyes AI Assets also provide a model summary script for calculating the total number of network parameters, the number of floating point operations (FLOPs), the number of multiply-adds and the memory usage.

Models nomenclature and storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a Bonseyes AI Asset, pretrained Pytorch/Tensorflow models should be stored in the ``//model///.`` directory as Git LFS. Here is the example of how a pretrained model is stored in the `Bonseyes Openpifpaf Wholebody AI Asset `_.

.. note:: Follow the Bonseyes guidelines for model file naming; for example, a Pytorch model can be named as:

   .. code-block:: bash

      v3.0_shufflenetv2k30_default_641x641_fp32

   where v3.0 refers to the tag version, shufflenetv2k30 is the backbone name, the flag default stands for a pretrained model from the official repository, 641x641 is the training input size and fp32 is the model precision.

Only pretrained models should be stored in GitLab, while inference engines (ONNXRuntime, TensorRT and torch2trt) should not be committed to the GitLab repository.

Model summary
^^^^^^^^^^^^^

Reuse, and adjust if needed, the Bonseyes summary utility script ``//benchmark/model_summary.py`` to create a pretrained model summary in a json file, which contains:

* Total number of network parameters
* Theoretical amount of floating point operations (FLOPs)
* Theoretical amount of multiply-adds (MAdd)
* Memory usage

The ``//benchmark/model_summary.py`` script can be found in the `AI Asset Container Generator `_. In this `link `_ you can find ``//benchmark/model_summary.py`` in the Bonseyes Openpifpaf Wholebody AI Asset. Here is the example of running ``//benchmark/model_summary.py`` in the `Bonseyes Openpifpaf Wholebody AI Asset `_.

.. code-block:: bash

   python -m bonseyes_openpifpaf_wholebody.benchmark.model_summary \
       --model-path /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
       --engine pytorch \
       --input-size 640x640 \
       --backbone shufflenetv2k30 \
       --json-output /app/

Also, in this `link `_ you can find model summaries for multiple models with multiple input sizes in the Bonseyes Openpifpaf Wholebody Asset.
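As a quick illustration of the quantities the model summary reports, the parameter count and approximate weight memory of a Pytorch model can be computed directly with plain ``torch`` calls; FLOPs/MAdd estimates come from the ``model_summary.py`` utility itself. This is only a sketch, and the ResNet-18 stand-in is not an asset model:

.. code-block:: python

   import torchvision

   # Stand-in network; in an AI Asset this would be the pretrained backbone.
   model = torchvision.models.resnet18()

   num_params = sum(p.numel() for p in model.parameters())
   # Approximate weight memory in MB, assuming fp32 (4 bytes per parameter).
   weight_mem_mb = num_params * 4 / (1024 ** 2)

   print(f"parameters: {num_params}, weight memory: {weight_mem_mb:.1f} MB")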
IV. Algorithm
=============

The Algorithm is an important part of every Bonseyes AI Asset; it contains the complete flow of image processing by a model or inference engine. The Algorithm components are listed below:

1. The ``AlgorithmInput`` class, which structures the input
2. The ``Algorithm`` class, which contains functions for:

   * Loading Pytorch/Tensorflow, ONNXRuntime, TensorRT and torch2trt engines
   * Preprocessing the input before passing it to the inference engine
   * Pytorch/Tensorflow, ONNXRuntime, TensorRT and torch2trt inference
   * Postprocessing the inference engine outputs
   * Inference engine processing, which includes running the preprocessing, inference and postprocessing functions and calculating their execution times. This function also stores the postprocessing output and execution times in concrete form in the ``AlgorithmResult`` class.
   * Rendering, which displays the postprocessing results on the image
   * Destroying, which runs the inference engine's destructor

3. The ``AlgorithmResult`` class, where the postprocess output is structured in concrete form. This class stores the postprocessing outputs, the preprocessing, inference and postprocessing times and the latency in the ``results`` dictionary.

.. note:: Bonseyes AI Assets algorithm examples are provided for image processing, but they can be modified for any kind of input.

The ``AlgorithmInput``, ``Algorithm`` and ``AlgorithmResult`` classes are stored in ``//algorithm/algorithm.py`` and they need to inherit the ``BaseAlgorithmInput``, ``BaseAlgorithm`` and ``BaseAlgorithmResult`` abstract classes. For this purpose Bonseyes AI Assets provide the ``BaseAlgorithmInput``, ``BaseAlgorithm`` and ``BaseAlgorithmResult`` abstract classes, which are stored in the ``//algorithm/algorithm_base.py`` script. Bonseyes AI Assets also provide the `LPDNN algorithm `_, which executes LPDNN using the HTTP Worker and runs the process and render functions.

Bonseyes AI Assets can also provide *Challenge Interface integration*. A Challenge represents the problem definition at the technical/interface level, and the Bonseyes AI Asset implements the defined interface. The goal of the Challenge Interface integration is that the ``AlgorithmResult`` output in the ``//algorithm/algorithm.py`` script should be in the Challenge Interface form, which varies from task to task.

All supported tools for the algorithm implementation (the ``//algorithm/algorithm_base.py`` script with the abstract algorithm classes, ``//algorithm/lpdnn_algorithm.py`` where the `LPDNNAlgorithm` class is defined, and the ``//algorithm/algorithm.py`` script which needs to be implemented) can be found in the `AI Asset Container Generator `_. Also, the Bonseyes AI Asset provides scripts for some of the steps of the algorithm. For instance, scripts for loading and running inference with the ONNXRuntime, TensorRT and torch2trt inference engines can be found in the `AI Asset Container Generator `_.

The algorithm classes from the ``//algorithm/algorithm.py`` script are used in the ``process`` and ``benchmark`` tasks. In the image processing case, the algorithm's process and render functions are applied to an image, while during the video and camera process tasks they are applied to video frames. During benchmark execution, the algorithm's process function is run on every image of the validation dataset.

Algorithm base classes
^^^^^^^^^^^^^^^^^^^^^^

Bonseyes AI Assets provide Algorithm base classes for structuring the algorithm input, loading the model and inference engine, processing, rendering and structuring the algorithm result. This script contains the ``BaseAlgorithmInput``, ``BaseAlgorithm`` and ``BaseAlgorithmResult`` abstract classes, which need to be inherited in the ``//algorithm/algorithm.py`` script. Here is the example of the ``algorithm_base.py`` script in the `AI Asset Container Generator `_, which needs to be inherited in the ``algorithm.py`` script. The ``algorithm_base.py`` script contains:

1. The ``BaseAlgorithmInput`` class, which is used for structuring the algorithm input and needs to be inherited by the ``AlgorithmInput`` class.
2. The ``BaseAlgorithm`` class, which is used for loading the model, preprocessing, inference, postprocessing and rendering results, and should be inherited by the ``Algorithm`` class.
3. The ``BaseAlgorithmResult`` class, which is used for structuring the algorithm result into ``json/dict`` form and should be inherited by the ``AlgorithmResult`` class.

Algorithm inherited classes
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Inherit the base algorithm classes, defined and implemented in ``//algorithm/algorithm_base.py``, for loading the model, processing and rendering. In this `link `_ you can find the algorithm script in the AI Asset Container Generator. An example of an algorithm script with ONNXRuntime and TensorRT inference engines can be found in the `Bonseyes Openpifpaf Wholebody algorithm `_. Another example with ONNXRuntime, TensorRT and torch2trt inference engines can be found in the `Bonseyes YOLOX algorithm `_.

Algorithm implementation process:

1.
Implement and use the ``AlgorithmInput`` class by inheriting the abstract ``BaseAlgorithmInput`` class for structuring the algorithm input. This is an optional step and should be used with more complex pipelines. For example, if you have a face detector and a face landmark detector, the algorithm input can be the ROI detected by the face detector, where the landmark detector is the primary algorithm.

2. Implement and use the ``Algorithm`` class by inheriting the ``BaseAlgorithm`` class. In this class you need to implement:

   * The ``__init__`` of this class with the specified ``model_path, engine_type (torch, onnx, tensorrt or torch2trt), input_size, backbone, device (cpu, gpu), thread_num`` and Bonseyes AI Asset specific arguments.
   * The ``load_model`` function, which can load Pytorch/Tensorflow, ONNXRuntime and TensorRT models. You can also load torch2trt models if Pytorch is your starting point model. This function should be called at the end of the ``__init__`` of the ``Algorithm`` class.
   * The ``preprocess`` function for all inference engines (Pytorch/Tensorflow, ONNXRuntime and TensorRT), which returns the preprocessing result.
   * The ``infer`` functions for multiple inference engines (Pytorch, ONNXRuntime, TensorRT and possibly torch2trt). For the inference implementation use the Bonseyes AI Asset inference engine wrappers in ``//algorithm/inference_engines/`` to run inference. The ``infer`` function needs to call the ``infer_pytorch``, ``infer_onnxruntime``, ``infer_tensorrt`` or ``infer_torch2trt`` functions depending on the engine. The inference engine implementations (ONNXRuntime, TensorRT and torch2trt) can be found in the `AI Asset Container Generator `_. This function takes the preprocessing output, runs the model inference and returns the model output.
   * The ``postprocessing`` function for all inference engines (Pytorch/Tensorflow, ONNXRuntime, TensorRT and potentially torch2trt). This function takes the model output from the ``infer`` function and returns the postprocessing output.
   * The ``process`` function for all inference engines, which needs to run the preprocess, inference and postprocess functions, time their calls and store the preprocessing, inference, postprocessing and processing times, together with the postprocessing output, in the ``AlgorithmResult`` class. This function takes the input image as input and returns a result object of the ``AlgorithmResult`` class (a minimal sketch of this flow is shown after this list).
   * The ``render`` function, which takes the input image and the output of the process function (an ``AlgorithmResult`` object) and renders the results on the image. The result of this function is the rendered image.
   * The ``destroy`` function, which runs the destructor of the inference engines (all except Pytorch).

.. note:: You can test the ``//algorithm/algorithm.py`` functionalities by running some of the process scripts to see visual results.

3. Implement and use the ``AlgorithmResult`` class by inheriting the ``BaseAlgorithmResult`` class to provide the algorithm results in ``json/dict`` form. Here is the example of this form:

.. code-block:: python

   self.dict = {
       "time": {
           "pre_processing_time": self.pre_processing_time,
           "infer_time": self.infer_time,
           "post_processing_time": self.post_processing_time,
           "processing_time": self.processing_time,
       },
       "items": self.items
   }

The postprocessing outputs are extracted into ``self.items`` in the ``AlgorithmResult`` class. For example, in the object detection case ``self.items`` is a list of dictionaries, where each dictionary represents one prediction and contains keys and values for the bounding box information, class name and confidence score. The preprocessing, inference, postprocessing and processing times are init arguments of this class, as is the algorithm's postprocess output.
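The following sketch illustrates the ``process`` flow described in item 2: time the preprocess, infer and postprocess calls and pack the timings together with the postprocessing output into a result object. It is an outline under the naming used above, not the actual ``algorithm_base.py`` interface, so check the base classes in the AI Asset Container Generator for the real signatures:

.. code-block:: python

   import time


   class AlgorithmResult:
       """Illustrative result container: postprocess items plus per-stage timings."""

       def __init__(self, items, pre_time, infer_time, post_time, proc_time):
           self.items = items
           self.dict = {
               "time": {
                   "pre_processing_time": pre_time,
                   "infer_time": infer_time,
                   "post_processing_time": post_time,
                   "processing_time": proc_time,
               },
               "items": items,
           }


   class Algorithm:
       """Outline of the engine-agnostic process() flow."""

       def preprocess(self, image):
           raise NotImplementedError  # resize/normalize per engine

       def infer(self, data):
           raise NotImplementedError  # dispatch to infer_pytorch / infer_onnxruntime / ...

       def postprocess(self, output):
           raise NotImplementedError  # decode raw engine output into predictions

       def process(self, image):
           t0 = time.perf_counter()
           pre = self.preprocess(image)
           t1 = time.perf_counter()
           out = self.infer(pre)
           t2 = time.perf_counter()
           items = self.postprocess(out)
           t3 = time.perf_counter()
           return AlgorithmResult(items, pre_time=t1 - t0, infer_time=t2 - t1,
                                  post_time=t3 - t2, proc_time=t3 - t0)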
LPDNN Algorithm
^^^^^^^^^^^^^^^

The LPDNN Algorithm is used for running ``process``, ``render`` and ``destroy`` on LPDNN inference engines. It starts the AI App by instantiating an HTTP worker for the specified deployment package. In the process function, the inputs are sent via a POST request and the process results are returned. The LPDNN Algorithm class is stored in ``//algorithm/lpdnn_algorithm.py`` and it can be found in the `AI Asset Container Generator `_. Implement `LPDNNAlgorithm` from the AI Asset Container Generator by adding code for:

1. Passing the inputs to the POST request in the process function
2. Parsing the HTTP worker output in the process function
3. Modifying the ``AlgorithmResult`` structure in the process function
4. The ``render`` function implementation

An example of the LPDNN Algorithm can also be found in the `Bonseyes 3DDFA Asset `_.

.. note:: Only the ``process``, ``render`` and ``destroy`` functions of the `LPDNNAlgorithm` class are used. The other functions are not required to be implemented for the process and benchmark scripts.

Challenge Interface
^^^^^^^^^^^^^^^^^^^

The Challenge Interface is used for reformatting the ``AlgorithmResult`` class output into the challenge-defined format. The Challenge Interface format depends on the task. Integrate the Challenge Interface in ``//algorithm/`` with the following steps:

* Add the Challenge Interface repository as a submodule in the ``//algorithm/`` directory. Challenge Interface repositories for different tasks can be found in `this link `_. Here is the example of how the Challenge Interface can be added as a submodule:

.. code-block:: bash

   cd
   git submodule add ../../../../../../artifacts/challenges/.git /algorithm/challenge_interface
   git submodule update --init --recursive /algorithm/challenge_interface

This is going to change your ``/.gitmodules`` file by adding the new submodule's information to it. When you run

.. code-block:: bash

   cd
   cat .gitmodules

the ``.gitmodules`` file should look something like this:

.. code-block:: bash

   [submodule "/algorithm/challenge_interface"]
       path = /algorithm/challenge_interface
       url = ../../../../../../artifacts/challenges/.git

For example, the Bonseyes Openpifpaf Wholebody AI Asset uses the `NV-Bodypose2D-BP2D `_ challenge and imports it as a submodule in the ``bonseyes_openpifpaf_wholebody/bonseyes_openpifpaf_wholebody/algorithm/`` directory. The commands used to add the Challenge Interface in the Bonseyes Openpifpaf Wholebody AI Asset are:

.. code-block:: bash

   git submodule add ../../challenges/nv-bodypose2d-bp2d.git bonseyes_openpifpaf_wholebody/algorithm/nv_bodypose2d_bp2d
   git submodule update --init --recursive bonseyes_openpifpaf_wholebody/algorithm/nv_bodypose2d_bp2d

The example of how this submodule is imported in the Bonseyes Openpifpaf Wholebody AI Asset can be found in the `following link `_.

* Import the Challenge Interface submodule classes in ``//algorithm/algorithm.py``. Use the imported submodule classes and reformat the ``AlgorithmResult`` outputs. Here is the example of how the Challenge Interface classes are imported in the Bonseyes Openpifpaf Wholebody AI Asset ``//algorithm/algorithm.py`` script:
.. code-block:: python

   from bonseyes_openpifpaf_wholebody.algorithm.nv_bodypose2d_bp2d.interfaces.NVBodypose2DBP2D_Result import The2DBodyJoints, The2DBoundingBox, NVBodypose2DBP2DResultElement

Integrate the challenge result class (for example the ``NVBodypose2DBP2D_Result`` class) into the ``AlgorithmResult`` class in ``//algorithm/algorithm.py`` and reformat the AlgorithmResult output to be a list of Challenge Interface Result classes. In this `link `_ you can find an example of how the ``NVBodypose2DBP2D_Result`` class from the ``NV-Bodypose2D-BP2D`` submodule is imported in ``/bonseyes_openpifpaf_wholebody/algorithm/algorithm.py`` and how the AlgorithmResult outputs are reformatted as a list of ``NVBodypose2DBP2DResultElement`` objects from the Challenge Interface.

V. Export
=========

Bonseyes AI Assets provide export tools for different precisions, floating point 32 (fp32) and floating point 16 (fp16), to convert an AI model from a training framework format to a deployment one. Deployment frameworks allow the creation of AI Applications that have lower storage and computation cost and run more efficiently on the GPU or CPU. The exported engines' weights and activations should have a certain precision (fp32 or fp16). Bonseyes AI Assets support model export to fp32 and fp16 precision through two inference engines: ONNXRuntime and TensorRT. Export for GPU deployment can be applied with both TensorRT and ONNXRuntime, while export for CPU deployment can only be applied with the ONNXRuntime engine. Further, if a Pytorch model is the starting point, it is also possible to apply model export to fp32 and fp16 precision using the ``torch2trt`` script directly. Bonseyes AI Assets also provide an export tool for AI App generation and for exporting ONNX to the LPDNN inference engines (LNE, ONNXRuntime and TensorRT). The Bonseyes tools for ONNXRuntime, TensorRT, torch2trt and LPDNN export can be found in the `AI Asset Container Generator `_.

ONNX export
^^^^^^^^^^^

Bonseyes AI Assets provide ONNX export tools for Pytorch and TensorFlow/Keras starting point models.

torch2onnx export
~~~~~~~~~~~~~~~~~

1. Use ``//export/torch2onnx.py`` to export the Pytorch model to ONNX with a defined input size (width and height specified as CLI input arguments) and fp32 precision. In this `link `_ you can find an example of ``//export/torch2onnx.py`` of the Bonseyes YOLOX Asset. Here is the example of running the ``//export/torch2onnx.py`` script of the Bonseyes YOLOX Asset:

.. code-block:: bash

   python -m bonseyes_yolox.export.torch2onnx \
       --model-input /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
       --model-output /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
       --input-width 640 \
       --input-height 640

or you can run the ``//export/all.py`` script with ``engine`` onnxruntime.

.. code-block:: bash

   python -m bonseyes_yolox.export.all \
       --precisions fp32 \
       --input-sizes 640x640 \
       --engine onnxruntime \
       --backbone yolox_s

2. The exported ONNX model should be saved in the ``//models/onnx/{args.backbone}/`` directory and should be named as:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_fp32.onnx

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is exported from the pretrained Pytorch model, ``640x640`` is the input size and ``fp32`` is the precision.
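Under the hood, a ``torch2onnx``-style export typically boils down to a single ``torch.onnx.export`` call. The following is a generic sketch rather than the asset's actual script; the model, file name, input size and opset are placeholders:

.. code-block:: python

   import torch
   import torchvision

   # Stand-in model; in the AI Asset this would be the pretrained backbone.
   model = torchvision.models.resnet18()
   model.eval()

   # Dummy input fixing the export input size (a 640x640 model would use 1x3x640x640).
   dummy = torch.randn(1, 3, 224, 224)

   torch.onnx.export(
       model,
       dummy,
       "model_fp32.onnx",      # output path (placeholder)
       opset_version=11,       # try a different opset if the export fails
       input_names=["input"],
       output_names=["output"],
   )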
tf2onnx export
~~~~~~~~~~~~~~

1. Tensorflow/Keras can be used as the starting point model, but it is not officially supported in the AI Asset. In ``//export/tf2onnx.py`` you can find the script for exporting a Keras ``.h5`` model to ONNX.
2. Add ``tensorflow`` **(tf-2.0 or newer)** and ``tf2onnx`` **(tf2onnx-1.8.4 or newer)** with their versions in ``setup.py`` and install them while building the image.
3. Use ``//export/tf2onnx.py`` to export the Tensorflow/Keras model to ONNX with the specified input and output model, input size (width and height specified as CLI input arguments) and fp32 precision. Here is the example of running the ``//export/tf2onnx.py`` script:

.. code-block:: bash

   python3 -m .export.tf2onnx \
       --model-input /path/to/h5/model \
       --model-output /path/to/output/onnx/model \
       --input-width /input/width/ \
       --input-height /input/height/

4. Add a subprocess call of ``//export/tf2onnx.py`` to the ONNXRuntime case in the ``//export/all.py`` script.
5. The exported ONNX model should be saved in the ``//models/onnx/{args.backbone}/`` directory. Here is the example of how the exported model should be named:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_fp32.onnx

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` is added as it comes from a pretrained model from the official repository, ``640x640`` is the input size and ``fp32`` is the precision.

TensorRT export
^^^^^^^^^^^^^^^

1. Use ``//export/onnx2trt.py`` to export the ONNX model to TensorRT with precision fp16 or fp32. In this `link `_ you can find an example of ``//export/onnx2trt.py`` of the Bonseyes YOLOX Asset. Here is the example of running the ``//export/onnx2trt.py`` script in the Bonseyes YOLOX Asset:

.. code-block:: bash

   python -m bonseyes_yolox.export.onnx2trt \
       --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
       --output-dir /app/bonseyes_yolox/models/tensorrt/Tesla_T4/yolox_s \
       --precision fp32

or you can run the ``//export/all.py`` script with ``engine`` tensorrt. Specify ``precision`` to be fp32 or fp16, or add both precisions in the CLI argument. An example of running the ``onnx2trt`` export to fp16 and fp32 through the ``export.all`` script in the Bonseyes YOLOX Asset is the following:

.. code-block:: bash

   python -m bonseyes_yolox.export.all \
       --precisions fp32 fp16 \
       --input-sizes 640x640 \
       --engine tensorrt \
       --backbone yolox_s

2. The exported TensorRT model should be saved in the ``//models/tensorrt/{gpu_name}/{args.backbone}`` directory and should be named as one of the following models:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_fp32_dla_disabled.trt
   v1.0_yolox_s_default_640x640_fp32_dla_enabled.trt
   v1.0_yolox_s_default_640x640_fp16_dla_disabled.trt
   v1.0_yolox_s_default_640x640_fp16_dla_enabled.trt

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is exported from the ONNX model, which in turn is exported from the pretrained Pytorch model, ``640x640`` is the input size, ``fp32`` the precision and ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled during export.

torch2trt export
^^^^^^^^^^^^^^^^

1. Make sure you added the ``torch2trt`` installation in ``Dockerfile.cuda`` for ``x86`` and ``jetson`` and installed torch2trt. To install torch2trt, you need to add

.. code-block:: bash

   RUN cd /tmp && git clone https://github.com/NVIDIA-AI-IOT/torch2trt \
       && cd torch2trt \
       && git checkout 0400b38123d01cc845364870bdf0a0044ea2b3b2 \
       && wget https://github.com/NVIDIA-AI-IOT/torch2trt/commit/8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
       && git apply 8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
       && python3 setup.py install --user

in ``//docker/platforms/x86_64/Dockerfile.cuda`` and ``//docker/platforms/nvidia_jetson/Dockerfile.cuda`` after the AI Asset setup.
An example of the ``torch2trt`` installation in ``Dockerfile.cuda`` for ``x86`` in Bonseyes YOLOX can be found at this `link `_, and another example of this installation can be found in the `AI Asset Container Generator `_.

2. Use ``//export/torch2trt.py`` to export the Pytorch model to TensorRT with precision fp16 or fp32. It is also possible to use the ``use-onnx`` CLI argument when calling this script, in which case torch2trt converts the Pytorch model to ONNX first and then exports the resulting ONNX to a TensorRT model with fp32 or fp16 precision. In this `link `_ you can find an example of ``//export/torch2trt.py`` of the Bonseyes YOLOX Asset. Here is the example of running the ``export/torch2trt.py`` script in the Bonseyes YOLOX Asset:

.. code-block:: bash

   python3 -m bonseyes_yolox.export.torch2trt \
       --input-path /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
       --output-dir /app/ \
       --precision fp32 \
       --input-width 640 \
       --input-height 640

or you can run the ``//export/all.py`` script with ``engine`` torch2trt. Specify ``precision`` to be fp32 or fp16, or add both precisions in the CLI argument.

.. code-block:: bash

   python -m bonseyes_yolox.export.all \
       --precisions fp32 fp16 \
       --input-sizes 640x640 \
       --engine torch2trt \
       --backbone yolox_s

.. note:: Note that the ``//export/all.py`` script exports 2 torch2trt models: 1. converting Pytorch to ONNX and then exporting to TensorRT, and 2. directly exporting the TensorRT model from Pytorch.

3. The exported torch2trt model should be saved in the ``//models/torch2trt/{gpu_name}/{args.backbone}`` directory.

.. note:: After the torch2trt export, two torch2trt optimized models are saved in ``//models/torch2trt/{gpu_name}/{args.backbone}``, where one has the ``.pth`` extension (Python) and the other has the ``.engine`` extension (TensorRT) and is used in C++. In the given `example `_ the exported ``.pth`` model is imported for inference. More information about loading and running inference with torch2trt models can be found in the `AI Asset Container Generator `_.

torch2trt models with the ``.pth`` extension, which are exported directly from Pytorch to TensorRT, are named with one of the following names:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_fp32_dla_disabled_trt.pth
   v1.0_yolox_s_default_640x640_fp16_dla_disabled_trt.pth
   v1.0_yolox_s_default_640x640_fp32_dla_enabled_trt.pth
   v1.0_yolox_s_default_640x640_fp16_dla_enabled_trt.pth

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is exported from the pretrained Pytorch model, ``640x640`` is the input size, ``fp32`` the precision and ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled during export.

torch2trt models with the ``.pth`` extension, which are exported from Pytorch to ONNX and then from ONNX to TensorRT, are named with one of the following names:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_fp32_dla_disabled_with_onnx_trt.pth
   v1.0_yolox_s_default_640x640_fp16_dla_disabled_with_onnx_trt.pth
   v1.0_yolox_s_default_640x640_fp32_dla_enabled_with_onnx_trt.pth
   v1.0_yolox_s_default_640x640_fp16_dla_enabled_with_onnx_trt.pth

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is exported from the pretrained Pytorch model, ``640x640`` is the input size, ``fp32`` the precision, ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled or disabled during export, and ``_with_onnx`` indicates that the model was exported from Pytorch to ONNX and then from ONNX to TensorRT.
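For reference, the core of a torch2trt-based export is a single ``torch2trt`` call on an example input. The sketch below assumes the ``torch2trt`` package from NVIDIA-AI-IOT is installed as described above and uses a stand-in model and placeholder file name; it is not the asset's export script:

.. code-block:: python

   import torch
   import torchvision
   from torch2trt import torch2trt, TRTModule

   # Stand-in model; in the AI Asset this would be the pretrained backbone.
   model = torchvision.models.resnet18().cuda().eval()
   x = torch.randn(1, 3, 224, 224).cuda()

   # Build the TensorRT engine (set fp16_mode=True for fp16 precision).
   model_trt = torch2trt(model, [x], fp16_mode=False)
   torch.save(model_trt.state_dict(), "model_trt.pth")

   # Later: reload the optimized model for inference from the .pth file.
   model_trt2 = TRTModule()
   model_trt2.load_state_dict(torch.load("model_trt.pth"))
   y = model_trt2(x)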
onnx2lpdnn export
^^^^^^^^^^^^^^^^^

1. For the onnx2lpdnn export add the following dependencies:

   * In the ``/deps/requirements_.txt`` file add ``h5py``, example: `link `_.
   * In the ``//docker/platforms//Dockerfile`` file add ``hdf5-tools``, example: `link `_.

2. Use ``//export/onnx2lpdnn.py`` to generate the AI App and export the ONNX model to the LPDNN inference engines (LNE, ONNXRuntime or TensorRT) with precision fp32 or fp16. You can find ``//export/onnx2lpdnn.py`` in the `AI Asset Container Generator export directory `_. This script uses the algorithm, challenge and deployment yaml files, which are stored in the ``lpdnn`` directory and can be found in the `AI Asset Container Generator `_. In this `link `_ you can find an example of the ``//export/onnx2lpdnn.py`` script in the Bonseyes 3DDFA Asset. Here is the example of running the ``//export/onnx2lpdnn.py`` script in the Bonseyes 3DDFA Asset:

.. code-block:: bash

   python bonseyes_3ddfa_v2/export/onnx2lpdnn.py \
       --engine onnxruntime \
       --precision F32 \
       --algorithm-file bonseyes_3ddfa_v2/lpdnn/catalog/mobilenetv1-default-120x120-fp32/algorithm.yml \
       --challenge-file bonseyes_3ddfa_v2/lpdnn/challenge/challenge.yml \
       --deployment-file bonseyes_3ddfa_v2/lpdnn/deployment/deployment-file.yml \
       --deployment-package x86_64-ubuntu20_cuda \
       --output-dir build/3dface-landmarks-v1.0-mobilenetv1-120x120

When running ``//export/onnx2lpdnn.py`` you need to specify the algorithm, challenge and deployment yaml files. More information about LPDNN's YAML files can be found in :ref:`lpdnn_tree`, and about the available engines in LPDNN's :ref:`lpdnnEngines`.

3. The exported models and additional files are saved in the directory you specified with the ``output-dir`` CLI argument.

All export
^^^^^^^^^^

Use ``//export/all.py`` to export to ONNXRuntime, TensorRT, torch2trt or to all engines with the specified precision(s), backbone name, input sizes, ONNX opset version (optional) and enable-DLA flag (optional). In this `link `_ you can find an example of the export ``all.py`` of the Bonseyes YOLOX Asset. You can also find the export all script in the `AI Asset Container Generator `_. Here is the example of running the ``//export/all.py`` script with all engines:

.. code-block:: bash

   python -m bonseyes_yolox.export.all \
       --precisions fp32 fp16 \
       --input-sizes 640x640 \
       --engine all \
       --backbone yolox_s

.. warning:: Potential export issues and fixes:

   * If you have problems exporting the Pytorch model to ONNX, try changing the opset version.
   * Note that you can only export a TensorRT model with a specific input size from an existing ONNX model with the same input size (the input size of the model is written in the exported ONNX model's name).
   * You can set the enable-dla CLI argument to True when calling the ``//export/all.py`` script or ``//export/onnx2trt.py`` on Jetson Xavier AGX or Jetson Xavier NX devices. This flag enables the Deep Learning Accelerator and can be set to True only on Jetson Xavier AGX and Jetson Xavier NX devices; on other devices or servers you should not set it to True.
   * Try changing the workspace size when calling the ``//export/all.py`` or ``//export/onnx2trt.py`` script to manage how much GPU memory TensorRT uses during export (this can be useful when you are working on edge devices that have low memory).
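After any of the ONNX exports above, it is worth sanity-checking the resulting file before moving on to TensorRT or LPDNN. A minimal check using the standard ``onnx`` and ``onnxruntime`` packages might look like the sketch below; the file name is a placeholder:

.. code-block:: python

   import numpy as np
   import onnx
   import onnxruntime as ort

   model_path = "model_fp32.onnx"  # placeholder path to the exported model

   # Structural check of the exported graph.
   onnx.checker.check_model(onnx.load(model_path))

   # Run one dummy inference to confirm the model actually executes.
   session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
   inp = session.get_inputs()[0]
   shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # resolve dynamic dims
   dummy = np.random.rand(*shape).astype(np.float32)
   outputs = session.run(None, {inp.name: dummy})
   print("output shapes:", [o.shape for o in outputs])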
VI. Optimize
============

Bonseyes AI Assets provide optimisation methods such as Post-training Quantization (PTQ) and Quantization-aware Training (QAT) to reduce the memory footprint and improve the efficiency of DNNs. Quantization is a compression method that reduces the storage cost of a variable by employing reduced numerical precision. This improves the arithmetic intensity of neural network inference by increasing the amount of computational work that can be performed for a given amount of memory traffic.

Post Training Quantization (PTQ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bonseyes AI Assets support post-training quantization for both weights and activations. Weights can be directly quantized to 8-bit integers, while the activations require a validation set to determine their dynamic range. PTQ methods usually apply layer fusion of the batch-normalization layers by folding them back into the preceding convolutions before quantizing the weights and activations, which might lead to small drops in accuracy in some cases.

Bonseyes AI Assets provide PTQ through two inference engines: ONNXRuntime and TensorRT. PTQ for GPU deployment can be applied with both TensorRT and ONNXRuntime, while PTQ for CPU deployment can only be applied with the ONNXRuntime engine. Further, if a Pytorch model is the starting point, it is also possible to apply Post Training Quantization using the ``torch2trt`` script directly. The Bonseyes optimization tools for ONNXRuntime, TensorRT and torch2trt Post Training Quantization can be found in the `AI Asset Container Generator `_ and are used to optimize exported models (apply Post Training Quantization).

PTQ requires a calibration dataset to adjust the DNN's activation ranges, i.e. to calculate the activations' scale and offset and retain a high amount of accuracy. Hence, the first step of PTQ is to implement a ``calibration_dataloader`` function in ``//optimize/post_training_quantization/calibration_data.py`` to provide data samples for int8 quantization (add the default model path and validation images path as function arguments). Also, in this script you can specify the default input size and the number of images used for int8 calibration.

TensorRT PTQ
~~~~~~~~~~~~

1. Use the ``INT8Calibrator`` calibrator class in ``//optimize/post_training_quantization/trt_quantize.py`` to perform int8 post training quantization with TensorRT (you can specify the number of images in main() when calling the ``calibration_dataloader`` function, and the batch size when calling the ``INT8Calibrator`` class, also in main()).
2. Use ``//optimize/post_training_quantization/trt_quantize.py`` to optimize the TensorRT model to int8 precision. Specify the ONNX fp32 model path when running this script. In this `link `_ you can find an example of the ``trt_quantize.py`` script in Bonseyes YOLOX. Here is the example of running the ``trt_quantize.py`` script:

.. code-block:: bash

   python3 -m bonseyes_yolox.optimize.post_training_quantization.trt_quantize \
       --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
       --output-dir /app/bonseyes_yolox/models/tensorrt/Tesla_T4/yolox_s/

or you can run the ``//optimize/post_training_quantization/all.py`` script with ``engine`` tensorrt

.. code-block:: bash

   python -m bonseyes_yolox.optimize.post_training_quantization.all \
       --engine tensorrt \
       --backbone yolox_s \
       --input-sizes 640x640

3. The TensorRT model optimized using PTQ should be saved in the ``//models/tensorrt/{gpu_name}/{args.backbone}`` directory and should be named as one of the following models:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_int8_dla_disabled.trt
   v1.0_yolox_s_default_640x640_int8_dla_enabled.trt

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that the optimized model comes from the ONNX model, which in turn is exported from the official pretrained Pytorch model, ``640x640`` is the input size, ``int8`` the precision and ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled during the optimization process.
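The ``calibration_dataloader`` used by the TensorRT and torch2trt PTQ steps essentially yields batches of preprocessed validation images for the calibrator. The following is a generic sketch of that idea; the paths, input size and preprocessing are placeholders, not the asset's actual ``calibration_data.py``:

.. code-block:: python

   import glob

   import cv2  # assumes OpenCV is available in the container
   import numpy as np


   def calibration_dataloader(images_dir, input_size=(640, 640),
                              images_num=100, batch_size=8):
       """Yield float32 NCHW batches of preprocessed validation images
       for int8 calibration (illustrative, not the asset's script)."""
       paths = sorted(glob.glob(f"{images_dir}/*.jpg"))[:images_num]
       batch = []
       for path in paths:
           img = cv2.imread(path)
           img = cv2.resize(img, input_size).astype(np.float32) / 255.0
           batch.append(img.transpose(2, 0, 1))  # HWC -> CHW
           if len(batch) == batch_size:
               yield np.stack(batch)
               batch = []
       if batch:
           yield np.stack(batch)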
ONNXRuntime PTQ
~~~~~~~~~~~~~~~

1. Make sure to have simplified and optimised the ONNX model by using the functions in ``//export/torch2onnx.py``.
2. Use the ``DataReader`` calibrator class in ``//optimize/post_training_quantization/onnx_quantize.py`` to perform int8 post training quantization with ONNX (you can specify the number of images in main() when calling the ``DataReader`` class). Set the default value of the ``calibrate-dataset`` CLI argument to the path of the validation dataset images directory.
3. Use ``//optimize/post_training_quantization/onnx_quantize.py`` to optimize the ONNX model to int8 precision. In this `link `_ you can find an example of ``onnx_quantize.py`` in the Bonseyes YOLOX case. Here is the example of running the ``onnx_quantize.py`` script:

.. code-block:: bash

   python3 -m bonseyes_yolox.optimize.post_training_quantization.onnx_quantize \
       --input-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
       --output-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_int8.onnx

or you can run the ``//optimize/post_training_quantization/all.py`` script with ``engine`` onnxruntime

.. code-block:: bash

   python -m bonseyes_yolox.optimize.post_training_quantization.all \
       --engine onnxruntime \
       --backbone yolox_s \
       --input-sizes 640x640

4. The optimized ONNXRuntime model should be saved in the ``//models/onnx/{args.backbone}/`` directory and should be named as:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_int8.onnx

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is optimized from the ONNX model, which is exported from the pretrained Pytorch model, ``640x640`` is the input size and ``int8`` is the precision.

torch2trt PTQ
~~~~~~~~~~~~~

1. Make sure you added the ``torch2trt`` installation in ``Dockerfile.cuda`` for ``x86`` and ``jetson`` and installed torch2trt. To install torch2trt, you need to add

.. code-block:: bash

   RUN cd /tmp && git clone https://github.com/NVIDIA-AI-IOT/torch2trt \
       && cd torch2trt \
       && git checkout 0400b38123d01cc845364870bdf0a0044ea2b3b2 \
       && wget https://github.com/NVIDIA-AI-IOT/torch2trt/commit/8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
       && git apply 8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
       && python3 setup.py install --user

in ``//docker/platforms/x86_64/Dockerfile.cuda`` and ``//docker/platforms/nvidia_jetson/Dockerfile.cuda`` after the AI Asset setup. An example of the ``torch2trt`` installation in ``Dockerfile.cuda`` for ``x86`` in Bonseyes YOLOX can be found at this `link `_, and another example of this installation can be found in the `AI Asset Container Generator `_.

2. Use the ``calibration_dataloader`` function in ``//optimize/post_training_quantization/calibration_data.py`` to provide data samples for int8 quantization.
3. Use ``//optimize/post_training_quantization/torch2trt_quantize.py`` to optimize a torch2trt model with ``int8`` precision from the Pytorch model. It is also possible to use the ``use-onnx`` CLI argument when calling this script, in which case torch2trt converts the Pytorch model to ONNX first and then optimizes the resulting ONNX to a TensorRT int8 model.
In this `link `_ you can find an example of ``torch2trt_quantize.py`` in the Bonseyes YOLOX case. Here is the example of running the ``torch2trt_quantize.py`` script:

.. code-block:: bash

   python3 -m bonseyes_yolox.optimize.post_training_quantization.torch2trt_quantize \
       --pth-model /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
       --output-dir /app/bonseyes_yolox/models/torch2trt/Tesla_T4/yolox_s/ \
       --input-width 640 \
       --input-height 640

or you can run the ``//optimize/post_training_quantization/all.py`` script with ``engine`` torch2trt

.. code-block:: bash

   python -m bonseyes_yolox.optimize.post_training_quantization.all \
       --engine torch2trt \
       --backbone yolox_s \
       --input-sizes 640x640

.. note:: Note that the ``//optimize/post_training_quantization/all.py`` script optimizes 2 torch2trt models: 1. converting Pytorch to ONNX and then optimizing to TensorRT, and 2. directly optimizing the TensorRT model from Pytorch.

4. The optimized torch2trt models should be saved in the ``//models/torch2trt/{gpu_name}/{args.backbone}`` directory.

.. note:: After torch2trt PTQ, two torch2trt optimized models are saved in ``//models/torch2trt/{gpu_name}/{args.backbone}``, where one has the ``.pth`` extension (Python) and the other has the ``.engine`` extension (TensorRT) and is used in C++. In the given `example `_ the optimised ``.pth`` model is imported for inference. More information about loading and running inference with torch2trt models can be found in the `AI Asset Container Generator `_.

torch2trt models with the ``.pth`` extension, which are directly optimized (PTQ) from Pytorch to TensorRT, are named with one of the following names:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_int8_dla_disabled_trt.pth
   v1.0_yolox_s_default_640x640_int8_dla_enabled_trt.pth

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is optimized from the pretrained Pytorch model, ``640x640`` is the input size, ``int8`` the precision and ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled during the optimization process.

torch2trt models with the ``.pth`` extension, which are exported from Pytorch to ONNX and then optimized from ONNX to TensorRT, are named with one of the following names:

.. code-block:: bash

   v1.0_yolox_s_default_640x640_int8_dla_disabled_with_onnx_trt.pth
   v1.0_yolox_s_default_640x640_int8_dla_enabled_with_onnx_trt.pth

where ``v1.0`` is the version, ``yolox_s`` is the backbone_name, ``default`` indicates that it is optimized from the pretrained Pytorch model, ``640x640`` is the input size, ``int8`` the precision, ``dla_enabled`` or ``dla_disabled`` indicates whether DLA was enabled or disabled during the optimization process, and ``_with_onnx`` indicates that the model was exported from Pytorch to ONNX and then optimized from ONNX to TensorRT.

LPDNN PTQ
~~~~~~~~~

LPDNN supports Post Training Quantization for its inference engines (TensorRT, ONNXRuntime, NCNN and LNE). Instructions for LPDNN Post Training Quantization can be found in `Quantization workflow for LPDNN's engines `_.

All PTQ
~~~~~~~

Use ``//optimize/post_training_quantization/all.py`` to optimize TensorRT, ONNXRuntime, torch2trt or all models with the specified input sizes, backbone name, calibration dataset and tag version. Also, make sure that you added the validation images' folder path as the ``calibrate-dataset`` CLI argument. In this `link `_ you can find a PTQ ``all.py`` example in Bonseyes YOLOX. Here is the example of running the ``post_training_quantization/all.py`` script:
.. code-block:: bash

   python -m bonseyes_yolox.optimize.post_training_quantization.all \
       --engine all \
       --backbone yolox_s \
       --input-sizes 640x640

.. warning:: Potential optimization issues and fixes:

   * If the quantization process is killed, there is a chance that too many images are used for optimization and the device does not have enough memory for this operation. Try lowering the ``images_num`` argument when calling the calibration_dataloader function in the main part of the ``//optimize/post_training_quantization/trt_quantize.py`` script for TensorRT, and change the ``calibration_images_num`` argument when calling DataReader in the main part of the ``//optimize/post_training_quantization/onnx_quantize.py`` script for ONNX. The lower ``images_num`` is, the lower the resulting AP will be (be careful not to set a very small number of images; it is recommended to start from an ``images_num`` of 100).
   * The TensorRT optimized model is made from the ONNX fp32 model, so if one wants to optimize a TensorRT model with a specific input size, an ONNX fp32 model with that specific input size must already exist (if not, convert the Pytorch model to an ONNX fp32 model with that specific input size).
   * The enable-dla CLI argument can be set to True when calling the ``//optimize/post_training_quantization/all.py`` script or ``//optimize/post_training_quantization/trt_quantize.py`` on Jetson Xavier AGX or Jetson Xavier NX. This flag enables the Deep Learning Accelerator and can be set to True only on Jetson Xavier AGX and Jetson Xavier NX devices; on other devices or servers you should not set it to True.
   * Try changing the workspace size when calling the ``//optimize/post_training_quantization/all.py`` or ``//optimize/post_training_quantization/trt_quantize.py`` script to manage how much GPU memory TensorRT uses during the quantization process (this can be useful when working on edge devices with low memory).

Quantization Aware Training (QAT)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PTQ might lead to a drop in accuracy when quantizing from fp32 to int8 formats. The goal of QAT is to recover the accuracy of the int8 models by fine-tuning the model weights while the quantization is performed. In QAT, models are fine-tuned in Pytorch by simulating a quantization forward pass, i.e., fake quantization, and updating the weights during the backward pass. Thereby, the model is re-trained, increasing the precision of the fake quantized model. After fine-tuning, the Pytorch models need to be exported to fake quantized ONNX models, and finally explicit quantization needs to be applied from fake quantized ONNX to int8 using TensorRT functions.

QAT Tools
~~~~~~~~~

The following tools are used for QAT:

1. The Bonseyes optimization tool to calibrate data for QAT. This tool can be found in the `container generator `_
2. The ``pytorch_quantization`` package is used for QAT within the PyTorch training framework:

   * The ``pytorch_quantization`` package is used for the QAT process and to export fake quantized Pytorch models.
   * For the ``pytorch_quantization`` installation you need to have ``pytorch==1.10`` and ``torchvision==0.11`` installed.
   * You need to install pytorch_quantization 2.1.2, and for ``x86`` you need to add the ``pytorch_quantization`` installation by adding

..
     .. code-block:: bash

        RUN cd /tmp && \
            gdown https://developer.download.nvidia.com/compute/redist/pytorch-quantization/pytorch_quantization-2.1.2-cp38-cp38-linux_x86_64.whl && \
            python3 -m pip install prettytable==3.2.0 pytorch_quantization-2.1.2-cp38-cp38-linux_x86_64.whl sphinx-glpi-theme==0.3 wcwidth==0.2.5 && \
            sudo rm -rf /tmp/* ; \

     Add this to ``//docker/platforms/x86_64/Dockerfile.cuda`` after the AI Asset setup. An example of the ``pytorch_quantization`` installation in ``Dockerfile.cuda`` for ``x86`` on YOLOX can be found at this `link `_; another example is available in the `container generator `_.

   * On Jetson devices it is not possible to install the ``pytorch_quantization`` package, since it is only supported on ``x86``. On Jetson devices you can only start from the following step.

3. The `torch2onnx.py `_ and `onnx2trt.py `_ scripts, which convert fake quantized Pytorch models to ONNX and TensorRT models, i.e., explicit quantization, after the QAT process (the ``qat`` flag needs to be set).

QAT process
~~~~~~~~~~~

1. Change the training code

   * Change the config file used for training (add a ``qat`` bool flag in the config, which is passed to the training code). Here is a config file example for Quantization Aware Training with the ``qat`` flag enabled:

     .. code-block:: yaml

        device:
          loader-workers: 4
          gpu_num: 4
        hyperparams:
          fp16: True             # adopting mixed precision training
          qat: True
          resume: False          # resume training
          cache: False           # caching imgs to RAM for fast training
          occupy: False          # occupy GPU memory first for training
          experiment-name: yolox_s  # experiment name
          name: 'yolox-s'
          dist-backend: 'nccl'   # distributed backend
          dist-url: 'auto'       # url used to set up distributed training
          batch-size: 16
          devices: 1             # number of GPUs (8 in the official example) for training
          exp_file: None         # experiment description file
          ckpt: /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth  # checkpoint file
          start_epoch: None      # resume training start epoch
          num_machines: 1        # number of nodes for training
          machine_rank: 0        # node rank for multi-node training
          logger: tensorboard    # local rank for dist training
        system:
          output: "/app/source/yolox/YOLOX_outputs/yolox_s/train_log.txt"

     In this `link `_ you can find the YOLOX config example for QAT. You can also find the `train/__main__.py `_ script, which runs the AI Asset training script.

   * Add a QAT case to your training code (add a ``qat`` argument to the training CLI, which is set to ``True`` when the ``qat`` flag in the config is ``True``).
   * Before loading the model, initialize the quant modules:

     .. code-block:: python

        from pytorch_quantization import quant_modules
        quant_modules.initialize()

     This tells Pytorch to use fake quantized layers instead of the default layers (for example ``QuantConv2d`` instead of ``Conv2d``), which simulates the quantized forward pass.

     .. note::

        ``quant_modules.initialize()`` applies automatic fake quantization to all supported layers. If you want only specific layers to be fake quantized, use ``QuantDescriptor`` to define which layers should be fake quantized. Here is an example of configuring custom fake quantized layers (in this case ``QuantConv2d`` and ``QuantMaxPool2d``):
        .. code-block:: python

           from pytorch_quantization import nn as quant_nn
           from pytorch_quantization.tensor_quant import QuantDescriptor

           quant_desc_input = QuantDescriptor(calib_method=calibrator)
           quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc_input)
           quant_nn.QuantMaxPool2d.set_default_quant_desc_input(quant_desc_input)

   * Get the model and load the pretrained model's state dict.
   * Calibrate the loaded Pytorch model by importing ``//optimize/quantization_aware_training/calibrate_data.py`` into your training code and applying the ``calibrate_model`` function to the model. Use the training dataset for calibration.

     .. code-block:: python

        from bonseyes_yolox.optimize.quantization_aware_training.calibrate_data import calibrate_model

        if self.args.qat and not self.calibrated:
            # Calibrate the model using the max calibration technique.
            with torch.no_grad():
                calibrate_model(
                    model=model,
                    model_name="yolox",
                    data_loader=self.train_loader,
                    num_calib_batch=64,
                    calibrator=calibrator,
                    hist_percentile=[99.9, 99.99, 99.999, 99.9999],
                    out_dir="./",
                    batch_size=self.args.batch_size)
            self.calibrated = True

   * Fine-tune the model (the rest of the training code is the same as for default training) with a lower learning rate, fewer iterations and a low number of epochs (add a ``qat`` case for the hyperparameter setup).
   * After fine-tuning, save the fine-tuned model in the ``//models/pytorch_qat//`` directory. The name of the fine-tuned model differs from the pretrained (non-QAT) model only by replacing the word ``default`` with ``qat``. Example of running the training script:

     .. code-block:: bash

        python3 -m bonseyes_yolox.train --config /app/bonseyes_yolox/train/configs/v1.0_yolox_s_qat_640x640_fp32_config.yml

     In this `link `_ you can see the modified YOLOX source training code with the QAT feature added.

2. Use ``//export/torch2onnx.py`` to export the fine-tuned QAT Pytorch model with fake quantized layers to ONNX with the defined input size (width and height specified as CLI arguments). The exported ONNX model should contain Quantize and Dequantize layers. The ``torch2onnx`` script with the ``--qat`` flag exports the fake quantized Pytorch model to an ONNX model with Quantize and Dequantize layers; use ``opset 13`` when exporting models with fake quantized layers (the underlying export pattern is sketched after the commands below). You can also use ``/export/all.py`` with the ``--qat`` argument to export the Pytorch model to fake quantized ONNX. You can run the ``torch2onnx.py`` script directly:

   .. code-block:: bash

      python -m bonseyes_yolox.export.torch2onnx \
          --model-input /app/bonseyes_yolox/models/pytorch_qat/yolox_s/v1.0_yolox_s_qat_640x640_fp32.pth \
          --model-output /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx \
          --input-width 640 \
          --input-height 640 \
          --qat

   or you can run it using the ``export/all.py`` script:

   .. code-block:: bash

      python -m bonseyes_yolox.export.all \
          --precisions fp32 \
          --input-sizes 640x640 \
          --engine onnxruntime \
          --backbone yolox_s \
          --qat
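   The export with the ``--qat`` flag follows the standard ``pytorch_quantization`` export pattern sketched below. This is an illustration under the assumptions noted in the comments, not the actual Bonseyes implementation:

   .. code-block:: python

      # Sketch: export a fake quantized model to ONNX with Q/DQ nodes. Assumes
      # `model` is the fine-tuned QAT model already on the GPU; names are illustrative.
      import torch
      from pytorch_quantization import nn as quant_nn

      # Make TensorQuantizer emit ONNX QuantizeLinear/DequantizeLinear nodes.
      quant_nn.TensorQuantizer.use_fb_fake_quant = True

      dummy = torch.zeros((1, 3, 640, 640)).cuda()
      torch.onnx.export(
          model,
          dummy,
          "/app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx",
          opset_version=13,       # opset 13 is required for fake quantized layers
          input_names=["input"],  # illustrative, not necessarily the asset's names
          output_names=["output"],
      )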
3. Use ``//export/onnx2trt.py`` to apply explicit quantization from the fake quantized ONNX fp32 model to a TensorRT model with int8 precision. Specify the fake quantized ONNX fp32 model path when running this script, set the ``qat`` CLI argument to ``True`` and set ``precision`` to ``int8``:

   .. code-block:: bash

      python -m bonseyes_yolox.export.onnx2trt \
          --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx \
          --output-dir /app/bonseyes_yolox/models/tensorrt/{gpu_name}/{args.backbone} \
          --precision int8 \
          --qat

   or you can run it using the ``export/all.py`` script:

   .. code-block:: bash

      python -m bonseyes_yolox.export.all \
          --precisions int8 \
          --input-sizes 640x640 \
          --engine tensorrt \
          --backbone yolox_s \
          --qat

   .. note::

      Explicit quantization does not need a calibration dataset, since calibration is already applied during the Quantization Aware Training process.

   In this `link `_ you can find the BonseyesYOLOX export folder with examples of the ``all.py``, ``torch2onnx.py`` and ``onnx2trt.py`` scripts.

4. Run the benchmark on the explicitly quantized TensorRT model after the QAT process:

   .. code-block:: bash

      python3 -m bonseyes_yolox.benchmark.all \
          --input-size 640x640 \
          --dataset all \
          --device gpu \
          --backbone yolox_s \
          --engine tensorrt

   Note that the ``//benchmark/all.py`` script is not applied to the ONNX QAT model, which is only used for explicit quantization. The version with this addition can be found in the `template generator `_.

5. After benchmarking, add the QAT models to the ``//benchmark/generate_graphs.py`` script. You can find an example of adding QAT models to the graph in `YOLOX generate_graphs.py `_.

The Pytorch model after Quantization Aware Training should be saved in the ``//models/pytorch_qat//`` directory and named as follows:

.. code-block:: bash

   v1.0_yolox_s_qat_640x640_fp32.pth

where ``v1.0`` is the tag version, ``yolox_s`` the backbone name, ``qat`` marks a fine-tuned model after Quantization Aware Training, ``640x640`` the training input size and ``fp32`` the model precision.

The ONNX model exported from the Pytorch QAT model should be stored in the ``//models/onnx//`` directory and named as follows:

.. code-block:: bash

   v1.0_yolox_s_qat_640x640_int8.onnx

The TensorRT int8 model obtained by explicit quantization from the fake quantized ONNX model should be stored in the ``//models/tensorrt///`` directory and named as follows:

.. code-block:: bash

   v1.0_yolox_s_qat_640x640_int8_dla_disabled.trt

.. warning::

   Potential QAT problems and fixes:

   * The number of iterations should be very low (for example 20), the learning rate should be very low (around 1% of the default training learning rate or lower) and the number of epochs should also be very low (a couple of epochs). Choose the best hyperparameter values experimentally.
   * Export from Pytorch to ONNX must use opset 13, because lower opset versions do not support fake quantized layers.
   * After exporting to ONNX, inspect the ONNX model in Netron: Quantize and Dequantize layers should have been added to the model (a scriptable check is sketched after this warning).
   * The calibration dataset (the training dataset) is only needed while training the model with Quantization Aware Training. Export from Pytorch to ONNX and from ONNX to TensorRT applies explicit quantization and does not need a calibration dataset.
   * Pytorch and ONNX models after Quantization Aware Training have fp32 precision, while the TensorRT model has int8 precision. During explicit quantization TensorRT only applies quantization and layer fusion to the layer blocks located between Quantize and Dequantize layers in the ONNX model.
   * Only the TensorRT quantized model is used for benchmarking, since the other models have fp32 precision and the ONNX model contains additional layers that slow down inference.
   * After the whole process is successfully finished, compare the precision and inference time of the PTQ and QAT TensorRT int8 models (the QAT TensorRT int8 model should have higher precision than the PTQ TensorRT int8 model).
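As a scriptable alternative to inspecting the exported model in Netron, the Quantize/Dequantize nodes can be counted with the ``onnx`` package. This is a small illustrative check, not part of the AI Asset tooling:

.. code-block:: python

   # Sketch: verify that the exported QAT ONNX model contains Q/DQ nodes.
   # The path below matches the torch2onnx example earlier in this section.
   import onnx

   model = onnx.load("/app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx")
   ops = [node.op_type for node in model.graph.node]
   print("QuantizeLinear nodes:  ", ops.count("QuantizeLinear"))
   print("DequantizeLinear nodes:", ops.count("DequantizeLinear"))
   # If both counts are zero, the fake quantized layers were lost during export.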
VII. Process
============

Bonseyes AI Assets provide tools to process, i.e., infer, an AI model on input data in several formats (input file, video, camera stream and HTTP worker), using all available inference engines (Pytorch, ONNXRuntime, TensorRT and LPDNN) with all available precisions (fp32, fp16 and int8). In addition, if a Pytorch model is the starting point, it is possible to process with the torch2trt inference engine with fp32, fp16 and int8 precisions. Devices with an NVIDIA GPU and CPU support can process an input with all inference engines, while CPU-only devices can process the input with the ONNXRuntime and LPDNN inference engines. The Bonseyes process tools for image, video, camera and client-server, including LPDNN processing, can be found in the `AI Asset Container Generator `_. Next, we describe the arguments needed for the pytorch, onnxruntime and tensorrt standalone engines. For LPDNN processing, refer to :ref:`lpdnn_process`.

Image Process
^^^^^^^^^^^^^

The image-based process takes an input file and infers the AI model on it.

1. The image processing script ``//process/image.py`` is used to process an input image. This script loads an image or image folder (``.jpg`` format), instantiates an *Algorithm* class to process and render the image, and finally writes the output to a json file. In this `link `_ you can find an example of the ``image.py`` process script in Bonseyes Openpifpaf Wholebody. Here is an example of running the ``image.py`` script:

   .. code-block:: bash

      # user@docker:/app$
      python -m bonseyes_openpifpaf_wholebody.process.image \
          --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
          --input-size 641x641 \
          --engine pytorch \
          --jpg-input /app/bonseyes_openpifpaf_wholebody/process/demo/samples/image/test/demo_image_1.jpg \
          --jpg-output /app/ \
          --json-output /app/ \
          --logo \
          --device gpu

2. The processed ``jpg`` image is saved to the file or directory defined by the ``jpg-output`` CLI argument. If ``jpg-output`` is a directory, the processed image keeps the name of the input image with a ``processed_`` prefix. For example, if ``jpg-output`` is a directory and the input image is ``traffic.jpg``, the processed image is saved as ``processed_traffic.jpg`` in that directory.

3. The ``json`` file with the image predictions is saved to the file or directory defined by the ``json-output`` CLI argument. If ``json-output`` is a directory, the json file gets the same name as the processed image. For example, if ``json-output`` is a directory and the input image is ``traffic.jpg``, the json file is saved as ``processed_traffic.json`` in that directory.

.. note::

   The image has to be in ``.jpg`` format.

.. warning::

   If the selected docker image does not have CUDA support, replace ``--device gpu`` with ``--device cpu``.
Video Process
^^^^^^^^^^^^^

The video-based process takes a video as input and infers the AI model on it.

1. The video processing script ``//process/video.py`` is used to process an input video. This script loads a video file (``.mp4`` format), instantiates an *Algorithm* class to process and render the video, and finally writes the output to ``json`` and ``csv`` files. In this `link `_ you can find an example of the ``video.py`` process script in Bonseyes Openpifpaf Wholebody. Here is an example of running the ``video.py`` script:

   .. code-block:: bash

      # user@docker:/app$
      python -m bonseyes_openpifpaf_wholebody.process.video \
          --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
          --input-size 640x480 \
          --engine pytorch \
          --video-input /app/bonseyes_openpifpaf_wholebody/process/demo/samples/video/test/demo_video_1.mp4 \
          --video-output /app/ \
          --json-output /app/ \
          --csv-output /app/ \
          --logo \
          --debug-info \
          --device gpu

2. The processed ``.mp4`` video is saved to the file or directory defined by the ``video-output`` CLI argument. If ``video-output`` is a directory, the processed video keeps the name of the input video with a ``processed_`` prefix. For example, if ``video-output`` is a directory and the input video is ``test.mp4``, the processed video is saved as ``processed_test.mp4`` in that directory.

3. The ``csv`` and ``json`` files with the video predictions are saved to the files or directories defined by the ``csv-output`` and ``json-output`` CLI arguments. If ``json-output`` and ``csv-output`` are directories, the json and csv files get the same name as the processed video. For example, if the input video is ``test.mp4``, the json file is saved as ``processed_test.json`` in the ``json-output`` directory and the csv file as ``processed_test.csv`` in the ``csv-output`` directory.

.. note::

   The video has to be in ``.mp4`` format.

.. warning::

   If the selected docker image does not have CUDA support, replace ``--device gpu`` with ``--device cpu``.

Camera Process
^^^^^^^^^^^^^^

Camera-based processing records from your camera and infers the AI model on the frames during recording. When running the ``//process/camera.py`` script, a window with the camera recording opens with the algorithm predictions rendered in it. After recording is stopped (by pressing ``q``), the rendered recording is saved to an ``.mp4`` file and the output results are saved to ``.csv`` and ``.json`` files.

1. The camera processing script ``//process/camera.py`` is used to process camera recordings. This script opens a window with the camera recording, instantiates an *Algorithm* class to process and render the frames during recording, and finally writes the output to ``json`` and ``csv`` files. In this `link `_ you can find an example of the ``camera.py`` process script in Bonseyes Openpifpaf Wholebody. Here is an example of running the ``camera.py`` script:

   .. code-block:: bash

      # user@docker:/app$
      python -m bonseyes_openpifpaf_wholebody.process.camera \
          --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
          --input-size 320x320 \
          --engine pytorch \
          --video-output /app/recording.mp4 \
          --json-output /app/recording_predictions.json \
          --csv-output /app/recording_predictions.csv \
          --logo \
          --debug-info \
          --device gpu

2. The processed ``.mp4`` video is saved to the file defined by the ``video-output`` CLI argument.
3. The ``csv`` and ``json`` files with the video predictions are saved to the files defined by the ``csv-output`` and ``json-output`` CLI arguments.

HTTP Worker Process
^^^^^^^^^^^^^^^^^^^

With HTTP worker-based processing (server), the input is sent from a remote client to an HTTP server, which processes the input and returns the model predictions to the client.

1. The HTTP worker-based processing script ``//process/server.py`` takes an input image or folder with images from the client, performs inference on them and returns the predictions to the client. In this `link `_ you can find an example of the ``server.py`` HTTP worker process script in Bonseyes Openpifpaf Wholebody.

2. To run the HTTP worker process you need to:

   * Run the Docker container on the host with the required ports exposed. An example of running the Docker container for the Bonseyes Openpifpaf Wholebody HTTP worker process:

     .. code-block:: bash

        docker run --name bonseyes_openpifpaf_wholebody \
            --privileged --rm -it \
            --gpus 0 \
            --ipc=host \
            -p 8888:8888 \
            -v /tmp/.X11-unix:/tmp/.X11-unix \
            --device /dev/video0 \
            -e DISPLAY=$DISPLAY \
            registry.gitlab.com/bonseyes/artifacts/assets/bonseyes_openpifpaf_wholebody/x86_64:v3.0_cuda10.2_tensorrt7.

   * In the running container, run the ``//process/server.py`` script with the model, inference engine, input shape, port and device specified as CLI arguments. Example of running ``//process/server.py`` in the container:

     .. code-block:: bash

        # user@docker:/app$
        python -m bonseyes_openpifpaf_wholebody.process.server \
            --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
            --input-size 641x641 \
            --engine pytorch \
            --port 8888

   * Send an image or image folder from the client to the server for processing. If you are sending the request from outside the container, run:

     .. code-block:: bash

        # user@host:/$
        curl --request POST \
            --data-binary @/path/to/image.jpg \
            http://localhost:/inference

     for example:

     .. code-block:: bash

        # user@host:/$
        curl --request POST \
            --data-binary @bonseyes_openpifpaf_wholebody/process/demo/samples/image/test/demo_image_1.jpg \
            http://localhost:8888/inference

3. After sending the request to the server (by running the command above), the image or folder with images is processed on the server and a string of jsonified predictions is returned to the client.
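Instead of ``curl``, the same request can be made from Python. The sketch below assumes the server from the example above is reachable on ``localhost:8888`` and uses the third-party ``requests`` package; it is an illustration, not part of the AI Asset tooling:

.. code-block:: python

   # Sketch: send one .jpg image to the HTTP worker and print the predictions.
   import json
   import requests

   with open("demo_image_1.jpg", "rb") as f:       # any .jpg image
       response = requests.post(
           "http://localhost:8888/inference",      # port must match the container port
           data=f.read(),
       )

   response.raise_for_status()
   predictions = json.loads(response.text)         # server returns jsonified predictions
   print(predictions)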
.. _lpdnn_process:

LPDNN process
^^^^^^^^^^^^^

To process images, videos or camera streams with LPDNN, the processing scripts need different arguments from those used for pytorch, tensorrt or onnxruntime. The AI App config ``json`` file needs to be specified; it implicitly defines the underlying inference engine used within LPDNN, i.e., lne, onnxruntime, ncnn or tensorrt.

Image process
~~~~~~~~~~~~~

The image-based process takes an input file and the AI App config ``json`` file and infers the LPDNN engine on it.

1. The image processing script ``//process/image.py`` is used to process an input image. This script loads an image or image folder (``.jpg`` format), executes LPDNN using the HTTP worker (instantiates the *LPDNNAlgorithm* class) to process and render the image, and finally writes the output to a json file. In this `link `_ you can find an example of the ``image.py`` LPDNN process script in Bonseyes 3DDFA. When running the process image script you need to set the ``--engine`` CLI argument to ``lpdnn``, ``--app-config`` to the path of the aiapp-config.json file, ``--deployment-package`` according to the platform, and ``--port`` to the port on which the http-worker is running. Here is an example of running the ``image.py`` script:

   .. code-block:: bash

      python -m bonseyes_3ddfa_v2.process.image \
          --engine lpdnn \
          --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
          --deployment-package x86_64-ubuntu20_cuda \
          --port 8889 \
          --jpg-output /app/test.jpg \
          --json-output /app/test.json \
          --jpg-input /app/test.jpg

2. The processed ``jpg`` image is saved to the file or directory defined by the ``jpg-output`` CLI argument. If ``jpg-output`` is a directory, the processed image keeps the name of the input image with a ``processed_`` prefix. For example, if ``jpg-output`` is a directory and the input image is ``traffic.jpg``, the processed image is saved as ``processed_traffic.jpg`` in that directory.

3. The ``json`` file with the image predictions is saved to the file or directory defined by the ``json-output`` CLI argument. If ``json-output`` is a directory, the json file gets the same name as the processed image. For example, if ``json-output`` is a directory and the input image is ``traffic.jpg``, the json file is saved as ``processed_traffic.json`` in that directory.

.. note::

   The image has to be in ``.jpg`` format.

.. warning::

   If the selected docker image does not have CUDA support, replace ``--device gpu`` with ``--device cpu``.

Video Process
~~~~~~~~~~~~~

The video-based process takes a video and the AI App config ``json`` file as inputs and infers the LPDNN engine on it.

1. The video processing script ``//process/video.py`` is used to process an input video. This script loads a video file (``.mp4`` format), executes LPDNN using the HTTP worker (instantiates the *LPDNNAlgorithm* class) to process and render the video, and finally writes the output to ``json`` and ``csv`` files. In this `link `_ you can find an example of the ``video.py`` process script in Bonseyes 3DDFA. When running the process video script you need to set the ``--engine`` CLI argument to ``lpdnn``, ``--app-config`` to the path of the aiapp-config.json file, ``--deployment-package`` according to the platform, and ``--port`` to the port on which the http-worker is running. Here is an example of running the ``video.py`` script:

   .. code-block:: bash

      python -m bonseyes_3ddfa_v2.process.video \
          --engine lpdnn \
          --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
          --deployment-package x86_64-ubuntu20_cuda \
          --port 8889 \
          --video-input /app/demo_video_1.mp4 \
          --video-output /app/prediction.mp4 \
          --json-output /app/prediction.json \
          --csv-output /app/prediction.csv

2. The processed ``.mp4`` video is saved to the file or directory defined by the ``video-output`` CLI argument. If ``video-output`` is a directory, the processed video keeps the name of the input video with a ``processed_`` prefix. For example, if ``video-output`` is a directory and the input video is ``test.mp4``, the processed video is saved as ``processed_test.mp4`` in that directory.

3. The ``csv`` and ``json`` files with the video predictions are saved to the files or directories defined by the ``csv-output`` and ``json-output`` CLI arguments. If ``json-output`` and ``csv-output`` are directories, the json and csv files get the same name as the processed video.
For example, if the input video is ``test.mp4``, the json file is saved as ``processed_test.json`` in the ``json-output`` directory and the csv file as ``processed_test.csv`` in the ``csv-output`` directory.

.. note::

   The video has to be in ``.mp4`` format.

.. warning::

   If the selected docker image does not have CUDA support, replace ``--device gpu`` with ``--device cpu``.

Camera Process
~~~~~~~~~~~~~~

Camera-based processing records from your camera and infers the LPDNN engine on the frames during recording. When running the ``//process/camera.py`` script, a window with the camera recording opens with the *LPDNNAlgorithm* predictions rendered in it. After recording is stopped (by pressing ``q``), the rendered recording is saved to an ``.mp4`` file and the output results are saved to ``.csv`` and ``.json`` files.

1. The camera processing script ``//process/camera.py`` is used to process camera recordings. This script opens a window with the camera recording, executes LPDNN using the HTTP worker (instantiates the *LPDNNAlgorithm* class) to process and render the frames during recording, and finally writes the output to ``json`` and ``csv`` files. In this `link `_ you can find an example of the ``camera.py`` process script in Bonseyes 3DDFA. When running the process camera script you need to set the ``--engine`` CLI argument to ``lpdnn``, ``--app-config`` to the path of the aiapp-config.json file, ``--deployment-package`` according to the platform, and ``--port`` to the port on which the http-worker is running. Here is an example of running the ``camera.py`` script:

   .. code-block:: bash

      python -m bonseyes_3ddfa_v2.process.camera \
          --engine lpdnn \
          --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
          --deployment-package x86_64-ubuntu20_cuda \
          --port 8889 \
          --video-output /app/prediction.mp4 \
          --json-output /app/prediction.json \
          --csv-output /app/prediction.csv

2. The processed ``.mp4`` video is saved to the file defined by the ``video-output`` CLI argument.

3. The ``csv`` and ``json`` files with the video predictions are saved to the files defined by the ``csv-output`` and ``json-output`` CLI arguments.

.. warning::

   Potential process issues and fixes:

   * If you cannot run the camera demo in your container, make sure that you have added ``--device /dev/video0`` to the docker run command to access the camera.
   * Another possible camera issue in the container is missing permission on the /dev/ device. To fix it, run ``sudo chown -R user:user /dev/video0`` in the container, which enables access to the camera.
   * If you have problems sending requests to the server, check the docker container port and make sure that the same port is used in the curl request.
   * Make sure that the input image used in ``//process/image.py`` and the image sent to the server are in ``.jpg`` format, and that the video used in ``//process/video.py`` is in ``.mp4`` format.
   * When running as the root user in the container, there can be permission problems with mounted files, directories and devices. This caused problems when running the camera on a Jetson Nano board (the X server connection needed to display the window with the rendered output failed) with the error:

     .. code-block:: bash

        No protocol specified
        Unable to init server: Could not connect: Connection refused

     In that case, execute ``xhost local:root`` on the board outside the docker container.

VIII. Benchmark
===============

Bonseyes AI Assets provide benchmark tools to evaluate the Pytorch, ONNX, TensorRT (and potentially torch2trt) inference engines on multiple input sizes.
The benchmark tool runs ``//algorithm/algorithm.py`` for the specified inference engine on the evaluation dataset and then calculates statistics. Accuracy results together with preprocessing, inference and postprocessing times, latency, model statistics and hardware statistics (CPU and GPU memory and temperature, power consumption and energy efficiency) are stored in ``csv`` and ``json`` files. Hardware statistics are calculated using ``//utils/meter.py``, which can also be found in the `AI Asset Container Generator `_. A graph-generation tool is also provided that turns the benchmark results (the ``csv`` file) into a ``.jpg`` graphic. The Bonseyes AI Asset benchmark tools can be found in the `AI Asset Container Generator `_; examples of benchmark code in the Bonseyes Openpifpaf Wholebody AI Asset can be found at this `link `_.

Models benchmark
^^^^^^^^^^^^^^^^

1. Implement the Bonseyes benchmark script in ``//benchmark/__main__.py``, which instantiates the ``Algorithm`` class for a given inference engine and runs the ``benchmark`` function. The ``benchmark`` function runs the model evaluation on the validation dataset and computes hardware and accuracy statistics, which are stored in a ``results`` dictionary returned by the function. Implement the ``benchmark`` function in this script as follows (a sketch of its overall shape follows the example below):

   * Apply the Algorithm process to every image from the dataloader (images are loaded from the validation dataset)
   * Calculate the average preprocessing, inference, postprocessing and total processing time
   * Calculate accuracy statistics (for example AP, APM, APL, AR, ARM, ARL) and evaluation time and add them to the ``accuracy_stats`` dictionary
   * Calculate model statistics with ``//benchmark/model_summary.py`` and add ``GFLOPs`` and ``#PARAMS`` to the ``result`` dictionary
   * Calculate hardware statistics with the ``HardwareStatusMeter`` class from ``//utils/meter.py``
   * Store all calculations in the ``result``, ``hw_stats`` and ``accuracy_stats`` dictionaries
   * Merge the ``result``, ``hw_stats`` and ``accuracy_stats`` dictionaries into one dictionary and return it as the function output

   In this `link `_ you can find an example of ``//benchmark/__main__.py`` in the Bonseyes Openpifpaf Wholebody AI Asset. Here is an example of running the ``//benchmark/__main__.py`` script in Bonseyes Openpifpaf Wholebody:

   .. code-block:: bash

      python -m bonseyes_openpifpaf_wholebody.benchmark \
          --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v3.0_shufflenetv2k30_default_641x641_fp32.pkl \
          --engine pytorch \
          --input-size 641x641 \
          --preprocess-with torchvision \
          --force-complete-pose \
          --seed-threshold 0.2

   The benchmark results (``result.csv`` and ``result.json``) are saved in the directory specified with the ``result-directory`` CLI argument.
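   The sketch below illustrates the overall shape of such a ``benchmark`` function. The helper calls (``algorithm.process`` returning per-stage timings, the ``HardwareStatusMeter``-style ``start``/``stop`` API and the dictionary keys) are assumptions for illustration; the real implementations in the container generator and in the assets differ in detail:

   .. code-block:: python

      # Illustrative skeleton of a benchmark() function, not the actual code.
      def benchmark(algorithm, data_loader, hw_meter):
          result, hw_stats, accuracy_stats = {}, {}, {}
          pre_t, inf_t, post_t = [], [], []

          hw_meter.start()                        # assumed HardwareStatusMeter-style API
          for image, _annotation in data_loader:  # validation dataset
              prediction, timings = algorithm.process(image)  # assumed to return timings
              pre_t.append(timings["preprocess"])
              inf_t.append(timings["inference"])
              post_t.append(timings["postprocess"])
              # ... accumulate predictions for the accuracy metrics (AP, AR, ...)
          hw_stats = hw_meter.stop()              # CPU/GPU memory, temperature, power, ...

          result["preprocess_time"] = sum(pre_t) / len(pre_t)
          result["inference_time"] = sum(inf_t) / len(inf_t)
          result["postprocess_time"] = sum(post_t) / len(post_t)
          # result["GFLOPs"] and result["#PARAMS"] come from benchmark/model_summary.py
          # accuracy_stats = evaluation of the accumulated predictions

          return {**result, **hw_stats, **accuracy_stats}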
2. Use ``//benchmark/all.py`` to benchmark the Pytorch, ONNX, TensorRT (and potentially torch2trt) inference engines, or all of them, with different precisions (fp32, fp16 and int8). The benchmark results are stored in ``result.json`` and ``result.csv`` files. Specify the possible backbone names as options of the backbone CLI argument and the dataset path in the ``main()`` of this script. In this `link `_ you can find an example of ``//benchmark/all.py`` in the Bonseyes Openpifpaf Wholebody AI Asset. Here is an example of running the ``//benchmark/all.py`` script in Bonseyes Openpifpaf Wholebody:

   .. code-block:: bash

      python -m bonseyes_openpifpaf_wholebody.benchmark.all \
          --input-sizes 28x72 256x192 512x512 \
          --device gpu \
          --backbone shufflenetv2k30 \
          --dataset wholebody

LPDNN benchmark
^^^^^^^^^^^^^^^

The benchmark tool runs the `LPDNNAlgorithm` class with the specified LPDNN inference engine on the evaluation dataset and then calculates statistics. In this `link `_ you can find an example of running the benchmark with the LPDNN engine in 3DDFA:

.. code-block:: bash

   python -m bonseyes_3ddfa_v2.benchmark.evaluate \
       --dataset aflw2000-3d \
       --engine lpdnn \
       --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
       --deployment-package x86_64-ubuntu20_cuda \
       --port 8889 \
       --input-size 120 \
       --model build/3dface-landmarks-v1.0-mobilenetv1-120x120/model.onnx

In this `link `_ you can find an example of running the ``benchmark.all`` script in 3DDFA:

.. code-block:: bash

   python -m bonseyes_3ddfa_v2.benchmark.all \
       --dataset aflw2000-3d \
       --engine all \
       --input-sizes 120 \
       --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
       --deployment-package x86_64-ubuntu20_cuda \
       --port 8889

Generate graphs
^^^^^^^^^^^^^^^

Use the Bonseyes AI Asset plot tool in ``//benchmark/generate_graphs.py`` to generate graphs from the benchmark ``csv`` results. It takes the ``csv`` file specified with the ``csv-path`` CLI argument as input and generates storage, accuracy, performance and resource-consumption graphs. The graph is saved as ``graph.jpg`` in the directory specified with the ``output-path`` CLI argument. In this `link `_ you can find ``//benchmark/generate_graphs.py`` in the AI Asset Container Generator; an example of the script is also available in the `Bonseyes Openpifpaf Wholebody AI Asset `_. Here is an example of running the ``//benchmark/generate_graphs.py`` script in Bonseyes Openpifpaf Wholebody:

.. code-block:: bash

   python -m bonseyes_openpifpaf_wholebody.benchmark.generate_graphs \
       --csv-path /app/result.csv \
       --output-path /app/

.. warning::

   Potential benchmark issues and fixes:

   * If you are running the benchmark in a container on a Jetson device and all hardware statistics are 0, make sure that ``/run/jtop.sock`` is mounted when running your container: add ``-v /run/jtop.sock:/run/jtop.sock`` to the docker run command for the jetpack image.

IX. Utils
=========

Bonseyes AI Assets provide tools for collecting hardware and environment information (GPU and CPU memory, power, model storage, GPU and CPU temperature, environment, code version, git branch and git commit hash). The ``/utils`` package contains the following scripts:

1. environment_info
2. gstreamer_pipelines
3. hardware_info
4. meter

All utils scripts mentioned above can be found in the `AI Asset Container Generator `_. In this `link `_ you can find the utils scripts in the Bonseyes Openpifpaf Wholebody AI Asset.

environment_info
^^^^^^^^^^^^^^^^

This script contains the ``EnvironmentInformation`` class, which collects the following information:

1. system libraries information - cmake, gcc, cuda and python versions
2. python libraries information - python package versions (for example onnx, onnxruntime, numpy, scipy, cython, pandas, torch, torchvision, numba)
3. code version - git branch and commit hash

This script is imported in the ``/benchmark/generate_graphs.py`` script, and the collected information is written at the beginning of the graph ``jpg`` file. Here is the `link `_ to the ``/utils/environment_info.py`` script in the Bonseyes Openpifpaf Wholebody AI Asset.
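For illustration only, the kind of information ``EnvironmentInformation`` collects can be gathered with the standard library as in the sketch below (assuming Python 3.8+ for ``importlib.metadata``); this is not the actual class from the container generator:

.. code-block:: python

   # Sketch: collecting environment information similar to EnvironmentInformation.
   import platform
   import subprocess
   from importlib import metadata


   def collect_environment_info():
       info = {"python": platform.python_version()}
       # System libraries (gcc as an example; cmake and cuda can be queried the same way).
       info["gcc"] = subprocess.run(
           ["gcc", "--version"], capture_output=True, text=True
       ).stdout.splitlines()[0]
       # Python package versions.
       for package in ("numpy", "onnx", "torch"):
           try:
               info[package] = metadata.version(package)
           except metadata.PackageNotFoundError:
               info[package] = "not installed"
       # Code version: current git branch and commit hash.
       info["git_branch"] = subprocess.run(
           ["git", "rev-parse", "--abbrev-ref", "HEAD"], capture_output=True, text=True
       ).stdout.strip()
       info["git_commit"] = subprocess.run(
           ["git", "rev-parse", "--short", "HEAD"], capture_output=True, text=True
       ).stdout.strip()
       return info


   print(collect_environment_info())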
gstreamer_pipelines
^^^^^^^^^^^^^^^^^^^

The ``/utils/gstreamer_pipelines.py`` script contains the pipelines for capturing and syncing video frames on x86_64, NVIDIA Jetson devices and RaspberryPi. It is imported and used in the ``/process/video.py`` and ``/process/camera.py`` scripts. Here is the `example `_ of this script in the Bonseyes Openpifpaf Wholebody AI Asset.

meter
^^^^^

``/utils/meter.py`` contains the ``HardwareStatusMeter`` class, which calculates hardware statistics on the GPU and CPU during execution on x86_64, Jetson devices or RaspberryPi4. This class detects the environment and collects GPU and CPU memory, power, model storage, and GPU and CPU temperature. It is used in ``/utils/hardware_info.py`` and in the ``/benchmark/__main__.py`` script, which stores the hardware information in the ``csv`` file. Here is the `link `_ to the ``/utils/meter.py`` script in the Bonseyes Openpifpaf Wholebody AI Asset.

hardware_info
^^^^^^^^^^^^^

``/utils/hardware_info.py`` contains a hardware information class which initializes the ``HardwareStatusMeter`` class from the ``meter.py`` script and collects the following information:

1. GPU information - GPU model name, number of GPUs, driver version and CUDA version
2. CPU information - CPU architecture, model name, vendor and number of CPUs
3. memory information

This script is used in ``/benchmark/generate_graphs.py``, where the collected information is written at the beginning of the graph. Here is the `link `_ to the ``/utils/hardware_info.py`` script in the Bonseyes Openpifpaf Wholebody AI Asset.
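On an x86_64 machine with an NVIDIA GPU, the kind of readings ``HardwareStatusMeter`` gathers can be obtained with the ``psutil`` and ``pynvml`` packages, as sketched below. This is illustrative only; the actual class also handles Jetson devices and RaspberryPi4 through other backends (for example jtop):

.. code-block:: python

   # Sketch: one snapshot of CPU/GPU status (x86_64 with an NVIDIA GPU assumed).
   import psutil
   import pynvml

   pynvml.nvmlInit()
   gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

   snapshot = {
       "cpu_mem_used_mb": psutil.virtual_memory().used / 1024**2,
       "gpu_mem_used_mb": pynvml.nvmlDeviceGetMemoryInfo(gpu).used / 1024**2,
       "gpu_temp_c": pynvml.nvmlDeviceGetTemperature(gpu, pynvml.NVML_TEMPERATURE_GPU),
       "gpu_power_w": pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0,  # reported in milliwatts
   }
   print(snapshot)
   pynvml.nvmlShutdown()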
X. Testing
==========

Implement automatic tests for all interfaces in the ``/interface/tests/`` scripts. Interface scripts are used to call the ```` modules and are executed both during testing and when running the AI Asset CLI.

1. Implement and use the ``/interface/exporter.py`` script, which executes ``.export.all`` with the specified CLI arguments
2. Implement and use the ``/interface/optimizer.py`` script, which executes ``.optimize.post_training_quantization.all`` with the specified CLI arguments
3. Implement and use the ``/interface/image_processor.py`` script, which executes ``.process.image`` with the specified CLI arguments
4. Implement and use the ``/interface/video_processor.py`` script, which executes ``.process.video`` with the specified CLI arguments
5. Implement and use the ``/interface/camera_processor.py`` script, which executes ``.process.camera`` with the specified CLI arguments
6. Implement and use the ``/interface/server.py`` script, which executes ``.process.server`` with the specified CLI arguments
7. Implement and use the ``/interface/benchmark.py`` script, which executes ``.benchmark.all`` with the specified CLI arguments
8. Implement and use the ``/interface/trainer.py`` script, which executes ``.train`` with the specified CLI arguments and the training and validation datasets
9. Add a test image in the ``/interface/tests/samples/image/`` directory, which is used when executing process image in the test script
10. Add a test video in the ``/interface/tests/samples/video/`` directory, which is used when executing process video in the test script

.. note::

   ``/interface/camera_processor.py``, ``/interface/server.py`` and ``/interface/trainer.py`` are not executed in tests. They are only executed through the AI Asset CLI.

The ``/interface/tests/`` scripts execute the interface scripts on GPU or CPU (export, optimize, process image, process video and benchmark), with different engines and different precisions. Tests on CPU can only be executed with Pytorch and ONNX models, while tests on GPU can be executed with Pytorch, ONNX and TensorRT models.

1. Implement CPU test cases in ``/interface/tests/test_interface_cpu.py``.
2. Implement GPU test cases in ``/interface/tests/test_interface_gpu.py``.

You can run the GPU tests in your container by executing the ``pytest -k gpu`` command, and the CPU tests by executing the ``pytest -k cpu`` command. Uncomment the test stage in the ``.gitlab-ci.yml`` file once all ``/interface/`` and ``/interface/tests/`` scripts are implemented.

.. warning::

   Potential test issues and fixes:

   * TensorRT requires CUDA, so TensorRT export, optimization and benchmarking of TensorRT models cannot run on CPU (those commands must not be added to the ``/interface/tests/test_interface_cpu.py`` script).
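As an illustration of how such test cases can be selected with ``pytest -k cpu`` / ``pytest -k gpu``, the sketch below shows one possible shape of a CPU interface test. The module path, asset name and CLI arguments are placeholders, not the actual Bonseyes test code:

.. code-block:: python

   # Sketch of /interface/tests/test_interface_cpu.py (illustrative placeholders).
   import subprocess
   import sys

   import pytest


   @pytest.mark.parametrize("engine", ["pytorch", "onnxruntime"])  # CPU engines only
   def test_image_processor_cpu(engine):
       # Run the image-processing interface as a module, the same way the CLI does.
       # "my_ai_asset" and the sample path are placeholders for the real asset.
       completed = subprocess.run(
           [
               sys.executable, "-m", "my_ai_asset.interface.image_processor",
               "--engine", engine,
               "--device", "cpu",
               "--jpg-input", "my_ai_asset/interface/tests/samples/image/demo_image_1.jpg",
           ],
           capture_output=True,
           text=True,
       )
       assert completed.returncode == 0, completed.stderr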
XI. AI Asset CLI Integration
============================

The AI Asset CLI runs the ``/interface/`` scripts explained in the Testing section. The interface scripts used in the CLI but not in the test scripts are ``/interface/camera_processor.py``, ``/interface/server.py`` and ``/interface/trainer.py``.

XII. Documentation
==================

Use the Bonseyes documentation template stored in ``/doc`` to explain all implemented components. Store demo images, videos and benchmark results in the following directories:

1. Store demo images and processed demo images in the ``/doc/examples/example_images/`` directory
2. Store demo videos and processed demo videos in the ``/doc/examples/example_videos/`` directory
3. Store ``benchmark.csv``, ``benchmark.json`` and ``graph.jpg`` in:

   * the ``/doc/eval_results/Server/`` directory for server benchmark results
   * the ``/doc/eval_results/NVIDIA-Jetson-AGX`` directory for Jetson Xavier AGX benchmark results
   * the ``/doc/eval_results/NVIDIA-Jetson-NX`` directory for Jetson Xavier NX benchmark results

Implement the following ``.rst`` files:

1. Implement and use ``/doc/paper.rst`` to add the official repository's paper reference, abstract and links to the official git repository, git branch and commit used in the AI Asset source
2. Implement and use ``/doc/usage.rst`` and add the following sections:

   * Installation - add docker pull and docker run commands for all platforms
   * Data - add paths to the data or execution commands for automatic data download
   * Export - add ``export.all`` execution on CPU and GPU
   * Optimize - add ``post_training_quantization.all`` execution on CPU and GPU and suggested input sizes
   * Process - add the ``process.image`` execution command, the processed image and the predictions from the ``json`` file. For video, add the ``process.video`` execution command and the processed video gif from the ``/doc/examples/examples_videos/`` directory. For camera, add the ``process.camera`` execution command
   * Benchmark - add the single model benchmark and benchmark all execution commands and copy the json file with the results of the single model benchmark

3. Implement and use ``/doc/install.rst`` and add the Bonseyes AI Asset installation for the target device. In the Workstation/Server (x86_64), NVIDIA Jetson devices and RaspberryPi4 sections add:

   * System Requirements
   * Docker section - profiles for the given devices, docker build and docker run commands and the AI Asset setup in the Dockerfile

4. Implement and use ``/doc/models.rst`` and add paths to the pretrained Pytorch models and their model summaries on multiple input sizes
5. Implement and use ``/doc/train.rst`` and add a section for the train and validation datasets and a section for executing training with the CLI
6. Implement and use ``/doc/optimize.rst`` and add a section for the validation data and sections for executing ONNX and TensorRT quantization and quantization of all inference engines
7. Implement and use ``doc/export.rst`` and add sections for executing the ``//export/torch2onnx.py``, ``//export/onnx2trt.py`` and ``//export/all.py`` scripts
8. Implement and use ``doc/process.rst`` and add sections for executing the ``//process/image.py``, ``//process/video.py``, ``//process/camera.py`` and ``//process/server.py`` scripts
9. Implement and use ``doc/eval.rst`` and add the following sections:

   * Reproduce Published Results - add the path where the data is stored or how to execute the automatic data download script, and add the execution command for running the eval script of the official code if it exists; copy the evaluation result into this section
   * Single Model Benchmark - executing ``.benchmark``; copy the evaluation result into this section
   * Benchmark of All Models - executing ``.benchmark.all``
   * Sample Processed Images - add sample demo images and processed demo images

10. Implement and use ``doc/benchmark.rst`` and upload the ``benchmark.csv`` and ``graph.jpg`` files from the examples folder for ``x86_64``, ``NVIDIA Jetson Xavier AGX``, ``NVIDIA Jetson Xavier NX`` and ``RaspberryPi4``

You can rebuild the documentation by running:

.. code-block:: bash

   cd doc
   rm -rf _build && make html

To view the rendered HTML docs open ``/doc/_build/html/index.html``.

.. warning::

   Potential documentation issues and fixes:

   * If there is a ``No module named`` error for some library, add the library name to the ``autodoc_mock_imports`` list in ``/doc/conf.py``.
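For reference, ``autodoc_mock_imports`` in ``/doc/conf.py`` is a plain list of module names; the entries below are only examples and should be replaced with whatever modules Sphinx reports as missing:

.. code-block:: python

   # /doc/conf.py (excerpt) - mock heavy or platform-specific imports for autodoc.
   # The module names listed here are only examples.
   autodoc_mock_imports = ["torch", "tensorrt", "pycuda"]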