
Technical explanation about creating a new AI Asset.

Create AI Asset

This section describes how to set up the local and server environments, and then how to create and develop new AI Assets.


Before you start developing a new AI Asset, complete the initial setup.

GitLab setup

  1. Create a GitLab account - GitLab Registration

  2. Set up your GitLab account and add an SSH key - GitLab & SSH keys

  3. Create an empty GitLab repository named <BonseyesAIAssetName> in your dedicated group

Local setup

Set up a local workstation or laptop for development and ensure that the following software is installed:

  1. NVIDIA Drivers for your graphics card - NVIDIA Autodetect driver

  2. Docker - Install on Ubuntu

  3. NVIDIA container toolkit - Install container-toolkit

  4. Git

  5. Python3.6+

HPC setup

Set up the HPC instance for training and image builds and ensure that the following software is installed:

  1. NVIDIA Drivers for your graphics card - NVIDIA Autodetect driver

  2. Docker - Install on Ubuntu

  3. NVIDIA container toolkit - Install container-toolkit

  4. Docker Buildx - Install Buildx

  5. Install qemu and enable aarch64 emulation:

    # Install the qemu packages
    sudo apt-get install qemu binfmt-support qemu-user-static
    # Enable emulation
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
  6. Git

  7. Python3.6+

GitLab runner setup on HPC instance

  1. On your GitLab repository, open Settings > CI/CD > Runners, disable shared and group runners, and add the specific runner required to support the automated, preconfigured CI/CD pipelines.

  2. Download the GitLab Runner installation script and copy it to the HPC instance

  3. Find the specific runner configuration credentials on your GitLab repository under Settings > CI/CD > Runners

  4. Execute the script on the HPC instance:

    # Change file mod permission
    sudo chmod +x
    # Execute script providing proper values
    ./ <runner_name> <repository_registration_token>
    # Example run
    ./ bonseyes_3ddfa HuQV-VGty-HL7vprN5Rb

Start new project

  1. Clone AIAssetContainerGenerator on your local machine

  2. Follow the AIAssetContainerGenerator instructions to create a new AI Asset boilerplate project

  3. Initialize git in the newly created boilerplate project

  4. If you plan to use an existing network implementation as a baseline, attach it as a submodule in the /source directory of the boilerplate root

    # Run from the boilerplate root; the submodule path is relative to it
    git submodule add <repo-url> source/<repo-name>
  5. When creating a new AI Asset, the Bonseyes framework suggests the following Git workflow:

    • Use the master branch for stable, tested releases tagged with a version, e.g. v1.0, v2.0, ...

    • Use the dev branch for daily development

    • Use a feature/feature_name branch created from dev to implement new features

    • Tag commits on the dev and master branches to trigger Docker image builds

  6. Follow the GitLab instructions in your newly created repo on how to push an existing folder

    • Every commit on certain branches triggers the GitLab runner, which executes the .gitlab-ci.yml file in your project. .gitlab-ci.yml decides which stages (of the possible build, test, push, package and pages) are executed for each platform listed in it. Which stages run depends on the current branch.

    • If you encounter a git error about unsafe directories during container builds, modify .gitlab-ci.yml and include the line git config --global --add safe.directory /path/to/unsafe/dir. If this does not solve the issue, try the --system option instead of --global.

Local development workflow

  1. Pull x86_64 image that was built during CI/CD process or build image locally

    # Option 1: Pull built image (check registry tab on your GitLab project web page for url)
    docker pull <image-url>
    # Option 2: Build image on your local machine
    python3 <bonseyes_aiasset_name>/docker/ \
        --platform x86_64 \
        --profile <bonseyes_aiasset_name>/docker/profiles/x86_64/ubuntu18.04_cuda10.2_python3.6_tensorrt7.0.yml \
        --image-name <bonseyes_aiasset_name>x86_64:<v1.0>_cuda10.2_tensorrt7.0 \
        --device cuda
    • The build script uses the Dockerfile for the specified platform (x86_64, Jetson devices or RaspberryPi4) and device (GPU or CPU). In the Dockerfile you should run your script, which lists all Python packages, with their versions, used in your AI Asset for x86_64, Jetson devices and RaspberryPi4.

    • Dockerfiles for x86_64, Jetson and RaspberryPi4 are stored in /<bonseyes_aiasset_name>/docker/platforms/

    • The Pytorch, CMake, OpenCV, ONNXRuntime, ONNX, TensorRT and Python versions installed during the Docker build are defined in /<bonseyes_aiasset_name>/docker/profiles/. These versions are passed as arguments to the Dockerfiles.

    • Existing x86_64 profiles:

    • Existing NVIDIA Jetson profiles:

    • For RaspberryPi4 available profile is:

    • If the build succeeds, the result of the build script is a new Docker image.


    If you want to make minor changes in a submodule (very small modifications of the official code, rather than writing code in your Bonseyes AI Asset), do not commit them to the official source repository. Instead, create a git patch and save it to the /source/patch/ directory. To apply the patch to the submodule, run this command in your container:

    cd /app/source/<submodule_name> && git apply /app/source/patch/modification_1.patch

    You also need to add this command to the Dockerfile so that the patch is applied before setup when the image is built.

  2. Run x86_64 image and mount your project root to /app

    • If you are using a directory with images and annotations generated by DataTool, mount the directory with datasets and annotations to the <bonseyes_aiasset_name>/data/storage directory when executing the docker run command. In this case, run the built container with:

      # Example how to run built container when you are using dataset and its annotations generated by DataTool
      cd <bonseyes_aiasset_name>
      docker run --name <bonseyes_aiasset_name> \
          --privileged --rm -it \
          --gpus 0 \
          --ipc=host \
          -p 8888:8888 \
          -v $(pwd):/app \
          -v /path/to/processed/dataset1:/app/<bonseyes_aiasset_name>/data/dataset1 \
          -v /path/to/<bonseyes_aiasset_name>/data/dataset1/ \
          -v /path/to/<bonseyes_aiasset_name>/data/dataset1/ \
          -v /path/to/processed/dataset2:/app/<bonseyes_aiasset_name>/data/dataset2 \
          -v /path/to/<bonseyes_aiasset_name>/data/dataset2/ \
          -v /path/to/<bonseyes_aiasset_name>/data/dataset2/ \
          -v /tmp/.X11-unix:/tmp/.X11-unix \
          --device /dev/video0 \
          -e DISPLAY=$DISPLAY \

    At this point you can develop on your host using the IDE of your choice and test the implementation inside the running Docker container.
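
When a submodule carries several patches, the patch step described above can be scripted. The following is a minimal sketch and not part of the Bonseyes tooling: the helper names build_patch_commands and apply_patches are hypothetical, and it assumes that patches are *.patch files applied in file-name order with git apply.

```python
import subprocess
from pathlib import Path
from typing import List


def build_patch_commands(patch_dir: str) -> List[List[str]]:
    """Return one `git apply` command per *.patch file, in file-name order."""
    patches = sorted(Path(patch_dir).glob("*.patch"))
    # Absolute paths keep the commands valid when run from another directory.
    return [["git", "apply", str(p.resolve())] for p in patches]


def apply_patches(submodule_dir: str, patch_dir: str) -> None:
    """Run every command inside the submodule; stop at the first failure."""
    for cmd in build_patch_commands(patch_dir):
        subprocess.run(cmd, cwd=submodule_dir, check=True)
```

Inside the container this could be called as apply_patches("/app/source/<submodule_name>", "/app/source/patch").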

I. Data

Make sure that you correctly attached datatool requirements and mounted generated processed datasets:

  1. Confirm that you have datatool_api submodule attached in AI Asset data directory

cd <project_root>
git submodule add -b python3.6 ../../../../../../artifacts/data_tools/apis/datatool-api.git <bonseyes_aiasset_name>/data/datatool_api
git submodule update --init --recursive <bonseyes_aiasset_name>/data/datatool_api

NOTE: If your AI Asset is not in your group root, you will also need to change the relative path of the datatool-api submodule.

  1. Confirm that you have and mounted to for all datasets in data/dataset1 ... data/dataset2

  2. Confirm that, when executing docker run, you properly mounted the directory produced by the DataTool(s) to the /<bonseyes_aiasset_name>/data/dataset1 folder by adding -v /path/to/dataset1:/app/<bonseyes_aiasset_name>/data/dataset1 to the docker run command.
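
The mount checks above can also be done programmatically from inside the container. This is a small sketch under stated assumptions: check_dataset_mounts is a hypothetical helper, and a mount is treated as correct when the directory exists and contains at least one entry.

```python
from pathlib import Path
from typing import Dict, List


def check_dataset_mounts(data_root: str, dataset_names: List[str]) -> Dict[str, bool]:
    """Map each dataset name to True if its directory exists and is non-empty."""
    status = {}
    for name in dataset_names:
        path = Path(data_root) / name
        status[name] = path.is_dir() and any(path.iterdir())
    return status
```

For example, check_dataset_mounts("/app/<bonseyes_aiasset_name>/data", ["dataset1", "dataset2"]) reports which of the two dataset folders are populated.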

Steps to Use the Datatools inside AI assets:

  1. Remove __future__ imports from and scripts (only for AI Assets with python version < 3.8)

    As part of datatool development, the AI-talents created the python based custom data model for each datatool which is defined by “” and “” scripts. This data model is the interface that should be used to load and read the datatool output inside AI-assets.

    Since there is a mismatch on the python versions between datatools (using python >= 3.9) and AI-assets (using python3.6.9, will be updated in future), the “” and “” scripts need to be modified to remove __future__ imports which are not supported by python3.6.9. To do so, remove the line “from __future__ import annotations” from the two scripts. This line is generally found at the beginning of the file. Please refer to the example images below to locate the line.


  2. Remove any return types from which are not supported due to removal of the __future__ imports (only for AI Assets with python version < 3.8)

    As a consequence of removing the __future__ import from the scripts, methods inside classes can not have annotations for the return type if the return type is the same as the class type that contains the method.

    To fix it, remove any return types from the “” script where the return type is the same as the class type. The image below shows an example where method “extract” has the “CocoBoundingBox2D” as its return type and this should be removed (hence “-> CocoBoundingBox2D” part should be removed) from the function signature.

  3. Rename the “” and “” scripts in case you are using multiple data tools inside the AI-asset

    In case your AI-assets uses multiple datatools and the datatools do not share the same data model, you need to rename the “” and “” scripts so that they are differentiable for the python interpreter at the import time.

    For example, if you plan to use two datatools, datatool1 and datatool2, you can rename the files to [“”, “”] and [“”, “”] for datatool1 and datatool2 respectively.

  4. Import custom data model inside the data loader script

    Once you have mounted the datatool output directories, mounted and scripts for each datatool after renaming them and added the Datatool API as a submodule inside your AI-asset by following the instructions provided in the AI-asset documentation, you can use the Datatool API and your custom data models to load the dataset inside your data loader scripts.

    To load the datasets using the data model classes, you need to add the relative paths to the Datatool API directory, and directory for each custom data model at the top of your data loader script and then import the “DTDatasetCustom” class for each data model.

    For example, if you intend to load the datatool outputs from 2 datasets, the imports look like this:

    import sys
    from dataset1.custom_dataset_model_dt1 import DTDatasetCustom as Dataset_1
    from dataset2.custom_dataset_model_dt2 import DTDatasetCustom as Dataset_2

    Then inside your loader function, you can simply use the Dataset classes to load the respective datasets.

    dt1 = Dataset_1(name='dt1', operatingMode='memory')
    for k, v in dt1.annotations.items():
        print(k, v.dict())
    dt2 = Dataset_2(name='dt2', operatingMode='memory')
    for k, v in dt2.annotations.items():
        print(k, v.dict())
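
The import example above starts with import sys, but the step of adding directories to the import path is not shown. The following sketch illustrates one way to do it. The helper add_import_paths is hypothetical, and the directories passed to it are placeholders for the Datatool API directory and the custom data model directories of your AI Asset.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def add_import_paths(base_dir: str, relative_dirs: Iterable[str]) -> List[str]:
    """Prepend each directory (resolved against base_dir) to sys.path once."""
    added = []
    for rel in relative_dirs:
        path = str((Path(base_dir) / rel).resolve())
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

In a data loader script, the call would come before the DTDatasetCustom imports, with base_dir set to the directory of the script itself.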

II. Train

Bonseyes AI Assets provide a training package that runs the source training scripts, if they exist, with the hyperparameters specified in config files for different backbones.

The Bonseyes AI Assets training tool contains:

  1. A config directory, which contains config files with device and system configurations, paths to datasets and annotations, and the hyperparameter configuration.

  2. The <bonseyes_aiasset_name>/train/ script, which uses the hyperparameters and configurations from the config.yml file and runs the source training code if it exists.

Training scripts and config files can be found in AI Asset Container Generator.

The Bonseyes training tool also needs training, validation and test datasets and annotations to run training. Datasets can be obtained in 2 ways:

  1. If you want to download datasets with the original annotations used in the source repository, implement a script for downloading datasets and annotations in the /<bonseyes_aiasset_name>/data/ script

  2. If you are using DataTool, see the I. Data section for how to use it.

The scripts that need to be implemented to download data and annotations without DataTool can be found in AI Asset Container Generator.

Get data

Bonseyes AI Assets provide a tool for downloading data with the official annotations used in the source code. It is stored in the <bonseyes_aiasset_name>/data/ script, which contains functions for downloading the train, validation and test datasets with annotations.

In this link you can find an example of how those scripts are implemented in Bonseyes Openpifpaf Wholebody AI Asset.

Here is an example of how to download the training dataset with annotations in Bonseyes Openpifpaf Wholebody AI Asset:

python -m \
--download train \
--dataset wholebody

Config file

Configuration yml files in the Bonseyes AI Asset training tool store device and system configurations and hyperparameters, which are passed to the train script as CLI arguments.

A separate configuration file is created for each backbone and each training experiment. The path to this file is given as a CLI argument to the <bonseyes_aiasset_name>/train/ script, which reads all hyperparameters and passes them to the train script as CLI arguments.

Configuration file contains multiple sections with parameters inside:

  1. device - GPU number, number of workers

  2. hyperparameters - number of epochs, learning rate, backbone, checkpoint and batch size

  3. data - paths to datasets and annotations files

  4. system - output where log is written


Name config yml files according to the Bonseyes naming convention. A config file can be named as follows:


where v3.0 refers to tag version, shufflenetv2k16 is backbone name, flag default is for pretrained model from official repository, 641x641 is training input size and fp32 is model precision.

In this link you can find config file examples in Bonseyes Openpifpaf Wholebody AI Asset.

Here is the example of one config file:

  loader-workers: 16
  gpu_num: 4

  lr: 0.0001
  momentum: 0.95
  b-scale: 3.0
  epochs: 250
  lr-decay: [130, 140]
  lr-decay-epochs: 10
  batch-size: 16
  weight-decay: 1e-5
  basenet: "shufflenetv2k16"

  dataset: "wholebody"
  wholebody-upsample: 2
  wholebody-train-annotations: /app/source/data-mscoco/annotations/person_keypoints_train2017_wholebody_pifpaf_style.json
  wholebody-val-annotations: /app/source/data-mscoco/annotations/person_keypoints_val2017_wholebody_pifpaf_style.json
  wholebody-train-image-dir: /app/source/data-mscoco/images/train2017
  wholebody-val-image-dir: /app/source/data-mscoco/images/val2017

  output: "/app/bonseyes_openpifpaf_wholebody/train/outputs/openpifpaf_shufflenetv2k16_v13.pth"

train script

The <bonseyes_aiasset_name>/train/ script loads the yml config file, converts all hyperparameters from the yml file to CLI arguments and runs the source training code with those arguments.

In this link you can find an example of the training script in Bonseyes Openpifpaf Wholebody AI Asset.

Here is an example of how the <bonseyes_aiasset_name>/train/ script is called in Bonseyes Openpifpaf Wholebody AI Asset:

python3 -m bonseyes_openpifpaf_wholebody.train \
    --config /app/bonseyes_openpifpaf_wholebody/train/configs/v3.0_shufflenetv2k16_default_641x641_fp32_config.yml
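
The conversion from a config file to CLI arguments can be pictured with a short sketch. This is not the Bonseyes train script: config_to_cli_args is a hypothetical helper, the section names are assumptions based on the sections described under Config file, and the handling of lists and booleans is one possible convention.

```python
from typing import Any, Dict, List


def config_to_cli_args(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Flatten {section: {key: value}} into ["--key", "value", ...]."""
    args: List[str] = []
    for section in config.values():
        for key, value in section.items():
            if isinstance(value, bool):
                # Booleans become bare flags, emitted only when True.
                if value:
                    args.append(f"--{key}")
            elif isinstance(value, (list, tuple)):
                # Lists become one flag followed by all of its values.
                args.append(f"--{key}")
                args.extend(str(v) for v in value)
            else:
                args.extend([f"--{key}", str(value)])
    return args


# Assumed section names; in practice the dictionary would be loaded from the
# yml file (for example with yaml.safe_load from PyYAML).
config = {
    "device": {"loader-workers": 16},
    "hyperparameters": {"lr": 0.0001, "epochs": 250, "lr-decay": [130, 140]},
    "data": {"dataset": "wholebody"},
}
print(config_to_cli_args(config))
```

The resulting list can be appended to the command that launches the source training code.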

III. Model Catalog

Bonseyes AI Assets define a specific model nomenclature and the directories where pretrained models should be stored using Git LFS (Large File Storage). Bonseyes AI Assets also provide a model summary script for calculating the total number of network parameters, the number of floating-point operations (FLOPs), the number of multiply-adds and the memory usage.

Models nomenclature and storage

In a Bonseyes AI Asset, pretrained Pytorch/Tensorflow models should be stored as /<bonseyes_aiasset_name>/model/<pytorch|tensorflow>/<backbone>/<model_name>.<pth|tf> and tracked with Git LFS.

Here is the example how pretrained model is stored in Bonseyes Openpifpaf Wholebody AI Asset.


Follow Bonseyes guidelines for model file naming, for example Pytorch model can be named as:


where v3.0 refers to tag version, shufflenetv2k30 is backbone name, flag default is for pretrained model from official repository, 641x641 is training input size and fp32 is model precision.
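
The naming fields described above can be recovered from a file name with a regular expression. The sketch below is an illustration only: the pattern is inferred from the example names in this document rather than taken from an official specification, and parse_model_name is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# version _ backbone _ flag _ input size _ precision. The backbone may itself
# contain underscores (e.g. "yolox_s"), so it is matched greedily in the middle.
_NAME_RE = re.compile(
    r"(?P<version>v\d+(?:\.\d+)*)_(?P<backbone>.+)_(?P<flag>[^_]+)"
    r"_(?P<input_size>\d+x\d+)_(?P<precision>fp16|fp32)"
)


def parse_model_name(stem: str) -> Optional[Dict[str, str]]:
    """Split a model file stem into its naming fields, or return None."""
    match = _NAME_RE.fullmatch(stem)
    return match.groupdict() if match else None


print(parse_model_name("v2.0_shufflenetv2k30_default_641x641_fp32"))
```

The function expects the file name without its extension and returns None for names that do not follow the convention.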

Only pretrained models should be stored in GitLab; inference engines (ONNXRuntime, TensorRT and torch2trt) should not be committed to the GitLab repository.

Model summary

Reuse, and adjust if needed, the Bonseyes summary utility script /<bonseyes_aiasset_name>/benchmark/ to create a pretrained model summary in a json file, which contains:

  • Total number of network parameters

  • Theoretical number of floating-point operations (FLOPs)

  • Theoretical number of multiply-adds (MAdd)

  • Memory usage

The /<bonseyes_aiasset_name>/benchmark/ script can be found in AI Asset Container Generator.

In this link you can find /<bonseyes_aiasset_name>/benchmark/ in Bonseyes Openpifpaf Wholebody AI Asset.

Here is an example of running /<bonseyes_aiasset_name>/benchmark/ in Bonseyes Openpifpaf Wholebody AI Asset:

python -m bonseyes_openpifpaf_wholebody.benchmark.model_summary \
    --model-path /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
    --engine pytorch \
    --input-size 640x640 \
    --backbone shufflenetv2k30 \
    --json-output /app/

Also, in this link you can find model summaries for multiple models with multiple input sizes in Bonseyes Openpifpaf Wholebody Asset.
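
To make the theoretical quantities in a model summary concrete, the sketch below computes them for a single standard 2D convolution layer. It is not the Bonseyes summary script: conv2d_summary and the JSON keys are hypothetical, bias additions are ignored in the multiply-add count, and FLOPs are taken as twice the multiply-adds, which is one common convention among several.

```python
import json
from typing import Dict


def conv2d_summary(c_in: int, c_out: int, kernel: int,
                   h_out: int, w_out: int, bias: bool = True) -> Dict[str, int]:
    """Theoretical cost of one standard 2D convolution layer."""
    weights = kernel * kernel * c_in * c_out
    params = weights + (c_out if bias else 0)
    # One multiply-add per weight per output position.
    madd = weights * h_out * w_out
    # Assumed convention: one multiply-add counts as two floating-point operations.
    flops = 2 * madd
    return {"params": params, "madd": madd, "flops": flops}


summary = conv2d_summary(c_in=3, c_out=16, kernel=3, h_out=32, w_out=32)
print(json.dumps(summary, indent=2))
```

A whole-network summary sums these per-layer values over all layers.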

IV. Algorithm

The Algorithm is an important part of every Bonseyes AI Asset and contains the complete flow of image processing by a model or inference engine. The Algorithm components are listed below:

  1. AlgorithmInput class which structures input

  2. Algorithm class which contains functions for:

  • Loading Pytorch/Tensorflow, ONNXRuntime, TensorRT and torch2trt engines

  • Preprocessing input before passing it to inference engine

  • Pytorch/Tensorflow, ONNXRuntime, TensorRT and torch2trt inference

  • Postprocessing inference engines outputs

  • Inference engine processing which includes running preprocessing, inference and postprocessing functions and calculating their execution times. This function also stores postprocessing output and execution times in concrete form in AlgorithmResult class.

  • Rendering which displays postprocessing results on image

  • Destroying, which runs the inference engine destructors

  3. AlgorithmResult class, where the postprocessing output is structured in concrete form. This class stores the postprocessing outputs together with the preprocessing, inference and postprocessing times and the latency in a results dictionary.


Bonseyes AI Assets algorithm examples are provided for image processing, but it can be modified for any kind of input.

The AlgorithmInput, Algorithm and AlgorithmResult classes are stored in /<bonseyes_aiasset_name>/algorithm/ and need to inherit the BaseAlgorithmInput, BaseAlgorithm and BaseAlgorithmResult abstract classes. For this purpose, Bonseyes AI Assets provide these abstract classes in the /<bonseyes_aiasset_name>/algorithm/ script.

Bonseyes AI Assets also provide an LPDNN algorithm, which executes LPDNN using an HTTP Worker and runs the process and render functions.

Bonseyes AI Assets can also provide Challenge Interface integration. A Challenge represents the problem definition at the technical/interface level, and a Bonseyes AI Asset implements the defined interface. The goal of the Challenge Interface integration is that the AlgorithmResult output in the /<bonseyes_aiasset_name>/algorithm/ script is in the Challenge Interface form, which differs from task to task.

All supported tools for algorithm implementation (the /<bonseyes_aiasset_name>/algorithm/ script with abstract algorithm classes, /<bonseyes_aiasset_name>/algorithm/ where the LPDNNAlgorithm class is defined and the /<bonseyes_aiasset_name>/algorithm/ script which needs to be implemented) can be found in AI Asset Container Generator. Bonseyes AI Asset also provides scripts for some of the steps of the algorithm. For instance, scripts for loading and running inference with the ONNXRuntime, TensorRT and torch2trt inference engines can be found in AI Asset Container Generator.

Algorithm classes from the /<bonseyes_aiasset_name>/algorithm/ script are used in process and benchmark tasks. In the image processing case, the algorithm's process and render functions are applied to an image, while in video and camera process tasks they are applied to video frames. During benchmark execution, the algorithm's process function is run on every image from the validation dataset.

Algorithm base classes

Bonseyes AI Assets provide Algorithm Base classes for structuring algorithm input, model and inference engine loading, processing, rendering and structuring algorithm result.

This script contains BaseAlgorithmInput, BaseAlgorithm and BaseAlgorithmResult abstract classes, which need to be inherited in /<bonseyes_aiasset_name>/algorithm/ script.

Here is the example of the script in AI Asset Container Generator, which needs to be inherited in the script. The script contains:

  1. BaseAlgorithmInput class, which is used for structuring the algorithm input and needs to be inherited by the AlgorithmInput class.

  2. BaseAlgorithm class, which is used for loading the model, preprocessing, inference, postprocessing and rendering results, and should be inherited by the Algorithm class.

  3. BaseAlgorithmResult class, which is used for structuring the algorithm result in json/dict form and should be inherited by the AlgorithmResult class.
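
The effect of inheriting from abstract base classes can be illustrated with Python's abc module. The classes below are stand-ins written for this example and are not the Bonseyes base classes, whose exact method signatures are not reproduced here. The point is only that a subclass cannot be instantiated until every abstract method is implemented.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseAlgorithm(ABC):
    """Stand-in for an abstract algorithm base class (not the Bonseyes one)."""

    @abstractmethod
    def load_model(self) -> None: ...

    @abstractmethod
    def process(self, image: Any) -> Any: ...

    @abstractmethod
    def render(self, image: Any, result: Any) -> Any: ...


class IncompleteAlgorithm(SketchBaseAlgorithm):
    def load_model(self) -> None:
        pass  # process and render are intentionally missing


class CompleteAlgorithm(IncompleteAlgorithm):
    def process(self, image: Any) -> Any:
        return {"items": []}

    def render(self, image: Any, result: Any) -> Any:
        return image


try:
    IncompleteAlgorithm()
except TypeError as err:
    print("cannot instantiate:", err)

algorithm = CompleteAlgorithm()  # all abstract methods are implemented
```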

Algorithm inherited classes

Inherit base algorithm classes, defined and implemented in /<bonseyes_aiasset_name>/algorithm/, for loading model, processing and rendering.

In this link you can find algorithm script in AI Asset Container Generator.

An example implementing algorithm script with ONNXRuntime and TensorRT inference engines can be found in Bonseyes Openpifpaf Wholebody algorithm.

Another example implementing algorithm script with ONNXRuntime, TensorRT and torch2trt inference engines can also be found in Bonseyes YOLOX algorithm.

Algorithm implementation process:

  1. Implement and use AlgorithmInput class by inheriting abstract BaseAlgorithmInput class for structuring algorithm input. This is optional step and should be used with more complex pipelines. For example if you have a face detector and face landmark detector, algorithm input can be ROI detected by face detector where landmark detector is primary algorithm.

  2. Implement and use Algorithm class by inheriting BaseAlgorithm class.

    In this class you need to implement:

    • __init__ of this class with specified model_path, engine_type (torch, onnx, tensorrt or torch2trt), input_size, backbone, device (cpu, gpu), thread_num and Bonseyes AI Assets specific arguments.

    • load_model function, which can load Pytorch/Tensorflow, ONNXRuntime and TensorRT models. You can also load torch2trt models if Pytorch is your starting point model. This function should be called at the end of the __init__ of the Algorithm class.

    • preprocess function for all inference engines (Pytorch/Tensorflow, ONNXRuntime and TensorRT), which returns the preprocessing result. You can also load torch2trt models if Pytorch is your starting point model.

    • infer functions for multiple inference engines (Pytorch, ONNXRuntime, TensorRT and possibly torch2trt). For inference implementation use Bonseyes AI Asset inference engine wrappers in /<bonseyes_aiasset_name>/algorithm/inference_engines/ to run inference. infer function needs to call infer_pytorch, infer_onnxruntime, infer_tensorrt or infer_torch2trt functions depending on the engine.

      Inference engines implementations (ONNXRuntime, TensorRT and torch2trt) can be found in AI Asset Container Generator. This function takes preprocessing output, runs model inference and returns model output.

    • postprocessing function for all inference engines (Pytorch/Tensorflow, ONNXRuntime, TensorRT and potentially torch2trt). This function takes model output from the infer function and returns postprocessing output.

    • process function for all inference engines, which needs to run the preprocess, inference and postprocess functions, measure the time of each call and store the preprocessing, inference, postprocessing and total processing times, together with the postprocessing output, in the AlgorithmResult class. This function takes an input image and returns a result object of the AlgorithmResult class.

    • render function, which takes input image and output of process function (AlgorithmResult object) and applies render on it. The result of this function is rendered image.

    • destroy function, which runs destructor for inference engines (all except Pytorch).


    You can test the /<bonseyes_aiasset_name>/algorithm/ functionality by running some of the process scripts to see visual results.

  3. Implement and use AlgorithmResult class by inheriting BaseAlgorithmResult class to implement algorithm results in json/dict form. Here is the example of this form:

    self.dict = {
        "time": {
            "pre_processing_time": self.pre_processing_time,
            "infer_time": self.infer_time,
            "post_processing_time": self.post_processing_time,
            "processing_time": self.processing_time,
        },
        "items": self.items,
    }

    Postprocessing outputs are stored in self.items in the AlgorithmResult class. For example, in the object detection case self.items is a list of dictionaries, where each dictionary represents one prediction and contains keys and values for the bbox information, class name and confidence score. The preprocessing, inference, postprocessing and processing times are init arguments of the class, as is the Algorithm postprocess output.
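
The flow of the process function and the engine dispatch in infer can be sketched as follows. This is illustrative only: SketchAlgorithm does not inherit the real BaseAlgorithm, the preprocessing, inference and postprocessing bodies are placeholders, and only two of the engine names are wired up.

```python
import time
from typing import Any, Callable, Dict, List


class SketchAlgorithm:
    """Illustrative only; a real Algorithm would inherit BaseAlgorithm."""

    def __init__(self, engine_type: str) -> None:
        # Map each engine name to its inference function.
        self._infer_fns: Dict[str, Callable[[Any], Any]] = {
            "torch": self.infer_pytorch,
            "onnx": self.infer_onnxruntime,
        }
        if engine_type not in self._infer_fns:
            raise ValueError(f"unsupported engine: {engine_type}")
        self.engine_type = engine_type

    def preprocess(self, image: List[float]) -> List[float]:
        return [x / 255.0 for x in image]  # placeholder normalisation

    def infer_pytorch(self, tensor: Any) -> Any:
        return tensor  # placeholder for a Pytorch forward pass

    def infer_onnxruntime(self, tensor: Any) -> Any:
        return tensor  # placeholder for an ONNXRuntime session run

    def infer(self, tensor: Any) -> Any:
        return self._infer_fns[self.engine_type](tensor)

    def postprocess(self, output: List[float]) -> List[Dict[str, Any]]:
        return [{"score": max(output)}]  # placeholder prediction

    def process(self, image: List[float]) -> Dict[str, Any]:
        t0 = time.perf_counter()
        tensor = self.preprocess(image)
        t1 = time.perf_counter()
        output = self.infer(tensor)
        t2 = time.perf_counter()
        items = self.postprocess(output)
        t3 = time.perf_counter()
        return {
            "time": {
                "pre_processing_time": t1 - t0,
                "infer_time": t2 - t1,
                "post_processing_time": t3 - t2,
                "processing_time": t3 - t0,
            },
            "items": items,
        }


result = SketchAlgorithm("onnx").process([0.0, 127.5, 255.0])
print(result["items"])
```

Here process returns a plain dictionary for brevity; in an AI Asset the same values would be passed to the AlgorithmResult class.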

LPDNN Algorithm

The LPDNN Algorithm is used for running process, render and destroy on LPDNN inference engines. It starts the AI App by instantiating an HTTP worker for the specified deployment package. The process function sends the inputs in a POST request and returns the process results.

LPDNN Algorithm class is stored in /<bonseyes_aiasset_name>/algorithm/ and it can be found in AI Asset Container Generator.

Implement LPDNNAlgorithm from AI Asset Container Generator by adding code for:

  1. Passing inputs to POST request in process function

  2. Parsing HTTP worker output in process function

  3. Modifying AlgorithmResult structure in process function

  4. render function implementation

An example of the LPDNN Algorithm can also be found in the Bonseyes 3DDFA Asset.


Only process, render and destroy functions of LPDNNAlgorithm class are used. Other functions are not required to be implemented for process and benchmark scripts.

Challenge Interface

The Challenge Interface is used for reformatting the AlgorithmResult class output to the challenge-defined format. The Challenge Interface format depends on the task.

Integrate the Challenge Interface in /<bonseyes_aiasset_name>/algorithm/ with the following steps:

  • Add the Challenge Interface repository as a submodule in the /<bonseyes_aiasset_name>/algorithm/ directory. Challenge Interface repositories for different tasks can be found in this link.

    Here is the example how Challenge Interface can be added as submodule:

    cd <bonseyes_aiasset_project_root>
    git submodule add ../../../../../../artifacts/challenges/<your_challenge_interface>.git <bonseyes_aiasset_name>/algorithm/challenge_interface
    git submodule update --init --recursive <bonseyes_aiasset_name>/algorithm/challenge_interface

    This is going to change your <bonseyes_aiasset_project_root>/.gitmodules file. It adds a new submodule information to it.

    When you run

    cd <bonseyes_aiasset_project_root>
    cat .gitmodules

    .gitmodules file should look something like this:

    [submodule "<bonseyes_aiasset_name>/algorithm/challenge_interface"]
    path = <bonseyes_aiasset_name>/algorithm/challenge_interface
    url = ../../../../../../artifacts/challenges/<your_challenge_interface>.git

    For example, Bonseyes Openpifpaf Wholebody AI Asset is using NV-Bodypose2D-BP2D challenge and imports it as submodule in bonseyes_openpifpaf_wholebody/bonseyes_openpifpaf_wholebody/algorithm/ directory.

    The commands used to add the Challenge Interface in Bonseyes Openpifpaf Wholebody AI Asset are:

    git submodule add ../../challenges/nv-bodypose2d-bp2d.git bonseyes_openpifpaf_wholebody/algorithm/nv_bodypose2d_bp2d
    git submodule update --init --recursive bonseyes_openpifpaf_wholebody/algorithm/nv_bodypose2d_bp2d

    The example how this submodule is imported in Bonseyes Openpifpaf Wholebody AI Asset you can find in the following link.

  • Import the Challenge Interface submodule classes in /<bonseyes_aiasset_name>/algorithm/. Use the imported submodule classes to reformat the AlgorithmResult outputs.

    Here is the example how Challenge Interface classes are imported in Bonseyes Openpifpaf Wholebody AI Asset /<bonseyes_aiasset_name>/algorithm/ script:

    from bonseyes_openpifpaf_wholebody.algorithm.nv_bodypose2d_bp2d.interfaces.NVBodypose2DBP2D_Result import The2DBodyJoints, The2DBoundingBox, NVBodypose2DBP2DResultElement

    Integrate challenge result class (for example NVBodypose2DBP2D_Result class) into AlgorithmResult class in /<bonseyes_aiasset_name>/algorithm/ and reformat AlgorithmResult output to be list of Challenge Interface Result classes.

    In this link you can find example how NVBodypose2DBP2D_Result class from NV-Bodypose2D-BP2D submodule is imported in /bonseyes_openpifpaf_wholebody/algorithm/ and how AlgorithmResult outputs are reformatted as list of NVBodypose2DBP2DResultElement class from Challenge Interface.

V. Export

Bonseyes AI Assets provide export tools that convert an AI model from a training framework format to a deployment framework format at different precisions: floating point 32 (fp32) and floating point 16 (fp16). Deployment frameworks allow the creation of AI Applications that have lower storage and computation costs and run more efficiently on the GPU or CPU. The weights and activations of exported engines should have a defined precision (fp32 or fp16).

Bonseyes AI Assets support model export to fp32 and fp16 precision through two inference engines: ONNXRuntime and TensorRT. Export for GPU deployment can be done with both TensorRT and ONNXRuntime, while export for CPU deployment can only be done with the ONNXRuntime engine. Further, if a Pytorch model is the starting point, it is also possible to export models to fp32 and fp16 precision using the torch2trt script directly.

Bonseyes AI Assets also provide an export tool for AI App generation and for exporting ONNX to LPDNN inference engines (LNE, ONNXRuntime and TensorRT).

Bonseyes tools for ONNXRuntime, TensorRT, torch2trt and LPDNN export can be found in the AI Asset Container Generator.

ONNX export

Bonseyes AI Assets provide ONNXRuntime export tools for Pytorch and TensorFlow/Keras starting point models.

torch2onnx export

  1. Use /<bonseyes_aiasset_name>/export/ to export Pytorch model to ONNX with defined input size (width and height specified as CLI input arguments) and fp32 precision.

    In this link you can find example of /<bonseyes_aiasset_name>/export/ of the Bonseyes YOLOX Asset.

    Here is the example of running /<bonseyes_aiasset_name>/export/ script of Bonseyes YOLOX Asset:

python -m bonseyes_yolox.export.torch2onnx \
    --model-input /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
    --model-output /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
    --input-width 640 \
    --input-height 640

Alternatively, run the /<bonseyes_aiasset_name>/export/ script with the onnxruntime engine:

python -m bonseyes_yolox.export.all \
    --precisions fp32 \
    --input-sizes 640x640 \
    --engine onnxruntime \
    --backbone yolox_s
  2. The exported ONNXRuntime model should be saved in the /<bonseyes_aiasset_name>/models/onnx/{args.backbone}/ directory and named as in the command above:

    v1.0_yolox_s_default_640x640_fp32.onnx

    where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was exported from the pretrained Pytorch model, 640x640 is the input size and fp32 is the precision.
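The naming convention described above can be sketched as a pair of small helpers. This is only an illustration: `build_model_name`, `parse_model_name` and the regular expression are hypothetical names that are not part of the AI Asset, and the sketch assumes the tag (for example default) is a single word without underscores.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    version: str    # e.g. "v1.0"
    backbone: str   # e.g. "yolox_s"
    tag: str        # e.g. "default"
    width: int      # input width in pixels
    height: int     # input height in pixels
    precision: str  # e.g. "fp32"
    extension: str  # e.g. "onnx"


# Hypothetical pattern: <version>_<backbone>_<tag>_<W>x<H>_<precision>.<ext>
_PATTERN = re.compile(
    r"^(?P<version>v\d+(?:\.\d+)*)_(?P<backbone>.+)_(?P<tag>[A-Za-z0-9]+)"
    r"_(?P<width>\d+)x(?P<height>\d+)_(?P<precision>[A-Za-z0-9]+)"
    r"\.(?P<extension>\w+)$"
)


def build_model_name(version, backbone, tag, width, height, precision, extension):
    """Compose a file name following the convention described above."""
    return f"{version}_{backbone}_{tag}_{width}x{height}_{precision}.{extension}"


def parse_model_name(filename: str) -> ModelName:
    """Split a file name into its components, or raise ValueError."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected model file name: {filename!r}")
    fields = match.groupdict()
    return ModelName(
        fields["version"], fields["backbone"], fields["tag"],
        int(fields["width"]), int(fields["height"]),
        fields["precision"], fields["extension"],
    )
```

Parsing the name back into its parts is useful when a later step, such as a TensorRT export, needs to know the input size of an existing ONNX file.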

tf2onnx export

  1. A Tensorflow/Keras model can be used as the starting point, but this is not officially supported in the AI Asset. In /<bonseyes_aiasset_name>/export/ you can find a script that exports a Keras .h5 model to ONNXRuntime.

  2. Add tensorflow (tf-2.0 or newer) and tf2onnx (tf2onnx-1.8.4 or newer), with their versions, to the dependency requirements so that they are installed when the image is built.

  3. Use /<bonseyes_aiasset_name>/export/ to export Tensorflow/Keras model to ONNX with specified input and output model and input size (width and height specified as CLI input arguments) and fp32 precision.

    Here is the example of running /<bonseyes_aiasset_name>/export/ script:

    python3 -m <bonseyes_aiasset_name>.export.tf2onnx \
        --model-input /path/to/h5/model \
        --model-output /path/to/output/onnx/model \
        --input-width /input/width/ \
        --input-height /input/height/
  4. Add a subprocess call to /<bonseyes_aiasset_name>/export/ for the ONNXRuntime case in the /<bonseyes_aiasset_name>/export/ script.

  5. The exported ONNXRuntime model should be saved in the /<bonseyes_aiasset_name>/models/onnx/{args.backbone}/ directory. Here is an example of how the exported model should be named:

    v1.0_yolox_s_default_640x640_fp32.onnx

where v1.0 is the version, yolox_s is the backbone name, default indicates that the model comes from a pretrained model of the official repository, 640x640 is the input size and fp32 is the precision.
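The subprocess call mentioned in step 4 could look roughly like the sketch below. The function names `tf2onnx_command` and `run_tf2onnx` are hypothetical; only the module path and the four CLI arguments are taken from the example command above.

```python
import subprocess
import sys
from typing import List


def tf2onnx_command(asset_package: str, model_input: str, model_output: str,
                    width: int, height: int) -> List[str]:
    """Build the argument list for the tf2onnx export module of an asset."""
    return [
        sys.executable, "-m", f"{asset_package}.export.tf2onnx",
        "--model-input", model_input,
        "--model-output", model_output,
        "--input-width", str(width),
        "--input-height", str(height),
    ]


def run_tf2onnx(asset_package: str, model_input: str, model_output: str,
                width: int, height: int) -> None:
    """Run the export in a child process and fail loudly if it fails."""
    cmd = tf2onnx_command(asset_package, model_input, model_output, width, height)
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.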

TensorRT export

  1. Use /<bonseyes_aiasset_name>/export/ to export ONNX model to TensorRT with precisions fp16 or fp32.

An example of /<bonseyes_aiasset_name>/export/ from the Bonseyes YOLOX Asset is available at this link.

Here is an example of running the /<bonseyes_aiasset_name>/export/ script in the Bonseyes YOLOX Asset:

python -m bonseyes_yolox.export.onnx2trt \
    --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
    --output-dir /app/bonseyes_yolox/models/tensorrt/Tesla_T4/yolox_s \
    --precision fp32

Alternatively, run the /<bonseyes_aiasset_name>/export.all script with the tensorrt engine. Set the precision to fp32 or fp16, or pass both precisions in the CLI argument. The following example runs the onnx2trt export to fp16 and fp32 through the export.all script in the Bonseyes YOLOX Asset:

python -m bonseyes_yolox.export.all \
    --precisions fp32 fp16 \
    --input-sizes 640x640 \
    --engine tensorrt \
    --backbone yolox_s
  2. The exported TensorRT model should be saved in the /<bonseyes_aiasset_name>/models/tensorrt/{gpu_name}/{args.backbone} directory and should be named as one of the following models:


    where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was exported from an ONNX model that was itself exported from the pretrained Pytorch model, 640x640 is the input size, fp32 is the precision and dla_enabled or dla_disabled indicates whether DLA was enabled during export.

torch2trt export

  1. Make sure that the torch2trt installation has been added to Dockerfile.cuda for x86 and Jetson and that torch2trt is installed.

    To install torch2trt, add

    RUN cd /tmp && git clone \
        && cd torch2trt \
        && git checkout 0400b38123d01cc845364870bdf0a0044ea2b3b2 \
        && wget \
        && git apply 8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
        && python3 install --user

    to /<bonseyes_aiasset_name>/docker/platforms/x86_64/Dockerfile.cuda and /<bonseyes_aiasset_name>/docker/platforms/nvidia_jetson/Dockerfile.cuda after the AI Asset setup.

    An example of the torch2trt installation in Dockerfile.cuda for x86 in Bonseyes YOLOX is available at this link.

    The same installation can also be found in the AI Asset Container Generator.

  2. Use /<bonseyes_aiasset_name>/export/ to export a Pytorch model to TensorRT with fp16 or fp32 precision. It is also possible to pass the use-onnx CLI argument when calling this script, in which case torch2trt first converts the Pytorch model to ONNX and then exports the resulting ONNX model to a TensorRT model with fp32 or fp16 precision.

An example of /<bonseyes_aiasset_name>/export/ from the Bonseyes YOLOX Asset is available at this link.

Here is an example of running the export/ script in the Bonseyes YOLOX Asset:

python3 -m bonseyes_yolox.export.torch2trt \
    --input-path /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
    --output-dir /app/ \
    --precision fp32 \
    --input-width 640 \
    --input-height 640

Alternatively, run the /<bonseyes_aiasset_name>/export/ script with the torch2trt engine. Set the precision to fp32 or fp16, or pass both precisions in the CLI argument.

python -m bonseyes_yolox.export.all \
    --precisions fp32 fp16 \
    --input-sizes 640x640 \
    --engine torch2trt \
    --backbone yolox_s


Note that the /<bonseyes_aiasset_name>/export/ script exports two torch2trt models:

  • one by converting Pytorch to ONNX and then exporting to TensorRT, and

  • one by exporting the TensorRT model directly from Pytorch.

  3. The exported torch2trt models should be saved in the /<bonseyes_aiasset_name>/models/torch2trt/{gpu_name}/{args.backbone} directory.


    After the torch2trt export, two torch2trt optimized models are saved in /<bonseyes_aiasset_name>/models/torch2trt/{gpu_name}/{args.backbone}: one has the .pth extension (for Python) and the other has the .engine extension (TensorRT) and is used in C++. In the given example the exported .pth model is imported for inference. More information about loading torch2trt models and running inference with them can be found in the AI Asset Container Generator.

torch2trt models with the .pth extension that are exported directly from Pytorch to TensorRT are named with one of the following names:


where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was exported from the pretrained Pytorch model, 640x640 is the input size, fp32 is the precision and dla_enabled or dla_disabled indicates whether DLA was enabled during export.

torch2trt models with the .pth extension that are exported from Pytorch to ONNX and then from ONNX to TensorRT are named with one of the following names:


where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was exported from the pretrained Pytorch model, 640x640 is the input size, fp32 is the precision, dla_enabled or dla_disabled indicates whether DLA was enabled during export, and _with_onnx indicates that the model was exported from Pytorch to ONNX and then from ONNX to TensorRT.
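The output directories named in the export sections above follow a regular layout, which the following sketch captures. The helper `model_dir` is a hypothetical name used for illustration; only the directory layout itself comes from the text.

```python
from pathlib import PurePosixPath
from typing import Optional


def model_dir(asset_root: str, engine: str, backbone: str,
              gpu_name: Optional[str] = None) -> PurePosixPath:
    """Return the directory where an exported model is expected to be saved."""
    root = PurePosixPath(asset_root) / "models"
    if engine == "onnxruntime":
        # ONNX models are independent of the GPU they were exported on.
        return root / "onnx" / backbone
    if engine in ("tensorrt", "torch2trt"):
        # TensorRT-based engines are stored per GPU model.
        if not gpu_name:
            raise ValueError(f"gpu_name is required for engine {engine!r}")
        return root / engine / gpu_name / backbone
    raise ValueError(f"unknown engine: {engine!r}")
```

For instance, `model_dir("/app/bonseyes_yolox", "tensorrt", "yolox_s", "Tesla_T4")` yields the output directory used in the onnx2trt example above.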

onnx2lpdnn export

  1. For onnx2lpdnn export add following dependencies:

    • In /deps/requirements_<platform_name>.txt file add h5py, example: link.

    • In /<bonseyes_aiasset_name>/docker/platforms/<platform_name>/Dockerfile file add hdf5-tools, example: link.

  2. Use /<bonseyes_aiasset_name>/export/ to generate AI App and export ONNX model to LPDNN inference engines (LNE, ONNXRuntime or TensorRT) with precisions fp32 or fp16.

    You can find /<bonseyes_aiasset_name>/export/ in AI Asset Container Generator export directory. This script uses algorithm, challenge and deployment yaml files, which are stored in lpdnn directory and can be found in AI Asset Container Generator.

    In this link you can find example of /<bonseyes_aiasset_name>/export/ script in the Bonseyes 3DDFA Asset.

    Here is the example of running /<bonseyes_aiasset_name>/export/ script in Bonseyes 3DDFA Asset:

    python bonseyes_3ddfa_v2/export/ \
        --engine onnxruntime \
        --precision F32 \
        --algorithm-file bonseyes_3ddfa_v2/lpdnn/catalog/mobilenetv1-default-120x120-fp32/algorithm.yml \
        --challenge-file bonseyes_3ddfa_v2/lpdnn/challenge/challenge.yml \
        --deployment-file bonseyes_3ddfa_v2/lpdnn/deployment/deployment-file.yml \
        --deployment-package x86_64-ubuntu20_cuda \
        --output-dir build/3dface-landmarks-v1.0-mobilenetv1-120x120

    When running /<bonseyes_aiasset_name>/export/ you need to specify the algorithm, challenge and deployment yaml files.

    More information about LPDNN’s YAML files can be found in Create LPDNN’s file tree, and about the available engines in LPDNN’s Inference engines.

  3. Exported models and additional files should be saved in the directory you specified with output-dir CLI argument.

All export

Use /<bonseyes_aiasset_name>/export/ to export to ONNXRuntime, TensorRT, torch2trt or all engines with the specified precision(s), backbone name, input sizes, ONNX opset version (optional) and enable DLA flag (optional).

An example of the export for the Bonseyes YOLOX Asset is available at this link. The export all script can also be found in the AI Asset Container Generator.

Here is an example of running the /<bonseyes_aiasset_name>/export/ script with all engines:

python -m bonseyes_yolox.export.all \
    --precisions fp32 fp16 \
    --input-sizes 640x640 \
    --engine all \
    --backbone yolox_s
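To make the meaning of the arguments concrete, the sketch below enumerates the individual export jobs that such a command implies. It is not the actual export script: `plan_exports` and `parse_input_size` are hypothetical names, and the sketch assumes that input sizes are written as WIDTHxHEIGHT and that "all" stands for the three engines listed above.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Engines covered by "all", as listed in the text above.
ENGINES = ("onnxruntime", "tensorrt", "torch2trt")


def parse_input_size(text: str) -> Tuple[int, int]:
    """Turn a string such as '640x640' into (width, height)."""
    width, height = text.lower().split("x")
    return int(width), int(height)


def plan_exports(engine: str, precisions: Sequence[str],
                 input_sizes: Sequence[str]) -> List[Tuple[str, str, int, int]]:
    """List every (engine, precision, width, height) combination to export."""
    engines = ENGINES if engine == "all" else (engine,)
    for name in engines:
        if name not in ENGINES:
            raise ValueError(f"unknown engine: {name!r}")
    return [
        (name, precision, *parse_input_size(size))
        for name, precision, size in product(engines, precisions, input_sizes)
    ]
```

With `--engine all`, two precisions and one input size, this plan contains six jobs, one per engine and precision.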


Potential export issues and fixes:
  • If you have problems exporting a Pytorch model to ONNX, try changing the opset version.

  • A TensorRT model with a specific input size can only be exported from an existing ONNX model with the same input size (the input size of the model is written in the name of the exported ONNX model).

  • You can set the enable-dla CLI argument to True when calling the /<bonseyes_aiasset_name>/export/ script or /<bonseyes_aiasset_name>/export/ on Jetson Xavier AGX or Jetson Xavier NX devices. This flag enables the Deep Learning Accelerator (DLA) and can be set to True only on Jetson Xavier AGX and Jetson Xavier NX devices. On other devices or on a server you should not set it to True.

  • Try changing the workspace size when calling the /<bonseyes_aiasset_name>/export/ script or the /<bonseyes_aiasset_name>/export/ script to control how much GPU memory TensorRT uses during export (this can be useful when working on edge devices that have little memory).
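The input-size constraint from the list above can be checked before starting a TensorRT export. The following is a minimal sketch, assuming the ONNX file name contains the size as `_<width>x<height>_`; the function names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple


def input_size_from_name(model_path: str) -> Tuple[int, int]:
    """Read (width, height) from a model file name such as ..._640x640_fp32.onnx."""
    name = PurePosixPath(model_path).name
    match = re.search(r"_(\d+)x(\d+)_", name)
    if match is None:
        raise ValueError(f"no input size found in file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def check_trt_input_size(onnx_path: str, width: int, height: int) -> None:
    """Raise ValueError if the ONNX model was exported with another input size."""
    found = input_size_from_name(onnx_path)
    if found != (width, height):
        raise ValueError(
            f"ONNX model has input size {found[0]}x{found[1]}, "
            f"but {width}x{height} was requested; export a matching ONNX model first"
        )
```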

VI. Optimize

Bonseyes AI Assets provide optimisation methods such as Post-training Quantization (PTQ) and Quantization-aware Training (QAT) to reduce the memory footprint and improve the efficiency of DNNs. Quantization is a compression method that reduces the storage cost of a variable by employing reduced-numerical precision. This improves the arithmetic intensity of neural network inference by increasing the amount of computational work that can be performed for a given amount of memory traffic.
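As a small worked illustration of reduced-precision storage, the sketch below maps a floating-point range onto signed 8-bit integers using a scale and an offset (zero point). It is a generic affine quantization scheme written for this example, not the code used by any of the inference engines; all function names are hypothetical.

```python
from typing import Tuple


def quantization_params(min_val: float, max_val: float,
                        qmin: int = -128, qmax: int = 127) -> Tuple[float, int]:
    """Compute scale and zero point so that [min_val, max_val] maps to [qmin, qmax]."""
    # The representable range must contain zero so that 0.0 is exact.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is zero
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to its 8-bit integer code, clamping out-of-range values."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to an approximate real value."""
    return (q - zero_point) * scale
```

Each value is then stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step for values inside the range.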

Post Training Quantization (PTQ)

Bonseyes AI Assets support post-training quantization for both weights and activations. Weights can be quantized directly to 8-bit integers, while activations require a validation set to determine their dynamic range. PTQ methods usually fuse the batch normalization layers by folding them into the preceding convolutions before quantizing the weights and activations, which might lead to small drops in accuracy in some cases.
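The folding of a batch normalization layer into the preceding convolution mentioned above can be written down per output channel. The sketch below uses plain Python lists and is only meant to show the arithmetic; `fold_batchnorm` is a hypothetical helper, and the inference engines perform this fusion internally.

```python
import math
from typing import List, Sequence, Tuple


def fold_batchnorm(weights: Sequence[Sequence[float]], bias: Sequence[float],
                   gamma: Sequence[float], beta: Sequence[float],
                   mean: Sequence[float], var: Sequence[float],
                   eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Fold batch normalization parameters into convolution weights and biases.

    weights[c] holds the flattened kernel of output channel c; the remaining
    arguments hold one value per output channel.
    """
    folded_weights, folded_bias = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        factor = g / math.sqrt(v + eps)
        # BN(conv(x)) = factor * (w.x + b - mean) + beta
        folded_weights.append([w * factor for w in w_c])
        folded_bias.append((b_c - m) * factor + bt)
    return folded_weights, folded_bias
```

After folding, a single convolution with the new weights and biases produces the same output as the original convolution followed by batch normalization.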

Bonseyes AI Assets provide PTQ through two inference engines: ONNXRuntime and TensorRT. PTQ for GPU deployment can be applied on both TensorRT and ONNXRuntime, while PTQ for CPU deployment can only be applied with ONNXRuntime engine. Further, if a Pytorch model is the starting point, it is also possible to apply Post Training Quantization using torch2trt script directly.

Bonseyes optimization tools that apply Post Training Quantization to exported models with ONNXRuntime, TensorRT and torch2trt can be found in the AI Asset Container Generator.

PTQ requires a calibration dataset to determine the range of the DNN’s activations, from which the activations’ scale and offset are calculated so that most of the accuracy is retained. Hence, the first step of PTQ is to implement a calibration_dataloader function in /<bonseyes_aiasset_name>/optimize/post_training_quantization/ that provides data samples for int8 quantization (add the default model path and the validation images path as function arguments). In this script you can also specify the default input size and the number of images used for int8 calibration.
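The file-selection part of such a dataloader can be sketched as follows. This is not the calibration_dataloader of the AI Asset: the helper names are hypothetical, the restriction to .jpg/.jpeg files is an assumption, and decoding and preprocessing of the images (which depend on the model) are left out.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def select_calibration_images(images_dir: str, images_num: int = 100) -> List[Path]:
    """Pick the first images_num JPEG files (sorted by name) from a directory."""
    paths = sorted(
        p for p in Path(images_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    return paths[:images_num]


def batches(items: Sequence[Path], batch_size: int) -> Iterator[Sequence[Path]]:
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Sorting the file names keeps the calibration set reproducible between runs, which makes accuracy differences between quantized models easier to compare.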

TensorRT PTQ

  1. Use the INT8Calibrator calibrator class in /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to perform int8 post training quantization with TensorRT (in main(), you can specify the number of images when calling the calibration_dataloader function and the batch size when instantiating the INT8Calibrator class).

  2. Use /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to optimize the TensorRT model to int8 precision. Specify the path of the ONNX fp32 model when running this script.

An example of the script in BonseyesYOLOX is available at this link.

Here is an example of running the script:

python3 -m bonseyes_yolox.optimize.post_training_quantization.trt_quantize \
    --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
    --output-dir /app/bonseyes_yolox/models/tensorrt/Tesla_T4/yolox_s/

Alternatively, run the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script with the tensorrt engine:

python -m bonseyes_yolox.optimize.post_training_quantization.all \
    --engine tensorrt \
    --backbone yolox_s \
    --input-sizes 640x640
  3. The TensorRT model optimized with PTQ should be saved in the /<bonseyes_aiasset_name>/models/tensorrt/{gpu_name}/{args.backbone} directory and should be named as one of the following models:


where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was optimized from an ONNX model that was itself exported from the official pretrained Pytorch model, 640x640 is the input size, int8 is the precision and dla_enabled or dla_disabled indicates whether DLA was enabled during the optimization process.


ONNXRuntime PTQ

  1. Make sure that the ONNX model has been simplified and optimised by using the functions in /<bonseyes_aiasset_name>/export/

  2. Use the DataReader calibrator class in /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to perform int8 post training quantization with ONNX (you can specify the number of images in main() when calling the DataReader class). Set the default value of the calibrate-dataset CLI argument to the path of the validation dataset images directory.

  3. Use /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to optimize the ONNX model to int8 precision.

An example from the BonseyesYOLOX case is available at this link.

Here is an example of running the script:

python3 -m bonseyes_yolox.optimize.post_training_quantization.onnx_quantize \
    --input-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_fp32.onnx \
    --output-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_default_640x640_int8.onnx

Alternatively, run the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script with the onnxruntime engine:

python -m bonseyes_yolox.optimize.post_training_quantization.all \
    --engine onnxruntime \
    --backbone yolox_s \
    --input-sizes 640x640
  4. The optimized ONNXRuntime model should be saved in the /<bonseyes_aiasset_name>/models/onnx/{args.backbone}/ directory and should be named as:


    where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was optimized from an ONNX model that was exported from the pretrained Pytorch model, 640x640 is the input size and int8 is the precision.

torch2trt PTQ

  1. Make sure that the torch2trt installation has been added to Dockerfile.cuda for x86 and Jetson and that torch2trt is installed.

To install torch2trt, add

RUN cd /tmp && git clone \
    && cd torch2trt \
    && git checkout 0400b38123d01cc845364870bdf0a0044ea2b3b2 \
    && wget \
    && git apply 8b9fb46ddbe99c2ddf3f1ed148c97435cbeb8fd3.patch \
    && python3 install --user

to /<bonseyes_aiasset_name>/docker/platforms/x86_64/Dockerfile.cuda and /<bonseyes_aiasset_name>/docker/platforms/nvidia_jetson/Dockerfile.cuda after the AI Asset setup.

An example of the torch2trt installation in Dockerfile.cuda for x86 in Bonseyes YOLOX is available at this link.

The same installation can also be found in the AI Asset Container Generator.

  2. Use the calibration_dataloader function in /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to provide data samples for int8 quantization.

  3. Use /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to optimize a torch2trt model with int8 precision from the Pytorch model. It is also possible to pass the use-onnx CLI argument when calling this script, in which case torch2trt first converts the Pytorch model to ONNX and then optimizes the resulting ONNX model to a TensorRT int8 model.

An example from the BonseyesYOLOX case is available at this link.

Here is an example of running the script:

python3 -m bonseyes_yolox.optimize.post_training_quantization.torch2trt_quantize \
    --pth-model /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth \
    --output-dir /app/bonseyes_yolox/models/torch2trt/Tesla_T4/yolox_s/ \
    --input-width 640 \
    --input-height 640

Alternatively, run the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script with the torch2trt engine:

python -m bonseyes_yolox.optimize.post_training_quantization.all \
    --engine torch2trt \
    --backbone yolox_s \
    --input-sizes 640x640


Note that the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script optimizes two torch2trt models:

  • one by converting Pytorch to ONNX and then optimizing to TensorRT, and

  • one by optimizing the TensorRT model directly from Pytorch.

  4. The optimized torch2trt models should be saved in the /<bonseyes_aiasset_name>/models/torch2trt/{gpu_name}/{args.backbone} directory.


After torch2trt PTQ, two torch2trt optimized models are saved in /<bonseyes_aiasset_name>/models/torch2trt/{gpu_name}/{args.backbone}: one has the .pth extension (for Python) and the other has the .engine extension (TensorRT) and is used in C++. In the given example the optimised .pth model is imported for inference. More information about loading torch2trt models and running inference with them can be found in the AI Asset Container Generator.

torch2trt models with the .pth extension that are optimized (PTQ) directly from Pytorch to TensorRT are named with one of the following names:


where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was optimized from the pretrained Pytorch model, 640x640 is the input size, int8 is the precision and dla_enabled or dla_disabled indicates whether DLA was enabled during the optimization process.

torch2trt models with the .pth extension that are exported from Pytorch to ONNX and then optimized from ONNX to TensorRT are named with one of the following names:


where v1.0 is the version, yolox_s is the backbone name, default indicates that the model was optimized from the pretrained Pytorch model, 640x640 is the input size, int8 is the precision, dla_enabled or dla_disabled indicates whether DLA was enabled during the optimization process, and _with_onnx indicates that the model was exported from Pytorch to ONNX and then optimized from ONNX to TensorRT.


LPDNN PTQ

LPDNN supports Post Training Quantization for its inference engines (TensorRT, ONNXRuntime, NCNN and LNE).

Instructions for LPDNN Post Training Quantization can be found in Quantization workflow for LPDNN’s engines.


All PTQ

Use /<bonseyes_aiasset_name>/optimize/post_training_quantization/ to optimize TensorRT, ONNXRuntime, torch2trt or all models with the specified input sizes, backbone name, calibration dataset and tag version. Also make sure that the path of the validation images folder is passed as the calibrate-dataset CLI argument.

A PTQ example in BonseyesYOLOX is available at this link.

Here is an example of running the post_training_quantization/ script:

python -m bonseyes_yolox.optimize.post_training_quantization.all \
    --engine all \
    --backbone yolox_s \
    --input-sizes 640x640


Potential optimization issues and fixes:
  • If the quantization process is killed, too many images may be used for calibration and the device may not have enough memory for this operation. For TensorRT, try lowering the images_num argument passed to the calibration_dataloader function in the main part of the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script; for ONNX, lower the calibration_images_num argument passed to DataReader in the main part of the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script. Fewer calibration images tend to give a lower AP, so do not set the number too low (an images_num of 100 or more is recommended).

  • The optimized TensorRT model is built from an ONNX fp32 model, so to optimize a TensorRT model with a specific input size, an ONNX fp32 model with that input size must already exist (if it does not, first convert the Pytorch model to an ONNX fp32 model with that input size).

  • The enable-dla CLI argument can be set to True when calling the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script or /<bonseyes_aiasset_name>/optimize/post_training_quantization/ on Jetson Xavier AGX or Jetson Xavier NX. This flag enables the Deep Learning Accelerator (DLA) and can be set to True only on Jetson Xavier AGX and Jetson Xavier NX devices. On other devices or on a server you should not set it to True.

  • Try changing the workspace size when calling the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script or the /<bonseyes_aiasset_name>/optimize/post_training_quantization/ script to control how much GPU memory TensorRT uses during the quantization process (this can be useful when working on edge devices that have little memory).

Quantization Aware Training (QAT)

PTQ might lead to a drop in accuracy when quantizing from fp32 to int8. The goal of QAT is to recover the accuracy of the int8 models by fine-tuning the model weights while quantization is simulated.

In QAT, models are fine-tuned in Pytorch by simulating quantization in the forward pass, i.e., fake quantization, and updating the weights during the backward pass. The model is thereby re-trained, which increases the accuracy of the fake quantized model. After fine-tuning, the Pytorch model needs to be exported to a fake quantized ONNX model, and finally explicit quantization needs to be applied from the fake quantized ONNX model to int8 using TensorRT functions.
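The idea of fake quantization can be illustrated on a single value: the value is rounded to the nearest point of an integer grid and immediately converted back to floating point, so the network sees the rounding error during training while all tensors stay in fp32. The sketch below assumes a symmetric scheme defined by a maximum absolute value `amax`; it is a simplified illustration with a hypothetical function name, not the implementation of the pytorch_quantization package.

```python
def fake_quantize(x: float, amax: float, num_bits: int = 8) -> float:
    """Quantize x to a symmetric integer grid and map it back to float."""
    if amax <= 0.0:
        raise ValueError("amax must be positive")
    bound = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = amax / bound                     # size of one quantization step
    q = max(-bound, min(bound, round(x / scale)))  # integer code, clamped
    return q * scale                         # back to floating point
```

Values inside the range [-amax, amax] change by at most half a step, while values outside it are clipped to the range boundary.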

QAT Tools

The following tools are used for QAT:

  1. The Bonseyes optimization tool to calibrate data for QAT. This tool can be found in the container generator.

  2. The pytorch_quantization package, used for QAT within the PyTorch training framework:

    • The pytorch_quantization package is used for the QAT process and to export fake quantized Pytorch models.

    • To install pytorch_quantization you need to have pytorch==1.10 and torchvision==0.11 installed.

    • pytorch_quantization 2.1.2 needs to be installed. For x86, add the pytorch_quantization installation by adding

      RUN cd /tmp && \
          gdown && \
          python3 -m pip install prettytable==3.2.0 pytorch_quantization-2.1.2-cp38-cp38-linux_x86_64.whl sphinx-glpi-theme==0.3 wcwidth==0.2.5  && \
          sudo rm -rf /tmp/* ; \

      to /<bonseyes_aiasset_name>/docker/platforms/x86_64/Dockerfile.cuda after the AI Asset setup.

      An example of the pytorch_quantization installation in Dockerfile.cuda for x86 in YOLOX is available at this link. The same installation can also be found in the container generator.

    • On Jetson devices it is not possible to install the pytorch_quantization package, since this package is only supported on x86. On Jetson devices, it is only possible to start from the following step.

  3. Scripts to convert fake quantized Pytorch models to ONNX and TensorRT models, i.e., explicit quantization, after the QAT process (the qat flag needs to be set).

QAT process

  1. Change the training code

    • Change the config file used for training by adding a boolean qat flag, which is passed to the training code. Here is an example config file for Quantization Aware Training with the qat flag enabled:

        loader-workers: 4
        gpu_num: 4
        fp16: True #Adopting mixed precision training
        qat: True
        resume: False #resume training
        cache: False #caching imgs to RAM for fast training
        occupy: False #occupy GPU memory first for training
        experiment-name: yolox_s #experiment name
        name: 'yolox-s'
        dist-backend: 'nccl' #distributed backend
        dist-url: 'auto' #url used to set up distributed training
        batch-size: 16
        devices: 1 #number of GPUs 8 in their example for training
        exp_file: None #experiment description file
        ckpt: /app/bonseyes_yolox/models/pytorch/yolox_s/v1.0_yolox_s_default_640x640_fp32.pth # checkpoint file
        start_epoch: None #resume training start epoch
        num_machines: 1 #num of node for training
        machine_rank: 0 #node rank for multi-node training
        logger: tensorboard #local rank for dist training
        output: "/app/source/yolox/YOLOX_outputs/yolox_s/train_log.txt"

      A YOLOX config example for QAT is available at this link.

      You can also find the train/ script, which runs the AI Asset training script.

    • First, add the QAT case to your training code (add a qat argument to the training CLI, which is set to True if the qat flag in the config is True).

    • Before loading the model, initialize the quant modules by adding:

      from pytorch_quantization import quant_modules
      quant_modules.initialize()

      This signals Pytorch to use fake quantized layers instead of the default layers (for example, a QuantConv2d layer instead of a Conv2d layer), which simulate the quantization forward pass.


      quant_modules.initialize() applies fake quantization to layers automatically. If you want only selected layers to be fake quantized, you can use QuantDescriptor and define which layers should be fake quantized. Here is an example of configuring fake quantized layers (in this case QuantConv2d and QuantMaxPool2d):

      from pytorch_quantization import nn as quant_nn
      from pytorch_quantization.tensor_quant import QuantDescriptor
      quant_desc_input = QuantDescriptor(calib_method=calibrator)  # calibrator: "max" or "histogram"
      # Assumed completion: use the descriptor as default input quantizer of both layer types
      quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc_input)
      quant_nn.QuantMaxPool2d.set_default_quant_desc_input(quant_desc_input)
    • Get the model and then load the state dict of the pretrained model.

    • If you are working with a Pytorch model, calibrate the loaded model by importing /<bonseyes_aiasset_name>/optimize/quantization_aware_training/ into your training code and applying the calibrate_model function to your model. Use the training dataset to calibrate the model.

      import torch
      from bonseyes_yolox.optimize.quantization_aware_training.calibrate_data import calibrate_model
      if self.args.qat and not self.calibrated:
          # Calibrate the model using max calibration technique.
          with torch.no_grad():
              calibrate_model(
                  ...,  # the remaining arguments are elided in the source
                  hist_percentile=[99.9, 99.99, 99.999, 99.9999],
              )
          self.calibrated = True
    • Fine-tune the model (the rest of the training code is the same as for default training) with a lower learning rate, fewer iterations and a low number of epochs (add a case for the qat flag to the hyperparameter setup).

    • After fine-tuning is done, save the fine-tuned model in the /<bonseyes_aiasset_name>/models/pytorch_qat/<backbone_name>/ directory. The name of the fine-tuned model differs from that of the pretrained model without QAT in that the word default is replaced with qat. Example of running the training script:

      python3 -m bonseyes_yolox.train --config /app/bonseyes_yolox/train/configs/v1.0_yolox_s_qat_640x640_fp32_config.yml

    The modified training source code of YOLOX with the QAT feature added is available at this link.

  2. Use /<bonseyes_aiasset_name>/export/ to export the fine-tuned QAT Pytorch model with fake quantized layers to ONNX with a defined input size (width and height specified as CLI input arguments). The torch2onnx script with the --qat flag exports the fake quantized Pytorch model to an ONNX model that contains Quantize and Dequantize layers. Use opset 13 when exporting fake quantized layers to ONNX. You can also use <bonseyes_aiasset_name>/export/ with the --qat argument to export the Pytorch model to a fake quantized ONNX model. You can run the script directly:

    python -m bonseyes_yolox.export.torch2onnx \
        --model-input /app/bonseyes_yolox/models/pytorch_qat/yolox_s/v1.0_yolox_s_qat_640x640_fp32.pth \
        --model-output /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx \
        --input-width 640 \
        --input-height 640 \
        --qat

    or you can run it using export/ script:

    python -m bonseyes_yolox.export.all \
        --precisions fp32 \
        --input-sizes 640x640 \
        --engine onnxruntime \
        --backbone yolox_s \
        --qat
  3. Use /<bonseyes_aiasset_name>/export/ to apply explicit quantization from the fake quantized ONNX fp32 model to a TensorRT model with int8 precision. Specify the path of the fake quantized ONNX fp32 model when running this script. Set the qat CLI argument to True and the precision to int8 to apply explicit quantization:

    python -m bonseyes_yolox.export.onnx2trt \
        --onnx-model /app/bonseyes_yolox/models/onnx/yolox_s/v1.0_yolox_s_qat_640x640_fp32.onnx \
        --output-dir /app/bonseyes_yolox/models/tensorrt/{gpu_name}/{args.backbone} \
        --precision int8 \
        --qat

    or you can run it using export/ script:

    python -m bonseyes_yolox.export.all \
        --precisions int8 \
        --input-sizes 640x640 \
        --engine tensorrt \
        --backbone yolox_s \
        --qat


    Explicit quantization does not need a calibration dataset, since calibration is applied during the Quantization Aware Training process.

    The BonseyesYOLOX export folder, which contains examples of the export scripts, is available at this link.

  4. Apply benchmark on TensorRT explicit quantized model after QAT process:

    python3 -m bonseyes_yolox.benchmark.all \
        --input-size 640x640 \
        --dataset all \
        --device gpu \
        --backbone yolox_s \
        --engine tensorrt

    Note that the /<bonseyes_aiasset_name>/benchmark/ script is not applied to the ONNX QAT model, because that model is only used for explicit quantization. A new version with this addition can be found in the template generator.

  5. After running the benchmark, add the QAT models to the /<bonseyes_aiasset_name>/benchmark/ script. An example of adding QAT models to the graph can be found in YOLOX.

The Pytorch model after Quantization Aware Training should be saved in the /<bonseyes_aiasset_name>/models/pytorch_qat/<backbone_name>/ directory and named as in the export example above:

v1.0_yolox_s_qat_640x640_fp32.pth

where v1.0 refers to the tag version, yolox_s is the backbone name, qat marks the fine-tuned model after Quantization Aware Training, 640x640 is the training input size and fp32 is the model precision.

The ONNX model exported from the Pytorch QAT model should be stored in the /<bonseyes_aiasset_name>/models/onnx/<backbone_name>/ directory and named as in the export example above:

v1.0_yolox_s_qat_640x640_fp32.onnx

The TensorRT int8 model obtained by applying explicit quantization to the fake quantized ONNX model should be stored in the /<bonseyes_aiasset_name>/models/tensorrt/<GPU_name>/<backbone_name>/ directory and should be named as follows:



Potential QAT problems and fixes:
  • The number of iterations should be very low (for example 20 iterations), the learning rate should be very low (around 1% of the default training learning rate, or lower) and the number of epochs should also be very low (a couple of epochs). Choose the best hyperparameter values experimentally.

  • Use opset 13 for the Pytorch to ONNX export, because lower opset versions don’t support fake quantized layers.

  • After export to ONNX, check the ONNX model in Netron. You should see that Quantize and Dequantize layers have been added to the model.

  • You need a calibration dataset (the training dataset) only while training the model with Quantization Aware Training. Exporting from Pytorch to ONNX or from ONNX to TensorRT needs no calibration dataset, since these steps apply explicit quantization.

  • Pytorch and ONNX models after Quantization Aware Training have fp32 precision, while the TensorRT model has int8 precision. With explicit quantization, TensorRT applies quantization and layer fusion only to the layer blocks between Quantize and Dequantize layers in the ONNX model.

  • Only the TensorRT quantized model is used for benchmarking, since the other models have fp32 precision and the ONNX model has additional layers, which slow down inference.

  • After the whole process has completed successfully, compare the accuracy and inference time of the PTQ and QAT TensorRT int8 models (the QAT TensorRT int8 model should have higher accuracy than the PTQ TensorRT int8 model).
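
Besides inspecting the model in Netron, the presence of Quantize and Dequantize layers can be checked from Python. The sketch below assumes the onnx package is installed in the container; QuantizeLinear and DequantizeLinear are the ONNX operator names of these layers, while the two function names are hypothetical.

```python
from collections import Counter

# ONNX operator names of the fake-quantization nodes.
QDQ_OPS = ("QuantizeLinear", "DequantizeLinear")


def count_qdq_nodes(op_types):
    """Count QuantizeLinear / DequantizeLinear entries among operator types."""
    counts = Counter(op_types)
    return {op: counts.get(op, 0) for op in QDQ_OPS}


def count_qdq_in_model(onnx_path):
    """Load an ONNX file and count its top-level Q/DQ nodes (needs onnx)."""
    import onnx  # imported lazily so count_qdq_nodes works without onnx

    model = onnx.load(onnx_path)
    return count_qdq_nodes(node.op_type for node in model.graph.node)
```

If both counts are zero for the model exported after QAT, the fake quantized layers were most likely lost during export.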

VII. Process

Bonseyes AI Assets provide tools to process, i.e., infer, an AI model taking input data in several formats (input file, video, or camera stream and HTTP worker), using all available inference engines (Pytorch, ONNXRuntime, TensorRT and LPDNN) with all available precisions (fp32, fp16 and int8).

Besides, if a Pytorch model is the starting point, it is possible to run the process using the torch2trt inference engine with fp32, fp16 and int8 precisions. Devices that have Nvidia GPU and CPU support can process an input with all inference engines, while devices featuring only a CPU can process the input with the ONNXRuntime and LPDNN inference engines.

Bonseyes process tools for image, video, camera and client-server, including LPDNN process can be found in the AI Asset Container Generator.

Next, we describe the arguments that need to be used for pytorch, onnxruntime and tensorrt standalone engines. For LPDNN processing, refer to LPDNN process.
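
As a rough illustration of how such arguments could be parsed, here is a minimal argparse sketch. It is not the Bonseyes implementation: the flag names are taken from the examples in this section, while the function names, the default device and the rule that a single number (as in --input-size 120) means a square input are assumptions.

```python
import argparse


def parse_input_size(value):
    """Parse '641x641' into (641, 641); '120' is read as (120, 120) (assumed)."""
    parts = value.lower().split("x")
    if len(parts) == 1:
        parts = parts * 2
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"invalid input size: {value!r}")
    first, second = int(parts[0]), int(parts[1])
    if first <= 0 or second <= 0:
        raise ValueError(f"input size must be positive: {value!r}")
    return first, second


def build_parser():
    """Arguments shared by the process examples (sketch only)."""
    parser = argparse.ArgumentParser(description="Process arguments sketch")
    parser.add_argument("--model", required=True, help="path to the model file")
    parser.add_argument("--input-size", required=True, type=parse_input_size)
    parser.add_argument("--engine", required=True, help="for example pytorch")
    # The default device is an assumption made for this sketch.
    parser.add_argument("--device", choices=["gpu", "cpu"], default="gpu")
    return parser
```

argparse reports a ValueError raised by the type function as a regular usage error, so a malformed size stops the script before any model is loaded.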

Image Process

The image-based process takes an input file and infers the AI model on it.

  1. The image processing script in /<bonseyes_aiasset_name>/process/ is used to process an input image. This script loads image or image folder, which need to be in .jpg format, instantiates an Algorithm class to process and render the image and finally outputs to a json file.

    In this link you can find example of process image script in Bonseyes Openpifpaf Wholebody.

    Here is the example of running script:

    # user@docker:/app$
    python -m bonseyes_openpifpaf_wholebody.process.image \
      --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
      --input-size 641x641 \
      --engine pytorch \
      --jpg-input /app/bonseyes_openpifpaf_wholebody/process/demo/samples/image/test/demo_image_1.jpg \
      --jpg-output /app/ \
      --json-output /app/ \
      --logo \
      --device gpu
  2. The processed jpg image is saved in the file or directory defined with the jpg-output CLI argument. If jpg-output is a path to a directory, the processed image keeps the name of the input image with the processed_ prefix added. For example, if jpg-output is a directory and the input image is traffic.jpg, the processed image is saved as processed_traffic.jpg in that directory.

  3. The json file with image predictions is saved in the file or directory defined with the json-output CLI argument. If json-output is a path to a directory, the json file takes the name of the processed image. For example, if json-output is a directory and the input image is traffic.jpg, the json file is saved as processed_traffic.json in that directory.
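
The output naming rule from steps 2 and 3 can be sketched as follows. This is an illustration of the rule as described, not the code of the process script; the function name is hypothetical and a path is treated as a directory only if it already exists as one.

```python
from pathlib import Path


def resolve_outputs(jpg_input, jpg_output, json_output):
    """Return the image and json paths following the processed_ naming rule."""
    source = Path(jpg_input)

    image_target = Path(jpg_output)
    if image_target.is_dir():
        # Directory given: keep the input name and add the processed_ prefix.
        image_target = image_target / f"processed_{source.name}"

    json_target = Path(json_output)
    if json_target.is_dir():
        # Directory given: same name as the processed image, .json extension.
        json_target = json_target / f"processed_{source.stem}.json"

    return image_target, json_target
```

When jpg-output or json-output points to a file, the path is returned unchanged.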


Image has to be in .jpg format.


If selected docker image does not have CUDA support, replace --device gpu with --device cpu

Video Process

The video-based process takes a video as input and infers the AI model on it.

  1. The video processing script in /<bonseyes_aiasset_name>/process/ is used to process an input video. This script loads a video file, which needs to be in .mp4 format, instantiates an Algorithm class to process and render the video and finally writes the results to json and csv files.

    In this link you can find example of process video script in Bonseyes Openpifpaf Wholebody.

    Here is the example of running script:

    # user@docker:/app$
    python -m \
      --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
      --input-size 640x480 \
      --engine pytorch \
      --video-input /app/bonseyes_openpifpaf_wholebody/process/demo/samples/video/test/demo_video_1.mp4 \
      --video-output /app/ \
      --json-output /app/ \
      --csv-output /app/ \
      --logo \
      --debug-info \
      --device gpu
  2. The processed .mp4 video is saved in the file or directory defined with the video-output CLI argument. If video-output is a path to a directory, the processed video keeps the name of the input video with the processed_ prefix added. For example, if video-output is a directory and the input video is test.mp4, the processed video is saved as processed_test.mp4 in that directory.

  3. csv and json files with video predictions will be saved in files or directories, which are defined with csv-output and json-output CLI arguments. If json-output and csv-output are paths to directories, the name of the json and csv files is the same as the name of processed video. For example if input video is test.mp4, json file will be saved as processed_test.json in specified directory with json-output CLI argument and csv file will be saved as processed_test.csv in specified directory with csv-output CLI argument.


Video has to be in .mp4 format.


If selected docker image does not have CUDA support, replace --device gpu with --device cpu

Camera Process

Camera-based processing records from your camera and infers the AI model on the frames during recording. When running the /<bonseyes_aiasset_name>/process/ script, a window with the camera recording is opened with algorithm predictions rendered in it. After recording is stopped (by pressing q), the rendered video is saved to an .mp4 file and the output results are saved to .csv and .json files.

  1. The camera processing script in /<bonseyes_aiasset_name>/process/ is used to process camera recordings. This script opens a window with the camera recording, instantiates an Algorithm class to process and render the frames during recording and finally writes the results to json and csv files.

    In this link you can find example of process camera script in Bonseyes Openpifpaf Wholebody.

    Here is the example of running script:

    # user@docker:/app$
    python -m \
      --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
      --input-size 320x320 \
      --engine pytorch \
      --video-output /app/recording.mp4 \
      --json-output /app/recording_predictions.json \
      --csv-output /app/recording_predictions.csv \
      --logo \
      --debug-info \
      --device gpu
  2. Processed .mp4 video is saved in file defined with video-output CLI argument.

  3. csv and json files with video predictions will be saved in files, which are defined with csv-output and json-output CLI arguments.

HTTP Worker Process

With HTTP worker-based processing (server), the input is sent from a remote client to HTTP server, which processes the input and returns model predictions to the client.

  1. The HTTP worker-based processing script in /<bonseyes_aiasset_name>/process/ takes an input image or a folder with images from the client, performs inference on them and returns predictions to the client.

    In this link you can find example of process HTTP worker script in Bonseyes Openpifpaf Wholebody.

  2. To run HTTP worker process you need to:

    • Run the Docker container on the host with the required ports published. An example of running the Docker container for the Bonseyes Openpifpaf Wholebody HTTP Worker process follows:

      docker run --name bonseyes_openpifpaf_wholebody \
          --privileged --rm -it \
          --gpus 0 \
          --ipc=host \
          -p 8888:8888 \
          -v /tmp/.X11-unix:/tmp/.X11-unix \
          --device /dev/video0 \
          -e DISPLAY=$DISPLAY \

    • In the running container, run the /<bonseyes_aiasset_name>/process/ script with the model, inference engine, input shape, port and device given as CLI arguments.

      An example of running /<bonseyes_aiasset_name>/process/ in the container follows:

      # user@docker:/app$
      python -m bonseyes_openpifpaf_wholebody.process.server \
        --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v2.0_shufflenetv2k30_default_641x641_fp32.pkl \
        --input-size 641x641 \
        --engine pytorch \
        --port 8888
    • Send image or image folder from client to server to process them. Here are examples of sending request from client to server:

      If you are sending request out of the container, run:

          curl --request POST \
              --data-binary @/path/to/image.jpg \

      for example

      curl --request POST \
          --data-binary @bonseyes_openpifpaf_wholebody/process/demo/samples/image/test/demo_image_1.jpg \
  3. After the request is sent to the server (by running the command above), the image or folder with images is processed on the server and a string of jsonified predictions is returned to the client.
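
The same request can be sent from Python with the standard library. This is a sketch under assumptions: the server URL (host, port and route) is not given above and has to be supplied by you, the Content-Type header is a choice made here, and the function names are hypothetical. The response is parsed as JSON because the server returns a string of jsonified predictions.

```python
import json
import urllib.request


def build_request(url, image_bytes):
    """Create a POST request carrying the raw image bytes as its body."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        # Assumed header; the worker may not require a specific content type.
        headers={"Content-Type": "application/octet-stream"},
    )


def post_image(url, image_path, timeout=30.0):
    """Send one .jpg file to the HTTP worker and return the decoded predictions."""
    with open(image_path, "rb") as image_file:
        request = build_request(url, image_file.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The port in the URL has to match the port published with docker run and passed to the server script with --port.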

LPDNN process

To process images, videos or camera streams with LPDNN, different arguments from those used for pytorch, tensorrt or onnxruntime need to be passed to the processing scripts. The AI App config json file needs to be specified; it implicitly defines the underlying inference engine to use within LPDNN, i.e., lne, onnxruntime, ncnn or tensorrt.

Image process

The image-based process takes an input file and AI App config json file and infers the LPDNN engine on it.

  1. The image processing script in /<bonseyes_aiasset_name>/process/ is used to process an input image. This script loads image or image folder, which needs to be in .jpg format, executes LPDNN using HTTP Worker (instantiates LPDNNAlgorithm class) to process and render the image and finally outputs to a json file.

    In this link you can find example of LPDNN process image script in Bonseyes 3DDFA.

    When running the process image script, set the --engine CLI argument to lpdnn, --app-config to the path of the aiapp-config.json file, --deployment-package according to the platform and --port to the port on which the http-worker is running.

    Here is the example of running script:

    python -m bonseyes_3ddfa_v2.process.image \
       --engine lpdnn \
       --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
       --deployment-package x86_64-ubuntu20_cuda \
       --port 8889 \
       --jpg-output /app/test.jpg \
       --json-output /app/test.json \
       --jpg-input /app/test.jpg
  2. The processed jpg image is saved in the file or directory defined with the jpg-output CLI argument. If jpg-output is a path to a directory, the processed image keeps the name of the input image with the processed_ prefix added. For example, if jpg-output is a directory and the input image is traffic.jpg, the processed image is saved as processed_traffic.jpg in that directory.

  3. The json file with image predictions is saved in the file or directory defined with the json-output CLI argument. If json-output is a path to a directory, the json file takes the name of the processed image. For example, if json-output is a directory and the input image is traffic.jpg, the json file is saved as processed_traffic.json in that directory.
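
Since the LPDNN engine needs arguments that the other engines do not, a script can check them before starting. The sketch below only illustrates that idea; the function name is hypothetical and the list of required options simply mirrors the arguments named above (--app-config, --deployment-package and --port).

```python
# Options that the text lists as required when --engine is lpdnn.
LPDNN_REQUIRED = ("app_config", "deployment_package", "port")


def missing_lpdnn_arguments(engine, **options):
    """Return the CLI flags that are still missing for an LPDNN run."""
    if engine != "lpdnn":
        return []  # other engines do not use these options
    return [
        "--" + name.replace("_", "-")
        for name in LPDNN_REQUIRED
        if options.get(name) is None
    ]


# Example: the deployment package was forgotten.
print(missing_lpdnn_arguments("lpdnn", app_config="ai_app_config.json", port=8889))
```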


Image has to be in .jpg format.


If selected docker image does not have CUDA support, replace --device gpu with --device cpu

Video Process

The video-based process takes a video and AI App config json file as inputs and infers the LPDNN engine on it.

  1. The video processing script in /<bonseyes_aiasset_name>/process/ is used to process an input video. This script loads a video file, which needs to be in .mp4 format, executes LPDNN using the HTTP Worker (instantiates the LPDNNAlgorithm class) to process and render the video and finally writes the results to json and csv files.

    In this link you can find example of process video script in Bonseyes 3DDFA.

    When running the process video script, set the --engine CLI argument to lpdnn, --app-config to the path of the aiapp-config.json file, --deployment-package according to the platform and --port to the port on which the http-worker is running.

    Here is the example of running script:

    python -m \
        --engine lpdnn \
        --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
        --deployment-package x86_64-ubuntu20_cuda \
        --port 8889 \
        --video-input /app/demo_video_1.mp4 \
        --video-output /app/prediction.mp4 \
        --json-output /app/prediction.json \
        --csv-output /app/prediction.csv
  2. The processed .mp4 video is saved in the file or directory defined with the video-output CLI argument. If video-output is a path to a directory, the processed video keeps the name of the input video with the processed_ prefix added. For example, if video-output is a directory and the input video is test.mp4, the processed video is saved as processed_test.mp4 in that directory.

  3. csv and json files with video predictions will be saved in files or directories, which are defined with csv-output and json-output CLI arguments. If json-output and csv-output are paths to directories, the name of the json and csv files is the same as the name of processed video. For example if input video is test.mp4, json file will be saved as processed_test.json in specified directory with json-output CLI argument and csv file will be saved as processed_test.csv in specified directory with csv-output CLI argument.


Video has to be in .mp4 format.


If selected docker image does not have CUDA support, replace --device gpu with --device cpu

Camera Process

Camera-based processing records from your camera and infers the LPDNN engine on the frames during recording. When running the /<bonseyes_aiasset_name>/process/ script, a window with the camera recording is opened with LPDNNAlgorithm predictions rendered in it. After recording is stopped (by pressing q), the rendered video is saved to an .mp4 file and the output results are saved to .csv and .json files.

  1. The camera processing script in /<bonseyes_aiasset_name>/process/ is used to process camera recordings. This script opens a window with the camera recording, executes LPDNN using the HTTP Worker (instantiates the LPDNNAlgorithm class) to process and render the frames during recording and finally writes the results to json and csv files.

    In this link you can find example of process camera script in Bonseyes 3DDFA.

    When running the process camera script, set the --engine CLI argument to lpdnn, --app-config to the path of the aiapp-config.json file, --deployment-package according to the platform and --port to the port on which the http-worker is running.

    Here is the example of running script:

    python -m \
       --engine lpdnn \
       --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
       --deployment-package x86_64-ubuntu20_cuda \
       --port 8889 \
       --video-output /app/prediction.mp4 \
       --json-output /app/prediction.json \
       --csv-output /app/prediction.csv
  2. Processed .mp4 video is saved in file defined with video-output CLI argument.

  3. csv and json files with video predictions will be saved in files, which are defined with csv-output and json-output CLI arguments.


Potential process issues and fixes:
  • If you cannot run the demo camera in your container, make sure that you have added --device /dev/video0 to the docker run command to access the camera.

  • Another possible cause is missing permission on the camera device under /dev/. To fix it, run sudo chown -R user:user /dev/video0 in the container, which enables access to the camera.

  • If you have a problem sending a request to the server, check the docker container port and make sure that you use the same port in the curl request.

  • Make sure that the input image used in /<bonseyes_aiasset_name>/process/ and the image sent to the server are in .jpg format and that the video used in /<bonseyes_aiasset_name>/process/ is in .mp4 format.

  • When running as root user in the container, there can be problems with the permissions of mounted files, directories and devices. This caused problems when running the camera on a Jetson Nano board (the connection to the X server failed, so the app could not display the window with rendered output). If you get the error

    No protocol specified
    Unable to init server: Could not connect: Connection refused

    then execute xhost local:root on the board, outside the docker container.
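
The first two camera checks can be automated with a few lines of Python run inside the container. This is a small diagnostic sketch, not part of the Bonseyes tooling; the function name and the wording of the messages are hypothetical.

```python
import os


def camera_access_problem(device="/dev/video0"):
    """Return a description of the access problem, or None if none is found."""
    if not os.path.exists(device):
        # The device was not passed to the container (or no camera is attached).
        return f"{device} not found: add --device {device} to docker run"
    if not os.access(device, os.R_OK | os.W_OK):
        # The device exists but the current user may not read or write it.
        return f"{device} is not accessible for the current user"
    return None


if __name__ == "__main__":
    print(camera_access_problem() or "camera device looks accessible")
```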

VIII. Benchmark

Bonseyes AI Assets provide benchmark tools for evaluating the Pytorch, ONNX, TensorRT (and potentially torch2trt) inference engines on multiple input sizes. The benchmark tool runs /<bonseyes_aiasset_name>/algorithm/ for the specified inference engine on the evaluation dataset and then calculates statistics. Accuracy results, together with preprocessing, inference and postprocessing time, latency, model statistics and hardware statistics (CPU and GPU memory and temperature, power consumption and energy efficiency), are stored in csv and json files.

Hardware statistics are calculated using /<bonseyes_aiasset_name>/utils/, which can also be found in AI Asset Container Generator.

A tool is also provided for generating graphs from the benchmark results (csv file); the graphs are stored as a .jpg image.

The Bonseyes AI Asset benchmark tools can be found in AI Asset Container Generator.

Examples of benchmark code in the Bonseyes Openpifpaf Wholebody AI Asset can be found in this link.

Models benchmark

  1. Implement the Bonseyes benchmark script in /<bonseyes_aiasset_name>/benchmark/, which instantiates the Algorithm class for a given inference engine and runs the benchmark function. The benchmark function runs model evaluation on the validation dataset and computes hardware and accuracy statistics, which are stored in the results dictionary returned by the function. Implement the benchmark function in this script as follows:

    • Apply Algorithm process on every image from dataloader (load images from validation dataset)

    • Calculate average preprocessing, inference, postprocessing and processing time

    • Calculate accuracy statistics (for example AP, APM, APL, AR, ARM, ARL) and evaluation time and add them to accuracy_stats dictionary

    • Calculate model statistics from /<bonseyes_aiasset_name>/benchmark/ and add GFLOPs and #PARAMS to result dictionary

    • Calculate hardware statistics from HardwareStatusMeter class from /<bonseyes_aiasset_name>/utils/

    • Store all calculations in result, hw_stats and accuracy_stats dictionaries

    • Merge result, hw_stats and accuracy_stats dictionaries in one dictionary and return it as function output

    In this link you can find example of /<bonseyes_aiasset_name>/benchmark/ in Bonseyes Openpifpaf Wholebody AI Asset.

    Here is the example of running /<bonseyes_aiasset_name>/benchmark/ script in Bonseyes Openpifpaf Wholebody AI Asset:

    python -m bonseyes_openpifpaf_wholebody.benchmark \
      --model /app/bonseyes_openpifpaf_wholebody/models/pytorch/shufflenetv2k30/v3.0_shufflenetv2k30_default_641x641_fp32.pkl \
      --engine pytorch \
      --input-size 641x641 \
      --preprocess-with torchvision \
      --force-complete-pose \
      --seed-threshold 0.2

    Benchmark results (result.csv and result.json) should be saved in directory specified with result-directory CLI argument.

  2. Use /<bonseyes_aiasset_name>/benchmark/ to benchmark the Pytorch, ONNX and TensorRT (and potentially torch2trt) inference engines, or all engines, with different precisions (fp32, fp16 and int8). The benchmark results are stored in result.json and result.csv files. Specify the possible backbone names as options of the backbone CLI argument and the dataset path in main() of this script.

    In this link you can find example of /<bonseyes_aiasset_name>/benchmark/ in Bonseyes Openpifpaf Wholebody AI Asset.

    Here is the example of running /<bonseyes_aiasset_name>/benchmark/ script in Bonseyes Openpifpaf Wholebody AI Asset:

    python -m bonseyes_openpifpaf_wholebody.benchmark.all \
        --input-sizes 28x72 256x192 512x512 \
        --device gpu \
        --backbone shufflenetv2k30 \
        --dataset wholebody
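
The structure of the benchmark function described in step 1 can be sketched as follows. This is a simplified outline, not the Bonseyes implementation: process, evaluate and read_hw_stats are hypothetical callables standing in for the Algorithm class, the accuracy computation and HardwareStatusMeter, and the key names of the timing dictionary are assumptions. Model statistics (GFLOPs, #PARAMS) are left out for brevity.

```python
from statistics import mean

STAGES = ("preprocessing_time", "inference_time", "postprocessing_time")


def benchmark(process, images, evaluate, read_hw_stats):
    """Run process on every image and merge timing, hardware and accuracy stats.

    process(image) is assumed to return (prediction, times), where times maps
    each name in STAGES to a duration in seconds. images must not be empty.
    """
    timings = {stage: [] for stage in STAGES}
    predictions = []
    for image in images:
        prediction, times = process(image)
        predictions.append(prediction)
        for stage in STAGES:
            timings[stage].append(times[stage])

    # Average time per stage; processing time is the sum of the three stages.
    result = {stage: mean(values) for stage, values in timings.items()}
    result["processing_time"] = sum(result[stage] for stage in STAGES)

    accuracy_stats = evaluate(predictions)  # e.g. {"AP": ..., "AR": ...}
    hw_stats = read_hw_stats()              # e.g. memory, power, temperature

    # Merge the three dictionaries into the single dictionary that is returned.
    return {**result, **hw_stats, **accuracy_stats}
```

Passing the collaborators as callables keeps the timing and merging logic independent of any specific inference engine.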

LPDNN benchmark

The benchmark tool runs the LPDNNAlgorithm class with the specified LPDNN inference engine on the evaluation dataset and then calculates statistics.

In this link you can find example of running the benchmark with LPDNN engine in 3DDFA:

python -m bonseyes_3ddfa_v2.benchmark.evaluate \
   --dataset aflw2000-3d \
   --engine lpdnn \
   --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
   --deployment-package x86_64-ubuntu20_cuda \
   --port 8889 \
   --input-size 120 \
   --model build/3dface-landmarks-v1.0-mobilenetv1-120x120/model.onnx

In this link you can find example of running benchmark.all script in 3DDFA:

python -m bonseyes_3ddfa_v2.benchmark.all \
   --dataset aflw2000-3d \
   --engine all \
   --input-sizes 120 \
   --app-config build/3dface-landmarks-v1.0-mobilenetv1-120x120/ai_app_config.json \
   --deployment-package x86_64-ubuntu20_cuda \
   --port 8889

Generate graphs

Use the Bonseyes AI Asset plot tool in /<bonseyes_aiasset_name>/benchmark/ to generate graphs from the benchmark csv results. This tool takes the csv file specified with the csv-path CLI argument as input and generates storage, accuracy, performance and resource consumption graphs. The result is saved as graph.jpg in the directory specified with the output-path CLI argument.

In this link you can find /<bonseyes_aiasset_name>/benchmark/ in AI Asset Container Generator.

You can also find the example of /<bonseyes_aiasset_name>/benchmark/ script in Bonseyes Openpifpaf Wholebody AI Asset.

Here is the example of running /<bonseyes_aiasset_name>/benchmark/ script in Bonseyes Openpifpaf Wholebody AI Asset:

python -m bonseyes_openpifpaf_wholebody.benchmark.generate_graphs \
    --csv-path /app/result.csv \
    --output-path /app/


Potential benchmark issues and fixes:
  • If you are running the benchmark in a container on a Jetson device and the hardware statistics are all 0, make sure that /run/jtop.sock is mounted when running your container. That is, add -v /run/jtop.sock:/run/jtop.sock to the docker run command for the container with the jetpack image.

IX. Utils

Bonseyes AI Assets provide tools for calculating hardware and environment information (GPU and CPU memory, power, model storage, GPU and CPU temperature, environment, code version, git branch and git commit hash).

<bonseyes_aiasset_name>/utils package contains following scripts:

  1. environment_info

  2. gstreamer_pipelines

  3. hardware_info

  4. meter

All utils scripts mentioned above can be found in AI Asset Container Generator.

In this link can be found utils scripts in Bonseyes Openpifpaf Wholebody AI Asset.


This script contains the EnvironmentInformation class, which collects the following information:

  1. system libraries information - cmake, gcc, cuda and python versions

  2. python libraries information - gets python package versions (for example onnx, onnxruntime, numpy, scipy, cython, pandas, torch, torchvision, numba)

  3. code version - gets git branch and commit hash
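
A minimal sketch of how this kind of information can be gathered with the Python standard library is shown below. It is not the EnvironmentInformation class itself: the function names and the default package list are hypothetical, importlib.metadata requires Python 3.8 or newer, and the git calls assume that the code runs inside a git checkout (otherwise None is returned).

```python
import platform
import subprocess
from importlib import metadata


def package_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def git_output(*args):
    """Run a git command and return its stripped output, or None on failure."""
    try:
        completed = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return completed.stdout.strip()


def collect_environment(packages=("onnx", "numpy", "torch")):
    """Collect Python, package and code version information."""
    return {
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "git_branch": git_output("rev-parse", "--abbrev-ref", "HEAD"),
        "git_commit": git_output("rev-parse", "HEAD"),
    }
```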

This script is imported in the <bonseyes_aiasset_name>/benchmark/ script, and the collected information is written at the top of the graph jpg file.

Here is the link of the <bonseyes_aiasset_name>/utils/ script in Bonseyes Openpifpaf Wholebody AI Asset.


The <bonseyes_aiasset_name>/utils/ script contains commands for capturing and sinking video frames on x86_64, NVIDIA Jetson devices and RaspberryPi.

It is imported and used in <bonseyes_aiasset_name>/process/ and <bonseyes_aiasset_name>/process/ scripts.

Here is the example of this script in Bonseyes Openpifpaf Wholebody AI Asset.


<bonseyes_aiasset_name>/utils/ contains HardwareStatusMeter class, which is used to calculate hardware statistics on GPU and CPU during execution on x86_64, Jetson devices or RaspberryPi4. This class detects environment and collects GPU and CPU memory, power, model storage, GPU and CPU temperature.

This script is used in <bonseyes_aiasset_name>/utils/ and is also used in <bonseyes_aiasset_name>/benchmark/ script which uses hardware information and stores it to csv file.

Here is the link of the <bonseyes_aiasset_name>/utils/ script in Bonseyes Openpifpaf Wholebody AI Asset.


<bonseyes_aiasset_name>/utils/ contains the HardwareInformation class, which initializes the HardwareStatusMeter class from script and collects the following information:

  1. GPU information - GPU model name, GPU number, driver version and CUDA version

  2. CPU information - CPU architecture, model name, vendor and CPU number

  3. memory information
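
For illustration, part of this information can be queried as sketched below. This is not the HardwareInformation class: the function names are hypothetical, and the GPU query relies on the nvidia-smi command line tool, which is assumed to be present (it is typically not available on Jetson boards, where an empty list is returned).

```python
import os
import platform
import subprocess


def cpu_information():
    """CPU architecture, model name (may be empty) and number of CPUs."""
    return {
        "architecture": platform.machine(),
        "model_name": platform.processor(),
        "cpu_count": os.cpu_count(),
    }


def parse_gpu_query(output):
    """Parse 'name, driver_version' lines printed by nvidia-smi."""
    gpus = []
    for line in output.strip().splitlines():
        name, driver = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "driver_version": driver.strip()})
    return gpus


def gpu_information():
    """Query GPU name and driver version, or return [] if nvidia-smi fails."""
    command = [
        "nvidia-smi",
        "--query-gpu=name,driver_version",
        "--format=csv,noheader",
    ]
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_query(completed.stdout)
```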

This script is used in <bonseyes_aiasset_name>/benchmark/, where the collected information is written at the beginning of the graph.

Here is the link of the <bonseyes_aiasset_name>/utils/ script in Bonseyes Openpifpaf Wholebody AI Asset.

X. Testing

Implement automatic tests for all interfaces in /interface/tests/ scripts. Interface scripts are used to call <bonseyes_aiasset_name> modules and they are executed during testing and running AI Asset CLI.

  1. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.export.all with specified CLI arguments

  2. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.optimize.post_training_quantization.all with specified CLI arguments

  3. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.process.image with specified CLI arguments

  4. Implement and use /interface/ script, which executes <bonseyes_aiasset_name> with specified CLI arguments

  5. Implement and use /interface/ script, which executes <bonseyes_aiasset_name> with specified CLI arguments

  6. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.process.server with specified CLI arguments

  7. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.benchmark.all with specified CLI arguments

  8. Implement and use /interface/ script, which executes <bonseyes_aiasset_name>.train with specified CLI arguments and training and validation dataset

  9. Add test image in /interface/tests/samples/image/ directory, which will be used while executing process image in test script

  10. Add test video in /interface/tests/samples/video/ directory, which will be used while executing process video in test script


Note that /interface/, /interface/ and /interface/ are not executed in tests. They are only executed using AI Asset CLI.

The /interface/tests/ scripts execute the interface scripts (export, optimize, process image, process video and benchmark) on GPU or CPU, with different engines and different precisions. Tests on CPU can only be executed with Pytorch and ONNX models, while tests on GPU can be executed with Pytorch, ONNX and TensorRT models.

  1. Implement CPU test cases in /interface/tests/

  2. Implement GPU test cases in /interface/tests/

You can run tests on GPU in your container by executing pytest -k gpu command in container. Tests on CPU in container can be run by executing pytest -k cpu command in container.
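
The -k option of pytest selects tests whose names match the given expression, so naming each test case with a cpu or gpu suffix is enough to make the two commands above work. The sketch below shows this naming pattern only; the package name bonseyes_aiasset_name, the CLI arguments and the helper functions are placeholders, and the early return stands in for what would usually be a pytest.skip call in a real test file.

```python
import importlib.util
import subprocess
import sys

ASSET = "bonseyes_aiasset_name"  # placeholder for your AI Asset package


def interface_command(module, *cli_args):
    """Build the `python -m <module> ...` command used to call a module."""
    return [sys.executable, "-m", module, *cli_args]


def run_interface(module, *cli_args):
    """Run the module in a subprocess and return its exit code."""
    return subprocess.run(interface_command(module, *cli_args)).returncode


def asset_available():
    """True if the AI Asset package can be imported (i.e. inside the container)."""
    return importlib.util.find_spec(ASSET) is not None


def test_process_image_cpu():  # selected by: pytest -k cpu
    if not asset_available():
        return  # nothing to run outside the AI Asset container
    assert run_interface(f"{ASSET}.process.image", "--device", "cpu") == 0


def test_process_image_gpu():  # selected by: pytest -k gpu
    if not asset_available():
        return
    assert run_interface(f"{ASSET}.process.image", "--device", "gpu") == 0
```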

Uncomment test stage in .gitlab-ci.yml file when all /interface/ and /interface/tests/ scripts are implemented


Potential test issues and fixes:
  • TensorRT requires CUDA, so you can’t run TensorRT export, optimize and benchmark on TensorRT models on CPU (those commands mustn’t be added in /interface/tests/ script)

XI. AI Asset CLI Integration

AI Asset CLI runs /interface/ scripts explained in Testing section. Interface scripts, which are used in CLI and not used in test scripts, are /interface/, /interface/ and /interface/

XII. Documentation

Use Bonseyes documentation template stored in /doc to explain all implemented components.

Store demo image, video and benchmark results in following directories:

  1. Store demo images and processed demo images in /doc/examples/example_images/ directory

  2. Store demo video and processed demo video in /doc/examples/example_videos/ directory

  3. Store benchmark.csv, benchmark.json and graph.jpg in:

    • /doc/eval_results/Server/ directory for server benchmark results

    • /doc/eval_results/NVIDIA-Jetson-AGX directory for Jetson Xavier AGX benchmark results

    • /doc/eval_results/NVIDIA-Jetson-NX directory for Jetson Xavier NX benchmark results

Implement following .rst scripts:

  1. Implement and use /doc/paper.rst to add Official repository’s paper reference, abstract and links to official git repository, git branch and commit used in source of AI Asset

  2. Implement and use /doc/usage.rst and add following sections:

    • Installation - add docker pull and docker run commands for all platforms

    • Data - add paths to data or execution commands for automatic data download

    • Export - add export.all execution on CPU and GPU

    • Optimize - add post_training_quantization.all execution on CPU and GPU and suggested input sizes

    • Process - add process.image execution, add processed image and predictions from json file. For video add execution command and add processed video gif from /doc/examples/example_videos/ directory. For camera add execution command

    • Benchmark - add single model benchmark and benchmark all execution command and copy json file with results for single model benchmark.

  3. Implement and use /doc/install.rst and add Bonseyes AI Asset installation for target device. In Workstation/Server (x86_64), NVIDIA Jetson devices, RaspberryPi4 devices sections add:

    • System Requirements

    • Docker section - profiles for certain devices, docker build and docker run container commands and AI Asset setup in Dockerfile

  4. Implement and use /doc/models.rst and add paths to pretrained Pytorch models and their model summaries on multiple input sizes

  5. Implement and use /doc/train.rst and add section for train and validation dataset and section for executing training with CLI

  6. Implement and use /doc/optimize.rst and add section for validation data and sections for executing ONNX, TensorRT quantization and quantization of all inference engines

  7. Implement and use doc/export.rst and add sections for executing /<bonseyes_aiasset_name>/export/, /<bonseyes_aiasset_name>/export/ and /<bonseyes_aiasset_name>/export/ scripts

  8. Implement and use doc/process.rst and add sections for executing /<bonseyes_aiasset_name>/process/, /<bonseyes_aiasset_name>/process/, /<bonseyes_aiasset_name>/process/ and /<bonseyes_aiasset_name>/process/ scripts

  9. Implement and use doc/eval.rst and add sections:

    • Reproduce Published Results - add path where data is stored or how to execute automatic data download script and add execution command for running eval script of Official code if it exists. Copy evaluation result in this section

    • Single Model Benchmark - executing <bonseyes_aiasset_name>.benchmark and copy evaluation result in this section

    • Benchmark of All Models - executing <bonseyes_aiasset_name>.benchmark.all

    • Sample Processed Images - add sample demo images and processed demo images

  10. Implement and use doc/benchmark.rst and upload benchmark.csv and graph.jpg files from examples folder for x86_64, NVIDIA Jetson Xavier AGX, NVIDIA Jetson Xavier NX and RaspberryPi4

You can update the documentation by running:

cd doc
rm -rf _build && make html

To view rendered HTML docs open /doc/_build/html/index.html


Potential documentation issues and fixes:
  • If the build fails with a No module named error for some library, add the library name to the autodoc_mock_imports list in /doc/
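
autodoc_mock_imports is a standard option of the Sphinx autodoc extension, set in the Sphinx configuration file. A fragment could look like the following; the module names are example entries only and should be replaced with the ones reported in your own error messages.

```python
# Sphinx configuration fragment: modules listed here are replaced by mocks
# while autodoc imports the package, so they need not be installed on the
# machine that builds the documentation.
autodoc_mock_imports = [
    "torch",     # example entry (assumption)
    "tensorrt",  # example entry (assumption)
]
```

After editing the list, rebuild the documentation with the make html command shown above.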