Description
-----------

An AI Asset enables accelerated deployment of deep learning models to resource-constrained, low-power embedded systems (Deep Edge). The provided workflows deliver powerful and easy-to-deploy building blocks for creating complex AI models that can be deployed on cyber-physical systems. By taking care of many of the end-to-end tooling dependencies and providing standardized interfaces, Bonseyes AI Assets enable users to focus on producing optimal solutions while allowing faster feedback during the implementation of end-user requirements. The goal is to facilitate easier deployment to the Deep Edge with the Bonseyes AI Marketplace.

Requirements
------------

Hardware requirements
=====================

To utilize the full potential of AI Assets - especially for training - an NVIDIA graphics card (``GTX1060`` or newer) with CUDA support is required on x86_64 environments. Nonetheless, AI Assets can also be run using Intel/AMD CPUs.

We also provide support for NVIDIA Jetson devices as well as for platforms with arm64v8 architectures. The support of these devices allows the user to evaluate any given AI Asset on them and obtain embedded-oriented benchmarks for a faster design process.

Software requirements
=====================

The following requirements need to be installed on the platform where the AI Asset will run:

Docker
~~~~~~

To install Docker, follow the instructions `here `__. By default, Docker is not accessible to normal users. To allow the current user to access Docker, run the following commands:

.. code-block:: bash

   sudo groupadd docker
   sudo usermod -aG docker $USER
   newgrp docker

Verify that you can run Docker commands without sudo:

.. code-block:: bash

   docker run hello-world

Git and Git LFS
~~~~~~~~~~~~~~~

Install git and git LFS by executing:

.. code-block:: bash

   sudo apt-get install git git-lfs

Git LFS is not active by default. To make sure git LFS is active, run the following command:

.. code-block:: bash

   git lfs install

The command will print some errors that can be safely ignored.

Due to a bug in Ubuntu 18.04 LTS, the binaries installed by pip are not available by default. To make sure that they are available, run the following command:

.. code-block:: bash

   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Unfortunately, due to some limitations in Ubuntu 18.04 LTS, it is not possible to ensure that the new user groups are taken into account after a simple logout/login. To complete the setup, it is necessary to restart the machine so that the changes to PATH and groups take effect.

.. _python_packages:

Packages
~~~~~~~~

The remaining packages can be installed by executing the following command:

.. code-block:: bash

   sudo apt-get install python3 python3-pip python3-wheel python3-setuptools

To be able to benchmark your AI Asset on your HW platform, you need to install the following packages, depending on the target platform:

**x86 (CPU)**

.. code-block:: bash

   pip3 install psutil

**x86 (CUDA)**

.. code-block:: bash

   pip3 install psutil nvidia-smi nvidia-ml-py3

**arm64v8**

.. code-block:: bash

   pip3 install psutil

**Nvidia Jetsons**

.. code-block:: bash

   pip3 install psutil jetson-stats

NVIDIA Drivers
~~~~~~~~~~~~~~

If you are working on an x86 platform with an NVIDIA GPU, ensure that you have installed the appropriate NVIDIA drivers. On Ubuntu, the easiest way of ensuring that you have the right driver version is to install a version of CUDA at least as new as the image you intend to use, via the official NVIDIA CUDA download page. For example, if you intend to use CUDA 10.2, ensure that you have the matching graphics drivers, as described `here `_.

The following command can be used to verify your setup on x86 platforms:

.. code-block:: bash

   docker run --gpus all nvidia/cuda:10.2-base nvidia-smi

If you are using an NVIDIA Jetson device, it is sufficient to set up the device following the DPE workflow described in `DPE `_.

Nvidia docker
~~~~~~~~~~~~~

You will also need to install the NVIDIA Container Toolkit to enable GPU device access within Docker containers. Installation instructions can be found `here `_.
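As a rough sketch for Ubuntu 18.04, the installation typically adds NVIDIA's apt repository and installs the toolkit package; the repository URLs and package names below follow NVIDIA's public installation guide and may change over time, so verify them against the instructions linked above:

.. code-block:: bash

   # Add NVIDIA's container runtime apt repository (URLs per NVIDIA's guide; verify against current docs)
   distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
       sudo tee /etc/apt/sources.list.d/nvidia-docker.list

   # Install the toolkit and restart the Docker daemon so it picks up the NVIDIA runtime
   sudo apt-get update
   sudo apt-get install -y nvidia-container-toolkit
   sudo systemctl restart docker

Afterwards, the ``docker run --gpus all nvidia/cuda:10.2-base nvidia-smi`` check from the previous section should succeed from inside a container.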
Setup
-----

AI Asset CLI
============

The AI Asset CLI is a command line interface that allows end users to interact with AI Assets, providing functionality for a variety of tasks such as export, processing (video, image, camera), evaluation, etc.

Install the Bonseyes AI Asset CLI on the intended device from the remote repository:

.. code-block:: bash

   pip3 install git+https://gitlab.com/bonseyes-opensource/aiassets_cli.git

Add the user path to the system PATH:

.. code-block:: bash

   export PATH=$PATH:/home/${USER}/.local/bin

For detailed AI Asset CLI usage, please refer to the official `documentation `_.

Board setup
===========

For board setup, please first follow the DPE workflow explained in `DPE `_.

Usage
-----

Currently available AI Assets:

- 3D Face Landmark detection (68 keypoints)

  - Backbones: mobilenetv1, mobilenetv0.5
  - Input-sizes: 120x120
  - Datasets: aflw, aflw2000-3d
  - Access token

    - Username: ``gitlab+deploy-token-483452``
    - Password: ``zsEqp4321jiCzWS-TUaG``

- Whole Body Pose estimation (133 keypoints)

  - Backbones: resnet22, shufflenetv2k30, shufflenetv2k16
  - Input-sizes: 128x96, 128x128, 256x256, 384x216, 512x384
  - Datasets: wholebody
  - Access token

    - Username: ``gitlab+deploy-token-557315``
    - Password: ``AskgZQwcDRRYv3Da7BNB``

Currently available platforms and environments:

- x86_64 machines

  - cpu
  - cuda10.2_tensorrt7.0
  - cuda11.2_tensorrt7.2_rtx3070
  - cuda11.4_tensorrt8.0

- Nvidia Jetson Devices

  - jetpack4.4
  - jetpack4.6

- Arm CPUs

  - arm64v8

If you face issues during the workflow, you can export the DEBUG flag in your terminal to obtain more information about the issue:

.. code-block:: bash

   export DEBUG=True

Installation
============

Download and initialize the specified demo AI Asset locally:

.. code-block:: bash

   bonseyes_aiassets_cli init --task {3dface_landmarks, whole_body_pose}
                              --platform {x86_64, jetson, rpi}
                              --environment {cpu, cuda10.2_tensorrt7.0, cuda11.2_tensorrt7.2_rtx3070, cuda11.4_tensorrt8.0, jetpack4.4, jetpack4.6, arm64v8}
                              --version {v1.0, v2.0, ...}
                              --user gitlab+deploy-token-USERNAME
                              --password PASSWORD
                              [--camera-id CAMERA_ID]

Check supported options by running:

.. code-block:: bash

   bonseyes_aiassets_cli init --help

Example:

.. code-block:: bash

   bonseyes_aiassets_cli init \
       --task whole_body_pose \
       --platform x86_64 \
       --environment cuda10.2_tensorrt7.0 \
       --version v1.0 \
       --camera-id 0 \
       --user <USERNAME> \
       --password <PASSWORD>

Check if the container is running and on which port by executing:

.. code-block:: bash

   docker ps
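If several containers are running, the output can be narrowed down to the AI Asset container. The example below assumes the container is named after the task (``whole_body_pose``, matching the init example above):

.. code-block:: bash

   # Show only the AI Asset container and the host port it exposes
   docker ps --filter "name=whole_body_pose" --format "table {{.Names}}\t{{.Ports}}"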
If you want to stop a running AI Asset:

.. code-block:: bash

   docker kill <CONTAINER_NAME>

Switch between AI Assets
========================

Use a specific AI Asset locally:

.. code-block:: bash

   bonseyes_aiassets_cli use --task {3dface_landmarks, whole_body_pose}

Check supported tasks by running:

.. code-block:: bash

   bonseyes_aiassets_cli use --help

Train
=====

Train the network and produce a model based on the available configuration files:

.. code-block:: bash

   bonseyes_aiassets_cli train start --config <CONFIG_NAME>

Check for available configs by running:

.. code-block:: bash

   bonseyes_aiassets_cli train start --help

Example:

.. code-block:: bash

   bonseyes_aiassets_cli train start --config v1.0_shufflenetv2k30_default_641x641_fp32_config

Check the training status by running:

.. code-block:: bash

   bonseyes_aiassets_cli train status

Stop the training process by running:

.. code-block:: bash

   bonseyes_aiassets_cli train stop

Export
======

Export pretrained models from ``PyTorch`` format to ``ONNX`` and/or ``TensorRT`` format(s):

.. code-block:: bash

   usage: bonseyes_aiassets_cli export [-h]
                                       --export-input-sizes EXPORT_INPUT_SIZES [EXPORT_INPUT_SIZES ...]
                                       --engine {all, onnxruntime, tensorrt}
                                       --precisions {fp32, fp16}
                                       --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                       [--workspace-unit {MB, GB}]
                                       [--workspace-size WORKSPACE_SIZE]
                                       [--enable-dla]

**Note:** When exporting models to ``TensorRT`` format on devices with a lower RAM size (<4GB), it is recommended to specify a lower workspace size in MBs.

Example:

.. code-block:: bash

   bonseyes_aiassets_cli export \
       --export-input-sizes 120x120 320x320 \
       --engine all \
       --backbone shufflenetv2k30 \
       --precisions fp32 fp16
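On memory-constrained devices (e.g. a Jetson with less than 4 GB of RAM), the ``--workspace-unit`` and ``--workspace-size`` options from the usage above can be used to cap the TensorRT builder workspace. A sketch with illustrative values:

.. code-block:: bash

   # Limit the TensorRT workspace to 512 MB on a low-RAM device (values are illustrative)
   bonseyes_aiassets_cli export \
       --export-input-sizes 120x120 \
       --engine tensorrt \
       --backbone mobilenetv1 \
       --precisions fp16 \
       --workspace-unit MB \
       --workspace-size 512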
Optimize
========

Optimize exported models by performing PTQ (post-training quantization):

.. code-block:: bash

   usage: bonseyes_aiassets_cli optimize [-h]
                                         --optimize-input-sizes OPTIMIZE_INPUT_SIZES [OPTIMIZE_INPUT_SIZES ...]
                                         --engine {all, onnxruntime, tensorrt}
                                         --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                         [--workspace-unit {MB, GB}]
                                         [--workspace-size WORKSPACE_SIZE]
                                         [--enable-dla]

**Note:** When optimizing models for ``TensorRT`` format on devices with a lower RAM size (<4GB), it is recommended to specify a lower workspace size in MBs.

Example:

.. code-block:: bash

   bonseyes_aiassets_cli optimize \
       --optimize-input-sizes 120x120 320x320 \
       --engine tensorrt \
       --backbone shufflenetv2k30

Process
=======

Note: if you are using a VM in VirtualBox, you can share a camera (or a USB device) by selecting "Devices" > "Webcams" (or USB) and ticking the device you want to share with the VM.

Image
~~~~~

**Currently the only supported format is .jpg**

.. code-block:: bash

   bonseyes_aiassets_cli demo image [--input-size INPUT_SIZE]
                                    [--engine {pytorch, onnxruntime, tensorrt}]
                                    [--precision {fp32, fp16, int8}]
                                    [--device {gpu, cpu}]
                                    [--cpu-num CPU_NUM]
                                    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                    --image-input <IMAGE_PATH>

   3d face landmark specific:
                                    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                    [--thickness THICKNESS]
                                    [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo image \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --image-input <IMAGE_PATH>

   # GPU
   bonseyes_aiassets_cli demo image \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --image-input <IMAGE_PATH>

----

Video
~~~~~

**Currently the only supported format is .mp4**

.. code-block:: bash

   bonseyes_aiassets_cli demo video [--input-size INPUT_SIZE]
                                    [--engine {pytorch, onnxruntime, tensorrt}]
                                    [--precision {fp32, fp16, int8}]
                                    [--device {gpu, cpu}]
                                    [--cpu-num CPU_NUM]
                                    [--color COLOR]
                                    [--rotate {90, -90, 180}]
                                    --video-input <VIDEO_PATH>
                                    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                    [--thickness THICKNESS]
                                    [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo video \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --video-input <VIDEO_PATH>

   # GPU
   bonseyes_aiassets_cli demo video \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --video-input <VIDEO_PATH>

----

Camera
~~~~~~

.. code-block:: bash

   bonseyes_aiassets_cli demo camera [--input-size INPUT_SIZE]
                                     [--engine {pytorch, onnxruntime, tensorrt}]
                                     [--precision {fp32, fp16, int8}]
                                     [--device {gpu, cpu}]
                                     [--cpu-num CPU_NUM]
                                     [--color COLOR]
                                     [--rotate {90, -90, 180}]
                                     --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                     [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                     [--thickness THICKNESS]
                                     [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo camera \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --camera-id 0

   # GPU
   bonseyes_aiassets_cli demo camera \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --camera-id 0

Server
~~~~~~

.. code-block:: bash

   bonseyes_aiassets_cli server start [--input-size INPUT_SIZE]
                                      [--engine {pytorch, onnxruntime, tensorrt}]
                                      [--precision {fp32, fp16, int8}]
                                      [--device {gpu, cpu}]
                                      [--cpu-num CPU_NUM]
                                      --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                      [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                      [--thickness THICKNESS]
                                      [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli server start \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu

   # GPU
   bonseyes_aiassets_cli server start \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu

You can test whether the server is running correctly by calling:

.. code-block:: bash

   curl --request POST --data-binary @/path/to/image.jpg http://localhost:<PORT>/inference

The port depends on the AI Asset you are using. Each time you start the server, the PORT is printed to standard output; you can either save it or check:

.. code-block:: bash

   docker ps

and find out which port the AI Asset container is exposing, e.g.:

.. code-block:: bash

   CONTAINER ID   63bc638d1243
   IMAGE          registry.gitlab.com/bonseyes/assets/bonseyes_openpifpaf_wholebody/x86_64:v1.0_cuda10.2_tensorrt7.0
   COMMAND        "/usr/local/bin/nvid…"
   CREATED        58 minutes ago
   STATUS         Up 58 minutes
   PORTS          0.0.0.0:59838->59838/tcp, :::59838->59838/tcp
   NAMES          whole_body_pose
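As a convenience, the published port can also be extracted from ``docker ps`` directly and reused in the test request. The sketch below assumes the container is named ``whole_body_pose`` and publishes a single TCP port, as in the output above:

.. code-block:: bash

   # Extract the published host port and send a test image to the inference endpoint
   PORT=$(docker ps --filter "name=whole_body_pose" --format '{{.Ports}}' \
          | grep -oE '0\.0\.0\.0:[0-9]+' | head -n 1 | cut -d: -f2)
   curl --request POST --data-binary @/path/to/image.jpg "http://localhost:${PORT}/inference"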
To stop the server, execute:

.. code-block:: bash

   bonseyes_aiassets_cli server stop

Benchmark
=========

Evaluate exported and pretrained models:

.. code-block:: bash

   usage: bonseyes_aiassets_cli benchmark [-h]
                                          --benchmark-input-sizes INPUT_SIZES
                                          --engine {all, pytorch, onnxruntime, tensorrt}
                                          --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                          --device {gpu, cpu}

   3d face landmark specific:
                                          [--datasets {all, aflw, aflw2000-3d}]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 320x320 \
       --device cpu \
       --backbone shufflenetv2k30 \
       --engine pytorch onnxruntime \
       --dataset all

   # GPU
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 320x320 \
       --device gpu \
       --backbone shufflenetv2k30 \
       --engine pytorch onnxruntime tensorrt \
       --dataset all
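As a sketch of how the 3D face landmark task could be benchmarked against a specific dataset: the backbone, input size, and dataset values come from the asset description earlier in this document, and the ``--datasets`` spelling follows the usage block above (the examples above use ``--dataset``, so confirm the exact flag with ``--help``):

.. code-block:: bash

   # 3D face landmark benchmark on a single dataset (CPU)
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 \
       --device cpu \
       --backbone mobilenetv1 \
       --engine onnxruntime \
       --datasets aflw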