Description
-----------
An AI Asset enables accelerated deployment of deep learning models to resource-constrained, low-power embedded systems (the Deep Edge).
The provided workflows deliver powerful, easy-to-deploy building blocks for creating complex AI models that can run on cyber-physical systems.
By taking care of many of the end-to-end tooling dependencies and providing standardized interfaces, Bonseyes AI Assets let users focus on producing optimal solutions while enabling faster feedback during the implementation of end-user requirements.
The goal is to facilitate easier deployment to the Deep Edge with the Bonseyes AI Marketplace.
Requirements
------------
Hardware requirements
=====================
To exploit the full potential of AI Assets (especially for training), an NVIDIA graphics card (``GTX1060`` or newer)
with CUDA support on an x86_64 system is required. Nonetheless, AI Assets can also be run on Intel/AMD CPUs.
We also support NVIDIA Jetson devices as well as platforms with arm64v8 architectures. This allows the user to evaluate any given AI Asset on such devices and obtain embedded-oriented benchmarks for a faster design process.
Software requirements
=====================
The following requirements need to be installed on the platform where the AI Asset will run:
Docker
~~~~~~
To install Docker, follow the instructions `here `__.
By default, Docker is not accessible to normal users. To allow the current user to access Docker, run the following commands:
.. code-block:: bash
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
Verify that you can run docker commands without sudo:
.. code-block:: bash
docker run hello-world
Git and Git LFS
~~~~~~~~~~~~~~~
Install git and git LFS by executing:
.. code-block:: bash
sudo apt-get install git git-lfs
Git LFS is not active by default. To activate it, run the following command:
.. code-block:: bash
git lfs install
The command may print some errors, which can be safely ignored.
Due to a bug in Ubuntu 18.04 LTS, binaries installed by pip are not on the ``PATH`` by default. To make them available, run the following command:
.. code-block:: bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
Unfortunately, due to some limitations in Ubuntu 18.04 LTS, it is not possible to ensure that new user groups are taken into account after a simple logout/login.
To complete the setup, restart the machine so that the changes to ``PATH`` and groups take effect.
.. _python_packages:
Packages
~~~~~~~~
The remaining packages can be installed by executing the following command:
.. code-block:: bash
sudo apt-get install python3 python3-pip python3-wheel python3-setuptools
To benchmark your AI Asset on your hardware, you need to install the following packages, depending on the target platform:
**x86 (CPU)**
.. code-block:: bash
pip3 install psutil
**x86 (CUDA)**
.. code-block:: bash
pip3 install psutil nvidia-smi nvidia-ml-py3
**arm64v8**
.. code-block:: bash
pip3 install psutil
**Nvidia Jetsons**
.. code-block:: bash
pip3 install psutil jetson-stats
NVIDIA Drivers
~~~~~~~~~~~~~~
If you are working on an x86 platform with an NVIDIA GPU, ensure that you have installed the appropriate NVIDIA drivers. On Ubuntu, the easiest way to get the right driver version is to install, via the official NVIDIA CUDA download page, a version of CUDA at least as new as the image you intend to
use. For example, if you intend to use CUDA 10.2 you should ensure
that you have the correct graphics drivers, as described `here `_.
The following command can be used to verify your system for x86 platforms:
.. code-block:: bash
docker run --gpus all nvidia/cuda:10.2-base nvidia-smi
If you are using an NVIDIA Jetson device, it is sufficient to set up the device following the DPE workflow described in `DPE `_.
Nvidia docker
~~~~~~~~~~~~~
You will also need to install the NVIDIA Container Toolkit to enable GPU device access within Docker containers.
Installation instructions can be found `here `_.
Setup
-----
AI Asset CLI
============
The AI Asset CLI is a command line interface that allows end users to interact with AI Assets,
providing functionality for a variety of tasks such as export, processing (video, image, camera), and evaluation.
Install the Bonseyes AI Asset CLI on the intended device from remote:
.. code-block:: bash
pip3 install git+https://gitlab.com/bonseyes-opensource/aiassets_cli.git
Add the pip user path to the system ``PATH``:
.. code-block:: bash
export PATH=$PATH:/home/${USER}/.local/bin
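If the CLI is still not found after this step, you can check whether pip's user-level bin directory is actually on your ``PATH``. A minimal sketch in plain POSIX shell (no Bonseyes-specific assumptions):

```shell
# Check whether pip's user bin directory is on PATH;
# this is where pip3 places the bonseyes_aiassets_cli entry point.
case ":$PATH:" in
  *":$HOME/.local/bin:"*) echo "ok: $HOME/.local/bin is on PATH" ;;
  *) echo "missing: add $HOME/.local/bin to PATH" ;;
esac
```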
For detailed AI Asset CLI usage, please refer to the official `documentation `_.
Board setup
===========
For board setup, please first follow the DPE workflow explained in `DPE `_.
Usage
-----
Currently available AI Assets:

- 3D Face Landmark detection (68 keypoints)

  - Backbones: mobilenetv1, mobilenetv0.5
  - Input sizes: 120x120
  - Datasets: aflw, aflw2000-3d
  - Access token

    - Username: ``gitlab+deploy-token-483452``
    - Password: ``zsEqp4321jiCzWS-TUaG``

- Whole Body Pose estimation (133 keypoints)

  - Backbones: resnet22, shufflenetv2k30, shufflenetv2k16
  - Input sizes: 128x96, 128x128, 256x256, 384x216, 512x384
  - Datasets: wholebody
  - Access token

    - Username: ``gitlab+deploy-token-557315``
    - Password: ``AskgZQwcDRRYv3Da7BNB``
Currently available platforms and environments:

- x86_64 machines

  - cpu
  - cuda10.2_tensorrt7.0
  - cuda11.2_tensorrt7.2_rtx3070
  - cuda11.4_tensorrt8.0

- Nvidia Jetson devices

  - jetpack4.4
  - jetpack4.6

- Arm CPUs

  - arm64v8
If you face issues during the workflow, you can export the ``DEBUG`` flag in your terminal to obtain more information about the issue:
.. code-block:: bash
export DEBUG=True
Installation
============
Download and initialize specified demo AI Asset locally:
.. code-block:: bash
bonseyes_aiassets_cli init
--task {3dface_landmarks, whole_body_pose}
--platform {x86_64, jetson, rpi}
--environment {cpu,cuda10.2_tensorrt7.0,cuda11.2_tensorrt7.2_rtx3070,cuda11.4_tensorrt8.0,jetpack4.4,jetpack4.6,arm64v8}
--version {v1.0, v2.0, ...}
--user gitlab+deploy-token-USERNAME
--password PASSWORD
[--camera-id CAMERA_ID]
Check supported options by running:
.. code-block:: bash
bonseyes_aiassets_cli init --help
Example:
.. code-block:: bash
bonseyes_aiassets_cli init \
--task whole_body_pose \
--platform x86_64 \
--environment cuda10.2_tensorrt7.0 \
--version v1.0 \
--camera-id 0 \
--user \
--password
Check whether the container is running, and on which port, by executing:
.. code-block:: bash
docker ps
If you want to stop a running AI Asset:
.. code-block:: bash
docker kill
Switch between AI Assets
========================
Use specific AI Asset locally:
.. code-block:: bash
bonseyes_aiassets_cli use --task {3dface_landmarks, whole_body_pose}
Check supported tasks by running:
.. code-block:: bash
bonseyes_aiassets_cli use --help
Train
=====
Train network and produce model based on available configuration files:
.. code-block:: bash
bonseyes_aiassets_cli train start --config
Check for available configs by running:
.. code-block:: bash
bonseyes_aiassets_cli train start --help
Example:
.. code-block:: bash
bonseyes_aiassets_cli train start --config v1.0_shufflenetv2k30_default_641x641_fp32_config
Check training status by running:
.. code-block:: bash
bonseyes_aiassets_cli train status
Stop training process by running:
.. code-block:: bash
bonseyes_aiassets_cli train stop
Export
======
Export pretrained models from ``PyTorch`` format to ``ONNX`` and/or ``TensorRT`` format(s):
.. code-block:: bash
usage: bonseyes_aiassets_cli export [-h]
--export-input-sizes EXPORT_INPUT_SIZES [EXPORT_INPUT_SIZES ...]
--engine {all, onnxruntime, tensorrt}
--precisions {fp32, fp16}
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
[--workspace-unit {MB, GB}]
[--workspace-size WORKSPACE_SIZE]
[--enable-dla]
**Note:** When exporting models to ``TensorRT`` format on devices with less than 4 GB of RAM, it is recommended to
specify a lower workspace size in MB.
Example:
.. code-block:: bash
bonseyes_aiassets_cli export \
--export-input-sizes 120x120 320x320 \
--engine all \
--backbone shufflenetv2k30 \
--precisions fp32 fp16
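The ``--export-input-sizes`` values are plain ``WIDTHxHEIGHT`` strings. To illustrate the expected format, here is a small Python sketch (``parse_input_size`` is a hypothetical helper for illustration, not part of the CLI):

```python
def parse_input_size(spec):
    """Parse a WIDTHxHEIGHT spec such as '120x120' into an (int, int) pair."""
    width, height = spec.lower().split("x")
    return (int(width), int(height))

# Each size passed on the command line maps to one exported model resolution.
for spec in ["120x120", "320x320"]:
    print(parse_input_size(spec))
```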
Optimize
========
Optimize exported models by performing PTQ (post-training quantization):
.. code-block:: bash
usage: bonseyes_aiassets_cli optimize [-h]
--optimize-input-sizes OPTIMIZE_INPUT_SIZES [OPTIMIZE_INPUT_SIZES ...]
--engine {all, onnxruntime, tensorrt}
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
[--workspace-unit {MB, GB}]
[--workspace-size WORKSPACE_SIZE]
[--enable-dla]
**Note:** When optimizing models for ``TensorRT`` format on devices with less than 4 GB of RAM, it is recommended to
specify a lower workspace size in MB.
Example:
.. code-block:: bash
bonseyes_aiassets_cli optimize \
--optimize-input-sizes 120x120 320x320 \
--engine tensorrt \
--backbone shufflenetv2k30
Process
=======
Note: if you are using a VM in VirtualBox, you can share a camera (or a USB device) by selecting "Devices" > "Webcams" (or USB) and ticking the device you want to share with the VM.
Image
~~~~~
**Currently, the only supported format is .jpg**
.. code-block:: bash
bonseyes_aiassets_cli demo image
[--input-size INPUT_SIZE]
[--engine {pytorch, onnxruntime, tensorrt}]
[--precision {fp32, fp16, int8}]
[--device {gpu, cpu}]
[--cpu-num CPU_NUM]
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
--image-input
3d face landmark specific:
[--render {2d_sparse, 2d_dense, 3d, pose, axis}]
[--thickness THICKNESS]
[--single-face-track]
Example:
.. code-block:: bash
# CPU
bonseyes_aiassets_cli demo image \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device cpu \
--image-input
# GPU
bonseyes_aiassets_cli demo image \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device gpu \
--image-input
----
Video
~~~~~
**Currently, the only supported format is .mp4**
.. code-block:: bash
bonseyes_aiassets_cli demo video
[--input-size INPUT_SIZE]
[--engine {pytorch, onnxruntime, tensorrt}]
[--precision {fp32, fp16, int8}]
[--device {gpu, cpu}]
[--cpu-num CPU_NUM]
[--color COLOR]
[--rotate {90, -90, 180}]
--video-input
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
3d face landmark specific:
[--render {2d_sparse, 2d_dense, 3d, pose, axis}]
[--thickness THICKNESS]
[--single-face-track]
Example:
.. code-block:: bash
# CPU
bonseyes_aiassets_cli demo video \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device cpu \
--video-input
# GPU
bonseyes_aiassets_cli demo video \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device gpu \
--video-input
----
Camera
~~~~~~
.. code-block:: bash
bonseyes_aiassets_cli demo camera
[--input-size INPUT_SIZE]
[--engine {pytorch, onnxruntime, tensorrt}]
[--precision {fp32, fp16, int8}]
[--device {gpu, cpu}]
[--cpu-num CPU_NUM]
[--color COLOR]
[--rotate {90, -90, 180}]
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
3d face landmark specific:
[--render {2d_sparse, 2d_dense, 3d, pose, axis}]
[--thickness THICKNESS]
[--single-face-track]
Example:
.. code-block:: bash
# CPU
bonseyes_aiassets_cli demo camera \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device cpu \
--camera-id 0
# GPU
bonseyes_aiassets_cli demo camera \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device gpu \
--camera-id 0
Server
~~~~~~
.. code-block:: bash
bonseyes_aiassets_cli server start
[--input-size INPUT_SIZE]
[--engine {pytorch, onnxruntime, tensorrt}]
[--precision {fp32, fp16, int8}]
[--device {gpu, cpu}]
[--cpu-num CPU_NUM]
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
3d face landmark specific:
[--render {2d_sparse, 2d_dense, 3d, pose, axis}]
[--thickness THICKNESS]
[--single-face-track]
Example:
.. code-block:: bash
# CPU
bonseyes_aiassets_cli server start \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device cpu
# GPU
bonseyes_aiassets_cli server start \
--input-size 320x320 \
--engine pytorch \
--precision fp32 \
--backbone shufflenetv2k30 \
--device gpu
You can test whether the server is running correctly by calling:
.. code-block:: bash
curl --request POST --data-binary @/path/to/image.jpg http://localhost:/inference
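The same request can be issued from Python. Below is a minimal client sketch using only the standard library; the ``/inference`` route mirrors the curl call above, while the content type and the names ``build_inference_url``/``run_inference`` are illustrative assumptions you should adapt to your setup:

```python
import urllib.request

def build_inference_url(port, host="localhost"):
    """Build the inference endpoint URL for a given server port."""
    return "http://{}:{}/inference".format(host, port)

def run_inference(image_path, port):
    """POST raw JPEG bytes to the AI Asset server and return the raw response body."""
    with open(image_path, "rb") as f:
        payload = f.read()
    req = urllib.request.Request(
        build_inference_url(port),
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, ``run_inference("/path/to/image.jpg", 59838)`` is the equivalent of the curl call above, with the port printed when the server started.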
The port depends on the AI Asset you are using. Each time you start the server, the port is printed
to standard output; you can either note it down or check
.. code-block:: bash
docker ps
and find out which port the AI Asset container is exposing, e.g.:
.. code-block:: bash
CONTAINER ID 63bc638d1243
IMAGE registry.gitlab.com/bonseyes/assets/bonseyes_openpifpaf_wholebody/x86_64:v1.0_cuda10.2_tensorrt7.0
COMMAND "/usr/local/bin/nvid…"
CREATED 58 minutes ago
STATUS Up 58 minutes
PORTS 0.0.0.0:59838->59838/tcp, :::59838->59838/tcp
NAMES whole_body_pose
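The host port can also be pulled out of that ``PORTS`` column with plain shell parameter expansion. A sketch assuming the output format shown above:

```shell
# PORTS column entry as printed by `docker ps` (example value from above)
ports='0.0.0.0:59838->59838/tcp, :::59838->59838/tcp'
port="${ports%%->*}"   # drop everything from the first '->' onwards
port="${port##*:}"     # keep only the digits after the last ':'
echo "$port"           # prints 59838
```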
To stop the server execute:
.. code-block:: bash
bonseyes_aiassets_cli server stop
Benchmark
=========
Evaluate exported and pretrained models:
.. code-block:: bash
usage: bonseyes_aiassets_cli benchmark [-h]
--benchmark-input-sizes INPUT_SIZES
--engine {all, pytorch, onnxruntime, tensorrt}
--backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
--device {gpu, cpu}
3d face landmark specific:
[--datasets {all, aflw, aflw2000-3d}]
Example:
.. code-block:: bash
# CPU
bonseyes_aiassets_cli benchmark \
--benchmark-input-sizes 120x120 320x320 \
--device cpu \
--backbone shufflenetv2k30 \
--engine pytorch onnxruntime \
--datasets all
# GPU
bonseyes_aiassets_cli benchmark \
--benchmark-input-sizes 120x120 320x320 \
--device gpu \
--backbone shufflenetv2k30 \
--engine pytorch onnxruntime tensorrt \
--datasets all