Description
-----------

An AI Asset enables accelerated deployment of deep learning models to resource-constrained, low-power embedded systems (Deep Edge). The provided workflows deliver powerful and easy-to-deploy building blocks for creating complex AI models that can be deployed on cyber-physical systems. By taking care of many of the end-to-end tooling dependencies and providing standardized interfaces, Bonseyes AI Assets enable users to focus on producing optimal solutions while allowing faster feedback during the implementation of end-user requirements. The goal is to facilitate easier deployment to the Deep Edge with the Bonseyes AI Marketplace.

Requirements
------------

Hardware requirements
=====================

To utilize the full potential of AI Assets - especially for training - an NVIDIA graphics card (``GTX1060`` or newer) with CUDA support is required on x86_64 environments. Nonetheless, AI Assets can also be run using Intel/AMD CPUs.

We also provide support for NVIDIA Jetson devices as well as for platforms with arm64v8 architectures. The support of these devices allows the user to evaluate any given AI Asset on them and obtain embedded-oriented benchmarks for a faster design process.

Software requirements
=====================

The following requirements need to be installed on the platform where the AI Asset will run:

Docker
~~~~~~

To install Docker, follow the instructions `here `__. By default, Docker is not accessible to normal users. To allow the current user to access Docker, run the following commands:

.. code-block:: bash

   sudo groupadd docker
   sudo usermod -aG docker $USER
   newgrp docker

Verify that you can run Docker commands without sudo:

.. code-block:: bash

   docker run hello-world

Git and Git LFS
~~~~~~~~~~~~~~~

Install git and git LFS by executing:

.. code-block:: bash

   sudo apt-get install git git-lfs

Git LFS is not active by default. To make sure git LFS is active, run the following command:

.. code-block:: bash

   git lfs install

The command will print some errors that can be safely ignored.

Due to a bug in Ubuntu 18.04 LTS, the binaries installed by pip are not available by default. To make sure that they are available, run the following command:

.. code-block:: bash

   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Unfortunately, due to some limitations in Ubuntu 18.04 LTS, it is not possible to ensure that the new user groups are taken into account after a simple logout/login. To complete the setup, it is necessary to restart the machine so that the changes to PATH and groups take effect.

.. _python_packages:

Packages
~~~~~~~~

The remaining packages can be installed by executing the following command:

.. code-block:: bash

   sudo apt-get install python3 python3-pip python3-wheel python3-setuptools

To be able to benchmark your AI Asset on your HW platform, you need to install the following packages, depending on the target platform:

**x86 (CPU)**

.. code-block:: bash

   pip3 install psutil

**x86 (CUDA)**

.. code-block:: bash

   pip3 install psutil nvidia-smi nvidia-ml-py3

**arm64v8**

.. code-block:: bash

   pip3 install psutil

**Nvidia Jetsons**

.. code-block:: bash

   pip3 install psutil jetson-stats

NVIDIA Drivers
~~~~~~~~~~~~~~

If you are working on an x86 platform with an NVIDIA GPU, ensure that you have installed the appropriate NVIDIA drivers. On Ubuntu, the easiest way of ensuring that you have the right driver version is to install a version of CUDA at least as new as the image you intend to use, via the official NVIDIA CUDA download page. For example, if you intend to use CUDA 10.2, ensure that you have the matching graphics drivers, as described `here `_.

The following command can be used to verify your setup on x86 platforms:

.. code-block:: bash

   docker run --gpus all nvidia/cuda:10.2-base nvidia-smi

If you are using an NVIDIA Jetson device, it is sufficient to set up the device following the DPE workflow described in `DPE `_.

Nvidia docker
~~~~~~~~~~~~~

You will also need to install the NVIDIA Container Toolkit to enable GPU device access within Docker containers. Installation instructions can be found `here `_.
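As a rough sketch for Ubuntu 18.04, the installation typically adds NVIDIA's apt repository and installs the toolkit package; the repository URLs and package names below follow NVIDIA's public installation guide and may change over time, so verify them against the instructions linked above:

.. code-block:: bash

   # Add NVIDIA's container runtime apt repository (URLs per NVIDIA's guide; verify against current docs)
   distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
       sudo tee /etc/apt/sources.list.d/nvidia-docker.list

   # Install the toolkit and restart the Docker daemon so it picks up the NVIDIA runtime
   sudo apt-get update
   sudo apt-get install -y nvidia-container-toolkit
   sudo systemctl restart docker

Afterwards, the ``docker run --gpus all nvidia/cuda:10.2-base nvidia-smi`` check from the previous section should succeed from inside a container.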
Setup
-----

AI Asset CLI
============

The AI Asset CLI is a command line interface that allows end users to interact with AI Assets, providing functionality for a variety of tasks such as export, processing (video, image, camera), evaluation, etc.

Install the Bonseyes AI Asset CLI on the intended device from the remote repository:

.. code-block:: bash

   pip3 install git+https://gitlab.com/bonseyes-opensource/aiassets_cli.git

Add the user path to the system PATH:

.. code-block:: bash

   export PATH=$PATH:/home/${USER}/.local/bin

For detailed AI Asset CLI usage, please refer to the official `documentation `_.

Board setup
===========

For board setup, please first follow the DPE workflow explained in `DPE `_.

Usage
-----

Currently available AI Assets:

- 3D Face Landmark detection (68 keypoints)

  - Backbones: mobilenetv1, mobilenetv0.5
  - Input-sizes: 120x120
  - Datasets: aflw, aflw2000-3d
  - Access token

    - Username: ``gitlab+deploy-token-483452``
    - Password: ``zsEqp4321jiCzWS-TUaG``

- Whole Body Pose estimation (133 keypoints)

  - Backbones: resnet22, shufflenetv2k30, shufflenetv2k16
  - Input-sizes: 128x96, 128x128, 256x256, 384x216, 512x384
  - Datasets: wholebody
  - Access token

    - Username: ``gitlab+deploy-token-557315``
    - Password: ``AskgZQwcDRRYv3Da7BNB``

Currently available platforms and environments:

- x86_64 machines

  - cpu
  - cuda10.2_tensorrt7.0
  - cuda11.2_tensorrt7.2_rtx3070
  - cuda11.4_tensorrt8.0

- Nvidia Jetson Devices

  - jetpack4.4
  - jetpack4.6

- Arm CPUs

  - arm64v8

If you face issues during the workflow, you can export the DEBUG flag in your terminal to obtain more information about the issue:

.. code-block:: bash

   export DEBUG=True

Installation
============

Download and initialize the specified demo AI Asset locally:

.. code-block:: bash

   bonseyes_aiassets_cli init --task {3dface_landmarks, whole_body_pose}
                              --platform {x86_64, jetson, rpi}
                              --environment {cpu, cuda10.2_tensorrt7.0, cuda11.2_tensorrt7.2_rtx3070, cuda11.4_tensorrt8.0, jetpack4.4, jetpack4.6, arm64v8}
                              --version {v1.0, v2.0, ...}
                              --user gitlab+deploy-token-USERNAME
                              --password PASSWORD
                              [--camera-id CAMERA_ID]

Check supported options by running:

.. code-block:: bash

   bonseyes_aiassets_cli init --help

Example:

.. code-block:: bash

   bonseyes_aiassets_cli init \
       --task whole_body_pose \
       --platform x86_64 \
       --environment cuda10.2_tensorrt7.0 \
       --version v1.0 \
       --camera-id 0 \
       --user <USERNAME> \
       --password <PASSWORD>

Check if the container is running and on which port by executing:

.. code-block:: bash

   docker ps
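If several containers are running, the output can be narrowed down to the AI Asset container. The example below assumes the container is named after the task (``whole_body_pose``, matching the init example above):

.. code-block:: bash

   # Show only the AI Asset container and the host port it exposes
   docker ps --filter "name=whole_body_pose" --format "table {{.Names}}\t{{.Ports}}"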
If you want to stop a running AI Asset:

.. code-block:: bash

   docker kill <CONTAINER_NAME>

Switch between AI Assets
========================

Use a specific AI Asset locally:

.. code-block:: bash

   bonseyes_aiassets_cli use --task {3dface_landmarks, whole_body_pose}

Check supported tasks by running:

.. code-block:: bash

   bonseyes_aiassets_cli use --help

Train
=====

Train the network and produce a model based on the available configuration files:

.. code-block:: bash

   bonseyes_aiassets_cli train start --config <CONFIG_NAME>

Check for available configs by running:

.. code-block:: bash

   bonseyes_aiassets_cli train start --help

Example:

.. code-block:: bash

   bonseyes_aiassets_cli train start --config v1.0_shufflenetv2k30_default_641x641_fp32_config

Check the training status by running:

.. code-block:: bash

   bonseyes_aiassets_cli train status

Stop the training process by running:

.. code-block:: bash

   bonseyes_aiassets_cli train stop

Export
======

Export pretrained models from ``PyTorch`` format to ``ONNX`` and/or ``TensorRT`` format(s):

.. code-block:: bash

   usage: bonseyes_aiassets_cli export [-h]
                                       --export-input-sizes EXPORT_INPUT_SIZES [EXPORT_INPUT_SIZES ...]
                                       --engine {all, onnxruntime, tensorrt}
                                       --precisions {fp32, fp16}
                                       --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                       [--workspace-unit {MB, GB}]
                                       [--workspace-size WORKSPACE_SIZE]
                                       [--enable-dla]

**Note:** When exporting models to ``TensorRT`` format on devices with a lower RAM size (<4GB), it is recommended to specify a lower workspace size in MBs.

Example:

.. code-block:: bash

   bonseyes_aiassets_cli export \
       --export-input-sizes 120x120 320x320 \
       --engine all \
       --backbone shufflenetv2k30 \
       --precisions fp32 fp16
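On memory-constrained devices (e.g. a Jetson with less than 4 GB of RAM), the ``--workspace-unit`` and ``--workspace-size`` options from the usage above can be used to cap the TensorRT builder workspace. A sketch with illustrative values:

.. code-block:: bash

   # Limit the TensorRT workspace to 512 MB on a low-RAM device (values are illustrative)
   bonseyes_aiassets_cli export \
       --export-input-sizes 120x120 \
       --engine tensorrt \
       --backbone mobilenetv1 \
       --precisions fp16 \
       --workspace-unit MB \
       --workspace-size 512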
Optimize
========

Optimize exported models by performing PTQ (post-training quantization):

.. code-block:: bash

   usage: bonseyes_aiassets_cli optimize [-h]
                                         --optimize-input-sizes OPTIMIZE_INPUT_SIZES [OPTIMIZE_INPUT_SIZES ...]
                                         --engine {all, onnxruntime, tensorrt}
                                         --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                         [--workspace-unit {MB, GB}]
                                         [--workspace-size WORKSPACE_SIZE]
                                         [--enable-dla]

**Note:** When optimizing models for ``TensorRT`` format on devices with a lower RAM size (<4GB), it is recommended to specify a lower workspace size in MBs.

Example:

.. code-block:: bash

   bonseyes_aiassets_cli optimize \
       --optimize-input-sizes 120x120 320x320 \
       --engine tensorrt \
       --backbone shufflenetv2k30

Process
=======

Note: if you are using a VM in VirtualBox, you can share a camera (or a USB device) by selecting "Devices" > "Webcams" (or USB) and ticking the device you want to share with the VM.

Image
~~~~~

**Currently the only supported format is .jpg**

.. code-block:: bash

   bonseyes_aiassets_cli demo image [--input-size INPUT_SIZE]
                                    [--engine {pytorch, onnxruntime, tensorrt}]
                                    [--precision {fp32, fp16, int8}]
                                    [--device {gpu, cpu}]
                                    [--cpu-num CPU_NUM]
                                    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                    --image-input <IMAGE_PATH>

   3d face landmark specific:
                                    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                    [--thickness THICKNESS]
                                    [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo image \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --image-input <IMAGE_PATH>

   # GPU
   bonseyes_aiassets_cli demo image \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --image-input <IMAGE_PATH>

----

Video
~~~~~

**Currently the only supported format is .mp4**

.. code-block:: bash

   bonseyes_aiassets_cli demo video [--input-size INPUT_SIZE]
                                    [--engine {pytorch, onnxruntime, tensorrt}]
                                    [--precision {fp32, fp16, int8}]
                                    [--device {gpu, cpu}]
                                    [--cpu-num CPU_NUM]
                                    [--color COLOR]
                                    [--rotate {90, -90, 180}]
                                    --video-input <VIDEO_PATH>
                                    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                    [--thickness THICKNESS]
                                    [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo video \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --video-input <VIDEO_PATH>

   # GPU
   bonseyes_aiassets_cli demo video \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --video-input <VIDEO_PATH>

----

Camera
~~~~~~

.. code-block:: bash

   bonseyes_aiassets_cli demo camera [--input-size INPUT_SIZE]
                                     [--engine {pytorch, onnxruntime, tensorrt}]
                                     [--precision {fp32, fp16, int8}]
                                     [--device {gpu, cpu}]
                                     [--cpu-num CPU_NUM]
                                     [--color COLOR]
                                     [--rotate {90, -90, 180}]
                                     --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                     [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                     [--thickness THICKNESS]
                                     [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli demo camera \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu \
       --camera-id 0

   # GPU
   bonseyes_aiassets_cli demo camera \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu \
       --camera-id 0

Server
~~~~~~

.. code-block:: bash

   bonseyes_aiassets_cli server start [--input-size INPUT_SIZE]
                                      [--engine {pytorch, onnxruntime, tensorrt}]
                                      [--precision {fp32, fp16, int8}]
                                      [--device {gpu, cpu}]
                                      [--cpu-num CPU_NUM]
                                      --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

   3d face landmark specific:
                                      [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
                                      [--thickness THICKNESS]
                                      [--single-face-track]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli server start \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device cpu

   # GPU
   bonseyes_aiassets_cli server start \
       --input-size 320x320 \
       --engine pytorch \
       --precision fp32 \
       --backbone shufflenetv2k30 \
       --device gpu

You can test whether the server is running correctly by calling:

.. code-block:: bash

   curl --request POST --data-binary @/path/to/image.jpg http://localhost:<PORT>/inference

The port depends on the AI Asset you are using. Each time you start the server, the PORT is printed to standard output; you can either save it or check:

.. code-block:: bash

   docker ps

and find out which port the AI Asset container is exposing, e.g.:

.. code-block:: bash

   CONTAINER ID   63bc638d1243
   IMAGE          registry.gitlab.com/bonseyes/assets/bonseyes_openpifpaf_wholebody/x86_64:v1.0_cuda10.2_tensorrt7.0
   COMMAND        "/usr/local/bin/nvid…"
   CREATED        58 minutes ago
   STATUS         Up 58 minutes
   PORTS          0.0.0.0:59838->59838/tcp, :::59838->59838/tcp
   NAMES          whole_body_pose
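As a convenience, the published port can also be extracted from ``docker ps`` directly and reused in the test request. The sketch below assumes the container is named ``whole_body_pose`` and publishes a single TCP port, as in the output above:

.. code-block:: bash

   # Extract the published host port and send a test image to the inference endpoint
   PORT=$(docker ps --filter "name=whole_body_pose" --format '{{.Ports}}' \
          | grep -oE '0\.0\.0\.0:[0-9]+' | head -n 1 | cut -d: -f2)
   curl --request POST --data-binary @/path/to/image.jpg "http://localhost:${PORT}/inference"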
To stop the server, execute:

.. code-block:: bash

   bonseyes_aiassets_cli server stop

Benchmark
=========

Evaluate exported and pretrained models:

.. code-block:: bash

   usage: bonseyes_aiassets_cli benchmark [-h]
                                          --benchmark-input-sizes INPUT_SIZES
                                          --engine {all, pytorch, onnxruntime, tensorrt}
                                          --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
                                          --device {gpu, cpu}

   3d face landmark specific:
                                          [--datasets {all, aflw, aflw2000-3d}]

Example:

.. code-block:: bash

   # CPU
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 320x320 \
       --device cpu \
       --backbone shufflenetv2k30 \
       --engine pytorch onnxruntime \
       --dataset all

   # GPU
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 320x320 \
       --device gpu \
       --backbone shufflenetv2k30 \
       --engine pytorch onnxruntime tensorrt \
       --dataset all
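As a sketch of how the 3D face landmark task could be benchmarked against a specific dataset: the backbone, input size, and dataset values come from the asset description earlier in this document, and the ``--datasets`` spelling follows the usage block above (the examples above use ``--dataset``, so confirm the exact flag with ``--help``):

.. code-block:: bash

   # 3D face landmark benchmark on a single dataset (CPU)
   bonseyes_aiassets_cli benchmark \
       --benchmark-input-sizes 120x120 \
       --device cpu \
       --backbone mobilenetv1 \
       --engine onnxruntime \
       --datasets aflw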