Train Lite (Embedded Friendly) Object Detection models using https://github.com/open-mmlab/mmdetection

TexasInstruments, updated 2023-01-09

EdgeAI-MMDetection

Notice

  • If you have not visited the landing page at https://github.com/TexasInstruments/edgeai, please do so before attempting to use this repository. Most of the introduction is skipped here.
  • This repository is located on GitHub at: https://github.com/TexasInstruments/edgeai-mmdetection

This repository is an extension of the popular mmdetection open source repository for object detection training. While mmdetection covers a wide variety of models, typically at high complexity, we focus on models that are optimized for speed and accuracy so that they run efficiently on embedded devices. For this purpose, we have added a set of embedded-friendly model configurations and scripts - please see Usage for more information.

If the accuracy degradation with Post Training Quantization (PTQ) is higher than expected, this repository provides instructions and functionality required to do Quantization Aware Training (QAT).


Release Notes

Notes about recent changes and updates in this repository are available in the release notes.

Installation

These installation instructions were tested using Miniconda Python 3.7 on a Linux machine running Ubuntu 18.04.

Make sure that your Python version is 3.7 or higher by typing:
python --version

Please clone and install EdgeAI-Torchvision before proceeding, as this repository uses several components from there - especially to define low-complexity models and to do Quantization Aware Training (QAT) or Calibration.

After that, install this repository by running ./setup.sh

After installation, a Python package called "mmdet" will be listed when you run pip list.
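To verify the installation from Python as well (a minimal sanity check; the exact version string depends on your checkout):

```python
# Confirm that the mmdet package installed by setup.sh is importable.
import mmdet
print(mmdet.__version__)
```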

In order to use a local folder as a repository, your PYTHONPATH must start with a : or a .: - add the following to your .bashrc startup file (assuming you are using the bash shell):

export PYTHONPATH=:$PYTHONPATH

Make sure to close your current terminal and start a new one for the change in .bashrc to take effect, or run source ~/.bashrc after the update.
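A leading : in PYTHONPATH adds an empty entry, which Python resolves to the current working directory. A quick way to confirm the change took effect (a minimal sketch):

```python
# An empty PYTHONPATH entry appears as '' in sys.path and is resolved
# against the current working directory at import time.
import sys
print('' in sys.path)
```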

Get Started

Please see Usage for training and testing with this repository.

Object Detection Model Zoo

A complexity and accuracy report of several trained models is available in the Detection Model Zoo.

Quantization

This tutorial explains more about quantization and how to do Quantization Aware Training (QAT) of detection models.
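As a rough illustration (a minimal sketch, not this repository's training pipeline; QuantTrainModule is taken from the edgeai-torchvision xnn package and its exact signature may differ across versions), QAT wraps a trained float model so that quantization is simulated during training:

```python
import torch
from torchvision.edgeailite import xnn

# Stand-in float model; in practice this is the detector built by mmdet.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)

# A dummy input fixes the input shape used when tracing the model.
dummy_input = torch.rand(1, 3, 512, 512)

# Wrap the model so fake-quantization is applied in forward passes;
# further training then learns weights that are robust to 8-bit inference.
model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

# ... continue training for a few epochs at a reduced learning rate ...
```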

ONNX & Prototxt Export

Export of an ONNX model (.onnx) and additional meta information (.prototxt) is supported. The .prototxt file contains meta information specified by TIDL for object detectors.

The export of meta information is now supported for SSD and RetinaNet detectors.
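For reference (a generic PyTorch sketch, not the repository's own export script, which additionally writes the TIDL .prototxt side file), ONNX export follows the usual torch.onnx pattern:

```python
import torch

# Stand-in for a trained detector; 512x512 matches the input size
# used by the lite model checkpoints in the model zoo.
model = torch.nn.Conv2d(3, 8, kernel_size=3)
model.eval()
dummy_input = torch.rand(1, 3, 512, 512)

# Trace the model and write the graph to an .onnx file.
torch.onnx.export(model, dummy_input, 'model.onnx', opset_version=11)
```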

For more information, please see Usage.

Advanced documentation

Kindly take time to read through the documentation of the original mmdetection before attempting to use the extensions added in this repository.

The setup script setup.sh in this repository uses commonly used settings. If your CUDA version or Python version is different, or if some packages are missing from your system, this script can fail. In those scenarios, please refer to the installation instructions for the original mmdetection.

Also see the documentation of MMDetection for the basic usage of the original mmdetection.

Acknowledgement

This is an open source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.

We hope that the toolbox and benchmark serve the growing research community by providing a flexible toolkit for training existing detectors and developing new ones.

License

Please see the LICENSE file of this repository.

Citation

This package/toolbox is an extension of mmdetection (https://github.com/open-mmlab/mmdetection). If you use this repository or benchmark in your research or work, please cite the following:

@article{EdgeAI-MMDetection,
  title   = {{EdgeAI-MMDetection}: An Extension To Open MMLab Detection Toolbox and Benchmark},
  author  = {Texas Instruments EdgeAI Development Team, [email protected]},
  journal = {https://github.com/TexasInstruments/edgeai},
  year    = {2021}
}

@article{mmdetection,
  title   = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
  author  = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
  journal = {arXiv preprint arXiv:1906.07155},
  year    = {2019}
}

References

[1] Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin. MMDetection: Open MMLab Detection Toolbox and Benchmark. https://arxiv.org/abs/1906.07155

Issues

QAT training backbone problem?

opened on 2022-10-11 06:28:42 by betterhalfwzm

When I train YOLOv3 with a Darknet backbone, QAT accuracy is fine; but after replacing the backbone with the VGG-style network below, floating-point training works while QAT accuracy is very low. @mathmanu

The backbone is as follows (each VGGBlock is Conv2d -> BatchNorm2d -> ReLU(inplace=True); all BatchNorm2d layers use eps=1e-05, momentum=0.1, affine=True, track_running_stats=True):

(backbone): VGG(
  (stage0): VGGBlock(Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
  (stage1): Sequential(
    (0): VGGBlock(Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
    (1): VGGBlock(Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
  )
  (stage2): Sequential(
    (0): VGGBlock(Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
    (1): VGGBlock(Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
    (2): VGGBlock(Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
    (3): VGGBlock(Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
  )
  (stage3): Sequential(
    (0): VGGBlock(Conv2d(64, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
    (1)-(7): 7 x VGGBlock(Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
  )
  (stage4): Sequential(
    (0): VGGBlock(Conv2d(96, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
  )
)

@mathmanu

SSD import error

opened on 2022-04-14 00:02:52 by noringname

Hi, I trained an ssd_regnetx_fpn_bgr_lite model and encountered some problems while importing it to a .bin:

  1. After running the export script I got 4 files: model.onnx, model.prototxt, model-proto.onnx and model-proto.prototxt. Is this correct?
  2. When importing the ONNX model I got: [libprotobuf ERROR google/protobuf/text_format.cc:309] Error parsing text-format tidl_meta_arch.TIDLMetaArch: 16:9: Non-repeated field "output" is specified multiple times. The auto-generated output fields are output: "dets" and output: "labels".

Thanks.

Run mmdetection demo code and get incorrect results

opened on 2022-03-07 08:52:37 by Patrick-Woo

I am new to mmdetection and TI edgeai-mmdetection.

After installing this repository, I tried to run some demo code from the original mmdetection repo to perform inference on a demo picture.

Unfortunately, the demo code below gets incorrect results on the demo picture. The bboxes are not in the right places and the classes are totally wrong.

Could you please tell me how to perform inference and get correct bboxes with your edgeai-mmdetection repository?

The demo code is below:

from mmdet.apis import init_detector, inference_detector
import mmcv

config_file = './configs/edgeailite/ssd/ssd_regnet_fpn_bgr_lite.py'
checkpoint_file = './checkpoints/ssd_regnetx-800mf_fpn_bgr_lite_512x512_20200919_checkpoint.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
img = 'demo/demo.jpg'
result = inference_detector(model, img)
model.show_result(img, result)
model.show_result(img, result, out_file='result.jpg')

BTW, after running ./run_detection_test.sh according to the usage guide, I could get the right result, with output like "mmdet - INFO - OrderedDict([('bbox_mAP', 0.328), ('bbox_mAP_50', 0.528).........."

QAT model configuration

opened on 2022-02-16 09:52:08 by ginamathew

Hi,

I would like to do model inference and PyTorch-to-ONNX conversion of a custom object detection model (not in mmdetection) after QAT. Could you please share sample code for the same?

I have model.py before QAT, and as I understand it, this configuration gets changed after doing QAT. I would like to know how to change model.py after QAT, for inference and also for ONNX conversion.

regards, Gina

RuntimeError: NCCL communicator was aborted on rank 1

opened on 2022-01-18 05:20:01 by lilyswang

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.


Reproduction

  1. What command or script did you run?

./run_detection_train.sh

  2. Did you make any modifications to the code or config? Did you understand what you modified? No.

  3. What dataset did you use?

My own dataset (like bdd100k), with about 112k images in the training set.

Thanks for your nice work. Now we have some problems and need your help. I started training with my own dataset. When training finishes one epoch, the following error is reported (see the attachment for the specific log):

[screenshot of the reported error]

20220112_010819.log

We look forward to your reply! Thanks a lot!

AttributeError: module 'torchvision.edgeailite.xnn.model_surgery' has no attribute 'get_replacements_dict'

opened on 2022-01-03 09:20:42 by sathyapatel

I'm getting the following error when I try to run ./run_detection_train.sh:

work_dir = './work_dirs/yolov3_regnet_bgr_lite'
gpu_ids = range(0, 1)

2022-01-03 09:13:58,990 - mmdet - INFO - Set random seed to 886029822, deterministic: False
2022-01-03 09:13:59,511 - mmdet - INFO - initialize RegNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'open-mmlab://regnetx_1.6gf'}
2022-01-03 09:13:59,512 - mmcv - INFO - load model from: open-mmlab://regnetx_1.6gf
2022-01-03 09:13:59,512 - mmcv - INFO - load checkpoint from openmmlab path: open-mmlab://regnetx_1.6gf
2022-01-03 09:13:59,562 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Traceback (most recent call last):
  File "./scripts/train_detection_main.py", line 65, in <module>
    train_mmdet.main(args)
  File "/home/ubuntu/edgeai-mmdetection/tools/train.py", line 172, in main
    model = convert_to_lite_model(model, cfg)
  File "/home/ubuntu/edgeai-mmdetection/mmdet/utils/model_surgery.py", line 38, in convert_to_lite_model
    replacements_dict = copy.deepcopy(xnn.model_surgery.get_replacements_dict())
AttributeError: module 'torchvision.edgeailite.xnn.model_surgery' has no attribute 'get_replacements_dict'
Done.
