Training domain specific models for efficient object detection (arXiv paper)
This is a framework for training domain specific models that are both accurate and computationally efficient.
The Faster-RCNN implementation is based on faster-rcnn.pytorch by jwyang. Thanks!
I strongly recommend taking a look at their README if you get stuck on the Faster-RCNN code.
As a backbone for object detection, Resnet101 is a very good model, but it is too big. A small model like Resnet18, on the other hand, has low accuracy because of its small network capacity.
However, do we need such a big model to do object detection in a limited domain (like your office or a particular intersection)? Since the background does not change, even a small model should do very well if trained properly.
A domain specific model (DSM) is a model that focuses on achieving high accuracy in such a limited domain (e.g. a fixed view of an intersection). We argue that DSMs can capture the essential features well even with a small model size.
In this repo, we train a small domain specific model (say res18) with a dataset from a limited domain.
We see that with domain specific training, small models can achieve very high accuracy.
Take a look at a Youtube Video Demo!
With domain specific training, the mAP improves by about 20%.
Pytorch 0.4.0
Python 3.x
CUDA 8.0 or higher
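To check the requirements above before going further, a small script along these lines may help. It is only a sketch: the function name `check_environment` is hypothetical and not part of this repo, and it only reports what is installed rather than enforcing the exact versions.

```python
import sys


def check_environment():
    """Report the Python, PyTorch and CUDA setup found on this machine.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    report = {
        "python": tuple(sys.version_info[:3]),
        "torch": None,
        "cuda_available": False,
        "cuda_version": None,
    }
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, PyTorch is not installed in the active environment; if `cuda_available` is `False`, the `--cuda` flag used in the commands later on will not work.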
Let's start off by cloning this repo.
git clone
cd training-domain-specific-models
You may need to compile the rpn scripts.
Please see jwyang's repo for details.
We need to prepare the Resnet101 and Resnet18 Faster-RCNN models.
cd training-domain-specific-models
tar -zxvf files.tar.gz
Once the models and the video are in place, we can prepare the dataset.
- Res101 model generates the teacher labels.
- The dataset is prepared in PASCAL VOC format for training.
This is done in a single script.
Just run:
# for dataset coral
python --dataset coral
# for dataset jackson2
python --dataset jackson2
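For readers unfamiliar with the PASCAL VOC layout, the sketch below shows roughly what converting teacher detections into one VOC-style annotation could look like. It is not the repo's script: the function name, the detection tuple layout, and the 0.5 score threshold are all assumptions made for illustration, and the coordinates are written as given without any 0-based or 1-based adjustment.

```python
import xml.etree.ElementTree as ET


def detections_to_voc_xml(filename, width, height, detections, score_thresh=0.5):
    """Build a PASCAL VOC style annotation string from teacher detections.

    detections: list of (class_name, score, xmin, ymin, xmax, ymax).
    score_thresh is an illustrative assumption, not a value from this repo.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"

    for name, score, xmin, ymin, xmax, ymax in detections:
        if score < score_thresh:
            continue  # drop low-confidence teacher outputs
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(int(round(value)))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    teacher_output = [("car", 0.92, 10.2, 20.0, 110.7, 220.0),
                      ("person", 0.30, 0.0, 0.0, 5.0, 5.0)]
    print(detections_to_voc_xml("000001.jpg", 1280, 720, teacher_output))
```

Each image gets one such XML file, and the small model is then trained on these files exactly as it would be on hand-labeled VOC annotations.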
We prepared a dataset.tar at the link below, if you want to take a shortcut.
Cloning the repo also gets you the pickled label files (output/baseline/).
Training will take about 2 hours on a Titan Xp.
python --cuda --r True --dataset pascal_voc_jackson2
# or for coral,
python --cuda --r True --dataset pascal_voc_coral
We evaluate the accuracy (mAP) with validation images.
The res101 outputs are used as ground truth here, since labeling the images by hand is cumbersome.
python --net res18 --dataset pascal_voc_jackson2 --cuda --checksession 1 --checkepoch 20 --checkpoint 1 --image_dir images/jackson2_val/ --truth output/baseline/jackson2val-res101.pkl
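To make the metric concrete, here is a minimal sketch of average precision (AP) for a single class, with the teacher boxes playing the role of ground truth. It is an illustration under assumptions, not the evaluation code of this repo: the function names, the input layout, the 0.5 IoU threshold, and the all-point interpolation are choices made for the example. mAP is the mean of this value over all classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, truths, iou_thresh=0.5):
    """AP for one class.

    preds:  list of (image_id, score, box) from the small model.
    truths: dict image_id -> list of boxes (here: the teacher outputs).
    """
    n_truth = sum(len(boxes) for boxes in truths.values())
    if n_truth == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in truths.items()}

    # Walk predictions from most to least confident and mark true positives.
    tp_flags = []
    for img, _score, box in sorted(preds, key=lambda p: -p[1]):
        best_j, best_iou = -1, 0.0
        for j, truth_box in enumerate(truths.get(img, [])):
            overlap = iou(box, truth_box)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thresh and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)  # poor overlap or duplicate detection

    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_truth)

    # Make precision non-increasing in recall, then sum the area under the curve.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Because the ground truth here comes from res101, a score computed this way measures agreement with the teacher rather than accuracy against human labels.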
We also plan to release the hand-labeled ground truth.
Interestingly, the domain specific model achieves higher accuracy than res101.