This application performs real-time object detection on images and videos using the DETR (DEtection TRansformer) model from HuggingFace Transformers.
The DETR model is a transformer-based object detector that eliminates the need for many hand-designed components such as non-maximum suppression (NMS).
The DETR model follows an encoder-decoder architecture:
- Encoder - A CNN backbone (e.g. ResNet-50) extracts features from the input image, which a transformer encoder then refines with self-attention
- Decoder - A transformer decoder that predicts all objects and their bounding box coordinates in parallel
The encoder output is flattened and passed to the decoder along with a fixed set of learned object queries. The decoder associates each object query with either an object in the image, predicting its bounding box coordinates and classification label, or a special "no object" class.
This overall architecture allows DETR to be trained end-to-end instead of relying on a complex pipeline of multiple stages.
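The set-prediction output described above can be illustrated with a small, dependency-free sketch: each object query yields class logits (including a special "no object" class) and a box, and background or low-confidence slots are dropped. All numbers, the query count, and the threshold below are invented for illustration.

```python
import math

# DETR emits a fixed-size set of N predictions, one per object query.
# Toy setup: N = 4 queries, 2 real classes; class index 2 is "no object".
NO_OBJECT = 2

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Each query output: (class logits, box as [cx, cy, w, h] in [0, 1])
query_outputs = [
    ([4.0, 0.1, 0.2], [0.5, 0.5, 0.2, 0.3]),  # confident "class 0"
    ([0.1, 0.2, 5.0], [0.1, 0.1, 0.1, 0.1]),  # "no object" slot
    ([0.2, 3.5, 0.3], [0.7, 0.4, 0.1, 0.2]),  # confident "class 1"
    ([0.3, 0.2, 4.5], [0.9, 0.9, 0.1, 0.1]),  # "no object" slot
]

def decode(outputs, threshold=0.7):
    """Keep queries whose best non-background class clears the threshold."""
    detections = []
    for logits, box in outputs:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != NO_OBJECT and probs[label] >= threshold:
            detections.append((label, probs[label], box))
    return detections

detections = decode(query_outputs)
print([(label, box) for label, _, box in detections])
# Two of the four queries survive; the "no object" slots are dropped.
```

Because every query is decoded independently, the filtering step above is all the post-processing DETR needs, in place of NMS.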
DETR is trained using a bipartite matching loss that matches the predicted and ground truth objects without relying on NMS. The loss function contains:
- Hungarian matching to find an optimal one-to-one assignment between predictions and ground-truth objects
- Bounding box loss (a combination of L1 and generalized IoU) between matched boxes
- Classification loss (cross-entropy) over the matched labels, including the "no object" class
These losses allow end-to-end training of the model directly on bounding box annotations.
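The matching step can be sketched in miniature. DETR uses the Hungarian algorithm; for a handful of boxes, brute-force search over permutations finds the same optimal assignment. The boxes below are invented, and only the L1 box term of the full matching cost is shown.

```python
from itertools import permutations

def l1_box_cost(pred, gt):
    """L1 distance between [cx, cy, w, h] boxes (one term of the matching cost)."""
    return sum(abs(p - g) for p, g in zip(pred, gt))

def match(pred_boxes, gt_boxes):
    """Optimal one-to-one assignment of predictions to ground truths.

    Brute force over permutations for clarity; the real model uses the
    Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment),
    which finds the same minimum-cost matching in polynomial time.
    """
    n = len(gt_boxes)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), n):
        cost = sum(l1_box_cost(pred_boxes[perm[i]], gt_boxes[i])
                   for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # best_perm[i] is the prediction index matched to ground truth i
    return best_perm, best_cost

preds = [[0.70, 0.40, 0.10, 0.20],
         [0.48, 0.52, 0.22, 0.28],
         [0.10, 0.10, 0.05, 0.05]]
gts   = [[0.50, 0.50, 0.20, 0.30],
         [0.70, 0.40, 0.10, 0.20]]
assignment, cost = match(preds, gts)
print(assignment)  # (1, 0): pred 1 matches gt 0, pred 0 matches gt 1
```

Once matched, the box and classification losses are computed only between matched pairs, while unmatched predictions are pushed toward the "no object" class.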
Some key strengths of DETR:
- Achieves strong object detection performance comparable to Faster R-CNN
- Avoids NMS post-processing step
- Modular components make the architecture flexible and easy to extend
- Parallel decoding step enables fast inference
The DETR model provides an elegant transformer-based implementation for object detection.
The application has the following key components:
- FastAPI Server - Provides the following endpoints:
  - /annotate/ - Annotate an uploaded image file with detected objects
  - /annotate_video/ - Annotate an uploaded video file with detected objects
  - /delete_files/ - Delete all temporary files
- DETR Model - Pretrained DETR model loaded from HuggingFace Transformers
- Helper Functions - Used for image preprocessing, postprocessing, and drawing bounding boxes
- Async Processing - Allows concurrent processing of multiple image annotation tasks
- Docker Config - For containerized deployment of the application
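The async-processing component can be sketched as follows; `annotate_image` here is a hypothetical stand-in for the real model call, not the app's actual code.

```python
import asyncio

async def annotate_image(name: str) -> str:
    """Stand-in for the real DETR inference call (names are hypothetical).

    Actual model inference is CPU/GPU-bound; in the app it would be
    dispatched off the event loop (e.g. via asyncio.to_thread) so the
    FastAPI server stays responsive.
    """
    await asyncio.sleep(0.01)  # simulate inference latency
    return f"{name}: annotated"

async def annotate_batch(names):
    # Process several uploads concurrently instead of one after another
    return await asyncio.gather(*(annotate_image(n) for n in names))

results = asyncio.run(annotate_batch(["cat.jpg", "dog.jpg", "street.jpg"]))
print(results)
```

Because the tasks run concurrently, total wall time for a batch is close to the time of a single call rather than the sum of all of them.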
To run the app locally:
- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Run: `uvicorn app:fastapi_app --reload`
- Open http://localhost:8000/docs for the API docs
Build and run the Docker image:
- `docker build -t detr-detection .`
- `docker run -p 8080:8080 detr-detection`

Alternatively, use `docker-compose up` to spin up the containers.
- Upload an image to /annotate/ to detect objects
- Upload a video to /annotate_video/ to get an annotated video
- Use /delete_files/ to delete temporary files
- Refer to the API docs for endpoint usage examples
Additional features:
- Lightweight Docker image
- Asynchronous model loading and processing
- Caching of recent detections in an LFU cache
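The LFU cache mentioned above can be sketched in a few lines; this is an illustrative stand-alone version, not the app's actual implementation.

```python
class LFUCache:
    """Minimal least-frequently-used cache: evicts the entry with the
    fewest lookups, ties broken toward older keys. Illustrative only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.counts = {}  # key -> access frequency

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Dicts preserve insertion order, so min() breaks
            # frequency ties toward the oldest key
            victim = min(self.counts, key=self.counts.get)
            del self.data[victim], self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1

cache = LFUCache(capacity=2)
cache.put("img1.jpg", ["box_a"])
cache.put("img2.jpg", ["box_b"])
cache.get("img1.jpg")             # img1 is now used more often
cache.put("img3.jpg", ["box_c"])  # evicts img2, the least-used entry
print(sorted(cache.data))         # ['img1.jpg', 'img3.jpg']
```

Keyed on the uploaded file's content, a cache like this lets repeated requests for the same image skip model inference entirely.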
If you'd like to contribute to this project, please reach out at krmohit101@gmail.com. I welcome any improvements to the code or suggestions for new features!