Image Captioning

Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning.

Image captioning is a popular research area of Artificial Intelligence (AI) that deals with image understanding and a language description for that image. Image understanding needs to detect and recognize objects. It also needs to understand scene type or location, object properties and their interactions. Generating well-formed sentences requires both syntactic and semantic understanding of the language.

Dataset

MS COCO Dataset: Microsoft COCO Dataset is a very large dataset for image recognition, segmentation, and captioning. There are various features of MS COCO dataset such as object segmentation, recognition in context, multiple objects per class, more than 300,000 images, more than 2 million instances, 80 object categories, and 5 captions per image. Many image captioning methods use the dataset in their experiments.

Model Architecture

In this project, global image features are extracted from the hidden activations of CNN and then fed them into an LSTM to generate a sequence of words. A vanilla CNN is used to obtain the scene type, to detect the objects and their relationships.The output of CNN is used by a language model to convert them into words, combined phrases that produce an image captions. This project uses a CNN for image representations and an LSTM for generating image captions.

Predictions

a man riding a horse on a beach near a body of water .

a clock tower with a clock on it 's side .

a man is flying a kite on the beach

a street scene with cars and a bus stop .

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
images		images
models		models
predictions		predictions
0_Dataset.ipynb		0_Dataset.ipynb
1_Preliminaries.ipynb		1_Preliminaries.ipynb
2_Training.ipynb		2_Training.ipynb
3_Inference.ipynb		3_Inference.ipynb
README.md		README.md
data_loader.py		data_loader.py
model.py		model.py
training_log.txt		training_log.txt
vocab.pkl		vocab.pkl
vocabulary.py		vocabulary.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Image Captioning

Dataset

Model Architecture

Predictions

About

Releases

Packages

Languages

junthbasnet/Automated-Image-Captioning

Folders and files

Latest commit

History

Repository files navigation

Image Captioning

Dataset

Model Architecture

Predictions

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages