This repository contains a PyTorch implementation of the image captioning model from the paper *Show, Attend and Tell: Neural Image Caption Generation with Visual Attention* (Xu et al., 2015).
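The core of the model is soft attention. At each decoding step, the LSTM decoder scores every spatial location of the CNN feature map against its previous hidden state, normalizes the scores with a softmax, and passes the weighted average of the features (the context vector) to the decoder. The sketch below is a minimal PyTorch version of that additive attention. It assumes encoder features flattened to shape `(batch, num_locations, encoder_dim)`. The class name `SoftAttention` and all dimension arguments are hypothetical and are not taken from this repository. The sketch also omits the gating scalar β that the paper applies to the context vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttention(nn.Module):
    """Additive (soft) attention over encoder feature locations.

    Hypothetical module for illustration; names and sizes are not the repository's.
    """

    def __init__(self, encoder_dim: int, decoder_dim: int, attention_dim: int):
        super().__init__()
        self.encoder_proj = nn.Linear(encoder_dim, attention_dim)  # projects each a_i
        self.decoder_proj = nn.Linear(decoder_dim, attention_dim)  # projects h_{t-1}
        self.score = nn.Linear(attention_dim, 1)                   # one scalar score per location

    def forward(self, features: torch.Tensor, hidden: torch.Tensor):
        # features: (batch, num_locations, encoder_dim)
        # hidden:   (batch, decoder_dim), the decoder's previous hidden state
        energy = torch.tanh(
            self.encoder_proj(features) + self.decoder_proj(hidden).unsqueeze(1)
        )                                                     # (batch, L, attention_dim)
        scores = self.score(energy).squeeze(2)                # (batch, L)
        alpha = F.softmax(scores, dim=1)                      # weights sum to 1 over locations
        context = (features * alpha.unsqueeze(2)).sum(dim=1)  # (batch, encoder_dim)
        return context, alpha


if __name__ == "__main__":
    torch.manual_seed(0)
    attn = SoftAttention(encoder_dim=512, decoder_dim=256, attention_dim=128)
    feats = torch.randn(2, 196, 512)  # e.g. a 14x14 feature map with 512 channels
    h = torch.randn(2, 256)
    context, alpha = attn(feats, h)
    print(context.shape, alpha.shape)
```

A 14×14 feature map gives 196 locations. In that case `alpha` holds one weight per location, and it can be reshaped to 14×14 to visualize where the model attends while emitting each word.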
Environment:

- Ubuntu 18.04
- CUDA 11.0
- cuDNN
- NVIDIA GeForce RTX 2080 Ti
- Java 8
- Python 3.8.5
- PyTorch 1.7.0
- Other Python libraries listed in `requirements.txt`

Create a virtual environment and install the dependencies:

```bash
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt
```
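After installation, you can confirm that the interpreter and PyTorch roughly match the versions listed above with a small standard-library script like the one below. It is a hypothetical helper and not part of the repository. It reads installed versions through `importlib.metadata` (Python 3.8+) and treats local build tags such as `+cu110` as matching. CUDA, cuDNN and Java are not checked.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8

# Hypothetical expectations mirroring the list above ("torch" is PyTorch's PyPI name).
EXPECTED = {"python": "3.8", "torch": "1.7.0"}


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(found, wanted):
    """True if the leading components of `found` equal `wanted`.

    A local build tag such as "+cu110" is ignored, so "1.7.0+cu110" matches "1.7.0".
    """
    release = found.split("+", 1)[0].split(".")
    wanted_parts = wanted.split(".")
    return release[: len(wanted_parts)] == wanted_parts


def check_environment(expected=EXPECTED):
    """Map each name to a (found_version, matches) pair."""
    report = {}
    for name, wanted in expected.items():
        if name == "python":
            found = "{}.{}.{}".format(*sys.version_info[:3])
        else:
            found = installed_version(name)
        report[name] = (found, found is not None and version_matches(found, wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name}: {found or 'not installed'} ({'ok' if ok else 'differs'})")
```

Run it with the virtual environment activated, so that it inspects the same interpreter used for training.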

To train the model, run:

```bash
(.env) $ python train.py
```

Hyperparameters can be changed by editing `config.py`.

Results:

Encoder | Trained on | BLEU-4 | CIDEr | METEOR | ROUGE-L |
---|---|---|---|---|---|
VGG | COCO 2014 | 24.16 | 51.67 | 22.0 | - |
ResNet-101 | COCO 2014 | - | 76.2 | 23.9 | 64.2 |
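BLEU-4 is the geometric mean of the clipped 1- to 4-gram precisions, multiplied by a brevity penalty for captions shorter than their references. The pure-Python sketch below computes corpus-level BLEU-4 with uniform weights and no smoothing. It illustrates the metric only and is not the evaluation code behind the table. Standard captioning toolkits apply their own tokenization, so their scores may differ slightly. The table appears to report scores multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 with uniform weights and no smoothing.

    references: one entry per image, each a list of reference token lists
    hypotheses: generated token lists, aligned with `references`
    """
    matched = [0] * 4
    total = [0] * 4
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter on ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matched) == 0:
        return 0.0  # a zero precision makes the geometric mean zero without smoothing
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```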