
koushikvikram/multimodal-image-retrieval


Multimodal Image Retrieval


🚦⚠️👷‍♂️🏗️🚦⚠️👷‍♂️🏗️🚦⚠️👷‍♂️🏗️ Repo Under Construction 🚦⚠️👷‍♂️🏗️🚦⚠️👷‍♂️🏗️🚦⚠️👷‍♂️🏗️

Note: Our model hasn't been trained sufficiently, and its results are nowhere close to our expectations. We'll keep improving it as time and GPU resources allow. Until then, feel free to play around with this (not-so-great) model.
Things we're looking to try:

  • Improve preprocessing
    • Replace special characters with spaces
  • Experiment with different embedding dimensions
  • Use the entire InstaNY100K dataset
  • Retrain Word2Vec
  • Use different CNNs to regress Word2Vec embeddings from images
  • Try different post-processing strategies for embeddings
  • Train with MSELoss (mean squared error)
  • Experiment with other distance functions
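For the "replace special characters with spaces" item, a caption cleaner could look like the minimal sketch below. It assumes captions are lowercased and that anything other than an ASCII letter, digit, or whitespace counts as a special character; `preprocess_caption` is a hypothetical helper, not a function from this repo.

```python
import re
from typing import List

# Assumption: after lowercasing, anything that is not an ASCII letter, digit,
# or whitespace is a "special character" (this also drops accents and emoji).
_SPECIAL_CHARS = re.compile(r"[^a-z0-9\s]")


def preprocess_caption(text: str) -> List[str]:
    """Hypothetical helper: lowercase, replace special characters with spaces, tokenize."""
    cleaned = _SPECIAL_CHARS.sub(" ", text.lower())
    # str.split() with no argument splits on runs of whitespace and drops empty tokens.
    return cleaned.split()


print(preprocess_caption("Sunset over #Brooklyn-Bridge!!! 🌇"))
# ['sunset', 'over', 'brooklyn', 'bridge']
```

Replacing with a space instead of deleting keeps hyphenated or hashtag-joined words apart, so "brooklyn-bridge" becomes two in-vocabulary words rather than one unknown token.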

A deep learning application to retrieve images by searching with text.

Try out the application here: https://share.streamlit.io/koushikvikram/multimodal-image-retrieval/main/app.py

Project Workflow

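Judging from the to-do list above, retrieval appears to work by comparing a text embedding against Word2Vec-space embeddings that a CNN regresses from each image. The sketch below illustrates only that ranking step, under two assumptions: a query is embedded as the mean of its words' Word2Vec vectors, and images are ranked by cosine similarity. The function names and toy vectors are hypothetical stand-ins for the trained models.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def embed_query(tokens: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary query words (assumed pooling strategy)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("none of the query words are in the vocabulary")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query_vec: Vector, image_embeddings: Dict[str, Vector],
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [(image_id, cosine_similarity(query_vec, emb))
              for image_id, emb in image_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-d vectors standing in for real Word2Vec and CNN outputs (made up).
word_vectors = {"bridge": [1.0, 0.0, 0.0], "sunset": [0.0, 1.0, 0.0], "pizza": [0.0, 0.0, 1.0]}
image_embeddings = {  # what a CNN might predict for each image
    "img_a": [0.9, 0.8, 0.0],
    "img_b": [0.0, 0.1, 1.0],
    "img_c": [1.0, 0.0, 0.1],
}

query = embed_query(["sunset", "bridge"], word_vectors)
print(rank_images(query, image_embeddings, top_k=2))  # img_a ranks first, then img_c
```

Swapping `cosine_similarity` for another scoring function (for example, negative Euclidean distance) is one way to approach the "other distance functions" item.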

Dataset

Download the InstaNY100K dataset from this Google Drive link.

Extract the dataset into ./datasets/raw/. Your folder structure should look like this:

./datasets/raw/
|
|-- InstaNY100K
    |
    |-- captions
    |   |
    |   |-- newyork
    |       | 1487768220566960691.txt
    |       | 1490727714071958379.txt
    |       | ...
    |
    |-- img_resized
        |
        |-- newyork
            | 1480879485913200243.jpg
            | 1480879539524935620.jpg
            | ...
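The tree suggests that each caption and each resized image is named after a numeric post ID. A minimal loader might look like the sketch below, assuming a caption `<id>.txt` describes the image `<id>.jpg` with the same stem. The sample IDs listed above don't overlap, so this pairing rule is an assumption, and `load_pairs` is a hypothetical name.

```python
from pathlib import Path
from typing import List, Tuple


def load_pairs(root: str = "./datasets/raw/InstaNY100K",
               city: str = "newyork") -> List[Tuple[Path, str]]:
    """Hypothetical loader: pair each resized image with the caption sharing its ID.

    Assumes captions/<city>/<id>.txt describes img_resized/<city>/<id>.jpg.
    """
    caption_dir = Path(root) / "captions" / city
    image_dir = Path(root) / "img_resized" / city
    pairs = []
    for image_path in sorted(image_dir.glob("*.jpg")):
        caption_path = caption_dir / f"{image_path.stem}.txt"
        if caption_path.is_file():  # skip images that have no caption file
            caption = caption_path.read_text(encoding="utf-8", errors="replace").strip()
            pairs.append((image_path, caption))
    return pairs
```

`load_pairs()` returns a list of `(image_path, caption)` tuples and silently skips images without a matching caption file.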

GitHub Actions for this Repository

Pylint - Code Quality Check

Pytest - Functionality and Behavioral Tests for Classes and Models

Exploring the Word2Vec Model

We recommend using the TensorFlow Embedding Projector to visualize our Word2Vec model.

Load the tensor and metadata TSV files provided in the model directory and visualize the words that interest you!
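The Embedding Projector takes two tab-separated files: one vector per row in the tensor file, and one label per row in the metadata file (with no header row when there is only one metadata column). The sketch below reads such a pair back into a word-to-vector dictionary, which is handy for checking that the two files line up. The actual file names in the model directory aren't given here, so `load_projector_files` and any paths passed to it are hypothetical.

```python
from typing import Dict, List


def load_projector_files(tensor_path: str, metadata_path: str) -> Dict[str, List[float]]:
    """Hypothetical helper: read a tensor/metadata TSV pair into {word: vector}.

    Assumes one tab-separated vector per line in the tensor file and a
    single-column metadata file (one word per line, no header row).
    """
    with open(tensor_path, encoding="utf-8") as f:
        vectors = [[float(x) for x in line.rstrip("\r\n").split("\t")]
                   for line in f if line.strip()]
    with open(metadata_path, encoding="utf-8") as f:
        words = [line.rstrip("\r\n") for line in f if line.strip()]
    # Rows are matched by position, so both files must have the same number of rows.
    if len(vectors) != len(words):
        raise ValueError(f"{len(vectors)} vectors but {len(words)} metadata rows")
    return dict(zip(words, vectors))
```

A mismatch in row counts is the most common reason the projector refuses a metadata file, so failing early on it saves a round trip through the browser.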

Samples from TensorFlow Embedding Projector:

You can also use models/explore_word2vec.ipynb to explore words of interest.
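Exploring a word of interest usually means listing its nearest neighbors in embedding space. This sketch does that with plain cosine similarity over a `{word: vector}` dictionary; the toy vectors are made up, and it is not necessarily what the notebook does. If the model happens to be a gensim `Word2Vec` model, `model.wv.most_similar(word, topn=5)` provides the same kind of lookup.

```python
import heapq
import math
from typing import Dict, List, Tuple


def _unit(vec: List[float]) -> List[float]:
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def most_similar(word: str, vectors: Dict[str, List[float]],
                 topn: int = 5) -> List[Tuple[str, float]]:
    """Return the topn words closest to `word` by cosine similarity, excluding `word`."""
    target = _unit(vectors[word])
    scores = (
        (other, sum(a * b for a, b in zip(target, _unit(vec))))
        for other, vec in vectors.items()
        if other != word
    )
    return heapq.nlargest(topn, scores, key=lambda pair: pair[1])


# Made-up 2-d vectors; real Word2Vec vectors have many more dimensions.
toy_vectors = {"brooklyn": [1.0, 0.1], "manhattan": [0.9, 0.2], "pizza": [0.0, 1.0]}
for neighbor, score in most_similar("brooklyn", toy_vectors, topn=2):
    print(f"{neighbor}: {score:.3f}")
```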

Samples from the Jupyter Notebook:

Acknowledgment

Articles used as reference during development are documented in the references directory.

If you run into problems while using the repo, please open an issue on this GitHub repository at the following link, and I'll be glad to fix it: https://github.com/koushikvikram/multimodal-image-retrieval/issues

If you'd like to collaborate with me or hire me, feel free to email me at koushikvikram91@gmail.com.

Make sure to check out other repositories on my homepage.