General Multimodal Embedding for Image Search

This project builds on the GME model and is intended for testing image retrieval with arbitrary inputs (image, text, or image plus text).

Paper: GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Setup

# Set Environment
conda create -n gme python=3.10
conda activate gme
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c pytorch -c nvidia faiss-gpu=1.9.0
pip install transformers                               # tested with 4.47.1
pip install gradio                                     # tested with 5.9.1

# Get Model
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com               # optional: download through a mirror; omit to use the default endpoint
huggingface-cli download Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --local-dir gme-Qwen2-VL-2B-Instruct
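
After installation it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch using only the standard library. It assumes the packages register under the distribution names `torch`, `transformers`, and `gradio`; the helper names `installed_version` and `report` are illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the setup commands above.
TESTED = {"torch": "2.3.1", "transformers": "4.47.1", "gradio": "5.9.1"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(tested=TESTED):
    """Compare installed versions against the tested ones."""
    rows = []
    for name, want in tested.items():
        have = installed_version(name)
        if have is None:
            status = "missing"
        elif have.split("+")[0] == want:  # ignore local suffixes such as +cu121
            status = "matches"
        else:
            status = "differs"
        rows.append((name, want, have, status))
    return rows


if __name__ == "__main__":
    for name, want, have, status in report():
        print(f"{name:<14} tested={want:<8} installed={have} ({status})")
```

A "differs" result is not necessarily a problem; it only means the combination has not been tested here.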

How to Use

First, prepare the retrieval database: run build_index.py to extract image features and build the index.
Then run retrieval_app.py for online retrieval.
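
To illustrate what these two steps amount to, here is a minimal sketch of embedding-based retrieval in pure Python. It is a stand-in, not the code in this repository: the project uses FAISS and real GME embeddings, while this sketch uses a brute-force inner-product search over L2-normalized toy vectors. The function names, the toy vectors, and the `gallery/...` paths are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(embeddings):
    """'Index building' here is just storing the normalized gallery vectors."""
    return [l2_normalize(v) for v in embeddings]


def search(index, query, k=3):
    """Return the k best (row_id, score) pairs by inner product, best first."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(row, q)), i) for i, row in enumerate(index)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy gallery: row i of the index corresponds to image_paths[i] (assumed layout).
image_paths = ["gallery/cat.jpg", "gallery/dog.jpg", "gallery/car.jpg"]
embeddings = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]

index = build_index(embeddings)
hits = search(index, [0.9, 0.1, 0.0], k=2)
for row_id, score in hits:
    print(image_paths[row_id], round(score, 3))
```

Because image and text queries are embedded into the same vector space, the same search step can serve every query type; only the embedding step changes.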

Detailed usage

usage: build_index.py [-h] [--model_path MODEL_PATH] [--image_dir IMAGE_DIR] [--batch_size BATCH_SIZE] [--embeddings_output EMBEDDINGS_OUTPUT] [--index_output INDEX_OUTPUT] [--image_paths_output IMAGE_PATHS_OUTPUT]
options:
  --model_path MODEL_PATH
                        Path to the GmeQwen2VL model.
  --image_dir IMAGE_DIR
                        Path to the directory containing new images.
  --batch_size BATCH_SIZE
                        Batch size for embedding extraction.
  --embeddings_output EMBEDDINGS_OUTPUT
                        Output file for saving image embeddings.
  --index_output INDEX_OUTPUT
                        Output file for saving FAISS index.
  --image_paths_output IMAGE_PATHS_OUTPUT
                        Output file for saving image paths.
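
The `--image_dir` and `--batch_size` options suggest a loop that collects image files and feeds them to the model in fixed-size groups. The sketch below shows one way such a loop could be organized. It is an assumption about the general pattern, not the actual contents of build_index.py; the extension list and the helper names `list_images` and `batched` are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_images(image_dir):
    """Return a sorted list of image file paths found under image_dir."""
    root = Path(image_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {image_dir}")
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the batching behaviour.
    names = [f"img_{i}.jpg" for i in range(5)]
    for batch in batched(names, batch_size=2):
        print(batch)
```

Sorting the paths keeps the order stable between runs, which matters if the saved path list is later used to map index rows back to files.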

usage: retrieval_app.py [-h] [--model_path MODEL_PATH] [--image_embeddings_file IMAGE_EMBEDDINGS_FILE] [--faiss_index_file FAISS_INDEX_FILE] [--image_paths_file IMAGE_PATHS_FILE]
options:
  --model_path MODEL_PATH
                        Path to the GME model.
  --image_embeddings_file IMAGE_EMBEDDINGS_FILE
                        Path to the image embeddings file.
  --faiss_index_file FAISS_INDEX_FILE
                        Path to the FAISS index file.
  --image_paths_file IMAGE_PATHS_FILE
                        Path to the file containing image paths.
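
The app takes the index and the image paths as separate files, so search results have to be mapped back to paths at query time. The sketch below shows one way to do that, assuming that row i of the index corresponds to entry i of the path list (the on-disk format of the path file is not documented here, so loading is left out). It also skips ids of -1, which FAISS returns as padding when fewer than k neighbours are found. The helper name `resolve_hits` is hypothetical.

```python
def resolve_hits(ids, scores, image_paths):
    """Pair each valid result id with its image path and score.

    ids and scores are one row of a top-k search result; negative ids are
    treated as padding and skipped.
    """
    hits = []
    for idx, score in zip(ids, scores):
        idx = int(idx)
        if idx < 0:
            continue  # padding entry, no neighbour found for this slot
        if idx >= len(image_paths):
            raise IndexError(
                f"result id {idx} has no matching path; "
                "the index and the path list may come from different runs"
            )
        hits.append((image_paths[idx], float(score)))
    return hits


if __name__ == "__main__":
    # Hypothetical paths and scores, for illustration only.
    paths = ["gallery/a.jpg", "gallery/b.jpg", "gallery/c.jpg"]
    print(resolve_hits([2, 0, -1], [0.9, 0.5, 0.0], paths))
```

The bounds check is a cheap way to catch an index and a path list that were built from different image sets.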

Results

gallery.zip: the images used to build the database (1,131 images).
query.zip: a set of query images (17 images).
Below are some test results with visualizations.
Tested on a GeForce RTX 4070 Ti SUPER (16 GB) under WSL2.

Image(+Text) -> Image

I+T.mp4

Text -> Image [Chinese input]

T-zh.mp4

Text -> Image [English input]

T-en.mp4

Text(long) -> Image

LT.mp4

License

This project is released under the MIT License.

Citation

@misc{GME-Search,
  author = {Wei Li},
  title = {Gradio app with GME for Image Search},
  howpublished = {\url{https://github.com/BIGBALLON/GME-Search}},
  year = {2025}
}

Please create a pull request if you find any bugs or want to contribute code. 😄
