A multimodal image search engine built on the GME model. It accepts text, images, or both as the query and retrieves matching images for arbitrary inputs. Suited to research and demos.

BIGBALLON/GME-Search

General Multimodal Embedding for Image Search

This project builds on the GME model and is used to test image retrieval with arbitrary inputs (text, an image, or both).

Setup

# Set up the environment
conda create -n gme python=3.10
conda activate gme
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c pytorch -c nvidia faiss-gpu=1.9.0
pip install transformers                               # tested with 4.47.1
pip install gradio                                     # tested with 5.9.1
# Get the model
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com               # optional: download through the hf-mirror.com mirror
huggingface-cli download Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --local-dir gme-Qwen2-VL-2B-Instruct
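
To confirm that the environment matches the versions noted above, a short check like the following may help. It is a minimal sketch that uses only the standard library. The distribution names in the example list are assumptions (the conda FAISS package, for instance, may not register under the name `faiss-gpu`), so a "not found" result is a hint to look closer rather than proof that a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not found."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your environment.
    names = ["torch", "torchvision", "transformers", "gradio", "faiss-gpu"]
    for name, found in installed_versions(names).items():
        print(f"{name}: {found or 'not found'}")
```

Compare the printed versions with the ones listed in the setup commands (transformers 4.47.1 and gradio 5.9.1 are the tested versions).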

How to Use

  1. Prepare the retrieval database: run build_index.py to extract features and build the index.
  2. Run retrieval_app.py for online retrieval.
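
Conceptually, the two steps amount to storing one embedding per gallery image and then ranking those embeddings against a query embedding. The toy sketch below illustrates that idea in pure Python with made-up 2-D vectors. It assumes cosine similarity as the score, and every function name in it is hypothetical. It does not reproduce build_index.py or retrieval_app.py, which work with GME embeddings and a FAISS index.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(embeddings):
    """Step 1 (toy version): store normalized gallery embeddings."""
    return [l2_normalize(v) for v in embeddings]


def search(index, image_paths, query, k=3):
    """Step 2 (toy version): return the k best (score, path) pairs, best first."""
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, v)), path)
        for v, path in zip(index, image_paths)
    )
    return heapq.nlargest(k, scored)


# Made-up embeddings and file names, for illustration only.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
paths = ["a.jpg", "b.jpg", "c.jpg"]

index = build_index(gallery)
results = search(index, paths, [2.0, 0.1], k=2)
for score, path in results:
    print(f"{path}: {score:.3f}")
```

The query points almost along the first axis, so a.jpg ranks first and c.jpg second. A real index replaces the linear scan with a FAISS search, but the ranking principle is the same under the cosine-similarity assumption.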
Detailed usage
usage: build_index.py [-h] [--model_path MODEL_PATH] [--image_dir IMAGE_DIR] [--batch_size BATCH_SIZE] [--embeddings_output EMBEDDINGS_OUTPUT] [--index_output INDEX_OUTPUT] [--image_paths_output IMAGE_PATHS_OUTPUT]
options:
  --model_path MODEL_PATH
                        Path to the GmeQwen2VL model.
  --image_dir IMAGE_DIR
                        Path to the directory containing new images.
  --batch_size BATCH_SIZE
                        Batch size for embedding extraction.
  --embeddings_output EMBEDDINGS_OUTPUT
                        Output file for saving image embeddings.
  --index_output INDEX_OUTPUT
                        Output file for saving FAISS index.
  --image_paths_output IMAGE_PATHS_OUTPUT
                        Output file for saving image paths.

usage: retrieval_app.py [-h] [--model_path MODEL_PATH] [--image_embeddings_file IMAGE_EMBEDDINGS_FILE] [--faiss_index_file FAISS_INDEX_FILE] [--image_paths_file IMAGE_PATHS_FILE]
options:
  --model_path MODEL_PATH
                        Path to the GME model.
  --image_embeddings_file IMAGE_EMBEDDINGS_FILE
                        Path to the image embeddings file.
  --faiss_index_file FAISS_INDEX_FILE
                        Path to the FAISS index file.
  --image_paths_file IMAGE_PATHS_FILE
                        Path to the file containing image paths.        
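
The two scripts have to agree on file locations: the outputs of build_index.py become the inputs of retrieval_app.py. The sketch below shows one way to assemble both command lines from a single set of paths so they cannot drift apart. The flag names come from the usage text above. The helper functions, the file names (`embeddings.npy`, `index.faiss`, `image_paths.txt`), the `gallery` directory, and the batch size are hypothetical placeholders, not defaults of the scripts.

```python
import shlex


def build_index_cmd(model_path, image_dir, batch_size,
                    embeddings_output, index_output, image_paths_output):
    """Assemble the argument list for build_index.py."""
    return [
        "python", "build_index.py",
        "--model_path", model_path,
        "--image_dir", image_dir,
        "--batch_size", str(batch_size),
        "--embeddings_output", embeddings_output,
        "--index_output", index_output,
        "--image_paths_output", image_paths_output,
    ]


def retrieval_app_cmd(model_path, image_embeddings_file,
                      faiss_index_file, image_paths_file):
    """Assemble the argument list for retrieval_app.py."""
    return [
        "python", "retrieval_app.py",
        "--model_path", model_path,
        "--image_embeddings_file", image_embeddings_file,
        "--faiss_index_file", faiss_index_file,
        "--image_paths_file", image_paths_file,
    ]


# Hypothetical paths; pick whatever layout suits your project.
MODEL = "gme-Qwen2-VL-2B-Instruct"
EMBEDDINGS, INDEX, PATHS = "embeddings.npy", "index.faiss", "image_paths.txt"

print(shlex.join(build_index_cmd(MODEL, "gallery", 8, EMBEDDINGS, INDEX, PATHS)))
print(shlex.join(retrieval_app_cmd(MODEL, EMBEDDINGS, INDEX, PATHS)))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.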

Results

  • gallery.zip: the images used to build the database (1,131 images).
  • query.zip: sample query images (17 images).
  • Below are some test results along with their visualizations.
  • Tested with a GeForce RTX 4070 Ti SUPER (16GB) on WSL2.
Image (+Text) -> Image

I+T.mp4

Text -> Image [Chinese input]

T-zh.mp4

Text -> Image [English input]

T-en.mp4

Text (long) -> Image

LT.mp4
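
The query modes demonstrated above (text only, image only, and image plus text) suggest a small dispatch step before retrieval: choose the embedding path from whichever inputs the user supplied. The sketch below shows one way to write that step. The `Embedder` interface and its method names are hypothetical placeholders, not the GME model's API and not necessarily how retrieval_app.py handles inputs.

```python
from typing import Any, Optional, Protocol


class Embedder(Protocol):
    """Hypothetical interface; these are not the GME model's actual method names."""

    def embed_text(self, text: str) -> Any: ...

    def embed_image(self, image: Any) -> Any: ...

    def embed_fused(self, text: str, image: Any) -> Any: ...


def embed_query(embedder: Embedder,
                text: Optional[str] = None,
                image: Any = None) -> Any:
    """Pick the embedding path that matches the inputs that were provided."""
    has_text = bool(text and text.strip())
    if has_text and image is not None:
        return embedder.embed_fused(text, image)   # image + text query
    if image is not None:
        return embedder.embed_image(image)         # image-only query
    if has_text:
        return embedder.embed_text(text)           # text-only query
    raise ValueError("Provide text, an image, or both.")
```

Treating whitespace-only text as missing keeps an empty text box in a UI from turning an image query into a fused one.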

License

This project is released under the MIT License.

Citation

@misc{GME-Search,
  author = {Wei Li},
  title = {Gradio app with GME for Image Search},
  howpublished = {\url{https://github.com/BIGBALLON/GME-Search}},
  year = {2025}
}

Please create a pull request if you find any bugs or want to contribute code. 😄
