Model Zoo

The models used by Vision Search Assistant are listed below. They should be downloaded to your device automatically the first time you run the code. If the automatic download fails, you can download the weights manually from the links in each table.
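For the manual route, a minimal sketch using `huggingface_hub.snapshot_download` is shown here. The repository IDs, the `checkpoints` directory, and the helper names `local_dir_for` and `download` are assumptions made for illustration and do not come from this document; check each ID against the corresponding Weights link, and check where the project expects to find the files, before relying on it.

```python
from pathlib import Path

# ASSUMPTION: these Hugging Face repository IDs are illustrative guesses for
# three of the models named in this document. Verify them against the
# "Weights" links before use.
ASSUMED_REPOS = {
    "GroundingDINO-Tiny": "IDEA-Research/grounding-dino-tiny",
    "LLaVA-1.6 Vicuna-7B": "liuhaotian/llava-v1.6-vicuna-7b",
    "InternLM2.5-7B-Chat": "internlm/internlm2_5-7b-chat",
}


def local_dir_for(model_name: str, root: str = "checkpoints") -> Path:
    """Return the (hypothetical) local folder for a model's weights."""
    repo_id = ASSUMED_REPOS[model_name]
    # Use the last component of the repo ID as the folder name.
    return Path(root) / repo_id.split("/")[-1]


def download(model_name: str, root: str = "checkpoints") -> Path:
    """Download every file of the model's repository into its local folder."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(model_name, root)
    snapshot_download(repo_id=ASSUMED_REPOS[model_name], local_dir=str(target))
    return target


# Example call (requires `pip install huggingface_hub` and network access):
# download("GroundingDINO-Tiny")
```

Downloading into an explicit folder keeps the weights easy to locate, but whether Vision Search Assistant reads from such a folder or from the default Hugging Face cache is not stated here.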

Grounding Model

We use GroundingDINO as the grounding model.

| Model | Box AP on COCO | Weights |
| --- | --- | --- |
| GroundingDINO-Tiny | 48.4 | Huggingface |
| GroundingDINO-Base | 56.7 | Huggingface |

Vision Language Model

We use LLaVA-v1.6 as the core Vision Language Model.

| Version | LLM | LLaVA-Bench-Wild | Weights |
| --- | --- | --- | --- |
| LLaVA-1.6 | Vicuna-7B | 81.6 | Huggingface |
| LLaVA-1.6 | Vicuna-13B | 87.3 | Huggingface |
| LLaVA-1.6 | Mistral-7B | 83.2 | Huggingface |
| LLaVA-1.6 | Hermes-Yi-34B | 89.6 | Huggingface |

Searching Model

We use InternLM as the searching model.

| Model | CMMLU | Weights |
| --- | --- | --- |
| InternLM2.5-1.8B-Chat | - | Huggingface |
| InternLM2.5-7B-Chat | 78.0 | Huggingface |
| InternLM2.5-20B-Chat | - | Huggingface |