Model Zoo

The models used by Vision Search Assistant are listed below. They should be downloaded automatically to your device the first time you run the code. If the automatic download fails, you can download them manually.
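For a manual download, one option is the `huggingface_hub` Python package. The sketch below is illustrative only: the repository ids in `ASSUMED_REPOS`, the `checkpoints` folder, and the helper names `local_dir_for` and `download_all` are assumptions that do not come from this page, so check them against the Weights links in the tables and against the paths the project code actually loads from.

```python
from pathlib import Path

# Assumed Hugging Face repo ids. This page does not list them explicitly,
# so verify each one against the Weights links before relying on it.
ASSUMED_REPOS = {
    "GroundingDINO-Tiny": "IDEA-Research/grounding-dino-tiny",
    "LLaVA-1.6 (Vicuna-7B)": "liuhaotian/llava-v1.6-vicuna-7b",
    "InternLM2.5-7B-Chat": "internlm/internlm2_5-7b-chat",
}


def local_dir_for(repo_id: str, root: str = "checkpoints") -> Path:
    """Map a repo id to a local folder, e.g. checkpoints/internlm2_5-7b-chat."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = "checkpoints") -> None:
    """Download every repo in ASSUMED_REPOS into its own folder under root."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for name, repo_id in ASSUMED_REPOS.items():
        target = local_dir_for(repo_id, root)
        print(f"Downloading {name} from {repo_id} to {target}")
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all()` fetches the three checkpoints into `checkpoints/`. Whether Vision Search Assistant picks them up from that location depends on how the project resolves model paths, which this page does not specify.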

Grounding Model

We use GroundingDINO as the grounding model.

Model	Box AP on COCO	Weights
GroundingDINO-Tiny	48.4	Huggingface
GroundingDINO-Base	56.7	Huggingface

Vision Language Model

We use LLaVA-v1.6 as the core Vision Language Model.

Version	LLM	LLaVA-Bench-Wild	Weights
LLaVA-1.6	Vicuna-7B	81.6	Huggingface
LLaVA-1.6	Vicuna-13B	87.3	Huggingface
LLaVA-1.6	Mistral-7B	83.2	Huggingface
LLaVA-1.6	Hermes-Yi-34B	89.6	Huggingface

Searching Model

We use InternLM as the searching model.

Model	CMMLU	Weights
InternLM2.5-1.8B-Chat	-	Huggingface
InternLM2.5-7B-Chat	78.0	Huggingface
InternLM2.5-20B-Chat	-	Huggingface
