[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
A live list of papers on game-playing agents and large multimodal models, accompanying the survey "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
An up-to-date curated list of research papers and resources on hallucinations in state-of-the-art large vision-language models.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification" (a generic zero-shot classification sketch follows this list)
[ICLR 2024 Spotlight 🔥] [Best Paper Award SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
[ICML 2024] Official code for "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
[Under review] Assessing and Learning Alignment of Unimodal Vision and Language Models
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
A curated list of research on continual learning with pretrained models.
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Awesome Vision-Language Compositionality: a comprehensive curation of research papers from the literature.
Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models" (ECCV 2024)
Repo for the paper "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
PicQ: Demo for MiniCPM-V 2.6 to answer questions about images using natural language (a minimal chat sketch follows this list).
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
Streamlit App Combining Vision, Language, and Audio AI Models
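Several entries above concern zero-shot classification with vision-language models (e.g., the ICASSP 2025 remote-sensing work). As background, here is a minimal sketch of the standard CLIP-style zero-shot recipe: embed one text prompt per candidate class, embed the image, and pick the class whose prompt is most similar. The checkpoint, class names, prompt template, and file name are illustrative assumptions, not taken from that paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; a remote-sensing paper may fine-tune or swap the backbone.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical scene classes and prompt template for illustration only.
classes = ["forest", "airport", "harbor", "farmland"]
prompts = [f"a satellite photo of a {c}" for c in classes]

image = Image.open("scene.jpg").convert("RGB")  # hypothetical input file
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a distribution over the candidate classes.
probs = outputs.logits_per_image.softmax(dim=-1)
print(classes[probs.argmax().item()])
```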
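The PicQ entry is a demo of open-ended image question answering with MiniCPM-V 2.6. Below is a minimal sketch of that pattern following the model card's `trust_remote_code` chat interface; the image path and question are placeholders, and argument names may differ across model versions.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load MiniCPM-V 2.6 with its custom modeling code from the Hub.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6", trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)

# One user turn containing an image followed by a natural-language question.
image = Image.open("photo.jpg").convert("RGB")  # hypothetical input file
msgs = [{"role": "user", "content": [image, "What is happening in this image?"]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```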