Zetianuser/cv-arxiv-daily: 🎓 Automatically update CV papers daily using GitHub Actions (updates every 12 hours)

Updated on 2025.01.26

Table of Contents

Camouflage
In-context
VLM
Visual In-context
V-ICL

Camouflage

Publish Date	Title	Authors	PDF	Code
2025-01-22	Observation of Strong Nonreciprocal Thermal Emission	Zhenong Zhang et.al.	2501.12947	null
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-21	Library-Attack: Reverse Engineering Approach for Evaluating Hardware IP Protection	Aritra Dasgupta et.al.	2501.12292	null
2025-01-19	Green Video Camouflaged Object Detection	Xinyu Wang et.al.	2501.10914	null
2025-01-13	Toward Realistic Camouflaged Object Detection: Benchmarks and Method	Zhimeng Xin et.al.	2501.07297	link
2025-01-10	A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection	Tsui Qin Mok et.al.	2501.06038	null
2025-01-20	Tailored Thin Films: Modulating Soft Photonics with Dynamically Tunable Large Area Microstructures via Controlled Thermal Processing	Srijeeta Biswas et.al.	2501.05736	null
2025-01-02	Anti-counterfeiting tags with camouflaged QR codes on nanocavities, using polymer-dispersed-liquid-crystals	Giuseppe Nicoletta et.al.	2501.02011	null
2025-01-03	Innate behavioural mechanisms and defensive traits in ecological models of predator-prey types	Sangeeta Saha et.al.	2501.01687	null
2024-12-31	B2Net: Camouflaged Object Detection via Boundary Aware and Boundary Fusion	Junmin Cai et.al.	2501.00426	null
2025-01-15	CGCOD: Class-Guided Camouflaged Object Detection	Chenxi Zhang et.al.	2412.18977	link
2025-01-05	Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks Against GNN-Based Fraud Detectors	Jinhyeok Choi et.al.	2412.18370	link
2024-12-22	Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection	Yi Liu et.al.	2412.16840	link
2024-12-18	Novel AI Camera Camouflage: Face Cloaking Without Full Disguise	David Noever et.al.	2412.13507	null
2024-12-14	Unconstrained Salient and Camouflaged Object Detection	Zhangjun Zhou et.al.	2412.10943	null
2024-12-14	CATALOG: A Camera Trap Language-guided Contrastive Learning Model	Julian D. Santamaria et.al.	2412.10624	link
2024-12-10	CapGen: An Environment-Adaptive Generator of Adversarial Patches	Chaoqun Li et.al.	2412.07253	null
2024-12-02	Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Xiaoqi Zhao et.al.	2412.01240	null
2024-11-28	COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection	Xiaoqin Zhang et.al.	2411.18858	link
2024-11-15	Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors	Jiawei Zhou et.al.	2411.10029	null
2024-11-10	SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains	Bijoy Ahmed Saiem et.al.	2411.06426	null
2024-11-22	Financial Fraud Detection using Jump-Attentive Graph Neural Networks	Prashank Kadam et.al.	2411.05857	link
2024-10-28	TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors	Adonisz Dimitriu et.al.	2410.21443	null
2024-10-23	PlantCamo: Plant Camouflage Detection	Jinyu Yang et.al.	2410.17598	link
2024-10-22	Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Cheng Lei et.al.	2410.16953	null
2024-10-20	Lying mirror	Yuhang Li et.al.	2410.15521	null
2024-10-15	Octopus-Swimming-Like Robot with Soft Asymmetric Arms	Bobing Zhang et.al.	2410.11764	null
2024-10-05	Exploring Strengths and Weaknesses of Super-Resolution Attack in Deepfake Detection	Davide Alessandro Coccomini et.al.	2410.04205	null
2024-10-05	Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection	Dingwen Zhang et.al.	2410.03987	null
2024-09-27	When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation	Yuli Zhou et.al.	2409.18653	link
2024-09-26	CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors	Linye Lyu et.al.	2409.17963	link
2024-09-25	Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2	Chunhui Zhang et.al.	2409.16902	link
2024-09-24	Phase-space gaussian ensemble quantum camouflage	Alex E. Bernardini et.al.	2409.16377	null
2024-09-24	MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios	Jiacheng Ruan et.al.	2409.16084	link
2024-09-19	Frequency-Guided Spatial Adaptation for Camouflaged Object Detection	Shizhou Zhang et.al.	2409.12421	null
2024-09-01	NoPhish: Efficient Chrome Extension for Phishing Detection Using Machine Learning Techniques	Leand Thaqi et.al.	2409.10547	null
2024-09-15	Optimality of Motion Camouflage Under Escape Uncertainty	Mallory Gaspard et.al.	2409.09890	null
2024-09-15	GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection	Yanguang Sun et.al.	2409.09588	link
2024-09-11	Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning	Yingling Lu et.al.	2409.07238	link
2024-09-05	Active Fake: DeepFake Camouflage	Pu Sun et.al.	2409.03200	null
2024-09-04	Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation	Tiantian Zhang et.al.	2409.02567	link
2024-09-03	Frequency-Spatial Entanglement Learning for Camouflaged Object Detection	Yanguang Sun et.al.	2409.01686	link
2024-09-04	ExpoSort: Breaking the quasi-polynomial-time barrier for reluctant sorting	Mikkel Abrahamsen et.al.	2409.00794	null
2024-08-29	Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning	Luyao Tang et.al.	2408.16310	link
2024-09-21	Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection	Siyuan Yao et.al.	2408.15020	link
2024-08-26	A Survey of Camouflaged Object Detection and Beyond	Fengyang Xiao et.al.	2408.14562	link
2024-08-25	Camouflaged Object Tracking: A Benchmark	Xiaoyu Guo et.al.	2408.13877	null
2024-08-22	BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking	Hanzheng Wang et.al.	2408.12232	null
2024-08-22	Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy	Hong Zhang et.al.	2408.12086	link
2024-08-20	Just a Hint: Point-Supervised Camouflaged Object Detection	Huafeng Chen et.al.	2408.10777	null
2024-08-20	SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection	Huafeng Chen et.al.	2408.10760	null
2024-08-20	Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory	Yongxin Deng et.al.	2408.10608	null
2024-08-19	Microscopic Analysis on LLM players via Social Deduction Game	Byungjun Kim et.al.	2408.09946	null
2024-08-19	Games with Planned Actions and Scouting	Wolfgang Kuhle et.al.	2408.09778	null
2024-08-17	Depth-guided Texture Diffusion for Image Semantic Segmentation	Wei Sun et.al.	2408.09097	null
2024-08-16	SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation	Xinyu Xiong et.al.	2408.08870	link
2024-08-15	CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection	Xunfa Lai et.al.	2408.08050	null
2024-08-12	Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes	Ke Zhou et.al.	2408.05936	null
2024-08-10	SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More	Tianrun Chen et.al.	2408.04579	null
2024-08-02	PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network	Changqun Xia et.al.	2408.01137	null
2024-08-01	VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection	Fei Xiao et.al.	2408.00513	null
2024-07-31	Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2	Lv Tang et.al.	2407.21596	null
2024-08-18	Global Confidence Degree Based Graph Neural Network for Financial Fraud Detection	Jiaxun Liu et.al.	2407.17333	null
2024-07-18	Learning Camouflaged Object Detection from Noisy Pseudo Label	Jin Zhang et.al.	2407.13157	null
2024-07-18	FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection	Jianwei Zhao et.al.	2407.13133	null
2024-07-17	Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection	Zhenni Yu et.al.	2407.12339	link
2024-07-10	Edge-dominance games on graphs	Farid Arthaud et.al.	2407.07785	null
2024-07-02	Adversarial Magnification to Deceive Deepfake Detection through Super Resolution	Davide Alessandro Coccomini et.al.	2407.02670	link
2024-06-18	PFID: Privacy First Inference Delegation Framework for LLMs	Haoyan Yang et.al.	2406.12238	null
2024-06-17	YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection	Tamara R. Lenhard et.al.	2406.11641	null
2024-06-09	SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention	Muhammad Nawfal Meeran et.al.	2406.05802	link
2024-06-09	Utilizing Grounded SAM for self-supervised frugal camouflaged human detection	Matthias Pijarowski et.al.	2406.05776	null
2024-05-25	GreenCOD: A Green Camouflaged Object Detection Method	Hong-Shuo Chen et.al.	2405.16144	null
2024-05-09	Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection	Xinran Liua et.al.	2405.05614	null
2024-05-10	Honeyfile Camouflage: Hiding Fake Files in Plain Sight	Roelien C. Timmer et.al.	2405.04758	null
2024-05-07	Adaptive Guidance Learning for Camouflaged Object Detection	Zhennan Chen et.al.	2405.02824	null
2024-05-28	Spider: A Unified Framework for Context-dependent Concept Segmentation	Xiaoqi Zhao et.al.	2405.01002	link
2024-04-24	BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers	Buyun He et.al.	2404.15070	link
2024-04-18	An Overview of Electromagnetic Illusions: Empowering Smart Environments with Reconfigurable Metasurfaces	Hamidreza Taghvaee et.al.	2404.12089	null
2024-04-18	Enhance Robustness of Language Models Against Variation Attack through Graph Integration	Zi Xiong et.al.	2404.12014	null
2024-04-13	Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage	Yang Hu et.al.	2404.08936	null
2024-04-04	InsectMamba: Insect Pest Classification with State Space Model	Qianning Wang et.al.	2404.03611	null
2024-04-13	LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion	Pancheng Zhao et.al.	2404.00292	link
2024-03-21	Latent Diffusion Models for Attribute-Preserving Image Anonymization	Luca Piano et.al.	2403.14790	null
2024-03-04	Weaponization of Conscience in Cybercrime and Online Fraud: A Novel Systems Theory	Michelle Espinoza et.al.	2403.14667	null
2024-03-14	Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations	Xinyu Xiong et.al.	2403.09315	null
2024-05-04	Effectiveness Assessment of Recent Large Vision-Language Models	Yao Jiang et.al.	2403.04306	null
2024-03-04	Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection	Xin Zhang et.al.	2403.01968	null
2024-02-29	A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection	Chao Hao et.al.	2402.18922	link
2024-02-28	Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond	Ziyun Yang et.al.	2402.18698	null
2024-02-28	Living-off-The-Land Reverse-Shell Detection by Informed Data Augmentation	Dmitrijs Trizna et.al.	2402.18329	null
2024-02-24	RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation	Jiawei Zhou et.al.	2402.15853	link
2024-02-21	Flexible Physical Camouflage Generation Based on a Differential Approach	Yang Li et.al.	2402.13575	null
2024-02-15	Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks	Álvaro Huertas-García et.al.	2402.09874	null
2024-02-16	Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues	Zhiyuan Chang et.al.	2402.09091	null
2024-02-03	CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse	Cunhan Guo et.al.	2402.02217	null
2024-01-29	The Reasoning Under Uncertainty Trap: A Structural AI Risk	Toby D. Pilditch et.al.	2402.01743	null
2024-01-30	Camouflage Adversarial Attacks on Multiple Agent Systems	Ziqing Lu et.al.	2401.17405	null
2024-01-22	Concealed Object Segmentation with Hierarchical Coherence Modeling	Fengyang Xiao et.al.	2401.11767	null
2024-01-17	The problem of optimal camouflaging	Alexander Plakhov et.al.	2401.08928	null
2024-01-16	Localised Thermal Emission from Topological Interfaces	M. Said Ergoktas et.al.	2401.08316	null
2024-01-07	Dynamic Multi Color Switching using Ultrathin Vanadium Oxide on Aluminium based Asymmetric Fabry-Perot Resonant Structure	Shubhangi Saini et.al.	2401.03543	null
2024-01-02	Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector	Jitao Ma et.al.	2401.01093	null
2023-12-30	TPatch: A Triggered Physical Adversarial Patch	Wenjun Zhu et.al.	2401.00148	link
2023-12-29	Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation	Tuan-Anh Vu et.al.	2312.17505	null
2024-01-12	MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World	Zheng Zhou et.al.	2312.17431	null
2023-12-27	Natural Adversarial Patch Generation Method Based on Latent Diffusion Model	Xianyi Chen et.al.	2312.16401	null
2023-12-18	Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects	Jian Hu et.al.	2312.07374	link
2023-12-06	Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation	Haojie Zhang et.al.	2312.03502	link
2023-12-06	Antibody-loading of biological nanocarrier vesicles derived from red-blood-cell membranes	Maryam Sanaee et.al.	2312.03417	null
2023-11-28	Large Model Based Referring Camouflaged Object Detection	Shupeng Cheng et.al.	2311.17122	null
2023-11-28	Cross-level Attention with Overlapped Windows for Camouflaged Object Detection	Jiepan Li et.al.	2311.16618	null
2023-11-25	VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning	Ziyang Luo et.al.	2311.15011	link
2023-11-19	Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens	Lv Tang et.al.	2311.11273	link
2023-11-19	Open-Vocabulary Camouflaged Object Segmentation	Youwei Pang et.al.	2311.11241	link
2023-11-15	Infrared thermochromic antenna composite for self-adaptive thermoregulation	Francisco V. Ramirez-Cuevas et.al.	2311.08633	null
2023-11-10	Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16	T. T Lemani et.al.	2311.05981	null

In-context

Publish Date	Title	Authors	PDF	Code
2025-01-23	EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents	Yuhui Yun et.al.	2501.13746	null
2025-01-21	Compositional Instruction Following with Language Models and Reinforcement Learning	Vanya Cohen et.al.	2501.12539	null
2025-01-21	CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification	Cristiano Patrício et.al.	2501.12266	null
2025-01-21	Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs	Saiful Haq et.al.	2501.11833	null
2025-01-20	Trojan Detection Through Pattern Recognition for Large Language Models	Vedant Bhasin et.al.	2501.11621	null
2025-01-19	AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model	Lipeng Ma et.al.	2501.11031	link
2025-01-18	Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments	Hongjin Su et.al.	2501.10893	null
2025-01-18	Visual RAG: Expanding MLLM visual knowledge without fine-tuning	Mirco Bonomo et.al.	2501.10834	null
2025-01-18	GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems	Amin Robatian et.al.	2501.10734	null
2025-01-17	Tabular-TX: Theme-Explanation Structure-based Table Summarization via In-Context Learning	TaeYoon Kwack et.al.	2501.10487	null
2025-01-16	Confidence Estimation for Error Detection in Text-to-SQL Systems	Oleg Somov et.al.	2501.09527	null
2025-01-16	Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval	Jesus Lovon et.al.	2501.09384	null
2025-01-16	A Study of In-Context-Learning-Based Text-to-SQL Errors	Jiawei Shen et.al.	2501.09310	link
2025-01-16	Perspective Transition of Large Language Models for Solving Subjective Tasks	Xiaolong Wang et.al.	2501.09265	null
2025-01-16	Task Vectors in In-Context Learning: Emergence, Formation, and Benefit	Liu Yang et.al.	2501.09240	null
2025-01-15	Exploring Task-Level Optimal Prompts for Visual In-Context Learning	Yan Zhu et.al.	2501.08841	null
2025-01-15	Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning	Alain Komaty et.al.	2501.08799	null
2025-01-15	The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities	Irina Bigoulaeva et.al.	2501.08716	link
2025-01-13	SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models	Fabien Bernier et.al.	2501.07639	null
2025-01-13	Enhancing Retrieval-Augmented Generation: A Study of Best Practices	Siran Li et.al.	2501.07391	link
2025-01-13	Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models	Yongyu Mu et.al.	2501.07086	link
2025-01-12	An efficient approach to represent enterprise web application structure using Large Language Model in the service of Intelligent Quality Engineering	Zaber Al Hassan Ayon et.al.	2501.06837	null
2025-01-09	What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning	Jelena Bratulić et.al.	2501.06256	null
2025-01-09	Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding	Mohammed Elhenawy et.al.	2501.05566	null
2025-01-08	Efficient and Responsible Adaptation of Large Language Models for Robust and Equitable Top-k Recommendations	Kirandeep Kaur et.al.	2501.04762	null
2025-01-08	ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training	Xinfa Zhu et.al.	2501.04416	null
2025-01-09	More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives	Xiaoqing Zhang et.al.	2501.04070	link
2025-01-08	A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval	Shuo Tong et.al.	2501.03295	null
2025-01-06	BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning	Beichen Zhang et.al.	2501.03226	link
2025-01-06	Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text	Ali Al-Lawati et.al.	2501.03166	link
2025-01-03	Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models	Lei Tang et.al.	2501.01679	null
2025-01-01	Unraveling Indirect In-Context Learning Using Influence Functions	Hadi Askari et.al.	2501.01473	null
2025-01-05	Learning Spectral Methods by Transformers	Yihan He et.al.	2501.01312	null
2025-01-02	Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction	Alexander Brinkmann et.al.	2501.01237	link
2025-01-02	ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning	Wonduk Seo et.al.	2501.01031	null
2024-12-31	Robust and Adaptive Optimization under a Large Language Model Lens	Dimitris Bertsimas et.al.	2501.00568	null
2024-12-31	SPDZCoder: Teaching LLMs to Synthesize Privacy Computing Code without Massive Training Data	Xiaoning Dong et.al.	2501.00363	null
2024-12-29	ICLR: In-Context Learning of Representations	Core Francisco Park et.al.	2501.00070	null
2024-12-29	Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection	Dmitri Roussinov et.al.	2412.20595	link
2024-12-29	Towards Neural No-Resource Language Translation: A Comparative Evaluation of Approaches	Madhavendra Thakur et.al.	2412.20584	null
2024-12-27	TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data	Xiang Huang et.al.	2412.19544	link
2024-12-27	Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs	Zhe Yang et.al.	2412.19513	link
2024-12-26	SILC-EFSA: Self-aware In-context Learning Correction for Entity-level Financial Sentiment Analysis	Senbin Zhu et.al.	2412.19140	link
2024-12-26	SketchFill: Sketch-Guided Code Generation for Imputing Derived Missing Values	Yunfan Zhang et.al.	2412.19113	null
2024-12-26	Let the Rule Speak: Enhancing In-context Learning Debiasing with Interpretability	Ruixi Lin et.al.	2412.19018	null
2024-12-30	TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization	Yucong Luo et.al.	2412.18185	null
2024-12-24	Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner	Aizierjiang Aiersilan et.al.	2412.18086	link
2024-12-23	The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting	Shuzhang Cai et.al.	2412.17891	null
2024-12-22	SAIL: Sample-Centric In-Context Learning for Document Information Extraction	Jinyu Zhang et.al.	2412.17092	link
2024-12-22	PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask	Jeongho Kim et.al.	2412.16978	link
2024-12-22	Revisiting In-Context Learning with Long Context Language Models	Jinheon Baek et.al.	2412.16926	null
2024-12-21	Dynamical Behaviors of the Gradient Flows for In-Context Learning	Songtao Lu et.al.	2412.16683	null
2024-12-21	Learning Cross-Task Generalities Across Graphs via Task-trees	Zehong Wang et.al.	2412.16441	null
2024-12-20	Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?	Mengyu Ye et.al.	2412.15628	null
2024-12-20	Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification	Gyutae Park et.al.	2412.15603	null
2024-12-20	In-context Continual Learning Assisted by an External Continual Learner	Saleh Momeni et.al.	2412.15563	null
2024-12-19	Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models	Nishtha N. Vaidya et.al.	2412.15309	null
2024-12-19	LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks	Yushi Bai et.al.	2412.15204	link
2024-12-19	Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture	Thomas F Burns et.al.	2412.15113	link
2024-12-19	MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance	Hallee E. Wong et.al.	2412.15058	null
2024-12-19	DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis	Hongling Xu et.al.	2412.14849	link
2024-12-19	Relational Programming with Foundation Models	Ziyang Li et.al.	2412.14515	null
2024-12-18	LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning	Yansheng Mao et.al.	2412.13626	null
2024-12-17	In-context learning for medical image segmentation	Eichi Takaya et.al.	2412.13299	null
2024-12-17	In-Context Learning Distillation for Efficient Few-Shot Fine-Tuning	Yifei Duan et.al.	2412.13243	null
2024-12-17	Jailbreaking? One Step Is Enough!	Weixiong Zheng et.al.	2412.12621	null
2024-12-17	Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL	Geling Liu et.al.	2412.12522	null
2024-12-16	Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering	Jinhe Bi et.al.	2412.12359	link
2024-12-18	Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers	Seungwook Han et.al.	2412.12276	null
2024-12-16	Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning	Yuti Liu et.al.	2412.11952	null
2024-12-16	PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection	Sepideh Mamooler et.al.	2412.11923	null
2024-12-16	PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Kun Ouyang et.al.	2412.11906	null
2024-12-16	A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection	Simon Hachmeier et.al.	2412.11851	link
2024-12-16	ColorFlow: Retrieval-Augmented Image Sequence Colorization	Junhao Zhuang et.al.	2412.11815	null
2024-12-16	Embodied CoT Distillation From LLM To Off-the-shelf Agents	Wonje Choi et.al.	2412.11499	null
2024-12-16	Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory	Shuo Wang et.al.	2412.11459	null
2024-12-15	HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation	Tengfei Liu et.al.	2412.11070	link
2024-12-14	Can LLMs Help Create Grammar?: Automating Grammar Creation for Endangered Languages with In-Context Learning	Piyapath T Spencer et.al.	2412.10960	null
2024-12-13	ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL	Yang Qin et.al.	2412.10138	link
2024-12-13	CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models	Zhihao Du et.al.	2412.10117	link
2024-12-13	RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector	Zhensheng Wang et.al.	2412.10104	link
2024-12-12	A Systematic Review of Knowledge Tracing and Large Language Models in Education: Opportunities, Issues, and Future Research	Yongwan Cho et.al.	2412.09248	null
2024-12-12	Align, Generate, Learn: A Novel Closed-Loop Framework for Cross-Lingual In-Context Learning	Mateo Alejandro Rojas et.al.	2412.08955	null
2024-12-11	In-Context Learning with Topological Information for Knowledge Graph Completion	Udari Madhushani Sehwag et.al.	2412.08742	null
2024-12-11	Fast Prompt Alignment for Text-to-Image Generation	Khalil Mrini et.al.	2412.08639	link
2024-12-11	Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages	Ashutosh Bajpai et.al.	2412.08090	link
2024-12-11	Using Large Language Models for Parametric Shape Optimization	Xinxin Zhang et.al.	2412.08072	null
2024-12-11	Federated In-Context LLM Agent Learning	Panlong Wu et.al.	2412.08054	null
2024-12-10	DRUM: Learning Demonstration Retriever for Large MUlti-modal Models	Ellen Yi-Ge et.al.	2412.07619	null
2024-12-09	A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension	Saahith Janapati et.al.	2412.06245	null
2024-12-08	Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective	Andrew Jesson et.al.	2412.06033	null
2024-12-07	PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks	Soumya Suvra Ghosal et.al.	2412.05710	null
2024-12-07	On the effective transfer of knowledge from English to Hindi Wikipedia	Paramita Das et.al.	2412.05708	null
2024-12-06	A text-to-tabular approach to generate synthetic patient data using LLMs	Margaux Tornqvist et.al.	2412.05153	link
2024-12-06	REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments	Kaustubh Sridhar et.al.	2412.04759	null
2024-12-05	Improving LLM Group Fairness on Tabular Data via In-Context Learning	Valeriia Cherepanova et.al.	2412.04642	null
2024-12-05	Demonstration Selection for In-Context Learning via Reinforcement Learning	Xubin Wang et.al.	2412.03966	null
2024-12-09	The broader spectrum of in-context learning	Andrew Kyle Lampinen et.al.	2412.03782	null
2024-12-04	Intent-driven In-context Learning for Few-shot Dialogue State Tracking	Zihao Yi et.al.	2412.03270	null
2024-12-03	Minimization of Boolean Complexity in In-Context Concept Learning	Leroy Z. Wang et.al.	2412.02823	null
2024-12-03	CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++?	Vaishnavi Bhargava et.al.	2412.02735	null
2024-12-03	A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis	Changzhi Zhou et.al.	2412.02279	null
2024-12-03	Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs	Zixuan Hu et.al.	2412.02220	null
2024-12-03	VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding	Kangsan Kim et.al.	2412.02186	link
2024-12-02	X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models	Zeyi Sun et.al.	2412.01824	link
2024-12-02	Can Large Language Models Serve as Evaluators for Code Summarization?	Yang Wu et.al.	2412.01333	link
2024-12-02	RL2: Reinforce Large Language Model to Assist Safe Reinforcement Learning for Energy Management of Active Distribution Networks	Xu Yang et.al.	2412.01303	null
2024-12-03	CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search	Kaixin Wu et.al.	2412.01269	null
2024-12-02	Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Xiaoqi Zhao et.al.	2412.01240	null
2024-12-03	Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation	Bolin Lai et.al.	2412.01027	null
2024-12-01	Competition Dynamics Shape Algorithmic Phases of In-Context Learning	Core Francisco Park et.al.	2412.01003	link
2024-11-29	In-Context Learning with Noisy Labels	Junyong Kang et.al.	2411.19581	null
2024-11-29	KV Shifting Attention Enhances Language Modeling	Mingyu Xu et.al.	2411.19574	link
2024-11-28	ICLERB: In-Context Learning Embedding and Reranker Benchmark	Marie Al Ghossein et.al.	2411.18947	null
2024-11-27	Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS	Jinyang Wu et.al.	2411.18478	null
2024-11-27	Curriculum Demonstration Selection for In-Context Learning	Duc Anh Vu et.al.	2411.18126	null
2024-11-26	On the ERM Principle in Meta-Learning	Yannay Alon et.al.	2411.17898	null
2024-11-26	MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation	Harsh Singh et.al.	2411.17636	null
2024-11-26	"Stupid robot, I want to speak to a human!" User Frustration Detection in Task-Oriented Dialog Systems	Mireia Hernandez Caralt et.al.	2411.17437	null
2024-11-26	Using Large Language Models for Expert Prior Elicitation in Predictive Modelling	Alexander Capstick et.al.	2411.17284	link
2024-11-27	MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing	Feifei Shao et.al.	2411.16773	null
2024-11-25	Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training	Weimin Wu et.al.	2411.16549	null
2024-11-25	Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain	Hangyul Yoon et.al.	2411.16123	link
2024-11-24	Can a Large Language Model Learn Matrix Functions In Context?	Paimon Goulart et.al.	2411.15675	link
2024-11-23	Multi-label Sequential Sentence Classification via Large Language Model	Mengfei Lan et.al.	2411.15623	link
2024-11-23	From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars	Albert Kornilov et.al.	2411.15577	link
2024-11-23	From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set	Mara Finkelstein et.al.	2411.15387	null
2024-11-22	There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks	Miguel Espinosa et.al.	2411.15288	link
2024-11-22	Optimizing Social Media Annotation of HPV Vaccine Skepticism and Misinformation Using Large Language Models: An Experimental Evaluation of In-Context Learning and Fine-Tuning Stance Detection Across Multiple Models	Luhang Sun et.al.	2411.14720	null
2024-11-20	Leveraging Prior Experience: An Expandable Auxiliary Knowledge Base for Text-to-SQL	Zhibo Chu et.al.	2411.13244	link
2024-11-19	Instant Policy: In-Context Imitation Learning via Graph Diffusion	Vitalis Vosylius et.al.	2411.12633	null
2024-11-22	SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization	Hongrui Jia et.al.	2411.11909	link
2024-11-18	LaVin-DiT: Large Vision Diffusion Transformer	Zhaoqing Wang et.al.	2411.11505	null
2024-11-18	Re-examining learning linear functions in context	Omar Naim et.al.	2411.11465	null
2024-11-18	ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification	Son T. Luu et.al.	2411.11247	link
2024-11-17	AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers	Jake Grigsby et.al.	2411.11188	link
2024-11-17	Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering	Zeping Yu et.al.	2411.10950	link
2024-11-16	SPICA: Retrieving Scenarios for Pluralistic In-Context Alignment	Quan Ze Chen et.al.	2411.10912	null
2024-11-16	One-Layer Transformer Provably Learns One-Nearest Neighbor In Context	Zihao Li et.al.	2411.10830	null
2024-11-16	IntentGPT: Few-shot Intent Discovery with Large Language Models	Juan A. Rodriguez et.al.	2411.10670	null
2024-11-15	Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data	Kai Helli et.al.	2411.10634	null
2024-11-15	Does Prompt Formatting Have Any Impact on LLM Performance?	Jia He et.al.	2411.10541	null
2024-11-15	Zero-shot Voice Conversion with Diffusion Transformers	Songting Liu et.al.	2411.09943	link
2024-11-14	Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models	Kirill Vasilevski et.al.	2411.09837	null
2024-11-14	StreamAdapter: Efficient Test Time Adaptation from Contextual Streams	Dilxat Muhtar et.al.	2411.09289	null
2024-11-13	XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	Yingqi Gao et.al.	2411.08599	link
2024-11-13	Towards Optimizing a Retrieval Augmented Generation using Large Language Model on Academic Data	Anum Afzal et.al.	2411.08438	null
2024-11-12	Decision Feedback In-Context Symbol Detection over Block-Fading Channels	Li Fan et.al.	2411.07600	null
2024-11-11	Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks	Madeline Brumley et.al.	2411.07213	null
2024-11-11	Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Kaijian Zou et.al.	2411.07130	null
2024-11-11	Universal Response and Emergence of Induction in LLMs	Niclas Luick et.al.	2411.07071	null
2024-11-10	In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages	Joseph Gatto et.al.	2411.06549	link
2024-11-10	One controller to rule them all	Riccardo Busetto et.al.	2411.06482	null
2024-11-09	A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization	Haoxin Liu et.al.	2411.06018	null
2024-11-08	Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass	Tong Chen et.al.	2411.05877	null
2024-11-14	SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark	Sithursan Sivasubramaniam et.al.	2411.05521	link
2024-11-08	WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning	Xiangyu Zhao et.al.	2411.05420	null
2024-11-07	Adversarial Robustness of In-Context Learning in Transformers for Linear Regression	Usman Anwar et.al.	2411.05189	null
2024-11-07	Vision Language Models are In-Context Value Learners	Yecheng Jason Ma et.al.	2411.04549	null
2024-11-06	Enhancing Security Control Production With Generative AI	Chen Ling et.al.	2411.04284	null
2024-11-06	Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences	Niklas Schmidinger et.al.	2411.04165	link
2024-11-06	Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval	Davide Buoso et.al.	2411.04006	null
2024-11-06	Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks	Ryan Campbell et.al.	2411.03945	link
2024-11-06	EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning	Kiran Purohit et.al.	2411.03877	link
2024-11-06	From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond	Harsha Nori et.al.	2411.03590	null
2024-11-05	Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature	Viviane Torres da Silva et.al.	2411.03484	null
2024-11-05	LLMs for Domain Generation Algorithm Detection	Reynier Leyva La O et.al.	2411.03307	null
2024-11-05	Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation	Francisco Giral et.al.	2411.02975	null
2024-11-05	Mixtures of In-Context Learners	Giwon Hong et.al.	2411.02830	null
2024-11-04	Fair In-Context Learning via Latent Concept Variables	Karuna Bhaila et.al.	2411.02671	link
2024-11-04	TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos	Leonardo Plini et.al.	2411.02570	link
2024-11-04	TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives	Maitreya Patel et.al.	2411.02545	null
2024-11-04	Pretrained transformer efficiently learns low-dimensional target functions in-context	Kazusato Oko et.al.	2411.02544	null
2024-11-04	Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages	Hoang Nguyen et.al.	2411.02398	null
2024-11-04	Defining and Evaluating Physical Safety for Large Language Models	Yung-Chen Tang et.al.	2411.02317	null
2024-11-04	Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning	Dake Bu et.al.	2411.02199	null
2024-11-04	Shortcut Learning in In-Context Learning: A Survey	Rui Song et.al.	2411.02018	null
2024-11-04	N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs	Ilya Zisman et.al.	2411.01958	null
2024-11-03	Robust Neural Processes for Noisy Data	Chen Shapira et.al.	2411.01670	null
2024-11-01	Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization	Zeyuan Ma et.al.	2411.00625	link
2024-11-01	STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing	Jiaru Zou et.al.	2411.00387	null
2024-10-31	In-Context Learned Equalization in Cell-Free Massive MIMO via State-Space Models	Zihang Song et.al.	2410.23882	null
2024-10-31	Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?	Zhanke Zhou et.al.	2410.23856	link
2024-10-31	What is Wrong with Perplexity for Long-context Language Modeling?	Lizhe Fang et.al.	2410.23771	link
2024-10-31	Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs	Shuyang Yu et.al.	2410.23605	null
2024-11-01	Large Language Models for Patient Comments Multi-Label Classification	Hajar Sakai et.al.	2410.23528	null
2024-10-30	EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning	Peide Huang et.al.	2410.23234	null
2024-10-30	Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning	Keqin Bao et.al.	2410.23136	link
2024-10-30	Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning	Dong Shu et.al.	2410.23099	link
2024-10-30	Toward Understanding In-context vs. In-weight Learning	Bryan Chan et.al.	2410.23042	null
2024-10-29	Improving In-Context Learning with Small Language Model Ensembles	M. Mehdi Mojarradi et.al.	2410.21868	link
2024-10-29	On the Role of Depth and Looping for In-Context Learning with Task Diversity	Khashayar Gatmiry et.al.	2410.21698	null
2024-10-28	CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity	Yutong Cheng et.al.	2410.21060	null
2024-10-28	Matryoshka: Learning to Drive Black-Box LLMs with LLMs	Changhao Li et.al.	2410.20749	null
2024-10-27	What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration	Libo Qin et.al.	2410.20482	null
2024-10-27	Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications	Xilun Zhang et.al.	2410.20357	null
2024-10-26	DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning	Xinyu Tang et.al.	2410.20215	link
2024-10-26	RARe: Retrieval Augmented Retrieval with In-Context Examples	Atula Tejaswi et.al.	2410.20088	link
2024-10-25	SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies	Weiqin Chen et.al.	2410.19982	null
2024-10-24	Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models	Yue Li et.al.	2410.19195	null
2024-10-24	Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code	Jipeng Zhang et.al.	2410.18957	null
2024-10-24	GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning	Rita Ramos et.al.	2410.18702	null
2024-10-23	TabDPT: Scaling Tabular Foundation Models	Junwei Ma et.al.	2410.18164	link
2024-10-23	Scaling Diffusion Language Models via Adaptation from Autoregressive Models	Shansan Gong et.al.	2410.17891	link
2024-10-23	Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks	Paul Smolensky et.al.	2410.17498	null
2024-10-22	In Context Learning and Reasoning for Symbolic Regression with Large Language Models	Samiha Sharlin et.al.	2410.17448	link
2024-10-22	Interpreting Affine Recurrence Learning in GPT-style Transformers	Samarth Bhargav et.al.	2410.17438	null
2024-10-22	Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods	Tsachi Blau et.al.	2410.17222	null
2024-10-22	Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models	Zhijie Tan et.al.	2410.16983	null
2024-10-21	Can Transformers In-Context Learn Behavior of a Linear Dynamical System?	Usman Akram et.al.	2410.16546	null
2024-10-21	A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration	Yingqian Cui et.al.	2410.16540	null
2024-10-21	Bayesian scaling laws for in-context learning	Aryaman Arora et.al.	2410.16531	link
2024-10-21	Analyzing Context Contributions in LLM-based Machine Translation	Emmanouil Zaranis et.al.	2410.16246	null
2024-10-21	CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning	Kumar Manas et.al.	2410.16207	null
2024-10-20	How Aligned are Generative Models to Humans in High-Stakes Decision-Making?	Sarah Tan et.al.	2410.15471	null
2024-10-20	BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression	Yuankai Li et.al.	2410.15277	link
2024-10-18	Provable In-context Learning for Mixture of Linear Regressions using Transformers	Yanhao Jin et.al.	2410.14183	null
2024-10-18	LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems	Nan Xu et.al.	2410.14166	null
2024-10-17	In-context learning and Occam's razor	Eric Elmoznino et.al.	2410.14086	link
2024-10-17	Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection	Chuhong Mai et.al.	2410.14049	null
2024-10-17	Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles	Xiao Pu et.al.	2410.14042	null
2024-10-17	Personalized Adaptation via In-Context Preference Learning	Allison Lau et.al.	2410.14001	null
2024-10-17	On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery	Renpu Liu et.al.	2410.13981	null
2024-10-18	BenTo: Benchmark Task Reduction with In-Context Transferability	Hongyu Zhao et.al.	2410.13804	link
2024-10-18	Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors	Georgios Chochlakis et.al.	2410.13776	null
2024-10-17	MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs	Andreas Opedal et.al.	2410.13502	null
2024-10-17	Repetition Neurons: How Do Language Models Produce Repetitions?	Tatsuya Hiraoka et.al.	2410.13497	null
2024-10-17	Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models	Yu Yuan et.al.	2410.13343	null
2024-10-17	Retrieval-Enhanced Named Entity Recognition	Enzo Shiraishi et.al.	2410.13118	null
2024-10-16	MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization	Ruiqi Li et.al.	2410.12957	null
2024-10-16	Context-Scaling versus Task-Scaling in In-Context Learning	Amirhesam Abedsoltan et.al.	2410.12783	null
2024-10-16	In-Context Learning Enables Robot Action Prediction in LLMs	Yida Yin et.al.	2410.12782	null
2024-10-16	A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning	Yuanning Cui et.al.	2410.12288	link
2024-10-16	Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection	Yong Xie et.al.	2410.12278	null
2024-10-16	Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree	Harbani Jaggi et.al.	2410.12217	null
2024-10-15	Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning	Fengyu Gao et.al.	2410.12085	null
2024-10-15	Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability	Tsz Ting Chung et.al.	2410.11786	null
2024-10-15	On the Training Convergence of Transformers for In-Context Classification	Wei Shen et.al.	2410.11778	null
2024-10-15	Zero-shot Model-based Reinforcement Learning using Large Language Models	Abdelhakim Benechehab et.al.	2410.11711	link
2024-10-15	State-space models can learn in-context by gradient descent	Neeraj Mohan Sushma et.al.	2410.11687	null
2024-10-15	BSM: Small but Powerful Biological Sequence Model for Genes and Proteins	Weixi Xiang et.al.	2410.11499	null
2024-10-16	How Transformers Implement Induction Heads: Approximation and Optimization Analysis	Mingze Wang et.al.	2410.11474	null
2024-10-15	Instructive Code Retriever: Learn from Large Language Model's Feedback for Code Intelligence Tasks	Jiawei Lu et.al.	2410.11300	link
2024-10-15	Cognitive Overload Attack: Prompt Injection for Long Context	Bibek Upadhayay et.al.	2410.11272	link
2024-10-15	Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent	Bo Chen et.al.	2410.11268	null
2024-10-15	In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions	Alireza Shamshiri et.al.	2410.11265	null
2024-10-15	SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers	Enze Xie et.al.	2410.10629	null
2024-10-14	Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification?	Gabriel Roccabruna et.al.	2410.10476	link
2024-10-14	KBLaM: Knowledge Base augmented Language Model	Xi Wang et.al.	2410.10450	null
2024-10-14	Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement	Joseph Shtok et.al.	2410.10348	null
2024-10-14	Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies	Jiajie Yu et.al.	2410.10212	null
2024-10-14	Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning	Chengsong Huang et.al.	2410.10074	link
2024-10-13	Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models	Chengshuai Shi et.al.	2410.09701	null
2024-10-13	Can In-context Learning Really Generalize to Out-of-distribution Tasks?	Qixun Wang et.al.	2410.09695	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-12	Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study	Pengfei He et.al.	2410.09411	null
2024-10-11	On-Chip Learning via Transformer In-Context Learning	Jan Finkbeiner et.al.	2410.08711	null
2024-10-11	StraGo: Harnessing Strategic Guidance for Prompt Optimization	Yurong Wu et.al.	2410.08601	null
2024-10-10	SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation	Guanhua Zhang et.al.	2410.08356	null
2024-10-10	Metalic: Meta-Learning In-Context with Protein Language Models	Jacob Beck et.al.	2410.08355	null
2024-10-10	Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?	Khashayar Gatmiry et.al.	2410.08292	null
2024-10-10	Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning	David D. Baek et.al.	2410.08255	null
2024-10-10	Uncovering Overfitting in Large Language Model Editing	Mengqi Zhang et.al.	2410.07819	null
2024-10-10	Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data	Can Wang et.al.	2410.07737	link
2024-10-10	DemoShapley: Valuation of Demonstrations for In-Context Learning	Shan Xie et.al.	2410.07523	null
2024-10-09	Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning	Abhinav Bandari et.al.	2410.07461	link
2024-10-09	Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning	Zhengyu Hu et.al.	2410.07074	null
2024-10-09	Retrieval-Augmented Decision Transformer: External Memory for In-context RL	Thomas Schmied et.al.	2410.07071	link
2024-10-09	Generative Model for Less-Resourced Language with 1 billion parameters	Domen Vreš et.al.	2410.06898	null
2024-10-10	Mind Your Questions! Towards Backdoor Attacks on Text-to-Visualization Models	Shuaimin Li et.al.	2410.06782	null
2024-10-09	Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?	Fumiya Uchiyama et.al.	2410.06735	link
2024-10-09	Tree of Problems: Improving structured problem solving with compositionality	Armel Zebaze et.al.	2410.06634	link
2024-10-09	MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data	Mingu Kang et.al.	2410.06442	null
2024-10-08	Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?	Shenbin Qian et.al.	2410.06338	link
2024-10-08	The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning	Xiyan Fu et.al.	2410.06272	link
2024-10-08	ConML: A Universal Meta-Learning Framework with Task-Level Contrastive Learning	Shiguang Wu et.al.	2410.05975	null
2024-10-07	Differential Transformer	Tianzhu Ye et.al.	2410.05258	link
2024-10-07	Density estimation with LLMs: a geometric investigation of in-context learning trajectories	Toni J. B. Liu et.al.	2410.05218	null
2024-10-08	A Simple Image Segmentation Framework via In-Context Examples	Yang Liu et.al.	2410.04842	link
2024-10-07	Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning	Qingyu Yin et.al.	2410.04691	link
2024-10-06	GAMformer: In-Context Learning for Generalized Additive Models	Andreas Mueller et.al.	2410.04560	null
2024-10-06	Revisiting In-context Learning Inference Circuit in Large Language Models	Hakaze Cho et.al.	2410.04468	null
2024-10-06	Inference Scaling for Long-Context Retrieval Augmented Generation	Zhenrui Yue et.al.	2410.04343	null
2024-10-05	Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning	Gang Liu et.al.	2410.04223	link
2024-10-04	PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models	Lemei Zhang et.al.	2410.03905	link
2024-10-08	Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs	Louis Serrano et.al.	2410.03437	null
2024-10-04	Enhanced Transformer architecture for in-context learning of dynamical systems	Matteo Rufolo et.al.	2410.03291	null
2024-10-04	Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models	Yuxiang Zhang et.al.	2410.03212	null
2024-10-04	Generating bilingual example sentences with large language models as lexicography assistants	Raphael Merx et.al.	2410.03182	link
2024-10-04	In-context Learning in Presence of Spurious Correlations	Hrayr Harutyunyan et.al.	2410.03140	link
2024-10-04	On Unsupervised Prompt Learning for Classification with Black-box Language Models	Zhen-Yu Zhang et.al.	2410.03124	null
2024-10-04	RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning	Zihao Zhao et.al.	2410.03122	link
2024-10-03	Demonstration Attack against In-Context Learning for Code Intelligence	Yifei Ge et.al.	2410.02841	null
2024-10-03	ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI	Ahmad Elawady et.al.	2410.02751	link
2024-10-04	IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models	Tuo An et.al.	2410.02429	null
2024-10-04	Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation	Muzhi Zhu et.al.	2410.02369	link
2024-10-03	Simplicity bias and optimization threshold in two-layer ReLU networks	Etienne Boursier et.al.	2410.02348	null
2024-10-03	Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference	Wei Cheng et.al.	2410.02210	null
2024-10-03	GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning	Jiale Fu et.al.	2410.02203	null
2024-10-03	Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis	Hongkang Li et.al.	2410.02167	null
2024-10-02	Intent Detection in the Age of LLMs	Gaurav Arora et.al.	2410.01627	null
2024-10-02	ENTP: Encoder-only Next Token Prediction	Ethan Ewer et.al.	2410.01600	null
2024-10-02	Bayes' Power for Explaining In-Context Learning Generalizations	Samuel Müller et.al.	2410.01565	link
2024-10-02	In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks	Dingzirui Wang et.al.	2410.01548	link
2024-10-02	Disentangling Latent Shifts of In-Context Learning Through Self-Training	Josip Jukić et.al.	2410.01508	null
2024-10-02	SecCoder: Towards Generalizable and Robust Secure Code Generation	Boyu Zhang et.al.	2410.01488	null
2024-10-02	Agent-Driven Large Language Models for Mandarin Lyric Generation	Hong-Hsiang Liu et.al.	2410.01450	null
2024-10-02	Unveiling Language Skills under Circuits	Hang Chen et.al.	2410.01334	link
2024-10-03	Mitigating Copy Bias in In-Context Learning through Neuron Pruning	Ameen Ali et.al.	2410.01288	null
2024-10-02	Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models	Can Demircan et.al.	2410.01280	null
2024-09-30	Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments	Mohamed Elnoor et.al.	2409.20445	null
2024-09-30	PersonalLLM: Tailoring LLMs to Individual Preferences	Thomas P. Zollo et.al.	2409.20296	link
2024-09-30	TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks	Areeg Fahad Rasheed et.al.	2409.20189	link
2024-09-30	Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models	Luohe Shi et.al.	2409.20181	null
2024-09-30	Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis	Luka Andrenšek et.al.	2409.20054	null
2024-09-29	Efficient Long-Form Speech Recognition for General Speech In-Context Learning	Hao Yen et.al.	2409.19757	null
2024-10-02	T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition	Chen Yeh et.al.	2409.19734	link
2024-09-26	AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models	Xin Hong et.al.	2409.18339	null
2024-09-26	Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion	Hengrui Gu et.al.	2409.17928	link
2024-09-25	Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?	Bowen Zhao et.al.	2409.17080	link
2024-09-26	Enhancing Post-Hoc Attributions in Long Document Comprehension via Coarse Grained Answer Decomposition	Pritika Ramu et.al.	2409.17073	null
2024-09-25	A Few Hypocrites: Few-Shot Learning and Subtype Definitions for Detecting Hypocrisy Accusations in Online Climate Change Debates	Paulina Garcia Corral et.al.	2409.16807	null
2024-09-24	Do the Right Thing, Just Debias! Multi-Category Bias Mitigation Using LLMs	Amartya Roy et.al.	2409.16371	null
2024-09-26	In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations	Moucheng Xu et.al.	2409.15867	link
2024-09-24	Small Language Models: Survey, Measurements, and Insights	Zhenyan Lu et.al.	2409.15790	link
2024-09-24	Making Text Embedders Few-Shot Learners	Chaofan Li et.al.	2409.15700	link
2024-09-23	Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction	Yuanchao Li et.al.	2409.15551	link
2024-09-23	In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models	Pengrui Han et.al.	2409.15454	link
2024-09-24	PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models	Zhiyuan Wang et.al.	2409.15188	link
2024-09-23	A Controlled Study on Long Context Extension and Generalization in LLMs	Yi Lu et.al.	2409.12181	link
2024-09-18	M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper	Jiaming Zhou et.al.	2409.11889	null
2024-09-18	Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation	Kejia Chen et.al.	2409.11863	null
2024-09-18	RoboMorph: In-Context Meta-Learning for Robot Dynamics Modeling	Manuel Bianchi Bazzi et.al.	2409.11815	null
2024-09-18	RUIE: Retrieval-based Unified Information Extraction using Large Language Model	Xincheng Liao et.al.	2409.11673	null
2024-09-17	HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection	Theo King et.al.	2409.11579	link
2024-09-17	THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Mengfei Liang et.al.	2409.11353	link
2024-09-17	Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse	Maojia Song et.al.	2409.11242	link
2024-09-17	Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning	Yukang Lin et.al.	2409.11147	link
2024-09-17	Semformer: Transformer Language Models with Semantic Planning	Yongjing Yin et.al.	2409.11143	null
2024-09-18	Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming	Chalamalasetti Kranti et.al.	2409.11041	null
2024-09-16	LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning	Jicong Ao et.al.	2409.10444	link
2024-09-16	Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages	Ming-Hao Hsu et.al.	2409.10429	null
2024-09-16	From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs	Navya Jain et.al.	2409.10245	null
2024-09-16	Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization	Xiaoxue Gao et.al.	2409.10157	null
2024-09-16	SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL	Ke Shen et.al.	2409.10007	link
2024-09-15	AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs	Madhusudan Ghosh et.al.	2409.09704	link
2024-09-14	Language Models "Grok" to Copy	Ang Lv et.al.	2409.09281	null
2024-09-13	Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach	Siqi Li et.al.	2409.09009	link
2024-09-13	LLM-based Weak Supervision Framework for Query Intent Classification in Video Search	Farnoosh Javadi et.al.	2409.08931	null
2024-09-13	LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation	Shaojun Li et.al.	2409.08597	null
2024-09-12	Fine-tuning Large Language Models for Entity Matching	Aaron Steiner et.al.	2409.08185	link
2024-09-11	MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications	Praveen K Kanithi et.al.	2409.07314	null
2024-09-10	Quantifying and Enabling the Interpretability of CLIP-like Models	Avinash Madasu et.al.	2409.06579	null
2024-09-10	Inference is All You Need: Self Example Retriever for Cross-domain Dialogue State Tracking with ChatGPT	Jihyun Lee et.al.	2409.06243	null
2024-09-10	Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks	Georgios Chochlakis et.al.	2409.06173	link
2024-09-09	Seek and Solve Reasoning for Table Question Answering	Ruya Jiang et.al.	2409.05286	null
2024-09-10	Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion	Zhengyang Chen et.al.	2409.05004	null
2024-09-07	MILE: A Mutation Testing Framework of In-Context Learning Systems	Zeming Wei et.al.	2409.04831	link
2024-09-06	Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs	Aliakbar Nafar et.al.	2409.04318	link
2024-09-06	Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers	Gorka Abad et.al.	2409.04142	null
2024-09-05	CACER: Clinical Concept Annotations for Cancer Events and Relations	Yujuan Fu et.al.	2409.03905	link
2024-09-07	The representation landscape of few-shot learning and fine-tuning in large language models	Diego Doimo et.al.	2409.03662	link
2024-09-05	FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications	Hao-Han Guo et.al.	2409.03283	null
2024-09-03	How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?	Saeid Asgari Taghanaki et.al.	2409.02253	link
2024-09-03	Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs	Zhuo Li et.al.	2409.01552	null
2024-09-03	Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition	Yaozong Gan et.al.	2409.01534	null
2024-09-02	The Compressor-Retriever Architecture for Language Model OS	Yuan Yang et.al.	2409.01495	link
2024-09-02	PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science	Menglin Liu et.al.	2409.01466	null
2024-09-02	Membership Inference Attacks Against In-Context Learning	Rui Wen et.al.	2409.01380	null
2024-08-30	AWRaCLe: All-Weather Image Restoration using Visual In-Context Learning	Sudarshan Rajagopalan et.al.	2409.00263	null
2024-08-28	Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning	Momin Abbas et.al.	2409.00124	null
2024-08-29	DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Yongjie Fu et.al.	2408.16647	null
2024-08-29	Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning	Rochelle Choenni et.al.	2408.16482	null
2024-08-28	Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games	Nicholas R. Waytowich et.al.	2408.15950	null
2024-09-04	Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models	Hédi Zeghidi et.al.	2408.15796	link
2024-08-28	Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings	Lingyu Gao et.al.	2408.15650	null
2024-08-26	MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues	Kuluhan Binici et.al.	2408.14418	null
2024-08-26	Probing Causality Manipulation of Large Language Models	Chenyang Zhang et.al.	2408.14380	link
2024-09-03	Foundation Models for Music: A Survey	Yinghao Ma et.al.	2408.14340	link
2024-08-26	Epidemic Information Extraction for Event-Based Surveillance using Large Language Models	Sergio Consoli et.al.	2408.14277	null
2024-08-26	Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach	Vittoriano Muttillo et.al.	2408.14259	null
2024-08-26	Focused Large Language Models are Stable Many-Shot Learners	Peiwen Yuan et.al.	2408.13987	null
2024-08-24	Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models	Sakhinana Sagar Srinivas et.al.	2408.13621	null
2024-08-23	In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting	Haowei Du et.al.	2408.13028	null
2024-08-23	Multimodal Contrastive In-Context Learning	Yosuke Miyanishi et.al.	2408.12959	null
2024-08-23	Causal-Guided Active Learning for Debiasing Large Language Models	Zhouhao Sun et.al.	2408.12942	link
2024-08-23	Investigating LLM Applications in E-Commerce	Chester Palen-Michel et.al.	2408.12779	null
2024-08-22	Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models	Meiyun Wang et.al.	2408.12326	link
2024-08-22	Transformers are Minimax Optimal Nonparametric In-Context Learners	Juno Kim et.al.	2408.12186	null
2024-08-26	uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization	Aishik Nagar et.al.	2408.12095	null
2024-08-22	Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs	Ronit Singhal et.al.	2408.12060	link
2024-08-21	Memorization In In-Context Learning	Shahriar Golchin et.al.	2408.11546	null
2024-08-20	Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks	Nathaniel Pinckney et.al.	2408.11053	link
2024-08-20	Benchmarking Large Language Models for Math Reasoning Tasks	Kathrin Seßler et.al.	2408.10839	link
2024-08-19	Self-Refined Generative Foundation Models for Wireless Traffic Prediction	Chengming Hu et.al.	2408.10390	null
2024-08-19	In-Context Learning with Representations: Contextual Generalization of Trained Transformers	Tong Yang et.al.	2408.10147	null
2024-08-19	Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning	Jingyu Hu et.al.	2408.09757	null
2024-08-19	Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts	Jiaqing Liu et.al.	2408.09688	null
2024-08-18	Out-of-distribution generalization via composition: a lens through induction heads in Transformers	Jiajun Song et.al.	2408.09503	link
2024-08-16	Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning	Jinwei Hu et.al.	2408.08959	null
2024-08-16	xGen-MM (BLIP-3): A Family of Open Large Multimodal Models	Le Xue et.al.	2408.08872	null
2024-08-20	Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions	Chenming Tang et.al.	2408.08780	null
2024-08-16	LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression	Yuqi Ye et.al.	2408.08682	null
2024-08-15	ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models	Faris Hijazi et.al.	2408.07983	link
2024-08-16	MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL	Wenxuan Xie et.al.	2408.07930	link
2024-08-14	Cropper: Vision-Language Model for Image Cropping through In-Context Learning	Seung Hyun Lee et.al.	2408.07790	null
2024-08-14	Large Language Models Know What Makes Exemplary Contexts	Quanyu Long et.al.	2408.07505	null
2024-08-13	SceneGPT: A Language Model for 3D Scene Understanding	Shivam Chandhok et.al.	2408.06926	null
2024-08-13	HLSPilot: LLM-based High-Level Synthesis	Chenwei Xiong et.al.	2408.06810	link
2024-08-12	Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning	Chuanneng Sun et.al.	2408.06520	null
2024-08-12	Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models	Yen-Che Hsiao et.al.	2408.06458	link
2024-08-11	LLM-Based Robust Product Classification in Commerce and Compliance	Sina Gholamian et.al.	2408.05874	null
2024-08-10	In-Context Exploiter for Extensive-Form Games	Shuxin Li et.al.	2408.05575	null
2024-08-10	Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction	Jung Hoon Lim et.al.	2408.05555	null
2024-08-10	LaiDA: Linguistics-aware In-context Learning with Data Augmentation for Metaphor Components Identification	Hongde Liu et.al.	2408.05404	link
2024-08-09	SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation	Chenming Tang et.al.	2408.04872	link
2024-08-06	LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations	Lei Shi et.al.	2408.04665	null
2024-08-08	Learning Fine-Grained Grounded Citations for Attributed Large Language Models	Lei Huang et.al.	2408.04568	link
2024-08-08	How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression	Xingwu Chen et.al.	2408.04532	null
2024-08-08	Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning	Seong-Il Park et.al.	2408.04414	null
2024-08-07	Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks	Zaijing Li et.al.	2408.03615	link
2024-08-06	Can LLMs Serve As Time Series Anomaly Detectors?	Manqing Dong et.al.	2408.03475	null
2024-08-06	Pre-training and in-context learning IS Bayesian inference a la De Finetti	Naimeng Ye et.al.	2408.03307	null
2024-08-06	Enhancing Complex Causality Extraction via Improved Subtask Interaction and Knowledge Fusion	Jinglong Gao et.al.	2408.03079	null
2024-08-06	Hide and Seek: Fingerprinting Large Language Models with Evolutionary Learning	Dmitri Iourovitski et.al.	2408.02871	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-05	OneLove beyond the field -- A few-shot pipeline for topic and sentiment analysis during the FIFA World Cup in Qatar	Christoph Rauchegger et.al.	2408.02520	null
2024-08-05	A Few-Shot Approach for Relation Extraction Domain Adaptation using Large Language Models	Vanni Zavarella et.al.	2408.02377	null
2024-08-05	Spin glass model of in-context learning	Yuhao Li et.al.	2408.02288	null
2024-08-04	Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process	Peng Wang et.al.	2408.02103	null
2024-08-04	Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages	Tomáš Filip et.al.	2408.02044	null
2024-08-03	Can LLMs predict the convergence of Stochastic Gradient Descent?	Oussama Zekri et.al.	2408.01736	null
2024-08-02	OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models	Zeyang Ma et.al.	2408.01585	link
2024-08-02	NOLO: Navigate Only Look Once	Bohan Zhou et.al.	2408.01384	null
2024-08-02	Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs	Phillip Schneider et.al.	2408.01088	link
2024-08-02	ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models	Hojae Han et.al.	2408.00994	link
2024-08-01	Intermittent Semi-working Mask: A New Masking Paradigm for LLMs	Mingcong Lu et.al.	2408.00539	null
2024-08-01	Jailbreaking Text-to-Image Models with LLM-Based Agents	Yingkai Dong et.al.	2408.00523	null
2024-08-01	In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation	Armel Zebaze et.al.	2408.00397	link
2024-08-01	Adversarial Text Rewriting for Text-aware Recommender Systems	Sejoon Oh et.al.	2408.00312	link
2024-08-01	QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression	Wenshan Wang et.al.	2408.00274	link
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-07-31	Distributed In-Context Learning under Non-IID Among Clients	Siqi Liang et.al.	2408.00144	null
2024-07-31	Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM	Can Wang et.al.	2407.21333	null
2024-07-27	LawLLM: Law Large Language Model for the US Legal System	Dong Shu et.al.	2407.21065	null
2024-07-30	SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition	Hao Tan et.al.	2407.20920	null
2024-07-30	SceneTeller: Language-to-3D Scene Generation	Başak Melis Öcal et.al.	2407.20727	null
2024-07-30	CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge	Tianshi Zheng et.al.	2407.20564	null
2024-07-29	AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs	Muhammad Arbab Arshad et.al.	2407.19617	null
2024-07-27	Polynomial Regression as a Task for Understanding In-context Learning Through Finetuning and Alignment	Max Wilcoxson et.al.	2407.19346	link
2024-07-27	Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications	Till Speicher et.al.	2407.19262	null
2024-07-26	Many-Shot In-Context Learning for Molecular Inverse Design	Saeed Moayedpour et.al.	2407.19089	null
2024-07-24	Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning	Hongwei Jin et.al.	2407.17545	link
2024-07-24	Grammar-based Game Description Generation using Large Language Models	Tsunehiko Tanaka et.al.	2407.17404	null
2024-07-24	Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism	Anhao Zhao et.al.	2407.17011	link
2024-07-24	SelfPiCo: Self-Guided Partial Code Execution with LLMs	Zhipeng Xue et.al.	2407.16974	null
2024-07-23	Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack	Xiaoyue Xu et.al.	2407.16695	link
2024-07-23	Can Large Language Models Automatically Jailbreak GPT-4V?	Yuanwei Wu et.al.	2407.16686	null
2024-07-23	Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data	Julian Schelb et.al.	2407.16516	null
2024-07-23	Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction	Rithik Sachdev et.al.	2407.16370	link
2024-07-23	PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing	Blazej Manczak et.al.	2407.16318	link
2024-07-22	Multilingual Fine-Grained News Headline Hallucination Detection	Jiaming Shen et.al.	2407.15975	null
2024-07-22	Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability	Zhuoyan Xu et.al.	2407.15720	link
2024-07-22	In-Context Learning Improves Compositional Understanding of Vision-Language Models	Matteo Nulli et.al.	2407.15487	link
2024-07-22	ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning	Senbin Zhu et.al.	2407.15341	null
2024-07-21	MIBench: Evaluating Multimodal Large Language Models over Multiple Images	Haowei Liu et.al.	2407.15272	null
2024-07-19	Prompted Aspect Key Point Analysis for Quantitative Review Summarization	An Quang Tang et.al.	2407.14049	link
2024-07-19	ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?	Siddhant Waghjale et.al.	2407.14044	link
2024-07-18	FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking	Zhuoer Wang et.al.	2407.13945	null
2024-07-18	Large Language Models as Reliable Knowledge Bases?	Danna Zheng et.al.	2407.13578	null
2024-07-18	Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks	Samy Ateia et.al.	2407.13511	link
2024-07-18	Learning-From-Mistakes Prompting for Indigenous Language Translation	You-Cheng Liao et.al.	2407.13343	null
2024-07-17	R+X: Retrieval and Execution from Everyday Human Videos	Georgios Papagiannis et.al.	2407.12957	null
2024-07-16	Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection	Ye Jiang et.al.	2407.12879	null
2024-07-17	Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning	Mustafa Dogan et.al.	2407.12498	null
2024-07-16	Private prediction for large-scale synthetic text generation	Kareem Amin et.al.	2407.12108	null
2024-07-16	AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization	Anum Afzal et.al.	2407.11591	link
2024-07-16	Reasoning with Large Language Models, a Survey	Aske Plaat et.al.	2407.11511	null
2024-07-16	Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach	Sojung Lucia Kim et.al.	2407.11368	null
2024-07-16	Large Vision-Language Models as Emotion Recognizers in Context Awareness	Yuxuan Lei et.al.	2407.11300	null
2024-07-15	Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection	Chenwei Wu et.al.	2407.11188	null
2024-07-15	GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM	Keshav Bimbraw et.al.	2407.10870	null
2024-07-16	Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning	Yulong Wang et.al.	2407.10718	link
2024-07-15	Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems	Yunxiao Shi et.al.	2407.10670	link
2024-07-14	Visual Prompt Selection for In-Context Learning Segmentation	Wei Suo et.al.	2407.10233	link
2024-07-13	Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond	Yingcong Li et.al.	2407.10005	null
2024-07-12	HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context	Federico Arangath Joseph et.al.	2407.09375	null
2024-07-12	SpreadsheetLLM: Encoding Spreadsheets for Large Language Models	Yuzhang Tian et.al.	2407.09025	null
2024-07-12	Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models	Ye Liu et.al.	2407.08967	link
2024-07-12	Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection	Ye Liu et.al.	2407.08952	null
2024-07-11	DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding	Jincen Jiang et.al.	2407.08801	null
2024-07-12	RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL	Zhenhe Wu et.al.	2407.08273	null
2024-07-10	Video In-context Learning	Wentao Zhang et.al.	2407.07356	null
2024-07-09	Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning	J. Crosbie et.al.	2407.07011	null
2024-07-09	ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization	Wai Man Si et.al.	2407.06955	null
2024-07-08	Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs	Sanjeet Singh et.al.	2407.05887	link
2024-07-08	Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition	Yaozong Gan et.al.	2407.05814	null
2024-07-08	Empirical Study of Symmetrical Reasoning in Conversational Chatbots	Daniela N. Rim et.al.	2407.05734	null
2024-07-08	FairPFN: Transformers Can do Counterfactual Fairness	Jake Robertson et.al.	2407.05732	null
2024-07-08	Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation	Jian Qian et.al.	2407.05693	link
2024-07-08	Retrieved In-Context Principles from Previous Mistakes	Hao Sun et.al.	2407.05682	null
2024-07-08	GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks	Xuan Wang et.al.	2407.05566	null
2024-07-07	Just read twice: closing the recall gap for recurrent language models	Simran Arora et.al.	2407.05483	link
2024-07-04	FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs	Tongyi SpeechTeam et.al.	2407.04051	link
2024-07-03	Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning	Zhili Shen et.al.	2407.03227	null
2024-07-03	Exploring the Capabilities of LLMs for Code Change Related Tasks	Lishui Fan et.al.	2407.02824	link
2024-07-02	Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms	Viet Cuong Nguyen et.al.	2407.02662	null
2024-07-02	RVISA: Reasoning and Verification for Implicit Sentiment Analysis	Wenna Lai et.al.	2407.02340	null
2024-07-02	Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts	Chunlan Ma et.al.	2407.02320	null
2024-07-02	Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks	Adrian Rebmann et.al.	2407.02310	link
2024-07-02	Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions	Xiang Li et.al.	2407.02028	link
2024-07-02	SADL: An Effective In-Context Learning Method for Compositional Visual QA	Long Hoang Dang et.al.	2407.01983	null
2024-07-03	MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation	Yongan Zhang et.al.	2407.01910	link
2024-07-01	Dynamic Few-Shot Learning for Knowledge Graph Question Answering	Jacopo D'Abramo et.al.	2407.01409	null
2024-07-01	TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval	Wenbo Xu et.al.	2407.01183	null
2024-07-01	Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?	Nicy Scaria et.al.	2407.00996	link
2024-07-01	Universal Approximation Theory: The basic theory for large language models	Wei Wang et.al.	2407.00958	null
2024-06-28	Mining Reasons For And Against Vaccination From Unstructured Data Using Nichesourcing and AI Data Augmentation	Damián Ariel Furman et.al.	2406.19951	null
2024-06-27	Aligning Teacher with Student Preferences for Tailored Training Data Generation	Yantao Liu et.al.	2406.19227	null
2024-06-27	STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis	Wenbin Li et.al.	2406.19065	link
2024-06-27	Efficient course recommendations with T5-based ranking and summarization	Thijmen Bijl et.al.	2406.19018	link
2024-06-27	Can we teach language models to gloss endangered languages?	Michael Ginn et.al.	2406.18895	null
2024-06-27	SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models	Vipul Rathore et.al.	2406.18880	link
2024-06-26	ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models	Yuxuan Yin et.al.	2406.18770	null
2024-06-26	PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation	Christoph Leiter et.al.	2406.18528	link
2024-06-26	Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming	Zhenghao Zhou et.al.	2406.18501	null
2024-06-26	BADGE: BADminton report Generation and Evaluation with LLM	Shang-Hsuan Chiang et.al.	2406.18116	link
2024-06-26	Octo-planner: On-device Language Model for Planner-Action Agents	Wei Chen et.al.	2406.18082	null
2024-06-26	Automated Clinical Data Extraction with Knowledge Conditioned LLMs	Diya Li et.al.	2406.18027	null
2024-06-25	LABOR-LLM: Language-Based Occupational Representations with Large Language Models	Tianyu Du et.al.	2406.17972	null
2024-06-25	BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning	Ercong Nie et.al.	2406.17764	null
2024-06-25	Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels	Nicholas Pangakis et.al.	2406.17633	null
2024-06-25	Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft	Chalamalasetti Kranti et.al.	2406.17553	null
2024-06-25	Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification	Huiyao Chen et.al.	2406.17534	link
2024-06-25	Enhancing Tool Retrieval with Iterative Feedback from Large Language Models	Qiancheng Xu et.al.	2406.17465	link
2024-06-25	A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs	Vaibhav Singh et.al.	2406.17377	null
2024-06-25	Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement	Yunlong Feng et.al.	2406.17233	link
2024-06-24	Finding Transformer Circuits with Edge Pruning	Adithya Bhaskar et.al.	2406.16778	link
2024-06-24	Token-based Decision Criteria Are Suboptimal in In-context Learning	Hakaze Cho et.al.	2406.16535	null
2024-06-24	DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task	Wenhan Liu et.al.	2406.16332	link
2024-06-23	Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning	Bowen Zheng et.al.	2406.16007	null
2024-06-22	Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts	Louis Give et.al.	2406.15871	null
2024-06-21	Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem	Sara Court et.al.	2406.15625	null
2024-06-21	Automated radiotherapy treatment planning guided by GPT-4Vision	Sheng Liu et.al.	2406.15609	null
2024-06-21	Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning	Brandon Huang et.al.	2406.15334	link
2024-06-21	ICLEval: Evaluating In-Context Learning Ability of Large Language Models	Wentong Chen et.al.	2406.14955	link
2024-06-20	Learning to Retrieve Iteratively for In-Context Learning	Yunmo Chen et.al.	2406.14739	null
2024-06-20	ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights	Gabriel Sarch et.al.	2406.14596	null
2024-06-20	Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data	Johannes Treutlein et.al.	2406.14546	link
2024-06-20	Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary	Xingmeng Zhao et.al.	2406.14500	null
2024-06-20	Data-Centric AI in the Age of Large Language Models	Xinyi Xu et.al.	2406.14473	null
2024-06-20	SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots	Weixing Wang et.al.	2406.14208	null
2024-06-20	Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning	Xiaolei Wang et.al.	2406.14022	link
2024-06-23	Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations	Arie Cattan et.al.	2406.13632	null
2024-06-19	InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising	Zhepei Wei et.al.	2406.13629	link
2024-06-19	In-Context In-Context Learning with Transformer Neural Processes	Matthew Ashman et.al.	2406.13493	null
2024-06-19	ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models	Hwiyeol Jo et.al.	2406.13342	null
2024-06-19	In-Context Learning on a Budget: A Case Study in Named Entity Recognition	Uri Berger et.al.	2406.13274	null
2024-06-18	Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?	Zhe Yang et.al.	2406.12809	link
2024-06-18	In-Context Learning of Energy Functions	Rylan Schaeffer et.al.	2406.12785	null
2024-06-18	Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs	Ahmad Mohsin et.al.	2406.12513	null
2024-06-18	Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems	Nasim Borazjanizadeh et.al.	2406.12172	null
2024-06-17	Soft Prompting for Unlearning in Large Language Models	Karuna Bhaila et.al.	2406.12038	link
2024-06-17	Multi-Layer Ranking with Large Language Models for News Source Recommendation	Wenjia Zhang et.al.	2406.11745	null
2024-06-17	Meta Reasoning for Large Language Models	Peizhong Gao et.al.	2406.11698	null
2024-06-17	Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!	Mingyang Song et.al.	2406.11629	link
2024-06-17	How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment	Heyan Huang et.al.	2406.11474	null
2024-06-17	A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences	Leonardo Bertolazzi et.al.	2406.11341	link
2024-06-17	Fine-grained Controllable Text Generation through In-context Learning with Feedback	Sarubi Thillainathan et.al.	2406.11338	null
2024-06-17	Hallucination Mitigation Prompts Long-term Video Understanding	Yiwei Sun et.al.	2406.11333	null
2024-06-17	FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation	Bangzheng Li et.al.	2406.11243	null
2024-06-17	Probing the Decision Boundaries of In-context Learning in Large Language Models	Siyan Zhao et.al.	2406.11233	link
2024-06-17	In-Context Editing: Learning Knowledge from Self-Induced Distributions	Siyuan Qi et.al.	2406.11194	link
2024-06-14	UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner	Dongchao Yang et.al.	2406.10056	link
2024-06-14	GeoSEE: Regional Socio-Economic Estimation With a Large Language Model	Sungwon Han et.al.	2406.09799	null
2024-06-13	Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI	Mohammed-Khalil Ghali et.al.	2406.09621	null
2024-06-13	Automated Molecular Concept Generation and Labeling with Large Language Models	Shichang Zhang et.al.	2406.09612	link
2024-06-13	Chain-of-Though (CoT) prompting strategies for medical error detection and correction	Zhaolong Wu et.al.	2406.09103	null
2024-06-13	XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning	Alexander Nikulin et.al.	2406.08973	null
2024-06-13	mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus	Matthieu Futeral et.al.	2406.08707	null
2024-06-12	State Soup: In-Context Skill Learning, Retrieval and Mixing	Maciej Pióro et.al.	2406.08423	null
2024-06-13	OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text	Qingyun Li et.al.	2406.08418	link
2024-06-12	Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation	Javad Pourmostafa Roshan Sharami et.al.	2406.07970	link
2024-06-12	DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning	Yuxi Feng et.al.	2406.07913	null
2024-06-12	An Empirical Study of Mamba-based Language Models	Roger Waleffe et.al.	2406.07887	link
2024-06-12	Are Large Language Models Good Statisticians?	Yizhang Zhu et.al.	2406.07815	link
2024-06-11	Estimating the Hallucination Rate of Generative AI	Andrew Jesson et.al.	2406.07457	null
2024-06-11	On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations	Shiao Meng et.al.	2406.07444	link
2024-06-11	Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning	Menglong Cui et.al.	2406.07081	null
2024-06-11	DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs	Haishuo Fang et.al.	2406.07080	link
2024-06-11	CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only	Junhee Cho et.al.	2406.06947	link
2024-06-11	Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems	Mohammed Elhenawy et.al.	2406.06865	null
2024-06-10	Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing	Enshuo Hsu et.al.	2406.06723	null
2024-06-10	In-Context Learning and Fine-Tuning GPT for Argument Mining	Jérémie Cabessa et.al.	2406.06699	link
2024-06-10	Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue	Simone Alghisi et.al.	2406.06399	link
2024-06-09	LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning	Utsav Singh et.al.	2406.05881	null
2024-06-09	TR2MTL: LLM based framework for Metric Temporal Logic Formalization of Traffic Rules	Kumar Manas et.al.	2406.05709	null
2024-06-08	ThatiAR: Subjectivity Detection in Arabic News Sentences	Reem Suwaileh et.al.	2406.05559	null
2024-06-08	RAG-Enhanced Commit Message Generation	Linghao Zhang et.al.	2406.05514	null
2024-06-07	TabPFGen -- Tabular Data Generation with TabPFN	Junwei Ma et.al.	2406.05216	null
2024-06-07	Retrieval & Fine-Tuning for In-Context Tabular Models	Valentin Thomas et.al.	2406.05207	null
2024-06-07	Scenarios and Approaches for Situated Natural Language Explanations	Pengshuo Qiu et.al.	2406.05035	null
2024-06-07	BERTs are Generative In-Context Learners	David Samuel et.al.	2406.04823	link
2024-06-07	Large Language Model-guided Document Selection	Xiang Kong et.al.	2406.04638	null
2024-06-06	llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models	Fabián Villena et.al.	2406.04528
2024-06-06	Aligning Large Language Models with Self-generated Preference Data	Dongyoung Kim et.al.	2406.04412	null
2024-06-06	VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation	Prashanth Vijayaraghavan et.al.	2406.04379	null
2024-06-08	What Do Language Models Learn in Context? The Structured Task Hypothesis	Jiaoda Li et.al.	2406.04216	link
2024-06-06	Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following	Anshul Gupta et.al.	2406.03907	null
2024-06-06	Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective	Xinhao Yao et.al.	2406.03768	link
2024-06-06	FastGAS: Fast Graph-based Annotation Selection for In-Context Learning	Zihan Chen et.al.	2406.03730	null
2024-06-05	Log Parsing with Self-Generated In-Context Learning and Self-Correction	Yifan Wu et.al.	2406.03376	null
2024-06-06	StatBot.Swiss: Bilingual Open Data Exploration in Natural Language	Farhad Nooralahzadeh et.al.	2406.03170	null
2024-06-05	Improving In-Context Learning with Prediction Feedback for Sentiment Analysis	Hongling Xu et.al.	2406.02911	link
2024-06-06	Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers	Brian K Chen et.al.	2406.02847	null
2024-06-04	E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory	Zhou Yang et.al.	2406.02642	null
2024-06-04	Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks	Tianyu He et.al.	2406.02550	link
2024-06-04	Seed-TTS: A Family of High-Quality Versatile Speech Generation Models	Philip Anastassiou et.al.	2406.02430	link
2024-06-04	Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion	Ruiqi Li et.al.	2406.02429	null
2024-06-04	Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis	Kun Zhou et.al.	2406.02009	null
2024-06-04	Eliciting the Priors of Large Language Models using Iterated In-Context Learning	Jian-Qiao Zhu et.al.	2406.01860	null
2024-06-03	In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs	Grzegorz Kaszuba et.al.	2406.01808	null
2024-06-03	Universal In-Context Approximation By Prompting Fully Recurrent Models	Aleksandar Petrov et.al.	2406.01424	link
2024-06-03	Demonstration Augmentation for Zero-shot In-context Learning	Yi Su et.al.	2406.01224	link
2024-06-03	Guiding ChatGPT to Generate Salient Domain Summaries	Jun Gao et.al.	2406.01070	null
2024-06-03	Selectively Answering Visual Questions	Julian Martin Eisenschlos et.al.	2406.00980	null
2024-05-31	In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought	Sili Huang et.al.	2405.20692	link
2024-05-31	UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation	Hanzhang Zhou et.al.	2405.20612	link
2024-05-31	The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes	Alissa A. Valentine et.al.	2405.20582	null
2024-05-30	Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads	Avelina Asada Hadji-Kyriacou et.al.	2405.20053	link
2024-05-30	From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems	Jianliang He et.al.	2405.19883	null
2024-05-30	Is In-Context Learning Sufficient for Instruction Following in LLMs?	Hao Zhao et.al.	2405.19874	link
2024-05-30	Why Larger Language Models Do In-context Learning Differently?	Zhenmei Shi et.al.	2405.19592	null
2024-05-29	Does learning the right latent variables necessarily improve in-context learning?	Sarthak Mittal et.al.	2405.19162	link
2024-05-28	A Theoretical Understanding of Self-Correction through In-context Alignment	Yifei Wang et.al.	2405.18634	null
2024-05-28	Multi-modal Generation via Cross-Modal In-Context Learning	Amandeep Kumar et.al.	2405.18304	link
2024-05-28	IM-Context: In-Context Learning for Imbalanced Regression Tasks	Ismail Nejjar et.al.	2405.18202	link
2024-05-28	Knowledge Circuits in Pretrained Transformers	Yunzhi Yao et.al.	2405.17969	link
2024-05-28	FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction	Zhonghang Li et.al.	2405.17898	link
2024-05-28	Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents	Andrew H. Lee et.al.	2405.17840	null
2024-05-28	EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?	Boshen Xu et.al.	2405.17719	link
2024-05-27	RAGSys: Item-Cold-Start Recommender as RAG System	Emile Contal et.al.	2405.17587	null
2024-05-27	On the Noise Robustness of In-Context Learning for Text Generation	Hongfu Gao et.al.	2405.17264	link
2024-05-27	Transformer In-Context Learning for Categorical Data	Aaron T. Wang et.al.	2405.17248	null
2024-05-29	Benchmarking General Purpose In-Context Learning	Fan Wang et.al.	2405.17234	link
2024-05-27	Unifying Demonstration Selection and Compression for In-Context Learning	Jun Gao et.al.	2405.17062	null
2024-05-27	SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself	Jun Gao et.al.	2405.17052	null
2024-05-27	On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability	Chenyu Zheng et.al.	2405.16845	link
2024-05-27	Automatic Domain Adaptation by Transformers in In-Context Learning	Ryuichiro Hataya et.al.	2405.16819	null
2024-05-27	ARC: A Generalist Graph Anomaly Detector with In-Context Learning	Yixin Liu et.al.	2405.16771	link
2024-05-25	Learning to Reason via Program Generation, Emulation, and Search	Nathaniel Weir et.al.	2405.16337	link
2024-05-25	Mixture of In-Context Prompters for Tabular PFNs	Derek Xu et.al.	2405.16156	null
2024-05-24	MLPs Learn In-Context	William L. Tong et.al.	2405.15618	link
2024-05-24	Synergizing In-context Learning with Hints for End-to-end Task-oriented Dialog Systems	Vishal Vivek Saley et.al.	2405.15585	link
2024-05-24	Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs	Siyuan Guo et.al.	2405.15485	null
2024-05-24	Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation	Ge Qu et.al.	2405.15307	link
2024-05-24	Towards Global Optimal Visual In-Context Learning Prompt Selection	Chengming Xu et.al.	2405.15279	null
2024-05-24	Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor	Haoxuan Qu et.al.	2405.15267	null
2024-05-24	Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification	Shang Liu et.al.	2405.15115	null
2024-05-23	Linking In-context Learning in Transformers to Human Episodic Memory	Li Ji-An et.al.	2405.14992	link
2024-05-23	In-context Time Series Predictor	Jiecheng Lu et.al.	2405.14982	null
2024-05-23	Evaluating Large Language Models for Public Health Classification and Extraction Tasks	Joshua Harris et.al.	2405.14766	null
2024-05-23	Implicit In-context Learning	Zhuowei Li et.al.	2405.14660	link
2024-05-23	Emotion Identification for French in Written Texts: Considering their Modes of Expression as a Step Towards Text Complexity Analysis	Aline Étienne et.al.	2405.14385	null
2024-05-23	Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition	Chan-Jan Hsu et.al.	2405.14259	link
2024-05-22	Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning	Jiuqi Wang et.al.	2405.13861	null
2024-05-22	Why In-Context Learning Transformers are Tabular Data Classifiers	Felix den Breejen et.al.	2405.13396	link
2024-05-21	Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting	Krishna Prasad Varadarajan Srinivasan et.al.	2405.13181	null
2024-05-21	Quantifying Emergence in Large Language Models	Hang Chen et.al.	2405.12617	link
2024-05-20	Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning	Guanglin Zhou et.al.	2405.12217	link
2024-05-20	Asymptotic theory of in-context learning by linear attention	Yue M. Lu et.al.	2405.11751	link
2024-05-19	Effective In-Context Example Selection through Data Compression	Zhongxiang Sun et.al.	2405.11465	null
2024-05-19	MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning	Sanchit Sinha et.al.	2405.11446	null
2024-05-19	Large Language Models are Biased Reinforcement Learners	William M. Hayes et.al.	2405.11422	link
2024-05-18	Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models	Yan Wang et.al.	2405.11196	link
2024-05-17	Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection	Han Zhang et.al.	2405.11002	null
2024-05-17	Feature-Adaptive and Data-Scalable In-Context Learning	Jiahao Li et.al.	2405.10738	link
2024-05-20	Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks	Anwoy Chatterjee et.al.	2405.10548	link
2024-05-17	In-context Contrastive Learning for Event Causality Identification	Chao Liang et.al.	2405.10512	link
2024-05-16	Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction	Chinedu Ekuma et.al.	2405.10448	link
2024-05-16	Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model	Zheng Gu et.al.	2405.10316	null
2024-05-16	Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction	Jianhao Chen et.al.	2405.10288	link
2024-05-16	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	Xianzheng Ma et.al.	2405.10255	link
2024-05-16	LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting	Stijn Verdenius et.al.	2405.10093	link
2024-05-16	Many-Shot In-Context Learning in Multimodal Foundation Models	Yixing Jiang et.al.	2405.09798	link
2024-05-14	Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach	Syed Mhamudul Hasan et.al.	2405.08755	null
2024-05-14	PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles	Satya Kesav Gundabathula et.al.	2405.08373	null
2024-05-14	Compositional Text-to-Image Generation with Dense Blob Representations	Weili Nie et.al.	2405.08246	null
2024-05-13	AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models	Shuo Liu et.al.	2405.07626	link
2024-05-13	COBias and Debias: Minimizing Language Model Pairwise Accuracy Bias via Nonlinear Integer Programming	Ruixi Lin et.al.	2405.07623	null
2024-05-13	MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation	Dongjun Lee et.al.	2405.07467	null
2024-05-10	An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and Classification	Mohammad Sadegh Sheikhaei et.al.	2405.06806	link
2024-05-10	Linearizing Large Language Models	Jean Mercat et.al.	2405.06640	link
2024-05-13	Memory Mosaics	Jianyu Zhang et.al.	2405.06394	link
2024-05-15	XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare	Fatemeh Nazary et.al.	2405.06270	null
2024-05-08	XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples	Peiqin Lin et.al.	2405.05116	link
2024-05-08	P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models	Guochao Jiang et.al.	2405.04960	link
2024-05-08	AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models	Yongheng Zhang et.al.	2405.04753	null
2024-05-07	ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning	Jing Lin et.al.	2405.04533	null
2024-05-07	In-context Learning for Automated Driving Scenarios	Ziqi Zhou et.al.	2405.04135	link
2024-05-08	Locally Differentially Private In-Context Learning	Chunyan Zheng et.al.	2405.04032	null
2024-05-06	OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs	Jiahao Nick Li et.al.	2405.03901	null
2024-05-06	Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning	Yubo Mai et.al.	2405.03509	null
2024-05-06	OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization	Weidong Wang et.al.	2405.03215	null
2024-05-04	CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions	Hanchong Zhang et.al.	2405.02712	link
2024-05-04	Enhancing News Summarization with ELearnFit through Efficient In-Context Learning and Efficient Fine-Tuning	Che Guan et.al.	2405.02710	null
2024-05-04	PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation	Ye Liu et.al.	2405.02580	link
2024-05-03	Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning	Hyeong Kyu Choi et.al.	2405.02501	link
2024-05-03	Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression	Karthik Duraisamy et.al.	2405.02462	null
2024-05-03	FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems	Yashar Deldjoo et.al.	2405.02219	null
2024-05-03	Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo	Mahmoud Masoud et.al.	2405.01997	null
2024-05-03	Understanding LLMs Requires More Than Statistical Generalization	Patrik Reizinger et.al.	2405.01964	link
2024-05-02	Question Suggestion for Conversational Shopping Assistants Using Product Metadata	Nikhita Vedula et.al.	2405.01738	null
2024-05-02	DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection	Yanjing Yang et.al.	2405.01202	link
2024-05-02	"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"	Andrew Parry et.al.	2405.01116	null
2024-05-01	Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations	Kirandeep Kaur et.al.	2405.00824	null
2024-04-30	Graphical Reasoning: LLM-based Semi-Open Relation Extraction	Yicheng Tao et.al.	2405.00216	link
2024-04-30	In-Context Learning with Long-Context Models: An In-Depth Exploration	Amanda Bertsch et.al.	2405.00200	null
2024-04-29	It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments	Petter Mæhlum et.al.	2404.18832	null
2024-05-01	Capabilities of Gemini Models in Medicine	Khaled Saab et.al.	2404.18416	null
2024-04-28	From Persona to Personalization: A Survey on Role-Playing Language Agents	Jiangjie Chen et.al.	2404.18231	null
2024-05-01	Exploring the Robustness of In-Context Learning with Noisy Labels	Chen Cheng et.al.	2404.18191	link
2024-04-30	ComposerX: Multi-Agent Symbolic Music Composition with LLMs	Qixin Deng et.al.	2404.18081	link
2024-04-27	Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language	Tsimur Hadeliya et.al.	2404.17832	null
2024-04-27	Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction	Guozheng Li et.al.	2404.17809	null
2024-04-27	Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors	Guozheng Li et.al.	2404.17807	null
2024-04-26	Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study	Yang Wu et.al.	2404.17136	link
2024-04-25	Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models	Eren Dogan et.al.	2404.17010	null
2024-04-25	Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning	Tianhui Zhang et.al.	2404.16807	link
2024-04-25	In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization	Herilalaina Rakotoarison et.al.	2404.16795	link
2024-04-25	What Makes Multimodal In-Context Learning Work?	Folco Bertini Baldassini et.al.	2404.15736	link
2024-04-23	XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference	João Monteiro et.al.	2404.15420	null
2024-04-21	Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following	Suyeon Shin et.al.	2404.15190	null
2024-04-23	Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond	Pengyu Xue et.al.	2404.14824	link
2024-04-23	Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities	Siyin Wang et.al.	2404.14716	null
2024-04-23	FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction	Hang Hua et.al.	2404.14715	null
2024-04-23	FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model	Zezheng Song et.al.	2404.14688	link
2024-04-21	AnyPattern: Towards In-context Image Copy Detection	Wenhao Wang et.al.	2404.13788	link
2024-04-21	"A good pun is its own reword": Can Large Language Models Understand Puns?	Zhijun Xu et.al.	2404.13599	link
2024-04-19	Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs	Biyang Guo et.al.	2404.13033	link
2024-04-19	Stronger Random Baselines for In-Context Learning	Gregory Yauney et.al.	2404.13020	link
2024-04-19	Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction	Qinyuan Wu et.al.	2404.12957	link
2024-04-19	How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?	Yang Luo et.al.	2404.12866	link
2024-04-19	Requirements Satisfiability with In-Context Learning	Sarah Santos et.al.	2404.12576	link
2024-04-18	Point-In-Context: Understanding Point Cloud via In-Context Learning	Mengyuan Liu et.al.	2404.12352	link
2024-04-18	Exploring the landscape of large language models: Foundations, techniques, and challenges	Milad Moradi et.al.	2404.11973	null
2024-04-17	In-Context Learning State Vector with Inner and Momentum Optimization	Dongfang Li et.al.	2404.11225	link
2024-04-17	Position Engineering: Boosting Large Language Models through Positional Information Manipulation	Zhiyuan He et.al.	2404.11216	null
2024-04-17	Many-Shot In-Context Learning	Rishabh Agarwal et.al.	2404.11018	null
2024-04-16	Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning	Moghis Fereidouni et.al.	2404.10887	null
2024-04-16	Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning	Xiao Wang et.al.	2404.10552	null
2024-04-15	Memory Sharing for Large Language Model based Agents	Hang Gao et.al.	2404.09982	link
2024-04-15	Evolving Interpretable Visual Classifiers with Large Language Models	Mia Chiquier et.al.	2404.09941	null
2024-04-15	In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation	Han Xue et.al.	2404.09633	null
2024-04-15	Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning	Sungwon Han et.al.	2404.09491	link
2024-04-14	GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning	Amani Namboori et.al.	2404.09163	null
2024-04-13	Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model	Zita Lifelo et.al.	2404.09045	null
2024-04-11	Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models	Tanmay Gautam et.al.	2404.08080	null
2024-04-11	LLoCO: Learning Long Contexts Offline	Sijun Tan et.al.	2404.07979	link
2024-04-11	Discourse-Aware In-Context Learning for Temporal Expression Normalization	Akash Kumar Gautam et.al.	2404.07775	null
2024-04-11	Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning	Quanyu Long et.al.	2404.07546	link
2024-04-10	Adaptive behavior with stable synapses	Cristiano Capone et.al.	2404.07150	link
2024-04-10	What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation	Aaditya K. Singh et.al.	2404.07129	link
2024-04-10	What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs	Anna Wegmann et.al.	2404.06670	link
2024-04-09	Neuromorphic In-Context Learning for Energy-Efficient MIMO Symbol Detection	Zihang Song et.al.	2404.06469	null
2024-04-11	Privacy Preserving Prompt Engineering: A Survey	Kennedy Edemacu et.al.	2404.06001	null
2024-04-08	WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents	Michael Lutz et.al.	2404.05902	null
2024-04-08	Enhancing Software Related Information Extraction with Generative Language Models through Single-Choice Question Answering	Wolfgang Otto et.al.	2404.05587	null
2024-04-11	Cell-Free Multi-User MIMO Equalization via In-Context Learning	Matteo Zecchin et.al.	2404.05538	link
2024-04-07	How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?	Ishani Mondal et.al.	2404.05088	null
2024-04-05	Exploring Autonomous Agents through the Lens of Large Language Models: A Review	Saikat Barua et.al.	2404.04442	null
2024-04-05	Deciphering Political Entity Sentiment in News with Large Language Models: Zero-Shot and Few-Shot Strategies	Alapan Kuila et.al.	2404.04361	link
2024-04-05	Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving	Gulsum Yigit et.al.	2404.03938	null
2024-04-04	SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection	Bradley P. Allen et.al.	2404.03732	link
2024-04-04	How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes	Harmon Bhasin et.al.	2404.03558	link
2024-04-03	GPT-DETOX: An In-Context Learning-Based Paraphraser for Text Detoxification	Ali Pesaranghader et.al.	2404.03052	null
2024-04-03	Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison	Maxime Bouthors et.al.	2404.02835	null
2024-04-03	Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM	Zhe Liu et.al.	2404.02706	null
2024-04-03	Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation	Zhe Xu et.al.	2404.02505	link
2024-04-03	uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers?	Pouya Sadeghi et.al.	2404.02474	link
2024-04-03	Task Agnostic Architecture for Algorithm Induction via Implicit Composition	Sahil J. Sindhi et.al.	2404.02450	null
2024-04-03	Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data	Parth Patwa et.al.	2404.02422	null
2024-04-02	Emergent Abilities in Reduced-Scale Generative Language Models	Sherin Muckatira et.al.	2404.02204	link
2024-04-02	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Maksym Andriushchenko et.al.	2404.02151	link
2024-04-02	Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Wanyong Feng et.al.	2404.02124	link
2024-04-04	Long-context LLMs Struggle with Long In-context Learning	Tianle Li et.al.	2404.02060	link
2024-04-02	Deconstructing In-Context Learning: Understanding Prompts via Corruption	Namrata Shivagunde et.al.	2404.02054	link
2024-04-02	Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts	Zhuo Chen et.al.	2404.02022	link
2024-04-02	Large Language Models for Orchestrating Bimanual Robots	Kun Chu et.al.	2404.02018	link
2024-04-02	Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4	Dan Schumacher et.al.	2404.01961	link
2024-04-02	Self-Improvement Programming for Temporal Knowledge Graph Question Answering	Zhuo Chen et.al.	2404.01720	null
2024-04-01	Structured Information Matters: Incorporating Abstract Meaning Representation into LLMs for Improved Open-Domain Dialogue Evaluation	Bohao Yang et.al.	2404.01129	link
2024-04-01	Efficient Prompting Methods for Large Language Models: A Survey	Kaiyan Chang et.al.	2404.01077	null
2024-03-29	Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science	Yazheng Yang et.al.	2403.20208	null
2024-03-28	Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models	Yucheng Shi et.al.	2403.19631	link
2024-03-28	Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation	Chenming Tang et.al.	2403.19285	null
2024-03-28	Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction	Chenming Tang et.al.	2403.19283	null
2024-03-28	Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation	Yutong He et.al.	2403.19103	null
2024-03-26	Large Language Models Enhanced Collaborative Filtering	Zhongxiang Sun et.al.	2403.17688	null
2024-03-26	Language Models for Text Classification: Is In-Context Learning Enough?	Aleksandra Edwards et.al.	2403.17661	null
2024-03-26	Naive Bayes-based Context Extension for Large Language Models	Jianlin Su et.al.	2403.17552	link
2024-03-26	ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler	Paramita Mirza et.al.	2403.17536	link
2024-03-25	A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection	Benjamin Steenhoek et.al.	2403.17218	null
2024-03-25	MetaAligner: Conditional Weak-to-Strong Correction for Generalizable Multi-Objective Alignment of Language Models	Kailai Yang et.al.	2403.17141	link
2024-03-25	The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition	Georgios Chochlakis et.al.	2403.17125	null
2024-03-25	SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging	Lingdong Shen et.al.	2403.16578	null
2024-03-27	LLMs Are Few-Shot In-Context Low-Resource Language Learners	Samuel Cahyawijaya et.al.	2403.16512	link
2024-03-25	LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification	Liu Junhua et.al.	2403.16504	null
2024-03-24	SQL-Encoder: Improving NL2SQL In-Context Learning Through a Context-Aware Encoder	Mohammadreza Pourreza et.al.	2403.16204	null
2024-03-23	IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models	Haz Sameen Shahgir et.al.	2403.15952	link
2024-03-21	Sequence-to-Sequence Language Models for Character and Emotion Detection in Dream Narratives	Gustave Cortal et.al.	2403.15486	null
2024-03-22	ESG Classification by Implicit Rule Learning via GPT-4	Hyo Jeong Yun et.al.	2403.15040	null
2024-03-22	Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation	Shanthi Karpurapu et.al.	2403.14965	link
2024-03-22	Stance Reasoner: Zero-Shot Stance Detection on Social Media with Explicit Reasoning	Maksym Taranukhin et.al.	2403.14895	link
2024-03-21	Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning	Changtong Zan et.al.	2403.14399	link
2024-03-21	PE-GPT: A Physics-Informed Interactive Large Language Model for Power Converter Modulation Design	Fanfan Lin et.al.	2403.14059	null
2024-03-19	VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning	Yongshuo Zong et.al.	2403.13164	link
2024-03-19	Towards Multimodal In-Context Learning for Vision & Language Models	Sivan Doveh et.al.	2403.12736	null
2024-03-19	CrossTune: Black-Box Few-Shot Classification with Label Enhancement	Danqing Luo et.al.	2403.12468	null
2024-03-19	An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis	Yifan Peng et.al.	2403.12402	null
2024-03-18	Transfer Learning Beyond Bounded Density Ratios	Alkis Kalavasis et.al.	2403.11963	null
2024-03-18	CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification	Korbinian Randl et.al.	2403.11904	link
2024-03-18	Towards Understanding the Relationship between In-context Learning and Compositional Generalization	Sungjun Han et.al.	2403.11834	null
2024-03-18	Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis	Vishnu Sashank Dorbala et.al.	2403.11487	null
2024-03-16	Interpretable Machine Learning for TabPFN	David Rundel et.al.	2403.10923	link
2024-03-16	Zero-shot Generative Linguistic Steganography	Ke Lin et.al.	2403.10856	link
2024-03-15	Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Tian Meng et.al.	2403.10287	null
2024-03-15	Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning	Shang-Hsuan Chiang et.al.	2403.10281	link
2024-03-15	The Whole is Better than the Sum: Using Aggregated Demonstrations in In-Context Learning for Sequential Recommendation	Lei Wang et.al.	2403.10135	link
2024-03-14	MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training	Brandon McKinzie et.al.	2403.09611	null
2024-03-15	WavCraft: Audio Editing and Generation with Natural Language Prompts	Jinhua Liang et.al.	2403.09527	link
2024-03-14	Rectifying Demonstration Shortcut in In-Context Learning	Joonwon Jang et.al.	2403.09488	link
2024-03-14	Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity	Zhuo Zhi et.al.	2403.09428	link
2024-03-14	Unveiling the Generalization Power of Fine-Tuned Large Language Models	Haoran Yang et.al.	2403.09162	link
2024-03-14	Large Language Models are Parallel Multilingual Learners	Yongyu Mu et.al.	2403.09073	link
2024-03-13	Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking	Ming Dong et.al.	2403.08492	null
2024-03-12	BAGEL: Bootstrapping Agents by Guiding Exploration with Language	Shikhar Murty et.al.	2403.08140	null
2024-03-12	In-context learning enables multimodal large language models to classify cancer pathology images	Dyke Ferber et.al.	2403.07407	null
2024-03-13	Knowledge Graph Large Language Model (KG-LLM) for Link Prediction	Dong Shu et.al.	2403.07311	null
2024-03-11	SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data	Jialu Li et.al.	2403.06952	null
2024-03-12	MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning	Yichuan Li et.al.	2403.06914	link
2024-03-11	In-context Exploration-Exploitation for Reinforcement Learning	Zhenwen Dai et.al.	2403.06826	null
2024-03-11	'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification	Manish Chandra et.al.	2403.06402	null
2024-03-10	FedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning	Zhuo Zhang et.al.	2403.06131	null
2024-03-10	In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model	Junhui Yin et.al.	2403.06126	null
2024-03-09	Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages	Christopher Toukmaji et.al.	2403.06018	null
2024-03-08	A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries	Asad Aali et.al.	2403.05720	link
2024-03-08	DP-TabICL: In-Context Learning with Differentially Private Tabular Data	Alycia N. Carey et.al.	2403.05681	null
2024-03-08	InstructGIE: Towards Generalizable Image Editing	Zichong Meng et.al.	2403.05018	null
2024-03-07	LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error	Boshi Wang et.al.	2403.04746	link
2024-03-08	How Far Are We from Intelligent Visual Deductive Reasoning?	Yizhe Zhang et.al.	2403.04732	link
2024-03-07	Where does In-context Translation Happen in Large Language Models	Suzanna Sia et.al.	2403.04510	null
2024-03-07	DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning	Xingwei Qu et.al.	2403.04233	null
2024-03-07	On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models	Xinpeng Wang et.al.	2403.04204	null
2024-03-06	German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset	Laura Mascarell et.al.	2403.03750	link
2024-03-06	Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Yuhong Sun et.al.	2403.03558	link
2024-03-06	Japanese-English Sentence Translation Exercises Dataset for Automatic Grading	Naoki Miura et.al.	2403.03396	null
2024-03-05	How Well Can Transformers Emulate In-context Newton's Method?	Angeliki Giannou et.al.	2403.03183	null
2024-03-05	MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting	Fangchen Liu et.al.	2403.03174	null
2024-03-06	Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation	Bin Zhang et.al.	2403.02951	null
2024-03-05	Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment	Congzhi Zhang et.al.	2403.02738	null
2024-03-04	Not all Layers of LLMs are Necessary during Inference	Siqi Fan et.al.	2403.02181	null
2024-03-04	Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?	Evgeniia Razumovskaia et.al.	2403.01929	null
2024-03-03	Transformers for Supervised Online Continual Learning	Jorg Bornschein et.al.	2403.01554	null
2024-03-03	Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models	Amal Rannen-Triki et.al.	2403.01518	null
2024-03-02	Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal	Jianheng Huang et.al.	2403.01244	link
2024-03-02	Distilling Text Style Transfer With Self-Explanation From LLMs	Chiyu Zhang et.al.	2403.01106	null
2024-03-02	FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis	Songhua Yang et.al.	2403.01063	link
2024-03-01	DFIN-SQL: Integrating Focused Schema with DIN-SQL for Superior Accuracy in Large-Scale Databases	Shai Volvovsky et.al.	2403.00872	null
2024-02-29	ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph	Xukun Liu et.al.	2403.00839	null
2024-03-01	LLMs for Targeted Sentiment in News Headlines: Exploring Different Levels of Prompt Prescriptiveness	Jana Juroš et.al.	2403.00418	null
2024-03-01	Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish	Recep Firat Cekinel et.al.	2403.00411	link
2024-02-29	Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality	Siyu Chen et.al.	2402.19442	null
2024-02-29	Teaching Large Language Models an Unseen Language on the Fly	Chen Zhang et.al.	2402.19167	link
2024-02-29	Dual Operating Modes of In-Context Learning	Ziqian Lin et.al.	2402.18819	link
2024-02-28	Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling	Mahdi Karami et.al.	2402.18508	null
2024-02-28	Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification	Garima Chhikara et.al.	2402.18502	null
2024-02-28	Large Language Models As Evolution Strategies	Robert Tjarko Lange et.al.	2402.18381	null
2024-02-28	From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs	Yulong Liu et.al.	2402.18157	null
2024-02-28	Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation	Shicheng Xu et.al.	2402.18150	link
2024-02-28	All in a Single Image: Large Multimodal Models are In-Image Learners	Lei Wang et.al.	2402.17971	link
2024-02-27	Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models	Yunpeng Huang et.al.	2402.17671	null
2024-02-27	Reinforced In-Context Black-Box Optimization	Lei Song et.al.	2402.17423	link
2024-02-27	Video as the New Language for Real-World Decision Making	Sherry Yang et.al.	2402.17139	null
2024-02-25	DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers	Xirui Li et.al.	2402.16914	link
2024-02-28	Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models	Anchun Gui et.al.	2402.16696	null
2024-02-26	Long-Context Language Modeling with Parallel Context Encoding	Howard Yen et.al.	2402.16617	link
2024-02-25	LLMs with Chain-of-Thought Are Non-Causal Reasoners	Guangsheng Bao et.al.	2402.16048	link
2024-02-25	Likelihood-based Mitigation of Evaluation Bias in Large Language Models	Masanari Ohi et.al.	2402.15987	link
2024-02-24	Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning	Wuyang Chen et.al.	2402.15734	link
2024-02-23	Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models	Yanzheng Xiang et.al.	2402.15637	link
2024-02-23	Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis	Hongkang Li et.al.	2402.15607	null
2024-02-23	Evaluating the Performance of ChatGPT for Spam Email Detection	Yuwei Wu et.al.	2402.15537	null
2024-02-23	Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models	Guanming Xiong et.al.	2402.15131	link
2024-02-23	Studying LLM Performance on Closed- and Open-source Data	Toufique Ahmed et.al.	2402.15100	null
2024-02-23	Fine-tuning Large Language Models for Domain-specific Machine Translation	Jiawei Zheng et.al.	2402.15061	null
2024-02-22	In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization	Ruiqi Zhang et.al.	2402.14951	null
2024-02-22	How Transformers Learn Causal Structure with Gradient Descent	Eshaan Nichani et.al.	2402.14735	link
2024-02-23	Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis	Takehiro Takayanagi et.al.	2402.14484	null
2024-02-22	On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe	Ningyu Xu et.al.	2402.14404	link
2024-02-22	A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation	Yuyue Zhou et.al.	2402.14300	link
2024-02-21	Analysing The Impact of Sequence Composition on Language Model Pre-Training	Yu Zhao et.al.	2402.13991	link
2024-02-21	$\texttt{Se}^2$: $\textit{Se}$quential Example $\textit{Se}$lection for In-Context Learning	Haoyu Liu et.al.	2402.13874	link
2024-02-21	Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction	Guozheng Li et.al.	2402.13741	null
2024-02-21	Unsupervised Text Style Transfer via LLMs and Attention Masking with Multi-way Interactions	Lei Pan et.al.	2402.13647	null
2024-02-21	A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation	Yunxin Li et.al.	2402.13587	link
2024-02-21	CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory	Zexue He et.al.	2402.13449	null
2024-02-20	Harnessing Large Language Models as Post-hoc Correctors	Zhiqiang Zhong et.al.	2402.13414	link
2024-02-20	Identifying Semantic Induction Heads to Understand In-Context Learning	Jie Ren et.al.	2402.13055	null
2024-02-20	The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis	Miaoran Zhang et.al.	2402.12976	link
2024-02-20	Fine-Tuning, Prompting, In-Context Learning and Instruction-Tuning: How Many Labelled Samples Do We Need?	Branislav Pecher et.al.	2402.12819	null
2024-02-20	On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices	Branislav Pecher et.al.	2402.12817	link
2024-02-19	Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation	Joseph Marvin Imperial et.al.	2402.12593	link
2024-02-19	Parallel Structures in Pre-training Data Yield In-Context Learning	Yanda Chen et.al.	2402.12530	null
2024-02-19	Task-Oriented Dialogue with In-Context Learning	Tom Bocklisch et.al.	2402.12234	link
2024-02-19	Do Large Language Models Understand Logic or Just Mimick Context?	Junbing Yan et.al.	2402.12091	null
2024-02-19	Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations	Milan Bhan et.al.	2402.12038	null
2024-02-19	Modularized Networks for Few-shot Hateful Meme Detection	Rui Cao et.al.	2402.11845	link
2024-02-19	In-Context Learning Demonstration Selection via Influence Analysis	Vinay M. S. et.al.	2402.11750	null
2024-02-18	GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network	Shuzhou Yuan et.al.	2402.11709	link
2024-02-18	In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness	Liam Collins et.al.	2402.11639	null
2024-02-18	Visual In-Context Learning for Large Vision-Language Models	Yucheng Zhou et.al.	2402.11574	null
2024-02-18	Learning to Learn Faster from Human Feedback with Language Model Predictive Control	Jacky Liang et.al.	2402.11450	null
2024-02-18	In-Context Example Ordering Guided by Label Distributions	Zhichao Xu et.al.	2402.11447	null
2024-02-16	RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model	Jianhao Yuan et.al.	2402.10828	null
2024-02-16	Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning	Yinpeng Liu et.al.	2402.10738	link
2024-02-16	Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm	Yuanzhen Xie et.al.	2402.10671	link
2024-02-16	Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL	Dingzirui Wang et.al.	2402.10663	link
2024-02-16	Linear Transformers with Learnable Kernel Functions are Better In-Context Models	Yaroslav Aksenov et.al.	2402.10644	link
2024-02-16	LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty	Zhen Zhang et.al.	2402.10573	link
2024-02-16	Understanding In-Context Learning with a Pelican Soup Framework	Ting-Rui Chiang et.al.	2402.10424	null
2024-02-16	Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting	Jiaheng Wei et.al.	2402.10412	null
2024-02-15	Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models	Kang He et.al.	2402.10353	null
2024-02-15	Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models	Chen Ling et.al.	2402.10189	link
2024-02-15	Self-Augmented In-Context Learning for Unsupervised Word Translation	Yaoyiran Li et.al.	2402.10024	link
2024-02-15	Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation	Jiashu Pu et.al.	2402.09954	null
2024-02-15	Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language Models	Chenyang Shao et.al.	2402.09836	null
2024-02-15	QuRating: Selecting High-Quality Data for Training Language Models	Alexander Wettig et.al.	2402.09739	link
2024-02-14	Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems	Liang Zhang et.al.	2402.09584	null
2024-02-14	HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation	Yihao Fang et.al.	2402.09390	link
2024-02-14	ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization	Feifan Song et.al.	2402.09320	link
2024-02-14	GrounDial: Human-norm Grounded Safe Dialog Response Generation	Siwon Kim et.al.	2402.08968	null
2024-02-13	Human Curriculum Effects Emerge with In-Context Learning in Neural Networks	Jacob Russin et.al.	2402.08674	null
2024-02-12	Text-centric Alignment for Multi-Modality Learning	Yun-Da Tsai et.al.	2402.08086	null
2024-02-12	Universal link predictor by In-context Learning	Kaiwen Dong et.al.	2402.07738	null
2024-02-12	Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping	Haoyu Wang et.al.	2402.07610	null
2024-02-12	VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization	Dongsheng Zhu et.al.	2402.07398	link
2024-02-12	Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples	Qingkai Zeng et.al.	2402.07386	link
2024-02-12	Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning	Gabriel Simmons et.al.	2402.07368	null
2024-02-10	In-Context Data Distillation with TabPFN	Junwei Ma et.al.	2402.06971	null
2024-02-09	NICE: To Optimize In-Context Examples or Not?	Pragya Srivastava et.al.	2402.06733	null
2024-02-09	Entropy-Regularized Token-Level Policy Optimization for Large Language Models	Muning Wen et.al.	2402.06700	link
2024-02-09	On the Out-Of-Distribution Generalization of Multimodal Large Language Models	Xingxuan Zhang et.al.	2402.06599	null
2024-02-09	InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Huaiyuan Ying et.al.	2402.06332	link
2024-02-08	In-Context Learning Can Re-learn Forbidden Tasks	Sophie Xhonneux et.al.	2402.05723	null
2024-02-08	NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning	Yufeng Zhao et.al.	2402.05515	link
2024-02-09	In-Context Principle Learning from Mistakes	Tianjun Zhang et.al.	2402.05403	null
2024-02-07	InCoRo: In-Context Learning for Robotics Control with Feedback Loops	Jiaqiang Ye Zhu et.al.	2402.05188	null
2024-02-07	L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ	Hyesung Jeon et.al.	2402.04902	null
2024-02-06	Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks	Jongho Park et.al.	2402.04248	link
2024-02-06	In-context learning agents are asymmetric belief updaters	Johannes A. Schubert et.al.	2402.03969	null
2024-02-06	Rethinking Skill Extraction in the Job Market Domain using Large Language Models	Khanh Cao Nguyen et.al.	2402.03832	link
2024-02-05	Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations	Álvaro Martín-Cortinas et.al.	2402.03407	null
2024-02-05	The Matrix: A Bayesian learning model for LLMs	Siddhartha Dalal et.al.	2402.03175	null
2024-02-05	Multi: Multimodal Understanding Leaderboard with Text and Images	Zichen Zhu et.al.	2402.03173	null
2024-02-05	Is Mamba Capable of In-Context Learning?	Riccardo Grazzi et.al.	2402.03170	link
2024-02-05	Automatic Combination of Sample Selection Strategies for Few-Shot Learning	Branislav Pecher et.al.	2402.03038	null
2024-02-05	How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning	Zeping Yu et.al.	2402.02872	link
2024-02-04	Are Large Language Models Table-based Fact-Checkers?	Hangwen Zhang et.al.	2402.02549	link
2024-02-04	KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion	Yanbin Wei et.al.	2402.02389	link
2024-02-04	Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning	Tong Niu et.al.	2402.02388	null
2024-02-04	AutoTimes: Autoregressive Time Series Forecasters via Large Language Models	Yong Liu et.al.	2402.02370	link
2024-02-04	The Developmental Landscape of In-Context Learning	Jesse Hoogland et.al.	2402.02364	null
2024-02-02	Can MLLMs Perform Text-to-Image In-Context Learning?	Yuchen Zeng et.al.	2402.01293	link
2024-02-02	Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape	Juno Kim et.al.	2402.01258	null
2024-02-02	In-Context Learning for Few-Shot Nested Named Entity Recognition	Meishan Zhang et.al.	2402.01182	null
2024-02-02	CABINET: Content Relevance based Noise Reduction for Table Question Answering	Sohan Patnaik et.al.	2402.01155	link
2024-02-01	Can Large Language Models Understand Context?	Yilun Zhu et.al.	2402.00858	null
2024-02-01	Unlearnable Algorithms for In-context Learning	Andrei Muresanu et.al.	2402.00751	null
2024-02-01	Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement	Xin Quan et.al.	2402.00745	link
2024-02-01	Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data	Yue Xing et.al.	2402.00743	null
2024-02-01	Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning	Jitao Sang et.al.	2402.00667	link
2024-01-31	Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction	Jialiang Wu et.al.	2401.17716	null
2024-01-31	Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning	Yuelyu Ji et.al.	2401.17602	link
2024-01-30	Superiority of Multi-Head Attention in In-Context Linear Regression	Yingqian Cui et.al.	2401.17426	null
2024-01-30	Customizing Language Model Responses with Contrastive In-Context Learning	Xiang Gao et.al.	2401.17390	null
2024-01-29	ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Bolei Ma et.al.	2401.16589	link
2024-01-29	APIGen: Generative API Method Recommendation	Yujia Chen et.al.	2401.15843	link
2024-01-28	An Information-Theoretic Analysis of In-Context Learning	Hong Jun Jeon et.al.	2401.15530	null
2024-01-26	Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning	Tao He et.al.	2401.14626	null
2024-01-25	Language Modelling Approaches to Adaptive Machine Translation	Yasmin Moslem et.al.	2401.14559	null
2024-01-25	K-QA: A Real-World Medical Q&A Benchmark	Itay Manes et.al.	2401.14493	link
2024-01-24	Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4	Xuchao Zhang et.al.	2401.13810	null
2024-01-24	Tyche: Stochastic In-Context Learning for Medical Image Segmentation	Marianne Rakic et.al.	2401.13650	link
2024-01-24	MaLA-500: Massive Language Adaptation of Large Language Models	Peiqin Lin et.al.	2401.13303	null
2024-01-30	In-Context Language Learning: Architectures and Algorithms	Ekin Akyürek et.al.	2401.12973	link
2024-01-22	Enhancing In-context Learning via Linear Probe Calibration	Momin Abbas et.al.	2401.12406	link
2024-01-22	In-Context Learning for Extreme Multi-Label Classification	Karel D'Oosterlinck et.al.	2401.12178	link
2024-01-22	An Empirical Analysis of In-context Learning Abilities of LLMs for MT	Pranjal A. Chitale et.al.	2401.12097	link
2024-01-22	Revisiting Demonstration Selection Strategies in In-Context Learning	Keqin Peng et.al.	2401.12087	link
2024-01-23	In-context Learning with Retrieved Demonstrations for Language Models: A Survey	Man Luo et.al.	2401.11624	null
2024-01-20	Analyzing Task-Encoding Tokens in Large Language Models	Yu Bai et.al.	2401.11323	null
2024-01-18	Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation	Zdeněk Kasner et.al.	2401.10186	null
2024-01-18	Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning	Yong Zhang et.al.	2401.09783	null
2024-01-16	HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance	Huanjun Kong et.al.	2401.08772	link
2024-01-16	The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing	Masahiro Kaneko et.al.	2401.08511	null
2024-01-16	Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions	Nooshin Pourkamali et.al.	2401.08429	null
2024-01-14	A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models	Namjoon Suh et.al.	2401.07187	null
2024-01-13	Fast and Accurate Zero-Training Classification for Tabular Engineering Data	Cyril Picard et.al.	2401.06948	null
2024-01-12	Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements	Anton Voronov et.al.	2401.06766	link
2024-01-12	The Unreasonable Effectiveness of Easy Training Data for Hard Tasks	Peter Hase et.al.	2401.06751	link
2024-01-12	Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning	Kaiyi Zhang et.al.	2401.06469	link
2024-01-12	Misconfidence-based Demonstration Selection for LLM In-Context Learning	Shangqing Xu et.al.	2401.06301	null
2024-01-12	Universal Vulnerabilities in Large Language Models: In-context Learning Backdoor Attacks	Shuai Zhao et.al.	2401.05949	link
2024-01-11	Probing Structured Semantics Understanding and Generation of Language Models via Question Answering	Jinxin Liu et.al.	2401.05777	null
2024-01-16	POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation	Shilong Pan et.al.	2401.05596	null
2024-01-10	Leveraging Print Debugging to Improve Code Generation in Large Language Models	Xueyu Hu et.al.	2401.05319	null
2024-01-09	SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning	Hector A. Gonzalez et.al.	2401.04491	null
2024-01-09	Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding	Zilong Wang et.al.	2401.04398	null
2024-01-04	MobileAgent: enhancing mobile control via human-machine interaction and SOP integration	Tinghe Ding et.al.	2401.04124	link
2024-01-08	Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection	Georgios Fatouros et.al.	2401.03737	null
2024-01-10	Grimoire is All You Need for Enhancing Large Language Models	Ding Chen et.al.	2401.03385	link
2024-01-05	Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks	Kevin Everson et.al.	2401.02921	null
2024-01-05	Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task	Gabriel Lino Garcia et.al.	2401.02909	null
2024-01-04	DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models	Songbo Hu et.al.	2401.02208	link
2024-01-01	A Computational Framework for Behavioral Assessment of LLM Therapists	Yu Ying Chiu et.al.	2401.00820	link
2024-01-01	The Earth is Flat? Unveiling Factual Errors in Large Language Models	Wenxuan Wang et.al.	2401.00761	null
2024-01-01	A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models	Yuxuan Wan et.al.	2401.00757	link
2023-12-29	Overview of the PromptCBLUE Shared Task in CHIP2023	Wei Zhu et.al.	2312.17522	link
2023-12-28	Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos	Houlun Chen et.al.	2312.17117	null
2023-12-28	Improving In-context Learning via Bidirectional Alignment	Chengwei Qin et.al.	2312.17055	null
2023-12-27	How Robust are LLMs to In-Context Majority Label Bias?	Karan Gupta et.al.	2312.16549	null
2023-12-26	Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation	Zhu Sun et.al.	2312.16262	null
2023-12-26	RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation	Sichun Luo et.al.	2312.16018	link
2023-12-26	Supervised Knowledge Makes Large Language Models Better In-context Learners	Linyi Yang et.al.	2312.15918	link
2023-12-25	EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data	Shirong Ma et.al.	2312.15696	null
2023-12-22	On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning	Chengzu Li et.al.	2312.13772	link
2023-12-19	RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios	Wenhao Ding et.al.	2312.13303	null
2023-12-20	Generative Multimodal Models are In-Context Learners	Quan Sun et.al.	2312.13286	link
2023-12-20	Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest	Emily Groves et.al.	2312.12989	null
2023-12-20	Fine-tuning Large Language Models for Adaptive Machine Translation	Yasmin Moslem et.al.	2312.12740	link
2023-12-21	Can Transformers Learn Sequential Function Classes In Context?	Ryan Campbell et.al.	2312.12655	link
2023-12-19	Emergence of In-Context Reinforcement Learning from Noise Distillation	Ilya Zisman et.al.	2312.12275	link
2023-12-18	DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation	Yu Wang et.al.	2312.11336	null
2023-12-19	Split and Rephrase with Large Language Models	David Ponce et.al.	2312.11075	null
2023-12-18	APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation	Chengran Yang et.al.	2312.10934	null

VLM

Publish Date	Title	Authors	PDF	Code
2025-01-23	Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models	Linh Tran et.al.	2501.13904	null
2025-01-23	Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning	Shiyu Zhang et.al.	2501.13859	null
2025-01-23	Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes	Shiling Deng et.al.	2501.13851	link
2025-01-23	Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models	Chaolei Han et.al.	2501.13795	null
2025-01-23	Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak	Erjia Xiao et.al.	2501.13772	null
2025-01-23	EventVL: Understand Event Streams via Multimodal Large Language Model	Pengteng Li et.al.	2501.13707	null
2025-01-23	Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task	Mohit Vaishnav et.al.	2501.13620	null
2025-01-23	Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving	Lu Wang et.al.	2501.13563	null
2025-01-23	Text-driven Online Action Detection	Manuel Benavent-Lledo et.al.	2501.13518	link
2025-01-23	Iterative Shaping of Multi-Particle Aggregates based on Action Trees and VLM	Hoi-Yin Lee et.al.	2501.13507	null
2025-01-22	Patent Figure Classification using Large Vision-language Models	Sushil Awale et.al.	2501.12751	link
2025-01-22	TeD-Loc: Text Distillation for Weakly Supervised Object Localization	Shakeeb Murtaza et.al.	2501.12632	link
2025-01-22	ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality	Yanming Xiu et.al.	2501.12553	link
2025-01-21	Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models	Tabinda Aman et.al.	2501.12433	null
2025-01-20	ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models	Jingwei Yi et.al.	2501.12418	link
2025-01-21	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Yuhang Zang et.al.	2501.12368	link
2025-01-21	Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2	Md. Rakibul Islam et.al.	2501.12356	null
2025-01-21	CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification	Cristiano Patrício et.al.	2501.12266	null
2025-01-21	Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model	Kazi Hasan Ibn Arif et.al.	2501.12206	null
2025-01-20	Human-AI Collaborative Game Testing with Vision Language Models	Boran Zhang et.al.	2501.11782	null
2025-01-20	SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models	Shu Zou et.al.	2501.11485	link
2025-01-20	Verifying Cross-modal Entity Consistency in News using Vision-language Models	Sahar Tahmasebi et.al.	2501.11403	null
2025-01-20	KPL: Training-Free Medical Knowledge Mining of Vision-Language Models	Jiaxiang Liu et.al.	2501.11231	link
2025-01-19	ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models	Yassir Bendou et.al.	2501.11175	null
2025-01-19	Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding	Zhanpeng Chen et.al.	2501.10967	link
2025-01-17	HiMix: Reducing Computational Complexity in Large Vision-Language Models	Xuange Zhang et.al.	2501.10318	null
2025-01-17	SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning	Yuecheng Liu et.al.	2501.10074	null
2025-01-17	CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment	Yating Liu et.al.	2501.10071	null
2025-01-17	MSTS: A Multimodal Safety Test Suite for Vision-Language Models	Paul Röttger et.al.	2501.10057	link
2025-01-17	Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions	Zhijie Tan et.al.	2501.10011	null
2025-01-17	Explainable artificial intelligence (XAI): from inherent explainability to large language models	Fuseini Mumuni et.al.	2501.09967	null
2025-01-16	Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key	Zhihe Yang et.al.	2501.09695	link
2025-01-16	Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark	Alexis Roger et.al.	2501.09672	null
2025-01-16	Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness	Zeyu Wang et.al.	2501.09446	null
2025-01-16	Vision-Language Models Do Not Understand Negation	Kumail Alhamoud et.al.	2501.09425	null
2025-01-16	YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks	Saptarashmi Bandyopadhyay et.al.	2501.09355	null
2025-01-16	RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects	Zhen Luo et.al.	2501.09307	null
2025-01-16	Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning	Harrison Fuller et.al.	2501.09294	null
2025-01-16	Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites	Abdalwhab Abdalwhab et.al.	2501.09267	null
2025-01-16	Exploring the Capabilities of Vision-Language Models to Detect Visual Bugs in HTML5 Applications	Finlay Macklon et.al.	2501.09236	null
2025-01-15	Embodied Scene Understanding for Vision Language Models via MetaVQA	Weizhen Wang et.al.	2501.09167	null
2025-01-15	CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation	Qi Ma et.al.	2501.08982	null
2025-01-15	Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning	Julian Perry et.al.	2501.08597	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding	Liping Yuan et.al.	2501.07888	null
2025-01-14	Visual Language Models as Operator Agents in the Space Domain	Alejandro Carrasco et.al.	2501.07802	null
2025-01-14	BMIP: Bi-directional Modality Interaction Prompt Learning for VLM	Song-Lin Lv et.al.	2501.07769	null
2025-01-13	SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing	Varun Biyyala et.al.	2501.07554	link
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525	link
2025-01-13	Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models	Yasiru Ranasinghe et.al.	2501.07396	null
2025-01-14	GestLLM: Advanced Hand Gesture Interpretation via Large Language Models for Human-Robot Interaction	Oleg Kobzarev et.al.	2501.07295	null
2025-01-13	Can Vision-Language Models Evaluate Handwritten Math?	Oikantik Nath et.al.	2501.07244	null
2025-01-13	TimeLogic: A Temporal Logic Benchmark for Video QA	Sirnam Swetha et.al.	2501.07214	null
2025-01-13	BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature	Alejandro Lozano et.al.	2501.07171	link
2025-01-13	Duplex: Dual Prototype Learning for Compositional Zero-Shot Learning	Zhong Peng et.al.	2501.07114	null
2025-01-12	MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis	Sadia Kamal et.al.	2501.06887	null
2025-01-12	Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving	Haoxiang Gao et.al.	2501.06680	null
2025-01-10	VideoAuteur: Towards Long Narrative Video Generation	Junfei Xiao et.al.	2501.06173	null
2025-01-10	CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems	Haichao Liu et.al.	2501.06132	link
2025-01-10	Generate, Transduct, Adapt: Iterative Transduction with VLMs	Oindrila Saha et.al.	2501.06031	null
2025-01-10	Scalable Vision Language Model Training via High Quality Data Curation	Hongyuan Dong et.al.	2501.05952	null
2025-01-10	Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Ziheng Wu et.al.	2501.05901	link
2025-01-10	Super-class guided Transformer for Zero-Shot Attribute Classification	Sehyung Kim et.al.	2501.05728	link
2025-01-10	From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities	Dominick Reilly et.al.	2501.05711	null
2025-01-09	Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding	Mohammed Elhenawy et.al.	2501.05566	null
2025-01-09	Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation	Darius Petermann et.al.	2501.05413	null
2025-01-09	Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection	Pei-Kang Lee et.al.	2501.05228	null
2025-01-09	Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model	Gregor Geigle et.al.	2501.05122	null
2025-01-09	DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving	Xuran Zheng et.al.	2501.05081	null
2025-01-09	ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark	Ronghao Dang et.al.	2501.05031	null
2025-01-09	Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments	Yifan Xu et.al.	2501.04947	null
2025-01-08	Re-ranking the Context for Multimodal Retrieval Augmented Generation	Matin Mortaheb et.al.	2501.04695	null
2025-01-08	Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations	Archita Srivastava et.al.	2501.04675	null
2025-01-08	DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests	Charles Corbière et.al.	2501.04671	null
2025-01-08	A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI	Kazusato Oko et.al.	2501.04641	link
2025-01-08	Supervision-free Vision-Language Alignment	Giorgio Giannone et.al.	2501.04568	null
2025-01-08	Online Gaussian Test-Time Adaptation of Vision-Language Models	Clément Fuchs et.al.	2501.04352	link
2025-01-08	Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs	Zeyi Huang et.al.	2501.04336	null
2025-01-08	Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts	Miao Rang et.al.	2501.04322	null
2025-01-08	Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation	Senwei Xie et.al.	2501.04268	null
2025-01-07	MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation	Siddharth Joshi et.al.	2501.04155	link
2025-01-07	Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives	Shaoyuan Xie et.al.	2501.04003	link
2025-01-07	Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Haobo Yuan et.al.	2501.04001	link
2025-01-07	RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance	Matin Mortaheb et.al.	2501.03995	null
2025-01-07	VLM-driven Behavior Tree for Context-aware Task Planning	Naoki Wake et.al.	2501.03968	link
2025-01-07	Vision Language Models as Values Detectors	Giulio Antonio Abbo et.al.	2501.03957	null
2025-01-07	OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints	Mingjie Pan et.al.	2501.03841	null
2025-01-07	KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration	Chengyuan Li et.al.	2501.03786	null
2025-01-07	Realistic Test-Time Adaptation of Vision-Language Models	Maxime Zanella et.al.	2501.03729	link
2025-01-07	Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein	Xiaotong Guo et.al.	2501.03722	null
2025-01-07	SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning	Andrew Li et.al.	2501.03675	null
2025-01-06	Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation	Yuhui Zhang et.al.	2501.03225	link
2025-01-06	Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches	Alhassan Mumuni et.al.	2501.03151	null
2025-01-06	MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Wenyi Hong et.al.	2501.02955	null
2025-01-06	Label-free Concept Based Multiple Instance Learning for Gigapixel Histopathology	Susu Sun et.al.	2501.02922	null
2025-01-06	Large Language Models for Video Surveillance Applications	Ulindu De Silva et.al.	2501.02850	null
2025-01-05	Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Simon Park et.al.	2501.02669	link
2025-01-05	Efficient Architectures for High Resolution Vision-Language Models	Miguel Carvalho et.al.	2501.02584	link
2025-01-05	FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models	Hui Lin et.al.	2501.02461	null
2025-01-04	Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations	Kangyu Zhu et.al.	2501.02385	null
2025-01-04	Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4	Messi H. J. Lee et.al.	2501.02211	null
2025-01-03	Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding	Jiaming Li et.al.	2501.01926	link
2025-01-03	MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning	Pu Yang et.al.	2501.01834	null
2025-01-03	LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction	Er Jin et.al.	2501.01767	null
2025-01-03	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-03	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et.al.	2501.01428	null
2025-01-02	Training Medical Large Vision-Language Models with Abnormal-Aware Feedback	Yucheng Zhou et.al.	2501.01377	null
2025-01-02	CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering	Ben Vardi et.al.	2501.01371	null
2025-01-02	Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability	Dong Shu et.al.	2501.01346	null
2025-01-02	CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries	Shudong Liu et.al.	2501.01282	null
2025-01-03	2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining	Wenqi Zhang et.al.	2501.00958	link
2025-01-01	Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models	Emily Johnson et.al.	2501.00917	null
2025-01-01	FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation	Bingyu Li et.al.	2501.00877	link
2025-01-01	IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models	Yiming Zhang et.al.	2501.00848	null
2024-12-31	ICONS: Influence Consensus for Vision-Language Data Selection	Xindi Wu et.al.	2501.00654	null
2024-12-30	Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model	Yifei Huang et.al.	2412.21080	link
2024-12-30	UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI	Fangwei Zhong et.al.	2412.20977	null
2024-12-30	Low-Light Image Enhancement via Generative Perceptual Priors	Han Zhou et.al.	2412.20916	null
2024-12-30	WalkVLM: Aid Visually Impaired People Walking by Vision Language Model	Zhiqiang Yuan et.al.	2412.20903	null
2024-12-30	Towards Compatible Fine-tuning for Vision-Language Model Updates	Zhengbo Wang et.al.	2412.20895	null
2024-12-30	ReStory: VLM-augmentation of Social Human-Robot Interaction Datasets	Fanjun Bu et.al.	2412.20826	null
2024-12-30	Are Vision-Language Models Truly Understanding Multi-vision Sensor?	Sangyun Chung et.al.	2412.20750	link
2024-12-30	UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models	Yujie Li et.al.	2412.20742	link
2024-12-30	M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs	Bei Yan et.al.	2412.20718	link
2024-12-30	ChartAdapter: Large Vision-Language Model for Chart Summarization	Peixin Xu et.al.	2412.20715	null
2024-12-27	MVTamperBench: Evaluating Robustness of Vision-Language Models	Amit Agarwal et.al.	2412.19794	null
2024-12-27	OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis	Qiushi Sun et.al.	2412.19723	null
2024-12-27	Is Your Text-to-Image Model Robust to Caption Noise?	Weichen Yu et.al.	2412.19531	null
2024-12-27	MBQ: Modality-Balanced Quantization for Large Vision-Language Models	Shiyao Li et.al.	2412.19509	link
2024-12-27	Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models	Jie Zhang et.al.	2412.19496	link
2024-12-27	Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation	Chengyang Ye et.al.	2412.19492	link
2024-12-26	CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models	Kiet A. Nguyen et.al.	2412.19331	null
2024-12-26	Sketch-MoMa: Teleoperation for Mobile Manipulator via Interpretation of Hand-Drawn Sketches	Kosei Tanada et.al.	2412.19153	null
2024-12-26	MoPD: Mixture-of-Prompts Distillation for Vision-Language Models	Yang Chen et.al.	2412.19087	null
2024-12-26	Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation	Tao Liu et.al.	2412.19021	null
2024-12-24	Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models	Tahira Kazimi et.al.	2412.18604	null
2024-12-24	The Key of Understanding Vision Tasks: Explanatory Instructions	Yang Shen et.al.	2412.18525	link
2024-12-24	LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Chao Deng et.al.	2412.18424	link
2024-12-24	Weak Scaling Capability in Token Space: An Observation from Large Vision Language Model	Tenghui Li et.al.	2412.18387	link
2024-12-24	Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model	Yushu Li et.al.	2412.18303	null
2024-12-24	Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight	Xi Ding et.al.	2412.18298	link
2024-12-24	Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration	Zhixuan Shen et.al.	2412.18292	link
2024-12-24	EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation	Shuhao Han et.al.	2412.18150	link
2024-12-24	MMFactory: A Universal Solution Search Engine for Vision-Language Tasks	Wan-Cyuan Fan et.al.	2412.18072	null
2024-12-23	ChatGarment: Garment Estimation, Generation and Editing via Large Language Models	Siyuan Bian et.al.	2412.17811	null
2024-12-23	Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Yitong Chen et.al.	2412.17800	link
2024-12-23	Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective	Xinmiao Yu et.al.	2412.17787	null
2024-12-23	Reasoning to Attend: Try to Understand How Token Works	Rui Qian et.al.	2412.17741	link
2024-12-23	Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection	Fenfang Tao et.al.	2412.17619	link
2024-12-23	Personalized Large Vision-Language Models	Chau Pham et.al.	2412.17610	null
2024-12-23	Retention Score: Quantifying Jailbreak Risks for Vision Language Models	Zaitang Li et.al.	2412.17544	null
2024-12-23	On the Feasibility of Vision-Language Models for Time-Series Classification	Vinay Prithyani et.al.	2412.17304	link
2024-12-23	GCS-M3VLT: Guided Context Self-Attention based Multi-modal Medical Vision Language Transformer for Retinal Image Captioning	Teja Krishna Cherukuri et.al.	2412.17251	null
2024-12-22	ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues	Shaina Raza et.al.	2412.17052	link
2024-12-20	HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding	Chenxin Tao et.al.	2412.16158	null
2024-12-20	Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training	Mingliang Liang et.al.	2412.16148	link
2024-12-20	Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring	Ahmet Bahaddin Ersoz et.al.	2412.16108	null
2024-12-20	VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models	Dexter Neo et.al.	2412.15739	null
2024-12-20	Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage	Zhi Gao et.al.	2412.15606	null
2024-12-20	VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving	Zilin Huang et.al.	2412.15544	null
2024-12-20	PolySmart @ TRECVid 2024 Video-To-Text	Jiaxin Wu et.al.	2412.15509	null
2024-12-19	TalkWithMachines: Enhancing Human-Robot Interaction for Interpretable Industrial Robotics Through Large/Vision Language Models	Ammar N. Abbas et.al.	2412.15462	null
2024-12-19	PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation	Muntasir Wahed et.al.	2412.15209	null
2024-12-19	AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving	Shuo Xing et.al.	2412.15206	link
2024-12-19	EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues	Sagar Soni et.al.	2412.15190	null
2024-12-19	LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation	Weijia Shi et.al.	2412.15188	null
2024-12-19	A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space	Yonghao He et.al.	2412.14680	link
2024-12-19	FiVL: A Framework for Improved Vision-Language Alignment	Estelle Aflalo et.al.	2412.14672	null
2024-12-19	HarmonicEval: Multi-modal, Multi-task, Multi-criteria Automatic Evaluation Using a Vision Language Model	Masanari Ohi et.al.	2412.14613	null
2024-12-19	Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation	Jihao Gu et.al.	2412.14487	null
2024-12-19	GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering	Saumya Saxena et.al.	2412.14480	null
2024-12-19	MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval	Junjie Zhou et.al.	2412.14475	null
2024-12-18	Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation	Jianyu Zhang et.al.	2412.14145	null
2024-12-18	Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models	Ido Cohen et.al.	2412.14133	link
2024-12-18	Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models	Xinghang Li et.al.	2412.14058	null
2024-12-18	Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence	Jinghan He et.al.	2412.13949	null
2024-12-18	Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition	Ethan Baron et.al.	2412.13947	null
2024-12-18	Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection	Le Yang et.al.	2412.13817	link
2024-12-18	RelationField: Relate Anything in Radiance Fields	Sebastian Koch et.al.	2412.13652	null
2024-12-18	Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation	Changsun Lee et.al.	2412.13558	null
2024-12-18	Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning	Yingjie Zhu et.al.	2412.13540	link
2024-12-17	Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality	Qitong Wang et.al.	2412.13333	link
2024-12-17	HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction	Chen Bao et.al.	2412.13187	null
2024-12-17	Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration	Mark Endo et.al.	2412.13180	null
2024-12-17	CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models	Zihui Cheng et.al.	2412.12932	null
2024-12-17	An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions	Shreeyash Gowaikar et.al.	2412.12898	null
2024-12-17	ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation	Shiqi Huang et.al.	2412.12798	link
2024-12-17	CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels	Shizhuo Deng et.al.	2412.12793	null
2024-12-17	Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference	Siyuan Wang et.al.	2412.12785	null
2024-12-17	Defending LVLMs Against Vision Attacks through Partial-Perception Supervision	Qi Zhou et.al.	2412.12722	null
2024-12-17	SPHERE: A Hierarchical Evaluation on Spatial Perception and Reasoning for Vision-Language Models	Wenyu Zhang et.al.	2412.12693	null
2024-12-17	DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation	Qingtao Pan et.al.	2412.12492	link
2024-12-16	Does VLM Classification Benefit from LLM Description Semantics?	Pingchuan Ma et.al.	2412.11917	link
2024-12-17	From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach	Xilin Wang et.al.	2412.11892	null
2024-12-16	LMM-Regularized CLIP Embeddings for Image Classification	Maria Tzelepi et.al.	2412.11663	null
2024-12-16	Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves	Shihan Wu et.al.	2412.11509	link
2024-12-16	Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents	Wonje Choi et.al.	2412.11484	null
2024-12-16	OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference	Wei Chen et.al.	2412.11475	null
2024-12-16	MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	Quan-Sheng Zeng et.al.	2412.11464	link
2024-12-16	Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes	Antonio Carlos Rivera et.al.	2412.11396	null
2024-12-16	Temporal Contrastive Learning for Video Temporal Reasoning in Large Vision-Language Models	Rafael Souza et.al.	2412.11391	null
2024-12-15	Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval	Zelong Sun et.al.	2412.11087	null
2024-12-13	UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities	Muhammad Uzair Khattak et.al.	2412.10372	link
2024-12-13	A dual contrastive framework	Yuan Sun et.al.	2412.10348	null
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation	Hyeonseok Lim et.al.	2412.10151	null
2024-12-13	WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model	Songyan Zhang et.al.	2412.09951	null
2024-12-13	CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models	Dongyu Yao et.al.	2412.09936	link
2024-12-13	Selective State Space Memory for Large Vision-Language Models	Chee Ng et.al.	2412.09875	null
2024-12-12	BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation	Pablo Morales-Álvarez et.al.	2412.09718	null
2024-12-13	V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding	Junqi Ge et.al.	2412.09616	link
2024-12-12	PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models	Chenyu Yang et.al.	2412.09613	null
2024-12-12	Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM	Han Wang et.al.	2412.09530	link
2024-12-12	Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis	Shengxuming Zhang et.al.	2412.09521	null
2024-12-12	ATPrompt: Textual Prompt Learning with Embedded Attributes	Zheng Li et.al.	2412.09442	null
2024-12-12	Causal Graphical Models for Vision-Language Compositional Understanding	Fiorenzo Parascandolo et.al.	2412.09353	link
2024-12-12	Learning Novel Skills from Language-Generated Demonstrations	Ao-Qun Jin et.al.	2412.09286	null
2024-12-12	VLMs meet UDA: Boosting Transferability of Open Vocabulary Segmentation with Unsupervised Domain Adaptation	Roberto Alcover-Couso et.al.	2412.09240	null
2024-12-12	A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter	Zirun Guo et.al.	2412.08979	null
2024-12-12	GaGA: Towards Interactive Global Geolocation Assistant	Zhiyang Dou et.al.	2412.08907	null
2024-12-11	Synthetic Vision: Training Vision-Language Models to Understand Physics	Vahid Balazadeh et.al.	2412.08619	null
2024-12-12	Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Fan Lu et.al.	2412.08614	link
2024-12-11	SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting	Pallavi Jain et.al.	2412.08536	link
2024-12-11	POINTS1.5: Building a Vision-Language Model towards Real World Applications	Yuan Liu et.al.	2412.08443	null
2024-12-11	LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba	Yubo Cui et.al.	2412.08388	null
2024-12-11	HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models	Shiding Zhu et.al.	2412.08378	null
2024-12-11	Position-aware Guided Point Cloud Completion with CLIP Model	Feng Zhou et.al.	2412.08271	null
2024-12-11	TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning	Jingjing Xie et.al.	2412.08176	link
2024-12-11	Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models	Quang-Hung Le et.al.	2412.08125	link
2024-12-11	Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models	Sri Harsha Dumpala et.al.	2412.08111	null
2024-12-10	RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models	Greg Heinrich et.al.	2412.07679	link
2024-12-10	DRUM: Learning Demonstration Retriever for Large MUlti-modal Models	Ellen Yi-Ge et.al.	2412.07619	null
2024-12-10	Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios	Jiaqi Fan et.al.	2412.07518	link
2024-12-10	SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World	Jiaqi Zhang et.al.	2412.07472	link
2024-12-10	MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models	Sayak Chakrabarty et.al.	2412.07148	link
2024-12-10	Maya: An Instruction Finetuned Multilingual Multimodal Model	Nahid Alam et.al.	2412.07112	link
2024-12-10	Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling	Donggeun Kim et.al.	2412.07077	null
2024-12-09	Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models	Yi-Lun Lee et.al.	2412.06775	link
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et.al.	2412.06774	null
2024-12-09	Ranking-aware adapter for text-driven image ordering with CLIP	Wei-Hsiang Yu et.al.	2412.06760	link
2024-12-09	ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities	Adhiraj Ghosh et.al.	2412.06745	null
2024-12-09	The Narrow Gate: Localized Image-Text Communication in Vision-Language Models	Alessandro Serra et.al.	2412.06646	null
2024-12-09	From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding	Yixiong Fang et.al.	2412.06474	link
2024-12-09	Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models	Wei Suo et.al.	2412.06458	null
2024-12-09	No Annotations for Object Detection in Art through Stable Diffusion	Patrick Ramos et.al.	2412.06286	link
2024-12-09	iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models	Lianyu Hu et.al.	2412.06263	link
2024-12-09	DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction	Yunheng Li et.al.	2412.06244	null
2024-12-06	Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies	Recep Firat Cekinel et.al.	2412.05155	link
2024-12-06	Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora	Michael Y. Hu et.al.	2412.05149	null
2024-12-06	$S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models	Xiaojie Yin et.al.	2412.04925	null
2024-12-06	Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model	Keunwoo Peter Yu et.al.	2412.04729	null
2024-12-05	Cross-Self KV Cache Pruning for Efficient Vision-Language Inference	Xiaohuan Pei et.al.	2412.04652	link
2024-12-05	VisionZip: Longer is Better but Not Necessary in Vision Language Models	Senqiao Yang et.al.	2412.04467	link
2024-12-05	Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection	Enshen Zhou et.al.	2412.04455	null
2024-12-05	Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Shaunak Halbe et.al.	2412.04429	link
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding	Rong Li et.al.	2412.04383	null
2024-12-05	Discriminative Fine-tuning of LVLMs	Yassine Ouali et.al.	2412.04378	null
2024-12-05	3D Part Segmentation via Geometric Aggregation of 2D Visual Features	Marco Garosi et.al.	2412.04247	null
2024-12-06	VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction	Jiahao Zhang et.al.	2412.04237	null
2024-12-05	Unified Framework for Open-World Compositional Zero-shot Learning	Hirunima Jayasekara et.al.	2412.04083	link
2024-12-05	GenChaR: A Dataset for Stock Chart Captioning	Le Qiu et.al.	2412.04041	null
2024-12-04	FLAIR: VLM with Fine-grained Language-informed Image Representations	Rui Xiao et.al.	2412.03561	link
2024-12-04	Best-of-N Jailbreaking	John Hughes et.al.	2412.03556	link
2024-12-04	PaliGemma 2: A Family of Versatile VLMs for Transfer	Andreas Steiner et.al.	2412.03555	null
2024-12-04	PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation	Ao Wang et.al.	2412.03409	link
2024-12-04	A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs	Wangbo Zhao et.al.	2412.03324	link
2024-12-04	Composed Image Retrieval for Training-Free Domain Conversion	Nikos Efthymiadis et.al.	2412.03297	link
2024-12-04	Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation	Gianni Franchi et.al.	2412.03178	null
2024-12-04	AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?	Shouwei Ruan et.al.	2412.03002	null
2024-12-04	Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch	Qing Zhang et.al.	2412.02978	null
2024-12-04	Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis	Po-Hsuan Huang et.al.	2412.02946	null
2024-12-03	Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback	Hiroki Furuta et.al.	2412.02617	null
2024-12-03	CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs	Abhas Kumar et.al.	2412.02602	null
2024-12-03	OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation	Junyuan Zhang et.al.	2412.02592	link
2024-12-03	Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey	Chenyang Liu et.al.	2412.02573	link
2024-12-03	SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection	Joongwon Chae et.al.	2412.02565	link
2024-12-03	Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks	Jinjin Cai et.al.	2412.02531	null
2024-12-03	OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations	Caixin Kang et.al.	2412.02479	null
2024-12-03	BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding	Chenguang Huang et.al.	2412.02449	null
2024-12-03	Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation	Sepand Dyanatkar et.al.	2412.02262	null
2024-12-03	LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models	Fan-Yun Sun et.al.	2412.02193	null
2024-11-29	SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks	Kim-Celine Kahl et.al.	2411.19688	link
2024-11-29	CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation	Qixiu Li et.al.	2411.19650	null
2024-11-29	Interleaved-Modal Chain-of-Thought	Jun Gao et.al.	2411.19488	null
2024-11-29	Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis	Ruoqi Wang et.al.	2411.19475	null
2024-11-28	Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation	Luca Barsellotti et.al.	2411.19331	link
2024-11-28	GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks	Muhammad Sohail Danish et.al.	2411.19325	link
2024-11-28	GRAPE: Generalizing Robot Policy via Preference Alignment	Zijian Zhang et.al.	2411.19309	null
2024-11-28	VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models	Jeongho Ju et.al.	2411.19103	null
2024-11-27	ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics	Letian Chen et.al.	2411.18825	null
2024-11-27	Generative Visual Communication in the Era of Vision-Language Models	Yael Vinker et.al.	2411.18727	null
2024-11-27	Visual Adversarial Attack on Vision-Language Models for Autonomous Driving	Tianyuan Zhang et.al.	2411.18275	null
2024-11-27	SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought	Aladin Djuhera et.al.	2411.18212	null
2024-11-27	From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects	Zizhao Li et.al.	2411.18207	link
2024-11-27	Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning	Di Zhang et.al.	2411.18203	null
2024-11-27	DistinctAD: Distinctive Audio Description Generation in Contexts	Bo Fang et.al.	2411.18180	null
2024-11-27	COREval: A Comprehensive and Objective Benchmark for Evaluating the Remote Sensing Capabilities of Large Vision-Language Models	Xiao An et.al.	2411.18145	null
2024-11-27	When Large Vision-Language Models Meet Person Re-Identification	Qizao Wang et.al.	2411.18111	null
2024-11-27	Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis	Weiqin Zhao et.al.	2411.18101	link
2024-11-27	VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis	Donggoo Kang et.al.	2411.18038	null
2024-11-28	Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models	Shuyang Hao et.al.	2411.18000	null
2024-11-26	What's in the Image? A Deep-Dive into the Vision of Vision Language Models	Omri Kaduri et.al.	2411.17491	null
2024-11-26	VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models	Lei Li et.al.	2411.17451	null
2024-11-26	CoA: Chain-of-Action for Generative Semantic Labels	Meng Wei et.al.	2411.17406	link
2024-11-26	Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment	Dongping Chen et.al.	2411.17188	null
2024-11-26	Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation	Chanyoung Kim et.al.	2411.17150	null
2024-11-26	Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models	Jaemin Kim et.al.	2411.17041	null
2024-11-26	Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation	Shambhavi Mishra et.al.	2411.17002	link
2024-11-25	Probing the limitations of multimodal language models for chemistry and materials research	Nawaf Alampara et.al.	2411.16955	link
2024-11-25	Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge	Yaqi Zhao et.al.	2411.16824	null
2024-11-25	Generating Out-Of-Distribution Scenarios Using Language Models	Erfan Aasi et.al.	2411.16554	null
2024-11-25	RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Chan Hee Song et.al.	2411.16537	null
2024-11-25	Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis	Boming Miao et.al.	2411.16503	null
2024-11-25	A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models	Manuel Schwonberg et.al.	2411.16407	null
2024-11-25	CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain	Jingchao Peng et.al.	2411.16327	null
2024-11-25	Open-Vocabulary Octree-Graph for 3D Scene Understanding	Zhigang Wang et.al.	2411.16253	null
2024-11-25	Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models	Niloufar Alipour Talemi et.al.	2411.16018	null
2024-11-24	Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation	Sule Bai et.al.	2411.15869	link
2024-11-24	ResCLIP: Residual Attention for Training-free Dense Vision-language Inference	Yuhang Yang et.al.	2411.15851	link
2024-11-24	VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding	Jiaqi Wang et.al.	2411.15839	null
2024-11-22	Context-Aware Multimodal Pretraining	Karsten Roth et.al.	2411.15099	null
2024-11-22	Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning	Junjie Shan et.al.	2411.14937	link
2024-11-22	ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos	Tanveer Hannan et.al.	2411.14901	null
2024-11-22	VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models	Camilo Chacón Sartori et.al.	2411.14832	null
2024-11-22	Continual SFT Matches Multimodal RLHF with Negative Supervision	Ke Zhu et.al.	2411.14797	null
2024-11-22	VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection	Songhao Han et.al.	2411.14794	link
2024-11-22	Effective SAM Combination for Open-Vocabulary Semantic Segmentation	Minhyeok Lee et.al.	2411.14723	null
2024-11-21	GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI	Tianbin Li et.al.	2411.14522	link
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	FoPru: Focal Pruning for Efficient Large Vision-Language Models	Lei Jiang et.al.	2411.14164	null
2024-11-21	Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset	Heejeong Nam et.al.	2411.14137	link
2024-11-20	BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Davide Paglieri et.al.	2411.13543	null
2024-11-20	Teaching VLMs to Localize Specific Objects from In-context Examples	Sivan Doveh et.al.	2411.13317	link
2024-11-20	XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation	Ziyi Wang et.al.	2411.13243	link
2024-11-21	ViSTa Dataset: Do vision-language models understand sequential tasks?	Evžen Wybitul et.al.	2411.13211	link
2024-11-20	TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models	Xin Wang et.al.	2411.13136	null
2024-11-19	VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge	Vishwesh Nath et.al.	2411.12915	null
2024-11-19	CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs	Zhehan Kan et.al.	2411.12713	null
2024-11-18	Vision Language Models Are Few-Shot Audio Spectrogram Classifiers	Satvik Dixit et.al.	2411.12058	null
2024-11-18	ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements	M. Arda Aydın et.al.	2411.12044	link
2024-11-18	MC-LLaVA: Multi-Concept Personalized Vision-Language Model	Ruichuan An et.al.	2411.11706	link
2024-11-18	TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World	Xianlong Wang et.al.	2411.11683	null
2024-11-18	VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation	Bangguo Yu et.al.	2411.11609	null
2024-11-18	Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment	Zhendong Liu et.al.	2411.11543	null
2024-11-19	Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models	Chenhang Cui et.al.	2411.11496	link
2024-11-18	Exploring Emerging Trends and Research Opportunities in Visual Place Recognition	Antonios Gasteratos et.al.	2411.11481	null
2024-11-18	Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Jingxuan Li et.al.	2411.11479	null
2024-11-18	Efficient Transfer Learning for Video-language Foundation Models	Haoxing Chen et.al.	2411.11223	link
2024-11-17	Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection	Wentao Bao et.al.	2411.10922	null
2024-11-16	MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection	Xu Cao et.al.	2411.10888	link
2024-11-15	VeriGraph: Scene Graphs for Execution Verifiable Robot Planning	Daniel Ekpo et.al.	2411.10446	null
2024-11-15	LLaVA-o1: Let Vision Language Models Reason Step-by-Step	Guowei Xu et.al.	2411.10440	link
2024-11-15	SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning	Zewen Chen et.al.	2411.10161	link
2024-11-15	Federated Domain Generalization via Prompt Learning and Aggregation	Shuai Gong et.al.	2411.10063	link
2024-11-15	Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement	Yanyan Huang et.al.	2411.09894	link
2024-11-14	LLV-FSR: Exploiting Large Language-Vision Prior for Face Super-resolution	Chenyang Wang et.al.	2411.09293	null
2024-11-13	ClevrSkills: Compositional Language and Visual Reasoning in Robotics	Sanjay Haresh et.al.	2411.09052	link
2024-11-13	DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models	Yongdong Wang et.al.	2411.09022	link
2024-11-13	Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions	Moran Yanuka et.al.	2411.09018	null
2024-11-13	The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models	Daniel P. Jeong et.al.	2411.08870	link
2024-11-13	Sharingan: Extract User Action Sequence from Desktop Recordings	Yanting Chen et.al.	2411.08768	null
2024-11-13	Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification	Jose-Luis Matez-Bandera et.al.	2411.08727	link
2024-11-13	LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation	Pengwei Yin et.al.	2411.08606	null
2024-11-13	NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation	Youzhi Liu et.al.	2411.08579	null
2024-11-13	Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints	Nishanth Kumar et.al.	2411.08253	null
2024-11-12	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	Yiyang Ma et.al.	2411.07975	link
2024-11-12	Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer's Disease	Francesco Chiumento et.al.	2411.07871	null
2024-11-12	BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions	Anas Awadalla et.al.	2411.07461	null
2024-11-11	SAMPart3D: Segment Any Part in 3D Objects	Yunhan Yang et.al.	2411.07184	link
2024-11-11	StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Yichen He et.al.	2411.07076	link
2024-11-11	UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models	Jiachen Liang et.al.	2411.06921	null
2024-11-11	Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning	Hongsheng Zhang et.al.	2411.06764	null
2024-11-11	Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models	Jungseok Hong et.al.	2411.06752	null
2024-11-11	Renaissance: Investigating the Pretraining of Vision-Language Encoders	Clayton Fields et.al.	2411.06657	link
2024-11-09	Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models	Arshia Hemmat et.al.	2411.06287	link
2024-11-09	Aquila-plus: Prompt-Driven Visual-Language Models for Pixel-Level Remote Sensing Image Understanding	Kaixuan Lu et.al.	2411.06142	null
2024-11-09	Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension	Kaixuan Lu et.al.	2411.06074	null
2024-11-09	GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection	Jiyul Ham et.al.	2411.06071	link
2024-11-08	End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering	Dylan Goetting et.al.	2411.05755	link
2024-11-08	Poze: Sports Technique Feedback under Data Constraints	Agamdeep Singh et.al.	2411.05734	null
2024-11-08	A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis	Cristiano Patrício et.al.	2411.05609	link
2024-11-08	Enhancing Visual Classification using Comparative Descriptors	Hankyeol Lee et.al.	2411.05357	link
2024-11-08	Real-World Offline Reinforcement Learning from Vision Language Model Feedback	Sreyas Venkataraman et.al.	2411.05273	null
2024-11-07	On Erroneous Agreements of CLIP Image Embeddings	Siting Li et.al.	2411.05195	null
2024-11-07	Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model	Sheng Cheng et.al.	2411.05079	link
2024-11-07	DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation	Peiqi Liu et.al.	2411.04999	link
2024-11-07	A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model	Panwen Hu et.al.	2411.04942	null
2024-11-07	In the Era of Prompt Learning with Vision-Language Models	Ankit Jha et.al.	2411.04892	null
2024-11-07	TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models	Jonathan Fhima et.al.	2411.04642	null
2024-11-07	Vision Language Models are In-Context Value Learners	Yecheng Jason Ma et.al.	2411.04549	null
2024-11-07	BendVLM: Test-Time Debiasing of Vision-Language Embeddings	Walter Gerych et.al.	2411.04420	link
2024-11-06	Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models	Saketh Bachu et.al.	2411.04291	null
2024-11-06	Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?	Daniel P. Jeong et.al.	2411.04118	link
2024-11-06	RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models	Maya Varma et.al.	2411.04097	link
2024-11-06	H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models	Nhi Pham et.al.	2411.04077	null
2024-11-06	Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval	Davide Buoso et.al.	2411.04006	null
2024-11-06	Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models	Minh Duc Bui et.al.	2411.03888	link
2024-11-06	DesignMinds: Enhancing Video-Based Design Ideation with Vision-Language Model and Context-Injected Large Language Model	Tianhao He et.al.	2411.03827	null
2024-11-06	Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction	Muhammad Tayyab Khan et.al.	2411.03707	null
2024-11-05	Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Yingzi Ma et.al.	2411.03554	link
2024-11-05	VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation	Haochen Zhang et.al.	2411.03540	link
2024-11-05	An Application-Agnostic Automatic Target Recognition System Using Vision Language Models	Anthony Palladino et.al.	2411.03491	null
2024-11-05	Inference Optimal VLMs Need Only One Visual Token but Larger Models	Kevin Y. Li et.al.	2411.03312	link
2024-11-05	HumanVLM: Foundation for Human-Scene Vision-Language Model	Dawei Dai et.al.	2411.03034	null
2024-11-05	Membership Inference Attacks against Large Vision-Language Models	Zhan Li et.al.	2411.02902	link
2024-11-05	Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs	Muhammad Tayyab Khan et.al.	2411.02810	null
2024-11-05	Label Critic: Design Data Before Models	Pedro R. A. S. Bassi et.al.	2411.02753	link
2024-11-05	DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark	Haodong Li et.al.	2411.02733	link
2024-11-05	V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization	Yuxi Xie et.al.	2411.02712	link
2024-11-04	Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models	Meng Cao et.al.	2411.02564	link
2024-11-04	INQUIRE: A Natural World Text-to-Image Retrieval Benchmark	Edward Vendrow et.al.	2411.02537	link
2024-11-04	One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering	Deepayan Das et.al.	2411.02210	null
2024-11-04	GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery	Bhupendra Solanki et.al.	2411.02074	null
2024-11-03	Addressing Failures in Robotics using Vision-Based Language Models (VLMs) and Behavior Trees (BT)	Faseeh Ahmad et.al.	2411.01568	null
2024-11-03	Integration of Large Vision Language Models for Efficient Post-disaster Damage Assessment and Reporting	Zhaohui Chen et.al.	2411.01511	null
2024-11-03	A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning	Fei Wang et.al.	2411.01445	null
2024-11-01	Identifying Implicit Social Biases in Vision-Language Models	Kimia Hamidieh et.al.	2411.00997	null
2024-11-01	Retrieval-enriched zero-shot image classification in low-resource domains	Nicola Dall'Asen et.al.	2411.00988	null
2024-11-01	Does GenAI Make Usability Testing Obsolete?	Ali Ebrahimi Pourasad et.al.	2411.00634	null
2024-11-01	CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision	Gi-Cheon Kang et.al.	2411.00508	null
2024-11-01	Right this way: Can VLMs Guide Us to See More to Answer Questions?	Li Liu et.al.	2411.00394	link
2024-10-31	$π_0$: A Vision-Language-Action Flow Model for General Robot Control	Kevin Black et.al.	2410.24164	null
2024-10-31	Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age	Nouar AlDahoul et.al.	2410.24148	null
2024-10-31	Bayesian-guided Label Mapping for Visual Reprogramming	Chengyi Cai et.al.	2410.24018	link
2024-10-31	EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection	Qinqian Lei et.al.	2410.23904	link
2024-10-31	Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP	Chen Huang et.al.	2410.23698	null
2024-10-31	Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey	Chiyu Zhang et.al.	2410.23687	null
2024-10-31	SuctionPrompt: Visual-assisted Robotic Picking with a Suction Cup Using Vision-Language Models and Facile Hardware Design	Tomohiro Motoda et.al.	2410.23640	null
2024-10-30	Keypoint Abstraction using Large Models for Object-Relative Imitation Learning	Xiaolin Fang et.al.	2410.23254	null
2024-10-30	OS-ATLAS: A Foundation Action Model for Generalist GUI Agents	Zhiyong Wu et.al.	2410.23218	link
2024-10-30	VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning	Yichao Liang et.al.	2410.23156	null
2024-10-30	Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models	Junjie Wu et.al.	2410.23114	link
2024-10-30	An Individual Identity-Driven Framework for Animal Re-Identification	Yihao Wu et.al.	2410.22927	link
2024-10-30	Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector	Youcheng Huang et.al.	2410.22888	link
2024-10-30	Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Kento Kawaharazuka et.al.	2410.22707	null
2024-10-30	SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset	Ngoc Dung Huynh et.al.	2410.22648	null
2024-10-29	Image2Struct: Benchmarking Structure Extraction for Vision-Language Models	Josselin Somerville Roberts et.al.	2410.22456	null
2024-10-29	Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier	Kai Wang et.al.	2410.22317	link
2024-10-29	Natural Language Inference Improves Compositionality in Vision-Language Models	Paola Cascante-Bonilla et.al.	2410.22315	null
2024-10-29	Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving	Bo Jiang et.al.	2410.22313	link
2024-10-29	ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising	Ashutosh Chaubey et.al.	2410.22233	link
2024-10-29	Active Learning for Vision-Language Models	Bardia Safaei et.al.	2410.22187	null
2024-10-29	Are VLMs Really Blind	Ayush Singh et.al.	2410.22029	link
2024-10-29	Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation	Halil Utku Unlu et.al.	2410.21926	null
2024-10-30	Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models	Lu Yu et.al.	2410.21802	link
2024-10-29	PerSRV: Personalized Sticker Retrieval with Vision-Language Model	Heng Er Metilda Chee et.al.	2410.21801	link
2024-10-29	AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?	Han Bao et.al.	2410.21259	link
2024-10-28	Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce	Zhantao Yang et.al.	2410.21237	null
2024-10-28	Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines	Zhixin Zhang et.al.	2410.21220	link
2024-10-29	Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction	Qintong Zhang et.al.	2410.21169	null
2024-10-28	Zero-Shot Action Recognition in Surveillance Videos	Joao Pereira et.al.	2410.21113	null
2024-10-28	BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks	Yunhan Zhao et.al.	2410.20971	null
2024-10-29	VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions	Guanyan Chen et.al.	2410.20927	null
2024-10-28	Improving Generalization in Visual Reasoning via Self-Ensemble	Tien-Huy Nguyen et.al.	2410.20883	null
2024-10-28	Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments	Sangmim Song et.al.	2410.20666	null
2024-10-27	MatViX: Multimodal Information Extraction from Visually Rich Articles	Ghazal Khalighinejad et.al.	2410.20494	null
2024-10-25	Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models	Yucheng Zhou et.al.	2410.19732	null
2024-10-25	GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing	Hosam Elgendy et.al.	2410.19552	link
2024-10-25	Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?	Antonia Wüst et.al.	2410.19546	link
2024-10-25	EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data	Xuetian Chen et.al.	2410.19461	null
2024-10-25	COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training	Haocheng Xi et.al.	2410.19313	link
2024-10-25	Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting	Xingyu Zhu et.al.	2410.19294	null
2024-10-24	Probabilistic Language-Image Pre-Training	Sanghyuk Chun et.al.	2410.18857	link
2024-10-24	Zero-shot Object Navigation with Vision-Language Models Reasoning	Congcong Wen et.al.	2410.18570	null
2024-10-24	Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data	Shuhao Gu et.al.	2410.18558	null
2024-10-24	Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Jinghao Hu et.al.	2410.18537	null
2024-10-23	Lightweight Neural App Control	Filippos Christianos et.al.	2410.17883	null
2024-10-23	ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting	Shaofei Cai et.al.	2410.17856	link
2024-10-23	RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification	Marco Mistretta et.al.	2410.17827	null
2024-10-23	An Intelligent Agentic System for Complex Image Restoration Problems	Kaiwen Zhu et.al.	2410.17809	link
2024-10-23	MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models	Ziyu Liu et.al.	2410.17637	link
2024-10-22	AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents	Chejian Xu et.al.	2410.17401	null
2024-10-22	Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities	Zheyuan Zhang et.al.	2410.17385	link
2024-10-22	PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction	Long Xing et.al.	2410.17247	link
2024-10-22	MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model	Meng Xu et.al.	2410.16840	null
2024-10-21	Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives	Angelo Moroncelli et.al.	2410.16411	link
2024-10-21	VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use	Zhehao Zhang et.al.	2410.16400	null
2024-10-21	Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping	Ryan Li et.al.	2410.16232	null
2024-10-21	Improve Vision Language Model Chain-of-thought Reasoning	Ruohong Zhang et.al.	2410.16198	link
2024-10-21	Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning	Yihong Tang et.al.	2410.16162	null
2024-10-21	Mitigating Object Hallucination via Concentric Causal Attention	Yun Xing et.al.	2410.15926	link
2024-10-21	MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images	Pablo Meseguer et.al.	2410.15881	null
2024-10-21	Task-oriented Robotic Manipulation with Vision Language Models	Nurhan Bulus Guran et.al.	2410.15863	null
2024-10-21	An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps	Ziyi Liu et.al.	2410.15780	link
2024-10-22	Reducing Hallucinations in Vision-Language Models via Latent Space Steering	Sheng Liu et.al.	2410.15778	link
2024-10-21	CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models	Jianjun Gao et.al.	2410.15657	null
2024-10-21	A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM	ByungOk Han et.al.	2410.15549	null
2024-10-18	NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples	Baiqi Li et.al.	2410.14669	null
2024-10-18	Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets	Namid R. Stillman et.al.	2410.14587	null
2024-10-18	CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection	Andrea Appiani et.al.	2410.14509	null
2024-10-18	Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Josiah Aklilu et.al.	2410.14340	null
2024-10-18	E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model	Haoran Lai et.al.	2410.14200	null
2024-10-18	LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs	Yujun Zhou et.al.	2410.14182	null
2024-10-18	MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems	Zifeng Zhu et.al.	2410.14179	null
2024-10-18	ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom	Jingqi Zhou et.al.	2410.14138	null
2024-10-17	Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers	Yuxin Wen et.al.	2410.14072	null
2024-10-17	Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"	Luan Fletcher et.al.	2410.13989	link
2024-10-17	VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding	Runsen Xu et.al.	2410.13860	link
2024-10-17	Differentiable Robot Rendering	Ruoshi Liu et.al.	2410.13851	null
2024-10-17	Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning	Xiaodan Xing et.al.	2410.13823	link
2024-10-17	Improving Multi-modal Large Language Model through Boosting Vision Capabilities	Yanpeng Sun et.al.	2410.13733	null
2024-10-17	VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks	Shailaja Keyur Sampat et.al.	2410.13666	link
2024-10-17	H2OVL-Mississippi Vision Language Models Technical Report	Shaikat Galib et.al.	2410.13611	null
2024-10-17	GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models	Aditya Sharma et.al.	2410.13510	null
2024-10-17	Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding	Kyungmin Min et.al.	2410.13321	null
2024-10-17	Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead	Kuleen Sasse et.al.	2410.13146	link
2024-10-17	Trust but Verify: Programmatic VLM Evaluation in the Wild	Viraj Prabhu et.al.	2410.13121	null
2024-10-16	Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models	Ce Zhang et.al.	2410.12790	link
2024-10-16	Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions	Zhenyu Jiang et.al.	2410.12773	null
2024-10-16	WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation	João Matos et.al.	2410.12722	link
2024-10-16	WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines	Genta Indra Winata et.al.	2410.12705	link
2024-10-16	VividMed: Vision Language Model with Versatile Visual Grounding for Medicine	Lingxiao Luo et.al.	2410.12694	link
2024-10-16	Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models	Shicheng Xu et.al.	2410.12662	null
2024-10-16	FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion	Jiacheng Ruan et.al.	2410.12564	link
2024-10-16	Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety	Lucas Choi et.al.	2410.12225	null
2024-10-16	Leveraging Large Vision Language Model For Better Automatic Web GUI Testing	Siyi Wang et.al.	2410.12157	null
2024-10-15	Enabling Data-Driven and Empathetic Interactions: A Context-Aware 3D Virtual Agent in Mixed Reality for Enhanced Financial Customer Experience	Cindy Xu et.al.	2410.12051	null
2024-10-15	A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem	Kun Ding et.al.	2410.11686	null
2024-10-15	MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval	Reno Kriz et.al.	2410.11619	null
2024-10-15	PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model	Shang-Ching Liu et.al.	2410.11564	null
2024-10-15	LargePiG: Your Large Language Model is Secretly a Pointer Generator	Zhongxiang Sun et.al.	2410.11366	null
2024-10-15	CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification	Huazhong Zhao et.al.	2410.11255	null
2024-10-15	Tree of Attributes Prompt Learning for Vision-Language Models	Tong Ding et.al.	2410.11201	null
2024-10-14	Locality Alignment Improves Vision-Language Models	Ian Covert et.al.	2410.11087	null
2024-10-14	Towards Foundation Models for 3D Vision: How Close Are We?	Yiming Zuo et.al.	2410.10799	link
2024-10-14	VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents	Shi Yu et.al.	2410.10594	link
2024-10-14	Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification	Jiaxiang Gou et.al.	2410.10573	link
2024-10-14	MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks	Jiacheng Chen et.al.	2410.10563	link
2024-10-14	LG-CAV: Train Any Concept Activation Vector with Language Guidance	Qihan Huang et.al.	2410.10308	null
2024-10-14	Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection	Jiawen Zhu et.al.	2410.10289	link
2024-10-14	LOBG: Less Overfitting for Better Generalization in Vision-Language Model	Chenhao Ding et.al.	2410.10247	null
2024-10-14	MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models	Peng Xia et.al.	2410.10139	link
2024-10-14	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	null
2024-10-14	Can We Predict Performance of Large Models across Vision-Language Tasks?	Qinyu Zhao et.al.	2410.10112	link
2024-10-11	Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models	Qin Liu et.al.	2410.09047	null
2024-10-11	The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals	Xiaofeng Wu et.al.	2410.09013	null
2024-10-11	SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation	Haosheng Li et.al.	2410.08901	null
2024-10-11	Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation	Kun Ding et.al.	2410.08895	null
2024-10-11	RoRA-VLM: Robust Retrieval-Augmented Vision Language Models	Jingyuan Qi et.al.	2410.08876	null
2024-10-11	Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies	Yingqiang Gao et.al.	2410.08860	null
2024-10-11	VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model	Beichen Wang et.al.	2410.08792	null
2024-10-11	Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models	Reza Abbasi et.al.	2410.08791	link
2024-10-11	Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping	Yue Yang et.al.	2410.08695	link
2024-10-11	Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models	Mengyuan Chen et.al.	2410.08611	link
2024-10-10	MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models	Wenbo Hu et.al.	2410.08182	null
2024-10-10	On the Evaluation of Generative Robotic Simulations	Feng Chen et.al.	2410.08172	null
2024-10-10	Q-VLM: Post-training Quantization for Large Vision-Language Models	Changyuan Wang et.al.	2410.08119	link
2024-10-10	Unsupervised Data Validation Methods for Efficient Model Training	Yurii Paniv et.al.	2410.07880	null
2024-10-10	HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter	Yumiao Zhao et.al.	2410.07854	null
2024-10-10	FLIER: Few-shot Language Image Models Embedded with Latent Representations	Zhinuo Zhou et.al.	2410.07648	null
2024-10-10	A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks	Hoin Jung et.al.	2410.07593	link
2024-10-10	3D Vision-Language Gaussian Splatting	Qucheng Peng et.al.	2410.07577	null
2024-10-10	How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?	Seongyun Lee et.al.	2410.07571	null
2024-10-10	CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection	Guankun Wang et.al.	2410.07540	link
2024-10-09	Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate	Qidong Huang et.al.	2410.07167	link
2024-10-09	Towards Interpreting Visual Information Processing in Vision-Language Models	Clement Neo et.al.	2410.07149	link
2024-10-10	EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models	Rui Zhao et.al.	2410.07133	link
2024-10-09	VHELM: A Holistic Evaluation of Vision Language Models	Tony Lee et.al.	2410.07112	link
2024-10-09	Pixtral 12B	Pravesh Agrawal et.al.	2410.07073	link
2024-10-09	Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback	Dennis Hein et.al.	2410.07025	null
2024-10-09	ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities	Yukun Jiang et.al.	2410.06967	link
2024-10-09	Compositional Entailment Learning for Hyperbolic Vision-Language Models	Avik Pal et.al.	2410.06912	null
2024-10-09	From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models	Yuying Shang et.al.	2410.06795	null
2024-10-09	Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models	Yubo Wang et.al.	2410.06699	null
2024-10-07	Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia	Mohammad Fahes et.al.	2410.05270	link
2024-10-07	TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens	Ya-Qi Yu et.al.	2410.05261	null
2024-10-08	TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Rabin Adhikari et.al.	2410.05239	link
2024-10-07	LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation	Zhijie Wang et.al.	2410.05191	null
2024-10-07	VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks	Ziyan Jiang et.al.	2410.05160	null
2024-10-07	HE-Drive: Human-Like End-to-End Driving with Vision Language Models	Junming Wang et.al.	2410.05051	null
2024-10-07	TLDR: Token-Level Detective Reward Model for Large Vision Language Models	Deqing Fu et.al.	2410.04734	null
2024-10-06	Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress	Christopher Agia et.al.	2410.04640	null
2024-10-06	Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models	Salma Abdel Magid et.al.	2410.04634	null
2024-10-06	LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking	Alimohammad Beigi et.al.	2410.04616	null
2024-10-04	Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models	Tinghui Zhu et.al.	2410.03659	link
2024-10-04	LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos	Noriaki Hirose et.al.	2410.03603	null
2024-10-04	An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation	Ahmed Abdulaal et.al.	2410.03334	null
2024-10-04	Generalizable Prompt Tuning for Vision-Language Models	Qian Zhang et.al.	2410.03189	null
2024-10-04	Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models	Yufang Liu et.al.	2410.03176	link
2024-10-04	CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization	Shigemichi Matsuzaki et.al.	2410.03054	null
2024-10-07	Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL	Naoaki Kanazawa et.al.	2410.02874	null
2024-10-03	Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations	Nick Jiang et.al.	2410.02762	link
2024-10-03	DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects	Zhaowei Wang et.al.	2410.02730	link
2024-10-03	Unified Multi-Modal Interleaved Document Representation for Information Retrieval	Jaewoo Lee et.al.	2410.02729	null
2024-10-03	Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models	Shuoyuan Wang et.al.	2410.02681	null
2024-10-03	LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model	Duy M. H. Nguyen et.al.	2410.02615	null
2024-10-03	Guiding Long-Horizon Task and Motion Planning with Vision Language Models	Zhutian Yang et.al.	2410.02193	null
2024-10-02	Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning	Xiao Yu et.al.	2410.02052	null
2024-10-02	Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description	Mahshid Dehghani et.al.	2410.02049	null
2024-10-02	Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval	Kyle Buettner et.al.	2410.02027	null
2024-10-02	Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker	Xinlong Hou et.al.	2410.01966	null
2024-10-03	Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks	Mengzhao Jia et.al.	2410.01744	link
2024-10-03	LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models	Zhenyue Qin et.al.	2410.01620	null
2024-10-02	Toward a Holistic Evaluation of Robustness in CLIP Models	Weijie Tu et.al.	2410.01534	null
2024-10-03	LEGO: Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion	Dexuan Ding et.al.	2410.01506	null
2024-10-02	Information-Theoretical Principled Trade-off between Jailbreakability and Stealthiness on Vision Language Models	Ching-Chia Kao et.al.	2410.01438	null
2024-10-02	Backdooring Vision-Language Models with Out-Of-Distribution Data	Weimin Lyu et.al.	2410.01264	null
2024-10-02	UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Hasnat Md Abdullah et.al.	2410.01180	link
2024-10-01	ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding	Liang Shi et.al.	2410.00982	null
2024-10-01	Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion	Lakshmi Nair et.al.	2410.00731	link
2024-10-01	Find Everything: A General Vision Language Model Approach to Multi-Object Search	Daniel Choi et.al.	2410.00388	null
2024-09-30	UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models	Qiaojun Yu et.al.	2409.20551	null
2024-09-30	Robi Butler: Remote Multimodal Interactions with Household Robot Assistant	Anxing Xiao et.al.	2409.20548	null
2024-09-30	Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments	Mohamed Elnoor et.al.	2409.20445	null
2024-09-30	HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding	Fan Yuan et.al.	2409.20429	null
2024-09-30	World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering	Jiacong Wang et.al.	2409.20424	link
2024-09-30	CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset	Akshatha Arodi et.al.	2409.20353	link
2024-09-30	Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function	Chenyi Zhuang et.al.	2409.19967	link
2024-09-30	Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels	Heeseong Shin et.al.	2409.19846	null
2024-09-30	Textual Training for the Hassle-Free Removal of Unwanted Visual Data	Saehyung Lee et.al.	2409.19840	link
2024-09-29	PALM: Few-Shot Prompt Learning for Audio Language Models	Asif Hanif et.al.	2409.19806	null
2024-09-27	Image-guided topic modeling for interpretable privacy classification	Alina Elena Baia et.al.	2409.18674	link
2024-09-26	SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation	Xin Li et.al.	2409.18082	null
2024-09-26	Infering Alt-text For UI Icons With Large Language Models During App Development	Sabrina Haque et.al.	2409.18060	null
2024-09-26	EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions	Kai Chen et.al.	2409.18042	null
2024-09-26	DARE: Diverse Visual Question Answering with Robustness Evaluation	Hannah Sterz et.al.	2409.18023	null
2024-09-26	The Hard Positive Truth about Vision-Language Compositionality	Amita Kamath et.al.	2409.17958	link
2024-09-26	Cascade Prompt Learning for Vision-Language Model Adaptation	Ge Wu et.al.	2409.17805	link
2024-09-26	Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications	Nghia Nguyen et.al.	2409.17727	null
2024-09-26	AP-VLM: Active Perception Enabled by Vision-Language Models	Venkatesh Sripada et.al.	2409.17641	null
2024-09-26	P4Q: Learning to Prompt for Quantization in Visual-language Models	Huixin Sun et.al.	2409.17634	null
2024-09-26	Leveraging Semantic and Geometric Information for Zero-Shot Robot-to-Human Handover	Jiangshan Liu et.al.	2409.17621	null
2024-09-25	Attention Prompting on Image for Large Vision-Language Models	Runpeng Yu et.al.	2409.17143	link
2024-09-25	Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset	Andrew Goldberg et.al.	2409.17126	null
2024-09-25	Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?	Bowen Zhao et.al.	2409.17080	link
2024-09-25	GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design	Phillip Mueller et.al.	2409.17045	null
2024-09-25	Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification	Ming Li et.al.	2409.16718	link
2024-09-24	A Unified Hallucination Mitigation Framework for Large Vision-Language Models	Yue Chang et.al.	2409.16494	link
2024-09-24	BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes	Kasun Weerakoon et.al.	2409.16484	null
2024-09-24	Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation	Yong Xien Chng et.al.	2409.16278	null
2024-09-24	ComiCap: A VLMs pipeline for dense captioning of Comic Panels	Emanuele Vivoli et.al.	2409.16159	link
2024-09-24	Bridging Environments and Language with Rendering Functions and Vision-Language Models	Theo Cachet et.al.	2409.16024	null
2024-09-18	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Peng Wang et.al.	2409.12191	link
2024-09-18	Mixture of Prompt Learning for Vision Language Models	Yu Du et.al.	2409.12011	null
2024-09-18	GauTOAO: Gaussian-based Task-Oriented Affordance of Objects	Jiawen Wang et.al.	2409.11941	null
2024-09-18	LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models	Amaia Cardiel et.al.	2409.11919	null
2024-09-17	CAST: Cross-modal Alignment Similarity Test for Vision Language Models	Gautier Dagan et.al.	2409.11007	link
2024-09-17	KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph	Yanbei Jiang et.al.	2409.10921	link
2024-09-16	Benchmarking VLMs' Reasoning About Persuasive Atypical Images	Sina Malakouti et.al.	2409.10719	null
2024-09-16	MotIF: Motion Instruction Fine-tuning	Minyoung Hwang et.al.	2409.10683	null
2024-09-16	Do Pre-trained Vision-Language Models Encode Object States?	Kaleb Newman et.al.	2409.10488	null
2024-09-16	CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera	Jingpei Lu et.al.	2409.10441	null
2024-09-16	HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models	Vineet Bhat et.al.	2409.10419	null
2024-09-16	NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions	Zhixi Cai et.al.	2409.10196	null
2024-09-16	MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior	Weijing Tao et.al.	2409.10090	link
2024-09-17	IRIS: Interactive Responsive Intelligent Segmentation for 3D Affordance Analysis	Meng Chu et.al.	2409.10078	null
2024-09-15	FSL-LVLM: Friction-Aware Safety Locomotion using Large Vision Language Model in Wheeled Robots	Bo Peng et.al.	2409.09845	null
2024-09-15	Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models	Yuan-Hong Liao et.al.	2409.09788	null
2024-09-15	Finetuning CLIP to Reason about Pairwise Differences	Dylan Sam et.al.	2409.09721	link
2024-09-15	Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs	Mengmeng Ren et.al.	2409.09715	null
2024-09-13	Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation	Hangyu Li et.al.	2409.08598	null
2024-09-13	ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning	Pei Deng et.al.	2409.08582	null
2024-09-13	Generalization Boosted Adapter for Open-Vocabulary Segmentation	Wenhao Xu et.al.	2409.08468	null
2024-09-12	Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations	Samyak Rawlekar et.al.	2409.08381	null
2024-09-12	ComAlign: Compositional Alignment in Vision-Language Models	Ali Abdollah et.al.	2409.08206	null
2024-09-12	What Makes a Maze Look Like a Maze?	Joy Hsu et.al.	2409.08202	null
2024-09-12	DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?	Liqiang Jing et.al.	2409.07703	link
2024-09-12	Open-Vocabulary Remote Sensing Image Semantic Segmentation	Qinglong Cao et.al.	2409.07683	link
2024-09-11	Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks	Md Zarif Hossain et.al.	2409.07353	link
2024-09-14	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-11	Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations	Keumgang Cha et.al.	2409.07048	null
2024-09-10	ExIQA: Explainable Image Quality Assessment Using Distortion Attributes	Sepehr Kazemi Ranjbar et.al.	2409.06853	null
2024-09-10	DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks	Amin Karimi Monsefi et.al.	2409.06809	link
2024-09-09	NeIn: Telling What You Don't Want	Nhat-Tan Bui et.al.	2409.06481	null
2024-09-10	MAGDA: Multi-agent guideline-driven diagnostic assistance	David Bani-Harouni et.al.	2409.06351	null
2024-09-10	INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding	Ji Ha Jang et.al.	2409.06210	null
2024-09-10	Revisiting Prompt Pretraining of Vision-Language Models	Zhenyuan Chen et.al.	2409.06166	null
2024-09-09	PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems	Aditya Narayanan et.al.	2409.06078	link
2024-09-09	DexDiff: Towards Extrinsic Dexterity Manipulation of Ungraspable Objects in Unrestricted Environments	Chengzhong Ma et.al.	2409.05493	null
2024-09-09	From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models	Tessa Pulli et.al.	2409.05413	null
2024-09-11	A Survey of Multimodal Composite Editing and Retrieval	Suyan Li et.al.	2409.05405	link
2024-09-09	Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling	Georgios Pantazopoulos et.al.	2409.05395	link
2024-09-08	PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions	Yudong Zhang et.al.	2409.05076	link
2024-09-07	POINTS: Improving Your Vision-language Model with Affordable Strategies	Yuan Liu et.al.	2409.04828	null
2024-09-07	Enhancing Outlier Knowledge for Few-Shot Out-of-Distribution Detection with Extensible Local Prompts	Fanhu Zeng et.al.	2409.04796	null
2024-09-07	MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality	Ruiting Dai et.al.	2409.04693	null
2024-09-06	COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes	Koen Kraaijveld et.al.	2409.04053	link
2024-09-06	Automating Robot Failure Recovery Using Vision-Language Models With Optimized Prompts	Hongyi Chen et.al.	2409.03966	null
2024-09-05	Few-shot Adaptation of Medical Vision-Language Models	Fereshteh Shakeri et.al.	2409.03868	link
2024-09-05	Text-Guided Mixup Towards Long-Tailed Image Categorization	Richard Franklin et.al.	2409.03583	link
2024-09-05	Have Large Vision-Language Models Mastered Art History?	Ombretta Strafforello et.al.	2409.03521	null
2024-09-04	Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving	Yuhang Lu et.al.	2409.02914	null
2024-09-04	Benchmarking Spurious Bias in Few-Shot Image Classifiers	Guangtao Zheng et.al.	2409.02882	link
2024-09-04	Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection	Kaiqing Lin et.al.	2409.02664	null
2024-09-04	Multi-modal Situated Reasoning in 3D Scenes	Xiongkun Linghu et.al.	2409.02389	null
2024-09-03	Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems	Sanjita Prajapati et.al.	2409.02278	null
2024-09-03	How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?	Saeid Asgari Taghanaki et.al.	2409.02253	link
2024-09-03	Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models	Jiaqi Xu et.al.	2409.02101	link
2024-09-03	GraspSplats: Efficient Manipulation with 3D Feature Splatting	Mazeyu Ji et.al.	2409.02084	null
2024-09-03	Boosting Vision-Language Models for Histopathology Classification: Predict all at once	Maxime Zanella et.al.	2409.01883	link
2024-09-03	Towards Generative Class Prompt Learning for Few-shot Visual Recognition	Soumitri Chattopadhyay et.al.	2409.01835	link
2024-09-03	Open-vocabulary Temporal Action Localization using VLMs	Naoki Wake et.al.	2408.17422	null
2024-09-02	LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation	Shuyi Ouyang et.al.	2408.17347	null
2024-08-30	Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning	Xiaoye Qu et.al.	2408.17150	link
2024-08-29	VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition	Zaiwei Zhang et.al.	2408.16930	null
2024-08-29	PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning	Noor Hussein et.al.	2408.16769	link
2024-08-29	VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation	Shiwei Wu et.al.	2408.16730	null
2024-08-29	Space3D-Bench: Spatial 3D Question Answering Benchmark	Emilia Szymanska et.al.	2408.16662	null
2024-08-29	DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Yongjie Fu et.al.	2408.16647	null
2024-08-29	Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning	Zhengqing Gao et.al.	2408.16486	link
2024-08-29	Text-Enhanced Zero-Shot Action Recognition: A training-free approach	Massimo Bosetti et.al.	2408.16412	null
2024-08-29	Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models	Kengo Nakata et.al.	2408.16296	null
2024-08-29	Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation	Vivek Myers et.al.	2408.16228	null
2024-08-30	LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models	Jingyi Wang et.al.	2408.16224	null
2024-08-28	VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images	M. Maruf et.al.	2408.16176	link
2024-08-28	Visual Prompt Engineering for Medical Vision Language Models in Radiology	Stefan Denner et.al.	2408.15802	null
2024-08-28	Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail	Bianca Lamm et.al.	2408.15626	null
2024-08-28	Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models	Wei Chen et.al.	2408.15518	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-28	VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities	Shusaku Egami et.al.	2408.14895	link
2024-08-27	HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling	Yubin Wang et.al.	2408.14812	null
2024-08-27	MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation	Yuanbing Zhu et.al.	2408.14776	null
2024-08-27	RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models	Junyao Ge et.al.	2408.14744	link
2024-08-27	Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild	Tianqi Wei et.al.	2408.14723	null
2024-08-26	Social perception of faces in a vision-language model	Carina I. Hausladen et.al.	2408.14435	link
2024-08-26	More Pictures Say More: Visual Intersection Network for Open Set Object Detection	Bingcheng Dong et.al.	2408.14032	null
2024-08-26	Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models	Shuai Fu et.al.	2408.13979	link
2024-08-25	LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task	Ali Asgarov et.al.	2408.13909	link
2024-08-25	Evaluating Attribute Comprehension in Large Vision-Language Models	Haiwen Zhang et.al.	2408.13898	link
2024-08-23	VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models	Purushothaman Natarajan et.al.	2408.12808	link
2024-08-23	Cap2Sum: Learning to Summarize Videos by Generating Captions	Cairong Zhao et.al.	2408.12800	null
2024-08-22	Building and better understanding vision-language models: insights and future directions	Hugo Laurençon et.al.	2408.12637	null
2024-08-22	Adapt CLIP as Aggregation Instructor for Image Dehazing	Xiaozhe Zhang et.al.	2408.12317	null
2024-08-22	TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model	Yuhao Wang et.al.	2408.12141	null
2024-08-23	SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models	Youngjoon Yu et.al.	2408.12114	link
2024-08-22	RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data	Chenglong Wang et.al.	2408.12109	link
2024-08-22	DH-Bench: Probing Depth and Height Perception of Large Visual-Language Models	Shehreen Azad et.al.	2408.11748	link
2024-08-21	CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering	Yuliang Cai et.al.	2408.11742	link
2024-08-21	MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning	Minghao Han et.al.	2408.11505	link
2024-08-21	Enabling Small Models for Zero-Shot Classification through Model Label Learning	Jia Zhang et.al.	2408.11449	null
2024-08-21	Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models	Kento Kawaharazuka et.al.	2408.11380	null
2024-08-21	Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework	Xiao Han et.al.	2408.11312	null
2024-08-21	UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Xiangyu Zhao et.al.	2408.11305	link
2024-08-21	Making Large Vision Language Models to be Good Few-shot Learners	Fan Liu et.al.	2408.11297	null
2024-08-21	Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models	Yunpu Zhao et.al.	2408.11261	null
2024-08-20	HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments	Kazi Hasan Ibn Arif et.al.	2408.10945	link
2024-08-21	V-RoAst: A New Dataset for Visual Road Assessment	Natchapon Jongwiriyanurak et.al.	2408.10872	link
2024-08-20	TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning	Bin Wang et.al.	2408.10688	link
2024-08-20	MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval	Haoran Tang et.al.	2408.10575	link
2024-08-19	CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs	Yassine Ouali et.al.	2408.10433	null
2024-08-19	SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP	Yusuke Hirota et.al.	2408.10202	null
2024-08-21	LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Fuzhao Xue et.al.	2408.10188	link
2024-08-19	Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype	Yadong Lu et.al.	2408.09984	null
2024-08-19	Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit	Qizhou Chen et.al.	2408.09916	link
2024-08-19	Cross-composition Feature Disentanglement for Compositional Zero-shot Learning	Yuxia Geng et.al.	2408.09786	null
2024-08-19	MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model	Xinyang Wang et.al.	2408.09706	null
2024-08-18	PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding	Dawei Dai et.al.	2408.09530	link
2024-08-18	Image-Based Geolocation Using Large Vision-Language Models	Yi Liu et.al.	2408.09474	null
2024-08-17	V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models	Junwei You et.al.	2408.09251	null
2024-08-16	DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models	Eman Ali et.al.	2408.08855	null
2024-08-16	Beyond the Hype: A dispassionate look at vision-language models in medical scenario	Yang Nan et.al.	2408.08704	null
2024-08-16	TextCAVs: Debugging vision models using text	Angus Nicolson et.al.	2408.08652	link
2024-08-16	MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models	Fenghua Weng et.al.	2408.08464	link
2024-08-15	Penny-Wise and Pound-Foolish in Deepfake Detection	Yabin Wang et.al.	2408.08412	link
2024-08-15	Level Up Your Tutorials: VLMs for Game Tutorials Quality Assessment	Daniele Rege Cambrin et.al.	2408.08396	link
2024-08-15	Towards Flexible Visual Relationship Segmentation	Fangrui Zhu et.al.	2408.08305	null
2024-08-14	Cropper: Vision-Language Model for Image Cropping through In-Context Learning	Seung Hyun Lee et.al.	2408.07790	null
2024-08-14	Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach	Shizhou Zhang et.al.	2408.07500	link
2024-08-13	Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces	Zhiling Chen et.al.	2408.07146	null
2024-08-13	Do Vision-Language Foundational models show Robust Visual Perception?	Shivam Chandhok et.al.	2408.06781	link
2024-08-13	Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities	Shivam Chandhok et.al.	2408.06721	null
2024-08-13	IFShip: A Large Vision-Language Model for Interpretable Fine-grained Ship Classification via Domain Knowledge-Enhanced Instruction Tuning	Mingning Guo et.al.	2408.06631	null
2024-08-13	ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding	Yubin Wang et.al.	2408.06622	null
2024-08-12	Long-Form Answers to Visual Questions from Blind and Low Vision People	Mina Huh et.al.	2408.06303	null
2024-08-12	OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning	Mushui Liu et.al.	2408.06158	link
2024-08-12	Adapting a Foundation Model for Space-based Tasks	Matthew Foutter et.al.	2408.05924	null
2024-08-13	Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts	Peng Wu et.al.	2408.05905	null
2024-08-12	GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models	Zixuan Wu et.al.	2408.05894	link
2024-08-11	Efficient Test-Time Prompt Tuning for Vision-Language Models	Yuhan Zhu et.al.	2408.05775	null
2024-08-11	Reference-free Hallucination Detection for Large Vision-Language Models	Qing Li et.al.	2408.05767	null
2024-08-11	Decoder Pre-Training with only Text for Scene Text Recognition	Shuai Zhao et.al.	2408.05706	link
2024-08-09	Hyperbolic Learning with Multimodal Large Language Models	Paolo Mandica et.al.	2408.05097	null
2024-08-09	Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model	Jaehyuk Heo et.al.	2408.04917	link
2024-08-09	VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving	Keke Long et.al.	2408.04821	null
2024-08-09	UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling	Haider Al-Tahan et.al.	2408.04810	link
2024-08-07	Prompt and Prejudice	Lorenzo Berlincioni et.al.	2408.04671	null
2024-08-07	ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling	William Y. Zhu et.al.	2408.04102	link
2024-08-07	How Well Can Vision Language Models See Image Details?	Chenhui Gou et.al.	2408.03940	null
2024-08-07	Target Prompting for Information Extraction with Vision Language Model	Dipankar Medhi et.al.	2408.03834	null
2024-08-07	Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling	Zilyu Ye et.al.	2408.03695	link
2024-08-07	Teach CLIP to Develop a Number Sense for Ordinal Regression	Yao Du et.al.	2408.03574	link
2024-08-07	Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection	Subaru Kimura et.al.	2408.03554	null
2024-08-09	GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI	Pengcheng Chen et.al.	2408.03361	link
2024-08-06	Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization	Yanghai Zhang et.al.	2408.03149	link
2024-08-05	Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services	Shaopeng Fu et.al.	2408.02814	link
2024-08-05	MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models	Fanqing Meng et.al.	2408.02718	null
2024-08-07	TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments	Daeun Song et.al.	2408.02454	null
2024-08-05	Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs	Jeongkee Lim et.al.	2408.02261	link
2024-08-05	Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets	Lucas Choi et.al.	2408.02244	null
2024-08-05	REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models	Agneet Chatterjee et.al.	2408.02231	null
2024-08-04	Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models	Fushuo Huo et.al.	2408.02032	link
2024-08-04	AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis	Townim F. Chowdhury et.al.	2408.02001	link
2024-08-04	Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI	Robert Wolfe et.al.	2408.01959	null
2024-08-04	Visual Grounding for Object-Level Generalization in Reinforcement Learning	Haobin Jiang et.al.	2408.01942	link
2024-08-03	Is Generative Communication between Embodied Agents Good for Zero-Shot ObjectNav?	Vishnu Sashank Dorbala et.al.	2408.01877	null
2024-08-03	Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis	Hiroshi Takato et.al.	2408.01682	null
2024-08-02	Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation	Jheng-Hong Yang et.al.	2408.01363	null
2024-08-02	The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models	Simone Caldarella et.al.	2408.01228	null
2024-08-01	Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)	Bin Han et.al.	2408.00932	null
2024-08-01	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	Siyu Jiao et.al.	2408.00744	link
2024-08-01	ExpertAF: Expert Actionable Feedback from Video	Kumar Ashutosh et.al.	2408.00672	null
2024-08-01	Are Bigger Encoders Always Better in Vision Large Models?	Bozhou Li et.al.	2408.00620	null
2024-08-01	Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation	Xiaoye Qu et.al.	2408.00555	null
2024-08-01	Mitigating Multilingual Hallucination in Large Vision-Language Models	Xiaoye Qu et.al.	2408.00550	link
2024-08-01	Jailbreaking Text-to-Image Models with LLM-Based Agents	Yingkai Dong et.al.	2408.00523	null
2024-08-01	DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation	Rakshith Subramanyam et.al.	2408.00331	link
2024-08-01	OmniParser for Pure Vision Based GUI Agent	Yadong Lu et.al.	2408.00203	null
2024-07-31	Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey	Atsuyuki Miyai et.al.	2407.21794	null
2024-07-31	Vision-Language Model Based Handwriting Verification	Mihir Chauhan et.al.	2407.21788	null
2024-07-31	Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs	Shi Liu et.al.	2407.21771	null
2024-07-31	Open-Vocabulary Audio-Visual Semantic Segmentation	Ruohao Guo et.al.	2407.21721	null
2024-08-01	Defending Jailbreak Attack in VLMs via Cross-modality Information Detector	Yue Xu et.al.	2407.21659	link
2024-07-31	MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment	Anurag Das et.al.	2407.21654	null
2024-07-31	Conditioned Prompt-Optimization for Continual Deepfake Detection	Francesco Laiti et.al.	2407.21554	link
2024-07-31	MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection	Kuo Wang et.al.	2407.21465	link
2024-07-31	Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering	Danfeng Guo et.al.	2407.21368	null
2024-07-31	SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving	Peiru Zheng et.al.	2407.21293	null
2024-07-30	GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Ali Abdollahi et.al.	2407.21001	link
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-30	SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition	Hao Tan et.al.	2407.20920	null
2024-07-30	Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning	Norman Di Palo et.al.	2407.20798	null
2024-07-30	OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance	Yongqiang Yao et.al.	2407.20761	link
2024-07-30	SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models	Zheng Liu et.al.	2407.20756	link
2024-07-30	Autonomous Improvement of Instruction Following Skills via Foundation Models	Zhiyuan Zhou et.al.	2407.20635	null
2024-07-29	FlexAttention for Efficient High-Resolution Vision-Language Models	Junyan Li et.al.	2407.20228	null
2024-07-29	Normality Addition via Normality Detection in Industrial Image Anomaly Detection Models	Jihun Yi et.al.	2407.19849	null
2024-07-29	Harnessing Large Vision and Language Models in Agriculture: A Review	Hongyan Zhu et.al.	2407.19679	null
2024-07-27	GP-VLS: A general-purpose vision language model for surgery	Samuel Schmidgall et.al.	2407.19305	null
2024-07-26	Solving Robotics Problems in Zero-Shot with Vision-Language Models	Zidan Wang et.al.	2407.19094	null
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-25	UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models	Xinyu Pi et.al.	2407.18391	null
2024-07-25	$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs	Vlad Sobal et.al.	2407.18134	null
2024-07-25	Efficient Inference of Vision Instruction-Following Models with Elastic Cache	Zuyan Liu et.al.	2407.18121	link
2024-07-25	Cost-effective Instruction Learning for Pathology Vision and Language Analysis	Kaitao Chen et.al.	2407.17734	link
2024-07-24	DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation	Qian Feng et.al.	2407.17348	null
2024-07-26	Selective Vision-Language Subspace Projection for Few-shot CLIP	Xingyu Zhu et.al.	2407.16977	link
2024-07-23	Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions	Kai Liu et.al.	2407.16725	link
2024-07-23	Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models	Aristeidis Panos et.al.	2407.16526	null
2024-07-23	Cross Anything: General Quadruped Robot Navigation through Complex Terrains	Shaoting Zhu et.al.	2407.16412	null
2024-07-22	Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models	Raza Imam et.al.	2407.15913	link
2024-07-22	AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection	Yunkang Cao et.al.	2407.15795	link
2024-07-22	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli et.al.	2407.15793	link
2024-07-22	Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels	Zhuorui Ye et.al.	2407.15786	null
2024-07-22	Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders	Laura Niss et.al.	2407.15731	null
2024-07-23	SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection	Dimitrios Kollias et.al.	2407.15728	null
2024-07-22	HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning	Zhecan Wang et.al.	2407.15680	link
2024-07-22	In-Context Learning Improves Compositional Understanding of Vision-Language Models	Matteo Nulli et.al.	2407.15487	link
2024-07-22	WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding	Quan Kong et.al.	2407.15350	null
2024-07-21	Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective	Mariya Hendriksen et.al.	2407.15239	null
2024-07-21	When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?	Rylan Schaeffer et.al.	2407.15211	null
2024-07-19	DEAL: Disentangle and Localize Concept-level Explanations for VLMs	Tang Li et.al.	2407.14412	link
2024-07-19	Multimodal Misinformation Detection using Large Vision-Language Models	Sahar Tahmasebi et.al.	2407.14321	null
2024-07-19	Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models	Dionis Totsila et.al.	2407.14229	link
2024-07-19	EVLM: An Efficient Vision-Language Model for Visual Understanding	Kaibing Chen et.al.	2407.14177	null
2024-07-19	Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition	Rui Zhang et.al.	2407.14146	null
2024-07-19	Multi-modal Relation Distillation for Unified 3D Representation Learning	Huiqun Wang et.al.	2407.14007	null
2024-07-18	Simultaneous Localization and Affordance Prediction for Tasks in Egocentric Video	Zachary Chavis et.al.	2407.13856	null
2024-07-18	Which objects help me to act effectively? Reasoning about physically-grounded affordances	Anne Kemmeren et.al.	2407.13811	null
2024-07-18	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models	Moon Ye-Bin et.al.	2407.13442	null
2024-07-18	Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction	Gertjan Burghouts et.al.	2407.13368	null
2024-07-17	R+X: Retrieval and Execution from Everyday Human Videos	Georgios Papagiannis et.al.	2407.12957	null
2024-07-17	ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference	Mengcheng Lan et.al.	2407.12442	null
2024-07-17	NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models	Gengze Zhou et.al.	2407.12366	link
2024-07-17	VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions	Seokha Moon et.al.	2407.12345	null
2024-07-17	ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map	Yilin Ye et.al.	2407.12315	link
2024-07-17	VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation	Zhen Qu et.al.	2407.12276	link
2024-07-16	XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach	Truong Thanh Hung Nguyen et.al.	2407.11771	link
2024-07-16	VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models	Haodong Duan et.al.	2407.11691	link
2024-07-16	FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models	Pengxiang Li et.al.	2407.11522	null
2024-07-16	Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation	Shijie Chang et.al.	2407.11503	null
2024-07-16	Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models	Jinrui Zhang et.al.	2407.11422	null
2024-07-16	Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain	Hyeon Bae Kim et.al.	2407.11375	link
2024-07-16	Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities	Xu Zheng et.al.	2407.11351	null
2024-07-16	LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction	Penghui Du et.al.	2407.11335	link
2024-07-16	Large Vision-Language Models as Emotion Recognizers in Context Awareness	Yuxuan Lei et.al.	2407.11300	null
2024-07-15	Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques	Rishika Bhagwatkar et.al.	2407.11121	null
2024-07-15	Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?	Ruisheng Cao et.al.	2407.10956	link
2024-07-15	Benchmarking Vision Language Models for Cultural Understanding	Shravan Nayak et.al.	2407.10920	null
2024-07-15	GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM	Keshav Bimbraw et.al.	2407.10870	null
2024-07-15	Physics-Inspired Generative Models in Medical Imaging: A Review	Dennis Hein et.al.	2407.10856	null
2024-07-15	Quantized Prompt for Efficient Generalization of Vision-Language Models	Tianxiang Hao et.al.	2407.10704	link
2024-07-15	OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer	Yu Wang et.al.	2407.10655	link
2024-07-15	NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models	Pranshu Pandya et.al.	2407.10380	null
2024-07-14	Affordance-Guided Reinforcement Learning via Visual Prompting	Olivia Y. Lee et.al.	2407.10341	null
2024-07-13	VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation	Wentao Zhao et.al.	2407.09829	link
2024-07-13	3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance	Xiaoxu Xu et.al.	2407.09826	link
2024-07-12	Open Vocabulary Multi-Label Video Classification	Rohit Gupta et.al.	2407.09073	null
2024-07-12	Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing	Jun Zhu et.al.	2407.09053	link
2024-07-12	Textual Query-Driven Mask Transformer for Domain Generalized Segmentation	Byeonghyun Pak et.al.	2407.09033	link
2024-07-12	OVExp: Open Vocabulary Exploration for Object-Oriented Navigation	Meng Wei et.al.	2407.09016	null
2024-07-12	LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models	Yabin Zhang et.al.	2407.08966	link
2024-07-11	Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design	Jingyi Xie et.al.	2407.08882	null
2024-07-11	CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting	Naman Sharma et.al.	2407.08811	null
2024-07-11	Extracting Training Data from Document-Based VQA Models	Francesco Pinto et.al.	2407.08707	null
2024-07-11	HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models	Runhui Huang et.al.	2407.08706	null
2024-07-12	Robotic Control via Embodied Chain-of-Thought Reasoning	Michał Zawalski et.al.	2407.08693	null
2024-07-11	NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning	Yi Zhang et.al.	2407.08672	null
2024-07-11	Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement	Zijie Yue et.al.	2407.08507	null
2024-07-11	Specialist vision-language models for clinical ophthalmology	Robbie Holland et.al.	2407.08410	link
2024-07-11	Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization	Jinlong Li et.al.	2407.08374	null
2024-07-11	Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation	Tong Shao et.al.	2407.08268	link
2024-07-11	AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization	Shixiong Xu et.al.	2407.08156	link
2024-07-11	Live Fitness Coaching as a Testbed for Situated Interaction	Sunny Panchal et.al.	2407.08101	link
2024-07-10	Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison	Qian Yang et.al.	2407.07840	null
2024-07-10	Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs	Hao-Tien Lewis Chiang et.al.	2407.07775	null
2024-07-10	PaliGemma: A versatile 3B VLM for transfer	Lucas Beyer et.al.	2407.07726	link
2024-07-11	Tuning Vision-Language Models with Candidate Labels by Prompt Alignment	Zhifang Zhang et.al.	2407.07638	null
2024-07-10	IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model	Yatai Ji et.al.	2407.07577	link
2024-07-10	A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends	Daizong Liu et.al.	2407.07403	link
2024-07-10	Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems	Chashi Mahiul Islam et.al.	2407.07392	null
2024-07-10	Towards a text-based quantitative and explainable histopathology image analysis	Anh Tien Nguyen et.al.	2407.07360	link
2024-07-10	CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging	Raza Imam et.al.	2407.07315	null
2024-07-09	Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization	Jeongseok Hyun et.al.	2407.07024	link
2024-07-09	CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection	Shuang Hao et.al.	2407.06780	link
2024-07-09	LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition	Teng Wang et.al.	2407.06730	null
2024-07-09	Vision language models are blind	Pooyan Rahmanzadehgervi et.al.	2407.06581	link
2024-07-08	A Single Transformer for Scalable Vision-Language Modeling	Yangyi Chen et.al.	2407.06438	link
2024-07-08	Multi-Object Hallucination in Vision-Language Models	Xuweiyi Chen et.al.	2407.06192	link
2024-07-08	Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Orr Zohar et.al.	2407.06189	link
2024-07-08	Vision-Language Models under Cultural and Inclusive Considerations	Antonia Karamolegkou et.al.	2407.06177	null
2024-07-08	Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding	Aaron Lohner et.al.	2407.05910	null
2024-07-09	HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Yingying Jiang et.al.	2407.05795	null
2024-07-08	OneDiff: A Generalist Model for Image Difference	Erdong Hu et.al.	2407.05645	null
2024-07-07	Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models	Longxiang Tang et.al.	2407.05342	link
2024-07-07	WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks	Léo Boisvert et.al.	2407.05291	link
2024-07-07	Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image	Pengkun Jiao et.al.	2407.05256	null
2024-07-06	FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding	Huitong Pan et.al.	2407.05183	link
2024-07-05	AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation	Yuhan Zhu et.al.	2407.04603	link
2024-07-05	Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model	Duy M. H. Nguyen et.al.	2407.04489	null
2024-07-05	Smart Vision-Language Reasoners	Denisa Roberts et.al.	2407.04212	link
2024-07-04	VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation	I-Chun Arthur Liu et.al.	2407.04152	link
2024-07-04	MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis	Asma Alkhaldi et.al.	2407.04106	link
2024-07-04	Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners	Mushui Liu et.al.	2407.04003	null
2024-07-04	Concept Bottleneck Models Without Predefined Concepts	Simon Schrodi et.al.	2407.03921	null
2024-07-04	Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning	Thong Nguyen et.al.	2407.03788	link
2024-07-04	Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models	Chang-Sheng Kao et.al.	2407.03615	link
2024-07-04	Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations	Zhiyang Xu et.al.	2407.03604	null
2024-07-03	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Pan Zhang et.al.	2407.03320	link
2024-07-03	BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations	Zhantao Yang et.al.	2407.03314	null
2024-07-03	Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation	Marco Mistretta et.al.	2407.03056	link
2024-07-03	SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning	Bac Nguyen et.al.	2407.03036	null
2024-07-03	VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values	Zhe Hu et.al.	2407.03000	null
2024-07-03	Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective	Zhaotian Weng et.al.	2407.02814	null
2024-07-03	MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context	Zishan Gu et.al.	2407.02730	link
2024-07-02	Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models	Xu Han et.al.	2407.02716	null
2024-07-02	Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models	Annie S. Chen et.al.	2407.02666	null
2024-07-02	Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models	Joan Nwatu et.al.	2407.02623	link
2024-07-02	Conceptual Codebook Learning for Vision-Language Models	Yi Zhang et.al.	2407.02350	null
2024-07-02	Why do LLaVA Vision-Language Models Reply to Images in English?	Musashi Hinck et.al.	2407.02333	null
2024-07-02	Multi-Modal Video Dialog State Tracking in the Wild	Adnen Abdessaied et.al.	2407.02218	null
2024-07-02	BiasDora: Exploring Hidden Biased Associations in Vision-Language Models	Chahat Raj et.al.	2407.02066	link
2024-07-02	Fake News Detection and Manipulation Reasoning via Large Vision-Language Models	Ruihan Jin et.al.	2407.02042	null
2024-07-03	ViG-Bias: Visually Grounded Bias Discovery and Mitigation	Badr-Eddine Marani et.al.	2407.01996	link
2024-07-02	SADL: An Effective In-Context Learning Method for Compositional Visual QA	Long Hoang Dang et.al.	2407.01983	null
2024-07-02	VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs	Qiucheng Wu et.al.	2407.01863	link
2024-07-01	CLIP the Divergence: Language-guided Unsupervised Domain Adaptation	Jinjing Zhu et.al.	2407.01842	null
2024-07-01	μ-Bench: A Vision-Language Benchmark for Microscopy Understanding	Alejandro Lozano et.al.	2407.01791	link
2024-06-28	LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Xiang Li et.al.	2406.20095	link
2024-06-28	EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model	Yuxuan Zhang et.al.	2406.20076	link
2024-06-28	STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical	Guohao Sun et.al.	2406.19973	link
2024-06-28	From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis	Chuanqi Cheng et.al.	2406.19934	link
2024-06-28	Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model	Longrong Yang et.al.	2406.19905	link
2024-06-27	PathAlign: A vision-language model for whole slide images in histopathology	Faruk Ahmed et.al.	2406.19578	null
2024-06-27	RAVEN: Multitask Retrieval Augmented Vision-Language Learning	Varun Nagaraj Rao et.al.	2406.19150	null
2024-06-27	CELLO: Causal Evaluation of Large Vision-Language Models	Meiqi Chen et.al.	2406.19131	link
2024-06-27	Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis	Yibo Gao et.al.	2406.19130	link
2024-06-27	RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation	Fanfan Liu et.al.	2406.18977	link
2024-06-28	Manipulate-Anything: Automating Real-World Robots using Vision-Language Models	Jiafei Duan et.al.	2406.18915	null
2024-06-27	Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models	Yicheng Xu et.al.	2406.18868	link
2024-06-27	Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs	Jie Zhang et.al.	2406.18849	link
2024-06-28	Revisiting Backdoor Attacks against Large Vision-Language Models	Siyuan Liang et.al.	2406.18844	null
2024-06-26	MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data	William Berman et.al.	2406.18790	null
2024-06-26	3D Feature Distillation with Object-Centric Priors	Georgios Tziafas et.al.	2406.18742	null
2024-06-26	Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme	Pi-Wei Chen et.al.	2406.18197	null
2024-06-26	Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation	Qilai Zhang et.al.	2406.18054	link
2024-06-26	Multimodal foundation world models for generalist embodied agents	Pietro Mazzaglia et.al.	2406.18043	link
2024-06-25	Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts	Xuyang Wu et.al.	2406.17974	link
2024-06-25	EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data	Jesse Zhang et.al.	2406.17768	null
2024-06-25	DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning	Xiaohan Zhang et.al.	2406.17659	null
2024-06-24	Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models	Bei Yan et.al.	2406.17115	link
2024-06-24	Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts	Aditya Sharma et.al.	2406.16851	null
2024-06-24	ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance	Shuwei Shi et.al.	2406.16476	null
2024-06-24	Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration	Yujin Baek et.al.	2406.16469	null
2024-06-24	Evaluating and Analyzing Relationship Hallucinations in LVLMs	Mingrui Wu et.al.	2406.16449	link
2024-06-24	High-resolution open-vocabulary object 6D pose estimation	Jaime Corsetti et.al.	2406.16384	null
2024-06-24	What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation	Michal Golovanevsky et.al.	2406.16320	link
2024-06-23	Review of Zero-Shot and Few-Shot AI Algorithms in The Medical Domain	Maged Badawi et.al.	2406.16143	null
2024-06-22	TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM	Wenxue Li et.al.	2406.15764	link
2024-06-21	Open-vocabulary Pick and Place via Patch-level Semantic Maps	Mingxi Jia et.al.	2406.15677	null
2024-06-21	DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection	Jia Syuen Lim et.al.	2406.14924	null
2024-06-21	Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models	Jiayu Wang et.al.	2406.14852	link
2024-06-20	ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights	Gabriel Sarch et.al.	2406.14596	null
2024-06-20	Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs	Yuxuan Qiao et.al.	2406.14544	link
2024-06-20	MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Xinyu Fang et.al.	2406.14515	link
2024-06-20	African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Gregor Geigle et.al.	2406.14496	link
2024-06-20	Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?	Gregor Geigle et.al.	2406.14492	null
2024-06-20	Revealing Vision-Language Integration in the Brain with Multimodal Networks	Vighnesh Subramaniam et.al.	2406.14481	link
2024-06-20	VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model	Jie Zhang et.al.	2406.14194	link
2024-06-20	MACAROON: Training Vision-Language Models To Be Your Engaged Partners	Shujin Wu et.al.	2406.14137	link
2024-06-21	VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning	Ziyang Meng et.al.	2406.14056	link
2024-06-20	From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment	Yusuke Hirota et.al.	2406.13912	null
2024-06-19	WATT: Weight Average Test-Time Adaption of CLIP	David Osowiechi et.al.	2406.13875	link
2024-06-18	AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention	Wenbin An et.al.	2406.12718	link
2024-06-18	Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?	Mingqian Feng et.al.	2406.12663	null
2024-06-18	Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model	Jiang-Xin Shi et.al.	2406.12638	link
2024-06-18	VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding	Xiang Li et.al.	2406.12384	link
2024-06-18	VoCo-LLaMA: Towards Vision Compression with Large Language Models	Xubing Ye et.al.	2406.12275	link
2024-06-18	The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge	Hongpeng Pan et.al.	2406.12225	null
2024-06-17	SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model	Yongting Zhang et.al.	2406.12030	link
2024-06-17	MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs	Ziyu Liu et.al.	2406.11833	link
2024-06-17	Unveiling Encoder-Free Vision-Language Models	Haiwen Diao et.al.	2406.11832	link
2024-06-17	On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning	Geewook Kim et.al.	2406.11823	link
2024-06-17	See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding	Amith Ananthram et.al.	2406.11665	link
2024-06-18	MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More	Yue Jiang et.al.	2406.11451	null
2024-06-17	They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias	Salma Abdel Magid et.al.	2406.11331	null
2024-06-17	GUICourse: From General Vision Language Models to Versatile GUI Agents	Wentong Chen et.al.	2406.11317	link
2024-06-18	BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models	Xuefeng Hu et.al.	2406.11309	null
2024-06-17	MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models	Shengkang Wang et.al.	2406.11288	link
2024-06-17	Unifying Multimodal Retrieval via Document Screenshot Embedding	Xueguang Ma et.al.	2406.11251	null
2024-06-14	Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding	Ridouane Ghermi et.al.	2406.10221	link
2024-06-14	DevBench: A multimodal developmental benchmark for language learning	Alvin Wei Ming Tan et.al.	2406.10215	link
2024-06-14	Detecting and Evaluating Medical Hallucinations in Large Vision Language Models	Jiawei Chen et.al.	2406.10185	null
2024-06-14	CarLLaVA: Vision language models for camera-only closed-loop driving	Katrin Renz et.al.	2406.10165	null
2024-06-14	RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model	Hantao Zhou et.al.	2406.10157	null
2024-06-14	Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning	Xiaowen Sun et.al.	2406.09988	link
2024-06-14	Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment	Fei Zhou et.al.	2406.09858	null
2024-06-14	Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps	Jian Chen et.al.	2406.09838	link
2024-06-14	Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting	Ce Hao et.al.	2406.09767	null
2024-06-13	Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA	Jongwoo Park et.al.	2406.09396	link
2024-06-13	Enhancing Domain Adaptation through Prompt Gradient Alignment	Hoang Phan et.al.	2406.09353	link
2024-06-13	AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models	Yuhang Wu et.al.	2406.09295	null
2024-06-13	MirrorCheck: Efficient Adversarial Defense for Vision-Language Models	Samar Fares et.al.	2406.09250	null
2024-06-13	Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model	Melvin Wong et.al.	2406.09143	null
2024-06-13	INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance	Chenwei Lin et.al.	2406.09105	link
2024-06-13	How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models	Tarun Khajuria et.al.	2406.09067	null
2024-06-13	Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning	Huy Hoang Nguyen et.al.	2406.09039	null
2024-06-13	Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency	Maor Dikter et.al.	2406.08840	link
2024-06-13	MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs	Xuannan Liu et.al.	2406.08772	null
2024-06-12	What If We Recaption Billions of Web Images with LLaMA-3?	Xianhang Li et.al.	2406.08478	null
2024-06-12	AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind	Wei Ding et.al.	2406.08455	null
2024-06-12	ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs	Irene Huang et.al.	2406.08164	link
2024-06-12	Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Shimin Chen et.al.	2406.08024	null
2024-06-13	A3VLM: Actionable Articulation-Aware Vision Language Model	Siyuan Huang et.al.	2406.07549	link
2024-06-11	Let Go of Your Labels with Unsupervised Transfer	Artyom Gadetsky et.al.	2406.07236	link
2024-06-11	FaceGPT: Self-supervised Learning to Chat about 3D Human Faces	Haoran Wang et.al.	2406.07163	null
2024-06-11	Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph	Sergey Linok et.al.	2406.07113	null
2024-06-11	UVIS: Unsupervised Video Instance Segmentation	Shuaiyi Huang et.al.	2406.06908	null
2024-06-10	Merlin: A Vision Language Foundation Model for 3D Computed Tomography	Louis Blankemeier et.al.	2406.06512	null
2024-06-10	Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation	Oishi Banerjee et.al.	2406.06496	null
2024-06-10	VCR: Visual Caption Restoration	Tianyu Zhang et.al.	2406.06462	link
2024-06-10	Data Augmentation in Earth Observation: A Diffusion Model Approach	Tiago Sousa et.al.	2406.06218	null
2024-06-10	CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models	Peng Xia et.al.	2406.06007	link
2024-06-10	CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark	David Romero et.al.	2406.05967	null
2024-06-09	EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models	Mengfei Du et.al.	2406.05756	link
2024-06-09	ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition	Sanjoy Kundu et.al.	2406.05722	null
2024-06-08	Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification	Yunhe Gao et.al.	2406.05596	null
2024-06-08	Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models	Minho Park et.al.	2406.05432	link
2024-06-07	3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation	Feiyu Pan et.al.	2406.04842	null
2024-06-07	OVMR: Open-Vocabulary Recognition with Multi-Modal References	Zehong Ma et.al.	2406.04675	link
2024-06-06	Evaluating Large Vision-Language Models' Understanding of Real-World Complexities Through Synthetic Benchmarks	Haokun Zhou et.al.	2406.04470	null
2024-06-06	Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning	Amandeep Kumar et.al.	2406.04413	link
2024-06-06	VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval	Junjie Zhou et.al.	2406.04292	link
2024-06-06	Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt	Zonghao Ying et.al.	2406.04031	link
2024-06-06	Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following	Anshul Gupta et.al.	2406.03907	null
2024-06-06	VisLTR: Visualization-in-the-Loop Table Reasoning	Jianing Hao et.al.	2406.03753	null
2024-06-05	CountCLIP -- [Re] Teaching CLIP to Count to Ten	Harshvardhan Mestha et.al.	2406.03586	link
2024-06-05	Exploiting LMM-based knowledge for image classification tasks	Maria Tzelepi et.al.	2406.03071	null
2024-06-05	Balancing Performance and Efficiency in Zero-shot Robotic Navigation	Dmytro Kuzmenko et.al.	2406.03015	null
2024-06-05	Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models	Jinhao Li et.al.	2406.02915	link
2024-06-04	LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery	Samuel Scheele et.al.	2406.02780	link
2024-06-04	TopViewRS: Vision-Language Models as Top-View Spatial Reasoners	Chengzu Li et.al.	2406.02537	link
2024-06-04	On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept	Guangliang Liu et.al.	2406.02378	null
2024-06-04	Radar Spectra-Language Model for Automotive Scene Parsing	Mariia Pushkareva et.al.	2406.02158	null
2024-06-04	HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model	Yu Tian et.al.	2406.01914	null
2024-06-03	Boosting Vision-Language Models with Transduction	Maxime Zanella et.al.	2406.01837	link
2024-06-03	SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model	An-Chieh Cheng et.al.	2406.01584	null
2024-06-03	SLANT: Spurious Logo ANalysis Toolkit	Maan Qraitem et.al.	2406.01449	null
2024-06-03	ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models	Thanh-Dat Truong et.al.	2406.01432	null
2024-06-03	EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding	Thanh-Dat Truong et.al.	2406.01429	null
2024-06-03	TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy	Weichao Zhao et.al.	2406.01326	link
2024-06-04	StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond	Pengyuan Lyu et.al.	2405.21013	null
2024-05-31	Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning	Cheng Tan et.al.	2405.20834	null
2024-05-31	InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding	Huaxiang Zhang et.al.	2405.20795	null
2024-05-31	Information Theoretic Text-to-Image Alignment	Chao Wang et.al.	2405.20759	null
2024-05-31	Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images	Mansi Kakkar et.al.	2405.20735	null
2024-05-30	Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals	Phillip Howard et.al.	2405.20152	null
2024-05-30	OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation	Gonca Yilmaz et.al.	2405.20141	null
2024-05-30	Enhancing Large Vision Language Models with Self-Training on Image Comprehension	Yihe Deng et.al.	2405.19716	link
2024-05-30	Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training	Aisha Urooj Khan et.al.	2405.19675	null
2024-05-29	Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding	Shenghuan Sun et.al.	2405.19567	null
2024-05-29	CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients	Pierre Chambon et.al.	2405.19538	link
2024-05-29	Evaluating Vision-Language Models on Bistable Images	Artemis Panagopoulou et.al.	2405.19423	link
2024-05-29	Video Anomaly Detection in 10 Years: A Survey and Outlook	Moshira Abdalla et.al.	2405.19387	null
2024-05-29	Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models	Tianrun Chen et.al.	2405.19326	null
2024-05-29	Matryoshka Query Transformer for Large Vision-Language Models	Wenbo Hu et.al.	2405.19315	link
2024-05-29	MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification	Laura Fieback et.al.	2405.19186	null
2024-05-29	I Bet You Did Not Mean That: Testing Semantic Importance via Betting	Jacopo Teneggi et.al.	2405.19146	link
2024-05-29	ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs	Omar Moured et.al.	2405.19117	link
2024-05-29	Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer	Zengqun Zhao et.al.	2405.19100	link
2024-05-29	Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior	Shuyu Cheng et.al.	2405.19098	link
2024-05-30	Benchmarking and Improving Detail Image Caption	Hongyuan Dong et.al.	2405.19092	link
2024-05-29	Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions	Zhe Hu et.al.	2405.19088	null
2024-05-29	Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	Markus J. Buehler et.al.	2405.19076	link
2024-05-28	WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization	Jiawei Ma et.al.	2405.18405	null
2024-05-28	Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?	Yifan Bai et.al.	2405.18361	null
2024-05-28	Frustratingly Easy Test-Time Adaptation of Vision-Language Models	Matteo Farina et.al.	2405.18330	link
2024-05-28	White-box Multimodal Jailbreaks Against Large Vision-Language Models	Ruofan Wang et.al.	2405.17894	link
2024-05-28	Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment	Xin Xiao et.al.	2405.17871	link
2024-05-28	RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs	Sangmin Woo et.al.	2405.17821	null
2024-05-28	Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models	Sangmin Woo et.al.	2405.17820	null
2024-05-27	An Introduction to Vision-Language Modeling	Florian Bordes et.al.	2405.17247	null
2024-05-27	Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View	Jin Wang et.al.	2405.17201	null
2024-05-27	Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks	Yunqi Zhang et.al.	2405.16860	link
2024-05-27	PromptFix: You Prompt and We Fix the Photo	Yongsheng Yu et.al.	2405.16785	link
2024-05-25	Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities	Shiyu Xia et.al.	2405.16234	null
2024-05-25	Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs	Myong Chol Jung et.al.	2405.16091	null
2024-05-24	Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement	Xiyao Wang et.al.	2405.15973	link
2024-05-24	Disease-informed Adaptation of Vision-Language Models	Jiajin Zhang et.al.	2405.15728	link
2024-05-24	VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap	Sreyan Ghosh et.al.	2405.15683	link
2024-05-24	Composed Image Retrieval for Remote Sensing	Bill Psomas et.al.	2405.15587	link
2024-05-24	Open-Vocabulary SAM3D: Understand Any 3D Scene	Hanchen Tai et.al.	2405.15580	null
2024-05-24	Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization	Beitao Chen et.al.	2405.15356	link
2024-05-24	Learning Invariant Causal Mechanism from Vision-Language Models	Zeen Song et.al.	2405.15289	null
2024-05-24	Learning from True-False Labels via Multi-modal Prompt Retrieving	Zhongnian Li et.al.	2405.15228	link
2024-05-24	CLIP model is an Efficient Online Lifelong Learner	Leyuan Wang et.al.	2405.15155	link
2024-05-23	Agentic Skill Discovery	Xufeng Zhao et.al.	2405.15019	link
2024-05-23	A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models	Mario Döbler et.al.	2405.14977	link
2024-05-23	PuzzleAvatar: Assembling 3D Avatars from Personal Albums	Yuliang Xiu et.al.	2405.14869	link
2024-05-23	Designing A Sustainable Marine Debris Clean-up Framework without Human Labels	Raymond Wang et.al.	2405.14815	link
2024-05-23	Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models	Young Kyun Jang et.al.	2405.14715	null
2024-05-23	Calibrated Self-Rewarding Vision Language Models	Yiyang Zhou et.al.	2405.14622	link
2024-05-23	UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge	Chuanhao Li et.al.	2405.14554	null
2024-05-23	AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2	Simon Damm et.al.	2405.14529	link
2024-05-23	Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports	Guangyu Guo et.al.	2405.14230	null
2024-05-23	Unveiling the Tapestry of Consistency in Large Vision-Language Models	Yuan Zhang et.al.	2405.14156	link
2024-05-23	Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation	Se-eun Yoon et.al.	2405.14142	null
2024-05-22	Refining Skewed Perceptions in Vision-Language Models through Visual Representations	Haocheng Dai et.al.	2405.14030	null
2024-05-21	C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning	Ji Ma et.al.	2405.12752	null
2024-05-21	EmoEdit: Evoking Emotions through Image Manipulation	Jingyuan Yang et.al.	2405.12661	null
2024-05-22	Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography	Shantanu Ghosh et.al.	2405.12255	link
2024-05-20	Rethinking Overlooked Aspects in Vision-Language Models	Yuan Liu et.al.	2405.11850	null
2024-05-19	Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems	Shengxiang Sun et.al.	2405.11629	null
2024-05-18	MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection	Ximiao Zhang et.al.	2405.11315	link
2024-05-18	Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models	Canshi Wei et.al.	2405.11301	null
2024-05-18	Revisiting the Robust Generalization of Adversarial Prompt Tuning	Fan Yang et.al.	2405.11154	null
2024-05-18	Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions	Junzhang Liu et.al.	2405.11145	null
2024-05-17	Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors	Jiachen Sun et.al.	2405.10529	null
2024-05-16	Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees	Yu Gui et.al.	2405.10301	link
2024-05-17	Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning	Yuexiang Zhai et.al.	2405.10292	null
2024-05-16	FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models	Adrian Bulat et.al.	2405.10286	null
2024-05-16	Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks	João Bordalo et.al.	2405.10122	null
2024-05-16	Harmonizing Generalization and Personalization in Federated Prompt Learning	Tianyu Cui et.al.	2405.09771	link
2024-05-17	SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge	Andong Wang et.al.	2405.09713	null
2024-05-15	A Survey On Text-to-3D Contents Generation In The Wild	Chenhan Jiang et.al.	2405.09431	null
2024-05-15	Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model	Wanting Xu et.al.	2405.09215	link
2024-05-14	Contextual Emotion Recognition using Large Vision Language Models	Yasaman Etesam et.al.	2405.08992	null
2024-05-14	Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research	Qinglong Cao et.al.	2405.08668	link
2024-05-14	Open-Vocabulary Object Detection via Neighboring Region Attention Alignment	Sunyuan Qiang et.al.	2405.08593	null
2024-05-13	Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?	Hari Chandana Kuchibhotla et.al.	2405.07921	null
2024-05-12	DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model	Yang Jin et.al.	2405.07309	null
2024-05-11	TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt	Xiangyu Wu et.al.	2405.06926	link
2024-05-10	Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark	Evan M. Williams et.al.	2405.06634	link
2024-05-10	Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification	Yaoqin Ye et.al.	2405.06468	link
2024-05-10	VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks	Manish Dhakal et.al.	2405.06196	link
2024-05-09	Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control	Gunshi Gupta et.al.	2405.05852	link
2024-05-09	Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media	Zhizhen Zhang et.al.	2405.05760	null
2024-05-09	Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft	Debabrata Pal et.al.	2405.05574	null
2024-05-08	THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models	Prannay Kaul et.al.	2405.05256	null
2024-05-08	Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection	Zhaoxiang Zhang et.al.	2405.04782	null
2024-05-08	Unveiling Disparities in Web Task Handling Between Human and Web Agent	Kihoon Son et.al.	2405.04497	null
2024-05-07	Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks	Georgios Pantazopoulos et.al.	2405.04403	link
2024-05-06	VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images	Anna Penzkofer et.al.	2405.03852	null
2024-05-06	Knowledge-aware Text-Image Retrieval for Remote Sensing Images	Li Mi et.al.	2405.03373	null
2024-05-06	Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval	Jiacheng Cheng et.al.	2405.03190	null
2024-05-05	Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training	Wenyu Zhang et.al.	2405.02954	link
2024-05-05	Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models	Tobias Groot et.al.	2405.02917	null
2024-05-05	Octopi: Object Property Reasoning with Large Tactile-Language Models	Samson Yu et.al.	2405.02794	link
2024-05-05	ImageInWords: Unlocking Hyper-Detailed Image Descriptions	Roopal Garg et.al.	2405.02793	link
2024-05-03	On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?	Maxime Zanella et.al.	2405.02266	link
2024-05-03	What matters when building vision-language models?	Hugo Laurençon et.al.	2405.02246	null
2024-05-03	Improving Concept Alignment in Vision-Language Concept Bottleneck Models	Nithish Muthuchamy Selvaraj et.al.	2405.01825	link
2024-05-02	V-FLUTE: Visual Figurative Language Understanding with Textual Explanations	Arkadiy Saakyan et.al.	2405.01474	link
2024-05-02	Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models	Yifei Ming et.al.	2405.01468	null
2024-05-02	MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors	Yuan Tang et.al.	2405.01413	link
2024-05-02	Learning Object States from Actions via Large Language Models	Masatoshi Tateno et.al.	2405.01090	null
2024-05-02	Few Shot Class Incremental Learning using Vision-Language models	Anurag Kumar et.al.	2405.01040	null
2024-05-01	Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis	Prateek Verma et.al.	2405.00876	null
2024-05-01	CLIPArTT: Light-weight Adaptation of CLIP to New Domains at Test Time	Gustavo Adolfo Vargas Hakim et.al.	2405.00754	link
2024-05-01	Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis	Huy H. Nguyen et.al.	2405.00355	link
2024-04-30	GUing: A Mobile GUI Search Engine using a Vision-Language Model	Jialiang Wei et.al.	2405.00145	link
2024-04-30	MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation	Min Zhang et.al.	2404.19644	link
2024-04-30	Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective	Wanqi Zhou et.al.	2404.19287	link
2024-04-30	Soft Prompt Generation for Domain Generalization	Shuanghao Bai et.al.	2404.19286	link
2024-04-30	PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition	Dongyun Lin et.al.	2404.19168	null
2024-04-29	Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM	Navid Rajabi et.al.	2404.19128	null
2024-04-29	In-Context Symbolic Regression: Leveraging Language Models for Function Discovery	Matteo Merler et.al.	2404.19094	link
2024-04-29	Hallucination of Multimodal Large Language Models: A Survey	Zechen Bai et.al.	2404.18930	link
2024-04-29	Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models	Hongyi Zhu et.al.	2404.18746	null
2024-04-28	Paint by Inpaint: Learning to Add Image Objects by Removing Them First	Navve Wasserman et.al.	2404.18212	link
2024-04-27	SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models	Manav Nitin Kapadnis et.al.	2404.17912	null
2024-04-27	Medical Vision-Language Pre-Training for Brain Abnormalities	Masoud Monajatipoor et.al.	2404.17779	null
2024-04-26	BlenderAlchemy: Editing 3D Graphics with Vision-Language Models	Ian Huang et.al.	2404.17672	null
2024-04-26	Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models	Yuhang Huang et.al.	2404.17534	null
2024-04-26	Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Yuanyuan Liu et.al.	2404.17100	null
2024-04-25	AAPL: Adding Attributes to Prompt Learning for Vision-Language Models	Gahyeon Kim et.al.	2404.16804	link
2024-04-25	Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class	Mazda Moayeri et.al.	2404.16717	null
2024-04-25	VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations	Sri Harsha Dumpala et.al.	2404.16365	null
2024-04-25	Training-Free Unsupervised Prompt for Vision-Language Models	Sifan Long et.al.	2404.16339	link
2024-04-24	Improving Multi-label Recognition using Class Co-Occurrence Probabilities	Samyak Rawlekar et.al.	2404.16193	null
2024-04-24	Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering	Cuong Nhat Ha et.al.	2404.16192	null
2024-04-24	MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI	Kaining Ying et.al.	2404.16006	null
2024-04-24	Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography	Xuxin Chen et.al.	2404.15946	null
2024-04-24	Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer	Jiaming Lei et.al.	2404.15785	null
2024-04-23	BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis	Shuhang Lin et.al.	2404.15532	link
2024-04-23	MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning	Sunan He et.al.	2404.15127	link
2024-04-21	Interpreting COVID Lateral Flow Tests' Results with Foundation Models	Stuti Pandey et.al.	2404.14990	null
2024-04-23	Driver Activity Classification Using Generalizable Representations from Vision-Language Models	Ross Greer et.al.	2404.14906	null
2024-04-23	SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models	Bo Lin et.al.	2404.14755	null
2024-04-23	FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction	Hang Hua et.al.	2404.14715	null
2024-04-23	DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance	Linxuan Xin et.al.	2404.14676	null
2024-04-22	A Multimodal Automated Interpretability Agent	Tamar Rott Shaham et.al.	2404.14394	null
2024-04-22	Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback	Wenyi Xiao et.al.	2404.14233	link
2024-04-22	VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models	Haoyi Qiu et.al.	2404.13874	link
2024-04-20	AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models	Yuheng Ji et.al.	2404.13425	null
2024-04-20	Movie101v2: Improved Movie Narration Benchmark	Zihao Yue et.al.	2404.13370	null
2024-04-19	ECOR: Explainable CLIP for Object Recognition	Ali Rasekh et.al.	2404.12839	null
2024-04-19	Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model	Jihao Dong et.al.	2404.12678	null
2024-04-19	Pre-trained Vision-Language Models Learn Discoverable Visual Concepts	Yuan Zang et.al.	2404.12652	link
2024-04-19	ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation	Yu-Hsuan Ho et.al.	2404.12606	null
2024-04-19	Cross-Modal Adapter: Parameter-Efficient Transfer Learning Approach for Vision-Language Models	Juncheng Yang et.al.	2404.12588	null
2024-04-18	V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning	Hang Hua et.al.	2404.12353	null
2024-04-18	What does CLIP know about peeling a banana?	Claudia Cuttano et.al.	2404.12015	null
2024-04-18	Progressive Multi-modal Conditional Prompt Tuning	Xiaoyu Qiu et.al.	2404.11864	link
2024-04-17	VG4D: Vision-Language Model Goes 4D Video Recognition	Zhichao Deng et.al.	2404.11605	link
2024-04-17	A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene	Wenbo Zhang et.al.	2404.11249	null
2024-04-17	Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model	Hao Yan et.al.	2404.11046	null
2024-04-17	OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding	Edmond Tong et.al.	2404.11000	null
2024-04-16	Vocabulary-free Image Classification and Semantic Segmentation	Alessandro Conti et.al.	2404.10864	link
2024-04-16	COMBO: Compositional World Models for Embodied Multi-Agent Cooperation	Hongxin Zhang et.al.	2404.10775	null
2024-04-16	Private Attribute Inference from Images with Vision-Language Models	Batuhan Tömekçe et.al.	2404.10618	null
2024-04-16	Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases	Yanze Li et.al.	2404.10595	null
2024-04-16	Self-Supervised Visual Preference Alignment	Ke Zhu et.al.	2404.10501	link
2024-04-17	Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models	Enming Zhang et.al.	2404.10357	link
2024-04-16	Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning	Rui Hu et.al.	2404.10332	null
2024-04-16	MoE-TinyMed: Mixture of Experts for Tiny Medical Large Vision-Language Models	Songtao Jiang et.al.	2404.10237	link
2024-04-16	Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering	Zaid Khan et.al.	2404.10193	null
2024-04-15	Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels	Amaya Dharmasiri et.al.	2404.10146	link
2024-04-15	OneChart: Purify the Chart Structural Extraction via One Auxiliary Token	Jinyue Chen et.al.	2404.09987	link
2024-04-15	Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models	Ziwei Luo et.al.	2404.09732	link
2024-04-15	Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction	David Sobrín-Hidalgo et.al.	2404.09705	null
2024-04-15	Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection	Jiaqi Zhu et.al.	2404.09654	null
2024-04-15	Leveraging Temporal Contextualization for Video Action Recognition	Minji Kim et.al.	2404.09490	link
2024-04-15	RankCLIP: Ranking-Consistent Language-Image Pretraining	Yiming Zhang et.al.	2404.09387	null
2024-04-13	PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization	Zining Chen et.al.	2404.09011	link
2024-04-13	AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning	Yuwei Tang et.al.	2404.08958	link
2024-04-13	ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition	Otto Brookes et.al.	2404.08937	null
2024-04-12	Training a Vision Language Model as Smartphone Assistant	Nicolai Dorka et.al.	2404.08755	null
2024-04-12	Improving Continuous Sign Language Recognition with Adapted Image Models	Lianyu Hu et.al.	2404.08226	link
2024-04-11	Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning	Simon Schrodi et.al.	2404.07983	null
2024-04-11	Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese	Yuichi Inoue et.al.	2404.07824	link
2024-04-12	Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics	Masashi Osada et.al.	2404.07717	link
2024-04-12	PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination	Anant Khandelwal et.al.	2404.07520	null
2024-04-11	Transferable and Principled Efficiency for Open-Vocabulary Segmentation	Jingxuan Xu et.al.	2404.07448	link
2024-04-10	BRAVE: Broadening the visual encoding of vision-language models	Oğuzhan Fatih Kar et.al.	2404.07204	null
2024-04-10	Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic	Sachin Goyal et.al.	2404.07177	link
2024-04-10	ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling	Ege Özsoy et.al.	2404.07031	link
2024-04-10	Vision-Language Model-based Physical Reasoning for Robot Liquid Perception	Wenqiang Lai et.al.	2404.06904	null
2024-04-09	InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD	Xiaoyi Dong et.al.	2404.06512	link
2024-04-09	Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?	Yuan-Hong Liao et.al.	2404.06510	null
2024-04-09	Anchor-based Robust Finetuning of Vision-Language Models	Jinwei Han et.al.	2404.06244	null
2024-04-08	Retrieval-Augmented Open-Vocabulary Object Detection	Jooyeon Kim et.al.	2404.05687	link
2024-04-08	MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning	Matteo Farina et.al.	2404.05621	link
2024-04-08	PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection	Xiaofan Li et.al.	2404.05231	link
2024-04-08	Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset	Chih-Chung Hsu et.al.	2404.05183	null
2024-04-07	FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback	Liqiang Jing et.al.	2404.05046	null
2024-04-07	Hyperbolic Learning with Synthetic Captions for Open-World Detection	Fanjie Kong et.al.	2404.05016	null
2024-04-07	Mixture of Low-rank Experts for Transferable AI-Generated Image Detection	Zihan Liu et.al.	2404.04883	link
2024-04-07	GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling	Hritik Bansal et.al.	2404.04763	null
2024-04-05	Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)	Michael Saxon et.al.	2404.04251	link
2024-04-05	Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation	Ji-Jia Wu et.al.	2404.04231	link
2024-04-05	Label Propagation for Zero-shot Classification with Vision-Language Models	Vladan Stojnić et.al.	2404.04072	link
2024-04-04	Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity	Jake Varley et.al.	2404.03570	null
2024-04-03	LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models	Gabriela Ben Melech Stan et.al.	2404.03118	link
2024-04-03	AWOL: Analysis WithOut synthesis using Language	Silvia Zuffi et.al.	2404.03042	null
2024-04-03	I-Design: Personalized LLM Interior Designer	Ata Çelen et.al.	2404.02838	null
2024-04-03	Harnessing the Power of Large Vision Language Models for Synthetic Image Detection	Mamadou Keita et.al.	2404.02726	link
2024-04-03	RESSA: Repair Sparse Vision-Language Models via Sparse Cross-Modality Adaptation	Shwai He et.al.	2404.02424	link
2024-04-03	What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases	Anthony Meng Huat Tiong et.al.	2404.02415	link
2024-04-03	Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns	Yunsoo Kim et.al.	2404.02370	null
2024-04-02	ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models	Vishnunandan L. N. Venkatesh et.al.	2404.02318	null
2024-04-02	Iterated Learning Improves Compositionality in Large Vision-Language Models	Chenhao Zheng et.al.	2404.02145	null
2024-04-03	ViTamin: Designing Scalable Vision Models in the Vision-Language Era	Jieneng Chen et.al.	2404.02132	link
2024-04-02	Bi-LORA: A Vision-Language Approach for Synthetic Image Detection	Mamadou Keita et.al.	2404.01959	link
2024-04-02	VLRM: Vision-Language Models act as Reward Models for Image Captioning	Maksim Dzabraev et.al.	2404.01911	null
2024-04-01	OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation	Xiongwei Wu et.al.	2404.01409	null
2024-04-02	Open-Vocabulary Federated Learning with Multimodal Prototyping	Huimin Zeng et.al.	2404.01232	link
2024-04-01	Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models	Yuxin Wen et.al.	2404.01231	null
2024-04-01	Vision-language models for decoding provider attention during neonatal resuscitation	Felipe Parodi et.al.	2404.01207	null
2024-04-01	SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining	Chull Hwan Song et.al.	2404.01156	null
2024-04-01	Harnessing Large Language Models for Training-free Video Anomaly Detection	Luca Zanella et.al.	2404.01014	null
2024-03-29	Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models	Atsuyuki Miyai et.al.	2403.20331	link
2024-03-29	Are We on the Right Way for Evaluating Large Vision-Language Models?	Lin Chen et.al.	2403.20330	link
2024-03-29	Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations	Jaisidh Singh et.al.	2403.20312	link
2024-03-29	H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model	Chao Pang et.al.	2403.20213	link
2024-03-29	ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models	Shuo Liu et.al.	2403.20194	null
2024-03-29	LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving	Pranjal Paul et.al.	2403.20116	null
2024-03-29	Negative Label Guided OOD Detection with Pretrained Vision-Language Models	Xue Jiang et.al.	2403.20078	link
2024-03-28	Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks	Pooria Ashrafian et.al.	2403.19880	link
2024-03-28	Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving	Akshay Gopalkrishnan et.al.	2403.19838	link
2024-04-01	Concept-based Analysis of Neural Networks via Vision-Language Models	Ravi Mangal et.al.	2403.19837	null
2024-03-28	CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models	Saurav Jha et.al.	2403.19137	link
2024-03-27	Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models	Anees Ur Rehman Hashmi et.al.	2403.18996	null
2024-03-27	Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models	Keyan Guo et.al.	2403.18957	link
2024-03-27	Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models	Yanwei Li et.al.	2403.18814	link
2024-03-27	Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding	Xintong Wang et.al.	2403.18715	link
2024-03-27	Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP	Reza Abbasi et.al.	2403.18525	null
2024-03-27	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Wonkyun Kim et.al.	2403.18406	link
2024-03-27	Efficient Test-Time Adaptation of Vision-Language Models	Adilbek Karmanov et.al.	2403.18293	null
2024-03-26	Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models	Yabin Zhang et.al.	2403.17589	link
2024-03-26	Visual Hallucination: Definition, Quantification, and Prescriptive Remediations	Vipula Rawte et.al.	2403.17306	null
2024-03-25	Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks	Jonathan Salfity et.al.	2403.17238	link
2024-03-25	Open-Set Recognition in the Age of Vision-Language Models	Dimity Miller et.al.	2403.16528	link
2024-03-25	Learning To Guide Human Decision Makers With Vision-Language Models	Debodeep Banerjee et.al.	2403.16501	null
2024-03-25	If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions	Reza Esfandiarpoor et.al.	2403.16442	link
2024-03-24	Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models	Yuxuan Wang et.al.	2403.16184	null
2024-03-26	Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models	Minchan Kim et.al.	2403.16167	null
2024-03-23	IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models	Haz Sameen Shahgir et.al.	2403.15952	link
2024-03-23	Explore until Confident: Efficient Exploration for Embodied Question Answering	Allen Z. Ren et.al.	2403.15941	null
2024-03-23	Centered Masking for Language-Image Pre-Training	Mingliang Liang et.al.	2403.15837	link
2024-03-23	VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification	Lanfeng Zhong et.al.	2403.15836	link
2024-03-22	CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments	Adarsh Jagan Sathyamoorthy et.al.	2403.15637	null
2024-03-22	Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning	Bumsoo Kim et.al.	2403.15048	null
2024-03-21	Few-Shot Adversarial Prompt Learning on Vision-Language Models	Yiwei Zhou et.al.	2403.14774	link
2024-03-21	Can 3D Vision-Language Models Truly Understand Natural Language?	Weipeng Deng et.al.	2403.14760	link
2024-03-21	MyVLM: Personalizing VLMs for User-Specific Queries	Yuval Alaluf et.al.	2403.14599	null
2024-03-21	Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network	Zih-Syuan Huang et.al.	2403.14398	link
2024-03-21	Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation	Jianeng Wang et.al.	2403.14320	null
2024-03-21	C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion	Hee Suk Yoon et.al.	2403.14119	link
2024-03-21	Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots	Connor Lee et.al.	2403.14056	null
2024-03-20	Multi-Modal Hallucination Control by Visual Information Grounding	Alessandro Favero et.al.	2403.14003	null
2024-03-20	Bridge the Modality and Capacity Gaps in Vision-Language Model Selection	Chao Yi et.al.	2403.13797	null
2024-03-20	Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model	Diwei Wang et.al.	2403.13756	null
2024-03-20	Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments	Djamahl Etchegaray et.al.	2403.13556	link
2024-03-20	CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models	Pablo Pueyo et.al.	2403.13467	null
2024-03-20	AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation	Jingkun An et.al.	2403.13352	null
2024-03-20	TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation	Santosh Sanjeev et.al.	2403.13343	link
2024-03-20	SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models	Tongtian Yue et.al.	2403.13263	link
2024-03-19	Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models	Zuyan Liu et.al.	2403.12966	link
2024-03-19	Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models	Ce Zhang et.al.	2403.12964	link
2024-03-19	Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models	Elaine Sui et.al.	2403.12952	link
2024-03-19	Yell At Your Robot: Improving On-the-Fly from Language Corrections	Lucy Xiaoyang Shi et.al.	2403.12910	null
2024-03-19	HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning	Fucai Ke et.al.	2403.12884	link
2024-03-19	RelationVLM: Making Large Vision-Language Models Understand Visual Relations	Zhipeng Huang et.al.	2403.12801	null
2024-03-19	Towards Multimodal In-Context Learning for Vision & Language Models	Sivan Doveh et.al.	2403.12736	null
2024-03-19	Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs	Victor Carbune et.al.	2403.12596	null
2024-03-19	CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation	Wenqi Zhu et.al.	2403.12455	link
2024-03-18	FlexCap: Generating Rich, Localized, and Flexible Captions in Images	Debidatta Dwibedi et.al.	2403.12026	null
2024-03-18	Prioritized Semantic Learning for Zero-shot Instance Navigation	Xander Sun et.al.	2403.11650	link
2024-03-18	Compositional Kronecker Context Optimization for Vision-Language Models	Kun Ding et.al.	2403.11631	null
2024-03-18	Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters	Jiazuo Yu et.al.	2403.11549	link
2024-03-18	Do CLIPs Always Generalize Better than ImageNet Models?	Qizhou Wang et.al.	2403.11497	null
2024-03-18	VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Yue Fan et.al.	2403.11481	null
2024-03-17	Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding	Zichen Wu et.al.	2403.11311	null
2024-03-17	SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant	Guohao Sun et.al.	2403.11299	link
2024-03-17	Training A Small Emotional Vision Language Model for Visual Art Comprehension	Jing Zhang et.al.	2403.11150	link
2024-03-17	PhD: A Prompted Visual Hallucination Evaluation Dataset	Jiazhen Liu et.al.	2403.11116	link
2024-03-17	Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping	Haoxi Zhang et.al.	2403.11073	null
2024-03-15	Reconfigurable Robot Identification from Motion Data	Yuhang Hu et.al.	2403.10496	null
2024-03-15	EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models	Rocktim Jyoti Das et.al.	2403.10378	link
2024-03-15	Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Tian Meng et.al.	2403.10287	null
2024-03-15	CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning	Yukun Li et.al.	2403.10245	link
2024-03-15	Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning	Hang Zhang et.al.	2403.10107	null
2024-03-14	An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models	Haochen Luo et.al.	2403.09766	link
2024-03-14	Renovating Names in Open-Vocabulary Segmentation Benchmarks	Haiwen Huang et.al.	2403.09593	null
2024-03-14	Anomaly Detection by Adapting a pre-trained Vision Language Model	Yuxuan Cai et.al.	2403.09493	null
2024-03-14	XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization	Yequan Bie et.al.	2403.09410	null
2024-03-14	AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions	Hao Zhang et.al.	2403.09346	link
2024-03-14	Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring	Yufei Zhan et.al.	2403.09333	link
2024-03-14	Annotation Free Semantic Segmentation with Vision Foundation Models	Soroush Seifi et.al.	2403.09307	null
2024-03-14	Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models	Yu-Chu Yu et.al.	2403.09296	null
2024-03-14	Are Vision Language Models Texture or Shape Biased and Can We Steer Them?	Paul Gavrikov et.al.	2403.09193	link
2024-03-14	The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?	Qinyu Zhao et.al.	2403.09037	link
2024-03-14	Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset	Hugo Laurençon et.al.	2403.09029	null
2024-03-13	AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models	Yifei Gao et.al.	2403.08542	link
2024-03-13	Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation	Zicheng Zhang et.al.	2403.08426	null
2024-03-13	Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification	Long Lan et.al.	2403.08271	link
2024-03-13	CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models	Haoxu Huang et.al.	2403.08248	null
2024-03-13	Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization	Kento Kawaharazuka et.al.	2403.08239	null
2024-03-12	TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection	Hanning Chen et.al.	2403.08108	null
2024-03-12	MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric	Haokun Lin et.al.	2403.07839	null
2024-03-12	Unified Source-Free Domain Adaptation	Song Tang et.al.	2403.07601	link
2024-03-12	In-context learning enables multimodal large language models to classify cancer pathology images	Dyke Ferber et.al.	2403.07407	null
2024-03-12	KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models	Han Huang et.al.	2403.07350	link
2024-03-12	Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion	Wenhui Tan et.al.	2403.07312	link
2024-03-12	Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations	Chenyu You et.al.	2403.07241	link
2024-03-11	Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation	Xinyao Li et.al.	2403.06946	link
2024-03-11	An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models	Liang Chen et.al.	2403.06764	link
2024-03-11	FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications	Yuki Tatsukawa et.al.	2403.06453	link
2024-03-11	Can LLMs' Tuning Methods Work in Medical Multimodal Domain?	Jiawei Chen et.al.	2403.06407	link
2024-03-10	A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets	Thang Doan et.al.	2403.06295	link
2024-03-10	In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model	Junhui Yin et.al.	2403.06126	null
2024-03-11	DeepSeek-VL: Towards Real-World Vision-Language Understanding	Haoyu Lu et.al.	2403.05525	link
2024-03-08	Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery	Xavier Bou et.al.	2403.05381	link
2024-03-08	VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model	Junsu Kim et.al.	2403.05346	null
2024-03-08	Debiasing Large Visual Language Models	Yi-Fan Zhang et.al.	2403.05262	link
2024-03-08	CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model	Pengwei Yin et.al.	2403.05124	null
2024-03-08	How Far Are We from Intelligent Visual Deductive Reasoning?	Yizhe Zhang et.al.	2403.04732	link
2024-03-07	Yi: Open Foundation Models by 01.AI	01. AI et.al.	2403.04652	link
2024-03-07	Embodied Understanding of Driving Scenarios	Yunsong Zhou et.al.	2403.04593	link
2024-03-07	Effectiveness Assessment of Recent Large Vision-Language Models	Yao Jiang et.al.	2403.04306	null
2024-03-06	MeaCap: Memory-Augmented Zero-shot Image Captioning	Zequn Zeng et.al.	2403.03715	link
2024-03-05	Enhancing Vision-Language Pre-training with Rich Supervisions	Yuan Gao et.al.	2403.03346	null
2024-03-05	CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments	Savitha Sam Abraham et.al.	2403.03203	null
2024-03-05	MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting	Fangchen Liu et.al.	2403.03174	null
2024-03-06	ImgTrojan: Jailbreaking Vision-Language Models with ONE Image	Xijia Tao et.al.	2403.02910	link
2024-03-05	Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation	Zhekai Du et.al.	2403.02899	null
2024-03-05	Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples	Philipp J. Rösch et.al.	2403.02875	null
2024-03-06	PromptKD: Unsupervised Prompt Distillation for Vision-Language Models	Zheng Li et.al.	2403.02781	link
2024-03-05	DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization	Feng Hou et.al.	2403.02714	null
2024-03-05	Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use	Imad Eddine Toubal et.al.	2403.02626	null
2024-03-05	Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research	Brenda Y. Miao et.al.	2403.02558	link
2024-03-04	Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review	Iryna Hartsock et.al.	2403.02469	link
2024-03-02	Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning	Shuo Yang et.al.	2403.01209	null
2024-03-01	HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding	Zhaorun Chen et.al.	2403.00425	link
2024-03-01	Invariant Test-Time Adaptation for Vision-Language Model Generalization	Huan Ma et.al.	2403.00376	link
2024-03-04	Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models	Lei Li et.al.	2403.00231	null
2024-03-01	Multi-modal Attribute Prompting for Vision-Language Models	Xin Liu et.al.	2403.00219	null
2024-02-29	Artwork Explanation in Large-scale Vision Language Models	Kazuki Hayashi et.al.	2403.00068	null
2024-02-29	Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction	Hao Li et.al.	2402.19326	link
2024-02-29	Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts	Hao Cheng et.al.	2402.19150	null
2024-02-28	IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding	Lanyun Zhu et.al.	2402.18476	null
2024-02-29	A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models	Xiujie Song et.al.	2402.18409	link
2024-02-28	SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model	Bin Cao et.al.	2402.18068	link
2024-02-28	Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction	Koki Maeda et.al.	2402.17969	null
2024-02-27	Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning	Maurits Bleeker et.al.	2402.17510	link
2024-02-27	VCD: Knowledge Base Guided Visual Commonsense Discovery in Images	Xiangqing Shen et.al.	2402.17213	null
2024-02-26	Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models	Jeonghwan Kim et.al.	2402.16315	null
2024-02-26	Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion	Xuantong Liu et.al.	2402.16305	null
2024-02-27	NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation	Jiazhao Zhang et.al.	2402.15852	null
2024-02-24	Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation	Zekun Jiang et.al.	2402.15759	link
2024-02-24	GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation	Yi Zong et.al.	2402.15745	link
2024-02-24	CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge	Xiao Lin et.al.	2402.15726	null
2024-02-24	Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models	Chaoya Jiang et.al.	2402.15721	null
2024-02-24	Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics	Sadaf Ghaffari et.al.	2402.15654	null
2024-02-23	Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning	Tejas Srinivasan et.al.	2402.15610	link
2024-02-23	Representing Online Handwriting for Recognition in Large Vision-Language Models	Anastasiia Fadeeva et.al.	2402.15307	null
2024-02-23	Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding	Ailin Deng et.al.	2402.15300	link
2024-02-22	CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models	Santiago Castro et.al.	2402.15021	link
2024-02-22	PALO: A Polyglot Large Multimodal Model for 5B People	Muhammad Maaz et.al.	2402.14818	link
2024-02-22	Uncertainty-Aware Evaluation for Vision-Language Models	Vasily Kostumov et.al.	2402.14418	link
2024-02-22	Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology	Nur Yildirim et.al.	2402.14252	null
2024-02-21	A Unified Framework and Dataset for Assessing Gender Bias in Vision-Language Models	Ashutosh Sathe et.al.	2402.13636	null
2024-02-21	WinoViz: Probing Visual Properties of Objects Under Different States	Woojeong Jin et.al.	2402.13584	null
2024-02-21	BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models	Xueliang Zhao et.al.	2402.13577	null
2024-02-20	A Touch, Vision, and Language Dataset for Multimodal Alignment	Letian Fu et.al.	2402.13232	link
2024-02-20	SoMeLVLM: A Large Vision Language Model for Social Media Processing	Xinnong Zhang et.al.	2402.13022	null
2024-02-20	CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection	Sohail Ahmed Khan et.al.	2402.12927	link
2024-02-20	GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models	Sayantan Adak et.al.	2402.12881	link
2024-02-20	MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion	Sen Li et.al.	2402.12741	link
2024-02-19	Talk Through It: End User Directed Manipulation Learning	Carl Winge et.al.	2402.12509	null
2024-02-19	Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection	Ruibo Chen et.al.	2402.12501	link
2024-02-19	Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models	Christian Schlarmann et.al.	2402.12336	link
2024-02-19	DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models	Xiaoyu Tian et.al.	2402.12289	null
2024-02-19	Evaluating Image Review Ability of Vision Language Models	Shigeki Saito et.al.	2402.12121	null
2024-02-19	LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation	Keyang Xuan et.al.	2402.11943	link
2024-02-18	Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning	Zhiyang Xu et.al.	2402.11690	null
2024-02-18	ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model	Guiming Hardy Chen et.al.	2402.11684	link
2024-02-18	Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models	Junfei Wu et.al.	2402.11622	link
2024-02-18	Visual In-Context Learning for Large Vision-Language Models	Yucheng Zhou et.al.	2402.11574	null
2024-02-17	ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing	Zhenghang Yuan et.al.	2402.11325	link
2024-02-17	CoLLaVO: Crayon Large Language and Vision mOdel	Byung-Kwan Lee et.al.	2402.11248	link
2024-02-16	PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter	Junfei Xiao et.al.	2402.10896	null
2024-02-16	Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering	David Romero et.al.	2402.10698	link
2024-02-16	OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models	Yuxuan Kuang et.al.	2402.10670	link
2024-02-15	On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities	Xiyang Wu et.al.	2402.10340	link
2024-02-15	Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment	Angelos Zavras et.al.	2402.09816	null
2024-02-16	MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models	Corentin Royer et.al.	2402.09262	**[link](https://github

