🎓 Automatically update CV papers daily using GitHub Actions (updates every 12 hours)

Zetianuser/cv-arxiv-daily

 
 

Updated on 2025.01.26

Usage instructions: here

Table of Contents
  1. Camouflage
  2. In-context
  3. VLM
  4. Visual In-context
  5. V-ICL

Camouflage

Publish Date Title Authors PDF Code
2025-01-22 Observation of Strong Nonreciprocal Thermal Emission Zhenong Zhang et.al. 2501.12947 null
2025-01-21 SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection Xiaocheng Zhang et.al. 2501.12430 null
2025-01-21 Library-Attack: Reverse Engineering Approach for Evaluating Hardware IP Protection Aritra Dasgupta et.al. 2501.12292 null
2025-01-19 Green Video Camouflaged Object Detection Xinyu Wang et.al. 2501.10914 null
2025-01-13 Toward Realistic Camouflaged Object Detection: Benchmarks and Method Zhimeng Xin et.al. 2501.07297 link
2025-01-10 A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection Tsui Qin Mok et.al. 2501.06038 null
2025-01-20 Tailored Thin Films: Modulating Soft Photonics with Dynamically Tunable Large Area Microstructures via Controlled Thermal Processing Srijeeta Biswas et.al. 2501.05736 null
2025-01-02 Anti-counterfeiting tags with camouflaged QR codes on nanocavities, using polymer-dispersed-liquid-crystals Giuseppe Nicoletta et.al. 2501.02011 null
2025-01-03 Innate behavioural mechanisms and defensive traits in ecological models of predator-prey types Sangeeta Saha et.al. 2501.01687 null
2024-12-31 B2Net: Camouflaged Object Detection via Boundary Aware and Boundary Fusion Junmin Cai et.al. 2501.00426 null
2025-01-15 CGCOD: Class-Guided Camouflaged Object Detection Chenxi Zhang et.al. 2412.18977 link
2025-01-05 Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks Against GNN-Based Fraud Detectors Jinhyeok Choi et.al. 2412.18370 link
2024-12-22 Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection Yi Liu et.al. 2412.16840 link
2024-12-18 Novel AI Camera Camouflage: Face Cloaking Without Full Disguise David Noever et.al. 2412.13507 null
2024-12-14 Unconstrained Salient and Camouflaged Object Detection Zhangjun Zhou et.al. 2412.10943 null
2024-12-14 CATALOG: A Camera Trap Language-guided Contrastive Learning Model Julian D. Santamaria et.al. 2412.10624 link
2024-12-10 CapGen: An Environment-Adaptive Generator of Adversarial Patches Chaoqun Li et.al. 2412.07253 null
2024-12-02 Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes Xiaoqi Zhao et.al. 2412.01240 null
2024-11-28 COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection Xiaoqin Zhang et.al. 2411.18858 link
2024-11-15 Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors Jiawei Zhou et.al. 2411.10029 null
2024-11-10 SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains Bijoy Ahmed Saiem et.al. 2411.06426 null
2024-11-22 Financial Fraud Detection using Jump-Attentive Graph Neural Networks Prashank Kadam et.al. 2411.05857 link
2024-10-28 TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors Adonisz Dimitriu et.al. 2410.21443 null
2024-10-23 PlantCamo: Plant Camouflage Detection Jinyu Yang et.al. 2410.17598 link
2024-10-22 Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations Cheng Lei et.al. 2410.16953 null
2024-10-20 Lying mirror Yuhang Li et.al. 2410.15521 null
2024-10-15 Octopus-Swimming-Like Robot with Soft Asymmetric Arms Bobing Zhang et.al. 2410.11764 null
2024-10-05 Exploring Strengths and Weaknesses of Super-Resolution Attack in Deepfake Detection Davide Alessandro Coccomini et.al. 2410.04205 null
2024-10-05 Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection Dingwen Zhang et.al. 2410.03987 null
2024-09-27 When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation Yuli Zhou et.al. 2409.18653 link
2024-09-26 CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors Linye Lyu et.al. 2409.17963 link
2024-09-25 Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 Chunhui Zhang et.al. 2409.16902 link
2024-09-24 Phase-space gaussian ensemble quantum camouflage Alex E. Bernardini et.al. 2409.16377 null
2024-09-24 MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios Jiacheng Ruan et.al. 2409.16084 link
2024-09-19 Frequency-Guided Spatial Adaptation for Camouflaged Object Detection Shizhou Zhang et.al. 2409.12421 null
2024-09-01 NoPhish: Efficient Chrome Extension for Phishing Detection Using Machine Learning Techniques Leand Thaqi et.al. 2409.10547 null
2024-09-15 Optimality of Motion Camouflage Under Escape Uncertainty Mallory Gaspard et.al. 2409.09890 null
2024-09-15 GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection Yanguang Sun et.al. 2409.09588 link
2024-09-11 Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning Yingling Lu et.al. 2409.07238 link
2024-09-05 Active Fake: DeepFake Camouflage Pu Sun et.al. 2409.03200 null
2024-09-04 Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation Tiantian Zhang et.al. 2409.02567 link
2024-09-03 Frequency-Spatial Entanglement Learning for Camouflaged Object Detection Yanguang Sun et.al. 2409.01686 link
2024-09-04 ExpoSort: Breaking the quasi-polynomial-time barrier for reluctant sorting Mikkel Abrahamsen et.al. 2409.00794 null
2024-08-29 Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning Luyao Tang et.al. 2408.16310 link
2024-09-21 Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection Siyuan Yao et.al. 2408.15020 link
2024-08-26 A Survey of Camouflaged Object Detection and Beyond Fengyang Xiao et.al. 2408.14562 link
2024-08-25 Camouflaged Object Tracking: A Benchmark Xiaoyu Guo et.al. 2408.13877 null
2024-08-22 BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking Hanzheng Wang et.al. 2408.12232 null
2024-08-22 Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy Hong Zhang et.al. 2408.12086 link
2024-08-20 Just a Hint: Point-Supervised Camouflaged Object Detection Huafeng Chen et.al. 2408.10777 null
2024-08-20 SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection Huafeng Chen et.al. 2408.10760 null
2024-08-20 Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory Yongxin Deng et.al. 2408.10608 null
2024-08-19 Microscopic Analysis on LLM players via Social Deduction Game Byungjun Kim et.al. 2408.09946 null
2024-08-19 Games with Planned Actions and Scouting Wolfgang Kuhle et.al. 2408.09778 null
2024-08-17 Depth-guided Texture Diffusion for Image Semantic Segmentation Wei Sun et.al. 2408.09097 null
2024-08-16 SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation Xinyu Xiong et.al. 2408.08870 link
2024-08-15 CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection Xunfa Lai et.al. 2408.08050 null
2024-08-12 Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes Ke Zhou et.al. 2408.05936 null
2024-08-10 SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More Tianrun Chen et.al. 2408.04579 null
2024-08-02 PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network Changqun Xia et.al. 2408.01137 null
2024-08-01 VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection Fei Xiao et.al. 2408.00513 null
2024-07-31 Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2 Lv Tang et.al. 2407.21596 null
2024-08-18 Global Confidence Degree Based Graph Neural Network for Financial Fraud Detection Jiaxun Liu et.al. 2407.17333 null
2024-07-18 Learning Camouflaged Object Detection from Noisy Pseudo Label Jin Zhang et.al. 2407.13157 null
2024-07-18 FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection Jianwei Zhao et.al. 2407.13133 null
2024-07-17 Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection Zhenni Yu et.al. 2407.12339 link
2024-07-10 Edge-dominance games on graphs Farid Arthaud et.al. 2407.07785 null
2024-07-02 Adversarial Magnification to Deceive Deepfake Detection through Super Resolution Davide Alessandro Coccomini et.al. 2407.02670 link
2024-06-18 PFID: Privacy First Inference Delegation Framework for LLMs Haoyan Yang et.al. 2406.12238 null
2024-06-17 YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection Tamara R. Lenhard et.al. 2406.11641 null
2024-06-09 SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention Muhammad Nawfal Meeran et.al. 2406.05802 link
2024-06-09 Utilizing Grounded SAM for self-supervised frugal camouflaged human detection Matthias Pijarowski et.al. 2406.05776 null
2024-05-25 GreenCOD: A Green Camouflaged Object Detection Method Hong-Shuo Chen et.al. 2405.16144 null
2024-05-09 Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection Xinran Liua et.al. 2405.05614 null
2024-05-10 Honeyfile Camouflage: Hiding Fake Files in Plain Sight Roelien C. Timmer et.al. 2405.04758 null
2024-05-07 Adaptive Guidance Learning for Camouflaged Object Detection Zhennan Chen et.al. 2405.02824 null
2024-05-28 Spider: A Unified Framework for Context-dependent Concept Segmentation Xiaoqi Zhao et.al. 2405.01002 link
2024-04-24 BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers Buyun He et.al. 2404.15070 link
2024-04-18 An Overview of Electromagnetic Illusions: Empowering Smart Environments with Reconfigurable Metasurfaces Hamidreza Taghvaee et.al. 2404.12089 null
2024-04-18 Enhance Robustness of Language Models Against Variation Attack through Graph Integration Zi Xiong et.al. 2404.12014 null
2024-04-13 Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage Yang Hu et.al. 2404.08936 null
2024-04-04 InsectMamba: Insect Pest Classification with State Space Model Qianning Wang et.al. 2404.03611 null
2024-04-13 LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion Pancheng Zhao et.al. 2404.00292 link
2024-03-21 Latent Diffusion Models for Attribute-Preserving Image Anonymization Luca Piano et.al. 2403.14790 null
2024-03-04 Weaponization of Conscience in Cybercrime and Online Fraud: A Novel Systems Theory Michelle Espinoza et.al. 2403.14667 null
2024-03-14 Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations Xinyu Xiong et.al. 2403.09315 null
2024-05-04 Effectiveness Assessment of Recent Large Vision-Language Models Yao Jiang et.al. 2403.04306 null
2024-03-04 Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection Xin Zhang et.al. 2403.01968 null
2024-02-29 A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection Chao Hao et.al. 2402.18922 link
2024-02-28 Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond Ziyun Yang et.al. 2402.18698 null
2024-02-28 Living-off-The-Land Reverse-Shell Detection by Informed Data Augmentation Dmitrijs Trizna et.al. 2402.18329 null
2024-02-24 RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation Jiawei Zhou et.al. 2402.15853 link
2024-02-21 Flexible Physical Camouflage Generation Based on a Differential Approach Yang Li et.al. 2402.13575 null
2024-02-15 Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks Álvaro Huertas-García et.al. 2402.09874 null
2024-02-16 Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues Zhiyuan Chang et.al. 2402.09091 null
2024-02-03 CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse Cunhan Guo et.al. 2402.02217 null
2024-01-29 The Reasoning Under Uncertainty Trap: A Structural AI Risk Toby D. Pilditch et.al. 2402.01743 null
2024-01-30 Camouflage Adversarial Attacks on Multiple Agent Systems Ziqing Lu et.al. 2401.17405 null
2024-01-22 Concealed Object Segmentation with Hierarchical Coherence Modeling Fengyang Xiao et.al. 2401.11767 null
2024-01-17 The problem of optimal camouflaging Alexander Plakhov et.al. 2401.08928 null
2024-01-16 Localised Thermal Emission from Topological Interfaces M. Said Ergoktas et.al. 2401.08316 null
2024-01-07 Dynamic Multi Color Switching using Ultrathin Vanadium Oxide on Aluminium based Asymmetric Fabry-Perot Resonant Structure Shubhangi Saini et.al. 2401.03543 null
2024-01-02 Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector Jitao Ma et.al. 2401.01093 null
2023-12-30 TPatch: A Triggered Physical Adversarial Patch Wenjun Zhu et.al. 2401.00148 link
2023-12-29 Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation Tuan-Anh Vu et.al. 2312.17505 null
2024-01-12 MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World Zheng Zhou et.al. 2312.17431 null
2023-12-27 Natural Adversarial Patch Generation Method Based on Latent Diffusion Model Xianyi Chen et.al. 2312.16401 null
2023-12-18 Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects Jian Hu et.al. 2312.07374 link
2023-12-06 Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation Haojie Zhang et.al. 2312.03502 link
2023-12-06 Antibody-loading of biological nanocarrier vesicles derived from red-blood-cell membranes Maryam Sanaee et.al. 2312.03417 null
2023-11-28 Large Model Based Referring Camouflaged Object Detection Shupeng Cheng et.al. 2311.17122 null
2023-11-28 Cross-level Attention with Overlapped Windows for Camouflaged Object Detection Jiepan Li et.al. 2311.16618 null
2023-11-25 VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning Ziyang Luo et.al. 2311.15011 link
2023-11-19 Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens Lv Tang et.al. 2311.11273 link
2023-11-19 Open-Vocabulary Camouflaged Object Segmentation Youwei Pang et.al. 2311.11241 link
2023-11-15 Infrared thermochromic antenna composite for self-adaptive thermoregulation Francisco V. Ramirez-Cuevas et.al. 2311.08633 null
2023-11-10 Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16 T. T Lemani et.al. 2311.05981 null

(back to top)

In-context

Publish Date Title Authors PDF Code
2025-01-23 EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents Yuhui Yun et.al. 2501.13746 null
2025-01-21 Compositional Instruction Following with Language Models and Reinforcement Learning Vanya Cohen et.al. 2501.12539 null
2025-01-21 CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification Cristiano Patrício et.al. 2501.12266 null
2025-01-21 Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs Saiful Haq et.al. 2501.11833 null
2025-01-20 Trojan Detection Through Pattern Recognition for Large Language Models Vedant Bhasin et.al. 2501.11621 null
2025-01-19 AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model Lipeng Ma et.al. 2501.11031 link
2025-01-18 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Hongjin Su et.al. 2501.10893 null
2025-01-18 Visual RAG: Expanding MLLM visual knowledge without fine-tuning Mirco Bonomo et.al. 2501.10834 null
2025-01-18 GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems Amin Robatian et.al. 2501.10734 null
2025-01-17 Tabular-TX: Theme-Explanation Structure-based Table Summarization via In-Context Learning TaeYoon Kwack et.al. 2501.10487 null
2025-01-16 Confidence Estimation for Error Detection in Text-to-SQL Systems Oleg Somov et.al. 2501.09527 null
2025-01-16 Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval Jesus Lovon et.al. 2501.09384 null
2025-01-16 A Study of In-Context-Learning-Based Text-to-SQL Errors Jiawei Shen et.al. 2501.09310 link
2025-01-16 Perspective Transition of Large Language Models for Solving Subjective Tasks Xiaolong Wang et.al. 2501.09265 null
2025-01-16 Task Vectors in In-Context Learning: Emergence, Formation, and Benefit Liu Yang et.al. 2501.09240 null
2025-01-15 Exploring Task-Level Optimal Prompts for Visual In-Context Learning Yan Zhu et.al. 2501.08841 null
2025-01-15 Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning Alain Komaty et.al. 2501.08799 null
2025-01-15 The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities Irina Bigoulaeva et.al. 2501.08716 link
2025-01-13 SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models Fabien Bernier et.al. 2501.07639 null
2025-01-13 Enhancing Retrieval-Augmented Generation: A Study of Best Practices Siran Li et.al. 2501.07391 link
2025-01-13 Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models Yongyu Mu et.al. 2501.07086 link
2025-01-12 An efficient approach to represent enterprise web application structure using Large Language Model in the service of Intelligent Quality Engineering Zaber Al Hassan Ayon et.al. 2501.06837 null
2025-01-09 What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning Jelena Bratulić et.al. 2501.06256 null
2025-01-09 Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding Mohammed Elhenawy et.al. 2501.05566 null
2025-01-08 Efficient and Responsible Adaptation of Large Language Models for Robust and Equitable Top-k Recommendations Kirandeep Kaur et.al. 2501.04762 null
2025-01-08 ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training Xinfa Zhu et.al. 2501.04416 null
2025-01-09 More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives Xiaoqing Zhang et.al. 2501.04070 link
2025-01-08 A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval Shuo Tong et.al. 2501.03295 null
2025-01-06 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Beichen Zhang et.al. 2501.03226 link
2025-01-06 Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text Ali Al-Lawati et.al. 2501.03166 link
2025-01-03 Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models Lei Tang et.al. 2501.01679 null
2025-01-01 Unraveling Indirect In-Context Learning Using Influence Functions Hadi Askari et.al. 2501.01473 null
2025-01-05 Learning Spectral Methods by Transformers Yihan He et.al. 2501.01312 null
2025-01-02 Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction Alexander Brinkmann et.al. 2501.01237 link
2025-01-02 ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning Wonduk Seo et.al. 2501.01031 null
2024-12-31 Robust and Adaptive Optimization under a Large Language Model Lens Dimitris Bertsimas et.al. 2501.00568 null
2024-12-31 SPDZCoder: Teaching LLMs to Synthesize Privacy Computing Code without Massive Training Data Xiaoning Dong et.al. 2501.00363 null
2024-12-29 ICLR: In-Context Learning of Representations Core Francisco Park et.al. 2501.00070 null
2024-12-29 Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection Dmitri Roussinov et.al. 2412.20595 link
2024-12-29 Towards Neural No-Resource Language Translation: A Comparative Evaluation of Approaches Madhavendra Thakur et.al. 2412.20584 null
2024-12-27 TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data Xiang Huang et.al. 2412.19544 link
2024-12-27 Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs Zhe Yang et.al. 2412.19513 link
2024-12-26 SILC-EFSA: Self-aware In-context Learning Correction for Entity-level Financial Sentiment Analysis Senbin Zhu et.al. 2412.19140 link
2024-12-26 SketchFill: Sketch-Guided Code Generation for Imputing Derived Missing Values Yunfan Zhang et.al. 2412.19113 null
2024-12-26 Let the Rule Speak: Enhancing In-context Learning Debiasing with Interpretability Ruixi Lin et.al. 2412.19018 null
2024-12-30 TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization Yucong Luo et.al. 2412.18185 null
2024-12-24 Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner Aizierjiang Aiersilan et.al. 2412.18086 link
2024-12-23 The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting Shuzhang Cai et.al. 2412.17891 null
2024-12-22 SAIL: Sample-Centric In-Context Learning for Document Information Extraction Jinyu Zhang et.al. 2412.17092 link
2024-12-22 PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask Jeongho Kim et.al. 2412.16978 link
2024-12-22 Revisiting In-Context Learning with Long Context Language Models Jinheon Baek et.al. 2412.16926 null
2024-12-21 Dynamical Behaviors of the Gradient Flows for In-Context Learning Songtao Lu et.al. 2412.16683 null
2024-12-21 Learning Cross-Task Generalities Across Graphs via Task-trees Zehong Wang et.al. 2412.16441 null
2024-12-20 Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning? Mengyu Ye et.al. 2412.15628 null
2024-12-20 Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification Gyutae Park et.al. 2412.15603 null
2024-12-20 In-context Continual Learning Assisted by an External Continual Learner Saleh Momeni et.al. 2412.15563 null
2024-12-19 Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models Nishtha N. Vaidya et.al. 2412.15309 null
2024-12-19 LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Yushi Bai et.al. 2412.15204 link
2024-12-19 Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture Thomas F Burns et.al. 2412.15113 link
2024-12-19 MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance Hallee E. Wong et.al. 2412.15058 null
2024-12-19 DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis Hongling Xu et.al. 2412.14849 link
2024-12-19 Relational Programming with Foundation Models Ziyang Li et.al. 2412.14515 null
2024-12-18 LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning Yansheng Mao et.al. 2412.13626 null
2024-12-17 In-context learning for medical image segmentation Eichi Takaya et.al. 2412.13299 null
2024-12-17 In-Context Learning Distillation for Efficient Few-Shot Fine-Tuning Yifei Duan et.al. 2412.13243 null
2024-12-17 Jailbreaking? One Step Is Enough! Weixiong Zheng et.al. 2412.12621 null
2024-12-17 Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL Geling Liu et.al. 2412.12522 null
2024-12-16 Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering Jinhe Bi et.al. 2412.12359 link
2024-12-18 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers Seungwook Han et.al. 2412.12276 null
2024-12-16 Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning Yuti Liu et.al. 2412.11952 null
2024-12-16 PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection Sepideh Mamooler et.al. 2412.11923 null
2024-12-16 PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension Kun Ouyang et.al. 2412.11906 null
2024-12-16 A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection Simon Hachmeier et.al. 2412.11851 link
2024-12-16 ColorFlow: Retrieval-Augmented Image Sequence Colorization Junhao Zhuang et.al. 2412.11815 null
2024-12-16 Embodied CoT Distillation From LLM To Off-the-shelf Agents Wonje Choi et.al. 2412.11499 null
2024-12-16 Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory Shuo Wang et.al. 2412.11459 null
2024-12-15 HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation Tengfei Liu et.al. 2412.11070 link
2024-12-14 Can LLMs Help Create Grammar?: Automating Grammar Creation for Endangered Languages with In-Context Learning Piyapath T Spencer et.al. 2412.10960 null
2024-12-13 ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL Yang Qin et.al. 2412.10138 link
2024-12-13 CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Zhihao Du et.al. 2412.10117 link
2024-12-13 RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector Zhensheng Wang et.al. 2412.10104 link
2024-12-12 A Systematic Review of Knowledge Tracing and Large Language Models in Education: Opportunities, Issues, and Future Research Yongwan Cho et.al. 2412.09248 null
2024-12-12 Align, Generate, Learn: A Novel Closed-Loop Framework for Cross-Lingual In-Context Learning Mateo Alejandro Rojas et.al. 2412.08955 null
2024-12-11 In-Context Learning with Topological Information for Knowledge Graph Completion Udari Madhushani Sehwag et.al. 2412.08742 null
2024-12-11 Fast Prompt Alignment for Text-to-Image Generation Khalil Mrini et.al. 2412.08639 link
2024-12-11 Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages Ashutosh Bajpai et.al. 2412.08090 link
2024-12-11 Using Large Language Models for Parametric Shape Optimization Xinxin Zhang et.al. 2412.08072 null
2024-12-11 Federated In-Context LLM Agent Learning Panlong Wu et.al. 2412.08054 null
2024-12-10 DRUM: Learning Demonstration Retriever for Large MUlti-modal Models Ellen Yi-Ge et.al. 2412.07619 null
2024-12-09 A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension Saahith Janapati et.al. 2412.06245 null
2024-12-08 Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective Andrew Jesson et.al. 2412.06033 null
2024-12-07 PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks Soumya Suvra Ghosal et.al. 2412.05710 null
2024-12-07 On the effective transfer of knowledge from English to Hindi Wikipedia Paramita Das et.al. 2412.05708 null
2024-12-06 A text-to-tabular approach to generate synthetic patient data using LLMs Margaux Tornqvist et.al. 2412.05153 link
2024-12-06 REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments Kaustubh Sridhar et.al. 2412.04759 null
2024-12-05 Improving LLM Group Fairness on Tabular Data via In-Context Learning Valeriia Cherepanova et.al. 2412.04642 null
2024-12-05 Demonstration Selection for In-Context Learning via Reinforcement Learning Xubin Wang et.al. 2412.03966 null
2024-12-09 The broader spectrum of in-context learning Andrew Kyle Lampinen et.al. 2412.03782 null
2024-12-04 Intent-driven In-context Learning for Few-shot Dialogue State Tracking Zihao Yi et.al. 2412.03270 null
2024-12-03 Minimization of Boolean Complexity in In-Context Concept Learning Leroy Z. Wang et.al. 2412.02823 null
2024-12-03 CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++? Vaishnavi Bhargava et.al. 2412.02735 null
2024-12-03 A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis Changzhi Zhou et.al. 2412.02279 null
2024-12-03 Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs Zixuan Hu et.al. 2412.02220 null
2024-12-03 VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding Kangsan Kim et.al. 2412.02186 link
2024-12-02 X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Zeyi Sun et.al. 2412.01824 link
2024-12-02 Can Large Language Models Serve as Evaluators for Code Summarization? Yang Wu et.al. 2412.01333 link
2024-12-02 RL2: Reinforce Large Language Model to Assist Safe Reinforcement Learning for Energy Management of Active Distribution Networks Xu Yang et.al. 2412.01303 null
2024-12-03 CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search Kaixin Wu et.al. 2412.01269 null
2024-12-02 Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes Xiaoqi Zhao et.al. 2412.01240 null
2024-12-03 Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation Bolin Lai et.al. 2412.01027 null
2024-12-01 Competition Dynamics Shape Algorithmic Phases of In-Context Learning Core Francisco Park et.al. 2412.01003 link
2024-11-29 In-Context Learning with Noisy Labels Junyong Kang et.al. 2411.19581 null
2024-11-29 KV Shifting Attention Enhances Language Modeling Mingyu Xu et.al. 2411.19574 link
2024-11-28 ICLERB: In-Context Learning Embedding and Reranker Benchmark Marie Al Ghossein et.al. 2411.18947 null
2024-11-27 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Jinyang Wu et.al. 2411.18478 null
2024-11-27 Curriculum Demonstration Selection for In-Context Learning Duc Anh Vu et.al. 2411.18126 null
2024-11-26 On the ERM Principle in Meta-Learning Yannay Alon et.al. 2411.17898 null
2024-11-26 MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation Harsh Singh et.al. 2411.17636 null
2024-11-26 "Stupid robot, I want to speak to a human!" User Frustration Detection in Task-Oriented Dialog Systems Mireia Hernandez Caralt et.al. 2411.17437 null
2024-11-26 Using Large Language Models for Expert Prior Elicitation in Predictive Modelling Alexander Capstick et.al. 2411.17284 link
2024-11-27 MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing Feifei Shao et.al. 2411.16773 null
2024-11-25 Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training Weimin Wu et.al. 2411.16549 null
2024-11-25 Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain Hangyul Yoon et.al. 2411.16123 link
2024-11-24 Can a Large Language Model Learn Matrix Functions In Context? Paimon Goulart et.al. 2411.15675 link
2024-11-23 Multi-label Sequential Sentence Classification via Large Language Model Mengfei Lan et.al. 2411.15623 link
2024-11-23 From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars Albert Kornilov et.al. 2411.15577 link
2024-11-23 From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set Mara Finkelstein et.al. 2411.15387 null
2024-11-22 There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks Miguel Espinosa et.al. 2411.15288 link
2024-11-22 Optimizing Social Media Annotation of HPV Vaccine Skepticism and Misinformation Using Large Language Models: An Experimental Evaluation of In-Context Learning and Fine-Tuning Stance Detection Across Multiple Models Luhang Sun et.al. 2411.14720 null
2024-11-20 Leveraging Prior Experience: An Expandable Auxiliary Knowledge Base for Text-to-SQL Zhibo Chu et.al. 2411.13244 link
2024-11-19 Instant Policy: In-Context Imitation Learning via Graph Diffusion Vitalis Vosylius et.al. 2411.12633 null
2024-11-22 SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Hongrui Jia et.al. 2411.11909 link
2024-11-18 LaVin-DiT: Large Vision Diffusion Transformer Zhaoqing Wang et.al. 2411.11505 null
2024-11-18 Re-examining learning linear functions in context Omar Naim et.al. 2411.11465 null
2024-11-18 ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification Son T. Luu et.al. 2411.11247 link
2024-11-17 AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers Jake Grigsby et.al. 2411.11188 link
2024-11-17 Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu et.al. 2411.10950 link
2024-11-16 SPICA: Retrieving Scenarios for Pluralistic In-Context Alignment Quan Ze Chen et.al. 2411.10912 null
2024-11-16 One-Layer Transformer Provably Learns One-Nearest Neighbor In Context Zihao Li et.al. 2411.10830 null
2024-11-16 IntentGPT: Few-shot Intent Discovery with Large Language Models Juan A. Rodriguez et.al. 2411.10670 null
2024-11-15 Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data Kai Helli et.al. 2411.10634 null
2024-11-15 Does Prompt Formatting Have Any Impact on LLM Performance? Jia He et.al. 2411.10541 null
2024-11-15 Zero-shot Voice Conversion with Diffusion Transformers Songting Liu et.al. 2411.09943 link
2024-11-14 Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models Kirill Vasilevski et.al. 2411.09837 null
2024-11-14 StreamAdapter: Efficient Test Time Adaptation from Contextual Streams Dilxat Muhtar et.al. 2411.09289 null
2024-11-13 XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL Yingqi Gao et.al. 2411.08599 link
2024-11-13 Towards Optimizing a Retrieval Augmented Generation using Large Language Model on Academic Data Anum Afzal et.al. 2411.08438 null
2024-11-12 Decision Feedback In-Context Symbol Detection over Block-Fading Channels Li Fan et.al. 2411.07600 null
2024-11-11 Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks Madeline Brumley et.al. 2411.07213 null
2024-11-11 Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation Kaijian Zou et.al. 2411.07130 null
2024-11-11 Universal Response and Emergence of Induction in LLMs Niclas Luick et.al. 2411.07071 null
2024-11-10 In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages Joseph Gatto et.al. 2411.06549 link
2024-11-10 One controller to rule them all Riccardo Busetto et.al. 2411.06482 null
2024-11-09 A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization Haoxin Liu et.al. 2411.06018 null
2024-11-08 Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass Tong Chen et.al. 2411.05877 null
2024-11-14 SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark Sithursan Sivasubramaniam et.al. 2411.05521 link
2024-11-08 WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning Xiangyu Zhao et.al. 2411.05420 null
2024-11-07 Adversarial Robustness of In-Context Learning in Transformers for Linear Regression Usman Anwar et.al. 2411.05189 null
2024-11-07 Vision Language Models are In-Context Value Learners Yecheng Jason Ma et.al. 2411.04549 null
2024-11-06 Enhancing Security Control Production With Generative AI Chen Ling et.al. 2411.04284 null
2024-11-06 Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences Niklas Schmidinger et.al. 2411.04165 link
2024-11-06 Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval Davide Buoso et.al. 2411.04006 null
2024-11-06 Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks Ryan Campbell et.al. 2411.03945 link
2024-11-06 EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning Kiran Purohit et.al. 2411.03877 link
2024-11-06 From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Harsha Nori et.al. 2411.03590 null
2024-11-05 Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature Viviane Torres da Silva et.al. 2411.03484 null
2024-11-05 LLMs for Domain Generation Algorithm Detection Reynier Leyva La O et.al. 2411.03307 null
2024-11-05 Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation Francisco Giral et.al. 2411.02975 null
2024-11-05 Mixtures of In-Context Learners Giwon Hong et.al. 2411.02830 null
2024-11-04 Fair In-Context Learning via Latent Concept Variables Karuna Bhaila et.al. 2411.02671 link
2024-11-04 TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos Leonardo Plini et.al. 2411.02570 link
2024-11-04 TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives Maitreya Patel et.al. 2411.02545 null
2024-11-04 Pretrained transformer efficiently learns low-dimensional target functions in-context Kazusato Oko et.al. 2411.02544 null
2024-11-04 Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages Hoang Nguyen et.al. 2411.02398 null
2024-11-04 Defining and Evaluating Physical Safety for Large Language Models Yung-Chen Tang et.al. 2411.02317 null
2024-11-04 Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning Dake Bu et.al. 2411.02199 null
2024-11-04 Shortcut Learning in In-Context Learning: A Survey Rui Song et.al. 2411.02018 null
2024-11-04 N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs Ilya Zisman et.al. 2411.01958 null
2024-11-03 Robust Neural Processes for Noisy Data Chen Shapira et.al. 2411.01670 null
2024-11-01 Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization Zeyuan Ma et.al. 2411.00625 link
2024-11-01 STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing Jiaru Zou et.al. 2411.00387 null
2024-10-31 In-Context Learned Equalization in Cell-Free Massive MIMO via State-Space Models Zihang Song et.al. 2410.23882 null
2024-10-31 Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales? Zhanke Zhou et.al. 2410.23856 link
2024-10-31 What is Wrong with Perplexity for Long-context Language Modeling? Lizhe Fang et.al. 2410.23771 link
2024-10-31 Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs Shuyang Yu et.al. 2410.23605 null
2024-11-01 Large Language Models for Patient Comments Multi-Label Classification Hajar Sakai et.al. 2410.23528 null
2024-10-30 EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning Peide Huang et.al. 2410.23234 null
2024-10-30 Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning Keqin Bao et.al. 2410.23136 link
2024-10-30 Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning Dong Shu et.al. 2410.23099 link
2024-10-30 Toward Understanding In-context vs. In-weight Learning Bryan Chan et.al. 2410.23042 null
2024-10-29 Improving In-Context Learning with Small Language Model Ensembles M. Mehdi Mojarradi et.al. 2410.21868 link
2024-10-29 On the Role of Depth and Looping for In-Context Learning with Task Diversity Khashayar Gatmiry et.al. 2410.21698 null
2024-10-28 CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity Yutong Cheng et.al. 2410.21060 null
2024-10-28 Matryoshka: Learning to Drive Black-Box LLMs with LLMs Changhao Li et.al. 2410.20749 null
2024-10-27 What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration Libo Qin et.al. 2410.20482 null
2024-10-27 Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications Xilun Zhang et.al. 2410.20357 null
2024-10-26 DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning Xinyu Tang et.al. 2410.20215 link
2024-10-26 RARe: Retrieval Augmented Retrieval with In-Context Examples Atula Tejaswi et.al. 2410.20088 link
2024-10-25 SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies Weiqin Chen et.al. 2410.19982 null
2024-10-24 Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models Yue Li et.al. 2410.19195 null
2024-10-24 Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code Jipeng Zhang et.al. 2410.18957 null
2024-10-24 GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning Rita Ramos et.al. 2410.18702 null
2024-10-23 TabDPT: Scaling Tabular Foundation Models Junwei Ma et.al. 2410.18164 link
2024-10-23 Scaling Diffusion Language Models via Adaptation from Autoregressive Models Shansan Gong et.al. 2410.17891 link
2024-10-23 Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks Paul Smolensky et.al. 2410.17498 null
2024-10-22 In Context Learning and Reasoning for Symbolic Regression with Large Language Models Samiha Sharlin et.al. 2410.17448 link
2024-10-22 Interpreting Affine Recurrence Learning in GPT-style Transformers Samarth Bhargav et.al. 2410.17438 null
2024-10-22 Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods Tsachi Blau et.al. 2410.17222 null
2024-10-22 Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models Zhijie Tan et.al. 2410.16983 null
2024-10-21 Can Transformers In-Context Learn Behavior of a Linear Dynamical System? Usman Akram et.al. 2410.16546 null
2024-10-21 A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration Yingqian Cui et.al. 2410.16540 null
2024-10-21 Bayesian scaling laws for in-context learning Aryaman Arora et.al. 2410.16531 link
2024-10-21 Analyzing Context Contributions in LLM-based Machine Translation Emmanouil Zaranis et.al. 2410.16246 null
2024-10-21 CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning Kumar Manas et.al. 2410.16207 null
2024-10-20 How Aligned are Generative Models to Humans in High-Stakes Decision-Making? Sarah Tan et.al. 2410.15471 null
2024-10-20 BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression Yuankai Li et.al. 2410.15277 link
2024-10-18 Provable In-context Learning for Mixture of Linear Regressions using Transformers Yanhao Jin et.al. 2410.14183 null
2024-10-18 LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems Nan Xu et.al. 2410.14166 null
2024-10-17 In-context learning and Occam's razor Eric Elmoznino et.al. 2410.14086 link
2024-10-17 Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection Chuhong Mai et.al. 2410.14049 null
2024-10-17 Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles Xiao Pu et.al. 2410.14042 null
2024-10-17 Personalized Adaptation via In-Context Preference Learning Allison Lau et.al. 2410.14001 null
2024-10-17 On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery Renpu Liu et.al. 2410.13981 null
2024-10-18 BenTo: Benchmark Task Reduction with In-Context Transferability Hongyu Zhao et.al. 2410.13804 link
2024-10-18 Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors Georgios Chochlakis et.al. 2410.13776 null
2024-10-17 MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Andreas Opedal et.al. 2410.13502 null
2024-10-17 Repetition Neurons: How Do Language Models Produce Repetitions? Tatsuya Hiraoka et.al. 2410.13497 null
2024-10-17 Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models Yu Yuan et.al. 2410.13343 null
2024-10-17 Retrieval-Enhanced Named Entity Recognition Enzo Shiraishi et.al. 2410.13118 null
2024-10-16 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization Ruiqi Li et.al. 2410.12957 null
2024-10-16 Context-Scaling versus Task-Scaling in In-Context Learning Amirhesam Abedsoltan et.al. 2410.12783 null
2024-10-16 In-Context Learning Enables Robot Action Prediction in LLMs Yida Yin et.al. 2410.12782 null
2024-10-16 A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning Yuanning Cui et.al. 2410.12288 link
2024-10-16 Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection Yong Xie et.al. 2410.12278 null
2024-10-16 Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree Harbani Jaggi et.al. 2410.12217 null
2024-10-15 Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning Fengyu Gao et.al. 2410.12085 null
2024-10-15 Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability Tsz Ting Chung et.al. 2410.11786 null
2024-10-15 On the Training Convergence of Transformers for In-Context Classification Wei Shen et.al. 2410.11778 null
2024-10-15 Zero-shot Model-based Reinforcement Learning using Large Language Models Abdelhakim Benechehab et.al. 2410.11711 link
2024-10-15 State-space models can learn in-context by gradient descent Neeraj Mohan Sushma et.al. 2410.11687 null
2024-10-15 BSM: Small but Powerful Biological Sequence Model for Genes and Proteins Weixi Xiang et.al. 2410.11499 null
2024-10-16 How Transformers Implement Induction Heads: Approximation and Optimization Analysis Mingze Wang et.al. 2410.11474 null
2024-10-15 Instructive Code Retriever: Learn from Large Language Model's Feedback for Code Intelligence Tasks Jiawei Lu et.al. 2410.11300 link
2024-10-15 Cognitive Overload Attack: Prompt Injection for Long Context Bibek Upadhayay et.al. 2410.11272 link
2024-10-15 Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent Bo Chen et.al. 2410.11268 null
2024-10-15 In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions Alireza Shamshiri et.al. 2410.11265 null
2024-10-15 SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Enze Xie et.al. 2410.10629 null
2024-10-14 Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification? Gabriel Roccabruna et.al. 2410.10476 link
2024-10-14 KBLaM: Knowledge Base augmented Language Model Xi Wang et.al. 2410.10450 null
2024-10-14 Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement Joseph Shtok et.al. 2410.10348 null
2024-10-14 Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies Jiajie Yu et.al. 2410.10212 null
2024-10-14 Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning Chengsong Huang et.al. 2410.10074 link
2024-10-13 Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models Chengshuai Shi et.al. 2410.09701 null
2024-10-13 Can In-context Learning Really Generalize to Out-of-distribution Tasks? Qixun Wang et.al. 2410.09695 null
2024-10-12 Power-Softmax: Towards Secure LLM Inference over Encrypted Data Itamar Zimerman et.al. 2410.09457 null
2024-10-12 Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study Pengfei He et.al. 2410.09411 null
2024-10-11 On-Chip Learning via Transformer In-Context Learning Jan Finkbeiner et.al. 2410.08711 null
2024-10-11 StraGo: Harnessing Strategic Guidance for Prompt Optimization Yurong Wu et.al. 2410.08601 null
2024-10-10 SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation Guanhua Zhang et.al. 2410.08356 null
2024-10-10 Metalic: Meta-Learning In-Context with Protein Language Models Jacob Beck et.al. 2410.08355 null
2024-10-10 Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? Khashayar Gatmiry et.al. 2410.08292 null
2024-10-10 Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning David D. Baek et.al. 2410.08255 null
2024-10-10 Uncovering Overfitting in Large Language Model Editing Mengqi Zhang et.al. 2410.07819 null
2024-10-10 Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data Can Wang et.al. 2410.07737 link
2024-10-10 DemoShapley: Valuation of Demonstrations for In-Context Learning Shan Xie et.al. 2410.07523 null
2024-10-09 Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning Abhinav Bandari et.al. 2410.07461 link
2024-10-09 Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning Zhengyu Hu et.al. 2410.07074 null
2024-10-09 Retrieval-Augmented Decision Transformer: External Memory for In-context RL Thomas Schmied et.al. 2410.07071 link
2024-10-09 Generative Model for Less-Resourced Language with 1 billion parameters Domen Vreš et.al. 2410.06898 null
2024-10-10 Mind Your Questions! Towards Backdoor Attacks on Text-to-Visualization Models Shuaimin Li et.al. 2410.06782 null
2024-10-09 Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance? Fumiya Uchiyama et.al. 2410.06735 link
2024-10-09 Tree of Problems: Improving structured problem solving with compositionality Armel Zebaze et.al. 2410.06634 link
2024-10-09 MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data Mingu Kang et.al. 2410.06442 null
2024-10-08 Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content? Shenbin Qian et.al. 2410.06338 link
2024-10-08 The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning Xiyan Fu et.al. 2410.06272 link
2024-10-08 ConML: A Universal Meta-Learning Framework with Task-Level Contrastive Learning Shiguang Wu et.al. 2410.05975 null
2024-10-07 Differential Transformer Tianzhu Ye et.al. 2410.05258 link
2024-10-07 Density estimation with LLMs: a geometric investigation of in-context learning trajectories Toni J. B. Liu et.al. 2410.05218 null
2024-10-08 A Simple Image Segmentation Framework via In-Context Examples Yang Liu et.al. 2410.04842 link
2024-10-07 Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning Qingyu Yin et.al. 2410.04691 link
2024-10-06 GAMformer: In-Context Learning for Generalized Additive Models Andreas Mueller et.al. 2410.04560 null
2024-10-06 Revisiting In-context Learning Inference Circuit in Large Language Models Hakaze Cho et.al. 2410.04468 null
2024-10-06 Inference Scaling for Long-Context Retrieval Augmented Generation Zhenrui Yue et.al. 2410.04343 null
2024-10-05 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning Gang Liu et.al. 2410.04223 link
2024-10-04 PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models Lemei Zhang et.al. 2410.03905 link
2024-10-08 Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs Louis Serrano et.al. 2410.03437 null
2024-10-04 Enhanced Transformer architecture for in-context learning of dynamical systems Matteo Rufolo et.al. 2410.03291 null
2024-10-04 Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models Yuxiang Zhang et.al. 2410.03212 null
2024-10-04 Generating bilingual example sentences with large language models as lexicography assistants Raphael Merx et.al. 2410.03182 link
2024-10-04 In-context Learning in Presence of Spurious Correlations Hrayr Harutyunyan et.al. 2410.03140 link
2024-10-04 On Unsupervised Prompt Learning for Classification with Black-box Language Models Zhen-Yu Zhang et.al. 2410.03124 null
2024-10-04 RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning Zihao Zhao et.al. 2410.03122 link
2024-10-03 Demonstration Attack against In-Context Learning for Code Intelligence Yifei Ge et.al. 2410.02841 null
2024-10-03 ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI Ahmad Elawady et.al. 2410.02751 link
2024-10-04 IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models Tuo An et.al. 2410.02429 null
2024-10-04 Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation Muzhi Zhu et.al. 2410.02369 link
2024-10-03 Simplicity bias and optimization threshold in two-layer ReLU networks Etienne Boursier et.al. 2410.02348 null
2024-10-03 Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference Wei Cheng et.al. 2410.02210 null
2024-10-03 GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning Jiale Fu et.al. 2410.02203 null
2024-10-03 Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis Hongkang Li et.al. 2410.02167 null
2024-10-02 Intent Detection in the Age of LLMs Gaurav Arora et.al. 2410.01627 null
2024-10-02 ENTP: Encoder-only Next Token Prediction Ethan Ewer et.al. 2410.01600 null
2024-10-02 Bayes' Power for Explaining In-Context Learning Generalizations Samuel Müller et.al. 2410.01565 link
2024-10-02 In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks Dingzirui Wang et.al. 2410.01548 link
2024-10-02 Disentangling Latent Shifts of In-Context Learning Through Self-Training Josip Jukić et.al. 2410.01508 null
2024-10-02 SecCoder: Towards Generalizable and Robust Secure Code Generation Boyu Zhang et.al. 2410.01488 null
2024-10-02 Agent-Driven Large Language Models for Mandarin Lyric Generation Hong-Hsiang Liu et.al. 2410.01450 null
2024-10-02 Unveiling Language Skills under Circuits Hang Chen et.al. 2410.01334 link
2024-10-03 Mitigating Copy Bias in In-Context Learning through Neuron Pruning Ameen Ali et.al. 2410.01288 null
2024-10-02 Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models Can Demircan et.al. 2410.01280 null
2024-09-30 Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments Mohamed Elnoor et.al. 2409.20445 null
2024-09-30 PersonalLLM: Tailoring LLMs to Individual Preferences Thomas P. Zollo et.al. 2409.20296 link
2024-09-30 TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks Areeg Fahad Rasheed et.al. 2409.20189 link
2024-09-30 Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models Luohe Shi et.al. 2409.20181 null
2024-09-30 Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis Luka Andrenšek et.al. 2409.20054 null
2024-09-29 Efficient Long-Form Speech Recognition for General Speech In-Context Learning Hao Yen et.al. 2409.19757 null
2024-10-02 T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition Chen Yeh et.al. 2409.19734 link
2024-09-26 AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models Xin Hong et.al. 2409.18339 null
2024-09-26 Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion Hengrui Gu et.al. 2409.17928 link
2024-09-25 Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? Bowen Zhao et.al. 2409.17080 link
2024-09-26 Enhancing Post-Hoc Attributions in Long Document Comprehension via Coarse Grained Answer Decomposition Pritika Ramu et.al. 2409.17073 null
2024-09-25 A Few Hypocrites: Few-Shot Learning and Subtype Definitions for Detecting Hypocrisy Accusations in Online Climate Change Debates Paulina Garcia Corral et.al. 2409.16807 null
2024-09-24 Do the Right Thing, Just Debias! Multi-Category Bias Mitigation Using LLMs Amartya Roy et.al. 2409.16371 null
2024-09-26 In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations Moucheng Xu et.al. 2409.15867 link
2024-09-24 Small Language Models: Survey, Measurements, and Insights Zhenyan Lu et.al. 2409.15790 link
2024-09-24 Making Text Embedders Few-Shot Learners Chaofan Li et.al. 2409.15700 link
2024-09-23 Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction Yuanchao Li et.al. 2409.15551 link
2024-09-23 In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models Pengrui Han et.al. 2409.15454 link
2024-09-24 PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models Zhiyuan Wang et.al. 2409.15188 link
2024-09-23 A Controlled Study on Long Context Extension and Generalization in LLMs Yi Lu et.al. 2409.12181 link
2024-09-18 M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper Jiaming Zhou et.al. 2409.11889 null
2024-09-18 Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation Kejia Chen et.al. 2409.11863 null
2024-09-18 RoboMorph: In-Context Meta-Learning for Robot Dynamics Modeling Manuel Bianchi Bazzi et.al. 2409.11815 null
2024-09-18 RUIE: Retrieval-based Unified Information Extraction using Large Language Model Xincheng Liao et.al. 2409.11673 null
2024-09-17 HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection Theo King et.al. 2409.11579 link
2024-09-17 THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models Mengfei Liang et.al. 2409.11353 link
2024-09-17 Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Maojia Song et.al. 2409.11242 link
2024-09-17 Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning Yukang Lin et.al. 2409.11147 link
2024-09-17 Semformer: Transformer Language Models with Semantic Planning Yongjing Yin et.al. 2409.11143 null
2024-09-18 Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming Chalamalasetti Kranti et.al. 2409.11041 null
2024-09-16 LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning Jicong Ao et.al. 2409.10444 link
2024-09-16 Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages Ming-Hao Hsu et.al. 2409.10429 null
2024-09-16 From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs Navya Jain et.al. 2409.10245 null
2024-09-16 Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization Xiaoxue Gao et.al. 2409.10157 null
2024-09-16 SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL Ke Shen et.al. 2409.10007 link
2024-09-15 AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs Madhusudan Ghosh et.al. 2409.09704 link
2024-09-14 Language Models "Grok" to Copy Ang Lv et.al. 2409.09281 null
2024-09-13 Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach Siqi Li et.al. 2409.09009 link
2024-09-13 LLM-based Weak Supervision Framework for Query Intent Classification in Video Search Farnoosh Javadi et.al. 2409.08931 null
2024-09-13 LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation Shaojun Li et.al. 2409.08597 null
2024-09-12 Fine-tuning Large Language Models for Entity Matching Aaron Steiner et.al. 2409.08185 link
2024-09-11 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Praveen K Kanithi et.al. 2409.07314 null
2024-09-10 Quantifying and Enabling the Interpretability of CLIP-like Models Avinash Madasu et.al. 2409.06579 null
2024-09-10 Inference is All You Need: Self Example Retriever for Cross-domain Dialogue State Tracking with ChatGPT Jihyun Lee et.al. 2409.06243 null
2024-09-10 Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks Georgios Chochlakis et.al. 2409.06173 link
2024-09-09 Seek and Solve Reasoning for Table Question Answering Ruya Jiang et.al. 2409.05286 null
2024-09-10 Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion Zhengyang Chen et.al. 2409.05004 null
2024-09-07 MILE: A Mutation Testing Framework of In-Context Learning Systems Zeming Wei et.al. 2409.04831 link
2024-09-06 Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs Aliakbar Nafar et.al. 2409.04318 link
2024-09-06 Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers Gorka Abad et.al. 2409.04142 null
2024-09-05 CACER: Clinical Concept Annotations for Cancer Events and Relations Yujuan Fu et.al. 2409.03905 link
2024-09-07 The representation landscape of few-shot learning and fine-tuning in large language models Diego Doimo et.al. 2409.03662 link
2024-09-05 FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications Hao-Han Guo et.al. 2409.03283 null
2024-09-03 How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? Saeid Asgari Taghanaki et.al. 2409.02253 link
2024-09-03 Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs Zhuo Li et.al. 2409.01552 null
2024-09-03 Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition Yaozong Gan et.al. 2409.01534 null
2024-09-02 The Compressor-Retriever Architecture for Language Model OS Yuan Yang et.al. 2409.01495 link
2024-09-02 PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science Menglin Liu et.al. 2409.01466 null
2024-09-02 Membership Inference Attacks Against In-Context Learning Rui Wen et.al. 2409.01380 null
2024-08-30 AWRaCLe: All-Weather Image Restoration using Visual In-Context Learning Sudarshan Rajagopalan et.al. 2409.00263 null
2024-08-28 Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning Momin Abbas et.al. 2409.00124 null
2024-08-29 DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving Yongjie Fu et.al. 2408.16647 null
2024-08-29 Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning Rochelle Choenni et.al. 2408.16482 null
2024-08-28 Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games Nicholas R. Waytowich et.al. 2408.15950 null
2024-09-04 Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models Hédi Zeghidi et.al. 2408.15796 link
2024-08-28 Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings Lingyu Gao et.al. 2408.15650 null
2024-08-26 MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues Kuluhan Binici et.al. 2408.14418 null
2024-08-26 Probing Causality Manipulation of Large Language Models Chenyang Zhang et.al. 2408.14380 link
2024-09-03 Foundation Models for Music: A Survey Yinghao Ma et.al. 2408.14340 link
2024-08-26 Epidemic Information Extraction for Event-Based Surveillance using Large Language Models Sergio Consoli et.al. 2408.14277 null
2024-08-26 Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach Vittoriano Muttillo et.al. 2408.14259 null
2024-08-26 Focused Large Language Models are Stable Many-Shot Learners Peiwen Yuan et.al. 2408.13987 null
2024-08-24 Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models Sakhinana Sagar Srinivas et.al. 2408.13621 null
2024-08-23 In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting Haowei Du et.al. 2408.13028 null
2024-08-23 Multimodal Contrastive In-Context Learning Yosuke Miyanishi et.al. 2408.12959 null
2024-08-23 Causal-Guided Active Learning for Debiasing Large Language Models Zhouhao Sun et.al. 2408.12942 link
2024-08-23 Investigating LLM Applications in E-Commerce Chester Palen-Michel et.al. 2408.12779 null
2024-08-22 Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models Meiyun Wang et.al. 2408.12326 link
2024-08-22 Transformers are Minimax Optimal Nonparametric In-Context Learners Juno Kim et.al. 2408.12186 null
2024-08-26 uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization Aishik Nagar et.al. 2408.12095 null
2024-08-22 Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs Ronit Singhal et.al. 2408.12060 link
2024-08-21 Memorization In In-Context Learning Shahriar Golchin et.al. 2408.11546 null
2024-08-20 Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks Nathaniel Pinckney et.al. 2408.11053 link
2024-08-20 Benchmarking Large Language Models for Math Reasoning Tasks Kathrin Seßler et.al. 2408.10839 link
2024-08-19 Self-Refined Generative Foundation Models for Wireless Traffic Prediction Chengming Hu et.al. 2408.10390 null
2024-08-19 In-Context Learning with Representations: Contextual Generalization of Trained Transformers Tong Yang et.al. 2408.10147 null
2024-08-19 Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning Jingyu Hu et.al. 2408.09757 null
2024-08-19 Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts Jiaqing Liu et.al. 2408.09688 null
2024-08-18 Out-of-distribution generalization via composition: a lens through induction heads in Transformers Jiajun Song et.al. 2408.09503 link
2024-08-16 Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning Jinwei Hu et.al. 2408.08959 null
2024-08-16 xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Le Xue et.al. 2408.08872 null
2024-08-20 Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions Chenming Tang et.al. 2408.08780 null
2024-08-16 LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression Yuqi Ye et.al. 2408.08682 null
2024-08-15 ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models Faris Hijazi et.al. 2408.07983 link
2024-08-16 MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL Wenxuan Xie et.al. 2408.07930 link
2024-08-14 Cropper: Vision-Language Model for Image Cropping through In-Context Learning Seung Hyun Lee et.al. 2408.07790 null
2024-08-14 Large Language Models Know What Makes Exemplary Contexts Quanyu Long et.al. 2408.07505 null
2024-08-13 SceneGPT: A Language Model for 3D Scene Understanding Shivam Chandhok et.al. 2408.06926 null
2024-08-13 HLSPilot: LLM-based High-Level Synthesis Chenwei Xiong et.al. 2408.06810 link
2024-08-12 Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning Chuanneng Sun et.al. 2408.06520 null
2024-08-12 Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models Yen-Che Hsiao et.al. 2408.06458 link
2024-08-11 LLM-Based Robust Product Classification in Commerce and Compliance Sina Gholamian et.al. 2408.05874 null
2024-08-10 In-Context Exploiter for Extensive-Form Games Shuxin Li et.al. 2408.05575 null
2024-08-10 Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction Jung Hoon Lim et.al. 2408.05555 null
2024-08-10 LaiDA: Linguistics-aware In-context Learning with Data Augmentation for Metaphor Components Identification Hongde Liu et.al. 2408.05404 link
2024-08-09 SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation Chenming Tang et.al. 2408.04872 link
2024-08-06 LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations Lei Shi et.al. 2408.04665 null
2024-08-08 Learning Fine-Grained Grounded Citations for Attributed Large Language Models Lei Huang et.al. 2408.04568 link
2024-08-08 How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression Xingwu Chen et.al. 2408.04532 null
2024-08-08 Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning Seong-Il Park et.al. 2408.04414 null
2024-08-07 Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Zaijing Li et.al. 2408.03615 link
2024-08-06 Can LLMs Serve As Time Series Anomaly Detectors? Manqing Dong et.al. 2408.03475 null
2024-08-06 Pre-training and in-context learning IS Bayesian inference a la De Finetti Naimeng Ye et.al. 2408.03307 null
2024-08-06 Enhancing Complex Causality Extraction via Improved Subtask Interaction and Knowledge Fusion Jinglong Gao et.al. 2408.03079 null
2024-08-06 Hide and Seek: Fingerprinting Large Language Models with Evolutionary Learning Dmitri Iourovitski et.al. 2408.02871 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-05 OneLove beyond the field -- A few-shot pipeline for topic and sentiment analysis during the FIFA World Cup in Qatar Christoph Rauchegger et.al. 2408.02520 null
2024-08-05 A Few-Shot Approach for Relation Extraction Domain Adaptation using Large Language Models Vanni Zavarella et.al. 2408.02377 null
2024-08-05 Spin glass model of in-context learning Yuhao Li et.al. 2408.02288 null
2024-08-04 Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process Peng Wang et.al. 2408.02103 null
2024-08-04 Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages Tomáš Filip et.al. 2408.02044 null
2024-08-03 Can LLMs predict the convergence of Stochastic Gradient Descent? Oussama Zekri et.al. 2408.01736 null
2024-08-02 OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models Zeyang Ma et.al. 2408.01585 link
2024-08-02 NOLO: Navigate Only Look Once Bohan Zhou et.al. 2408.01384 null
2024-08-02 Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs Phillip Schneider et.al. 2408.01088 link
2024-08-02 ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models Hojae Han et.al. 2408.00994 link
2024-08-01 Intermittent Semi-working Mask: A New Masking Paradigm for LLMs Mingcong Lu et.al. 2408.00539 null
2024-08-01 Jailbreaking Text-to-Image Models with LLM-Based Agents Yingkai Dong et.al. 2408.00523 null
2024-08-01 In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation Armel Zebaze et.al. 2408.00397 link
2024-08-01 Adversarial Text Rewriting for Text-aware Recommender Systems Sejoon Oh et.al. 2408.00312 link
2024-08-01 QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression Wenshan Wang et.al. 2408.00274 link
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-07-31 Distributed In-Context Learning under Non-IID Among Clients Siqi Liang et.al. 2408.00144 null
2024-07-31 Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM Can Wang et.al. 2407.21333 null
2024-07-27 LawLLM: Law Large Language Model for the US Legal System Dong Shu et.al. 2407.21065 null
2024-07-30 SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition Hao Tan et.al. 2407.20920 null
2024-07-30 SceneTeller: Language-to-3D Scene Generation Başak Melis Öcal et.al. 2407.20727 null
2024-07-30 CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge Tianshi Zheng et.al. 2407.20564 null
2024-07-29 AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs Muhammad Arbab Arshad et.al. 2407.19617 null
2024-07-27 Polynomial Regression as a Task for Understanding In-context Learning Through Finetuning and Alignment Max Wilcoxson et.al. 2407.19346 link
2024-07-27 Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications Till Speicher et.al. 2407.19262 null
2024-07-26 Many-Shot In-Context Learning for Molecular Inverse Design Saeed Moayedpour et.al. 2407.19089 null
2024-07-24 Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning Hongwei Jin et.al. 2407.17545 link
2024-07-24 Grammar-based Game Description Generation using Large Language Models Tsunehiko Tanaka et.al. 2407.17404 null
2024-07-24 Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism Anhao Zhao et.al. 2407.17011 link
2024-07-24 SelfPiCo: Self-Guided Partial Code Execution with LLMs Zhipeng Xue et.al. 2407.16974 null
2024-07-23 Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack Xiaoyue Xu et.al. 2407.16695 link
2024-07-23 Can Large Language Models Automatically Jailbreak GPT-4V? Yuanwei Wu et.al. 2407.16686 null
2024-07-23 Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data Julian Schelb et.al. 2407.16516 null
2024-07-23 Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction Rithik Sachdev et.al. 2407.16370 link
2024-07-23 PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing Blazej Manczak et.al. 2407.16318 link
2024-07-22 Multilingual Fine-Grained News Headline Hallucination Detection Jiaming Shen et.al. 2407.15975 null
2024-07-22 Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability Zhuoyan Xu et.al. 2407.15720 link
2024-07-22 In-Context Learning Improves Compositional Understanding of Vision-Language Models Matteo Nulli et.al. 2407.15487 link
2024-07-22 ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning Senbin Zhu et.al. 2407.15341 null
2024-07-21 MIBench: Evaluating Multimodal Large Language Models over Multiple Images Haowei Liu et.al. 2407.15272 null
2024-07-19 Prompted Aspect Key Point Analysis for Quantitative Review Summarization An Quang Tang et.al. 2407.14049 link
2024-07-19 ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? Siddhant Waghjale et.al. 2407.14044 link
2024-07-18 FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking Zhuoer Wang et.al. 2407.13945 null
2024-07-18 Large Language Models as Reliable Knowledge Bases? Danna Zheng et.al. 2407.13578 null
2024-07-18 Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks Samy Ateia et.al. 2407.13511 link
2024-07-18 Learning-From-Mistakes Prompting for Indigenous Language Translation You-Cheng Liao et.al. 2407.13343 null
2024-07-17 R+X: Retrieval and Execution from Everyday Human Videos Georgios Papagiannis et.al. 2407.12957 null
2024-07-16 Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection Ye Jiang et.al. 2407.12879 null
2024-07-17 Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning Mustafa Dogan et.al. 2407.12498 null
2024-07-16 Private prediction for large-scale synthetic text generation Kareem Amin et.al. 2407.12108 null
2024-07-16 AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization Anum Afzal et.al. 2407.11591 link
2024-07-16 Reasoning with Large Language Models, a Survey Aske Plaat et.al. 2407.11511 null
2024-07-16 Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach Sojung Lucia Kim et.al. 2407.11368 null
2024-07-16 Large Vision-Language Models as Emotion Recognizers in Context Awareness Yuxuan Lei et.al. 2407.11300 null
2024-07-15 Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection Chenwei Wu et.al. 2407.11188 null
2024-07-15 GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM Keshav Bimbraw et.al. 2407.10870 null
2024-07-16 Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Yulong Wang et.al. 2407.10718 link
2024-07-15 Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems Yunxiao Shi et.al. 2407.10670 link
2024-07-14 Visual Prompt Selection for In-Context Learning Segmentation Wei Suo et.al. 2407.10233 link
2024-07-13 Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond Yingcong Li et.al. 2407.10005 null
2024-07-12 HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context Federico Arangath Joseph et.al. 2407.09375 null
2024-07-12 SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Yuzhang Tian et.al. 2407.09025 null
2024-07-12 Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models Ye Liu et.al. 2407.08967 link
2024-07-12 Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection Ye Liu et.al. 2407.08952 null
2024-07-11 DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding Jincen Jiang et.al. 2407.08801 null
2024-07-12 RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL Zhenhe Wu et.al. 2407.08273 null
2024-07-10 Video In-context Learning Wentao Zhang et.al. 2407.07356 null
2024-07-09 Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning J. Crosbie et.al. 2407.07011 null
2024-07-09 ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization Wai Man Si et.al. 2407.06955 null
2024-07-08 Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs Sanjeet Singh et.al. 2407.05887 link
2024-07-08 Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition Yaozong Gan et.al. 2407.05814 null
2024-07-08 Empirical Study of Symmetrical Reasoning in Conversational Chatbots Daniela N. Rim et.al. 2407.05734 null
2024-07-08 FairPFN: Transformers Can do Counterfactual Fairness Jake Robertson et.al. 2407.05732 null
2024-07-08 Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation Jian Qian et.al. 2407.05693 link
2024-07-08 Retrieved In-Context Principles from Previous Mistakes Hao Sun et.al. 2407.05682 null
2024-07-08 GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks Xuan Wang et.al. 2407.05566 null
2024-07-07 Just read twice: closing the recall gap for recurrent language models Simran Arora et.al. 2407.05483 link
2024-07-04 FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Tongyi SpeechTeam et.al. 2407.04051 link
2024-07-03 Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning Zhili Shen et.al. 2407.03227 null
2024-07-03 Exploring the Capabilities of LLMs for Code Change Related Tasks Lishui Fan et.al. 2407.02824 link
2024-07-02 Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms Viet Cuong Nguyen et.al. 2407.02662 null
2024-07-02 RVISA: Reasoning and Verification for Implicit Sentiment Analysis Wenna Lai et.al. 2407.02340 null
2024-07-02 Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts Chunlan Ma et.al. 2407.02320 null
2024-07-02 Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks Adrian Rebmann et.al. 2407.02310 link
2024-07-02 Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions Xiang Li et.al. 2407.02028 link
2024-07-02 SADL: An Effective In-Context Learning Method for Compositional Visual QA Long Hoang Dang et.al. 2407.01983 null
2024-07-03 MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation Yongan Zhang et.al. 2407.01910 link
2024-07-01 Dynamic Few-Shot Learning for Knowledge Graph Question Answering Jacopo D'Abramo et.al. 2407.01409 null
2024-07-01 TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval Wenbo Xu et.al. 2407.01183 null
2024-07-01 Can Small Language Models Learn, Unlearn, and Retain Noise Patterns? Nicy Scaria et.al. 2407.00996 link
2024-07-01 Universal Approximation Theory: The basic theory for large language models Wei Wang et.al. 2407.00958 null
2024-06-28 Mining Reasons For And Against Vaccination From Unstructured Data Using Nichesourcing and AI Data Augmentation Damián Ariel Furman et.al. 2406.19951 null
2024-06-27 Aligning Teacher with Student Preferences for Tailored Training Data Generation Yantao Liu et.al. 2406.19227 null
2024-06-27 STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis Wenbin Li et.al. 2406.19065 link
2024-06-27 Efficient course recommendations with T5-based ranking and summarization Thijmen Bijl et.al. 2406.19018 link
2024-06-27 Can we teach language models to gloss endangered languages? Michael Ginn et.al. 2406.18895 null
2024-06-27 SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models Vipul Rathore et.al. 2406.18880 link
2024-06-26 ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models Yuxuan Yin et.al. 2406.18770 null
2024-06-26 PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation Christoph Leiter et.al. 2406.18528 link
2024-06-26 Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming Zhenghao Zhou et.al. 2406.18501 null
2024-06-26 BADGE: BADminton report Generation and Evaluation with LLM Shang-Hsuan Chiang et.al. 2406.18116 link
2024-06-26 Octo-planner: On-device Language Model for Planner-Action Agents Wei Chen et.al. 2406.18082 null
2024-06-26 Automated Clinical Data Extraction with Knowledge Conditioned LLMs Diya Li et.al. 2406.18027 null
2024-06-25 LABOR-LLM: Language-Based Occupational Representations with Large Language Models Tianyu Du et.al. 2406.17972 null
2024-06-25 BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning Ercong Nie et.al. 2406.17764 null
2024-06-25 Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels Nicholas Pangakis et.al. 2406.17633 null
2024-06-25 Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft Chalamalasetti Kranti et.al. 2406.17553 null
2024-06-25 Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification Huiyao Chen et.al. 2406.17534 link
2024-06-25 Enhancing Tool Retrieval with Iterative Feedback from Large Language Models Qiancheng Xu et.al. 2406.17465 link
2024-06-25 A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs Vaibhav Singh et.al. 2406.17377 null
2024-06-25 Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement Yunlong Feng et.al. 2406.17233 link
2024-06-24 Finding Transformer Circuits with Edge Pruning Adithya Bhaskar et.al. 2406.16778 link
2024-06-24 Token-based Decision Criteria Are Suboptimal in In-context Learning Hakaze Cho et.al. 2406.16535 null
2024-06-24 DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task Wenhan Liu et.al. 2406.16332 link
2024-06-23 Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning Bowen Zheng et.al. 2406.16007 null
2024-06-22 Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts Louis Give et.al. 2406.15871 null
2024-06-21 Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem Sara Court et.al. 2406.15625 null
2024-06-21 Automated radiotherapy treatment planning guided by GPT-4Vision Sheng Liu et.al. 2406.15609 null
2024-06-21 Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning Brandon Huang et.al. 2406.15334 link
2024-06-21 ICLEval: Evaluating In-Context Learning Ability of Large Language Models Wentong Chen et.al. 2406.14955 link
2024-06-20 Learning to Retrieve Iteratively for In-Context Learning Yunmo Chen et.al. 2406.14739 null
2024-06-20 ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights Gabriel Sarch et.al. 2406.14596 null
2024-06-20 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein et.al. 2406.14546 link
2024-06-20 Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary Xingmeng Zhao et.al. 2406.14500 null
2024-06-20 Data-Centric AI in the Age of Large Language Models Xinyi Xu et.al. 2406.14473 null
2024-06-20 SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots Weixing Wang et.al. 2406.14208 null
2024-06-20 Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning Xiaolei Wang et.al. 2406.14022 link
2024-06-23 Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations Arie Cattan et.al. 2406.13632 null
2024-06-19 InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising Zhepei Wei et.al. 2406.13629 link
2024-06-19 In-Context In-Context Learning with Transformer Neural Processes Matthew Ashman et.al. 2406.13493 null
2024-06-19 ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models Hwiyeol Jo et.al. 2406.13342 null
2024-06-19 In-Context Learning on a Budget: A Case Study in Named Entity Recognition Uri Berger et.al. 2406.13274 null
2024-06-18 Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? Zhe Yang et.al. 2406.12809 link
2024-06-18 In-Context Learning of Energy Functions Rylan Schaeffer et.al. 2406.12785 null
2024-06-18 Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs Ahmad Mohsin et.al. 2406.12513 null
2024-06-18 Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems Nasim Borazjanizadeh et.al. 2406.12172 null
2024-06-17 Soft Prompting for Unlearning in Large Language Models Karuna Bhaila et.al. 2406.12038 link
2024-06-17 Multi-Layer Ranking with Large Language Models for News Source Recommendation Wenjia Zhang et.al. 2406.11745 null
2024-06-17 Meta Reasoning for Large Language Models Peizhong Gao et.al. 2406.11698 null
2024-06-17 Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better! Mingyang Song et.al. 2406.11629 link
2024-06-17 How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment Heyan Huang et.al. 2406.11474 null
2024-06-17 A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences Leonardo Bertolazzi et.al. 2406.11341 link
2024-06-17 Fine-grained Controllable Text Generation through In-context Learning with Feedback Sarubi Thillainathan et.al. 2406.11338 null
2024-06-17 Hallucination Mitigation Prompts Long-term Video Understanding Yiwei Sun et.al. 2406.11333 null
2024-06-17 FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation Bangzheng Li et.al. 2406.11243 null
2024-06-17 Probing the Decision Boundaries of In-context Learning in Large Language Models Siyan Zhao et.al. 2406.11233 link
2024-06-17 In-Context Editing: Learning Knowledge from Self-Induced Distributions Siyuan Qi et.al. 2406.11194 link
2024-06-14 UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner Dongchao Yang et.al. 2406.10056 link
2024-06-14 GeoSEE: Regional Socio-Economic Estimation With a Large Language Model Sungwon Han et.al. 2406.09799 null
2024-06-13 Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI Mohammed-Khalil Ghali et.al. 2406.09621 null
2024-06-13 Automated Molecular Concept Generation and Labeling with Large Language Models Shichang Zhang et.al. 2406.09612 link
2024-06-13 Chain-of-Though (CoT) prompting strategies for medical error detection and correction Zhaolong Wu et.al. 2406.09103 null
2024-06-13 XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Alexander Nikulin et.al. 2406.08973 null
2024-06-13 mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Matthieu Futeral et.al. 2406.08707 null
2024-06-12 State Soup: In-Context Skill Learning, Retrieval and Mixing Maciej Pióro et.al. 2406.08423 null
2024-06-13 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Qingyun Li et.al. 2406.08418 link
2024-06-12 Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation Javad Pourmostafa Roshan Sharami et.al. 2406.07970 link
2024-06-12 DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning Yuxi Feng et.al. 2406.07913 null
2024-06-12 An Empirical Study of Mamba-based Language Models Roger Waleffe et.al. 2406.07887 link
2024-06-12 Are Large Language Models Good Statisticians? Yizhang Zhu et.al. 2406.07815 link
2024-06-11 Estimating the Hallucination Rate of Generative AI Andrew Jesson et.al. 2406.07457 null
2024-06-11 On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations Shiao Meng et.al. 2406.07444 link
2024-06-11 Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning Menglong Cui et.al. 2406.07081 null
2024-06-11 DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs Haishuo Fang et.al. 2406.07080 link
2024-06-11 CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only Junhee Cho et.al. 2406.06947 link
2024-06-11 Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems Mohammed Elhenawy et.al. 2406.06865 null
2024-06-10 Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing Enshuo Hsu et.al. 2406.06723 null
2024-06-10 In-Context Learning and Fine-Tuning GPT for Argument Mining Jérémie Cabessa et.al. 2406.06699 link
2024-06-10 Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue Simone Alghisi et.al. 2406.06399 link
2024-06-09 LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning Utsav Singh et.al. 2406.05881 null
2024-06-09 TR2MTL: LLM based framework for Metric Temporal Logic Formalization of Traffic Rules Kumar Manas et.al. 2406.05709 null
2024-06-08 ThatiAR: Subjectivity Detection in Arabic News Sentences Reem Suwaileh et.al. 2406.05559 null
2024-06-08 RAG-Enhanced Commit Message Generation Linghao Zhang et.al. 2406.05514 null
2024-06-07 TabPFGen -- Tabular Data Generation with TabPFN Junwei Ma et.al. 2406.05216 null
2024-06-07 Retrieval & Fine-Tuning for In-Context Tabular Models Valentin Thomas et.al. 2406.05207 null
2024-06-07 Scenarios and Approaches for Situated Natural Language Explanations Pengshuo Qiu et.al. 2406.05035 null
2024-06-07 BERTs are Generative In-Context Learners David Samuel et.al. 2406.04823 link
2024-06-07 Large Language Model-guided Document Selection Xiang Kong et.al. 2406.04638 null
2024-06-06 llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models Fabián Villena et.al. 2406.04528
2024-06-06 Aligning Large Language Models with Self-generated Preference Data Dongyoung Kim et.al. 2406.04412 null
2024-06-06 VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation Prashanth Vijayaraghavan et.al. 2406.04379 null
2024-06-08 What Do Language Models Learn in Context? The Structured Task Hypothesis Jiaoda Li et.al. 2406.04216 link
2024-06-06 Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following Anshul Gupta et.al. 2406.03907 null
2024-06-06 Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective Xinhao Yao et.al. 2406.03768 link
2024-06-06 FastGAS: Fast Graph-based Annotation Selection for In-Context Learning Zihan Chen et.al. 2406.03730 null
2024-06-05 Log Parsing with Self-Generated In-Context Learning and Self-Correction Yifan Wu et.al. 2406.03376 null
2024-06-06 StatBot.Swiss: Bilingual Open Data Exploration in Natural Language Farhad Nooralahzadeh et.al. 2406.03170 null
2024-06-05 Improving In-Context Learning with Prediction Feedback for Sentiment Analysis Hongling Xu et.al. 2406.02911 link
2024-06-06 Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers Brian K Chen et.al. 2406.02847 null
2024-06-04 E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory Zhou Yang et.al. 2406.02642 null
2024-06-04 Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks Tianyu He et.al. 2406.02550 link
2024-06-04 Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Philip Anastassiou et.al. 2406.02430 link
2024-06-04 Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion Ruiqi Li et.al. 2406.02429 null
2024-06-04 Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis Kun Zhou et.al. 2406.02009 null
2024-06-04 Eliciting the Priors of Large Language Models using Iterated In-Context Learning Jian-Qiao Zhu et.al. 2406.01860 null
2024-06-03 In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs Grzegorz Kaszuba et.al. 2406.01808 null
2024-06-03 Universal In-Context Approximation By Prompting Fully Recurrent Models Aleksandar Petrov et.al. 2406.01424 link
2024-06-03 Demonstration Augmentation for Zero-shot In-context Learning Yi Su et.al. 2406.01224 link
2024-06-03 Guiding ChatGPT to Generate Salient Domain Summaries Jun Gao et.al. 2406.01070 null
2024-06-03 Selectively Answering Visual Questions Julian Martin Eisenschlos et.al. 2406.00980 null
2024-05-31 In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought Sili Huang et.al. 2405.20692 link
2024-05-31 UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation Hanzhang Zhou et.al. 2405.20612 link
2024-05-31 The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes Alissa A. Valentine et.al. 2405.20582 null
2024-05-30 Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads Avelina Asada Hadji-Kyriacou et.al. 2405.20053 link
2024-05-30 From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems Jianliang He et.al. 2405.19883 null
2024-05-30 Is In-Context Learning Sufficient for Instruction Following in LLMs? Hao Zhao et.al. 2405.19874 link
2024-05-30 Why Larger Language Models Do In-context Learning Differently? Zhenmei Shi et.al. 2405.19592 null
2024-05-29 Does learning the right latent variables necessarily improve in-context learning? Sarthak Mittal et.al. 2405.19162 link
2024-05-28 A Theoretical Understanding of Self-Correction through In-context Alignment Yifei Wang et.al. 2405.18634 null
2024-05-28 Multi-modal Generation via Cross-Modal In-Context Learning Amandeep Kumar et.al. 2405.18304 link
2024-05-28 IM-Context: In-Context Learning for Imbalanced Regression Tasks Ismail Nejjar et.al. 2405.18202 link
2024-05-28 Knowledge Circuits in Pretrained Transformers Yunzhi Yao et.al. 2405.17969 link
2024-05-28 FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction Zhonghang Li et.al. 2405.17898 link
2024-05-28 Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents Andrew H. Lee et.al. 2405.17840 null
2024-05-28 EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? Boshen Xu et.al. 2405.17719 link
2024-05-27 RAGSys: Item-Cold-Start Recommender as RAG System Emile Contal et.al. 2405.17587 null
2024-05-27 On the Noise Robustness of In-Context Learning for Text Generation Hongfu Gao et.al. 2405.17264 link
2024-05-27 Transformer In-Context Learning for Categorical Data Aaron T. Wang et.al. 2405.17248 null
2024-05-29 Benchmarking General Purpose In-Context Learning Fan Wang et.al. 2405.17234 link
2024-05-27 Unifying Demonstration Selection and Compression for In-Context Learning Jun Gao et.al. 2405.17062 null
2024-05-27 SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself Jun Gao et.al. 2405.17052 null
2024-05-27 On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability Chenyu Zheng et.al. 2405.16845 link
2024-05-27 Automatic Domain Adaptation by Transformers in In-Context Learning Ryuichiro Hataya et.al. 2405.16819 null
2024-05-27 ARC: A Generalist Graph Anomaly Detector with In-Context Learning Yixin Liu et.al. 2405.16771 link
2024-05-25 Learning to Reason via Program Generation, Emulation, and Search Nathaniel Weir et.al. 2405.16337 link
2024-05-25 Mixture of In-Context Prompters for Tabular PFNs Derek Xu et.al. 2405.16156 null
2024-05-24 MLPs Learn In-Context William L. Tong et.al. 2405.15618 link
2024-05-24 Synergizing In-context Learning with Hints for End-to-end Task-oriented Dialog Systems Vishal Vivek Saley et.al. 2405.15585 link
2024-05-24 Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs Siyuan Guo et.al. 2405.15485 null
2024-05-24 Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation Ge Qu et.al. 2405.15307 link
2024-05-24 Towards Global Optimal Visual In-Context Learning Prompt Selection Chengming Xu et.al. 2405.15279 null
2024-05-24 Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor Haoxuan Qu et.al. 2405.15267 null
2024-05-24 Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification Shang Liu et.al. 2405.15115 null
2024-05-23 Linking In-context Learning in Transformers to Human Episodic Memory Li Ji-An et.al. 2405.14992 link
2024-05-23 In-context Time Series Predictor Jiecheng Lu et.al. 2405.14982 null
2024-05-23 Evaluating Large Language Models for Public Health Classification and Extraction Tasks Joshua Harris et.al. 2405.14766 null
2024-05-23 Implicit In-context Learning Zhuowei Li et.al. 2405.14660 link
2024-05-23 Emotion Identification for French in Written Texts: Considering their Modes of Expression as a Step Towards Text Complexity Analysis Aline Étienne et.al. 2405.14385 null
2024-05-23 Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition Chan-Jan Hsu et.al. 2405.14259 link
2024-05-22 Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning Jiuqi Wang et.al. 2405.13861 null
2024-05-22 Why In-Context Learning Transformers are Tabular Data Classifiers Felix den Breejen et.al. 2405.13396 link
2024-05-21 Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting Krishna Prasad Varadarajan Srinivasan et.al. 2405.13181 null
2024-05-21 Quantifying Emergence in Large Language Models Hang Chen et.al. 2405.12617 link
2024-05-20 Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning Guanglin Zhou et.al. 2405.12217 link
2024-05-20 Asymptotic theory of in-context learning by linear attention Yue M. Lu et.al. 2405.11751 link
2024-05-19 Effective In-Context Example Selection through Data Compression Zhongxiang Sun et.al. 2405.11465 null
2024-05-19 MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning Sanchit Sinha et.al. 2405.11446 null
2024-05-19 Large Language Models are Biased Reinforcement Learners William M. Hayes et.al. 2405.11422 link
2024-05-18 Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models Yan Wang et.al. 2405.11196 link
2024-05-17 Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection Han Zhang et.al. 2405.11002 null
2024-05-17 Feature-Adaptive and Data-Scalable In-Context Learning Jiahao Li et.al. 2405.10738 link
2024-05-20 Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks Anwoy Chatterjee et.al. 2405.10548 link
2024-05-17 In-context Contrastive Learning for Event Causality Identification Chao Liang et.al. 2405.10512 link
2024-05-16 Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction Chinedu Ekuma et.al. 2405.10448 link
2024-05-16 Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model Zheng Gu et.al. 2405.10316 null
2024-05-16 Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction Jianhao Chen et.al. 2405.10288 link
2024-05-16 When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models Xianzheng Ma et.al. 2405.10255 link
2024-05-16 LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting Stijn Verdenius et.al. 2405.10093 link
2024-05-16 Many-Shot In-Context Learning in Multimodal Foundation Models Yixing Jiang et.al. 2405.09798 link
2024-05-14 Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach Syed Mhamudul Hasan et.al. 2405.08755 null
2024-05-14 PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles Satya Kesav Gundabathula et.al. 2405.08373 null
2024-05-14 Compositional Text-to-Image Generation with Dense Blob Representations Weili Nie et.al. 2405.08246 null
2024-05-13 AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models Shuo Liu et.al. 2405.07626 link
2024-05-13 COBias and Debias: Minimizing Language Model Pairwise Accuracy Bias via Nonlinear Integer Programming Ruixi Lin et.al. 2405.07623 null
2024-05-13 MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation Dongjun Lee et.al. 2405.07467 null
2024-05-10 An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and Classification Mohammad Sadegh Sheikhaei et.al. 2405.06806 link
2024-05-10 Linearizing Large Language Models Jean Mercat et.al. 2405.06640 link
2024-05-13 Memory Mosaics Jianyu Zhang et.al. 2405.06394 link
2024-05-15 XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare Fatemeh Nazary et.al. 2405.06270 null
2024-05-08 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples Peiqin Lin et.al. 2405.05116 link
2024-05-08 P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models Guochao Jiang et.al. 2405.04960 link
2024-05-08 AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models Yongheng Zhang et.al. 2405.04753 null
2024-05-07 ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning Jing Lin et.al. 2405.04533 null
2024-05-07 In-context Learning for Automated Driving Scenarios Ziqi Zhou et.al. 2405.04135 link
2024-05-08 Locally Differentially Private In-Context Learning Chunyan Zheng et.al. 2405.04032 null
2024-05-06 OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs Jiahao Nick Li et.al. 2405.03901 null
2024-05-06 Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning Yubo Mai et.al. 2405.03509 null
2024-05-06 OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization Weidong Wang et.al. 2405.03215 null
2024-05-04 CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions Hanchong Zhang et.al. 2405.02712 link
2024-05-04 Enhancing News Summarization with ELearnFit through Efficient In-Context Learning and Efficient Fine-Tuning Che Guan et.al. 2405.02710 null
2024-05-04 PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation Ye Liu et.al. 2405.02580 link
2024-05-03 Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning Hyeong Kyu Choi et.al. 2405.02501 link
2024-05-03 Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression Karthik Duraisamy et.al. 2405.02462 null
2024-05-03 FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems Yashar Deldjoo et.al. 2405.02219 null
2024-05-03 Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo Mahmoud Masoud et.al. 2405.01997 null
2024-05-03 Understanding LLMs Requires More Than Statistical Generalization Patrik Reizinger et.al. 2405.01964 link
2024-05-02 Question Suggestion for Conversational Shopping Assistants Using Product Metadata Nikhita Vedula et.al. 2405.01738 null
2024-05-02 DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection Yanjing Yang et.al. 2405.01202 link
2024-05-02 "In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval" Andrew Parry et.al. 2405.01116 null
2024-05-01 Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations Kirandeep Kaur et.al. 2405.00824 null
2024-04-30 Graphical Reasoning: LLM-based Semi-Open Relation Extraction Yicheng Tao et.al. 2405.00216 link
2024-04-30 In-Context Learning with Long-Context Models: An In-Depth Exploration Amanda Bertsch et.al. 2405.00200 null
2024-04-29 It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments Petter Mæhlum et.al. 2404.18832 null
2024-05-01 Capabilities of Gemini Models in Medicine Khaled Saab et.al. 2404.18416 null
2024-04-28 From Persona to Personalization: A Survey on Role-Playing Language Agents Jiangjie Chen et.al. 2404.18231 null
2024-05-01 Exploring the Robustness of In-Context Learning with Noisy Labels Chen Cheng et.al. 2404.18191 link
2024-04-30 ComposerX: Multi-Agent Symbolic Music Composition with LLMs Qixin Deng et.al. 2404.18081 link
2024-04-27 Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language Tsimur Hadeliya et.al. 2404.17832 null
2024-04-27 Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction Guozheng Li et.al. 2404.17809 null
2024-04-27 Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors Guozheng Li et.al. 2404.17807 null
2024-04-26 Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study Yang Wu et.al. 2404.17136 link
2024-04-25 Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models Eren Dogan et.al. 2404.17010 null
2024-04-25 Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning Tianhui Zhang et.al. 2404.16807 link
2024-04-25 In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization Herilalaina Rakotoarison et.al. 2404.16795 link
2024-04-25 What Makes Multimodal In-Context Learning Work? Folco Bertini Baldassini et.al. 2404.15736 link
2024-04-23 XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference João Monteiro et.al. 2404.15420 null
2024-04-21 Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following Suyeon Shin et.al. 2404.15190 null
2024-04-23 Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond Pengyu Xue et.al. 2404.14824 link
2024-04-23 Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities Siyin Wang et.al. 2404.14716 null
2024-04-23 FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction Hang Hua et.al. 2404.14715 null
2024-04-23 FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model Zezheng Song et.al. 2404.14688 link
2024-04-21 AnyPattern: Towards In-context Image Copy Detection Wenhao Wang et.al. 2404.13788 link
2024-04-21 "A good pun is its own reword": Can Large Language Models Understand Puns? Zhijun Xu et.al. 2404.13599 link
2024-04-19 Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs Biyang Guo et.al. 2404.13033 link
2024-04-19 Stronger Random Baselines for In-Context Learning Gregory Yauney et.al. 2404.13020 link
2024-04-19 Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction Qinyuan Wu et.al. 2404.12957 link
2024-04-19 How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning? Yang Luo et.al. 2404.12866 link
2024-04-19 Requirements Satisfiability with In-Context Learning Sarah Santos et.al. 2404.12576 link
2024-04-18 Point-In-Context: Understanding Point Cloud via In-Context Learning Mengyuan Liu et.al. 2404.12352 link
2024-04-18 Exploring the landscape of large language models: Foundations, techniques, and challenges Milad Moradi et.al. 2404.11973 null
2024-04-17 In-Context Learning State Vector with Inner and Momentum Optimization Dongfang Li et.al. 2404.11225 link
2024-04-17 Position Engineering: Boosting Large Language Models through Positional Information Manipulation Zhiyuan He et.al. 2404.11216 null
2024-04-17 Many-Shot In-Context Learning Rishabh Agarwal et.al. 2404.11018 null
2024-04-16 Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning Moghis Fereidouni et.al. 2404.10887 null
2024-04-16 Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning Xiao Wang et.al. 2404.10552 null
2024-04-15 Memory Sharing for Large Language Model based Agents Hang Gao et.al. 2404.09982 link
2024-04-15 Evolving Interpretable Visual Classifiers with Large Language Models Mia Chiquier et.al. 2404.09941 null
2024-04-15 In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation Han Xue et.al. 2404.09633 null
2024-04-15 Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning Sungwon Han et.al. 2404.09491 link
2024-04-14 GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning Amani Namboori et.al. 2404.09163 null
2024-04-13 Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model Zita Lifelo et.al. 2404.09045 null
2024-04-11 Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models Tanmay Gautam et.al. 2404.08080 null
2024-04-11 LLoCO: Learning Long Contexts Offline Sijun Tan et.al. 2404.07979 link
2024-04-11 Discourse-Aware In-Context Learning for Temporal Expression Normalization Akash Kumar Gautam et.al. 2404.07775 null
2024-04-11 Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning Quanyu Long et.al. 2404.07546 link
2024-04-10 Adaptive behavior with stable synapses Cristiano Capone et.al. 2404.07150 link
2024-04-10 What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation Aaditya K. Singh et.al. 2404.07129 link
2024-04-10 What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs Anna Wegmann et.al. 2404.06670 link
2024-04-09 Neuromorphic In-Context Learning for Energy-Efficient MIMO Symbol Detection Zihang Song et.al. 2404.06469 null
2024-04-11 Privacy Preserving Prompt Engineering: A Survey Kennedy Edemacu et.al. 2404.06001 null
2024-04-08 WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents Michael Lutz et.al. 2404.05902 null
2024-04-08 Enhancing Software Related Information Extraction with Generative Language Models through Single-Choice Question Answering Wolfgang Otto et.al. 2404.05587 null
2024-04-11 Cell-Free Multi-User MIMO Equalization via In-Context Learning Matteo Zecchin et.al. 2404.05538 link
2024-04-07 How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations? Ishani Mondal et.al. 2404.05088 null
2024-04-05 Exploring Autonomous Agents through the Lens of Large Language Models: A Review Saikat Barua et.al. 2404.04442 null
2024-04-05 Deciphering Political Entity Sentiment in News with Large Language Models: Zero-Shot and Few-Shot Strategies Alapan Kuila et.al. 2404.04361 link
2024-04-05 Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving Gulsum Yigit et.al. 2404.03938 null
2024-04-04 SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection Bradley P. Allen et.al. 2404.03732 link
2024-04-04 How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes Harmon Bhasin et.al. 2404.03558 link
2024-04-03 GPT-DETOX: An In-Context Learning-Based Paraphraser for Text Detoxification Ali Pesaranghader et.al. 2404.03052 null
2024-04-03 Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison Maxime Bouthors et.al. 2404.02835 null
2024-04-03 Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM Zhe Liu et.al. 2404.02706 null
2024-04-03 Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation Zhe Xu et.al. 2404.02505 link
2024-04-03 uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers? Pouya Sadeghi et.al. 2404.02474 link
2024-04-03 Task Agnostic Architecture for Algorithm Induction via Implicit Composition Sahil J. Sindhi et.al. 2404.02450 null
2024-04-03 Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data Parth Patwa et.al. 2404.02422 null
2024-04-02 Emergent Abilities in Reduced-Scale Generative Language Models Sherin Muckatira et.al. 2404.02204 link
2024-04-02 Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks Maksym Andriushchenko et.al. 2404.02151 link
2024-04-02 Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models Wanyong Feng et.al. 2404.02124 link
2024-04-04 Long-context LLMs Struggle with Long In-context Learning Tianle Li et.al. 2404.02060 link
2024-04-02 Deconstructing In-Context Learning: Understanding Prompts via Corruption Namrata Shivagunde et.al. 2404.02054 link
2024-04-02 Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts Zhuo Chen et.al. 2404.02022 link
2024-04-02 Large Language Models for Orchestrating Bimanual Robots Kun Chu et.al. 2404.02018 link
2024-04-02 Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4 Dan Schumacher et.al. 2404.01961 link
2024-04-02 Self-Improvement Programming for Temporal Knowledge Graph Question Answering Zhuo Chen et.al. 2404.01720 null
2024-04-01 Structured Information Matters: Incorporating Abstract Meaning Representation into LLMs for Improved Open-Domain Dialogue Evaluation Bohao Yang et.al. 2404.01129 link
2024-04-01 Efficient Prompting Methods for Large Language Models: A Survey Kaiyan Chang et.al. 2404.01077 null
2024-03-29 Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science Yazheng Yang et.al. 2403.20208 null
2024-03-28 Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models Yucheng Shi et.al. 2403.19631 link
2024-03-28 Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation Chenming Tang et.al. 2403.19285 null
2024-03-28 Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction Chenming Tang et.al. 2403.19283 null
2024-03-28 Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation Yutong He et.al. 2403.19103 null
2024-03-26 Large Language Models Enhanced Collaborative Filtering Zhongxiang Sun et.al. 2403.17688 null
2024-03-26 Language Models for Text Classification: Is In-Context Learning Enough? Aleksandra Edwards et.al. 2403.17661 null
2024-03-26 Naive Bayes-based Context Extension for Large Language Models Jianlin Su et.al. 2403.17552 link
2024-03-26 ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler Paramita Mirza et.al. 2403.17536 link
2024-03-25 A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection Benjamin Steenhoek et.al. 2403.17218 null
2024-03-25 MetaAligner: Conditional Weak-to-Strong Correction for Generalizable Multi-Objective Alignment of Language Models Kailai Yang et.al. 2403.17141 link
2024-03-25 The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition Georgios Chochlakis et.al. 2403.17125 null
2024-03-25 SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging Lingdong Shen et.al. 2403.16578 null
2024-03-27 LLMs Are Few-Shot In-Context Low-Resource Language Learners Samuel Cahyawijaya et.al. 2403.16512 link
2024-03-25 LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification Liu Junhua et.al. 2403.16504 null
2024-03-24 SQL-Encoder: Improving NL2SQL In-Context Learning Through a Context-Aware Encoder Mohammadreza Pourreza et.al. 2403.16204 null
2024-03-23 IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models Haz Sameen Shahgir et.al. 2403.15952 link
2024-03-21 Sequence-to-Sequence Language Models for Character and Emotion Detection in Dream Narratives Gustave Cortal et.al. 2403.15486 null
2024-03-22 ESG Classification by Implicit Rule Learning via GPT-4 Hyo Jeong Yun et.al. 2403.15040 null
2024-03-22 Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation Shanthi Karpurapu et.al. 2403.14965 link
2024-03-22 Stance Reasoner: Zero-Shot Stance Detection on Social Media with Explicit Reasoning Maksym Taranukhin et.al. 2403.14895 link
2024-03-21 Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning Changtong Zan et.al. 2403.14399 link
2024-03-21 PE-GPT: A Physics-Informed Interactive Large Language Model for Power Converter Modulation Design Fanfan Lin et.al. 2403.14059 null
2024-03-19 VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning Yongshuo Zong et.al. 2403.13164 link
2024-03-19 Towards Multimodal In-Context Learning for Vision & Language Models Sivan Doveh et.al. 2403.12736 null
2024-03-19 CrossTune: Black-Box Few-Shot Classification with Label Enhancement Danqing Luo et.al. 2403.12468 null
2024-03-19 An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis Yifan Peng et.al. 2403.12402 null
2024-03-18 Transfer Learning Beyond Bounded Density Ratios Alkis Kalavasis et.al. 2403.11963 null
2024-03-18 CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification Korbinian Randl et.al. 2403.11904 link
2024-03-18 Towards Understanding the Relationship between In-context Learning and Compositional Generalization Sungjun Han et.al. 2403.11834 null
2024-03-18 Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Vishnu Sashank Dorbala et.al. 2403.11487 null
2024-03-16 Interpretable Machine Learning for TabPFN David Rundel et.al. 2403.10923 link
2024-03-16 Zero-shot Generative Linguistic Steganography Ke Lin et.al. 2403.10856 link
2024-03-15 Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models Tian Meng et.al. 2403.10287 null
2024-03-15 Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning Shang-Hsuan Chiang et.al. 2403.10281 link
2024-03-15 The Whole is Better than the Sum: Using Aggregated Demonstrations in In-Context Learning for Sequential Recommendation Lei Wang et.al. 2403.10135 link
2024-03-14 MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Brandon McKinzie et.al. 2403.09611 null
2024-03-15 WavCraft: Audio Editing and Generation with Natural Language Prompts Jinhua Liang et.al. 2403.09527 link
2024-03-14 Rectifying Demonstration Shortcut in In-Context Learning Joonwon Jang et.al. 2403.09488 link
2024-03-14 Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity Zhuo Zhi et.al. 2403.09428 link
2024-03-14 Unveiling the Generalization Power of Fine-Tuned Large Language Models Haoran Yang et.al. 2403.09162 link
2024-03-14 Large Language Models are Parallel Multilingual Learners Yongyu Mu et.al. 2403.09073 link
2024-03-13 Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking Ming Dong et.al. 2403.08492 null
2024-03-12 BAGEL: Bootstrapping Agents by Guiding Exploration with Language Shikhar Murty et.al. 2403.08140 null
2024-03-12 In-context learning enables multimodal large language models to classify cancer pathology images Dyke Ferber et.al. 2403.07407 null
2024-03-13 Knowledge Graph Large Language Model (KG-LLM) for Link Prediction Dong Shu et.al. 2403.07311 null
2024-03-11 SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data Jialu Li et.al. 2403.06952 null
2024-03-12 MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning Yichuan Li et.al. 2403.06914 link
2024-03-11 In-context Exploration-Exploitation for Reinforcement Learning Zhenwen Dai et.al. 2403.06826 null
2024-03-11 'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification Manish Chandra et.al. 2403.06402 null
2024-03-10 FedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning Zhuo Zhang et.al. 2403.06131 null
2024-03-10 In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model Junhui Yin et.al. 2403.06126 null
2024-03-09 Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages Christopher Toukmaji et.al. 2403.06018 null
2024-03-08 A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries Asad Aali et.al. 2403.05720 link
2024-03-08 DP-TabICL: In-Context Learning with Differentially Private Tabular Data Alycia N. Carey et.al. 2403.05681 null
2024-03-08 InstructGIE: Towards Generalizable Image Editing Zichong Meng et.al. 2403.05018 null
2024-03-07 LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Boshi Wang et.al. 2403.04746 link
2024-03-08 How Far Are We from Intelligent Visual Deductive Reasoning? Yizhe Zhang et.al. 2403.04732 link
2024-03-07 Where does In-context Translation Happen in Large Language Models Suzanna Sia et.al. 2403.04510 null
2024-03-07 DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning Xingwei Qu et.al. 2403.04233 null
2024-03-07 On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models Xinpeng Wang et.al. 2403.04204 null
2024-03-06 German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset Laura Mascarell et.al. 2403.03750 link
2024-03-06 Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem Yuhong Sun et.al. 2403.03558 link
2024-03-06 Japanese-English Sentence Translation Exercises Dataset for Automatic Grading Naoki Miura et.al. 2403.03396 null
2024-03-05 How Well Can Transformers Emulate In-context Newton's Method? Angeliki Giannou et.al. 2403.03183 null
2024-03-05 MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting Fangchen Liu et.al. 2403.03174 null
2024-03-06 Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation Bin Zhang et.al. 2403.02951 null
2024-03-05 Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment Congzhi Zhang et.al. 2403.02738 null
2024-03-04 Not all Layers of LLMs are Necessary during Inference Siqi Fan et.al. 2403.02181 null
2024-03-04 Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? Evgeniia Razumovskaia et.al. 2403.01929 null
2024-03-03 Transformers for Supervised Online Continual Learning Jorg Bornschein et.al. 2403.01554 null
2024-03-03 Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models Amal Rannen-Triki et.al. 2403.01518 null
2024-03-02 Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal Jianheng Huang et.al. 2403.01244 link
2024-03-02 Distilling Text Style Transfer With Self-Explanation From LLMs Chiyu Zhang et.al. 2403.01106 null
2024-03-02 FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis Songhua Yang et.al. 2403.01063 link
2024-03-01 DFIN-SQL: Integrating Focused Schema with DIN-SQL for Superior Accuracy in Large-Scale Databases Shai Volvovsky et.al. 2403.00872 null
2024-02-29 ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph Xukun Liu et.al. 2403.00839 null
2024-03-01 LLMs for Targeted Sentiment in News Headlines: Exploring Different Levels of Prompt Prescriptiveness Jana Juroš et.al. 2403.00418 null
2024-03-01 Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish Recep Firat Cekinel et.al. 2403.00411 link
2024-02-29 Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality Siyu Chen et.al. 2402.19442 null
2024-02-29 Teaching Large Language Models an Unseen Language on the Fly Chen Zhang et.al. 2402.19167 link
2024-02-29 Dual Operating Modes of In-Context Learning Ziqian Lin et.al. 2402.18819 link
2024-02-28 Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling Mahdi Karami et.al. 2402.18508 null
2024-02-28 Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification Garima Chhikara et.al. 2402.18502 null
2024-02-28 Large Language Models As Evolution Strategies Robert Tjarko Lange et.al. 2402.18381 null
2024-02-28 From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs Yulong Liu et.al. 2402.18157 null
2024-02-28 Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation Shicheng Xu et.al. 2402.18150 link
2024-02-28 All in a Single Image: Large Multimodal Models are In-Image Learners Lei Wang et.al. 2402.17971 link
2024-02-27 Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models Yunpeng Huang et.al. 2402.17671 null
2024-02-27 Reinforced In-Context Black-Box Optimization Lei Song et.al. 2402.17423 link
2024-02-27 Video as the New Language for Real-World Decision Making Sherry Yang et.al. 2402.17139 null
2024-02-25 DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers Xirui Li et.al. 2402.16914 link
2024-02-28 Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models Anchun Gui et.al. 2402.16696 null
2024-02-26 Long-Context Language Modeling with Parallel Context Encoding Howard Yen et.al. 2402.16617 link
2024-02-25 LLMs with Chain-of-Thought Are Non-Causal Reasoners Guangsheng Bao et.al. 2402.16048 link
2024-02-25 Likelihood-based Mitigation of Evaluation Bias in Large Language Models Masanari Ohi et.al. 2402.15987 link
2024-02-24 Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning Wuyang Chen et.al. 2402.15734 link
2024-02-23 Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models Yanzheng Xiang et.al. 2402.15637 link
2024-02-23 Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis Hongkang Li et.al. 2402.15607 null
2024-02-23 Evaluating the Performance of ChatGPT for Spam Email Detection Yuwei Wu et.al. 2402.15537 null
2024-02-23 Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models Guanming Xiong et.al. 2402.15131 link
2024-02-23 Studying LLM Performance on Closed- and Open-source Data Toufique Ahmed et.al. 2402.15100 null
2024-02-23 Fine-tuning Large Language Models for Domain-specific Machine Translation Jiawei Zheng et.al. 2402.15061 null
2024-02-22 In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization Ruiqi Zhang et.al. 2402.14951 null
2024-02-22 How Transformers Learn Causal Structure with Gradient Descent Eshaan Nichani et.al. 2402.14735 link
2024-02-23 Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis Takehiro Takayanagi et.al. 2402.14484 null
2024-02-22 On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe Ningyu Xu et.al. 2402.14404 link
2024-02-22 A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation Yuyue Zhou et.al. 2402.14300 link
2024-02-21 Analysing The Impact of Sequence Composition on Language Model Pre-Training Yu Zhao et.al. 2402.13991 link
2024-02-21 $\texttt{Se}^2$: $\textit{Se}$quential Example $\textit{Se}$lection for In-Context Learning Haoyu Liu et.al. 2402.13874 link
2024-02-21 Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction Guozheng Li et.al. 2402.13741 null
2024-02-21 Unsupervised Text Style Transfer via LLMs and Attention Masking with Multi-way Interactions Lei Pan et.al. 2402.13647 null
2024-02-21 A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation Yunxin Li et.al. 2402.13587 link
2024-02-21 CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory Zexue He et.al. 2402.13449 null
2024-02-20 Harnessing Large Language Models as Post-hoc Correctors Zhiqiang Zhong et.al. 2402.13414 link
2024-02-20 Identifying Semantic Induction Heads to Understand In-Context Learning Jie Ren et.al. 2402.13055 null
2024-02-20 The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis Miaoran Zhang et.al. 2402.12976 link
2024-02-20 Fine-Tuning, Prompting, In-Context Learning and Instruction-Tuning: How Many Labelled Samples Do We Need? Branislav Pecher et.al. 2402.12819 null
2024-02-20 On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices Branislav Pecher et.al. 2402.12817 link
2024-02-19 Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation Joseph Marvin Imperial et.al. 2402.12593 link
2024-02-19 Parallel Structures in Pre-training Data Yield In-Context Learning Yanda Chen et.al. 2402.12530 null
2024-02-19 Task-Oriented Dialogue with In-Context Learning Tom Bocklisch et.al. 2402.12234 link
2024-02-19 Do Large Language Models Understand Logic or Just Mimick Context? Junbing Yan et.al. 2402.12091 null
2024-02-19 Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations Milan Bhan et.al. 2402.12038 null
2024-02-19 Modularized Networks for Few-shot Hateful Meme Detection Rui Cao et.al. 2402.11845 link
2024-02-19 In-Context Learning Demonstration Selection via Influence Analysis Vinay M. S. et.al. 2402.11750 null
2024-02-18 GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network Shuzhou Yuan et.al. 2402.11709 link
2024-02-18 In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness Liam Collins et.al. 2402.11639 null
2024-02-18 Visual In-Context Learning for Large Vision-Language Models Yucheng Zhou et.al. 2402.11574 null
2024-02-18 Learning to Learn Faster from Human Feedback with Language Model Predictive Control Jacky Liang et.al. 2402.11450 null
2024-02-18 In-Context Example Ordering Guided by Label Distributions Zhichao Xu et.al. 2402.11447 null
2024-02-16 RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model Jianhao Yuan et.al. 2402.10828 null
2024-02-16 Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning Yinpeng Liu et.al. 2402.10738 link
2024-02-16 Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm Yuanzhen Xie et.al. 2402.10671 link
2024-02-16 Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL Dingzirui Wang et.al. 2402.10663 link
2024-02-16 Linear Transformers with Learnable Kernel Functions are Better In-Context Models Yaroslav Aksenov et.al. 2402.10644 link
2024-02-16 LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty Zhen Zhang et.al. 2402.10573 link
2024-02-16 Understanding In-Context Learning with a Pelican Soup Framework Ting-Rui Chiang et.al. 2402.10424 null
2024-02-16 Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting Jiaheng Wei et.al. 2402.10412 null
2024-02-15 Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models Kang He et.al. 2402.10353 null
2024-02-15 Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models Chen Ling et.al. 2402.10189 link
2024-02-15 Self-Augmented In-Context Learning for Unsupervised Word Translation Yaoyiran Li et.al. 2402.10024 link
2024-02-15 Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation Jiashu Pu et.al. 2402.09954 null
2024-02-15 Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language Models Chenyang Shao et.al. 2402.09836 null
2024-02-15 QuRating: Selecting High-Quality Data for Training Language Models Alexander Wettig et.al. 2402.09739 link
2024-02-14 Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems Liang Zhang et.al. 2402.09584 null
2024-02-14 HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation Yihao Fang et.al. 2402.09390 link
2024-02-14 ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization Feifan Song et.al. 2402.09320 link
2024-02-14 GrounDial: Human-norm Grounded Safe Dialog Response Generation Siwon Kim et.al. 2402.08968 null
2024-02-13 Human Curriculum Effects Emerge with In-Context Learning in Neural Networks Jacob Russin et.al. 2402.08674 null
2024-02-12 Text-centric Alignment for Multi-Modality Learning Yun-Da Tsai et.al. 2402.08086 null
2024-02-12 Universal link predictor by In-context Learning Kaiwen Dong et.al. 2402.07738 null
2024-02-12 Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping Haoyu Wang et.al. 2402.07610 null
2024-02-12 VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization Dongsheng Zhu et.al. 2402.07398 link
2024-02-12 Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples Qingkai Zeng et.al. 2402.07386 link
2024-02-12 Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning Gabriel Simmons et.al. 2402.07368 null
2024-02-10 In-Context Data Distillation with TabPFN Junwei Ma et.al. 2402.06971 null
2024-02-09 NICE: To Optimize In-Context Examples or Not? Pragya Srivastava et.al. 2402.06733 null
2024-02-09 Entropy-Regularized Token-Level Policy Optimization for Large Language Models Muning Wen et.al. 2402.06700 link
2024-02-09 On the Out-Of-Distribution Generalization of Multimodal Large Language Models Xingxuan Zhang et.al. 2402.06599 null
2024-02-09 InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Huaiyuan Ying et.al. 2402.06332 link
2024-02-08 In-Context Learning Can Re-learn Forbidden Tasks Sophie Xhonneux et.al. 2402.05723 null
2024-02-08 NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning Yufeng Zhao et.al. 2402.05515 link
2024-02-09 In-Context Principle Learning from Mistakes Tianjun Zhang et.al. 2402.05403 null
2024-02-07 InCoRo: In-Context Learning for Robotics Control with Feedback Loops Jiaqiang Ye Zhu et.al. 2402.05188 null
2024-02-07 L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Hyesung Jeon et.al. 2402.04902 null
2024-02-06 Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Jongho Park et.al. 2402.04248 link
2024-02-06 In-context learning agents are asymmetric belief updaters Johannes A. Schubert et.al. 2402.03969 null
2024-02-06 Rethinking Skill Extraction in the Job Market Domain using Large Language Models Khanh Cao Nguyen et.al. 2402.03832 link
2024-02-05 Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations Álvaro Martín-Cortinas et.al. 2402.03407 null
2024-02-05 The Matrix: A Bayesian learning model for LLMs Siddhartha Dalal et.al. 2402.03175 null
2024-02-05 Multi: Multimodal Understanding Leaderboard with Text and Images Zichen Zhu et.al. 2402.03173 null
2024-02-05 Is Mamba Capable of In-Context Learning? Riccardo Grazzi et.al. 2402.03170 link
2024-02-05 Automatic Combination of Sample Selection Strategies for Few-Shot Learning Branislav Pecher et.al. 2402.03038 null
2024-02-05 How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning Zeping Yu et.al. 2402.02872 link
2024-02-04 Are Large Language Models Table-based Fact-Checkers? Hangwen Zhang et.al. 2402.02549 link
2024-02-04 KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion Yanbin Wei et.al. 2402.02389 link
2024-02-04 Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning Tong Niu et.al. 2402.02388 null
2024-02-04 AutoTimes: Autoregressive Time Series Forecasters via Large Language Models Yong Liu et.al. 2402.02370 link
2024-02-04 The Developmental Landscape of In-Context Learning Jesse Hoogland et.al. 2402.02364 null
2024-02-02 Can MLLMs Perform Text-to-Image In-Context Learning? Yuchen Zeng et.al. 2402.01293 link
2024-02-02 Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape Juno Kim et.al. 2402.01258 null
2024-02-02 In-Context Learning for Few-Shot Nested Named Entity Recognition Meishan Zhang et.al. 2402.01182 null
2024-02-02 CABINET: Content Relevance based Noise Reduction for Table Question Answering Sohan Patnaik et.al. 2402.01155 link
2024-02-01 Can Large Language Models Understand Context? Yilun Zhu et.al. 2402.00858 null
2024-02-01 Unlearnable Algorithms for In-context Learning Andrei Muresanu et.al. 2402.00751 null
2024-02-01 Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement Xin Quan et.al. 2402.00745 link
2024-02-01 Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data Yue Xing et.al. 2402.00743 null
2024-02-01 Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning Jitao Sang et.al. 2402.00667 link
2024-01-31 Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction Jialiang Wu et.al. 2401.17716 null
2024-01-31 Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning Yuelyu Ji et.al. 2401.17602 link
2024-01-30 Superiority of Multi-Head Attention in In-Context Linear Regression Yingqian Cui et.al. 2401.17426 null
2024-01-30 Customizing Language Model Responses with Contrastive In-Context Learning Xiang Gao et.al. 2401.17390 null
2024-01-29 ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks Bolei Ma et.al. 2401.16589 link
2024-01-29 APIGen: Generative API Method Recommendation Yujia Chen et.al. 2401.15843 link
2024-01-28 An Information-Theoretic Analysis of In-Context Learning Hong Jun Jeon et.al. 2401.15530 null
2024-01-26 Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning Tao He et.al. 2401.14626 null
2024-01-25 Language Modelling Approaches to Adaptive Machine Translation Yasmin Moslem et.al. 2401.14559 null
2024-01-25 K-QA: A Real-World Medical Q&A Benchmark Itay Manes et.al. 2401.14493 link
2024-01-24 Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 Xuchao Zhang et.al. 2401.13810 null
2024-01-24 Tyche: Stochastic In-Context Learning for Medical Image Segmentation Marianne Rakic et.al. 2401.13650 link
2024-01-24 MaLA-500: Massive Language Adaptation of Large Language Models Peiqin Lin et.al. 2401.13303 null
2024-01-30 In-Context Language Learning: Architectures and Algorithms Ekin Akyürek et.al. 2401.12973 link
2024-01-22 Enhancing In-context Learning via Linear Probe Calibration Momin Abbas et.al. 2401.12406 link
2024-01-22 In-Context Learning for Extreme Multi-Label Classification Karel D'Oosterlinck et.al. 2401.12178 link
2024-01-22 An Empirical Analysis of In-context Learning Abilities of LLMs for MT Pranjal A. Chitale et.al. 2401.12097 link
2024-01-22 Revisiting Demonstration Selection Strategies in In-Context Learning Keqin Peng et.al. 2401.12087 link
2024-01-23 In-context Learning with Retrieved Demonstrations for Language Models: A Survey Man Luo et.al. 2401.11624 null
2024-01-20 Analyzing Task-Encoding Tokens in Large Language Models Yu Bai et.al. 2401.11323 null
2024-01-18 Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation Zdeněk Kasner et.al. 2401.10186 null
2024-01-18 Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning Yong Zhang et.al. 2401.09783 null
2024-01-16 HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance Huanjun Kong et.al. 2401.08772 link
2024-01-16 The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing Masahiro Kaneko et.al. 2401.08511 null
2024-01-16 Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions Nooshin Pourkamali et.al. 2401.08429 null
2024-01-14 A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models Namjoon Suh et.al. 2401.07187 null
2024-01-13 Fast and Accurate Zero-Training Classification for Tabular Engineering Data Cyril Picard et.al. 2401.06948 null
2024-01-12 Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements Anton Voronov et.al. 2401.06766 link
2024-01-12 The Unreasonable Effectiveness of Easy Training Data for Hard Tasks Peter Hase et.al. 2401.06751 link
2024-01-12 Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning Kaiyi Zhang et.al. 2401.06469 link
2024-01-12 Misconfidence-based Demonstration Selection for LLM In-Context Learning Shangqing Xu et.al. 2401.06301 null
2024-01-12 Universal Vulnerabilities in Large Language Models: In-context Learning Backdoor Attacks Shuai Zhao et.al. 2401.05949 link
2024-01-11 Probing Structured Semantics Understanding and Generation of Language Models via Question Answering Jinxin Liu et.al. 2401.05777 null
2024-01-16 POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation Shilong Pan et.al. 2401.05596 null
2024-01-10 Leveraging Print Debugging to Improve Code Generation in Large Language Models Xueyu Hu et.al. 2401.05319 null
2024-01-09 SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning Hector A. Gonzalez et.al. 2401.04491 null
2024-01-09 Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Zilong Wang et.al. 2401.04398 null
2024-01-04 MobileAgent: enhancing mobile control via human-machine interaction and SOP integration Tinghe Ding et.al. 2401.04124 link
2024-01-08 Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection Georgios Fatouros et.al. 2401.03737 null
2024-01-10 Grimoire is All You Need for Enhancing Large Language Models Ding Chen et.al. 2401.03385 link
2024-01-05 Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks Kevin Everson et.al. 2401.02921 null
2024-01-05 Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task Gabriel Lino Garcia et.al. 2401.02909 null
2024-01-04 DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models Songbo Hu et.al. 2401.02208 link
2024-01-01 A Computational Framework for Behavioral Assessment of LLM Therapists Yu Ying Chiu et.al. 2401.00820 link
2024-01-01 The Earth is Flat? Unveiling Factual Errors in Large Language Models Wenxuan Wang et.al. 2401.00761 null
2024-01-01 A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models Yuxuan Wan et.al. 2401.00757 link
2023-12-29 Overview of the PromptCBLUE Shared Task in CHIP2023 Wei Zhu et.al. 2312.17522 link
2023-12-28 Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos Houlun Chen et.al. 2312.17117 null
2023-12-28 Improving In-context Learning via Bidirectional Alignment Chengwei Qin et.al. 2312.17055 null
2023-12-27 How Robust are LLMs to In-Context Majority Label Bias? Karan Gupta et.al. 2312.16549 null
2023-12-26 Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation Zhu Sun et.al. 2312.16262 null
2023-12-26 RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation Sichun Luo et.al. 2312.16018 link
2023-12-26 Supervised Knowledge Makes Large Language Models Better In-context Learners Linyi Yang et.al. 2312.15918 link
2023-12-25 EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data Shirong Ma et.al. 2312.15696 null
2023-12-22 On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning Chengzu Li et.al. 2312.13772 link
2023-12-19 RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios Wenhao Ding et.al. 2312.13303 null
2023-12-20 Generative Multimodal Models are In-Context Learners Quan Sun et.al. 2312.13286 link
2023-12-20 Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest Emily Groves et.al. 2312.12989 null
2023-12-20 Fine-tuning Large Language Models for Adaptive Machine Translation Yasmin Moslem et.al. 2312.12740 link
2023-12-21 Can Transformers Learn Sequential Function Classes In Context? Ryan Campbell et.al. 2312.12655 link
2023-12-19 Emergence of In-Context Reinforcement Learning from Noise Distillation Ilya Zisman et.al. 2312.12275 link
2023-12-18 DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation Yu Wang et.al. 2312.11336 null
2023-12-19 Split and Rephrase with Large Language Models David Ponce et.al. 2312.11075 null
2023-12-18 APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation Chengran Yang et.al. 2312.10934 null


VLM

Publish Date Title Authors PDF Code
2025-01-23 Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models Linh Tran et.al. 2501.13904 null
2025-01-23 Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning Shiyu Zhang et.al. 2501.13859 null
2025-01-23 Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes Shiling Deng et.al. 2501.13851 link
2025-01-23 Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models Chaolei Han et.al. 2501.13795 null
2025-01-23 Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak Erjia Xiao et.al. 2501.13772 null
2025-01-23 EventVL: Understand Event Streams via Multimodal Large Language Model Pengteng Li et.al. 2501.13707 null
2025-01-23 Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task Mohit Vaishnav et.al. 2501.13620 null
2025-01-23 Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving Lu Wang et.al. 2501.13563 null
2025-01-23 Text-driven Online Action Detection Manuel Benavent-Lledo et.al. 2501.13518 link
2025-01-23 Iterative Shaping of Multi-Particle Aggregates based on Action Trees and VLM Hoi-Yin Lee et.al. 2501.13507 null
2025-01-22 Patent Figure Classification using Large Vision-language Models Sushil Awale et.al. 2501.12751 link
2025-01-22 TeD-Loc: Text Distillation for Weakly Supervised Object Localization Shakeeb Murtaza et.al. 2501.12632 link
2025-01-22 ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality Yanming Xiu et.al. 2501.12553 link
2025-01-21 Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models Tabinda Aman et.al. 2501.12433 null
2025-01-20 ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models Jingwei Yi et.al. 2501.12418 link
2025-01-21 InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Yuhang Zang et.al. 2501.12368 link
2025-01-21 Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2 Md. Rakibul Islam et.al. 2501.12356 null
2025-01-21 CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification Cristiano Patrício et.al. 2501.12266 null
2025-01-21 Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model Kazi Hasan Ibn Arif et.al. 2501.12206 null
2025-01-20 Human-AI Collaborative Game Testing with Vision Language Models Boran Zhang et.al. 2501.11782 null
2025-01-20 SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models Shu Zou et.al. 2501.11485 link
2025-01-20 Verifying Cross-modal Entity Consistency in News using Vision-language Models Sahar Tahmasebi et.al. 2501.11403 null
2025-01-20 KPL: Training-Free Medical Knowledge Mining of Vision-Language Models Jiaxiang Liu et.al. 2501.11231 link
2025-01-19 ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models Yassir Bendou et.al. 2501.11175 null
2025-01-19 Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding Zhanpeng Chen et.al. 2501.10967 link
2025-01-17 HiMix: Reducing Computational Complexity in Large Vision-Language Models Xuange Zhang et.al. 2501.10318 null
2025-01-17 SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning Yuecheng Liu et.al. 2501.10074 null
2025-01-17 CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment Yating Liu et.al. 2501.10071 null
2025-01-17 MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paul Röttger et.al. 2501.10057 link
2025-01-17 Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions Zhijie Tan et.al. 2501.10011 null
2025-01-17 Explainable artificial intelligence (XAI): from inherent explainability to large language models Fuseini Mumuni et.al. 2501.09967 null
2025-01-16 Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Zhihe Yang et.al. 2501.09695 link
2025-01-16 Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark Alexis Roger et.al. 2501.09672 null
2025-01-16 Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness Zeyu Wang et.al. 2501.09446 null
2025-01-16 Vision-Language Models Do Not Understand Negation Kumail Alhamoud et.al. 2501.09425 null
2025-01-16 YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks Saptarashmi Bandyopadhyay et.al. 2501.09355 null
2025-01-16 RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects Zhen Luo et.al. 2501.09307 null
2025-01-16 Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning Harrison Fuller et.al. 2501.09294 null
2025-01-16 Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites Abdalwhab Abdalwhab et.al. 2501.09267 null
2025-01-16 Exploring the Capabilities of Vision-Language Models to Detect Visual Bugs in HTML5 Applications Finlay Macklon et.al. 2501.09236 null
2025-01-15 Embodied Scene Understanding for Vision Language Models via MetaVQA Weizhen Wang et.al. 2501.09167 null
2025-01-15 CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation Qi Ma et.al. 2501.08982 null
2025-01-15 Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning Julian Perry et.al. 2501.08597 null
2025-01-14 MiniMax-01: Scaling Foundation Models with Lightning Attention MiniMax et.al. 2501.08313 null
2025-01-14 Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Liping Yuan et.al. 2501.07888 null
2025-01-14 Visual Language Models as Operator Agents in the Space Domain Alejandro Carrasco et.al. 2501.07802 null
2025-01-14 BMIP: Bi-directional Modality Interaction Prompt Learning for VLM Song-Lin Lv et.al. 2501.07769 null
2025-01-13 SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing Varun Biyyala et.al. 2501.07554 link
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525 link
2025-01-13 Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models Yasiru Ranasinghe et.al. 2501.07396 null
2025-01-14 GestLLM: Advanced Hand Gesture Interpretation via Large Language Models for Human-Robot Interaction Oleg Kobzarev et.al. 2501.07295 null
2025-01-13 Can Vision-Language Models Evaluate Handwritten Math? Oikantik Nath et.al. 2501.07244 null
2025-01-13 TimeLogic: A Temporal Logic Benchmark for Video QA Sirnam Swetha et.al. 2501.07214 null
2025-01-13 BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Alejandro Lozano et.al. 2501.07171 link
2025-01-13 Duplex: Dual Prototype Learning for Compositional Zero-Shot Learning Zhong Peng et.al. 2501.07114 null
2025-01-12 MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis Sadia Kamal et.al. 2501.06887 null
2025-01-12 Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving Haoxiang Gao et.al. 2501.06680 null
2025-01-10 VideoAuteur: Towards Long Narrative Video Generation Junfei Xiao et.al. 2501.06173 null
2025-01-10 CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems Haichao Liu et.al. 2501.06132 link
2025-01-10 Generate, Transduct, Adapt: Iterative Transduction with VLMs Oindrila Saha et.al. 2501.06031 null
2025-01-10 Scalable Vision Language Model Training via High Quality Data Curation Hongyuan Dong et.al. 2501.05952 null
2025-01-10 Valley2: Exploring Multimodal Models with Scalable Vision-Language Design Ziheng Wu et.al. 2501.05901 link
2025-01-10 Super-class guided Transformer for Zero-Shot Attribute Classification Sehyung Kim et.al. 2501.05728 link
2025-01-10 From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities Dominick Reilly et.al. 2501.05711 null
2025-01-09 Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding Mohammed Elhenawy et.al. 2501.05566 null
2025-01-09 Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation Darius Petermann et.al. 2501.05413 null
2025-01-09 Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection Pei-Kang Lee et.al. 2501.05228 null
2025-01-09 Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Gregor Geigle et.al. 2501.05122 null
2025-01-09 DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving Xuran Zheng et.al. 2501.05081 null
2025-01-09 ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark Ronghao Dang et.al. 2501.05031 null
2025-01-09 Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments Yifan Xu et.al. 2501.04947 null
2025-01-08 Re-ranking the Context for Multimodal Retrieval Augmented Generation Matin Mortaheb et.al. 2501.04695 null
2025-01-08 Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations Archita Srivastava et.al. 2501.04675 null
2025-01-08 DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests Charles Corbière et.al. 2501.04671 null
2025-01-08 A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI Kazusato Oko et.al. 2501.04641 link
2025-01-08 Supervision-free Vision-Language Alignment Giorgio Giannone et.al. 2501.04568 null
2025-01-08 Online Gaussian Test-Time Adaptation of Vision-Language Models Clément Fuchs et.al. 2501.04352 link
2025-01-08 Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs Zeyi Huang et.al. 2501.04336 null
2025-01-08 Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts Miao Rang et.al. 2501.04322 null
2025-01-08 Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation Senwei Xie et.al. 2501.04268 null
2025-01-07 MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation Siddharth Joshi et.al. 2501.04155 link
2025-01-07 Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Shaoyuan Xie et.al. 2501.04003 link
2025-01-07 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Haobo Yuan et.al. 2501.04001 link
2025-01-07 RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance Matin Mortaheb et.al. 2501.03995 null
2025-01-07 VLM-driven Behavior Tree for Context-aware Task Planning Naoki Wake et.al. 2501.03968 link
2025-01-07 Vision Language Models as Values Detectors Giulio Antonio Abbo et.al. 2501.03957 null
2025-01-07 OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Mingjie Pan et.al. 2501.03841 null
2025-01-07 KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration Chengyuan Li et.al. 2501.03786 null
2025-01-07 Realistic Test-Time Adaptation of Vision-Language Models Maxime Zanella et.al. 2501.03729 link
2025-01-07 Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein Xiaotong Guo et.al. 2501.03722 null
2025-01-07 SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning Andrew Li et.al. 2501.03675 null
2025-01-06 Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Yuhui Zhang et.al. 2501.03225 link
2025-01-06 Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches Alhassan Mumuni et.al. 2501.03151 null
2025-01-06 MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Wenyi Hong et.al. 2501.02955 null
2025-01-06 Label-free Concept Based Multiple Instance Learning for Gigapixel Histopathology Susu Sun et.al. 2501.02922 null
2025-01-06 Large Language Models for Video Surveillance Applications Ulindu De Silva et.al. 2501.02850 null
2025-01-05 Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Simon Park et.al. 2501.02669 link
2025-01-05 Efficient Architectures for High Resolution Vision-Language Models Miguel Carvalho et.al. 2501.02584 link
2025-01-05 FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models Hui Lin et.al. 2501.02461 null
2025-01-04 Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Kangyu Zhu et.al. 2501.02385 null
2025-01-04 Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4 Messi H. J. Lee et.al. 2501.02211 null
2025-01-03 Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding Jiaming Li et.al. 2501.01926 link
2025-01-03 MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning Pu Yang et.al. 2501.01834 null
2025-01-03 LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction Er Jin et.al. 2501.01767 null
2025-01-03 MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao et.al. 2501.01709 null
2025-01-03 GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Zhangyang Qi et.al. 2501.01428 null
2025-01-02 Training Medical Large Vision-Language Models with Abnormal-Aware Feedback Yucheng Zhou et.al. 2501.01377 null
2025-01-02 CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering Ben Vardi et.al. 2501.01371 null
2025-01-02 Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability Dong Shu et.al. 2501.01346 null
2025-01-02 CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries Shudong Liu et.al. 2501.01282 null
2025-01-03 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Wenqi Zhang et.al. 2501.00958 link
2025-01-01 Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models Emily Johnson et.al. 2501.00917 null
2025-01-01 FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation Bingyu Li et.al. 2501.00877 link
2025-01-01 IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models Yiming Zhang et.al. 2501.00848 null
2024-12-31 ICONS: Influence Consensus for Vision-Language Data Selection Xindi Wu et.al. 2501.00654 null
2024-12-30 Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model Yifei Huang et.al. 2412.21080 link
2024-12-30 UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI Fangwei Zhong et.al. 2412.20977 null
2024-12-30 Low-Light Image Enhancement via Generative Perceptual Priors Han Zhou et.al. 2412.20916 null
2024-12-30 WalkVLM:Aid Visually Impaired People Walking by Vision Language Model Zhiqiang Yuan et.al. 2412.20903 null
2024-12-30 Towards Compatible Fine-tuning for Vision-Language Model Updates Zhengbo Wang et.al. 2412.20895 null
2024-12-30 ReStory: VLM-augmentation of Social Human-Robot Interaction Datasets Fanjun Bu et.al. 2412.20826 null
2024-12-30 Are Vision-Language Models Truly Understanding Multi-vision Sensor? Sangyun Chung et.al. 2412.20750 link
2024-12-30 UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models Yujie Li et.al. 2412.20742 link
2024-12-30 M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs Bei Yan et.al. 2412.20718 link
2024-12-30 ChartAdapter: Large Vision-Language Model for Chart Summarization Peixin Xu et.al. 2412.20715 null
2024-12-27 MVTamperBench: Evaluating Robustness of Vision-Language Models Amit Agarwal et.al. 2412.19794 null
2024-12-27 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Qiushi Sun et.al. 2412.19723 null
2024-12-27 Is Your Text-to-Image Model Robust to Caption Noise? Weichen Yu et.al. 2412.19531 null
2024-12-27 MBQ: Modality-Balanced Quantization for Large Vision-Language Models Shiyao Li et.al. 2412.19509 link
2024-12-27 Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models Jie Zhang et.al. 2412.19496 link
2024-12-27 Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation Chengyang Ye et.al. 2412.19492 link
2024-12-26 CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models Kiet A. Nguyen et.al. 2412.19331 null
2024-12-26 Sketch-MoMa: Teleoperation for Mobile Manipulator via Interpretation of Hand-Drawn Sketches Kosei Tanada et.al. 2412.19153 null
2024-12-26 MoPD: Mixture-of-Prompts Distillation for Vision-Language Models Yang Chen et.al. 2412.19087 null
2024-12-26 Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation Tao Liu et.al. 2412.19021 null
2024-12-24 Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models Tahira Kazimi et.al. 2412.18604 null
2024-12-24 The Key of Understanding Vision Tasks: Explanatory Instructions Yang Shen et.al. 2412.18525 link
2024-12-24 LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating Chao Deng et.al. 2412.18424 link
2024-12-24 Weak Scaling Capability in Token Space: An Observation from Large Vision Language Model Tenghui Li et.al. 2412.18387 link
2024-12-24 Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model Yushu Li et.al. 2412.18303 null
2024-12-24 Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight Xi Ding et.al. 2412.18298 link
2024-12-24 Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration Zhixuan Shen et.al. 2412.18292 link
2024-12-24 EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation Shuhao Han et.al. 2412.18150 link
2024-12-24 MMFactory: A Universal Solution Search Engine for Vision-Language Tasks Wan-Cyuan Fan et.al. 2412.18072 null
2024-12-23 ChatGarment: Garment Estimation, Generation and Editing via Large Language Models Siyuan Bian et.al. 2412.17811 null
2024-12-23 Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection Yitong Chen et.al. 2412.17800 link
2024-12-23 Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective Xinmiao Yu et.al. 2412.17787 null
2024-12-23 Reasoning to Attend: Try to Understand How Token Works Rui Qian et.al. 2412.17741 link
2024-12-23 Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection Fenfang Tao et.al. 2412.17619 link
2024-12-23 Personalized Large Vision-Language Models Chau Pham et.al. 2412.17610 null
2024-12-23 Retention Score: Quantifying Jailbreak Risks for Vision Language Models Zaitang Li et.al. 2412.17544 null
2024-12-23 On the Feasibility of Vision-Language Models for Time-Series Classification Vinay Prithyani et.al. 2412.17304 link
2024-12-23 GCS-M3VLT: Guided Context Self-Attention based Multi-modal Medical Vision Language Transformer for Retinal Image Captioning Teja Krishna Cherukuri et.al. 2412.17251 null
2024-12-22 ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues Shaina Raza et.al. 2412.17052 link
2024-12-20 HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding Chenxin Tao et.al. 2412.16158 null
2024-12-20 Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training Mingliang Liang et.al. 2412.16148 link
2024-12-20 Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring Ahmet Bahaddin Ersoz et.al. 2412.16108 null
2024-12-20 VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models Dexter Neo et.al. 2412.15739 null
2024-12-20 Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage Zhi Gao et.al. 2412.15606 null
2024-12-20 VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving Zilin Huang et.al. 2412.15544 null
2024-12-20 PolySmart @ TRECVid 2024 Video-To-Text Jiaxin Wu et.al. 2412.15509 null
2024-12-19 TalkWithMachines: Enhancing Human-Robot Interaction for Interpretable Industrial Robotics Through Large/Vision Language Models Ammar N. Abbas et.al. 2412.15462 null
2024-12-19 PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation Muntasir Wahed et.al. 2412.15209 null
2024-12-19 AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving Shuo Xing et.al. 2412.15206 link
2024-12-19 EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues Sagar Soni et.al. 2412.15190 null
2024-12-19 LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation Weijia Shi et.al. 2412.15188 null
2024-12-19 A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space Yonghao He et.al. 2412.14680 link
2024-12-19 FiVL: A Framework for Improved Vision-Language Alignment Estelle Aflalo et.al. 2412.14672 null
2024-12-19 HarmonicEval: Multi-modal, Multi-task, Multi-criteria Automatic Evaluation Using a Vision Language Model Masanari Ohi et.al. 2412.14613 null
2024-12-19 Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation Jihao Gu et.al. 2412.14487 null
2024-12-19 GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering Saumya Saxena et.al. 2412.14480 null
2024-12-19 MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Junjie Zhou et.al. 2412.14475 null
2024-12-18 Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation Jianyu Zhang et.al. 2412.14145 null
2024-12-18 Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models Ido Cohen et.al. 2412.14133 link
2024-12-18 Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models Xinghang Li et.al. 2412.14058 null
2024-12-18 Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence Jinghan He et.al. 2412.13949 null
2024-12-18 Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition Ethan Baron et.al. 2412.13947 null
2024-12-18 Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection Le Yang et.al. 2412.13817 link
2024-12-18 RelationField: Relate Anything in Radiance Fields Sebastian Koch et.al. 2412.13652 null
2024-12-18 Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation Changsun Lee et.al. 2412.13558 null
2024-12-18 Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning Yingjie Zhu et.al. 2412.13540 link
2024-12-17 Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality Qitong Wang et.al. 2412.13333 link
2024-12-17 HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction Chen Bao et.al. 2412.13187 null
2024-12-17 Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration Mark Endo et.al. 2412.13180 null
2024-12-17 CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models Zihui Cheng et.al. 2412.12932 null
2024-12-17 An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions Shreeyash Gowaikar et.al. 2412.12898 null
2024-12-17 ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation Shiqi Huang et.al. 2412.12798 link
2024-12-17 CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels Shizhuo Deng et.al. 2412.12793 null
2024-12-17 Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference Siyuan Wang et.al. 2412.12785 null
2024-12-17 Defending LVLMs Against Vision Attacks through Partial-Perception Supervision Qi Zhou et.al. 2412.12722 null
2024-12-17 SPHERE: A Hierarchical Evaluation on Spatial Perception and Reasoning for Vision-Language Models Wenyu Zhang et.al. 2412.12693 null
2024-12-17 DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation Qingtao Pan et.al. 2412.12492 link
2024-12-16 Does VLM Classification Benefit from LLM Description Semantics? Pingchuan Ma et.al. 2412.11917 link
2024-12-17 From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach Xilin Wang et.al. 2412.11892 null
2024-12-16 LMM-Regularized CLIP Embeddings for Image Classification Maria Tzelepi et.al. 2412.11663 null
2024-12-16 Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves Shihan Wu et.al. 2412.11509 link
2024-12-16 Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents Wonje Choi et.al. 2412.11484 null
2024-12-16 OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference Wei Chen et.al. 2412.11475 null
2024-12-16 MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation Quan-Sheng Zeng et.al. 2412.11464 link
2024-12-16 Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes Antonio Carlos Rivera et.al. 2412.11396 null
2024-12-16 Temporal Contrastive Learning for Video Temporal Reasoning in Large Vision-Language Models Rafael Souza et.al. 2412.11391 null
2024-12-15 Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval Zelong Sun et.al. 2412.11087 null
2024-12-13 UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities Muhammad Uzair Khattak et.al. 2412.10372 link
2024-12-13 A dual contrastive framework Yuan Sun et.al. 2412.10348 null
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation Hyeonseok Lim et.al. 2412.10151 null
2024-12-13 WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model Songyan Zhang et.al. 2412.09951 null
2024-12-13 CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models Dongyu Yao et.al. 2412.09936 link
2024-12-13 Selective State Space Memory for Large Vision-Language Models Chee Ng et.al. 2412.09875 null
2024-12-12 BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation Pablo Morales-Álvarez et.al. 2412.09718 null
2024-12-13 V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding Junqi Ge et.al. 2412.09616 link
2024-12-12 PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Chenyu Yang et.al. 2412.09613 null
2024-12-12 Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM Han Wang et.al. 2412.09530 link
2024-12-12 Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis Shengxuming Zhang et.al. 2412.09521 null
2024-12-12 ATPrompt: Textual Prompt Learning with Embedded Attributes Zheng Li et.al. 2412.09442 null
2024-12-12 Causal Graphical Models for Vision-Language Compositional Understanding Fiorenzo Parascandolo et.al. 2412.09353 link
2024-12-12 Learning Novel Skills from Language-Generated Demonstrations Ao-Qun Jin et.al. 2412.09286 null
2024-12-12 VLMs meet UDA: Boosting Transferability of Open Vocabulary Segmentation with Unsupervised Domain Adaptation Roberto Alcover-Couso et.al. 2412.09240 null
2024-12-12 A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter Zirun Guo et.al. 2412.08979 null
2024-12-12 GaGA: Towards Interactive Global Geolocation Assistant Zhiyang Dou et.al. 2412.08907 null
2024-12-11 Synthetic Vision: Training Vision-Language Models to Understand Physics Vahid Balazadeh et.al. 2412.08619 null
2024-12-12 Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning Fan Lu et.al. 2412.08614 link
2024-12-11 SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting Pallavi Jain et.al. 2412.08536 link
2024-12-11 POINTS1.5: Building a Vision-Language Model towards Real World Applications Yuan Liu et.al. 2412.08443 null
2024-12-11 LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba Yubo Cui et.al. 2412.08388 null
2024-12-11 HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models Shiding Zhu et.al. 2412.08378 null
2024-12-11 Position-aware Guided Point Cloud Completion with CLIP Model Feng Zhou et.al. 2412.08271 null
2024-12-11 TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning Jingjing Xie et.al. 2412.08176 link
2024-12-11 Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models Quang-Hung Le et.al. 2412.08125 link
2024-12-11 Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models Sri Harsha Dumpala et.al. 2412.08111 null
2024-12-10 RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models Greg Heinrich et.al. 2412.07679 link
2024-12-10 DRUM: Learning Demonstration Retriever for Large MUlti-modal Models Ellen Yi-Ge et.al. 2412.07619 null
2024-12-10 Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios Jiaqi Fan et.al. 2412.07518 link
2024-12-10 SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World Jiaqi Zhang et.al. 2412.07472 link
2024-12-10 MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models Sayak Chakrabarty et.al. 2412.07148 link
2024-12-10 Maya: An Instruction Finetuned Multilingual Multimodal Model Nahid Alam et.al. 2412.07112 link
2024-12-10 Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling Donggeun Kim et.al. 2412.07077 null
2024-12-09 Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models Yi-Lun Lee et.al. 2412.06775 link
2024-12-09 Visual Lexicon: Rich Image Features in Language Space XuDong Wang et.al. 2412.06774 null
2024-12-09 Ranking-aware adapter for text-driven image ordering with CLIP Wei-Hsiang Yu et.al. 2412.06760 link
2024-12-09 ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities Adhiraj Ghosh et.al. 2412.06745 null
2024-12-09 The Narrow Gate: Localized Image-Text Communication in Vision-Language Models Alessandro Serra et.al. 2412.06646 null
2024-12-09 From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding Yixiong Fang et.al. 2412.06474 link
2024-12-09 Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models Wei Suo et.al. 2412.06458 null
2024-12-09 No Annotations for Object Detection in Art through Stable Diffusion Patrick Ramos et.al. 2412.06286 link
2024-12-09 iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models Lianyu Hu et.al. 2412.06263 link
2024-12-09 DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction Yunheng Li et.al. 2412.06244 null
2024-12-06 Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies Recep Firat Cekinel et.al. 2412.05155 link
2024-12-06 Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora Michael Y. Hu et.al. 2412.05149 null
2024-12-06 $S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models Xiaojie Yin et.al. 2412.04925 null
2024-12-06 Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model Keunwoo Peter Yu et.al. 2412.04729 null
2024-12-05 Cross-Self KV Cache Pruning for Efficient Vision-Language Inference Xiaohuan Pei et.al. 2412.04652 link
2024-12-05 VisionZip: Longer is Better but Not Necessary in Vision Language Models Senqiao Yang et.al. 2412.04467 link
2024-12-05 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Enshen Zhou et.al. 2412.04455 null
2024-12-05 Grounding Descriptions in Images informs Zero-Shot Visual Recognition Shaunak Halbe et.al. 2412.04429 link
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Rong Li et.al. 2412.04383 null
2024-12-05 Discriminative Fine-tuning of LVLMs Yassine Ouali et.al. 2412.04378 null
2024-12-05 3D Part Segmentation via Geometric Aggregation of 2D Visual Features Marco Garosi et.al. 2412.04247 null
2024-12-06 VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction Jiahao Zhang et.al. 2412.04237 null
2024-12-05 Unified Framework for Open-World Compositional Zero-shot Learning Hirunima Jayasekara et.al. 2412.04083 link
2024-12-05 GenChaR: A Dataset for Stock Chart Captioning Le Qiu et.al. 2412.04041 null
2024-12-04 FLAIR: VLM with Fine-grained Language-informed Image Representations Rui Xiao et.al. 2412.03561 link
2024-12-04 Best-of-N Jailbreaking John Hughes et.al. 2412.03556 link
2024-12-04 PaliGemma 2: A Family of Versatile VLMs for Transfer Andreas Steiner et.al. 2412.03555 null
2024-12-04 PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation Ao Wang et.al. 2412.03409 link
2024-12-04 A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs Wangbo Zhao et.al. 2412.03324 link
2024-12-04 Composed Image Retrieval for Training-Free Domain Conversion Nikos Efthymiadis et.al. 2412.03297 link
2024-12-04 Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation Gianni Franchi et.al. 2412.03178 null
2024-12-04 AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations? Shouwei Ruan et.al. 2412.03002 null
2024-12-04 Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch Qing Zhang et.al. 2412.02978 null
2024-12-04 Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis Po-Hsuan Huang et.al. 2412.02946 null
2024-12-03 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback Hiroki Furuta et.al. 2412.02617 null
2024-12-03 CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs Abhas Kumar et.al. 2412.02602 null
2024-12-03 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Junyuan Zhang et.al. 2412.02592 link
2024-12-03 Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey Chenyang Liu et.al. 2412.02573 link
2024-12-03 SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection Joongwon Chae et.al. 2412.02565 link
2024-12-03 Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks Jinjin Cai et.al. 2412.02531 null
2024-12-03 OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations Caixin Kang et.al. 2412.02479 null
2024-12-03 BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding Chenguang Huang et.al. 2412.02449 null
2024-12-03 Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation Sepand Dyanatkar et.al. 2412.02262 null
2024-12-03 LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models Fan-Yun Sun et.al. 2412.02193 null
2024-11-29 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks Kim-Celine Kahl et.al. 2411.19688 link
2024-11-29 CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation Qixiu Li et.al. 2411.19650 null
2024-11-29 Interleaved-Modal Chain-of-Thought Jun Gao et.al. 2411.19488 null
2024-11-29 Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis Ruoqi Wang et.al. 2411.19475 null
2024-11-28 Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation Luca Barsellotti et.al. 2411.19331 link
2024-11-28 GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks Muhammad Sohail Danish et.al. 2411.19325 link
2024-11-28 GRAPE: Generalizing Robot Policy via Preference Alignment Zijian Zhang et.al. 2411.19309 null
2024-11-28 VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models Jeongho Ju et.al. 2411.19103 null
2024-11-27 ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics Letian Chen et.al. 2411.18825 null
2024-11-27 Generative Visual Communication in the Era of Vision-Language Models Yael Vinker et.al. 2411.18727 null
2024-11-27 Visual Adversarial Attack on Vision-Language Models for Autonomous Driving Tianyuan Zhang et.al. 2411.18275 null
2024-11-27 SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought Aladin Djuhera et.al. 2411.18212 null
2024-11-27 From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects Zizhao Li et.al. 2411.18207 link
2024-11-27 Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Di Zhang et.al. 2411.18203 null
2024-11-27 DistinctAD: Distinctive Audio Description Generation in Contexts Bo Fang et.al. 2411.18180 null
2024-11-27 COREval: A Comprehensive and Objective Benchmark for Evaluating the Remote Sensing Capabilities of Large Vision-Language Models Xiao An et.al. 2411.18145 null
2024-11-27 When Large Vision-Language Models Meet Person Re-Identification Qizao Wang et.al. 2411.18111 null
2024-11-27 Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis Weiqin Zhao et.al. 2411.18101 link
2024-11-27 VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis Donggoo Kang et.al. 2411.18038 null
2024-11-28 Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models Shuyang Hao et.al. 2411.18000 null
2024-11-26 What's in the Image? A Deep-Dive into the Vision of Vision Language Models Omri Kaduri et.al. 2411.17491 null
2024-11-26 VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models Lei Li et.al. 2411.17451 null
2024-11-26 CoA: Chain-of-Action for Generative Semantic Labels Meng Wei et.al. 2411.17406 link
2024-11-26 Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Dongping Chen et.al. 2411.17188 null
2024-11-26 Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation Chanyoung Kim et.al. 2411.17150 null
2024-11-26 Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models Jaemin Kim et.al. 2411.17041 null
2024-11-26 Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation Shambhavi Mishra et.al. 2411.17002 link
2024-11-25 Probing the limitations of multimodal language models for chemistry and materials research Nawaf Alampara et.al. 2411.16955 link
2024-11-25 Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge Yaqi Zhao et.al. 2411.16824 null
2024-11-25 Generating Out-Of-Distribution Scenarios Using Language Models Erfan Aasi et.al. 2411.16554 null
2024-11-25 RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Chan Hee Song et.al. 2411.16537 null
2024-11-25 Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis Boming Miao et.al. 2411.16503 null
2024-11-25 A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models Manuel Schwonberg et.al. 2411.16407 null
2024-11-25 CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain Jingchao Peng et.al. 2411.16327 null
2024-11-25 Open-Vocabulary Octree-Graph for 3D Scene Understanding Zhigang Wang et.al. 2411.16253 null
2024-11-25 Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models Niloufar Alipour Talemi et.al. 2411.16018 null
2024-11-24 Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation Sule Bai et.al. 2411.15869 link
2024-11-24 ResCLIP: Residual Attention for Training-free Dense Vision-language Inference Yuhang Yang et.al. 2411.15851 link
2024-11-24 VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding Jiaqi Wang et.al. 2411.15839 null
2024-11-22 Context-Aware Multimodal Pretraining Karsten Roth et.al. 2411.15099 null
2024-11-22 Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning Junjie Shan et.al. 2411.14937 link
2024-11-22 ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos Tanveer Hannan et.al. 2411.14901 null
2024-11-22 VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models Camilo Chacón Sartori et.al. 2411.14832 null
2024-11-22 Continual SFT Matches Multimodal RLHF with Negative Supervision Ke Zhu et.al. 2411.14797 null
2024-11-22 VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Songhao Han et.al. 2411.14794 link
2024-11-22 Effective SAM Combination for Open-Vocabulary Semantic Segmentation Minhyeok Lee et.al. 2411.14723 null
2024-11-21 GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI Tianbin Li et.al. 2411.14522 link
2024-11-21 Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance Haozhe Zhao et.al. 2411.14279 null
2024-11-21 FoPru: Focal Pruning for Efficient Large Vision-Language Models Lei Jiang et.al. 2411.14164 null
2024-11-21 Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset Heejeong Nam et.al. 2411.14137 link
2024-11-20 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Davide Paglieri et.al. 2411.13543 null
2024-11-20 Teaching VLMs to Localize Specific Objects from In-context Examples Sivan Doveh et.al. 2411.13317 link
2024-11-20 XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation Ziyi Wang et.al. 2411.13243 link
2024-11-21 ViSTa Dataset: Do vision-language models understand sequential tasks? Evžen Wybitul et.al. 2411.13211 link
2024-11-20 TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models Xin Wang et.al. 2411.13136 null
2024-11-19 VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Vishwesh Nath et.al. 2411.12915 null
2024-11-19 CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs Zhehan Kan et.al. 2411.12713 null
2024-11-18 Vision Language Models Are Few-Shot Audio Spectrogram Classifiers Satvik Dixit et.al. 2411.12058 null
2024-11-18 ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements M. Arda Aydın et.al. 2411.12044 link
2024-11-18 MC-LLaVA: Multi-Concept Personalized Vision-Language Model Ruichuan An et.al. 2411.11706 link
2024-11-18 TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World Xianlong Wang et.al. 2411.11683 null
2024-11-18 VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation Bangguo Yu et.al. 2411.11609 null
2024-11-18 Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment Zhendong Liu et.al. 2411.11543 null
2024-11-19 Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models Chenhang Cui et.al. 2411.11496 link
2024-11-18 Exploring Emerging Trends and Research Opportunities in Visual Place Recognition Antonios Gasteratos et.al. 2411.11481 null
2024-11-18 Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts Jingxuan Li et.al. 2411.11479 null
2024-11-18 Efficient Transfer Learning for Video-language Foundation Models Haoxing Chen et.al. 2411.11223 link
2024-11-17 Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection Wentao Bao et.al. 2411.10922 null
2024-11-16 MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection Xu Cao et.al. 2411.10888 link
2024-11-15 VeriGraph: Scene Graphs for Execution Verifiable Robot Planning Daniel Ekpo et.al. 2411.10446 null
2024-11-15 LLaVA-o1: Let Vision Language Models Reason Step-by-Step Guowei Xu et.al. 2411.10440 link
2024-11-15 SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning Zewen Chen et.al. 2411.10161 link
2024-11-15 Federated Domain Generalization via Prompt Learning and Aggregation Shuai Gong et.al. 2411.10063 link
2024-11-15 Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement Yanyan Huang et.al. 2411.09894 link
2024-11-14 LLV-FSR: Exploiting Large Language-Vision Prior for Face Super-resolution Chenyang Wang et.al. 2411.09293 null
2024-11-13 ClevrSkills: Compositional Language and Visual Reasoning in Robotics Sanjay Haresh et.al. 2411.09052 link
2024-11-13 DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models Yongdong Wang et.al. 2411.09022 link
2024-11-13 Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions Moran Yanuka et.al. 2411.09018 null
2024-11-13 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Daniel P. Jeong et.al. 2411.08870 link
2024-11-13 Sharingan: Extract User Action Sequence from Desktop Recordings Yanting Chen et.al. 2411.08768 null
2024-11-13 Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification Jose-Luis Matez-Bandera et.al. 2411.08727 link
2024-11-13 LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation Pengwei Yin et.al. 2411.08606 null
2024-11-13 NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation Youzhi Liu et.al. 2411.08579 null
2024-11-13 Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints Nishanth Kumar et.al. 2411.08253 null
2024-11-12 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Yiyang Ma et.al. 2411.07975 link
2024-11-12 Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer's Disease Francesco Chiumento et.al. 2411.07871 null
2024-11-12 BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Anas Awadalla et.al. 2411.07461 null
2024-11-11 SAMPart3D: Segment Any Part in 3D Objects Yunhan Yang et.al. 2411.07184 link
2024-11-11 StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification Yichen He et.al. 2411.07076 link
2024-11-11 UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models Jiachen Liang et.al. 2411.06921 null
2024-11-11 Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning Hongsheng Zhang et.al. 2411.06764 null
2024-11-11 Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models Jungseok Hong et.al. 2411.06752 null
2024-11-11 Renaissance: Investigating the Pretraining of Vision-Language Encoders Clayton Fields et.al. 2411.06657 link
2024-11-09 Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models Arshia Hemmat et.al. 2411.06287 link
2024-11-09 Aquila-plus: Prompt-Driven Visual-Language Models for Pixel-Level Remote Sensing Image Understanding Kaixuan Lu et.al. 2411.06142 null
2024-11-09 Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension Kaixuan Lu et.al. 2411.06074 null
2024-11-09 GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection Jiyul Ham et.al. 2411.06071 link
2024-11-08 End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering Dylan Goetting et.al. 2411.05755 link
2024-11-08 Poze: Sports Technique Feedback under Data Constraints Agamdeep Singh et.al. 2411.05734 null
2024-11-08 A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis Cristiano Patrício et.al. 2411.05609 link
2024-11-08 Enhancing Visual Classification using Comparative Descriptors Hankyeol Lee et.al. 2411.05357 link
2024-11-08 Real-World Offline Reinforcement Learning from Vision Language Model Feedback Sreyas Venkataraman et.al. 2411.05273 null
2024-11-07 On Erroneous Agreements of CLIP Image Embeddings Siting Li et.al. 2411.05195 null
2024-11-07 Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model Sheng Cheng et.al. 2411.05079 link
2024-11-07 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation Peiqi Liu et.al. 2411.04999 link
2024-11-07 A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model Panwen Hu et.al. 2411.04942 null
2024-11-07 In the Era of Prompt Learning with Vision-Language Models Ankit Jha et.al. 2411.04892 null
2024-11-07 TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models Jonathan Fhima et.al. 2411.04642 null
2024-11-07 Vision Language Models are In-Context Value Learners Yecheng Jason Ma et.al. 2411.04549 null
2024-11-07 BendVLM: Test-Time Debiasing of Vision-Language Embeddings Walter Gerych et.al. 2411.04420 link
2024-11-06 Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models Saketh Bachu et.al. 2411.04291 null
2024-11-06 Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong et.al. 2411.04118 link
2024-11-06 RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models Maya Varma et.al. 2411.04097 link
2024-11-06 H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models Nhi Pham et.al. 2411.04077 null
2024-11-06 Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval Davide Buoso et.al. 2411.04006 null
2024-11-06 Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models Minh Duc Bui et.al. 2411.03888 link
2024-11-06 DesignMinds: Enhancing Video-Based Design Ideation with Vision-Language Model and Context-Injected Large Language Model Tianhao He et.al. 2411.03827 null
2024-11-06 Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction Muhammad Tayyab Khan et.al. 2411.03707 null
2024-11-05 Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset Yingzi Ma et.al. 2411.03554 link
2024-11-05 VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation Haochen Zhang et.al. 2411.03540 link
2024-11-05 An Application-Agnostic Automatic Target Recognition System Using Vision Language Models Anthony Palladino et.al. 2411.03491 null
2024-11-05 Inference Optimal VLMs Need Only One Visual Token but Larger Models Kevin Y. Li et.al. 2411.03312 link
2024-11-05 HumanVLM: Foundation for Human-Scene Vision-Language Model Dawei Dai et.al. 2411.03034 null
2024-11-05 Membership Inference Attacks against Large Vision-Language Models Zhan Li et.al. 2411.02902 link
2024-11-05 Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs Muhammad Tayyab Khan et.al. 2411.02810 null
2024-11-05 Label Critic: Design Data Before Models Pedro R. A. S. Bassi et.al. 2411.02753 link
2024-11-05 DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark Haodong Li et.al. 2411.02733 link
2024-11-05 V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization Yuxi Xie et.al. 2411.02712 link
2024-11-04 Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models Meng Cao et.al. 2411.02564 link
2024-11-04 INQUIRE: A Natural World Text-to-Image Retrieval Benchmark Edward Vendrow et.al. 2411.02537 link
2024-11-04 One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering Deepayan Das et.al. 2411.02210 null
2024-11-04 GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery Bhupendra Solanki et.al. 2411.02074 null
2024-11-03 Addressing Failures in Robotics using Vision-Based Language Models (VLMs) and Behavior Trees (BT) Faseeh Ahmad et.al. 2411.01568 null
2024-11-03 Integration of Large Vision Language Models for Efficient Post-disaster Damage Assessment and Reporting Zhaohui Chen et.al. 2411.01511 null
2024-11-03 A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning Fei Wang et.al. 2411.01445 null
2024-11-01 Identifying Implicit Social Biases in Vision-Language Models Kimia Hamidieh et.al. 2411.00997 null
2024-11-01 Retrieval-enriched zero-shot image classification in low-resource domains Nicola Dall'Asen et.al. 2411.00988 null
2024-11-01 Does GenAI Make Usability Testing Obsolete? Ali Ebrahimi Pourasad et.al. 2411.00634 null
2024-11-01 CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision Gi-Cheon Kang et.al. 2411.00508 null
2024-11-01 Right this way: Can VLMs Guide Us to See More to Answer Questions? Li Liu et.al. 2411.00394 link
2024-10-31 $π_0$: A Vision-Language-Action Flow Model for General Robot Control Kevin Black et.al. 2410.24164 null
2024-10-31 Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age Nouar AlDahoul et.al. 2410.24148 null
2024-10-31 Bayesian-guided Label Mapping for Visual Reprogramming Chengyi Cai et.al. 2410.24018 link
2024-10-31 EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection Qinqian Lei et.al. 2410.23904 link
2024-10-31 Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP Chen Huang et.al. 2410.23698 null
2024-10-31 Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey Chiyu Zhang et.al. 2410.23687 null
2024-10-31 SuctionPrompt: Visual-assisted Robotic Picking with a Suction Cup Using Vision-Language Models and Facile Hardware Design Tomohiro Motoda et.al. 2410.23640 null
2024-10-30 Keypoint Abstraction using Large Models for Object-Relative Imitation Learning Xiaolin Fang et.al. 2410.23254 null
2024-10-30 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Zhiyong Wu et.al. 2410.23218 link
2024-10-30 VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Yichao Liang et.al. 2410.23156 null
2024-10-30 Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Junjie Wu et.al. 2410.23114 link
2024-10-30 An Individual Identity-Driven Framework for Animal Re-Identification Yihao Wu et.al. 2410.22927 link
2024-10-30 Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector Youcheng Huang et.al. 2410.22888 link
2024-10-30 Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization Kento Kawaharazuka et.al. 2410.22707 null
2024-10-30 SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset Ngoc Dung Huynh et.al. 2410.22648 null
2024-10-29 Image2Struct: Benchmarking Structure Extraction for Vision-Language Models Josselin Somerville Roberts et.al. 2410.22456 null
2024-10-29 Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier Kai Wang et.al. 2410.22317 link
2024-10-29 Natural Language Inference Improves Compositionality in Vision-Language Models Paola Cascante-Bonilla et.al. 2410.22315 null
2024-10-29 Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving Bo Jiang et.al. 2410.22313 link
2024-10-29 ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising Ashutosh Chaubey et.al. 2410.22233 link
2024-10-29 Active Learning for Vision-Language Models Bardia Safaei et.al. 2410.22187 null
2024-10-29 Are VLMs Really Blind Ayush Singh et.al. 2410.22029 link
2024-10-29 Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation Halil Utku Unlu et.al. 2410.21926 null
2024-10-30 Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models Lu Yu et.al. 2410.21802 link
2024-10-29 PerSRV: Personalized Sticker Retrieval with Vision-Language Model Heng Er Metilda Chee et.al. 2410.21801 link
2024-10-29 AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Han Bao et.al. 2410.21259 link
2024-10-28 Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce Zhantao Yang et.al. 2410.21237 null
2024-10-28 Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines Zhixin Zhang et.al. 2410.21220 link
2024-10-29 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Qintong Zhang et.al. 2410.21169 null
2024-10-28 Zero-Shot Action Recognition in Surveillance Videos Joao Pereira et.al. 2410.21113 null
2024-10-28 BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks Yunhan Zhao et.al. 2410.20971 null
2024-10-29 VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions Guanyan Chen et.al. 2410.20927 null
2024-10-28 Improving Generalization in Visual Reasoning via Self-Ensemble Tien-Huy Nguyen et.al. 2410.20883 null
2024-10-28 Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments Sangmim Song et.al. 2410.20666 null
2024-10-27 MatViX: Multimodal Information Extraction from Visually Rich Articles Ghazal Khalighinejad et.al. 2410.20494 null
2024-10-25 Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models Yucheng Zhou et.al. 2410.19732 null
2024-10-25 GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing Hosam Elgendy et.al. 2410.19552 link
2024-10-25 Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? Antonia Wüst et.al. 2410.19546 link
2024-10-25 EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data Xuetian Chen et.al. 2410.19461 null
2024-10-25 COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Haocheng Xi et.al. 2410.19313 link
2024-10-25 Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting Xingyu Zhu et.al. 2410.19294 null
2024-10-24 Probabilistic Language-Image Pre-Training Sanghyuk Chun et.al. 2410.18857 link
2024-10-24 Zero-shot Object Navigation with Vision-Language Models Reasoning Congcong Wen et.al. 2410.18570 null
2024-10-24 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data Shuhao Gu et.al. 2410.18558 null
2024-10-24 Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics Jinghao Hu et.al. 2410.18537 null
2024-10-23 Lightweight Neural App Control Filippos Christianos et.al. 2410.17883 null
2024-10-23 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Shaofei Cai et.al. 2410.17856 link
2024-10-23 RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification Marco Mistretta et.al. 2410.17827 null
2024-10-23 An Intelligent Agentic System for Complex Image Restoration Problems Kaiwen Zhu et.al. 2410.17809 link
2024-10-23 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Ziyu Liu et.al. 2410.17637 link
2024-10-22 AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents Chejian Xu et.al. 2410.17401 null
2024-10-22 Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities Zheyuan Zhang et.al. 2410.17385 link
2024-10-22 PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Long Xing et.al. 2410.17247 link
2024-10-22 MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model Meng Xu et.al. 2410.16840 null
2024-10-21 Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives Angelo Moroncelli et.al. 2410.16411 link
2024-10-21 VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use Zhehao Zhang et.al. 2410.16400 null
2024-10-21 Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping Ryan Li et.al. 2410.16232 null
2024-10-21 Improve Vision Language Model Chain-of-thought Reasoning Ruohong Zhang et.al. 2410.16198 link
2024-10-21 Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning Yihong Tang et.al. 2410.16162 null
2024-10-21 Mitigating Object Hallucination via Concentric Causal Attention Yun Xing et.al. 2410.15926 link
2024-10-21 MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images Pablo Meseguer et.al. 2410.15881 null
2024-10-21 Task-oriented Robotic Manipulation with Vision Language Models Nurhan Bulus Guran et.al. 2410.15863 null
2024-10-21 An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps Ziyi Liu et.al. 2410.15780 link
2024-10-22 Reducing Hallucinations in Vision-Language Models via Latent Space Steering Sheng Liu et.al. 2410.15778 link
2024-10-21 CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models Jianjun Gao et.al. 2410.15657 null
2024-10-21 A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM ByungOk Han et.al. 2410.15549 null
2024-10-18 NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Baiqi Li et.al. 2410.14669 null
2024-10-18 Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets Namid R. Stillman et.al. 2410.14587 null
2024-10-18 CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection Andrea Appiani et.al. 2410.14509 null
2024-10-18 Zero-shot Action Localization via the Confidence of Large Vision-Language Models Josiah Aklilu et.al. 2410.14340 null
2024-10-18 E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model Haoran Lai et.al. 2410.14200 null
2024-10-18 LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs Yujun Zhou et.al. 2410.14182 null
2024-10-18 MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems Zifeng Zhu et.al. 2410.14179 null
2024-10-18 ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom Jingqi Zhou et.al. 2410.14138 null
2024-10-17 Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers Yuxin Wen et.al. 2410.14072 null
2024-10-17 Reproducibility study of "LICO: Explainable Models with Language-Image Consistency" Luan Fletcher et.al. 2410.13989 link
2024-10-17 VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding Runsen Xu et.al. 2410.13860 link
2024-10-17 Differentiable Robot Rendering Ruoshi Liu et.al. 2410.13851 null
2024-10-17 Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning Xiaodan Xing et.al. 2410.13823 link
2024-10-17 Improving Multi-modal Large Language Model through Boosting Vision Capabilities Yanpeng Sun et.al. 2410.13733 null
2024-10-17 VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks Shailaja Keyur Sampat et.al. 2410.13666 link
2024-10-17 H2OVL-Mississippi Vision Language Models Technical Report Shaikat Galib et.al. 2410.13611 null
2024-10-17 GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models Aditya Sharma et.al. 2410.13510 null
2024-10-17 Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding Kyungmin Min et.al. 2410.13321 null
2024-10-17 Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead Kuleen Sasse et.al. 2410.13146 link
2024-10-17 Trust but Verify: Programmatic VLM Evaluation in the Wild Viraj Prabhu et.al. 2410.13121 null
2024-10-16 Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models Ce Zhang et.al. 2410.12790 link
2024-10-16 Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions Zhenyu Jiang et.al. 2410.12773 null
2024-10-16 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation João Matos et.al. 2410.12722 link
2024-10-16 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Genta Indra Winata et.al. 2410.12705 link
2024-10-16 VividMed: Vision Language Model with Versatile Visual Grounding for Medicine Lingxiao Luo et.al. 2410.12694 link
2024-10-16 Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models Shicheng Xu et.al. 2410.12662 null
2024-10-16 FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion Jiacheng Ruan et.al. 2410.12564 link
2024-10-16 Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety Lucas Choi et.al. 2410.12225 null
2024-10-16 Leveraging Large Vision Language Model For Better Automatic Web GUI Testing Siyi Wang et.al. 2410.12157 null
2024-10-15 Enabling Data-Driven and Empathetic Interactions: A Context-Aware 3D Virtual Agent in Mixed Reality for Enhanced Financial Customer Experience Cindy Xu et.al. 2410.12051 null
2024-10-15 A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem Kun Ding et.al. 2410.11686 null
2024-10-15 MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval Reno Kriz et.al. 2410.11619 null
2024-10-15 PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model Shang-Ching Liu et.al. 2410.11564 null
2024-10-15 LargePiG: Your Large Language Model is Secretly a Pointer Generator Zhongxiang Sun et.al. 2410.11366 null
2024-10-15 CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification Huazhong Zhao et.al. 2410.11255 null
2024-10-15 Tree of Attributes Prompt Learning for Vision-Language Models Tong Ding et.al. 2410.11201 null
2024-10-14 Locality Alignment Improves Vision-Language Models Ian Covert et.al. 2410.11087 null
2024-10-14 Towards Foundation Models for 3D Vision: How Close Are We? Yiming Zuo et.al. 2410.10799 link
2024-10-14 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Shi Yu et.al. 2410.10594 link
2024-10-14 Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification Jiaxiang Gou et.al. 2410.10573 link
2024-10-14 MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Jiacheng Chen et.al. 2410.10563 link
2024-10-14 LG-CAV: Train Any Concept Activation Vector with Language Guidance Qihan Huang et.al. 2410.10308 null
2024-10-14 Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection Jiawen Zhu et.al. 2410.10289 link
2024-10-14 LOBG: Less Overfitting for Better Generalization in Vision-Language Model Chenhao Ding et.al. 2410.10247 null
2024-10-14 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Peng Xia et.al. 2410.10139 link
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 null
2024-10-14 Can We Predict Performance of Large Models across Vision-Language Tasks? Qinyu Zhao et.al. 2410.10112 link
2024-10-11 Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models Qin Liu et.al. 2410.09047 null
2024-10-11 The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals Xiaofeng Wu et.al. 2410.09013 null
2024-10-11 SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation Haosheng Li et.al. 2410.08901 null
2024-10-11 Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation Kun Ding et.al. 2410.08895 null
2024-10-11 RoRA-VLM: Robust Retrieval-Augmented Vision Language Models Jingyuan Qi et.al. 2410.08876 null
2024-10-11 Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies Yingqiang Gao et.al. 2410.08860 null
2024-10-11 VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model Beichen Wang et.al. 2410.08792 null
2024-10-11 Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models Reza Abbasi et.al. 2410.08791 link
2024-10-11 Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping Yue Yang et.al. 2410.08695 link
2024-10-11 Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models Mengyuan Chen et.al. 2410.08611 link
2024-10-10 MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models Wenbo Hu et.al. 2410.08182 null
2024-10-10 On the Evaluation of Generative Robotic Simulations Feng Chen et.al. 2410.08172 null
2024-10-10 Q-VLM: Post-training Quantization for Large Vision-Language Models Changyuan Wang et.al. 2410.08119 link
2024-10-10 Unsupervised Data Validation Methods for Efficient Model Training Yurii Paniv et.al. 2410.07880 null
2024-10-10 HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter Yumiao Zhao et.al. 2410.07854 null
2024-10-10 FLIER: Few-shot Language Image Models Embedded with Latent Representations Zhinuo Zhou et.al. 2410.07648 null
2024-10-10 A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks Hoin Jung et.al. 2410.07593 link
2024-10-10 3D Vision-Language Gaussian Splatting Qucheng Peng et.al. 2410.07577 null
2024-10-10 How Does Vision-Language Adaptation Impact the Safety of Vision Language Models? Seongyun Lee et.al. 2410.07571 null
2024-10-10 CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection Guankun Wang et.al. 2410.07540 link
2024-10-09 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate Qidong Huang et.al. 2410.07167 link
2024-10-09 Towards Interpreting Visual Information Processing in Vision-Language Models Clement Neo et.al. 2410.07149 link
2024-10-10 EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Rui Zhao et.al. 2410.07133 link
2024-10-09 VHELM: A Holistic Evaluation of Vision Language Models Tony Lee et.al. 2410.07112 link
2024-10-09 Pixtral 12B Pravesh Agrawal et.al. 2410.07073 link
2024-10-09 Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback Dennis Hein et.al. 2410.07025 null
2024-10-09 $\texttt{ModSCAN}$: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities Yukun Jiang et.al. 2410.06967 link
2024-10-09 Compositional Entailment Learning for Hyperbolic Vision-Language Models Avik Pal et.al. 2410.06912 null
2024-10-09 From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models Yuying Shang et.al. 2410.06795 null
2024-10-09 Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models Yubo Wang et.al. 2410.06699 null
2024-10-07 Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia Mohammad Fahes et.al. 2410.05270 link
2024-10-07 TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens Ya-Qi Yu et.al. 2410.05261 null
2024-10-08 TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models Rabin Adhikari et.al. 2410.05239 link
2024-10-07 LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation Zhijie Wang et.al. 2410.05191 null
2024-10-07 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Ziyan Jiang et.al. 2410.05160 null
2024-10-07 HE-Drive: Human-Like End-to-End Driving with Vision Language Models Junming Wang et.al. 2410.05051 null
2024-10-07 TLDR: Token-Level Detective Reward Model for Large Vision Language Models Deqing Fu et.al. 2410.04734 null
2024-10-06 Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress Christopher Agia et.al. 2410.04640 null
2024-10-06 Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models Salma Abdel Magid et.al. 2410.04634 null
2024-10-06 LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking Alimohammad Beigi et.al. 2410.04616 null
2024-10-04 Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models Tinghui Zhu et.al. 2410.03659 link
2024-10-04 LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos Noriaki Hirose et.al. 2410.03603 null
2024-10-04 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation Ahmed Abdulaal et.al. 2410.03334 null
2024-10-04 Generalizable Prompt Tuning for Vision-Language Models Qian Zhang et.al. 2410.03189 null
2024-10-04 Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models Yufang Liu et.al. 2410.03176 link
2024-10-04 CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization Shigemichi Matsuzaki et.al. 2410.03054 null
2024-10-07 Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL Naoaki Kanazawa et.al. 2410.02874 null
2024-10-03 Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations Nick Jiang et.al. 2410.02762 link
2024-10-03 DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Zhaowei Wang et.al. 2410.02730 link
2024-10-03 Unified Multi-Modal Interleaved Document Representation for Information Retrieval Jaewoo Lee et.al. 2410.02729 null
2024-10-03 Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models Shuoyuan Wang et.al. 2410.02681 null
2024-10-03 LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model Duy M. H. Nguyen et.al. 2410.02615 null
2024-10-03 Guiding Long-Horizon Task and Motion Planning with Vision Language Models Zhutian Yang et.al. 2410.02193 null
2024-10-02 Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning Xiao Yu et.al. 2410.02052 null
2024-10-02 Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description Mahshid Dehghani et.al. 2410.02049 null
2024-10-02 Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval Kyle Buettner et.al. 2410.02027 null
2024-10-02 Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker Xinlong Hou et.al. 2410.01966 null
2024-10-03 Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia et.al. 2410.01744 link
2024-10-03 LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models Zhenyue Qin et.al. 2410.01620 null
2024-10-02 Toward a Holistic Evaluation of Robustness in CLIP Models Weijie Tu et.al. 2410.01534 null
2024-10-03 LEGO: Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion Dexuan Ding et.al. 2410.01506 null
2024-10-02 Information-Theoretical Principled Trade-off between Jailbreakability and Stealthiness on Vision Language Models Ching-Chia Kao et.al. 2410.01438 null
2024-10-02 Backdooring Vision-Language Models with Out-Of-Distribution Data Weimin Lyu et.al. 2410.01264 null
2024-10-02 UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark Hasnat Md Abdullah et.al. 2410.01180 link
2024-10-01 ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding Liang Shi et.al. 2410.00982 null
2024-10-01 Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion Lakshmi Nair et.al. 2410.00731 link
2024-10-01 Find Everything: A General Vision Language Model Approach to Multi-Object Search Daniel Choi et.al. 2410.00388 null
2024-09-30 UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Qiaojun Yu et.al. 2409.20551 null
2024-09-30 Robi Butler: Remote Multimodal Interactions with Household Robot Assistant Anxing Xiao et.al. 2409.20548 null
2024-09-30 Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments Mohamed Elnoor et.al. 2409.20445 null
2024-09-30 HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding Fan Yuan et.al. 2409.20429 null
2024-09-30 World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Jiacong Wang et.al. 2409.20424 link
2024-09-30 CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset Akshatha Arodi et.al. 2409.20353 link
2024-09-30 Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function Chenyi Zhuang et.al. 2409.19967 link
2024-09-30 Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels Heeseong Shin et.al. 2409.19846 null
2024-09-30 Textual Training for the Hassle-Free Removal of Unwanted Visual Data Saehyung Lee et.al. 2409.19840 link
2024-09-29 PALM: Few-Shot Prompt Learning for Audio Language Models Asif Hanif et.al. 2409.19806 null
2024-09-27 Image-guided topic modeling for interpretable privacy classification Alina Elena Baia et.al. 2409.18674 link
2024-09-26 SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation Xin Li et.al. 2409.18082 null
2024-09-26 Infering Alt-text For UI Icons With Large Language Models During App Development Sabrina Haque et.al. 2409.18060 null
2024-09-26 EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Kai Chen et.al. 2409.18042 null
2024-09-26 DARE: Diverse Visual Question Answering with Robustness Evaluation Hannah Sterz et.al. 2409.18023 null
2024-09-26 The Hard Positive Truth about Vision-Language Compositionality Amita Kamath et.al. 2409.17958 link
2024-09-26 Cascade Prompt Learning for Vision-Language Model Adaptation Ge Wu et.al. 2409.17805 link
2024-09-26 Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications Nghia Nguyen et.al. 2409.17727 null
2024-09-26 AP-VLM: Active Perception Enabled by Vision-Language Models Venkatesh Sripada et.al. 2409.17641 null
2024-09-26 P4Q: Learning to Prompt for Quantization in Visual-language Models Huixin Sun et.al. 2409.17634 null
2024-09-26 Leveraging Semantic and Geometric Information for Zero-Shot Robot-to-Human Handover Jiangshan Liu et.al. 2409.17621 null
2024-09-25 Attention Prompting on Image for Large Vision-Language Models Runpeng Yu et.al. 2409.17143 link
2024-09-25 Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset Andrew Goldberg et.al. 2409.17126 null
2024-09-25 Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? Bowen Zhao et.al. 2409.17080 link
2024-09-25 GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design Phillip Mueller et.al. 2409.17045 null
2024-09-25 Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification Ming Li et.al. 2409.16718 link
2024-09-24 A Unified Hallucination Mitigation Framework for Large Vision-Language Models Yue Chang et.al. 2409.16494 link
2024-09-24 BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes Kasun Weerakoon et.al. 2409.16484 null
2024-09-24 Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation Yong Xien Chng et.al. 2409.16278 null
2024-09-24 ComiCap: A VLMs pipeline for dense captioning of Comic Panels Emanuele Vivoli et.al. 2409.16159 link
2024-09-24 Bridging Environments and Language with Rendering Functions and Vision-Language Models Theo Cachet et.al. 2409.16024 null
2024-09-18 Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Peng Wang et.al. 2409.12191 link
2024-09-18 Mixture of Prompt Learning for Vision Language Models Yu Du et.al. 2409.12011 null
2024-09-18 GauTOAO: Gaussian-based Task-Oriented Affordance of Objects Jiawen Wang et.al. 2409.11941 null
2024-09-18 LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models Amaia Cardiel et.al. 2409.11919 null
2024-09-17 CAST: Cross-modal Alignment Similarity Test for Vision Language Models Gautier Dagan et.al. 2409.11007 link
2024-09-17 KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph Yanbei Jiang et.al. 2409.10921 link
2024-09-16 Benchmarking VLMs' Reasoning About Persuasive Atypical Images Sina Malakouti et.al. 2409.10719 null
2024-09-16 MotIF: Motion Instruction Fine-tuning Minyoung Hwang et.al. 2409.10683 null
2024-09-16 Do Pre-trained Vision-Language Models Encode Object States? Kaleb Newman et.al. 2409.10488 null
2024-09-16 CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera Jingpei Lu et.al. 2409.10441 null
2024-09-16 HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models Vineet Bhat et.al. 2409.10419 null
2024-09-16 NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions Zhixi Cai et.al. 2409.10196 null
2024-09-16 MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior Weijing Tao et.al. 2409.10090 link
2024-09-17 IRIS: Interactive Responsive Intelligent Segmentation for 3D Affordance Analysis Meng Chu et.al. 2409.10078 null
2024-09-15 FSL-LVLM: Friction-Aware Safety Locomotion using Large Vision Language Model in Wheeled Robots Bo Peng et.al. 2409.09845 null
2024-09-15 Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models Yuan-Hong Liao et.al. 2409.09788 null
2024-09-15 Finetuning CLIP to Reason about Pairwise Differences Dylan Sam et.al. 2409.09721 link
2024-09-15 Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs Mengmeng Ren et.al. 2409.09715 null
2024-09-13 Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation Hangyu Li et.al. 2409.08598 null
2024-09-13 ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning Pei Deng et.al. 2409.08582 null
2024-09-13 Generalization Boosted Adapter for Open-Vocabulary Segmentation Wenhao Xu et.al. 2409.08468 null
2024-09-12 Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations Samyak Rawlekar et.al. 2409.08381 null
2024-09-12 ComAlign: Compositional Alignment in Vision-Language Models Ali Abdollah et.al. 2409.08206 null
2024-09-12 What Makes a Maze Look Like a Maze? Joy Hsu et.al. 2409.08202 null
2024-09-12 DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Liqiang Jing et.al. 2409.07703 link
2024-09-12 Open-Vocabulary Remote Sensing Image Semantic Segmentation Qinglong Cao et.al. 2409.07683 link
2024-09-11 Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks Md Zarif Hossain et.al. 2409.07353 link
2024-09-14 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-11 Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations Keumgang Cha et.al. 2409.07048 null
2024-09-10 ExIQA: Explainable Image Quality Assessment Using Distortion Attributes Sepehr Kazemi Ranjbar et.al. 2409.06853 null
2024-09-10 DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks Amin Karimi Monsefi et.al. 2409.06809 link
2024-09-09 NeIn: Telling What You Don't Want Nhat-Tan Bui et.al. 2409.06481 null
2024-09-10 MAGDA: Multi-agent guideline-driven diagnostic assistance David Bani-Harouni et.al. 2409.06351 null
2024-09-10 INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding Ji Ha Jang et.al. 2409.06210 null
2024-09-10 Revisiting Prompt Pretraining of Vision-Language Models Zhenyuan Chen et.al. 2409.06166 null
2024-09-09 PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems Aditya Narayanan et.al. 2409.06078 link
2024-09-09 DexDiff: Towards Extrinsic Dexterity Manipulation of Ungraspable Objects in Unrestricted Environments Chengzhong Ma et.al. 2409.05493 null
2024-09-09 From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models Tessa Pulli et.al. 2409.05413 null
2024-09-11 A Survey of Multimodal Composite Editing and Retrieval Suyan Li et.al. 2409.05405 link
2024-09-09 Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling Georgios Pantazopoulos et.al. 2409.05395 link
2024-09-08 PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions Yudong Zhang et.al. 2409.05076 link
2024-09-07 POINTS: Improving Your Vision-language Model with Affordable Strategies Yuan Liu et.al. 2409.04828 null
2024-09-07 Enhancing Outlier Knowledge for Few-Shot Out-of-Distribution Detection with Extensible Local Prompts Fanhu Zeng et.al. 2409.04796 null
2024-09-07 MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality Ruiting Dai et.al. 2409.04693 null
2024-09-06 COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes Koen Kraaijveld et.al. 2409.04053 link
2024-09-06 Automating Robot Failure Recovery Using Vision-Language Models With Optimized Prompts Hongyi Chen et.al. 2409.03966 null
2024-09-05 Few-shot Adaptation of Medical Vision-Language Models Fereshteh Shakeri et.al. 2409.03868 link
2024-09-05 Text-Guided Mixup Towards Long-Tailed Image Categorization Richard Franklin et.al. 2409.03583 link
2024-09-05 Have Large Vision-Language Models Mastered Art History? Ombretta Strafforello et.al. 2409.03521 null
2024-09-04 Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving Yuhang Lu et.al. 2409.02914 null
2024-09-04 Benchmarking Spurious Bias in Few-Shot Image Classifiers Guangtao Zheng et.al. 2409.02882 link
2024-09-04 Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection Kaiqing Lin et.al. 2409.02664 null
2024-09-04 Multi-modal Situated Reasoning in 3D Scenes Xiongkun Linghu et.al. 2409.02389 null
2024-09-03 Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems Sanjita Prajapati et.al. 2409.02278 null
2024-09-03 How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? Saeid Asgari Taghanaki et.al. 2409.02253 link
2024-09-03 Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models Jiaqi Xu et.al. 2409.02101 link
2024-09-03 GraspSplats: Efficient Manipulation with 3D Feature Splatting Mazeyu Ji et.al. 2409.02084 null
2024-09-03 Boosting Vision-Language Models for Histopathology Classification: Predict all at once Maxime Zanella et.al. 2409.01883 link
2024-09-03 Towards Generative Class Prompt Learning for Few-shot Visual Recognition Soumitri Chattopadhyay et.al. 2409.01835 link
2024-09-03 Open-vocabulary Temporal Action Localization using VLMs Naoki Wake et.al. 2408.17422 null
2024-09-02 LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation Shuyi Ouyang et.al. 2408.17347 null
2024-08-30 Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning Xiaoye Qu et.al. 2408.17150 link
2024-08-29 VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition Zaiwei Zhang et.al. 2408.16930 null
2024-08-29 PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning Noor Hussein et.al. 2408.16769 link
2024-08-29 VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation Shiwei Wu et.al. 2408.16730 null
2024-08-29 Space3D-Bench: Spatial 3D Question Answering Benchmark Emilia Szymanska et.al. 2408.16662 null
2024-08-29 DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving Yongjie Fu et.al. 2408.16647 null
2024-08-29 Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning Zhengqing Gao et.al. 2408.16486 link
2024-08-29 Text-Enhanced Zero-Shot Action Recognition: A training-free approach Massimo Bosetti et.al. 2408.16412 null
2024-08-29 Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models Kengo Nakata et.al. 2408.16296 null
2024-08-29 Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation Vivek Myers et.al. 2408.16228 null
2024-08-30 LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models Jingyi Wang et.al. 2408.16224 null
2024-08-28 VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images M. Maruf et.al. 2408.16176 link
2024-08-28 Visual Prompt Engineering for Medical Vision Language Models in Radiology Stefan Denner et.al. 2408.15802 null
2024-08-28 Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail Bianca Lamm et.al. 2408.15626 null
2024-08-28 Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Wei Chen et.al. 2408.15518 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-28 VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities Shusaku Egami et.al. 2408.14895 link
2024-08-27 HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling Yubin Wang et.al. 2408.14812 null
2024-08-27 MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation Yuanbing Zhu et.al. 2408.14776 null
2024-08-27 RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models Junyao Ge et.al. 2408.14744 link
2024-08-27 Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild Tianqi Wei et.al. 2408.14723 null
2024-08-26 Social perception of faces in a vision-language model Carina I. Hausladen et.al. 2408.14435 link
2024-08-26 More Pictures Say More: Visual Intersection Network for Open Set Object Detection Bingcheng Dong et.al. 2408.14032 null
2024-08-26 Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models Shuai Fu et.al. 2408.13979 link
2024-08-25 LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task Ali Asgarov et.al. 2408.13909 link
2024-08-25 Evaluating Attribute Comprehension in Large Vision-Language Models Haiwen Zhang et.al. 2408.13898 link
2024-08-23 VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models Purushothaman Natarajan et.al. 2408.12808 link
2024-08-23 Cap2Sum: Learning to Summarize Videos by Generating Captions Cairong Zhao et.al. 2408.12800 null
2024-08-22 Building and better understanding vision-language models: insights and future directions Hugo Laurençon et.al. 2408.12637 null
2024-08-22 Adapt CLIP as Aggregation Instructor for Image Dehazing Xiaozhe Zhang et.al. 2408.12317 null
2024-08-22 TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model Yuhao Wang et.al. 2408.12141 null
2024-08-23 SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models Youngjoon Yu et.al. 2408.12114 link
2024-08-22 RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data Chenglong Wang et.al. 2408.12109 link
2024-08-22 DH-Bench: Probing Depth and Height Perception of Large Visual-Language Models Shehreen Azad et.al. 2408.11748 link
2024-08-21 CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering Yuliang Cai et.al. 2408.11742 link
2024-08-21 MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning Minghao Han et.al. 2408.11505 link
2024-08-21 Enabling Small Models for Zero-Shot Classification through Model Label Learning Jia Zhang et.al. 2408.11449 null
2024-08-21 Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models Kento Kawaharazuka et.al. 2408.11380 null
2024-08-21 Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework Xiao Han et.al. 2408.11312 null
2024-08-21 UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation Xiangyu Zhao et.al. 2408.11305 link
2024-08-21 Making Large Vision Language Models to be Good Few-shot Learners Fan Liu et.al. 2408.11297 null
2024-08-21 Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models Yunpu Zhao et.al. 2408.11261 null
2024-08-20 HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments Kazi Hasan Ibn Arif et.al. 2408.10945 link
2024-08-21 V-RoAst: A New Dataset for Visual Road Assessment Natchapon Jongwiriyanurak et.al. 2408.10872 link
2024-08-20 TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning Bin Wang et.al. 2408.10688 link
2024-08-20 MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval Haoran Tang et.al. 2408.10575 link
2024-08-19 CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs Yassine Ouali et.al. 2408.10433 null
2024-08-19 SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP Yusuke Hirota et.al. 2408.10202 null
2024-08-21 LongVILA: Scaling Long-Context Visual Language Models for Long Videos Fuzhao Xue et.al. 2408.10188 link
2024-08-19 Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype Yadong Lu et.al. 2408.09984 null
2024-08-19 Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit Qizhou Chen et.al. 2408.09916 link
2024-08-19 Cross-composition Feature Disentanglement for Compositional Zero-shot Learning Yuxia Geng et.al. 2408.09786 null
2024-08-19 MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model Xinyang Wang et.al. 2408.09706 null
2024-08-18 PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding Dawei Dai et.al. 2408.09530 link
2024-08-18 Image-Based Geolocation Using Large Vision-Language Models Yi Liu et.al. 2408.09474 null
2024-08-17 V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models Junwei You et.al. 2408.09251 null
2024-08-16 DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models Eman Ali et.al. 2408.08855 null
2024-08-16 Beyond the Hype: A dispassionate look at vision-language models in medical scenario Yang Nan et.al. 2408.08704 null
2024-08-16 TextCAVs: Debugging vision models using text Angus Nicolson et.al. 2408.08652 link
2024-08-16 MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models Fenghua Weng et.al. 2408.08464 link
2024-08-15 Penny-Wise and Pound-Foolish in Deepfake Detection Yabin Wang et.al. 2408.08412 link
2024-08-15 Level Up Your Tutorials: VLMs for Game Tutorials Quality Assessment Daniele Rege Cambrin et.al. 2408.08396 link
2024-08-15 Towards Flexible Visual Relationship Segmentation Fangrui Zhu et.al. 2408.08305 null
2024-08-14 Cropper: Vision-Language Model for Image Cropping through In-Context Learning Seung Hyun Lee et.al. 2408.07790 null
2024-08-14 Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach Shizhou Zhang et.al. 2408.07500 link
2024-08-13 Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces Zhiling Chen et.al. 2408.07146 null
2024-08-13 Do Vision-Language Foundational models show Robust Visual Perception? Shivam Chandhok et.al. 2408.06781 link
2024-08-13 Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities Shivam Chandhok et.al. 2408.06721 null
2024-08-13 IFShip: A Large Vision-Language Model for Interpretable Fine-grained Ship Classification via Domain Knowledge-Enhanced Instruction Tuning Mingning Guo et.al. 2408.06631 null
2024-08-13 ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding Yubin Wang et.al. 2408.06622 null
2024-08-12 Long-Form Answers to Visual Questions from Blind and Low Vision People Mina Huh et.al. 2408.06303 null
2024-08-12 OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning Mushui Liu et.al. 2408.06158 link
2024-08-12 Adapting a Foundation Model for Space-based Tasks Matthew Foutter et.al. 2408.05924 null
2024-08-13 Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts Peng Wu et.al. 2408.05905 null
2024-08-12 GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models Zixuan Wu et.al. 2408.05894 link
2024-08-11 Efficient Test-Time Prompt Tuning for Vision-Language Models Yuhan Zhu et.al. 2408.05775 null
2024-08-11 Reference-free Hallucination Detection for Large Vision-Language Models Qing Li et.al. 2408.05767 null
2024-08-11 Decoder Pre-Training with only Text for Scene Text Recognition Shuai Zhao et.al. 2408.05706 link
2024-08-09 Hyperbolic Learning with Multimodal Large Language Models Paolo Mandica et.al. 2408.05097 null
2024-08-09 Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model Jaehyuk Heo et.al. 2408.04917 link
2024-08-09 VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving Keke Long et.al. 2408.04821 null
2024-08-09 UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Haider Al-Tahan et.al. 2408.04810 link
2024-08-07 Prompt and Prejudice Lorenzo Berlincioni et.al. 2408.04671 null
2024-08-07 ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling William Y. Zhu et.al. 2408.04102 link
2024-08-07 How Well Can Vision Language Models See Image Details? Chenhui Gou et.al. 2408.03940 null
2024-08-07 Target Prompting for Information Extraction with Vision Language Model Dipankar Medhi et.al. 2408.03834 null
2024-08-07 Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling Zilyu Ye et.al. 2408.03695 link
2024-08-07 Teach CLIP to Develop a Number Sense for Ordinal Regression Yao Du et.al. 2408.03574 link
2024-08-07 Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection Subaru Kimura et.al. 2408.03554 null
2024-08-09 GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Pengcheng Chen et.al. 2408.03361 link
2024-08-06 Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization Yanghai Zhang et.al. 2408.03149 link
2024-08-05 Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services Shaopeng Fu et.al. 2408.02814 link
2024-08-05 MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Fanqing Meng et.al. 2408.02718 null
2024-08-07 TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments Daeun Song et.al. 2408.02454 null
2024-08-05 Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs Jeongkee Lim et.al. 2408.02261 link
2024-08-05 Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets Lucas Choi et.al. 2408.02244 null
2024-08-05 REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models Agneet Chatterjee et.al. 2408.02231 null
2024-08-04 Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models Fushuo Huo et.al. 2408.02032 link
2024-08-04 AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis Townim F. Chowdhury et.al. 2408.02001 link
2024-08-04 Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI Robert Wolfe et.al. 2408.01959 null
2024-08-04 Visual Grounding for Object-Level Generalization in Reinforcement Learning Haobin Jiang et.al. 2408.01942 link
2024-08-03 Is Generative Communication between Embodied Agents Good for Zero-Shot ObjectNav? Vishnu Sashank Dorbala et.al. 2408.01877 null
2024-08-03 Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis Hiroshi Takato et.al. 2408.01682 null
2024-08-02 Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation Jheng-Hong Yang et.al. 2408.01363 null
2024-08-02 The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models Simone Caldarella et.al. 2408.01228 null
2024-08-01 Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper) Bin Han et.al. 2408.00932 null
2024-08-01 Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation Siyu Jiao et.al. 2408.00744 link
2024-08-01 ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh et.al. 2408.00672 null
2024-08-01 Are Bigger Encoders Always Better in Vision Large Models? Bozhou Li et.al. 2408.00620 null
2024-08-01 Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation Xiaoye Qu et.al. 2408.00555 null
2024-08-01 Mitigating Multilingual Hallucination in Large Vision-Language Models Xiaoye Qu et.al. 2408.00550 link
2024-08-01 Jailbreaking Text-to-Image Models with LLM-Based Agents Yingkai Dong et.al. 2408.00523 null
2024-08-01 DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation Rakshith Subramanyam et.al. 2408.00331 link
2024-08-01 OmniParser for Pure Vision Based GUI Agent Yadong Lu et.al. 2408.00203 null
2024-07-31 Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Atsuyuki Miyai et.al. 2407.21794 null
2024-07-31 Vision-Language Model Based Handwriting Verification Mihir Chauhan et.al. 2407.21788 null
2024-07-31 Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs Shi Liu et.al. 2407.21771 null
2024-07-31 Open-Vocabulary Audio-Visual Semantic Segmentation Ruohao Guo et.al. 2407.21721 null
2024-08-01 Defending Jailbreak Attack in VLMs via Cross-modality Information Detector Yue Xu et.al. 2407.21659 link
2024-07-31 MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment Anurag Das et.al. 2407.21654 null
2024-07-31 Conditioned Prompt-Optimization for Continual Deepfake Detection Francesco Laiti et.al. 2407.21554 link
2024-07-31 MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection Kuo Wang et.al. 2407.21465 link
2024-07-31 Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering Danfeng Guo et.al. 2407.21368 null
2024-07-31 SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving Peiru Zheng et.al. 2407.21293 null
2024-07-30 GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models Ali Abdollahi et.al. 2407.21001 link
2024-07-30 UniProcessor: A Text-induced Unified Low-level Image Processor Huiyu Duan et.al. 2407.20928 link
2024-07-30 SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition Hao Tan et.al. 2407.20920 null
2024-07-30 Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Norman Di Palo et.al. 2407.20798 null
2024-07-30 OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance Yongqiang Yao et.al. 2407.20761 link
2024-07-30 SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models Zheng Liu et.al. 2407.20756 link
2024-07-30 Autonomous Improvement of Instruction Following Skills via Foundation Models Zhiyuan Zhou et.al. 2407.20635 null
2024-07-29 FlexAttention for Efficient High-Resolution Vision-Language Models Junyan Li et.al. 2407.20228 null
2024-07-29 Normality Addition via Normality Detection in Industrial Image Anomaly Detection Models Jihun Yi et.al. 2407.19849 null
2024-07-29 Harnessing Large Vision and Language Models in Agriculture: A Review Hongyan Zhu et.al. 2407.19679 null
2024-07-27 GP-VLS: A general-purpose vision language model for surgery Samuel Schmidgall et.al. 2407.19305 null
2024-07-26 Solving Robotics Problems in Zero-Shot with Vision-Language Models Zidan Wang et.al. 2407.19094 null
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908 null
2024-07-25 UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models Xinyu Pi et.al. 2407.18391 null
2024-07-25 $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs Vlad Sobal et.al. 2407.18134 null
2024-07-25 Efficient Inference of Vision Instruction-Following Models with Elastic Cache Zuyan Liu et.al. 2407.18121 link
2024-07-25 Cost-effective Instruction Learning for Pathology Vision and Language Analysis Kaitao Chen et.al. 2407.17734 link
2024-07-24 DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation Qian Feng et.al. 2407.17348 null
2024-07-26 Selective Vision-Language Subspace Projection for Few-shot CLIP Xingyu Zhu et.al. 2407.16977 link
2024-07-23 Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions Kai Liu et.al. 2407.16725 link
2024-07-23 Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models Aristeidis Panos et.al. 2407.16526 null
2024-07-23 Cross Anything: General Quadruped Robot Navigation through Complex Terrains Shaoting Zhu et.al. 2407.16412 null
2024-07-22 Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models Raza Imam et.al. 2407.15913 link
2024-07-22 AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection Yunkang Cao et.al. 2407.15795 link
2024-07-22 CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning Emanuele Frascaroli et.al. 2407.15793 link
2024-07-22 Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels Zhuorui Ye et.al. 2407.15786 null
2024-07-22 Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders Laura Niss et.al. 2407.15731 null
2024-07-23 SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection Dimitrios Kollias et.al. 2407.15728 null
2024-07-22 HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning Zhecan Wang et.al. 2407.15680 link
2024-07-22 In-Context Learning Improves Compositional Understanding of Vision-Language Models Matteo Nulli et.al. 2407.15487 link
2024-07-22 WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding Quan Kong et.al. 2407.15350 null
2024-07-21 Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective Mariya Hendriksen et.al. 2407.15239 null
2024-07-21 When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer et.al. 2407.15211 null
2024-07-19 DEAL: Disentangle and Localize Concept-level Explanations for VLMs Tang Li et.al. 2407.14412 link
2024-07-19 Multimodal Misinformation Detection using Large Vision-Language Models Sahar Tahmasebi et.al. 2407.14321 null
2024-07-19 Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models Dionis Totsila et.al. 2407.14229 link
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-19 Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition Rui Zhang et.al. 2407.14146 null
2024-07-19 Multi-modal Relation Distillation for Unified 3D Representation Learning Huiqun Wang et.al. 2407.14007 null
2024-07-18 Simultaneous Localization and Affordance Prediction for Tasks in Egocentric Video Zachary Chavis et.al. 2407.13856 null
2024-07-18 Which objects help me to act effectively? Reasoning about physically-grounded affordances Anne Kemmeren et.al. 2407.13811 null
2024-07-18 BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models Moon Ye-Bin et.al. 2407.13442 null
2024-07-18 Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction Gertjan Burghouts et.al. 2407.13368 null
2024-07-17 R+X: Retrieval and Execution from Everyday Human Videos Georgios Papagiannis et.al. 2407.12957 null
2024-07-17 ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference Mengcheng Lan et.al. 2407.12442 null
2024-07-17 NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models Gengze Zhou et.al. 2407.12366 link
2024-07-17 VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions Seokha Moon et.al. 2407.12345 null
2024-07-17 ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map Yilin Ye et.al. 2407.12315 link
2024-07-17 VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation Zhen Qu et.al. 2407.12276 link
2024-07-16 XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach Truong Thanh Hung Nguyen et.al. 2407.11771 link
2024-07-16 VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Haodong Duan et.al. 2407.11691 link
2024-07-16 FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Pengxiang Li et.al. 2407.11522 null
2024-07-16 Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation Shijie Chang et.al. 2407.11503 null
2024-07-16 Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models Jinrui Zhang et.al. 2407.11422 null
2024-07-16 Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain Hyeon Bae Kim et.al. 2407.11375 link
2024-07-16 Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities Xu Zheng et.al. 2407.11351 null
2024-07-16 LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction Penghui Du et.al. 2407.11335 link
2024-07-16 Large Vision-Language Models as Emotion Recognizers in Context Awareness Yuxuan Lei et.al. 2407.11300 null
2024-07-15 Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques Rishika Bhagwatkar et.al. 2407.11121 null
2024-07-15 Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Ruisheng Cao et.al. 2407.10956 link
2024-07-15 Benchmarking Vision Language Models for Cultural Understanding Shravan Nayak et.al. 2407.10920 null
2024-07-15 GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM Keshav Bimbraw et.al. 2407.10870 null
2024-07-15 Physics-Inspired Generative Models in Medical Imaging: A Review Dennis Hein et.al. 2407.10856 null
2024-07-15 Quantized Prompt for Efficient Generalization of Vision-Language Models Tianxiang Hao et.al. 2407.10704 link
2024-07-15 OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer Yu Wang et.al. 2407.10655 link
2024-07-15 NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models Pranshu Pandya et.al. 2407.10380 null
2024-07-14 Affordance-Guided Reinforcement Learning via Visual Prompting Olivia Y. Lee et.al. 2407.10341 null
2024-07-13 VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation Wentao Zhao et.al. 2407.09829 link
2024-07-13 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance Xiaoxu Xu et.al. 2407.09826 link
2024-07-12 Open Vocabulary Multi-Label Video Classification Rohit Gupta et.al. 2407.09073 null
2024-07-12 Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing Jun Zhu et.al. 2407.09053 link
2024-07-12 Textual Query-Driven Mask Transformer for Domain Generalized Segmentation Byeonghyun Pak et.al. 2407.09033 link
2024-07-12 OVExp: Open Vocabulary Exploration for Object-Oriented Navigation Meng Wei et.al. 2407.09016 null
2024-07-12 LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models Yabin Zhang et.al. 2407.08966 link
2024-07-11 Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design Jingyi Xie et.al. 2407.08882 null
2024-07-11 CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting Naman Sharma et.al. 2407.08811 null
2024-07-11 Extracting Training Data from Document-Based VQA Models Francesco Pinto et.al. 2407.08707 null
2024-07-11 HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models Runhui Huang et.al. 2407.08706 null
2024-07-12 Robotic Control via Embodied Chain-of-Thought Reasoning Michał Zawalski et.al. 2407.08693 null
2024-07-11 NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning Yi Zhang et.al. 2407.08672 null
2024-07-11 Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement Zijie Yue et.al. 2407.08507 null
2024-07-11 Specialist vision-language models for clinical ophthalmology Robbie Holland et.al. 2407.08410 link
2024-07-11 Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization Jinlong Li et.al. 2407.08374 null
2024-07-11 Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation Tong Shao et.al. 2407.08268 link
2024-07-11 AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization Shixiong Xu et.al. 2407.08156 link
2024-07-11 Live Fitness Coaching as a Testbed for Situated Interaction Sunny Panchal et.al. 2407.08101 link
2024-07-10 Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison Qian Yang et.al. 2407.07840 null
2024-07-10 Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs Hao-Tien Lewis Chiang et.al. 2407.07775 null
2024-07-10 PaliGemma: A versatile 3B VLM for transfer Lucas Beyer et.al. 2407.07726 link
2024-07-11 Tuning Vision-Language Models with Candidate Labels by Prompt Alignment Zhifang Zhang et.al. 2407.07638 null
2024-07-10 IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model Yatai Ji et.al. 2407.07577 link
2024-07-10 A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends Daizong Liu et.al. 2407.07403 link
2024-07-10 Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems Chashi Mahiul Islam et.al. 2407.07392 null
2024-07-10 Towards a text-based quantitative and explainable histopathology image analysis Anh Tien Nguyen et.al. 2407.07360 link
2024-07-10 CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging Raza Imam et.al. 2407.07315 null
2024-07-09 Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization Jeongseok Hyun et.al. 2407.07024 link
2024-07-09 CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection Shuang Hao et.al. 2407.06780 link
2024-07-09 LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition Teng Wang et.al. 2407.06730 null
2024-07-09 Vision language models are blind Pooyan Rahmanzadehgervi et.al. 2407.06581 link
2024-07-08 A Single Transformer for Scalable Vision-Language Modeling Yangyi Chen et.al. 2407.06438 link
2024-07-08 Multi-Object Hallucination in Vision-Language Models Xuweiyi Chen et.al. 2407.06192 link
2024-07-08 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Orr Zohar et.al. 2407.06189 link
2024-07-08 Vision-Language Models under Cultural and Inclusive Considerations Antonia Karamolegkou et.al. 2407.06177 null
2024-07-08 Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding Aaron Lohner et.al. 2407.05910 null
2024-07-09 HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels Yingying Jiang et.al. 2407.05795 null
2024-07-08 OneDiff: A Generalist Model for Image Difference Erdong Hu et.al. 2407.05645 null
2024-07-07 Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models Longxiang Tang et.al. 2407.05342 link
2024-07-07 WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks Léo Boisvert et.al. 2407.05291 link
2024-07-07 Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image Pengkun Jiao et.al. 2407.05256 null
2024-07-06 FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding Huitong Pan et.al. 2407.05183 link
2024-07-05 AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation Yuhan Zhu et.al. 2407.04603 link
2024-07-05 Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model Duy M. H. Nguyen et.al. 2407.04489 null
2024-07-05 Smart Vision-Language Reasoners Denisa Roberts et.al. 2407.04212 link
2024-07-04 VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation I-Chun Arthur Liu et.al. 2407.04152 link
2024-07-04 MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis Asma Alkhaldi et.al. 2407.04106 link
2024-07-04 Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners Mushui Liu et.al. 2407.04003 null
2024-07-04 Concept Bottleneck Models Without Predefined Concepts Simon Schrodi et.al. 2407.03921 null
2024-07-04 Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning Thong Nguyen et.al. 2407.03788 link
2024-07-04 Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models Chang-Sheng Kao et.al. 2407.03615 link
2024-07-04 Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations Zhiyang Xu et.al. 2407.03604 null
2024-07-03 InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Pan Zhang et.al. 2407.03320 link
2024-07-03 BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations Zhantao Yang et.al. 2407.03314 null
2024-07-03 Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation Marco Mistretta et.al. 2407.03056 link
2024-07-03 SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning Bac Nguyen et.al. 2407.03036 null
2024-07-03 VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values Zhe Hu et.al. 2407.03000 null
2024-07-03 Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective Zhaotian Weng et.al. 2407.02814 null
2024-07-03 MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context Zishan Gu et.al. 2407.02730 link
2024-07-02 Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models Xu Han et.al. 2407.02716 null
2024-07-02 Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models Annie S. Chen et.al. 2407.02666 null
2024-07-02 Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models Joan Nwatu et.al. 2407.02623 link
2024-07-02 Conceptual Codebook Learning for Vision-Language Models Yi Zhang et.al. 2407.02350 null
2024-07-02 Why do LLaVA Vision-Language Models Reply to Images in English? Musashi Hinck et.al. 2407.02333 null
2024-07-02 Multi-Modal Video Dialog State Tracking in the Wild Adnen Abdessaied et.al. 2407.02218 null
2024-07-02 BiasDora: Exploring Hidden Biased Associations in Vision-Language Models Chahat Raj et.al. 2407.02066 link
2024-07-02 Fake News Detection and Manipulation Reasoning via Large Vision-Language Models Ruihan Jin et.al. 2407.02042 null
2024-07-03 ViG-Bias: Visually Grounded Bias Discovery and Mitigation Badr-Eddine Marani et.al. 2407.01996 link
2024-07-02 SADL: An Effective In-Context Learning Method for Compositional Visual QA Long Hoang Dang et.al. 2407.01983 null
2024-07-02 VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs Qiucheng Wu et.al. 2407.01863 link
2024-07-01 CLIP the Divergence: Language-guided Unsupervised Domain Adaptation Jinjing Zhu et.al. 2407.01842 null
2024-07-01 μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Alejandro Lozano et.al. 2407.01791 link
2024-06-28 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Xiang Li et.al. 2406.20095 link
2024-06-28 EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Yuxuan Zhang et.al. 2406.20076 link
2024-06-28 STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Guohao Sun et.al. 2406.19973 link
2024-06-28 From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis Chuanqi Cheng et.al. 2406.19934 link
2024-06-28 Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang et.al. 2406.19905 link
2024-06-27 PathAlign: A vision-language model for whole slide images in histopathology Faruk Ahmed et.al. 2406.19578 null
2024-06-27 RAVEN: Multitask Retrieval Augmented Vision-Language Learning Varun Nagaraj Rao et.al. 2406.19150 null
2024-06-27 CELLO: Causal Evaluation of Large Vision-Language Models Meiqi Chen et.al. 2406.19131 link
2024-06-27 Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis Yibo Gao et.al. 2406.19130 link
2024-06-27 RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulaiton Fanfan Liu et.al. 2406.18977 link
2024-06-28 Manipulate-Anything: Automating Real-World Robots using Vision-Language Models Jiafei Duan et.al. 2406.18915 null
2024-06-27 Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models Yicheng Xu et.al. 2406.18868 link
2024-06-27 Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs Jie Zhang et.al. 2406.18849 link
2024-06-28 Revisiting Backdoor Attacks against Large Vision-Language Models Siyuan Liang et.al. 2406.18844 null
2024-06-26 MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data William Berman et.al. 2406.18790 null
2024-06-26 3D Feature Distillation with Object-Centric Priors Georgios Tziafas et.al. 2406.18742 null
2024-06-26 Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme Pi-Wei Chen et.al. 2406.18197 null
2024-06-26 Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation Qilai Zhang et.al. 2406.18054 link
2024-06-26 Multimodal foundation world models for generalist embodied agents Pietro Mazzaglia et.al. 2406.18043 link
2024-06-25 Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts Xuyang Wu et.al. 2406.17974 link
2024-06-25 EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data Jesse Zhang et.al. 2406.17768 null
2024-06-25 DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning Xiaohan Zhang et.al. 2406.17659 null
2024-06-24 Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models Bei Yan et.al. 2406.17115 link
2024-06-24 Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts Aditya Sharma et.al. 2406.16851 null
2024-06-24 ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance Shuwei Shi et.al. 2406.16476 null
2024-06-24 Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration Yujin Baek et.al. 2406.16469 null
2024-06-24 Evaluating and Analyzing Relationship Hallucinations in LVLMs Mingrui Wu et.al. 2406.16449 link
2024-06-24 High-resolution open-vocabulary object 6D pose estimation Jaime Corsetti et.al. 2406.16384 null
2024-06-24 What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation Michal Golovanevsky et.al. 2406.16320 link
2024-06-23 Review of Zero-Shot and Few-Shot AI Algorithms in The Medical Domain Maged Badawi et.al. 2406.16143 null
2024-06-22 TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM Wenxue Li et.al. 2406.15764 link
2024-06-21 Open-vocabulary Pick and Place via Patch-level Semantic Maps Mingxi Jia et.al. 2406.15677 null
2024-06-21 DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection Jia Syuen Lim et.al. 2406.14924 null
2024-06-21 Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models Jiayu Wang et.al. 2406.14852 link
2024-06-20 ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights Gabriel Sarch et.al. 2406.14596 null
2024-06-20 Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Yuxuan Qiao et.al. 2406.14544 link
2024-06-20 MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Xinyu Fang et.al. 2406.14515 link
2024-06-20 African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification Gregor Geigle et.al. 2406.14496 link
2024-06-20 Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? Gregor Geigle et.al. 2406.14492 null
2024-06-20 Revealing Vision-Language Integration in the Brain with Multimodal Networks Vighnesh Subramaniam et.al. 2406.14481 link
2024-06-20 VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model Jie Zhang et.al. 2406.14194 link
2024-06-20 MACAROON: Training Vision-Language Models To Be Your Engaged Partners Shujin Wu et.al. 2406.14137 link
2024-06-21 VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning Ziyang Meng et.al. 2406.14056 link
2024-06-20 From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment Yusuke Hirota et.al. 2406.13912 null
2024-06-19 WATT: Weight Average Test-Time Adaption of CLIP David Osowiechi et.al. 2406.13875 link
2024-06-18 AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention Wenbin An et.al. 2406.12718 link
2024-06-18 Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? Mingqian Feng et.al. 2406.12663 null
2024-06-18 Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model Jiang-Xin Shi et.al. 2406.12638 link
2024-06-18 VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Xiang Li et.al. 2406.12384 link
2024-06-18 VoCo-LLaMA: Towards Vision Compression with Large Language Models Xubing Ye et.al. 2406.12275 link
2024-06-18 The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge Hongpeng Pan et.al. 2406.12225 null
2024-06-17 SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model Yongting Zhang et.al. 2406.12030 link
2024-06-17 MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Ziyu Liu et.al. 2406.11833 link
2024-06-17 Unveiling Encoder-Free Vision-Language Models Haiwen Diao et.al. 2406.11832 link
2024-06-17 On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning Geewook Kim et.al. 2406.11823 link
2024-06-17 See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding Amith Ananthram et.al. 2406.11665 link
2024-06-18 MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More Yue Jiang et.al. 2406.11451 null
2024-06-17 They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias Salma Abdel Magid et.al. 2406.11331 null
2024-06-17 GUICourse: From General Vision Language Models to Versatile GUI Agents Wentong Chen et.al. 2406.11317 link
2024-06-18 BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models Xuefeng Hu et.al. 2406.11309 null
2024-06-17 MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models Shengkang Wang et.al. 2406.11288 link
2024-06-17 Unifying Multimodal Retrieval via Document Screenshot Embedding Xueguang Ma et.al. 2406.11251 null
2024-06-14 Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding Ridouane Ghermi et.al. 2406.10221 link
2024-06-14 DevBench: A multimodal developmental benchmark for language learning Alvin Wei Ming Tan et.al. 2406.10215 link
2024-06-14 Detecting and Evaluating Medical Hallucinations in Large Vision Language Models Jiawei Chen et.al. 2406.10185 null
2024-06-14 CarLLaVA: Vision language models for camera-only closed-loop driving Katrin Renz et.al. 2406.10165 null
2024-06-14 RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model Hantao Zhou et.al. 2406.10157 null
2024-06-14 Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning Xiaowen Sun et.al. 2406.09988 link
2024-06-14 Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment Fei Zhou et.al. 2406.09858 null
2024-06-14 Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps Jian Chen et.al. 2406.09838 link
2024-06-14 Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting Ce Hao et.al. 2406.09767 null
2024-06-13 Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA Jongwoo Park et.al. 2406.09396 link
2024-06-13 Enhancing Domain Adaptation through Prompt Gradient Alignment Hoang Phan et.al. 2406.09353 link
2024-06-13 AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models Yuhang Wu et.al. 2406.09295 null
2024-06-13 MirrorCheck: Efficient Adversarial Defense for Vision-Language Models Samar Fares et.al. 2406.09250 null
2024-06-13 Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model Melvin Wong et.al. 2406.09143 null
2024-06-13 INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance Chenwei Lin et.al. 2406.09105 link
2024-06-13 How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models Tarun Khajuria et.al. 2406.09067 null
2024-06-13 Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning Huy Hoang Nguyen et.al. 2406.09039 null
2024-06-13 Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency Maor Dikter et.al. 2406.08840 link
2024-06-13 MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs Xuannan Liu et.al. 2406.08772 null
2024-06-12 What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li et.al. 2406.08478 null
2024-06-12 AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind Wei Ding et.al. 2406.08455 null
2024-06-12 ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs Irene Huang et.al. 2406.08164 link
2024-06-12 Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models Shimin Chen et.al. 2406.08024 null
2024-06-13 A3VLM: Actionable Articulation-Aware Vision Language Model Siyuan Huang et.al. 2406.07549 link
2024-06-11 Let Go of Your Labels with Unsupervised Transfer Artyom Gadetsky et.al. 2406.07236 link
2024-06-11 FaceGPT: Self-supervised Learning to Chat about 3D Human Faces Haoran Wang et.al. 2406.07163 null
2024-06-11 Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph Sergey Linok et.al. 2406.07113 null
2024-06-11 UVIS: Unsupervised Video Instance Segmentation Shuaiyi Huang et.al. 2406.06908 null
2024-06-10 Merlin: A Vision Language Foundation Model for 3D Computed Tomography Louis Blankemeier et.al. 2406.06512 null
2024-06-10 Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation Oishi Banerjee et.al. 2406.06496 null
2024-06-10 VCR: Visual Caption Restoration Tianyu Zhang et.al. 2406.06462 link
2024-06-10 Data Augmentation in Earth Observation: A Diffusion Model Approach Tiago Sousa et.al. 2406.06218 null
2024-06-10 CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models Peng Xia et.al. 2406.06007 link
2024-06-10 CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark David Romero et.al. 2406.05967 null
2024-06-09 EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models Mengfei Du et.al. 2406.05756 link
2024-06-09 ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition Sanjoy Kundu et.al. 2406.05722 null
2024-06-08 Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification Yunhe Gao et.al. 2406.05596 null
2024-06-08 Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models Minho Park et.al. 2406.05432 link
2024-06-07 3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation Feiyu Pan et.al. 2406.04842 null
2024-06-07 OVMR: Open-Vocabulary Recognition with Multi-Modal References Zehong Ma et.al. 2406.04675 link
2024-06-06 Evaluating Large Vision-Language Models' Understanding of Real-World Complexities Through Synthetic Benchmarks Haokun Zhou et.al. 2406.04470 null
2024-06-06 Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning Amandeep Kumar et.al. 2406.04413 link
2024-06-06 VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Junjie Zhou et.al. 2406.04292 link
2024-06-06 Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt Zonghao Ying et.al. 2406.04031 link
2024-06-06 Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following Anshul Gupta et.al. 2406.03907 null
2024-06-06 VisLTR: Visualization-in-the-Loop Table Reasoning Jianing Hao et.al. 2406.03753 null
2024-06-05 CountCLIP -- [Re] Teaching CLIP to Count to Ten Harshvardhan Mestha et.al. 2406.03586 link
2024-06-05 Exploiting LMM-based knowledge for image classification tasks Maria Tzelepi et.al. 2406.03071 null
2024-06-05 Balancing Performance and Efficiency in Zero-shot Robotic Navigation Dmytro Kuzmenko et.al. 2406.03015 null
2024-06-05 Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models Jinhao Li et.al. 2406.02915 link
2024-06-04 LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery Samuel Scheele et.al. 2406.02780 link
2024-06-04 TopViewRS: Vision-Language Models as Top-View Spatial Reasoners Chengzu Li et.al. 2406.02537 link
2024-06-04 On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept Guangliang Liu et.al. 2406.02378 null
2024-06-04 Radar Spectra-Language Model for Automotive Scene Parsing Mariia Pushkareva et.al. 2406.02158 null
2024-06-04 HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model Yu Tian et.al. 2406.01914 null
2024-06-03 Boosting Vision-Language Models with Transduction Maxime Zanella et.al. 2406.01837 link
2024-06-03 SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model An-Chieh Cheng et.al. 2406.01584 null
2024-06-03 SLANT: Spurious Logo ANalysis Toolkit Maan Qraitem et.al. 2406.01449 null
2024-06-03 ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models Thanh-Dat Truong et.al. 2406.01432 null
2024-06-03 EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding Thanh-Dat Truong et.al. 2406.01429 null
2024-06-03 TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy Weichao Zhao et.al. 2406.01326 link
2024-06-04 StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond Pengyuan Lyu et.al. 2405.21013 null
2024-05-31 Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning Cheng Tan et.al. 2405.20834 null
2024-05-31 InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding Huaxiang Zhang et.al. 2405.20795 null
2024-05-31 Information Theoretic Text-to-Image Alignment Chao Wang et.al. 2405.20759 null
2024-05-31 Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images Mansi Kakkar et.al. 2405.20735 null
2024-05-30 Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals Phillip Howard et.al. 2405.20152 null
2024-05-30 OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation Gonca Yilmaz et.al. 2405.20141 null
2024-05-30 Enhancing Large Vision Language Models with Self-Training on Image Comprehension Yihe Deng et.al. 2405.19716 link
2024-05-30 Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training Aisha Urooj Khan et.al. 2405.19675 null
2024-05-29 Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding Shenghuan Sun et.al. 2405.19567 null
2024-05-29 CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients Pierre Chambon et.al. 2405.19538 link
2024-05-29 Evaluating Vision-Language Models on Bistable Images Artemis Panagopoulou et.al. 2405.19423 link
2024-05-29 Video Anomaly Detection in 10 Years: A Survey and Outlook Moshira Abdalla et.al. 2405.19387 null
2024-05-29 Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-05-29 Matryoshka Query Transformer for Large Vision-Language Models Wenbo Hu et.al. 2405.19315 link
2024-05-29 MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification Laura Fieback et.al. 2405.19186 null
2024-05-29 I Bet You Did Not Mean That: Testing Semantic Importance via Betting Jacopo Teneggi et.al. 2405.19146 link
2024-05-29 ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs Omar Moured et.al. 2405.19117 link
2024-05-29 Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer Zengqun Zhao et.al. 2405.19100 link
2024-05-29 Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior Shuyu Cheng et.al. 2405.19098 link
2024-05-30 Benchmarking and Improving Detail Image Caption Hongyuan Dong et.al. 2405.19092 link
2024-05-29 Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions Zhe Hu et.al. 2405.19088 null
2024-05-29 Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design Markus J. Buehler et.al. 2405.19076 link
2024-05-28 WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization Jiawei Ma et.al. 2405.18405 null
2024-05-28 Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? Yifan Bai et.al. 2405.18361 null
2024-05-28 Frustratingly Easy Test-Time Adaptation of Vision-Language Models Matteo Farina et.al. 2405.18330 link
2024-05-28 White-box Multimodal Jailbreaks Against Large Vision-Language Models Ruofan Wang et.al. 2405.17894 link
2024-05-28 Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment Xin Xiao et.al. 2405.17871 link
2024-05-28 RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs Sangmin Woo et.al. 2405.17821 null
2024-05-28 Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models Sangmin Woo et.al. 2405.17820 null
2024-05-27 An Introduction to Vision-Language Modeling Florian Bordes et.al. 2405.17247 null
2024-05-27 Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View Jin Wang et.al. 2405.17201 null
2024-05-27 Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks Yunqi Zhang et.al. 2405.16860 link
2024-05-27 PromptFix: You Prompt and We Fix the Photo Yongsheng Yu et.al. 2405.16785 link
2024-05-25 Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities Shiyu Xia et.al. 2405.16234 null
2024-05-25 Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs Myong Chol Jung et.al. 2405.16091 null
2024-05-24 Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement Xiyao Wang et.al. 2405.15973 link
2024-05-24 Disease-informed Adaptation of Vision-Language Models Jiajin Zhang et.al. 2405.15728 link
2024-05-24 VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap Sreyan Ghosh et.al. 2405.15683 link
2024-05-24 Composed Image Retrieval for Remote Sensing Bill Psomas et.al. 2405.15587 link
2024-05-24 Open-Vocabulary SAM3D: Understand Any 3D Scene Hanchen Tai et.al. 2405.15580 null
2024-05-24 Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization Beitao Chen et.al. 2405.15356 link
2024-05-24 Learning Invariant Causal Mechanism from Vision-Language Models Zeen Song et.al. 2405.15289 null
2024-05-24 Learning from True-False Labels via Multi-modal Prompt Retrieving Zhongnian Li et.al. 2405.15228 link
2024-05-24 CLIP model is an Efficient Online Lifelong Learner Leyuan Wang et.al. 2405.15155 link
2024-05-23 Agentic Skill Discovery Xufeng Zhao et.al. 2405.15019 link
2024-05-23 A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models Mario Döbler et.al. 2405.14977 link
2024-05-23 PuzzleAvatar: Assembling 3D Avatars from Personal Albums Yuliang Xiu et.al. 2405.14869 link
2024-05-23 Designing A Sustainable Marine Debris Clean-up Framework without Human Labels Raymond Wang et.al. 2405.14815 link
2024-05-23 Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models Young Kyun Jang et.al. 2405.14715 null
2024-05-23 Calibrated Self-Rewarding Vision Language Models Yiyang Zhou et.al. 2405.14622 link
2024-05-23 UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge Chuanhao Li et.al. 2405.14554 null
2024-05-23 AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2 Simon Damm et.al. 2405.14529 link
2024-05-23 Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports Guangyu Guo et.al. 2405.14230 null
2024-05-23 Unveiling the Tapestry of Consistency in Large Vision-Language Models Yuan Zhang et.al. 2405.14156 link
2024-05-23 Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation Se-eun Yoon et.al. 2405.14142 null
2024-05-22 Refining Skewed Perceptions in Vision-Language Models through Visual Representations Haocheng Dai et.al. 2405.14030 null
2024-05-21 C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning Ji Ma et.al. 2405.12752 null
2024-05-21 EmoEdit: Evoking Emotions through Image Manipulation Jingyuan Yang et.al. 2405.12661 null
2024-05-22 Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography Shantanu Ghosh et.al. 2405.12255 link
2024-05-20 Rethinking Overlooked Aspects in Vision-Language Models Yuan Liu et.al. 2405.11850 null
2024-05-19 Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems Shengxiang Sun et.al. 2405.11629 null
2024-05-18 MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection Ximiao Zhang et.al. 2405.11315 link
2024-05-18 Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models Canshi Wei et.al. 2405.11301 null
2024-05-18 Revisiting the Robust Generalization of Adversarial Prompt Tuning Fan Yang et.al. 2405.11154 null
2024-05-18 Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions Junzhang Liu et.al. 2405.11145 null
2024-05-17 Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors Jiachen Sun et.al. 2405.10529 null
2024-05-16 Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees Yu Gui et.al. 2405.10301 link
2024-05-17 Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning Yuexiang Zhai et.al. 2405.10292 null
2024-05-16 FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models Adrian Bulat et.al. 2405.10286 null
2024-05-16 Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks João Bordalo et.al. 2405.10122 null
2024-05-16 Harmonizing Generalization and Personalization in Federated Prompt Learning Tianyu Cui et.al. 2405.09771 link
2024-05-17 SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge Andong Wang et.al. 2405.09713 null
2024-05-15 A Survey On Text-to-3D Contents Generation In The Wild Chenhan Jiang et.al. 2405.09431 null
2024-05-15 Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Wanting Xu et.al. 2405.09215 link
2024-05-14 Contextual Emotion Recognition using Large Vision Language Models Yasaman Etesam et.al. 2405.08992 null
2024-05-14 Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research Qinglong Cao et.al. 2405.08668 link
2024-05-14 Open-Vocabulary Object Detection via Neighboring Region Attention Alignment Sunyuan Qiang et.al. 2405.08593 null
2024-05-13 Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? Hari Chandana Kuchibhotla et.al. 2405.07921 null
2024-05-12 DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model Yang Jin et.al. 2405.07309 null
2024-05-11 TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt Xiangyu Wu et.al. 2405.06926 link
2024-05-10 Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark Evan M. Williams et.al. 2405.06634 link
2024-05-10 Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification Yaoqin Ye et.al. 2405.06468 link
2024-05-10 VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks Manish Dhakal et.al. 2405.06196 link
2024-05-09 Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control Gunshi Gupta et.al. 2405.05852 link
2024-05-09 Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media Zhizhen Zhang et.al. 2405.05760 null
2024-05-09 Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft Debabrata Pal et.al. 2405.05574 null
2024-05-08 THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models Prannay Kaul et.al. 2405.05256 null
2024-05-08 Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection Zhaoxiang Zhang et.al. 2405.04782 null
2024-05-08 Unveiling Disparities in Web Task Handling Between Human and Web Agent Kihoon Son et.al. 2405.04497 null
2024-05-07 Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks Georgios Pantazopoulos et.al. 2405.04403 link
2024-05-06 VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images Anna Penzkofer et.al. 2405.03852 null
2024-05-06 Knowledge-aware Text-Image Retrieval for Remote Sensing Images Li Mi et.al. 2405.03373 null
2024-05-06 Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval Jiacheng Cheng et.al. 2405.03190 null
2024-05-05 Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training Wenyu Zhang et.al. 2405.02954 link
2024-05-05 Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models Tobias Groot et.al. 2405.02917 null
2024-05-05 Octopi: Object Property Reasoning with Large Tactile-Language Models Samson Yu et.al. 2405.02794 link
2024-05-05 ImageInWords: Unlocking Hyper-Detailed Image Descriptions Roopal Garg et.al. 2405.02793 link
2024-05-03 On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? Maxime Zanella et.al. 2405.02266 link
2024-05-03 What matters when building vision-language models? Hugo Laurençon et.al. 2405.02246 null
2024-05-03 Improving Concept Alignment in Vision-Language Concept Bottleneck Models Nithish Muthuchamy Selvaraj et.al. 2405.01825 link
2024-05-02 V-FLUTE: Visual Figurative Language Understanding with Textual Explanations Arkadiy Saakyan et.al. 2405.01474 link
2024-05-02 Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models Yifei Ming et.al. 2405.01468 null
2024-05-02 MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors Yuan Tang et.al. 2405.01413 link
2024-05-02 Learning Object States from Actions via Large Language Models Masatoshi Tateno et.al. 2405.01090 null
2024-05-02 Few Shot Class Incremental Learning using Vision-Language models Anurag Kumar et.al. 2405.01040 null
2024-05-01 Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis Prateek Verma et.al. 2405.00876 null
2024-05-01 CLIPArTT: Light-weight Adaptation of CLIP to New Domains at Test Time Gustavo Adolfo Vargas Hakim et.al. 2405.00754 link
2024-05-01 Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis Huy H. Nguyen et.al. 2405.00355 link
2024-04-30 GUing: A Mobile GUI Search Engine using a Vision-Language Model Jialiang Wei et.al. 2405.00145 link
2024-04-30 MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation Min Zhang et.al. 2404.19644 link
2024-04-30 Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective Wanqi Zhou et.al. 2404.19287 link
2024-04-30 Soft Prompt Generation for Domain Generalization Shuanghao Bai et.al. 2404.19286 link
2024-04-30 PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition Dongyun Lin et.al. 2404.19168 null
2024-04-29 Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM Navid Rajabi et.al. 2404.19128 null
2024-04-29 In-Context Symbolic Regression: Leveraging Language Models for Function Discovery Matteo Merler et.al. 2404.19094 link
2024-04-29 Hallucination of Multimodal Large Language Models: A Survey Zechen Bai et.al. 2404.18930 link
2024-04-29 Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models Hongyi Zhu et.al. 2404.18746 null
2024-04-28 Paint by Inpaint: Learning to Add Image Objects by Removing Them First Navve Wasserman et.al. 2404.18212 link
2024-04-27 SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models Manav Nitin Kapadnis et.al. 2404.17912 null
2024-04-27 Medical Vision-Language Pre-Training for Brain Abnormalities Masoud Monajatipoor et.al. 2404.17779 null
2024-04-26 BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Ian Huang et.al. 2404.17672 null
2024-04-26 Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models Yuhang Huang et.al. 2404.17534 null
2024-04-26 Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting Yuanyuan Liu et.al. 2404.17100 null
2024-04-25 AAPL: Adding Attributes to Prompt Learning for Vision-Language Models Gahyeon Kim et.al. 2404.16804 link
2024-04-25 Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class Mazda Moayeri et.al. 2404.16717 null
2024-04-25 VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations Sri Harsha Dumpala et.al. 2404.16365 null
2024-04-25 Training-Free Unsupervised Prompt for Vision-Language Models Sifan Long et.al. 2404.16339 link
2024-04-24 Improving Multi-label Recognition using Class Co-Occurrence Probabilities Samyak Rawlekar et.al. 2404.16193 null
2024-04-24 Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering Cuong Nhat Ha et.al. 2404.16192 null
2024-04-24 MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Kaining Ying et.al. 2404.16006 null
2024-04-24 Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography Xuxin Chen et.al. 2404.15946 null
2024-04-24 Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer Jiaming Lei et.al. 2404.15785 null
2024-04-23 BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis Shuhang Lin et.al. 2404.15532 link
2024-04-23 MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning Sunan He et.al. 2404.15127 link
2024-04-21 Interpreting COVID Lateral Flow Tests' Results with Foundation Models Stuti Pandey et.al. 2404.14990 null
2024-04-23 Driver Activity Classification Using Generalizable Representations from Vision-Language Models Ross Greer et.al. 2404.14906 null
2024-04-23 SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models Bo Lin et.al. 2404.14755 null
2024-04-23 FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction Hang Hua et.al. 2404.14715 null
2024-04-23 DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance Linxuan Xin et.al. 2404.14676 null
2024-04-22 A Multimodal Automated Interpretability Agent Tamar Rott Shaham et.al. 2404.14394 null
2024-04-22 Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback Wenyi Xiao et.al. 2404.14233 link
2024-04-22 VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models Haoyi Qiu et.al. 2404.13874 link
2024-04-20 AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models Yuheng Ji et.al. 2404.13425 null
2024-04-20 Movie101v2: Improved Movie Narration Benchmark Zihao Yue et.al. 2404.13370 null
2024-04-19 ECOR: Explainable CLIP for Object Recognition Ali Rasekh et.al. 2404.12839 null
2024-04-19 Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model Jihao Dong et.al. 2404.12678 null
2024-04-19 Pre-trained Vision-Language Models Learn Discoverable Visual Concepts Yuan Zang et.al. 2404.12652 link
2024-04-19 ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation Yu-Hsuan Ho et.al. 2404.12606 null
2024-04-19 Cross-Modal Adapter: Parameter-Efficient Transfer Learning Approach for Vision-Language Models Juncheng Yang et.al. 2404.12588 null
2024-04-18 V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning Hang Hua et.al. 2404.12353 null
2024-04-18 What does CLIP know about peeling a banana? Claudia Cuttano et.al. 2404.12015 null
2024-04-18 Progressive Multi-modal Conditional Prompt Tuning Xiaoyu Qiu et.al. 2404.11864 link
2024-04-17 VG4D: Vision-Language Model Goes 4D Video Recognition Zhichao Deng et.al. 2404.11605 link
2024-04-17 A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene Wenbo Zhang et.al. 2404.11249 null
2024-04-17 Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model Hao Yan et.al. 2404.11046 null
2024-04-17 OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding Edmond Tong et.al. 2404.11000 null
2024-04-16 Vocabulary-free Image Classification and Semantic Segmentation Alessandro Conti et.al. 2404.10864 link
2024-04-16 COMBO: Compositional World Models for Embodied Multi-Agent Cooperation Hongxin Zhang et.al. 2404.10775 null
2024-04-16 Private Attribute Inference from Images with Vision-Language Models Batuhan Tömekçe et.al. 2404.10618 null
2024-04-16 Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases Yanze Li et.al. 2404.10595 null
2024-04-16 Self-Supervised Visual Preference Alignment Ke Zhu et.al. 2404.10501 link
2024-04-17 Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models Enming Zhang et.al. 2404.10357 link
2024-04-16 Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning Rui Hu et.al. 2404.10332 null
2024-04-16 MoE-TinyMed: Mixture of Experts for Tiny Medical Large Vision-Language Models Songtao Jiang et.al. 2404.10237 link
2024-04-16 Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering Zaid Khan et.al. 2404.10193 null
2024-04-15 Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels Amaya Dharmasiri et.al. 2404.10146 link
2024-04-15 OneChart: Purify the Chart Structural Extraction via One Auxiliary Token Jinyue Chen et.al. 2404.09987 link
2024-04-15 Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models Ziwei Luo et.al. 2404.09732 link
2024-04-15 Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction David Sobrín-Hidalgo et.al. 2404.09705 null
2024-04-15 Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection Jiaqi Zhu et.al. 2404.09654 null
2024-04-15 Leveraging Temporal Contextualization for Video Action Recognition Minji Kim et.al. 2404.09490 link
2024-04-15 RankCLIP: Ranking-Consistent Language-Image Pretraining Yiming Zhang et.al. 2404.09387 null
2024-04-13 PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization Zining Chen et.al. 2404.09011 link
2024-04-13 AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning Yuwei Tang et.al. 2404.08958 link
2024-04-13 ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition Otto Brookes et.al. 2404.08937 null
2024-04-12 Training a Vision Language Model as Smartphone Assistant Nicolai Dorka et.al. 2404.08755 null
2024-04-12 Improving Continuous Sign Language Recognition with Adapted Image Models Lianyu Hu et.al. 2404.08226 link
2024-04-11 Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning Simon Schrodi et.al. 2404.07983 null
2024-04-11 Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese Yuichi Inoue et.al. 2404.07824 link
2024-04-12 Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics Masashi Osada et.al. 2404.07717 link
2024-04-12 PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination Anant Khandelwal et.al. 2404.07520 null
2024-04-11 Transferable and Principled Efficiency for Open-Vocabulary Segmentation Jingxuan Xu et.al. 2404.07448 link
2024-04-10 BRAVE: Broadening the visual encoding of vision-language models Oğuzhan Fatih Kar et.al. 2404.07204 null
2024-04-10 Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic Sachin Goyal et.al. 2404.07177 link
2024-04-10 ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling Ege Özsoy et.al. 2404.07031 link
2024-04-10 Vision-Language Model-based Physical Reasoning for Robot Liquid Perception Wenqiang Lai et.al. 2404.06904 null
2024-04-09 InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Xiaoyi Dong et.al. 2404.06512 link
2024-04-09 Can Feedback Enhance Semantic Grounding in Large Vision-Language Models? Yuan-Hong Liao et.al. 2404.06510 null
2024-04-09 Anchor-based Robust Finetuning of Vision-Language Models Jinwei Han et.al. 2404.06244 null
2024-04-08 Retrieval-Augmented Open-Vocabulary Object Detection Jooyeon Kim et.al. 2404.05687 link
2024-04-08 MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning Matteo Farina et.al. 2404.05621 link
2024-04-08 PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection Xiaofan Li et.al. 2404.05231 link
2024-04-08 Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset Chih-Chung Hsu et.al. 2404.05183 null
2024-04-07 FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback Liqiang Jing et.al. 2404.05046 null
2024-04-07 Hyperbolic Learning with Synthetic Captions for Open-World Detection Fanjie Kong et.al. 2404.05016 null
2024-04-07 Mixture of Low-rank Experts for Transferable AI-Generated Image Detection Zihan Liu et.al. 2404.04883 link
2024-04-07 GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling Hritik Bansal et.al. 2404.04763 null
2024-04-05 Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) Michael Saxon et.al. 2404.04251 link
2024-04-05 Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation Ji-Jia Wu et.al. 2404.04231 link
2024-04-05 Label Propagation for Zero-shot Classification with Vision-Language Models Vladan Stojnić et.al. 2404.04072 link
2024-04-04 Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity Jake Varley et.al. 2404.03570 null
2024-04-03 LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models Gabriela Ben Melech Stan et.al. 2404.03118 link
2024-04-03 AWOL: Analysis WithOut synthesis using Language Silvia Zuffi et.al. 2404.03042 null
2024-04-03 I-Design: Personalized LLM Interior Designer Ata Çelen et.al. 2404.02838 null
2024-04-03 Harnessing the Power of Large Vision Language Models for Synthetic Image Detection Mamadou Keita et.al. 2404.02726 link
2024-04-03 RESSA: Repair Sparse Vision-Language Models via Sparse Cross-Modality Adaptation Shwai He et.al. 2404.02424 link
2024-04-03 What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases Anthony Meng Huat Tiong et.al. 2404.02415 link
2024-04-03 Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns Yunsoo Kim et.al. 2404.02370 null
2024-04-02 ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models Vishnunandan L. N. Venkatesh et.al. 2404.02318 null
2024-04-02 Iterated Learning Improves Compositionality in Large Vision-Language Models Chenhao Zheng et.al. 2404.02145 null
2024-04-03 ViTamin: Designing Scalable Vision Models in the Vision-Language Era Jieneng Chen et.al. 2404.02132 link
2024-04-02 Bi-LORA: A Vision-Language Approach for Synthetic Image Detection Mamadou Keita et.al. 2404.01959 link
2024-04-02 VLRM: Vision-Language Models act as Reward Models for Image Captioning Maksim Dzabraev et.al. 2404.01911 null
2024-04-01 OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation Xiongwei Wu et.al. 2404.01409 null
2024-04-02 Open-Vocabulary Federated Learning with Multimodal Prototyping Huimin Zeng et.al. 2404.01232 link
2024-04-01 Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models Yuxin Wen et.al. 2404.01231 null
2024-04-01 Vision-language models for decoding provider attention during neonatal resuscitation Felipe Parodi et.al. 2404.01207 null
2024-04-01 SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining Chull Hwan Song et.al. 2404.01156 null
2024-04-01 Harnessing Large Language Models for Training-free Video Anomaly Detection Luca Zanella et.al. 2404.01014 null
2024-03-29 Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Atsuyuki Miyai et.al. 2403.20331 link
2024-03-29 Are We on the Right Way for Evaluating Large Vision-Language Models? Lin Chen et.al. 2403.20330 link
2024-03-29 Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations Jaisidh Singh et.al. 2403.20312 link
2024-03-29 H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model Chao Pang et.al. 2403.20213 link
2024-03-29 ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models Shuo Liu et.al. 2403.20194 null
2024-03-29 LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving Pranjal Paul et.al. 2403.20116 null
2024-03-29 Negative Label Guided OOD Detection with Pretrained Vision-Language Models Xue Jiang et.al. 2403.20078 link
2024-03-28 Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks Pooria Ashrafian et.al. 2403.19880 link
2024-03-28 Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving Akshay Gopalkrishnan et.al. 2403.19838 link
2024-04-01 Concept-based Analysis of Neural Networks via Vision-Language Models Ravi Mangal et.al. 2403.19837 null
2024-03-28 CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models Saurav Jha et.al. 2403.19137 link
2024-03-27 Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models Anees Ur Rehman Hashmi et.al. 2403.18996 null
2024-03-27 Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models Keyan Guo et.al. 2403.18957 link
2024-03-27 Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Yanwei Li et.al. 2403.18814 link
2024-03-27 Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding Xintong Wang et.al. 2403.18715 link
2024-03-27 Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP Reza Abbasi et.al. 2403.18525 null
2024-03-27 An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM Wonkyun Kim et.al. 2403.18406 link
2024-03-27 Efficient Test-Time Adaptation of Vision-Language Models Adilbek Karmanov et.al. 2403.18293 null
2024-03-26 Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models Yabin Zhang et.al. 2403.17589 link
2024-03-26 Visual Hallucination: Definition, Quantification, and Prescriptive Remediations Vipula Rawte et.al. 2403.17306 null
2024-03-25 Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks Jonathan Salfity et.al. 2403.17238 link
2024-03-25 Open-Set Recognition in the Age of Vision-Language Models Dimity Miller et.al. 2403.16528 link
2024-03-25 Learning To Guide Human Decision Makers With Vision-Language Models Debodeep Banerjee et.al. 2403.16501 null
2024-03-25 If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions Reza Esfandiarpoor et.al. 2403.16442 link
2024-03-24 Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models Yuxuan Wang et.al. 2403.16184 null
2024-03-26 Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models Minchan Kim et.al. 2403.16167 null
2024-03-23 IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models Haz Sameen Shahgir et.al. 2403.15952 link
2024-03-23 Explore until Confident: Efficient Exploration for Embodied Question Answering Allen Z. Ren et.al. 2403.15941 null
2024-03-23 Centered Masking for Language-Image Pre-Training Mingliang Liang et.al. 2403.15837 link
2024-03-23 VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification Lanfeng Zhong et.al. 2403.15836 link
2024-03-22 CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments Adarsh Jagan Sathyamoorthy et.al. 2403.15637 null
2024-03-22 Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning Bumsoo Kim et.al. 2403.15048 null
2024-03-21 Few-Shot Adversarial Prompt Learning on Vision-Language Models Yiwei Zhou et.al. 2403.14774 link
2024-03-21 Can 3D Vision-Language Models Truly Understand Natural Language? Weipeng Deng et.al. 2403.14760 link
2024-03-21 MyVLM: Personalizing VLMs for User-Specific Queries Yuval Alaluf et.al. 2403.14599 null
2024-03-21 Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network Zih-Syuan Huang et.al. 2403.14398 link
2024-03-21 Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation Jianeng Wang et.al. 2403.14320 null
2024-03-21 C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion Hee Suk Yoon et.al. 2403.14119 link
2024-03-21 Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots Connor Lee et.al. 2403.14056 null
2024-03-20 Multi-Modal Hallucination Control by Visual Information Grounding Alessandro Favero et.al. 2403.14003 null
2024-03-20 Bridge the Modality and Capacity Gaps in Vision-Language Model Selection Chao Yi et.al. 2403.13797 null
2024-03-20 Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model Diwei Wang et.al. 2403.13756 null
2024-03-20 Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments Djamahl Etchegaray et.al. 2403.13556 link
2024-03-20 CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models Pablo Pueyo et.al. 2403.13467 null
2024-03-20 AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation Jingkun An et.al. 2403.13352 null
2024-03-20 TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation Santosh Sanjeev et.al. 2403.13343 link
2024-03-20 SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models Tongtian Yue et.al. 2403.13263 link
2024-03-19 Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models Zuyan Liu et.al. 2403.12966 link
2024-03-19 Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models Ce Zhang et.al. 2403.12964 link
2024-03-19 Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models Elaine Sui et.al. 2403.12952 link
2024-03-19 Yell At Your Robot: Improving On-the-Fly from Language Corrections Lucy Xiaoyang Shi et.al. 2403.12910 null
2024-03-19 HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Fucai Ke et.al. 2403.12884 link
2024-03-19 RelationVLM: Making Large Vision-Language Models Understand Visual Relations Zhipeng Huang et.al. 2403.12801 null
2024-03-19 Towards Multimodal In-Context Learning for Vision & Language Models Sivan Doveh et.al. 2403.12736 null
2024-03-19 Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs Victor Carbune et.al. 2403.12596 null
2024-03-19 CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation Wenqi Zhu et.al. 2403.12455 link
2024-03-18 FlexCap: Generating Rich, Localized, and Flexible Captions in Images Debidatta Dwibedi et.al. 2403.12026 null
2024-03-18 Prioritized Semantic Learning for Zero-shot Instance Navigation Xander Sun et.al. 2403.11650 link
2024-03-18 Compositional Kronecker Context Optimization for Vision-Language Models Kun Ding et.al. 2403.11631 null
2024-03-18 Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters Jiazuo Yu et.al. 2403.11549 link
2024-03-18 Do CLIPs Always Generalize Better than ImageNet Models? Qizhou Wang et.al. 2403.11497 null
2024-03-18 VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding Yue Fan et.al. 2403.11481 null
2024-03-17 Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding Zichen Wu et.al. 2403.11311 null
2024-03-17 SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant Guohao Sun et.al. 2403.11299 link
2024-03-17 Training A Small Emotional Vision Language Model for Visual Art Comprehension Jing Zhang et.al. 2403.11150 link
2024-03-17 PhD: A Prompted Visual Hallucination Evaluation Dataset Jiazhen Liu et.al. 2403.11116 link
2024-03-17 Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping Haoxi Zhang et.al. 2403.11073 null
2024-03-15 Reconfigurable Robot Identification from Motion Data Yuhang Hu et.al. 2403.10496 null
2024-03-15 EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models Rocktim Jyoti Das et.al. 2403.10378 link
2024-03-15 Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models Tian Meng et.al. 2403.10287 null
2024-03-15 CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning Yukun Li et.al. 2403.10245 link
2024-03-15 Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning Hang Zhang et.al. 2403.10107 null
2024-03-14 An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models Haochen Luo et.al. 2403.09766 link
2024-03-14 Renovating Names in Open-Vocabulary Segmentation Benchmarks Haiwen Huang et.al. 2403.09593 null
2024-03-14 Anomaly Detection by Adapting a pre-trained Vision Language Model Yuxuan Cai et.al. 2403.09493 null
2024-03-14 XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization Yequan Bie et.al. 2403.09410 null
2024-03-14 AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions Hao Zhang et.al. 2403.09346 link
2024-03-14 Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring Yufei Zhan et.al. 2403.09333 link
2024-03-14 Annotation Free Semantic Segmentation with Vision Foundation Models Soroush Seifi et.al. 2403.09307 null
2024-03-14 Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models Yu-Chu Yu et.al. 2403.09296 null
2024-03-14 Are Vision Language Models Texture or Shape Biased and Can We Steer Them? Paul Gavrikov et.al. 2403.09193 link
2024-03-14 The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? Qinyu Zhao et.al. 2403.09037 link
2024-03-14 Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Hugo Laurençon et.al. 2403.09029 null
2024-03-13 AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models Yifei Gao et.al. 2403.08542 link
2024-03-13 Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation Zicheng Zhang et.al. 2403.08426 null
2024-03-13 Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification Long Lan et.al. 2403.08271 link
2024-03-13 CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models Haoxu Huang et.al. 2403.08248 null
2024-03-13 Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization Kento Kawaharazuka et.al. 2403.08239 null
2024-03-12 TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection Hanning Chen et.al. 2403.08108 null
2024-03-12 MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric Haokun Lin et.al. 2403.07839 null
2024-03-12 Unified Source-Free Domain Adaptation Song Tang et.al. 2403.07601 link
2024-03-12 In-context learning enables multimodal large language models to classify cancer pathology images Dyke Ferber et.al. 2403.07407 null
2024-03-12 KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models Han Huang et.al. 2403.07350 link
2024-03-12 Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion Wenhui Tan et.al. 2403.07312 link
2024-03-12 Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations Chenyu You et.al. 2403.07241 link
2024-03-11 Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation Xinyao Li et.al. 2403.06946 link
2024-03-11 An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Liang Chen et.al. 2403.06764 link
2024-03-11 FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications Yuki Tatsukawa et.al. 2403.06453 link
2024-03-11 Can LLMs' Tuning Methods Work in Medical Multimodal Domain? Jiawei Chen et.al. 2403.06407 link
2024-03-10 A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets Thang Doan et.al. 2403.06295 link
2024-03-10 In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model Junhui Yin et.al. 2403.06126 null
2024-03-11 DeepSeek-VL: Towards Real-World Vision-Language Understanding Haoyu Lu et.al. 2403.05525 link
2024-03-08 Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery Xavier Bou et.al. 2403.05381 link
2024-03-08 VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model Junsu Kim et.al. 2403.05346 null
2024-03-08 Debiasing Large Visual Language Models Yi-Fan Zhang et.al. 2403.05262 link
2024-03-08 CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model Pengwei Yin et.al. 2403.05124 null
2024-03-08 How Far Are We from Intelligent Visual Deductive Reasoning? Yizhe Zhang et.al. 2403.04732 link
2024-03-07 Yi: Open Foundation Models by 01.AI 01. AI et.al. 2403.04652 link
2024-03-07 Embodied Understanding of Driving Scenarios Yunsong Zhou et.al. 2403.04593 link
2024-03-07 Effectiveness Assessment of Recent Large Vision-Language Models Yao Jiang et.al. 2403.04306 null
2024-03-06 MeaCap: Memory-Augmented Zero-shot Image Captioning Zequn Zeng et.al. 2403.03715 link
2024-03-05 Enhancing Vision-Language Pre-training with Rich Supervisions Yuan Gao et.al. 2403.03346 null
2024-03-05 CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham et.al. 2403.03203 null
2024-03-05 MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting Fangchen Liu et.al. 2403.03174 null
2024-03-06 ImgTrojan: Jailbreaking Vision-Language Models with ONE Image Xijia Tao et.al. 2403.02910 link
2024-03-05 Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation Zhekai Du et.al. 2403.02899 null
2024-03-05 Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples Philipp J. Rösch et.al. 2403.02875 null
2024-03-06 PromptKD: Unsupervised Prompt Distillation for Vision-Language Models Zheng Li et.al. 2403.02781 link
2024-03-05 DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization Feng Hou et.al. 2403.02714 null
2024-03-05 Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use Imad Eddine Toubal et.al. 2403.02626 null
2024-03-05 Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research Brenda Y. Miao et.al. 2403.02558 link
2024-03-04 Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review Iryna Hartsock et.al. 2403.02469 link
2024-03-02 Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning Shuo Yang et.al. 2403.01209 null
2024-03-01 HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Zhaorun Chen et.al. 2403.00425 link
2024-03-01 Invariant Test-Time Adaptation for Vision-Language Model Generalization Huan Ma et.al. 2403.00376 link
2024-03-04 Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models Lei Li et.al. 2403.00231 null
2024-03-01 Multi-modal Attribute Prompting for Vision-Language Models Xin Liu et.al. 2403.00219 null
2024-02-29 Artwork Explanation in Large-scale Vision Language Models Kazuki Hayashi et.al. 2403.00068 null
2024-02-29 Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction Hao Li et.al. 2402.19326 link
2024-02-29 Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts Hao Cheng et.al. 2402.19150 null
2024-02-28 IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding Lanyun Zhu et.al. 2402.18476 null
2024-02-29 A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models Xiujie Song et.al. 2402.18409 link
2024-02-28 SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model Bin Cao et.al. 2402.18068 link
2024-02-28 Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction Koki Maeda et.al. 2402.17969 null
2024-02-27 Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning Maurits Bleeker et.al. 2402.17510 link
2024-02-27 VCD: Knowledge Base Guided Visual Commonsense Discovery in Images Xiangqing Shen et.al. 2402.17213 null
2024-02-26 Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models Jeonghwan Kim et.al. 2402.16315 null
2024-02-26 Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion Xuantong Liu et.al. 2402.16305 null
2024-02-27 NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation Jiazhao Zhang et.al. 2402.15852 null
2024-02-24 Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation Zekun Jiang et.al. 2402.15759 link
2024-02-24 GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation Yi Zong et.al. 2402.15745 link
2024-02-24 CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge Xiao Lin et.al. 2402.15726 null
2024-02-24 Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models Chaoya Jiang et.al. 2402.15721 null
2024-02-24 Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics Sadaf Ghaffari et.al. 2402.15654 null
2024-02-23 Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning Tejas Srinivasan et.al. 2402.15610 link
2024-02-23 Representing Online Handwriting for Recognition in Large Vision-Language Models Anastasiia Fadeeva et.al. 2402.15307 null
2024-02-23 Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding Ailin Deng et.al. 2402.15300 link
2024-02-22 CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models Santiago Castro et.al. 2402.15021 link
2024-02-22 PALO: A Polyglot Large Multimodal Model for 5B People Muhammad Maaz et.al. 2402.14818 link
2024-02-22 Uncertainty-Aware Evaluation for Vision-Language Models Vasily Kostumov et.al. 2402.14418 link
2024-02-22 Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology Nur Yildirim et.al. 2402.14252 null
2024-02-21 A Unified Framework and Dataset for Assessing Gender Bias in Vision-Language Models Ashutosh Sathe et.al. 2402.13636 null
2024-02-21 WinoViz: Probing Visual Properties of Objects Under Different States Woojeong Jin et.al. 2402.13584 null
2024-02-21 BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models Xueliang Zhao et.al. 2402.13577 null
2024-02-20 A Touch, Vision, and Language Dataset for Multimodal Alignment Letian Fu et.al. 2402.13232 link
2024-02-20 SoMeLVLM: A Large Vision Language Model for Social Media Processing Xinnong Zhang et.al. 2402.13022 null
2024-02-20 CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection Sohail Ahmed Khan et.al. 2402.12927 link
2024-02-20 GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models Sayantan Adak et.al. 2402.12881 link
2024-02-20 MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion Sen Li et.al. 2402.12741 link
2024-02-19 Talk Through It: End User Directed Manipulation Learning Carl Winge et.al. 2402.12509 null
2024-02-19 Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection Ruibo Chen et.al. 2402.12501 link
2024-02-19 Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models Christian Schlarmann et.al. 2402.12336 link
2024-02-19 DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models Xiaoyu Tian et.al. 2402.12289 null
2024-02-19 Evaluating Image Review Ability of Vision Language Models Shigeki Saito et.al. 2402.12121 null
2024-02-19 LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation Keyang Xuan et.al. 2402.11943 link
2024-02-18 Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning Zhiyang Xu et.al. 2402.11690 null
2024-02-18 ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model Guiming Hardy Chen et.al. 2402.11684 link
2024-02-18 Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models Junfei Wu et.al. 2402.11622 link
2024-02-18 Visual In-Context Learning for Large Vision-Language Models Yucheng Zhou et.al. 2402.11574 null
2024-02-17 ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing Zhenghang Yuan et.al. 2402.11325 link
2024-02-17 CoLLaVO: Crayon Large Language and Vision mOdel Byung-Kwan Lee et.al. 2402.11248 link
2024-02-16 PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Junfei Xiao et.al. 2402.10896 null
2024-02-16 Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering David Romero et.al. 2402.10698 link
2024-02-16 OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models Yuxuan Kuang et.al. 2402.10670 link
2024-02-15 On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities Xiyang Wu et.al. 2402.10340 link
2024-02-15 Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment Angelos Zavras et.al. 2402.09816 null
2024-02-16 MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models Corentin Royer et.al. 2402.09262 link
