
[ICLR 2025] "Attribute-based Visual Reprogramming for Image Classification with CLIP" Official Website: https://github.com/tmlr-group/AttrVR


caichengyi/AttrVR


Attribute-based Visual Reprogramming for Image Classification with CLIP

License: MIT

This repository is the official PyTorch implementation of the ICLR 2025 paper: Attribute-based Visual Reprogramming for Image Classification with CLIP, authored by Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, and Feng Liu.

Abstract: Visual reprogramming (VR) reuses pre-trained vision models for downstream image classification tasks by adding trainable noise patterns to inputs. When applied to vision-language models (e.g., CLIP), existing VR approaches follow the same pipeline used in vision models (e.g., ResNet, ViT), where ground-truth class labels are inserted into fixed text templates to guide the optimization of VR patterns. This label-based approach, however, overlooks the rich information and diverse attribute-guided textual representations that CLIP can exploit, which may lead to the misclassification of samples. In this paper, we propose Attribute-based Visual Reprogramming (AttrVR) for CLIP, utilizing descriptive attributes (DesAttrs) and distinctive attributes (DistAttrs), which respectively represent common and unique feature descriptions for different classes. Besides, as images of the same class may reflect different attributes after VR, AttrVR iteratively refines patterns using the k-nearest DesAttrs and DistAttrs for each image sample, enabling more dynamic and sample-specific optimization. Theoretically, AttrVR is shown to reduce intra-class variance and increase inter-class separation. Empirically, it achieves superior performance in 12 downstream tasks for both ViT-based and ResNet-based CLIP. The success of AttrVR facilitates more effective integration of VR from unimodal vision models into vision-language models.
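The sample-specific attribute selection described in the abstract can be illustrated with a small sketch. This is not the repository's implementation: the function names (`cosine`, `knn_attr_score`, `classify`), the weight `lam`, and the toy 2-D embeddings are assumptions made for illustration, and in practice the embeddings would come from CLIP's image and text encoders. For each class, the image embedding is compared with that class's DesAttr and DistAttr embeddings, and the mean of the k highest similarities in each set is combined into a class score.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_attr_score(image_emb, attr_embs, k):
    """Mean similarity between the image and its k nearest attribute embeddings."""
    sims = sorted((cosine(image_emb, a) for a in attr_embs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)

def classify(image_emb, des_attrs, dist_attrs, k=2, lam=0.5):
    """Return (best_class, scores).

    des_attrs / dist_attrs map a class name to a list of attribute embeddings.
    lam is a hypothetical weight balancing the two attribute types.
    """
    scores = {}
    for cls in des_attrs:
        des = knn_attr_score(image_emb, des_attrs[cls], k)
        dist = knn_attr_score(image_emb, dist_attrs[cls], k)
        scores[cls] = lam * des + (1 - lam) * dist
    return max(scores, key=scores.get), scores

# Toy 2-D embeddings standing in for CLIP features (made-up values).
des_attrs = {
    "cat": [[1.0, 0.1], [0.9, 0.2], [0.2, 1.0]],
    "dog": [[0.1, 1.0], [0.2, 0.9], [1.0, 0.3]],
}
dist_attrs = {
    "cat": [[1.0, 0.0], [0.8, 0.1]],
    "dog": [[0.0, 1.0], [0.1, 0.8]],
}
label, scores = classify([0.95, 0.15], des_attrs, dist_attrs)
print(label, scores)
```

Because only the k nearest attributes contribute, an attribute that does not match a given image (such as the third "cat" DesAttr above) is ignored for that sample instead of lowering the class score.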

Framework

Environment

  • Python (3.10.0)
  • PyTorch (2.0.1)
  • TorchVision (0.15.2)

Installation

conda create -n reprogram python=3.10.0
conda activate reprogram
pip install -r requirement.txt
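
After installation, a quick check that the active environment matches the versions listed under Environment can save debugging time. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes the packages are installed under the pip distribution names `torch` and `torchvision`.

```python
import sys
from importlib import metadata

# Versions listed in the Environment section of this README.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2"}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_environment(expected=EXPECTED):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for package, want in expected.items():
        found = installed_version(package)
        # Compare only the public version, ignoring local suffixes such as "+cu118".
        matches = found is not None and found.split("+")[0] == want
        report[package] = (want, found, matches)
    return report

if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for package, (want, found, ok) in check_environment().items():
        status = "ok" if ok else "mismatch"
        print(f"{package}: expected {want}, found {found} ({status})")
```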

Dataset Preparation

To reproduce the results, please follow CoOp to download the datasets, then set DOWNSTREAM_PATH = "" in cfg.py of this repository to the directory where the datasets are stored.
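
Before launching experiments, it can help to confirm that the configured dataset root exists. The sketch below is an illustration only: `list_dataset_dirs` is a hypothetical helper that is not part of this repository, and the example path is a placeholder. It makes no assumption about how individual datasets are laid out beyond each one living in its own subdirectory.

```python
import os

def list_dataset_dirs(downstream_path):
    """Return the sorted subdirectory names under the dataset root.

    Raises ValueError if the path is empty (the default in cfg.py) and
    FileNotFoundError if it does not point to an existing directory.
    """
    if not downstream_path:
        raise ValueError("DOWNSTREAM_PATH is empty; set it in cfg.py first")
    if not os.path.isdir(downstream_path):
        raise FileNotFoundError(f"dataset root not found: {downstream_path}")
    return sorted(
        name for name in os.listdir(downstream_path)
        if os.path.isdir(os.path.join(downstream_path, name))
    )

# Example usage (placeholder path, replace with your own):
# print(list_dataset_dirs("/path/to/datasets"))
```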

Step 1: Generating DesAttrs and DistAttrs / Using the Generated Attributes

  • We have uploaded all generated attributes used in this paper to attributes/gpt3. Other attributes used in Appendix C.7 & C.9 generated by other LLMs/MLLMs can also be found in attributes/....

  • If you would like to generate the attributes yourself, first enter your API key in generate_attributes.py, then run python generate_attributes.py.

Step 2.1: Running Code for Baselines

python experiments/fs_vp.py --dataset [dataset]
python experiments/fs_ar.py --dataset [dataset]

Step 2.2: Running Code for AttrVR

python experiments/fs_attrvr.py --dataset [dataset]
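
To run the baselines and AttrVR over several datasets in sequence, the commands above can be generated programmatically. This is a minimal sketch under stated assumptions: the helpers `build_commands` and `run_all` are hypothetical, and the dataset names in the example are placeholders, since the values accepted by `--dataset` are defined by the repository's own scripts. By default it only prints the commands (`dry_run=True`).

```python
import shlex
import subprocess
import sys

# Script paths as given in Steps 2.1 and 2.2.
SCRIPTS = {
    "vp": "experiments/fs_vp.py",
    "ar": "experiments/fs_ar.py",
    "attrvr": "experiments/fs_attrvr.py",
}

def build_commands(datasets, methods=("attrvr",)):
    """Build one command per (method, dataset) pair, grouped by method."""
    return [
        [sys.executable, SCRIPTS[method], "--dataset", dataset]
        for method in methods
        for dataset in datasets
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)

# Placeholder dataset names; substitute the names the scripts accept.
commands = build_commands(["dataset_a", "dataset_b"], methods=("vp", "ar", "attrvr"))
run_all(commands, dry_run=True)
```

Run it from the repository root so that the relative script paths resolve, and pass `dry_run=False` once the printed commands look right.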

Acknowledgements

This repo is built upon these previous works:

Citation

@inproceedings{cai2025attribute,
    title={Attribute-based Visual Reprogramming for Image Classification with CLIP},
    author={Chengyi Cai and Zesheng Ye and Lei Feng and Jianzhong Qi and Feng Liu},
    booktitle = {International Conference on Learning Representations},
    year={2025}
}
