This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models, accepted to ECCV'24.
[2025/01/19] The model checkpoints have also been uploaded! Check here for more details.
[2025/01/19] The instruction page is ready! We plan to release our original checkpoints soon.
[2024/12/31] Our full codebase has been released! The introduction and installation instructions (including required packages) will be added soon.
Create a new Conda environment with Python 3.10.14:
conda create -n snd python==3.10.14
Activate the environment and install PyTorch with the specified version and CUDA support:
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
Install additional dependencies using the provided requirements.txt file:
pip install -r requirements.txt
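After installation, you can optionally run a quick sanity check (not part of our scripts) to confirm that the expected versions and CUDA are visible from Python:

```python
# Optional sanity check for the environment; not part of the released scripts.
import torch
import torchvision

print(torch.__version__)          # expected: 2.2.0
print(torchvision.__version__)    # expected: 0.17.0
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine
print(torch.cuda.device_count())  # number of visible GPUs
```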
To reproduce our experiments, download the following datasets according to the guidance provided here.
- FGVCAircraft
- DTD
- EuroSAT
- Flowers102
- Food101
- OxfordPets
- StanfordCars
- UCF101
- ImageNet
Organize each dataset in the following directory structure:
<DATASET_NAME>/
├── images/
│ ├── image data / folders
└── <DATASET_NAME>_annotations.json
The <DATASET_NAME>_annotations.json file contains the training, validation, and test splits, along with the class names. The files we used for all datasets are provided here. Download these files and place them in the appropriate paths as described above.
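If you want to double-check your layout, a small helper like the one below can be used. It is only a sketch: the folder names and the keys inside the annotation file are assumptions, so adjust them to match the downloaded files.

```python
# Hypothetical helper to verify the dataset layout described above.
# Folder names and JSON keys are assumptions; adjust to your setup.
import json
from pathlib import Path

def check_dataset(root: str, name: str) -> None:
    base = Path(root) / name
    assert (base / "images").is_dir(), f"missing {base / 'images'}"
    ann_path = base / f"{name}_annotations.json"
    assert ann_path.is_file(), f"missing {ann_path}"
    with open(ann_path) as f:
        annotations = json.load(f)
    # The annotation file holds the train/val/test splits and class names.
    print(name, "->", list(annotations.keys()))

check_dataset("/path/to/your/datasets", "dtd")
```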
We provide our original model checkpoints for public use. Due to limited storage space, only the last checkpoints for each training sequence are released.
Unfortunately, while reproducing our experiments, we observed a slight performance drop (0.08% in mean scores). This discrepancy may be attributed to differences in hardware or package versions. Despite this minor variation, our method still achieves state-of-the-art performance compared to previous works.
You can access the model checkpoints and the reproduced average accuracy scores here.
We provide several scripts to help you easily reproduce our experiments. Our experiments were conducted on 4x V100 GPUs in distributed parallel mode. Note that we have not tested our method outside of distributed mode; if you have only one GPU, still run the code in distributed mode with --nproc_per_node set to 1.
Before running the scripts, ensure that the root paths to your dataset folders are correctly configured in all files within the configs/ directory. Specifically, update the data.root attribute to point to your dataset's root directory.
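If you prefer not to edit each file by hand, a one-off snippet along these lines can rewrite the path in every config. It assumes the configs are plain YAML files with a top-level data section holding a root field; verify this against the released configs before running it.

```python
# One-off helper to point data.root at your dataset root in every config file.
# Assumes plain YAML configs with a top-level "data" section; adjust if needed.
import glob
import yaml  # PyYAML

DATA_ROOT = "/path/to/your/datasets"

for path in glob.glob("configs/*.yaml"):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    if isinstance(cfg, dict) and "data" in cfg:
        cfg["data"]["root"] = DATA_ROOT
        with open(path, "w") as f:
            yaml.safe_dump(cfg, f, sort_keys=False)
        print(f"updated {path}")
```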
Other configuration attributes do not need modification, as our scripts will automatically adjust them during runtime. However, you may modify these attributes if you wish to experiment with different hyper-parameters.
The following script allows training on a single dataset (e.g., fgvc-aircraft) and evaluating on all datasets using 4 GPUs.
Run the command below to execute the script:
python -m scripts.train_and_eval --config_path configs/snd_config_4_gpus.yaml --dataset fgvc-aircraft --distributed --nproc_per_node 4
If you are using only one GPU, modify the command as follows:
python -m scripts.train_and_eval --config_path configs/snd_config_1_gpu.yaml --dataset fgvc-aircraft --distributed --nproc_per_node 1
To load a model trained on a specific dataset and continue training on another dataset, include the --pretrained_dataset argument:
python -m scripts.train_and_eval --config_path configs/snd_config_4_gpus.yaml --pretrained_dataset fgvc-aircraft --dataset dtd --distributed --nproc_per_node 4
- Our code has only been verified with 1 or 4 GPUs.
- Using more than 4 GPUs is not recommended, as we observed a slight performance drop.
- When training with 1–4 GPUs, ensure that the batch sizes for the training and reference data are adjusted to match the number of GPUs (see the sketch after this list).
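In distributed data parallel training, each process works on its own data shard, so the effective (global) batch size is the per-GPU batch size times the number of processes. The sketch below only illustrates the arithmetic; the actual attribute names in our configs may differ.

```python
# Illustration only: keep the effective batch size fixed across GPU counts.
# The real config keys in configs/ may be named differently.
effective_batch_size = 64       # global batch size the run should see
nproc_per_node = 4              # number of GPUs / processes

per_gpu_batch_size = effective_batch_size // nproc_per_node
print(per_gpu_batch_size)       # 16 with 4 GPUs, 64 with a single GPU
```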
We also provide a script to continually train and evaluate across an entire sequence of datasets (i.e., reproduce our Multi-Domain Task Incremental Learning setting):
python -m scripts.continually_train --config_path configs/snd_config_4_gpus.yaml --order 0 --distributed --nproc_per_node 4
- The --order argument specifies an offset that shifts the pre-defined dataset sequence (see the illustration after this list).
- For the detailed task order of each training sequence, refer to the supplementary materials.
- The whole process of training and evaluation on a single training sequence using 4x V100 GPUs takes approximately 150 minutes on our devices.
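One way to read the --order offset is as a cyclic shift of the base dataset sequence, as sketched below. The list here is only a placeholder; the actual sequence and the exact shifting logic are defined in the codebase.

```python
# Placeholder illustration of how an --order offset could shift the dataset sequence.
# The real sequence and shifting logic live in the codebase.
datasets = ["fgvc-aircraft", "dtd", "..."]  # placeholder, not the actual order

order = 1
shifted = datasets[order:] + datasets[:order]
print(shifted)  # with order=1, training would start from the second dataset
```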
We also provide a script for performing inference on all datasets used in our experiments.
Run the following command to execute the inference script using the model stored in outputs/order_0/checkpoint_latest.pth:
python -m scripts.inference --model_path outputs/order_0/checkpoint_latest.pth
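If you want to inspect a checkpoint before running inference, a plain torch.load is sufficient. The keys stored inside the file are not guaranteed; this is just a quick peek.

```python
# Quick peek inside a checkpoint; the stored keys may differ from this sketch.
import torch

ckpt = torch.load("outputs/order_0/checkpoint_latest.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, metadata
```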
If you find our work useful, please cite it using the following BibTeX entry:
@inproceedings{yu2025select,
title={Select and distill: Selective dual-teacher knowledge transfer for continual learning on vision-language models},
author={Yu, Yu-Chu and Huang, Chi-Pin and Chen, Jr-Jen and Chang, Kai-Po and Lai, Yung-Hsuan and Yang, Fu-En and Wang, Yu-Chiang Frank},
booktitle={European Conference on Computer Vision},
pages={219--236},
year={2025},
organization={Springer}
}