Lianghui Zhu1 *, Bencheng Liao1 *, Qian Zhang2, Xinlong Wang3, Wenyu Liu1, Xinggang Wang1 📧
1 Huazhong University of Science and Technology, 2 Horizon Robotics, 3 Beijing Academy of Artificial Intelligence
(*) equal contribution, (📧) corresponding author.
ArXiv Preprint (arXiv 2401.09417)
Try Hilbert Indexing on the serial.
Python 3.10.13
conda create -n your_env_name python=3.10.13
torch 2.1.1 + cu118
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url
Requirements: vim_requirements.txt
pip install -r vim/vim_requirements.txt
cd causal_conv1d; pip install -e .
cd mamba; pip install -e .
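After the steps above, it can be useful to confirm that the installed packages match the pinned versions. The following is a minimal sketch, not part of the Vim repository: the helper name `check_versions` and the `EXPECTED` table are hypothetical, and the pins are copied from the install commands above. It uses only the standard library, so it runs even if the installation failed.

```python
from importlib import metadata

# Pins copied from the install commands above (assumed to be the intended versions).
EXPECTED = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found_or_None, ok)} for each pinned package.

    `lookup` is injectable so the function can be exercised without the
    packages being installed; by default it queries the current environment.
    """
    report = {}
    for name, want in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu118" when comparing.
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} -> {status}")
```

Run it inside the conda environment created earlier; a `MISMATCH` line usually means the `pip install` step picked up a different wheel than the one pinned.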
To train Vim-Ti on ImageNet-1K, run:
bash vim/scripts/
To fine-tune Vim-Ti on ImageNet-1K from the published checkpoint, run:
bash vim/scripts/
To evaluate Vim-Ti on ImageNet-1K, run:
bash vim/scripts/
Model | #param. | Top-1 Acc. | Top-5 Acc. | Hugging Face Repo |
Vim-tiny | 7M | 73.1 | 91.1 | |
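For readers unfamiliar with the two accuracy columns, the sketch below shows how Top-1 and Top-5 accuracy are conventionally computed from per-class scores. It is an illustration in plain Python, not the evaluation code used by this repository; the function name `topk_accuracy` and the toy scores are hypothetical.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class.
    labels: list of integer class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data: 3 samples, 3 classes (made up for illustration).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = topk_accuracy(toy_scores, toy_labels, k=1)
top2 = topk_accuracy(toy_scores, toy_labels, k=2)
print(f"top-1: {top1:.2f}%  top-2: {top2:.2f}%")
```

Top-5 is the same computation with `k=5`; it is always at least as high as Top-1 because the top-ranked class is included in the top five.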
This project is based on Mamba (paper, code), Causal-Conv1d (code), and DeiT (paper, code). We thank the authors for their excellent work.
If you find Vim useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
journal={arXiv preprint arXiv:2401.09417},