Authors: Wouter Bant, Colin Bot, Jasper Eppink, Clio Feng, Floris Six Dijkstra
Inspired by the group equivariant attention model presented in the Group Equivariant Vision Transformer (GE-ViT) paper, we conduct experiments to validate the performance of the presented model. We provide many visualizations for a better understanding of GE-ViTs and the other presented methods. Furthermore, we present and evaluate several ways of making non-equivariant models equivariant by combining the latent embeddings or predicted probabilities of differently transformed inputs. We also speed up the GE-ViT experiments by first projecting the image to an artificial image with a smaller spatial resolution.
For the full analysis see our blogpost, but to give a little preview:
- 👓 We visualize many layers of the Group Equivariant Vision Transformer (GE-ViT)
- 🎯 We evaluate and propose novel ways of making any image classification model globally E(2) equivariant, outperforming regular test-time augmentation and beating previous image classification benchmarks (see the first sketch below)
- ⚡ We speed up and improve GE-ViTs by projecting the image to an artificial image with lower spatial resolution, reducing the number of attention computations (see the second sketch below)
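To make the post hoc idea concrete, below is a minimal sketch (a simplified illustration, not the repository's implementation) of the most basic variant: wrap a frozen classifier and average its predicted class probabilities over the four 90-degree rotations of the input, which makes the prediction exactly invariant to those rotations.

```python
import torch
import torch.nn as nn


class PostHocInvariantWrapper(nn.Module):
    """Average class probabilities over the four 90-degree rotations (the C4 group)."""

    def __init__(self, base_model: nn.Module):
        super().__init__()
        self.base_model = base_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        probs = []
        for k in range(4):
            x_rot = torch.rot90(x, k, dims=(-2, -1))       # exact rotation on the pixel grid
            probs.append(self.base_model(x_rot).softmax(dim=-1))
        return torch.stack(probs).mean(dim=0)              # combine over the group elements


if __name__ == "__main__":
    # Hypothetical base classifier: any (B, 1, 28, 28) -> (B, 10) model would do.
    base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    model = PostHocInvariantWrapper(base)
    x = torch.randn(2, 1, 28, 28)
    same = torch.allclose(model(x), model(torch.rot90(x, 1, dims=(-2, -1))), atol=1e-6)
    print(same)  # True: the wrapped prediction is invariant to 90-degree rotations
```

Averaging probabilities is only one of the combination strategies we evaluate; see src/post_hoc_equivariance/post_hoc_equivariant.py for the actual implementations.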
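Similarly, here is a minimal sketch (simplified and assumed, not the actual model) of the speed-up idea: project the input to a lower-resolution artificial image before the attention layers, so self-attention runs over far fewer positions. Average pooling stands in for the projection here because it commutes with 90-degree rotations and therefore does not break the equivariance of a downstream equivariant model.

```python
import torch
import torch.nn as nn

downsample = nn.AvgPool2d(kernel_size=2)  # 28x28 -> 14x14: roughly 4x fewer positions to attend over

x = torch.randn(1, 1, 28, 28)
rotate_then_project = downsample(torch.rot90(x, 1, dims=(-2, -1)))
project_then_rotate = torch.rot90(downsample(x), 1, dims=(-2, -1))
print(torch.allclose(rotate_then_project, project_then_rotate, atol=1e-6))  # True: the projection is equivariant
```

The actual downsampling and training code lives in src/modern_eq_vit/eq_modern_vit.py.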
Clone the repository:
git clone https://github.com/WouterBant/GEVit-DL2-Project.git
And go inside the directory:
cd GEVit-DL2-Project
Unfortunately, we had to use two different environments. For running GE-ViT, create and activate the environment with:
conda env create -f gevit_conda_env.yml
conda activate gevit
For running the post hoc experiments and training of the equivariant modern ViT:
conda env create -f posthoc_conda_env.yml
conda activate posthoc
In the demos folder we provide notebooks for visualizing the artifacts introduced by non-90-degree rotations and for creating the video that compares a normal ViT to equivariant models on rotated inputs.
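As a quick illustration of why such artifacts appear (a minimal sketch, not taken from the notebooks): rotations by angles other than multiples of 90 degrees require interpolation and padding, so rotating an image forth and back does not recover the original pixels.

```python
import torch
import torchvision.transforms.functional as TF

x = torch.rand(1, 28, 28)                      # a random single-channel image
x_back = TF.rotate(TF.rotate(x, 45.0), -45.0)  # rotate by 45 degrees and rotate back
print((x - x_back).abs().max())                # clearly non-zero: interpolation and padding artifacts
```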
For reproducing the results for GE-ViT, change directory to the src folder and execute the commands from the README.
For reproducing the results of the post hoc experiments, change directory to src/post_hoc_equivariant and follow the instructions from the README. Checkpoints and results for the various models are also saved in this folder.
For reproducing the results of the modern equivariant ViT, change directory to src/modern_eq_vit and refer to the README for instructions to run the training.
- The blogpost can be found in Blogpost.md
- Jupyter notebooks with small experiments can be found in demos/
- All the other code can be found in src/
- src/README.md: instructions for running GE-ViT experiments
- src/modern_eq_vit: code for downsampling images for faster GE-ViT experiments
- src/modern_eq_vit/README.md: instructions to reproduce these experiments
- src/modern_eq_vit/eq_modern_vit.py: implementation for this part
- src/post_hoc_equivariance: post hoc equivariant methods
- src/post_hoc_equivariance/README.md: instructions to reproduce these experiments
- src/post_hoc_equivariance/post_hoc_equivariant.py: implementation of the different methods
- src/models/: various models
- src/models/gcnn.py: implementation of the Group Convolutional Neural Network
This repository contains the source code accompanying the paper Group Equivariant Vision Transformer (UAI 2023).
The original code, which contains a small error, is from the GSA-Nets paper by David W. Romero and Jean-Baptiste Cordonnier.