Replies: 1 comment
-
The only difference between IncrementalClassifier and nn.Linear is the new units initialization:
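Roughly, the adaptation step rebuilds the linear head with more output units and copies the old weights over, so only the new units receive a fresh default initialization. A paraphrased sketch of that logic (not the library's exact source):

```python
import torch
import torch.nn as nn

def adapt_classifier(classifier: nn.Linear, new_num_classes: int) -> nn.Linear:
    """Grow a linear head to `new_num_classes`, keeping previously learned units intact."""
    old_out, in_features = classifier.out_features, classifier.in_features
    if new_num_classes <= old_out:
        return classifier  # nothing to grow
    new_head = nn.Linear(in_features, new_num_classes)  # new units get default init
    with torch.no_grad():
        new_head.weight[:old_out] = classifier.weight  # copy old units' weights
        new_head.bias[:old_out] = classifier.bias
    return new_head
```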
I'm not sure if this difference can explain the results you are getting. I would expect IncrementalClassifier to work slightly better. Honestly, this looks like a bug, but I'm not sure what's causing it. @AndreaCossu do you have any idea of what's happening?
-
Many default architectures borrowed from the baselines repo tend to use a static nn.Linear() as the last classifier layer of the model. For example:
https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/lwf.py
https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/naive.py
(note that the MLP parameter initial_out_features defaults to 0, thus setting self.classifier = nn.Linear(hidden_size, output_size))
This solution will have a fixed number of outputs dependent upon the number of total classes across all experiences.
In a true class-IL setting, however, the number of total classes may be unknown, so we wish to grow the last layer as new classes are observed. Thus, using the IncrementalClassifier() from avalanche.models may be the preferred solution. This mimics the solution in the "Three scenarios for continual learning" paper, where "all units of the classes seen so far were active" (see sec. 4.2 in https://arxiv.org/pdf/1904.07734.pdf). I believe this implies non-observed classes have inactive output units.
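For illustration, a growing head can be used in place of the static one along these lines (a sketch only; the hidden size is an assumption, and IncrementalClassifier's constructor arguments may differ across Avalanche versions):

```python
import torch.nn as nn
from avalanche.models import IncrementalClassifier

class GrowingMLP(nn.Module):
    """Minimal MLP whose last layer grows as new classes are observed."""
    def __init__(self, input_size=784, hidden_size=400):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
        )
        # Starts with 2 output units; Avalanche strategies adapt dynamic modules
        # before training on each experience, so the head grows to cover the
        # classes seen so far, while units for unseen classes stay inactive.
        self.classifier = IncrementalClassifier(
            in_features=hidden_size, initial_out_features=2
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```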
In the case of LwF, final average accuracy is ~30% when using a static nn.Linear() classification layer. However, using an IncrementalClassifier() produces significantly different results: accuracy on all previously learned experiences drops to 0% (maximum catastrophic forgetting). I am trying to understand why this happens.
To reproduce my error, make the following modifications to https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/lwf.py:
- Set initial_out_features=2 when initializing the MLP.
- Set fixed_class_order=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] when initializing the SplitMNIST benchmark.
- Run a training/eval loop like the sketch below, so eval is only performed on observed classes. I add the summary call (from torchsummary) to observe how the classification layer grows with each experience (unlike with a static nn.Linear()).
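Roughly, the loop looks like this (a sketch following the baseline script's structure; `cl_strategy`, `benchmark`, and `model` are assumed to be the LwF strategy, the SplitMNIST benchmark, and the MLP built earlier in that script, and the torchsummary input size/device are assumptions for a flattened-MNIST MLP on CPU):

```python
from torchsummary import summary

results = []
for i, experience in enumerate(benchmark.train_stream):
    cl_strategy.train(experience)

    # Print the model after each experience to watch the classifier head grow.
    summary(model, input_size=(1, 28, 28), device="cpu")

    # Evaluate only on the experiences observed so far.
    results.append(cl_strategy.eval(benchmark.test_stream[: i + 1]))
```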
Thanks,
Ethan