Replies: 1 comment
-
The only difference between IncrementalClassifier and nn.Linear is the new units initialization:
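Roughly, the adaptation step rebuilds the linear head with more output units and copies the old weights over, so only the new units receive a fresh default initialization. A paraphrased sketch of that logic (not the library's exact source):

```python
import torch
import torch.nn as nn

def adapt_classifier(classifier: nn.Linear, new_num_classes: int) -> nn.Linear:
    """Grow a linear head to `new_num_classes`, keeping previously learned units intact."""
    old_out, in_features = classifier.out_features, classifier.in_features
    if new_num_classes <= old_out:
        return classifier  # nothing to grow
    new_head = nn.Linear(in_features, new_num_classes)  # new units get default init
    with torch.no_grad():
        new_head.weight[:old_out] = classifier.weight  # copy old units' weights
        new_head.bias[:old_out] = classifier.bias
    return new_head
```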
I'm not sure if this difference can explain the results you are getting. I would expect IncrementalClassifier to work slightly better. Honestly, this looks like a bug, but I'm not sure what's causing it. @AndreaCossu do you have any idea of what's happening?
-
Many default architectures borrowed from the baselines repo tend to use a static nn.Linear() as the last classifier layer of the model. For example:
https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/lwf.py
https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/naive.py
(note that the MLP parameter initial_out_features defaults to 0, thus setting self.classifier = nn.Linear(hidden_size, output_size))
This solution will have a fixed number of outputs dependent upon the number of total classes across all experiences.
In a true class-IL setting, however, the number of total classes may be unknown, so we wish to grow the last layer as new classes are observed. Thus, using the IncrementalClassifier() from avalanche.models may be the preferred solution. This mimics the solution in the "Three scenarios for continual learning" paper, where "all units of the classes seen so far were active" (see sec. 4.2 in https://arxiv.org/pdf/1904.07734.pdf). I believe this implies non-observed classes have inactive output units.
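For illustration, a growing head can be used in place of the static one along these lines (a sketch only; the hidden size is an assumption, and IncrementalClassifier's constructor arguments may differ across Avalanche versions):

```python
import torch.nn as nn
from avalanche.models import IncrementalClassifier

class GrowingMLP(nn.Module):
    """Minimal MLP whose last layer grows as new classes are observed."""
    def __init__(self, input_size=784, hidden_size=400):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
        )
        # Starts with 2 output units; Avalanche strategies adapt dynamic modules
        # before training on each experience, so the head grows to cover the
        # classes seen so far, while units for unseen classes stay inactive.
        self.classifier = IncrementalClassifier(
            in_features=hidden_size, initial_out_features=2
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```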
In the case of LwF, final average accuracy is ~30% when using a static nn.Linear() classification layer. However, using an IncrementalClassifier() produces significantly different results: accuracy on all previously learned experiences drops to 0% (maximum catastrophic forgetting). I am trying to understand why this happens.
To reproduce my error, make the following modifications to https://github.com/ContinualAI/continual-learning-baselines/blob/main/experiments/split_mnist/lwf.py:
- Set initial_out_features=2 when initializing the MLP.
- Set fixed_class_order=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] when initializing the SplitMNIST benchmark.
- Run a training/eval loop like the sketch below, so eval is only performed on observed classes. I add the summary call (from torchsummary) to observe how the classification layer grows with each experience (unlike with a static nn.Linear()).
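Roughly, the loop looks like this (a sketch following the baseline script's structure; `cl_strategy`, `benchmark`, and `model` are assumed to be the LwF strategy, the SplitMNIST benchmark, and the MLP built earlier in that script, and the torchsummary input size/device are assumptions for a flattened-MNIST MLP on CPU):

```python
from torchsummary import summary

results = []
for i, experience in enumerate(benchmark.train_stream):
    cl_strategy.train(experience)

    # Print the model after each experience to watch the classifier head grow.
    summary(model, input_size=(1, 28, 28), device="cpu")

    # Evaluate only on the experiences observed so far.
    results.append(cl_strategy.eval(benchmark.test_stream[: i + 1]))
```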
Thanks,
Ethan