Replace hardcoded class index with logit_class_dim argument #177
Conversation
Still WIP test cases.

Ready to review! Discussion points:

Merged with
- I think @f-dangel mentioned that BackPACK doesn't support multiple output dims, so we have to rearrange to the equivalent 2D output/label. ASDL assumes the class dim is `-1` and flattens everything, so again, we have to rearrange to the equivalent 2D output. The same probably holds for asdfghjkl. In other words, we can't just flatten the output, but have to transpose correctly (probably most readable with `einops.rearrange`). See here for how we handle this in curvlinops, with the assumption that `logit_class_dim=1` for CE loss and `logit_class_dim=-1` for MSE loss. (A minimal sketch of such a rearrangement follows this list.)
- For functorch, it should be possible to also rearrange the logits/labels. For curvlinops, the issue is that we don't have access to the logits, so to make it work here we would have to wrap the forward pass of the model to fix the logit class dim at the end of the forward pass. So maybe we should just raise an informative error, telling the user how to modify the forward pass of the model to be compatible with curvlinops.
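For concreteness, a minimal sketch of such a rearrangement (this is not the PR's code; the helper name and the `movedim`-based variant of the `einops.rearrange` idea are just an illustration):

```python
import torch


def logits_to_2d(logits: torch.Tensor, labels: torch.Tensor, logit_class_dim: int):
    """Move the class dim last and collapse all remaining dims into the batch dim."""
    n_classes = logits.shape[logit_class_dim]
    logits_2d = logits.movedim(logit_class_dim, -1).reshape(-1, n_classes)
    labels_1d = labels.reshape(-1)  # labels carry no class dim, so plain flattening is fine
    return logits_2d, labels_1d
```

E.g. NCHW logits with `logit_class_dim=1` become `(N*H*W, C)` rows with matching flattened labels, which is the 2D layout the backends above expect.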
To make sure we get this right, we have to test for equivalence of computation, see my comment. Also, if we only want `logit_class_dim` to affect classification tasks, we should maybe verify at initialization that `likelihood="classification"` if `logit_class_dim` is not the default value, and raise a `ValueError` otherwise.
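A possible shape for that check, as a sketch only (the attribute names and the default value of `-1` are assumptions, not the library's actual API):

```python
class _LaplaceSketch:
    """Toy stand-in illustrating the suggested init-time validation."""

    def __init__(self, likelihood: str, logit_class_dim: int = -1):
        # A non-default logit_class_dim only makes sense for classification,
        # so reject other likelihoods early with an informative error.
        if logit_class_dim != -1 and likelihood != "classification":
            raise ValueError(
                "logit_class_dim is only supported with likelihood='classification', "
                f"but got likelihood={likelihood!r}."
            )
        self.likelihood = likelihood
        self.logit_class_dim = logit_class_dim
```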
-    dummy if self.loss_type == LOSS_MSE else dummy.view(-1, dummy.size(-1))
+    dummy
+    if self.loss_type == LOSS_MSE
+    else dummy.view(-1, dummy.size(self.logit_class_dim))
I don't think flattening will have the intended effect; same for the other changes below. See my main comment.
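A tiny self-contained illustration of the problem (shapes are arbitrary, chosen only for demonstration):

```python
import torch

logits = torch.randn(2, 3, 4, 4)  # (N, C, H, W) with the class dim at position 1

# Plain flattening: each row is just 3 consecutive values from memory,
# not the class logits of one spatial position.
flat_wrong = logits.view(-1, logits.size(1))

# Moving the class dim to the end first gives rows that are genuine class vectors.
flat_right = logits.movedim(1, -1).reshape(-1, logits.size(1))
```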
)
@pytest.mark.parametrize("method", ["full", "kron", "diag"])
@pytest.mark.parametrize("logit_class_dim", [-1, 1000])
def test_logit_class_dim_class(backend_cls, method, logit_class_dim, model, class_Xy):
I think this has to be tested for actual equivalence of the computation, i.e. `pytest.mark.parametrize` the tests of the backends with `logit_class_dim`.
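One way such a parametrized equivalence test could look; this is a heavily hedged sketch: the fixtures, the `full(X, y) -> (loss, H)` convention, and the `logit_class_dim` keyword on the backend constructor are assumptions about this PR, not established API:

```python
import pytest
import torch


@pytest.mark.parametrize("logit_class_dim", [-1, 1])
def test_full_matches_default_layout(backend_cls, model, class_Xy, logit_class_dim):
    X, y = class_Xy
    # Reference: the current behavior with the class dim hardcoded to the last axis.
    loss_ref, H_ref = backend_cls(model, "classification").full(X, y)
    # Same data routed through the new argument; for 2D logits both settings
    # address the same axis, so the results must agree exactly.
    loss_new, H_new = backend_cls(
        model, "classification", logit_class_dim=logit_class_dim
    ).full(X, y)
    torch.testing.assert_close(loss_ref, loss_new)
    torch.testing.assert_close(H_ref, H_new)
```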
It seems more complicated than anticipated. This PR is useful for models with image outputs like diffusion models. Considering v0.2 is all about LLMs, let's defer this to v0.3!
OK, for now maybe we can add a note in the README and the docstring that clearly states how multi-dim outputs are handled?
So what do you have in mind regarding the wording? Something like this in README.md?

## Caveats

- Currently, this library always assumes that the model has an output tensor of shape `(batch_size, ..., n_classes)`, so in the case of image outputs, you need to rearrange from NCHW to NHWC.
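The caveat could even point to a one-line wrapper; the class below is illustrative only, not part of the library:

```python
from torch import nn


class ChannelsLastOutput(nn.Module):
    """Wrap a model with image-shaped outputs so NCHW logits become NHWC."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)             # (batch_size, n_classes, H, W)
        return out.permute(0, 2, 3, 1)  # (batch_size, H, W, n_classes)
```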
Yes, this is exactly what I was thinking!
Closes #163
Please wait until #144 is merged.