-
Hello. In more detail: I'm actually using a multi_transform optimizer, where each group of parameters is "active" (i.e., the gradient is computed with respect to those parameters and is not None) only at specific steps. Currently I have written my own multi_transform to deal with such cases. Is it a good implementation? Is it possible to improve it, or to use something that already exists in the library? Here's the code:

# I've edited only the update_fn of multi_transform
def update_fn(updates, state, params=None):
    # Runs inside optax's multi_transform, so `param_labels`, `transforms`
    # and `make_mask` come from the enclosing scope.
    labels = param_labels(updates) if callable(param_labels) else param_labels
    new_inner_state = {}
    for group, tx in transforms.items():
        group_mask = make_mask(labels, group)
        # True only for leaves that belong to this group and have a gradient.
        updates_mask, _ = jtu.tree_flatten(
            jtu.tree_map(lambda m, v: m and v is not None, group_mask, updates))
        if np.any(updates_mask):
            assert updates_mask == jtu.tree_flatten(group_mask)[0], (
                "[TODO] If updates_mask has any True value, the whole group "
                "should have valid gradients. If you see this error message it "
                "means you are voiding only some of the gradients of a group. "
                "This should never happen in normal circumstances. Please "
                "report this.")
            masked_tx = optax.masked(tx, group_mask)
            updates, new_inner_state[group] = masked_tx.update(
                updates, state.inner_states[group], params)
        else:
            # No gradients for this group at this step: leave the updates and
            # the group's inner state untouched.
            updates, new_inner_state[group] = updates, state.inner_states[group]
    return updates, optax.MultiTransformState(new_inner_state)

Thank you for any advice.
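For context, here is a minimal sketch of how the stock optax.multi_transform that the snippet above modifies is normally set up; the parameter names, labels and learning rates below are placeholders, not taken from the post:

import jax
import jax.numpy as jnp
import optax

# Two parameter groups, each routed to its own inner transformation.
params = {'encoder': jnp.ones((4,)), 'decoder': jnp.zeros((4,))}
param_labels = {'encoder': 'group_a', 'decoder': 'group_b'}
tx = optax.multi_transform(
    {'group_a': optax.adam(1e-3), 'group_b': optax.sgd(1e-2)},
    param_labels)

state = tx.init(params)
grads = jax.tree_util.tree_map(jnp.ones_like, params)  # stand-in gradients
updates, state = tx.update(grads, state, params)
params = optax.apply_updates(params, updates)

The custom update_fn slots into this structure: on steps where a whole group's gradients are None, it leaves both the updates and that group's inner state untouched instead of calling the inner transformation.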
-
Did you try wrapping your optimiser with ...? Then ...
-
I see, one potential issue is that I don't think your proposed solution will work with jitting, since the if condition is not based on static values.
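For reference, a minimal sketch (not from the thread) of the static-vs-traced distinction behind this point: under jit, a plain Python if can only branch on values that are concrete at trace time; if the predicate depends on traced array values, tracing fails, and jax.lax.cond is needed instead.

import functools
import jax
import jax.numpy as jnp

@functools.partial(jax.jit, static_argnames='use_double')
def static_branch(x, use_double: bool):
    # `use_double` is marked static, so this Python `if` is resolved at
    # trace time and jit simply specializes on each value it sees.
    if use_double:
        return x * 2
    return x

@jax.jit
def traced_branch(x):
    # The predicate depends on the traced value of `x`, so converting it
    # to a Python bool fails under jit (a tracer/concretization error).
    if jnp.any(x > 0):
        return x * 2
    return x

@jax.jit
def cond_branch(x):
    # jax.lax.cond accepts a traced predicate, at the cost of staging out
    # both branches.
    return jax.lax.cond(jnp.any(x > 0), lambda v: v * 2, lambda v: v, x)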