
New Code Example : Forward-Forward Algorithm for Image Classification #1170

Merged (4 commits) on Jan 9, 2023

Conversation

@suvadityamuk (Contributor, PR author) commented on Dec 23, 2022

This PR adds a new Code Example implementation for the Forward-Forward algorithm, as introduced by Prof. Hinton in his paper at NeurIPS 2022.

Some things to note:

  • This is implemented entirely with Keras and TensorFlow-backend code, but it is an almost-direct port of an existing PyTorch implementation of the same algorithm.
  • Since the algorithm performs weight updates within each layer itself, some of the code works against the original design of the Keras API; the algorithm by nature looks to defy that design altogether. I would appreciate any reviews and thoughts on that as well.
  • I use tqdm and random for cleaner progress output and simpler code.

Tagging @LukeWood @fchollet for a review. Thank you for your time!

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
@suvadityamuk (Contributor, PR author) commented:

Hi, @fchollet! Gentle ping for a review, thank you! Oh, and Merry Christmas 🎄 !

@fchollet (Contributor) commented:

Thanks for the PR! I'll take a close look in the next few days.

@fchollet (Contributor) left a review:

Thanks for the PR. It's a great example!

training instead of the traditionally-used method of backpropagation, as proposed by
[Prof. Geoffrey Hinton](https://www.cs.toronto.edu/~hinton/FFA13.pdf)
The concept was inspired by the understanding behind [Boltzmann
Machines](http://www.cs.toronto.edu/~fritz/absps/dbm.pdf). Backpropagation involves
@fchollet (Contributor):
Please keep all markdown links on a single line, otherwise they won't properly render

@suvadityamuk (Contributor, PR author):

Will make this change across the file

[Prof. Geoffrey Hinton](https://www.cs.toronto.edu/~hinton/FFA13.pdf)
The concept was inspired by the understanding behind [Boltzmann
Machines](http://www.cs.toronto.edu/~fritz/absps/dbm.pdf). Backpropagation involves
calculating loss via a cost function and propagating the error across the network. On the
@fchollet (Contributor):

I think you can find a slightly better one-line description of backprop

@suvadityamuk (Contributor, PR author):

Sure, will update that.


The following example explores how to use the Forward-Forward algorithm to perform
training instead of the traditionally-used method of backpropagation, as proposed by
[Prof. Geoffrey Hinton](https://www.cs.toronto.edu/~hinton/FFA13.pdf)
@fchollet (Contributor):

"Proposed by Hinton in [The Forward-Forward Algorithm: Some Preliminary Investigations]() (2022)"

@suvadityamuk (Contributor, PR author):

Right, will make this change

other hand, the FF Algorithm suggests the analogy of neurons which get "excited" based on
looking at a certain recognized combination of an image and its correct corresponding
label.
This method takes certain inspiration from the biological learning process that occurs in
@fchollet (Contributor):

Please add line breaks between paragraphs

Date created: 2022/12/21
Last modified: 2022/12/23
Description: Training a Dense-layer based model using the Forward-Forward algorithm.

@fchollet (Contributor):

Add an Accelerator: field (either GPU or None)
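For reference, a keras.io example header carrying the requested field might look like the following (the Title wording and the Accelerator value are assumptions; the dates and description are taken from the diff above):

Title: Using the Forward-Forward Algorithm for Image Classification
Author: Suvaditya Mukherjee
Date created: 2022/12/21
Last modified: 2022/12/23
Description: Training a Dense-layer based model using the Forward-Forward algorithm.
Accelerator: GPU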

x = self.flatten_layer(x)
perm_array = tf.range(start=0, limit=x.get_shape()[0], delta=1)
x_pos = self.overlay_y_on_x(x, y)
y_numpy = y.numpy()
@fchollet (Contributor):

no numpy
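A hedged sketch of a NumPy-free alternative (not necessarily the code that finally landed): the shuffled labels used for the negative samples can be produced with TensorFlow ops, which also keeps the step graph-compatible.

# Illustrative only: pair each image with a shuffled (likely wrong) label
# without calling y.numpy().
perm = tf.random.shuffle(tf.range(tf.shape(x)[0]))
y_shuffled = tf.gather(y, perm)
x_neg = self.overlay_y_on_x(x, y_shuffled)  # overlay_y_on_x as defined in the example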

plt.show()
else:
x = layer(x)
return {"FinalLoss": loss}
@fchollet (Contributor):

train_step should return average values, e.g. the output of loss_tracker.result() (where loss_tracker is a metrics.Mean instance).

Also, use snake case
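A minimal sketch of the suggested pattern, with illustrative names rather than the PR's final code:

import tensorflow as tf
from tensorflow import keras

class FFNetworkSketch(keras.Model):
    # Illustrative skeleton of the reporting pattern only, not the full FF model.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(500, activation="relu")
        self.loss_tracker = keras.metrics.Mean(name="loss")

    @property
    def metrics(self):
        # Returning the tracker here lets Keras reset it at the start of each epoch.
        return [self.loss_tracker]

    def train_step(self, data):
        x, y = data
        # Stand-in for the per-layer forward-forward losses computed in the real example.
        loss = tf.reduce_mean(tf.square(self.dense(x)))
        self.loss_tracker.update_state(loss)
        # Report the running average instead of the raw last-batch value.
        return {"loss": self.loss_tracker.result()}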

@suvadityamuk (Contributor, PR author):

Will make the change

h_pos, h_neg = x_pos, x_neg
for idx, layer in enumerate(self.layers):
if idx == 0:
print("Input layer : No training")
@fchollet (Contributor):

Do not include any print statements or matplotlib plots in train_step. You should write a callback for these.
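A hedged sketch of moving the per-epoch reporting out of train_step and into a callback (the callback name and what it prints are illustrative):

from tensorflow import keras

class FFProgressLogger(keras.callbacks.Callback):
    # Illustrative callback: report epoch-level results outside of train_step.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"Epoch {epoch + 1}: loss = {logs.get('loss', float('nan')):.4f}")

# Hypothetical usage:
# model.fit(dataset, epochs=250, callbacks=[FFProgressLogger()])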

@suvadityamuk (Contributor, PR author):

Right. I'll implement a callback and make use of that

model = FFNetwork(dims=[784, 500, 500])

model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.03), loss="mse", run_eagerly=True
@fchollet (Contributor):

You should arrive at a model that can be run in graph mode, without requiring run_eagerly=True.
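For illustration only, and assuming train_step has been rewritten with pure TensorFlow ops (no .numpy() calls, prints, or plots), compiling without eager execution could then look like:

# FFNetwork is the example's model class; dropping run_eagerly=True lets Keras
# trace the train step into a graph.
model = FFNetwork(dims=[784, 500, 500])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.03),
    loss="mse",
)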

@suvadityamuk (Contributor, PR author):

Alright, I will try to do so


results = accuracy_score(preds, y_test)

print(f"Accuracy score : {results*100}%")
@fchollet (Contributor):

What do you get?

@suvadityamuk (Contributor, PR author) commented on Dec 27, 2022:

The accuracy score currently stands at ~24-27% across different runs (a constant seed could be set to improve reproducibility). While the paper reports somewhat better results (though nowhere near SOTA), I believe those can be reached with more tuning and perhaps a wider network.

@fchollet (Contributor):

25% accuracy on MNIST tells you that your algorithm does not work, unfortunately. A simple logistic regression gets ~92% or so. A simple logistic regression after applying ~90% noise on the input (randomly setting 90% of the pixels to 0) still gets 67%. Even if it were poorly tuned, the example should achieve at least 97% if the algorithm worked as expected.
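For context, the ~92% figure is what a plain softmax (multinomial logistic regression) baseline on MNIST typically gives; a hedged sketch of such a baseline, not part of the PR:

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# A single softmax layer over the flattened pixels, i.e. logistic regression.
baseline = keras.Sequential(
    [keras.Input(shape=(784,)), keras.layers.Dense(10, activation="softmax")]
)
baseline.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
baseline.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
_, acc = baseline.evaluate(x_test, y_test, verbose=0)
print(f"Baseline test accuracy: {acc:.3f}")  # typically around 0.92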

@suvadityamuk (Contributor, PR author) commented on Dec 28, 2022:

Right. Let me iterate on this algorithm for a while to see if and where I am making errors, and I will get back to you on this.

def forwardforward(self, x_pos, x_neg):
loss_list = []
for i in trange(self.num_epochs):
with tf.GradientTape() as tape:

@fchollet (Contributor):

Aren't we falling back to backprop here?
The loss is calculated, gradients are computed backwards, and then we're doing an optimizer step

@suvadityamuk (Contributor, PR author):

This is happening locally, solely within this layer, so it wouldn't qualify as backward propagation across the network just yet.
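To make the distinction concrete, here is a hedged sketch of a layer-local update in the spirit of the example (the class name, threshold, and loss form are illustrative): the tape's gradients are taken only with respect to this layer's own weights, so no error signal flows back to earlier layers.

import tensorflow as tf

class FFDenseSketch(tf.keras.layers.Dense):
    # Illustrative layer that trains itself with a local forward-forward step.
    def __init__(self, units, threshold=1.5, **kwargs):
        super().__init__(units, activation="relu", **kwargs)
        self.threshold = threshold
        self.local_optimizer = tf.keras.optimizers.Adam(learning_rate=0.03)

    def forward_forward(self, x_pos, x_neg):
        with tf.GradientTape() as tape:
            # "Goodness" = mean squared activation, pushed above the threshold
            # for positive samples and below it for negative samples.
            g_pos = tf.reduce_mean(tf.square(self(x_pos)), axis=1)
            g_neg = tf.reduce_mean(tf.square(self(x_neg)), axis=1)
            loss = tf.reduce_mean(
                tf.math.softplus(
                    tf.concat([self.threshold - g_pos, g_neg - self.threshold], axis=0)
                )
            )
        # Gradients only w.r.t. this layer's weights: the update stays local.
        grads = tape.gradient(loss, self.trainable_weights)
        self.local_optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return loss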

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
@suvadityamuk (Contributor, PR author) commented:

Hi, @fchollet

I have just added a new commit with all the changes. Apologies for the delay; it took some time to make sure all of your comments were addressed.
The new test accuracy is 97.75% as per my last training run. I have also made use of xla.dynamic_update_slice as per your linked PR's suggestion (thanks a ton for that, by the way).
I have also added more detailed comments and suggestions.
Please let me know if any other changes are required.

Also, a very Happy New Year to you!
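For readers curious about the xla.dynamic_update_slice mention above, here is a hedged sketch of how it can be used to overlay a label onto the first pixels of a flattened image; the exact encoding in the merged example may differ.

import tensorflow as tf
from tensorflow.compiler.tf2xla.python import xla

def overlay_y_on_x_sketch(x, y, num_classes=10):
    # x: a flattened (784,) image; y: an integer label.
    # Write a one-hot label, scaled to the image's max value, into the first 10 pixels.
    max_val = tf.reduce_max(x)
    one_hot = tf.one_hot(y, num_classes, dtype=x.dtype) * max_val
    return xla.dynamic_update_slice(x, one_hot, tf.constant([0]))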

@suvadityamuk requested a review from fchollet on January 8, 2023, 15:33
@suvadityamuk (Contributor, PR author) commented:

Also, I'm not sure why the docker-image CI is failing. It seems to be due to some keras_cv-related version issue, although I have not used keras_cv within this example.

@fchollet (Contributor) commented on Jan 9, 2023:

> The new Test Accuracy is 97.75% as per my last training run. I have also made use of the xla.dynamic_update_slice as per your linked PR's suggestion (thanks a ton for that, by the way.)

Congrats on making it work!

The current code looks good to me. Please add the generated files. I pushed some copyedits, please pull them first.

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
@suvadityamuk (Contributor, PR author) commented:

Hi, I have added the generated files and some minor factual edits. Thank you!

@fchollet (Contributor) left a review:

Thank you for the great contribution! 👍

@fchollet merged commit 23b54b7 into keras-team:master on Jan 9, 2023
@suvadityamuk (Contributor, PR author) commented:

Happy to contribute!

@michaelStettler commented:

Hi,

First of all, thanks a lot for the code. It's very instructive to see how you implemented this directly in Keras/TensorFlow. Anyway, I wanted to try it and just copy-pasted the code, but I got a:

KeyError: 'The optimizer cannot recognize variable dense_1/kernel:0. This usually means you are trying to call the optimizer to update different parts of the model separately. Please call optimizer.build(variables) with the full list of trainable variables before the training loop or use legacy optimizer `tf.keras.optimizers.legacy.{self.__class__.__name__}`.'

I've tried several things since yesterday, but none of them worked. Any suggestion on what that could mean or where to add this optimizer.build?
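Not from the thread itself, but two hedged workarounds that are commonly suggested for this TF >= 2.11 optimizer error (whether either fixes this specific example is an assumption):

import tensorflow as tf

# (a) Compile with the legacy optimizer, which does not require build():
model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.03),
    loss="mse",
)

# (b) Or, if each FF layer holds its own optimizer, build it on that layer's
# variables before calling fit():
for layer in model.layers:
    if hasattr(layer, "optimizer"):
        layer.optimizer.build(layer.trainable_weights)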
