I started baking at the beginning of the pandemic in 2020. I kept wondering what makes a cookie so crunchy, a bread so fluffy, and a cake so spongy. So I decided to work on a project that would shed some light on the magic behind baking.
The baking project is an exploratory data analysis followed by a character-level LSTM RNN trained on ~26k recipes from a Kaggle dataset, which is available here: https://www.kaggle.com/shuyangli94/food-com-recipes-and-user-interactions
We cleaned and prepared the data by selecting only baking recipes with at least 2 steps and at least 2 ingredients. Then we assigned each recipe a 'recipe type'.
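The filtering step might look like the sketch below. This is a hedged example: the column names and the keyword-based type assignment are assumptions for illustration, not the project's actual code.

```python
import pandas as pd

# Toy stand-in for the Kaggle recipes table; the real column names are assumptions
recipes = pd.DataFrame({
    "name": ["banana bread", "sugar cookies", "pot roast", "sponge cake"],
    "n_steps": [5, 4, 1, 7],
    "n_ingredients": [6, 5, 8, 9],
})

# Keep only recipes with at least 2 steps and at least 2 ingredients
baking = recipes[(recipes["n_steps"] >= 2) & (recipes["n_ingredients"] >= 2)].copy()

def recipe_type(name: str) -> str:
    """Naive keyword-based type assignment (illustrative only)."""
    for kind in ("cookie", "cake", "bread"):
        if kind in name:
            return kind
    return "other"

baking["recipe_type"] = baking["name"].apply(recipe_type)
print(baking[["name", "recipe_type"]])
```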
The exploratory analysis showed the key differences between cakes, breads, and cookies in terms of the effort, time, and ingredients needed.
Pic. 1. Distribution of the number of ingredients.
Pic. 2. Estimated cooking time in minutes.
In both cases we notice that, on average, cake recipes seem to be the hardest to reproduce, as they require more ingredients and more effort, i.e. more steps to follow. Cookies show less variability in the number of ingredients and steps, as well as in cooking time. Bread recipes, on the other hand, are very diverse: from easy-to-follow 3-step breads to a very time-consuming 145-step recipe requiring 43 ingredients. On average, bread takes longer to cook than cake or cookies.
The analysis also gave us an idea of what cookies, cakes, and bread are made of. We extracted the most common ingredients for each recipe type.
Cookies:
Cakes:
Breads:
- The number of observations could be extended by scraping more recipes from the web
- Some sweet breads (e.g. banana bread) should be labelled as 'cakes' for better classification
At a high level, a Recurrent Neural Network (RNN) is a class of deep neural networks most commonly applied to sequential data such as speech, text, or music. RNNs are used for machine translation, speech recognition, voice synthesis, etc. Their key feature is that they are stateful: they have an internal memory in which some context for the sequence can be stored. For a deeper understanding, see https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Our RNN was trained in Python using TensorFlow 2 with the Keras API.
Model components:
For each character, the model looks up the embedding, runs the LSTM for one time step with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character.
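A model of this shape can be sketched with the Keras API as follows. The hyperparameters (vocabulary size, embedding dimension, LSTM units) are illustrative placeholders, not the values used in the project:

```python
import tensorflow as tf

VOCAB_SIZE = 128      # number of distinct characters (assumed)
EMBEDDING_DIM = 64    # illustrative
LSTM_UNITS = 256      # illustrative

model = tf.keras.Sequential([
    # Map each character id to a dense vector
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    # Process the sequence step by step, carrying hidden state between steps
    tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True),
    # Logits over the vocabulary: the predicted next character at each position
    tf.keras.layers.Dense(VOCAB_SIZE),
])

# A batch of 2 sequences, 10 characters each -> logits of shape (2, 10, 128)
logits = model(tf.zeros((2, 10), dtype=tf.int32))
print(logits.shape)
```

Because `return_sequences=True`, the model emits a prediction for every position, so training can compare each position's logits against the actual next character.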
📌 TITLE
chocolate cake
👀 DESCRIPTION
this is the dough on the topping with bread machine.
🍒 INGREDIENTS
• white sugar
• eggs
• milk
• vanilla extract
• all-purpose flour
• baking powder
• salt
• baking soda
• baking soda
• salt
• butter
• sugar
• vanilla extract
• baking powder
• baking soda
• salt
• sugar
• vanilla
• cream
• salt
📝 INSTRUCTIONS
▪︎ preheat oven to 350f
▪︎ butter a 9x5x3' pan
▪︎ blend self raising eggs , one at a time , beating well with the flour and salt
▪︎ stir in the chocolate chips and salt
▪︎ add the milk and vanilla
▪︎ stir in the butter , and mix well
▪︎ add the chocolate chips and remaining sugar
▪︎ add the eggs , one at a time , beating well after each addition
▪︎ pour into prepared pan
▪︎ bake at 350 for 12-12 minutes or until golden brown
▪︎ cool in pan for 10 minutes
▪︎ top with remaining cream cheese , then refrigerate for at least 2 hours or until the cake is set and the cake is set , about 25 minutes until the cake pan
▪︎ refrigerate for at least 1 hour
▪︎ remove f
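Text like the sample above comes from a sampling loop: feed the characters generated so far into the model, take the logits for the last position, sample the next character, and append it. Below is a minimal sketch of that loop with a random stand-in for the trained model and a toy character set (both assumptions, not the real ones):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = list("abcdefgh ")                 # toy character set (stand-in)
char2id = {c: i for i, c in enumerate(vocab)}

def fake_model(ids):
    """Stand-in for the trained LSTM: returns random logits over the vocab."""
    return rng.normal(size=len(vocab))

def sample_text(seed: str, length: int, temperature: float = 1.0) -> str:
    ids = [char2id[c] for c in seed]
    for _ in range(length):
        logits = fake_model(ids) / temperature     # temperature controls diversity
        probs = np.exp(logits - logits.max())      # numerically stable softmax
        probs /= probs.sum()
        next_id = rng.choice(len(vocab), p=probs)  # sample the next character
        ids.append(next_id)
    return "".join(vocab[i] for i in ids)

text = sample_text("ab", length=20)
print(text)
```

The abrupt "remove f" ending in the sample is what happens when such a loop is simply cut off after a fixed number of characters rather than at an end-of-recipe marker.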
A demo is available in the last cell of baking-project.ipynb
I have been using an open-source project, https://gradio.app
- Recipe title, description, ingredients, and instructions are disconnected most of the time
- There are lots of repetitions, especially in the ingredients section
- For better performance we could, again, use more data