
Baking project is an exploratory data analysis followed by a character-level LSTM RNN trained on ~26k recipes from a Kaggle dataset


irinadidid/baking-project


baking-project

I started baking at the beginning of the pandemic in 2020 and wondered what makes a cookie so crunchy, a bread so fluffy and a cake so spongy. So I decided to work on a project that would shed some light on the magic behind baking.

Baking project is an exploratory data analysis followed by a character-level LSTM RNN trained on ~26k recipes from a Kaggle dataset.

Data

The data is available here: https://www.kaggle.com/shuyangli94/food-com-recipes-and-user-interactions

We cleaned and prepared the data by keeping only baking recipes with at least 2 steps and at least 2 ingredients. We then assigned each recipe a 'recipe type': cake, bread or cookie.
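A minimal sketch of this cleaning step, assuming each recipe is a dict with the `name`, `n_steps` and `n_ingredients` fields (the column names used in the Kaggle `RAW_recipes.csv` file). The keyword rules in `RECIPE_TYPE_KEYWORDS` and both helper names are hypothetical; this README does not describe how recipe types were actually assigned.

```python
# Hypothetical keyword rules: the project's actual labelling logic is not described.
RECIPE_TYPE_KEYWORDS = {
    "cake": ("cake",),
    "cookie": ("cookie", "biscuit"),
    "bread": ("bread", "baguette", "focaccia"),
}


def assign_recipe_type(name):
    """Return 'cake', 'cookie' or 'bread' based on keywords in the title, else None."""
    title = name.lower()
    for recipe_type, keywords in RECIPE_TYPE_KEYWORDS.items():
        if any(keyword in title for keyword in keywords):
            return recipe_type
    return None


def clean_recipes(recipes, min_steps=2, min_ingredients=2):
    """Keep recipes with enough steps and ingredients, each tagged with a recipe type."""
    cleaned = []
    for recipe in recipes:
        if recipe["n_steps"] < min_steps or recipe["n_ingredients"] < min_ingredients:
            continue
        recipe_type = assign_recipe_type(recipe["name"])
        if recipe_type is None:  # not a cake, cookie or bread
            continue
        cleaned.append({**recipe, "recipe_type": recipe_type})
    return cleaned
```

With title keywords like these, "banana bread" is labelled as bread, which is the misclassification listed under "Things to improve".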

Preliminary data analysis

The analysis showed the key differences between cakes, breads and cookies in terms of the effort, time and ingredients they require.

Pic. 1. Distribution of the number of ingredients.

Pic. 2. Estimated cooking time in minutes.

In both plots, cake recipes appear on average to be harder to reproduce: they require more ingredients and more effort, i.e. more steps to follow. Cookies show less variability in the number of ingredients, the number of steps and the cooking time. Bread recipes, by contrast, are very diverse: they range from easy-to-follow 3-step breads to a very time-consuming 145-step recipe requiring 43 ingredients. On average, a bread takes longer to cook than a cake or cookies.
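The comparison above comes down to per-type averages and spreads. A minimal sketch, assuming the cleaned recipes are dicts carrying a `recipe_type` plus numeric fields such as `n_ingredients`, `n_steps` or `minutes`; `summarize_by_type` is a hypothetical helper, not code from the notebook.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_type(recipes, column):
    """Mean, spread and range of a numeric column (e.g. 'n_steps') per recipe type."""
    groups = defaultdict(list)
    for recipe in recipes:
        groups[recipe["recipe_type"]].append(recipe[column])
    return {
        recipe_type: {
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "min": min(values),
            "max": max(values),
        }
        for recipe_type, values in groups.items()
    }
```

A small standard deviation (as for cookies) means recipes of that type are similar to each other; a large one with a wide min/max range (as for breads) signals the diversity described above.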

Text analysis

This analysis gave us an idea of what cookies, cakes and breads are made of: we extracted the most common ingredients for each recipe type.

Cookies:

cookie ingredients

Cakes:

cake ingredients

Breads:

bread ingredients
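Counting the most common ingredients per recipe type could look roughly like the sketch below. It assumes each recipe's `ingredients` field has already been parsed into a Python list of strings (in the Kaggle CSV the list is stored as text, which `ast.literal_eval` can parse); `top_ingredients` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def top_ingredients(recipes, k=10):
    """Return the k most frequent ingredients for each recipe type."""
    counters = defaultdict(Counter)
    for recipe in recipes:
        # Use a set so an ingredient listed twice in one recipe is counted once.
        counters[recipe["recipe_type"]].update(set(recipe["ingredients"]))
    return {recipe_type: counter.most_common(k) for recipe_type, counter in counters.items()}
```

Each result is a list of `(ingredient, number_of_recipes)` pairs, sorted from most to least common.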

Things to improve

  • The number of observations could be increased by scraping more recipes from the web
  • Some sweet breads (e.g. banana bread) should be labelled as 'cakes' for better classification

LSTM model training

At a high level, a recurrent neural network (RNN) is a class of deep neural network most commonly applied to sequential data such as speech, text or music. RNNs are used for machine translation, speech recognition, voice synthesis and more. Their key feature is that they are stateful: they keep an internal memory (hidden state) in which context from earlier in the sequence can be stored. For a deeper explanation of LSTMs, see https://colah.github.io/posts/2015-08-Understanding-LSTMs/
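For reference, this is the standard LSTM cell update, close to the notation of the post linked above. Here $x_t$ is the input at step $t$, $h_t$ the hidden state, $C_t$ the cell state (the "internal memory"), $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state with the current input:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate memory} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{updated cell state} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{new hidden state}
\end{aligned}
```

The forget gate decides how much of the old memory $C_{t-1}$ to keep, which is what lets an LSTM carry context across long sequences such as a full recipe.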

Our RNN was trained in Python using TensorFlow 2 with the Keras API.
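A character-level model is trained on text encoded as integer ids, where each target sequence is its input shifted by one character. Below is a framework-free sketch of that preprocessing under our own assumptions; `build_vocab`, `make_training_pairs` and `seq_length` are illustrative names, not the notebook's code (in TensorFlow this is usually done with a `tf.data` pipeline).

```python
def build_vocab(text):
    """Sorted list of unique characters and a char -> id lookup table."""
    vocab = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(vocab)}
    return vocab, char2idx


def make_training_pairs(text, char2idx, seq_length):
    """Cut encoded text into (input, target) pairs; the target is the input shifted by one."""
    ids = [char2idx[ch] for ch in text]
    pairs = []
    # Each non-overlapping chunk holds seq_length + 1 ids: seq_length inputs plus one extra
    # id so that the shifted target sequence also has seq_length ids.
    for start in range(0, len(ids) - seq_length, seq_length + 1):
        chunk = ids[start:start + seq_length + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs
```

For example, with `seq_length=2` the text "flour" yields the single pair `([0, 1], [1, 2])`: given "fl", the model should predict "lo".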

Model components:

plot model

For each character, the model looks up its embedding, runs the LSTM for one time step with the embedding as input, and applies the dense layer to produce logits: unnormalized scores whose log-softmax gives the log-likelihood of each possible next character.
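To generate text from these logits, the next character is drawn from the softmax distribution and fed back in as the next input. A minimal framework-free sketch of one sampling step; the `temperature` parameter and the helper names are assumptions, since this README does not say how sampling was done.

```python
import math
import random


def log_softmax(logits):
    """Turn raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def sample_next_id(logits, temperature=1.0, rng=random):
    """Sample the next character id from the dense layer's logits."""
    # Dividing by a temperature below 1 sharpens the distribution (safer, more repetitive
    # text); a temperature above 1 flattens it (more surprising, more error-prone text).
    scaled = [x / temperature for x in logits]
    probs = [math.exp(lp) for lp in log_softmax(scaled)]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Repeating this step (sample an id, map it back to a character, feed it to the model) produces recipes like the one below.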

Example generated recipe

📌 TITLE

chocolate cake

👀 DESCRIPTION

this is the dough on the topping with bread machine.

🍒 INGREDIENTS

• white sugar
• eggs
• milk
• vanilla extract
• all-purpose flour
• baking powder
• salt
• baking soda
• baking soda
• salt
• butter
• sugar
• vanilla extract
• baking powder
• baking soda
• salt
• sugar
• vanilla
• cream
• salt

📝 INSTRUCTIONS

▪︎ preheat oven to 350f
▪︎ butter a 9x5x3' pan
▪︎ blend self raising eggs , one at a time , beating well with the flour and salt
▪︎ stir in the chocolate chips and salt
▪︎ add the milk and vanilla
▪︎ stir in the butter , and mix well
▪︎ add the chocolate chips and remaining sugar
▪︎ add the eggs , one at a time , beating well after each addition
▪︎ pour into prepared pan
▪︎ bake at 350 for 12-12 minutes or until golden brown
▪︎ cool in pan for 10 minutes
▪︎ top with remaining cream cheese , then refrigerate for at least 2 hours or until the cake is set and the cake is set , about 25 minutes until the cake pan
▪︎ refrigerate for at least 1 hour
▪︎ remove f
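A character-level model can only emit characters it saw during training, so the section markers in the sample above (📌, 👀, 🍒, 📝, • and ▪︎) suggest each recipe was flattened into a single string with this layout before training. A minimal sketch of such a serializer; the function name and the exact spacing are assumptions.

```python
def recipe_to_text(title, description, ingredients, steps):
    """Serialize one recipe into the sectioned layout seen in the generated sample."""
    lines = ["📌 TITLE", "", title, "", "👀 DESCRIPTION", "", description, "", "🍒 INGREDIENTS", ""]
    lines += [f"• {item}" for item in ingredients]
    lines += ["", "📝 INSTRUCTIONS", ""]
    lines += [f"▪︎ {step}" for step in steps]
    return "\n".join(lines)
```

Consistent markers like these give the model explicit cues for where each section starts.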

Demo

The demo is available in the last cell of baking-project.ipynb.

For the demo I used the open-source project Gradio (https://gradio.app).

demo

Things to improve

  • Recipe title, description, ingredients and instructions are disconnected most of the time
  • There are lots of repetitions, especially in the ingredients section
  • For better performance we could, again, use more data
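A cheap, hypothetical post-processing step for the repetition problem is to drop duplicate ingredient lines from the generated text. It hides the symptom rather than fixing the model (for instance "sugar" and "white sugar" both survive), and `dedupe_ingredients` is an illustrative helper, not part of the project.

```python
def dedupe_ingredients(ingredients):
    """Drop repeated ingredient lines, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for item in ingredients:
        key = item.strip().lower()  # treat "Salt" and "salt " as the same ingredient
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Applied to the 20 ingredient lines of the example recipe above, it keeps 12.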
