Building neural networks without PyTorch
This exercise follows Andrej Karpathy's YouTube series on understanding the first principles of neural networks [1].
I've built deep neural network architectures with PyTorch before. I still wanted to implement a small neural network from the ground up, for three reasons:
- To understand the low-level knobs that impact complex deep learning architectures. How can I take the robustness of my neural networks to the next level?
- To identify shortcomings in the state-of-the-art today. What ideas do we treat as immutable truths, when they were just one of the many options before their widespread adoption?
- To develop intuition for borrowing ideas from different fields for ML. How did researchers attack problems that nobody had answers for, to eventually create something that could change society?
This exercise defines four custom classes: Value, Neuron, Layer, and MLP.
Value wraps the network's parameters. Compared with plain scalars, Value objects add three capabilities: a topological sort of the computation graph for the backward pass, computation and storage of gradients, and visualisation of the network.
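A minimal sketch of how such a Value class might look is given below. It is not the exact code from this exercise: the attribute names (`_prev`, `_op`, `_backward`), the supported operations (add, multiply, tanh), and the small example at the end are assumptions chosen to illustrate gradient storage and the topological ordering used by the backward pass.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self.grad = 0.0                # d(output)/d(this value), filled in by backward()
        self._prev = set(_children)    # the Values this one was computed from
        self._op = _op                 # operation label, handy when drawing the graph
        self._backward = lambda: None  # local chain-rule step, set by each operation

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # addition passes the upstream gradient through unchanged
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # product rule: each factor receives the other factor's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")

        def _backward():
            # d/dx tanh(x) = 1 - tanh(x)^2
            self.grad += (1 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order: every node appears after the nodes it depends on.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        # Walk the order in reverse so a node's gradient is complete before it is propagated.
        for v in reversed(topo):
            v._backward()


# Illustrative example (values are arbitrary): c = a*b + a
a = Value(2.0)
b = Value(-3.0)
c = a * b + a
c.backward()
```

After `c.backward()`, `a.grad` holds dc/da = b + 1 and `b.grad` holds dc/db = a. Gradients are accumulated with `+=` because a value such as `a` can feed into the output along more than one path.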
Neuron instantiates a single neuron in a layer. It initialises one weight for each input coming from the previous layer, plus a bias.
Layer sets up the desired number of neurons in each layer.
MLP stacks the desired layers into a network.
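The way the three classes nest can be sketched as follows. To keep the block runnable on its own, the weights here are plain floats rather than Value objects, so this version only does a forward pass. The tanh activation, the uniform initialisation in [-1, 1], the `parameters()` helper, and the 3-input, [4, 4, 1] layout are assumptions for illustration.

```python
import math
import random


class Neuron:
    """One neuron: a weight per input plus a single bias."""

    def __init__(self, n_inputs):
        self.w = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.b = random.uniform(-1, 1)

    def __call__(self, x):
        # weighted sum of the inputs, then a squashing non-linearity
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)

    def parameters(self):
        return self.w + [self.b]


class Layer:
    """A list of neurons that all read the same inputs."""

    def __init__(self, n_inputs, n_neurons):
        self.neurons = [Neuron(n_inputs) for _ in range(n_neurons)]

    def __call__(self, x):
        return [neuron(x) for neuron in self.neurons]

    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]


class MLP:
    """A stack of layers; each layer's outputs become the next layer's inputs."""

    def __init__(self, n_inputs, layer_sizes):
        sizes = [n_inputs] + list(layer_sizes)
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(layer_sizes))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]


random.seed(0)
net = MLP(3, [4, 4, 1])        # 3 inputs, two hidden layers of 4, 1 output
out = net([2.0, 3.0, -1.0])    # forward pass on one input vector
n_params = len(net.parameters())
```

With this layout the network holds 4·(3+1) + 4·(4+1) + 1·(4+1) = 41 parameters, which is a quick sanity check that each neuron got one weight per input and one bias.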
Custom functions handle the training loop, forward pass, backward pass, and weight updates.
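The four steps (forward pass, backward pass, weight update, repeated in a loop) might fit together roughly as below. This is a sketch under several assumptions: the compact `Value` class is repeated so the block runs alone, the helper names `make_layer` and `forward` are hypothetical, and the four input vectors, targets, squared-error loss, learning rate of 0.05, and 100 steps are illustrative choices rather than the values used in this exercise.

```python
import math
import random


class Value:
    """Compact scalar autograd node (add, mul, tanh only), repeated so this block runs alone."""

    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._prev, self._backward = children, lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


def make_layer(n_in, n_out):
    # each neuron is a list of n_in weights followed by one bias
    return [[Value(random.uniform(-1, 1)) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, x):
    for layer in layers:
        # zip stops after the n_in weights; the trailing bias seeds the sum
        x = [sum((w * xi for w, xi in zip(neuron, x)), neuron[-1]).tanh() for neuron in layer]
    return x[0]  # the last layer has a single neuron


random.seed(42)
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]  # illustrative inputs
ys = [1.0, -1.0, -1.0, 1.0]                                                    # illustrative targets

layers = [make_layer(3, 4), make_layer(4, 4), make_layer(4, 1)]
params = [p for layer in layers for neuron in layer for p in neuron]
learning_rate = 0.05
losses = []

for step in range(100):
    # forward pass: predictions and squared-error loss over the whole dataset
    diffs = [forward(layers, x) + (-y) for x, y in zip(xs, ys)]
    loss = sum((d * d for d in diffs), Value(0.0))
    # backward pass: reset stale gradients first, since backward() accumulates
    for p in params:
        p.grad = 0.0
    loss.backward()
    # weight update: step each parameter against its gradient
    for p in params:
        p.data -= learning_rate * p.grad
    losses.append(loss.data)
```

Resetting `p.grad` before each backward pass matters because gradients accumulate with `+=`; skipping the reset would silently add every earlier step's gradients to the current one.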
The neural network is then trained on a tiny dataset of 4 vectors to observe how it behaves over the course of training, and its final version is visualised using a combination of custom and graphviz methods.
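One way the custom half of that visualisation could work is to walk the computation graph and emit Graphviz DOT text, which the `dot` tool can then render. The sketch below is an assumption about the approach rather than the code used here: `Node` is a hypothetical stand-in for Value, and `trace` and `to_dot` are illustrative helper names. It builds the DOT text by hand so that it does not depend on the graphviz Python package.

```python
class Node:
    """Hypothetical stand-in for Value: data, grad, an operation label, and input nodes."""

    def __init__(self, data, children=(), op="", label=""):
        self.data, self.grad = data, 0.0
        self.children, self.op, self.label = children, op, label


def trace(root):
    """Collect every node reachable from root, plus (input, output) edges."""
    nodes, edges, seen = [], [], set()

    def build(v):
        if id(v) not in seen:
            seen.add(id(v))
            nodes.append(v)
            for child in v.children:
                edges.append((child, v))
                build(child)

    build(root)
    return nodes, edges


def to_dot(root):
    """Return DOT source: one record box per value, one small node per operation."""
    nodes, edges = trace(root)
    lines = ["digraph G {", "  rankdir=LR;"]
    for n in nodes:
        uid = f"n{id(n)}"
        lines.append(
            f'  {uid} [shape=record, '
            f'label="{{ {n.label} | data {n.data:.4f} | grad {n.grad:.4f} }}"];'
        )
        if n.op:
            # a separate node for the operation that produced this value
            lines.append(f'  {uid}_op [label="{n.op}"];')
            lines.append(f"  {uid}_op -> {uid};")
    for child, parent in edges:
        # inputs point at the operation node of the value they feed into
        target = f"n{id(parent)}_op" if parent.op else f"n{id(parent)}"
        lines.append(f"  n{id(child)} -> {target};")
    lines.append("}")
    return "\n".join(lines)


# Illustrative graph: c = a * b
a = Node(2.0, label="a")
b = Node(-3.0, label="b")
c = Node(a.data * b.data, (a, b), "*", "c")
dot_source = to_dot(c)
```

Saving `dot_source` to a file and running it through Graphviz (for example `dot -Tsvg graph.dot -o graph.svg`) would draw the graph left to right, with each box showing a value's label, data, and gradient.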