Recreations of recent and notable deep learning architectures from scratch, using only built-in Python functionality. No PyTorch or TensorFlow (except for the tensor class from PyTorch), and nothing else was used: no automatic gradient calculations and no prebuilt modules.
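To make "from scratch" concrete: with no autograd, every layer has to implement its own backward pass and stash its gradients by hand. Here is a minimal sketch of that pattern (this `LinearSketch` class is illustrative, not the repo's actual `Linear`):

```python
import torch

class LinearSketch:
    "Hypothetical from-scratch linear layer; the repo's real Linear may differ."
    def __init__(self, n_in, n_out):
        self.w = torch.randn(n_in, n_out) * (2 / n_in) ** 0.5  # Kaiming-style init
        self.b = torch.zeros(n_out)

    def forward(self, x):
        self.x = x                        # cache the input for the backward pass
        return x @ self.w + self.b

    def backward(self, grad_out):
        self.w.g = self.x.t() @ grad_out  # dL/dW, stored on the tensor by hand
        self.b.g = grad_out.sum(0)        # dL/db
        return grad_out @ self.w.t()      # dL/dx, handed to the previous layer
```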
Import desired modules:
```python
from ModernArchitecturesFromScratch.basic_operations_01 import *
from ModernArchitecturesFromScratch.fully_connected_network_02 import *
from ModernArchitecturesFromScratch.model_training_03 import *
from ModernArchitecturesFromScratch.convolutions_pooling_04 import *
from ModernArchitecturesFromScratch.callbacks_05 import *
from ModernArchitecturesFromScratch.batchnorm_06 import *
from ModernArchitecturesFromScratch.optimizers_07 import *
```
A full list of the functions, classes, and features defined in this API can be found at https://bradleybrown19.github.io/ModernArchitecturesFromScratch/; see the tabs on the left for a description of each imported module.
Feel free to look at either the exported Python files or the notebooks used to develop them!
Getting a model, loss function, and databunch
```python
model = SequentialModel(Linear(784, 50, relu_after=True), ReLU(), Linear(50, 10, False))
loss_func = CrossEntropy()
db = get_mnist_databunch()
model
```
```
(Layer1): Linear(784, 50)
(Layer2): ReLU()
(Layer3): Linear(50, 10)
```
```python
show_im(db.train_ds.x)
```
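Under the hood, `SequentialModel` presumably just chains its layers' hand-written forward and backward passes. A minimal sketch, assuming that structure (the class and method names here are illustrative, not the repo's exact internals):

```python
class SequentialSketch:
    "Illustrative stand-in for SequentialModel; internals are assumptions."
    def __init__(self, *layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:             # forward: left to right
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):   # backprop: right to left
            grad = layer.backward(grad)
        return grad
```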
Creating a learning rate scheduler and passing the desired callbacks into the Runner class
```python
schedule = combine_scheds([0.4, 0.6], [cos_scheduler(0.01, 0.1), cos_scheduler(0.1, 0.01)])
run = get_runner(model=model, lf=loss_func, db=db, opt=StatedOptimizer,
                 callbacks=[Stats([accuracy]), ProgressCallback(), HyperRecorder('lr'), Scheduler('lr', schedule)])
```
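For intuition on what this schedule does: `cos_scheduler(start, end)` anneals a value along a cosine curve, and `combine_scheds` splits training between the two phases, so the learning rate warms up from 0.01 to 0.1 over the first 40% of training and cools back down over the rest. A hedged reimplementation of that pattern, with assumed names `cos_sched`/`combine`:

```python
import math

def cos_sched(start, end):
    "Cosine anneal from start to end as pos runs 0 -> 1."
    return lambda pos: start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def combine(pcts, scheds):
    "Give each scheduler its fraction of training, rescaling pos within it."
    bounds = [0.0]
    for p in pcts:
        bounds.append(bounds[-1] + p)
    def _inner(pos):
        for i, sched in enumerate(scheds):
            if pos <= bounds[i + 1] or i == len(scheds) - 1:
                return sched((pos - bounds[i]) / (bounds[i + 1] - bounds[i]))
    return _inner

warmup_cooldown = combine([0.4, 0.6], [cos_sched(0.01, 0.1), cos_sched(0.1, 0.01)])
assert abs(warmup_cooldown(0.0) - 0.01) < 1e-6   # starts low
assert abs(warmup_cooldown(0.4) - 0.1) < 1e-6    # peaks 40% of the way in
```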
Fitting the model and looking at metrics
```python
run.fit(5, 0.1)
```
| epoch | train_loss | train_accuracy | valid_loss | valid_accuracy | time |
|---|---|---|---|---|---|
| 0 | 0.431567 | 0.870520 | 0.184491 | 0.946840 | 00:01 |
| 1 | 0.175159 | 0.948120 | 0.132627 | 0.961500 | 00:01 |
| 2 | 0.134197 | 0.960060 | 0.105048 | 0.970180 | 00:01 |
| 3 | 0.110969 | 0.967400 | 0.087220 | 0.975720 | 00:01 |
| 4 | 0.095547 | 0.972800 | 0.075881 | 0.978820 | 00:01 |
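The fit loop itself follows the usual from-scratch pattern: forward, hand-written backward, optimizer step, with callbacks fired at each stage. A simplified sketch (hook names and signatures here are assumptions, not the repo's exact Runner internals):

```python
def fit_sketch(epochs, model, loss_func, opt, train_dl, callbacks):
    "Hypothetical outline of run.fit; hook names are illustrative."
    for cb in callbacks: cb.begin_fit()
    for epoch in range(epochs):
        for xb, yb in train_dl:
            preds = model(xb)             # forward through every layer
            loss = loss_func(preds, yb)   # scalar loss for reporting
            loss_func.backward()          # hand-derived gradient of the loss...
            model.backward()              # ...then backprop through the layers
            opt.step()                    # update params from the stored .g grads
            opt.zero_grad()
            for cb in callbacks: cb.after_batch(loss)
        for cb in callbacks: cb.after_epoch(epoch)
```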
```python
run.cbs[3].plot_loss()
run.cbs[3].plot_param('lr')
```
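`plot_loss` and `plot_param` work because `HyperRecorder` logs values on every batch. A toy stand-in showing the idea (the class below is hypothetical, not the repo's HyperRecorder):

```python
import matplotlib.pyplot as plt

class RecorderSketch:
    "Toy recorder callback: log loss and lr each batch, plot on demand."
    def __init__(self):
        self.losses, self.lrs = [], []

    def after_batch(self, loss, lr):
        self.losses.append(float(loss))
        self.lrs.append(lr)

    def plot_loss(self):
        plt.plot(self.losses)
        plt.xlabel('batch'); plt.ylabel('training loss')
        plt.show()
```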
```python
resnet = GetResnet(50, c_in=1, c_out=10)
resnet
```
```
(Layer1): Reshape(1, 28, 28)
(Layer2): ConvBnActivation(1, 16)
(Layer3): ConvBnActivation(16, 64)
(Layer4): ConvBnActivation(64, 64)
(Layer5): MaxPool(ks: 3, stride: 2)
(Layer6): (
  (Layer1): ResBlock(16, 256)
  (Layer2): ResBlock(64, 256)
  (Layer3): ResBlock(64, 256)
)
(Layer7): (
  (Layer1): ResBlock(64, 512)
  (Layer2): ResBlock(128, 512)
  (Layer3): ResBlock(128, 512)
  (Layer4): ResBlock(128, 512)
)
(Layer8): (
  (Layer1): ResBlock(128, 1024)
  (Layer2): ResBlock(256, 1024)
  (Layer3): ResBlock(256, 1024)
  (Layer4): ResBlock(256, 1024)
  (Layer5): ResBlock(256, 1024)
  (Layer6): ResBlock(256, 1024)
)
(Layer9): (
  (Layer1): ResBlock(256, 2048)
  (Layer2): ResBlock(512, 2048)
  (Layer3): ResBlock(512, 2048)
)
(Layer10): AveragePool(ks: 1, stride: 2)
(Layer11): Flatten()
(Layer12): Linear(2048, 10)
```
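The `ResBlock` signatures above (e.g. `ResBlock(64, 512)`) correspond to bottleneck blocks whose output is the sum of a convolutional path and a skip connection. A minimal sketch of that residual pattern, with `conv_path` and `shortcut` as stand-ins for the repo's `ConvBnActivation` stacks:

```python
import torch

class ResBlockSketch:
    "Hypothetical residual block: out = ReLU(conv_path(x) + shortcut(x))."
    def __init__(self, conv_path, shortcut=None):
        self.conv_path = conv_path  # stands in for the 1x1 -> 3x3 -> 1x1 bottleneck
        self.shortcut = shortcut    # None = identity skip; else a 1x1 conv projection

    def forward(self, x):
        identity = x if self.shortcut is None else self.shortcut(x)
        return torch.clamp(self.conv_path(x) + identity, min=0)  # ReLU of the sum

# With a toy conv_path the skip connection is easy to see:
block = ResBlockSketch(conv_path=lambda x: x * 0.0)
print(block.forward(torch.ones(2, 2)))   # identity survives: all ones
```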
Much of this project was inspired by and learned from part 2 of FastAI's deep learning course. Please see https://course.fast.ai/part2; it's a great course!