Commit

Added missing English words (#578)
arbitrary choice of two hidden layers **here** with 16 neurons each
wafe authored May 2, 2024
1 parent f176c82 commit 3a051d2
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions 2017/gradient-descent/english/transcript.txt
@@ -7,7 +7,7 @@ These digits are rendered on a 28x28 pixel grid, each pixel with some grayscale
Those are what determine the activations of 784 neurons in the input layer of the network.
And then the activation for each neuron in the following layers is based on a weighted sum of all the activations in the previous layer, plus some special number called a bias.
Then you compose that sum with some other function, like the sigmoid squishification, or a relu, the way I walked through last video.
- In total, given the somewhat arbitrary choice of two hidden layers with 16 neurons each, the network has about 13,000 weights and biases that we can adjust, and it's these values that determine what exactly the network actually does.
+ In total, given the somewhat arbitrary choice of two hidden layers here with 16 neurons each, the network has about 13,000 weights and biases that we can adjust, and it's these values that determine what exactly the network actually does.
Then what we mean when we say that this network classifies a given digit is that the brightest of those 10 neurons in the final layer corresponds to that digit.
And remember, the motivation we had in mind here for the layered structure was that maybe the second layer could pick up on the edges, and the third layer might pick up on patterns like loops and lines, and the last one could just piece together those patterns to recognize digits.
So here, we learn how the network learns.
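The hunk above describes a 784-16-16-10 network in which each neuron's activation is a weighted sum of the previous layer's activations plus a bias, passed through a squishing function such as the sigmoid. Here is a minimal NumPy sketch of that description; the layer sizes and the "about 13,000" figure come from the transcript, while the variable names, random initialization, and use of the sigmoid throughout are illustrative assumptions, not the video's own code.

```python
import numpy as np

layer_sizes = [784, 16, 16, 10]  # input pixels, two hidden layers of 16, ten output digits

# Count parameters: each layer contributes (inputs x outputs) weights plus one bias per neuron.
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_params)  # 13002 -- the "about 13,000 weights and biases"

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(a):
    # Each layer: weighted sum of the previous activations, plus a bias,
    # squished through the sigmoid.
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

pixels = rng.random(784)   # stand-in for a flattened 28x28 grayscale image
output = forward(pixels)
print(output.argmax())     # the "brightest" of the 10 output neurons is the predicted digit
```

With random weights the prediction is meaningless, of course; the point of the sketch is only that the parameter count works out to 13,002 and that classification amounts to taking the brightest (largest) of the ten output activations.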
@@ -122,4 +122,4 @@ Whereas if you're actually training on a structured dataset, one that has the ri
And so what was also interesting about that is it brings into light another paper from actually a couple of years ago, which has a lot more simplifications about the network layers, but one of the results was saying how if you look at the optimization landscape, the local minima that these networks tend to learn are actually of equal quality, so in some sense if your dataset is structured, you should be able to find that much more easily.
My thanks, as always, to those of you supporting on Patreon.
I've said before just what a game changer Patreon is, but these videos really would not be possible without you.
- I also want to give a special thanks to the VC firm Amplify Partners and their support of these initial videos in the series. Thank you.
+ I also want to give a special thanks to the VC firm Amplify Partners and their support of these initial videos in the series. Thank you.
