Introduction + Why I made this project

https://github.com/Saadkhalid913/CNN_FROM_SCRATCH

I've used convolutional neural networks for image processing in my own personal projects. CNNs are some of the most fascinating advances in machine learning to date, and I've had a lot of fun using them to create interesting models. I've played around with the OpenCV image processing library, using CNNs with pre-trained parameters to detect body movement, apply filters to images, and perform other interesting tasks. I've also used the TensorFlow library to train and evaluate my own CNNs on image datasets.

One thing I was missing, however, was an understanding of how CNNs work at a lower level. I had a decent understanding of the convolution operation, having implemented it myself in a web application (link here), but I had no idea how gradients were calculated for convolutional layers. In fact, I didn't have a firm understanding of gradients in general!
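For context, the convolution operation itself is just a kernel sliding over an image, taking an element-wise product and sum at each position. Here is a minimal NumPy sketch of that idea (the function name and shapes are my own, not code from the repository or the web application):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image and take an element-wise
    # product-and-sum at each position ("valid" padding, stride 1).
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Tiny example: a 4x4 "image" with a 3x3 vertical-edge-style kernel
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(conv2d_valid(image, kernel))  # 2x2 output
```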

I decided it would be a good project to learn about backpropagation in regular, feed-forward neural networks, and then generalize those concepts to a convolutional layer.

Week One: Linear Algebra + Calculus Review

I was a bit out of practice with linear algebra and differential calculus, so it was necessary to review both subjects before I started doing any research. I used Khan Academy as well as MIT OpenCourseWare for this.

Week Two: Understanding the Backward Propagation Step

I had a decent enough knowledge of neural networks and how they arrive at results given some data. I had implemented forward propagation in a neural network as well, but of course, this is useless without a means to optimize the trainable parameters.
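For reference, the forward pass I'm describing is just a matrix product plus a bias, followed by an activation. Here is a minimal NumPy sketch (the names, shapes, and sigmoid activation are my own choices, not necessarily what the repository uses):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(X, W, b):
    # X: (batch, n_in), W: (n_in, n_out), b: (n_out,)
    Z = X @ W + b      # pre-activation
    A = sigmoid(Z)     # activation
    return Z, A

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # batch of 4 samples, 3 features each
W = rng.normal(size=(3, 2))   # layer with 2 output units
b = np.zeros(2)
Z, A = dense_forward(X, W, b)
print(A.shape)  # (4, 2)
```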

I looked through resources and videos on backward propagation and learned how to calculate the derivatives of loss functions as well as activation functions.
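As a small illustration of what those derivatives look like, here is a sketch assuming mean squared error and a sigmoid activation (not necessarily the exact functions used in the repository):

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    # d(MSE)/d(y_pred), averaged over all elements
    return 2.0 * (y_pred - y_true) / y_pred.size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Quick finite-difference sanity check on one value
z, eps = 0.3, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.isclose(numeric, sigmoid_grad(z)))  # True
```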

I still had two large gaps in my knowledge, however. The first was that I didn't understand the purpose of transposing the inputs during the backward pass, when the error is propagated to the previous layer. The second was that I was unsure how to calculate the derivative of a matrix dot product.

I closed the first gap by performing the calculations by hand (not on an actual dataset, of course, but on some very small matrices of made-up values). The purpose of the transposition then became obvious: it brings the gradient into the same shape as the weights of the current layer. I explain this in my video linked below.
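Concretely, here is a small NumPy sketch with made-up shapes (not the repository's code) showing that the transpose is exactly what makes each gradient land in the same shape as the parameter it belongs to:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))       # layer input: 4 samples, 3 features
W = rng.normal(size=(3, 2))       # weights: 3 inputs -> 2 outputs
Y = X @ W                         # layer output: (4, 2)

dL_dY = rng.normal(size=Y.shape)  # error arriving from the next layer

dL_dW = X.T @ dL_dY               # (3, 4) @ (4, 2) -> (3, 2), same shape as W
dL_dX = dL_dY @ W.T               # (4, 2) @ (2, 3) -> (4, 3), same shape as X

print(dL_dW.shape == W.shape, dL_dX.shape == X.shape)  # True True
```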

I closed the second gap by recognizing that matrix multiplication can be written out as a set of linear equations, whose derivatives can be calculated with the power rule. This is also explained in my video.
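One quick way to convince yourself of this is a finite-difference check on a tiny made-up example (again a sketch, not the repository's code): perturb one weight at a time and compare the numerical gradient with the formula that falls out of differentiating those linear equations.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2, 3))
W = rng.normal(size=(3, 2))
dL_dY = rng.normal(size=(2, 2))   # pretend upstream gradient

def loss(W):
    # A stand-in scalar loss whose gradient with respect to Y = X @ W is dL_dY
    return np.sum((X @ W) * dL_dY)

analytic = X.T @ dL_dY            # gradient predicted by the linear-equation view

numeric = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(analytic, numeric))  # True
```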