This is the second post in the series describing the backpropagation algorithm applied to feedforward neural network training. In the last post we described what a neural network is and concluded that it is a parametrized mathematical function. We implemented neural network initialization (meaning the creation of a proper entity representing the network, not weight initialization) and an inference routine, but we never made any connection to the data itself. In this post we will make that connection and express the “goodness” of a parametrization in terms of the training data and the network output.
Neural networks have been going through a renaissance recently. With the explosion of available computational power (often GPU based), recent improvements in neural network initialization (pre-training with RBMs or autoencoders), the overcoming of the vanishing gradient problem in recurrent networks (LSTM) and advances in optimization techniques (AdaGrad, AdaDelta, Adam and others), neural networks are in the centre of attention again. And the attention is in fact tremendous. Neural networks have won most machine learning challenges over the last couple of years. Their bio-inspired nature, and hence their human-like roots, has made neural networks popular among the general public, too. Friends of mine with no scientific background whatsoever occasionally poke me, asking whether I have heard of the “singularity” or of neural networks by any chance.
In other words, neural networks can no longer be ignored, and the goal of this post is to catch up a bit and learn the basics about them.
In this series of posts we will describe, step by step, how to train a feedforward neural network with the backpropagation algorithm. However, if this is your first attempt to understand backpropagation, I think it is a good idea to take a look at the ELI5/nomath tutorial by Andrej Karpathy; the next step might be to check out this material by Alex Graves (Neural Networks chapter). Another important thing: if you are looking for a near-production-ready implementation, please take a look at Mocha.jl or MXNet.jl if it has to be Julia. Otherwise, one might check out the recently released TensorFlow, Torch7 or Keras.
Here, we will focus on the backpropagation algorithm in the context of a multi-class classification problem. This means we will be given a dataset labeled with more than two classes, and our goal will be to classify each dataset entry correctly. Sometimes, to provide an example, we will refer to the MNIST dataset, a dataset of images of handwritten digits. Finally, we will run our experiments on MNIST.
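To make "classify each dataset entry correctly" a bit more concrete, here is a minimal sketch, written in Python purely for illustration. It assumes the network produces one score per class and that the predicted class is the one with the highest score. The helper names (`one_hot`, `predict`, `accuracy`) and the example scores are made up for this sketch and are not taken from the code in this series.

```python
# Hypothetical helpers for illustration only; the scores below are made up.

def one_hot(label, num_classes):
    """Encode an integer class label as a 0/1 target vector."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def predict(scores):
    """Return the index of the class with the largest output score."""
    return max(range(len(scores)), key=lambda k: scores[k])


def accuracy(all_scores, labels):
    """Fraction of entries whose predicted class matches the true label."""
    correct = sum(predict(s) == y for s, y in zip(all_scores, labels))
    return correct / len(labels)


# Three entries of a toy 3-class problem (MNIST would have 10 classes).
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 0, 1]

targets = [one_hot(y, 3) for y in labels]
print(accuracy(scores, labels))
```

In this toy data the third entry is misclassified, because its highest score belongs to class 2 while its label is 1. For MNIST the same idea applies with ten classes, one per digit.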