This is the second post of the series describing backpropagation algorithm applied to feed forward neural network training. In the last post we described what neural network is and we concluded it is a parametrized mathematical function. We implemented neural network initialization (meaning creating a proper entity representing the network – not weight initialization) and inference routine but we never made any connection to the data itself. In this post we will make such a connection and we will express meaning of parametrization “goodness” in terms of training data and network output.
Please note this is mainly for educational purpose and the aim here is to learn. If you want to “just apply” logistic regression in julia please check out this one
Lets start with basic background.
Logistic regression is a mathematical model serving to predict binary outcome (here we consider binary case) based on set of predictors. Binary outcome (meaning outcome with two possible values, like physiognomical gender spam/not spam email or fraud/legit transaction) is very often called target variable and predictors can be also called features. The latter name is much more intuitive as it is closer to Earth, so to say, it suggests we deal with some sort of objects with known features and we try to predict binary feature that is unknown at the time of prediction. The main idea is to predict the target given set of features based on some training data. Training data is a set of objects for which we know the true target.
So our task is to derive a function that will later serve as a 0/1 labelling machine – mechanism to produce (hopefully) correct target for given object described with features. The class of functions we choose to derive the target is called the model. Such a class is parametrized with set of parameters and the task of learning is in fact to derive these parameters to let our model perform well. Performing well is on the other hand measured with yet another function called goal function. The goal function is what we optimize when we learn parameters for a model. It involves model parameters and training data. It is a function of model parameters and training data with known labels. So the problem of machine learning can be (ignorantly of course) reduce to the problem of estimating set of parameters of given class of functions (model) to minimize given goal function.