Posts tagged with: Julia

## Automatic differentiation for machine learning in Julia

Automatic differentiation is a term I first heard while working on an implementation of the backpropagation algorithm (a somewhat cumbersome one, as it turns out now). It caused plenty of headaches, because I had to handle all the derivatives myself with an almost pen-and-paper approach. Unsurprisingly, I made many mistakes before I got my final solution working.

At that time, I was aware that some libraries, like Theano or TensorFlow, handle derivatives in a certain “magical” way for free. I never knew exactly what happens deep in the guts of these libraries, though, and I somehow suspected it would be more painful than fun to grasp (apparently, I was wrong!).

I decided to take a shot and went first to the official TensorFlow documentation to quickly find out what the magic is. The term I was looking for was automatic differentiation.

---

## Chess position evaluation with convolutional neural network in Julia

In this post we will tackle the problem of chess position evaluation using a convolutional neural network (CNN), a type of neural network designed to deal with spatial data. We will first explain why we need CNNs, then present two fundamental CNN layers. With some knowledge of the inside of the black box, we will apply a CNN to the binary classification problem of chess position evaluation using a Julia deep learning library, Mocha.jl.

### Introduction – data representation

One of the challenges that frequently occurs in machine learning is the proper representation of the input data. Ideally, data should be represented so that it carries as much information as possible while remaining digestible for ML algorithms. Digestibility means fitting into existing mathematical frameworks where known abstract tools can be applied.

A common and convenient representation of a single observation is a vector in $$\mathbb{R}^n$$. Assuming such a representation, ML problems can be viewed from many different angles, with the benefit of well-known abstractions and interpretations. One very common perspective is the algebraic one: with the input data arranged as a matrix (one vector per column), its eigendecomposition or various factorizations can be considered, and both yield important results in the context of machine learning. A set of vectors in $$\mathbb{R}^n$$ forms a point cloud; when the geometry of such a cloud is considered, manifold learning methods emerge. A linear model with least squares error has a closed-form solution in the algebraic framework. In all of these cases, representing the input data as vectors opens up a broad range of tools for handling the problem effectively.
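
To make the closed-form remark concrete, here is a sketch under assumptions not stated in the post: let $$X \in \mathbb{R}^{n \times m}$$ hold the $$m$$ observations as columns, let $$y \in \mathbb{R}^m$$ be the targets, and let the model be $$\hat{y}_i = w^\top x_i$$ (the symbols $$X$$, $$y$$ and $$w$$ are introduced here only for illustration). Minimizing the squared error then leads to the normal equations and, provided $$X X^\top$$ is invertible, to the solution:

$$\min_w \lVert X^\top w - y \rVert_2^2 \;\Longrightarrow\; X X^\top w = X y \;\Longrightarrow\; w = (X X^\top)^{-1} X y$$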

For some domains, though, it is not obvious how to represent the input as vectors while preserving the original information contained in the data. An example of such a domain is text. A text document is rich in various types of information: there is the semantics and syntax of the text, and even the personal style of the writer. It is not clear how to represent all of this information. People tend to simplify and use the Bag of Words (BoW) approach, which completely ignores the ordering of words in a document and treats it as an unordered collection of words.
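
The loss of word order can be seen in a minimal sketch. The function name `bag_of_words` and the whitespace tokenizer are assumptions made for this illustration, not part of any library:

```julia
# Minimal bag-of-words sketch: count how often each word occurs.
# `bag_of_words` is a hypothetical helper; tokenization is plain whitespace splitting.
function bag_of_words(doc::AbstractString)
    counts = Dict{String,Int}()
    for token in split(lowercase(doc))
        word = String(token)
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Word order is discarded, so two sentences with opposite meanings
# end up with exactly the same representation.
println(a == b)  # true
```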

Another domain that suffers from a similar problem is images. When an image is represented as a vector whose dimensionality equals the total number of pixels, the spatial information is lost: the algorithm that later consumes the input vectors is usually not aware that the original structure of an image is a set of 2-dimensional grids (one matrix for each channel).
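
A small sketch of what flattening does to neighboring pixels, using a made-up 3×3 single-channel "image" (the values are arbitrary and only serve the illustration):

```julia
# A tiny 3x3 single-channel "image"; the values are made up for illustration.
img = [1 2 3;
       4 5 6;
       7 8 9]

# Julia stores arrays column by column, so flattening gives
# [1, 4, 7, 2, 5, 8, 3, 6, 9].
flat = vec(img)

# Pixels (1,1) and (1,2) are horizontal neighbors in the grid...
i = LinearIndices(img)[1, 1]
j = LinearIndices(img)[1, 2]

# ...but they sit size(img, 1) positions apart in the flat vector.
println(j - i)  # 3
```

Nothing in the flat vector tells the consuming algorithm that positions 1 and 4 were adjacent pixels; that relation has to be learned or built into the model.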

So far our neural network has not been aware of the two-dimensional nature of the input data (MNIST). It could of course discover it by itself, learning the relations between neighboring pixels, but the fact is that it has had no clue so far.

---

## Neural Networks in Julia – Hyperbolic tangent and ReLU neurons

get the code from here

Our goal for this post is to introduce and implement new types of neural network nodes using the Julia language. These nodes are called ‘new’ because this post loosely refers to the existing code.

So far we have introduced sigmoid and linear layers, and today we will describe two more types of neurons. First we will look at the hyperbolic tangent, which will turn out to be similar (in shape at least) to the sigmoid. Then we will focus on ReLU (rectified linear unit), which is slightly different because it is not differentiable at zero. Both have strong practical implications, with ReLU being considered the more important one recently, especially in the context of networks with many hidden layers.
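
As a rough sketch of the ReLU idea (the names `relu` and `relu_grad` are hypothetical and not taken from the code this post refers to), the function and a conventional choice for its derivative could look like this:

```julia
# Hypothetical helper names; not taken from the post's code base.
relu(x) = max(zero(x), x)

# Derivative away from zero. At x == 0 the derivative does not exist;
# returning 0 there is a common convention, assumed here.
relu_grad(x) = x > zero(x) ? one(x) : zero(x)

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
println(relu.(xs))       # [0.0, 0.0, 0.0, 0.5, 2.0]
println(relu_grad.(xs))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```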

Most importantly, though, adding different types of neurons to a neural network changes the function it represents and therefore its expressiveness. Let us emphasize this as the main reason they are being added.

#### Hyperbolic tangent layer

From the biological perspective, the purpose of the sigmoid activation function as a single node's ‘crunching function’ is to model the passing of an electrical signal from one neuron to another in the brain. The strength of that signal is expressed by a number from $$(0,1)$$, and it depends on the signals from the input neurons connected to the one under consideration. The hyperbolic tangent is yet another way of modelling it.

Let’s first take a look at the form of hyperbolic tangent:

$$f(x) = \frac{\mathrm{e}^x - \mathrm{e}^{-x}}{\mathrm{e}^x + \mathrm{e}^{-x}}$$
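
The formula can be transcribed directly and checked against Julia's built-in `tanh`. The sketch below also checks the identity $$\tanh(x) = 2\sigma(2x) - 1$$, which is one way to see why the two curves share a shape; the helper names `tanh_formula` and `sigmoid` are assumptions made for this illustration:

```julia
# Direct transcription of the formula above. Fine for moderate x;
# exp overflows for very large |x|, so prefer the built-in tanh in practice.
tanh_formula(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

# Logistic sigmoid, with values in (0, 1).
sigmoid(x) = 1 / (1 + exp(-x))

for x in (-2.0, 0.0, 0.5, 3.0)
    # The transcription agrees with the built-in implementation.
    @assert isapprox(tanh_formula(x), tanh(x); atol=1e-12)
    # tanh is a sigmoid rescaled from (0, 1) to (-1, 1).
    @assert isapprox(tanh(x), 2 * sigmoid(2x) - 1; atol=1e-12)
end
```

Unlike the sigmoid, the output lies in $$(-1, 1)$$ and is centered at zero.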

---