Int8

Neural Networks in Julia – Hyperbolic tangent and ReLU neurons

Our goal for this post is to introduce and implement new types of neural network nodes in the Julia language. These nodes are 'new' in the sense that this post builds loosely on the existing code. So far we have introduced sigmoid and linear layers, and today we will describe two more types of neurons. First we will look at the hyperbolic tangent, which will turn out to be similar (in shape, at least) to the sigmoid. Then we will focus on ReLU (rectified linear unit), which is slightly different, as it in fact represents a non-differentiable function. Both yield strong practical implications (w...
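The two activations can be sketched in a few lines of Julia. This is a minimal illustration, not the blog's actual layer types, and the function names are ours:

```julia
# Hyperbolic tangent and its derivative, d/dx tanh(x) = 1 - tanh(x)^2.
tanh_activation(x) = tanh.(x)
tanh_derivative(x) = 1 .- tanh.(x) .^ 2

# ReLU and its (sub)derivative; at x = 0 we pick 0 by convention,
# which is the usual way to sidestep the non-differentiable point.
relu(x) = max.(x, 0)
relu_derivative(x) = float.(x .> 0)

relu([-2.0, 0.0, 3.0])   # [0.0, 0.0, 3.0]
```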

Optimization techniques comparison in Julia: SGD, Momentum, Adagrad, Adadelta, Adam

In today’s post we will compare five popular optimization techniques: SGD, SGD+momentum, Adagrad, Adadelta, and Adam – methods for finding a local optimum (global when dealing with convex problems) of differentiable functions. In the experiments conducted later in this post, these functions will all be error functions of feed-forward neural networks of various architectures for the problem of multi-label classification of MNIST (a dataset of handwritten digits). In our considerations, we will refer to what we know from previous posts, and we will also extend the existing code. Stochastic g...
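The two simplest of the five update rules can be sketched as follows. This is a hedged illustration with hypothetical function names; the learning rate η and momentum coefficient γ are assumed defaults, not values fixed by the post:

```julia
# Plain SGD: step against the gradient g of the error at θ.
function sgd_step!(θ, g; η = 0.1)
    θ .-= η .* g
end

# SGD with momentum: accumulate a velocity v that smooths the updates.
function momentum_step!(θ, v, g; η = 0.1, γ = 0.9)
    v .= γ .* v .+ η .* g
    θ .-= v
end
```

With zero initial velocity, a single momentum step coincides with a plain SGD step; the difference appears over many iterations, where the velocity term damps oscillations across steep error-surface directions.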

Backpropagation from scratch in Julia (part I)

Neural networks have been going through a renaissance recently. After the explosion in available computational power (often GPU-based), recent improvements in neural network initialization (pre-training with RBMs or autoencoders), the taming of the vanishing gradient problem in recurrent networks (LSTM), and advances in optimization techniques (AdaGrad, AdaDelta, Adam, and others), neural networks are at the center of attention again. And the attention is tremendous, in fact. Neural networks have won most of the machine learning challenges of the last couple of years. Their bio-inspired nature an...

Random walk vectors for clustering (final)

Hi there. I have finally managed to finish the long series of posts about how to use random walk vectors for the clustering problem. It’s been a long series and I am happy to finish it, as the whole blog suddenly moved away from being about Julia and turned into a random walk weirdo… Anyway, let’s finish what has been started. In the previous post we saw how to use many random walks to cluster a given dataset. The presented approach was evaluated on toy datasets, and our goal for this post is to try it on a more serious one. We will apply the same approach to the MNIST dataset—a set of handwri...

Random walk vectors for clustering (III)

This is the third post about how to use random walk vectors for clustering. The main idea, as stated before, is to represent a point cloud as a graph based on similarities between points. Similarities between points are encoded in the form of a matrix, and that matrix is then treated as the weight matrix of a graph. Such a graph is then traversed randomly, producing a set of random walk vectors, each seeded at a different starting point. Each random walk vector represents similarities between points once again—but this time it encodes global dataset shape aro...
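The walk described above can be sketched as a short power iteration. This is an illustrative simplification under assumed defaults (a fixed number of steps, no restart term), not the exact procedure from the series:

```julia
# Sketch, assuming S is a nonnegative similarity matrix with nonzero column sums.
# Column-normalize S into a transition matrix, then run a short random walk
# seeded at point i.
function random_walk_vector(S, i; steps = 10)
    P = S ./ sum(S, dims = 1)            # column-stochastic transition matrix
    v = zeros(size(S, 1)); v[i] = 1.0    # seed distribution concentrated at point i
    for _ in 1:steps
        v = P * v                        # diffuse probability mass along graph edges
    end
    return v   # similarity of point i to all points, shaped by the graph
end
```

Collecting one such vector per seed point gives the set of random walk vectors the post feeds into clustering.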

Random walk vectors for clustering (part II – perspective switch)

This post is the second part of a series that will result in combining random walk vectors for clustering. So far we have covered what similarity is and how to build a similarity matrix based on a distance matrix. We know the similarity matrix is a structure that encodes similarities between all objects in our dataset. Today we will further motivate the quest for the similarity matrix started in the previous post. We will say a little bit about what a graph is and how to switch from the point cloud perspective to the graph perspective. The bridge between these two worlds is the similarity ...
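The distance-to-similarity step mentioned above is commonly done with a Gaussian kernel; a one-line sketch, where the bandwidth σ is an assumption of ours rather than a value fixed by the post:

```julia
# Gaussian (RBF) similarity from a pairwise distance matrix D:
# zero distance maps to similarity 1, large distances decay toward 0.
gaussian_similarity(D; σ = 1.0) = exp.(-D .^ 2 ./ (2σ^2))

D = [0.0 1.0; 1.0 0.0]
W = gaussian_similarity(D)   # diagonal entries are 1.0
```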

Random walk vectors for clustering (part I – similarity between objects)

This post opens a series of short discussions on how to use multiple random walk vectors (vectors describing probability distributions of a random surfer on a graph—like PageRank) to find homogeneous groups of objects in a dataset. The main idea here is to combine the existing work of many researchers into one method. The mathematics behind these ideas might be complex, but we will try to be as ELI5-oriented as possible—so anyone can read, understand (at least the basics), and implement it themselves. Since the whole concept relies on components that exist on different abstraction levels, w...

Logistic regression (part II – evaluation)

This post is a continuation of a recent post where we implemented the gradient descent method for solving logistic regression problems. Today, we will try to evaluate our algorithm to find out how it works in the wild. Last time, the logistic regression problem was stated strictly as an optimization problem, which suggests our evaluation should consider the goal function itself—which is to some extent correct; indeed, it makes sense to check how J changes while the algorithm runs. But it is not the best idea one can have. Please recall our goal. We aimed for a parametrized labeli...
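One direct alternative to tracking J is to measure labeling quality itself. A minimal sketch, where the sigmoid model, the 0.5 decision threshold, and the function names are our assumptions:

```julia
# Label a point positive when the predicted probability exceeds 0.5,
# then count the fraction of labels that match the ground truth y.
predict(X, β) = (1 ./ (1 .+ exp.(-X * β))) .>= 0.5
accuracy(X, β, y) = sum(predict(X, β) .== (y .>= 0.5)) / length(y)
```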

Basic visualization in Julia – Gadfly

In this post, we will walk through the basics of Gadfly – a visualization package written in Julia. Gadfly is Julia’s implementation of the layered grammar of graphics proposed by Hadley Wickham, who implemented his idea in the ggplot2 package, the main visualization library in R. Interestingly, the original inventor of the “grammar of graphics” (who inspired Wickham) is now employed by Tableau Software – a leading company in data visualization. The main motivation for the grammar of graphics is to formalize visualization for statistics. The authors use the word “grammar” so one can think of...

Solving logistic regression problem in Julia

In this post, we will try to implement a gradient descent approach to solving the logistic regression problem. Please note this is mainly for educational purposes, and the aim here is to learn. If you want to “just apply” logistic regression in Julia, please check out this one. Let’s start with basic background. Logistic regression is a mathematical model for predicting binary outcomes (here we consider the binary case) based on a set of predictors. A binary outcome (meaning an outcome with two possible values, like physiological gender, spam/not-spam email, or fraud/legitimate transaction) is very ...
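The model and one gradient descent step fit in a few lines. This is a hedged sketch under our own conventions (X is an n×d matrix of predictors, y a 0/1 vector, β the parameter vector, η an assumed learning rate), not the exact code from the post:

```julia
# Logistic (sigmoid) function mapping a linear score to a probability.
σ(z) = 1 ./ (1 .+ exp.(-z))

# One gradient descent step on the average negative log-likelihood;
# its gradient for logistic regression is X' * (p - y) / n.
function gd_step(β, X, y; η = 0.1)
    p = σ(X * β)                       # predicted probabilities
    ∇ = X' * (p .- y) ./ length(y)     # gradient of the goal function
    return β .- η .* ∇
end
```

Iterating `gd_step` from an initial β drives the goal function down toward a (here convex, hence global) optimum.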

© int8.io