Int8

Neural Networks in Julia – Hyperbolic tangent and ReLU neurons

Our goal for this post is to introduce and implement new types of neural network nodes in the Julia language. These nodes are 'new' in the sense that this post builds loosely on the existing code. So far we have introduced sigmoid and linear layers, and today we will describe two more types of neurons. First we will look at the hyperbolic tangent, which will turn out to be similar (in shape, at least) to the sigmoid. Then we will focus on ReLU (rectified linear unit), which is slightly different, as it in fact represents a non-differentiable function. Both yield strong practical implications (w...
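The two activations can be sketched in a few lines of Julia. This is a minimal illustration, not the blog's actual layer types, and the function names are ours:

```julia
# Hyperbolic tangent and its derivative, d/dx tanh(x) = 1 - tanh(x)^2.
tanh_activation(x) = tanh.(x)
tanh_derivative(x) = 1 .- tanh.(x) .^ 2

# ReLU and its (sub)derivative; at x = 0 we pick 0 by convention,
# which is the usual way to sidestep the non-differentiable point.
relu(x) = max.(x, 0)
relu_derivative(x) = float.(x .> 0)

relu([-2.0, 0.0, 3.0])   # [0.0, 0.0, 3.0]
```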

Optimization techniques comparison in Julia: SGD, Momentum, Adagrad, Adadelta, Adam

In today’s post we will compare five popular optimization techniques: SGD, SGD+momentum, Adagrad, Adadelta, and Adam – methods for finding a local optimum (global when dealing with convex problems) of differentiable functions. In the experiments conducted later in this post, these functions will all be error functions of feed-forward neural networks of various architectures for the problem of multi-label classification of MNIST (a dataset of handwritten digits). In our considerations, we will refer to what we know from previous posts, and we will also extend the existing code. Stochastic g...
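The two simplest of the five update rules can be sketched as follows. This is a hedged illustration with hypothetical function names; the learning rate η and momentum coefficient γ are assumed defaults, not values fixed by the post:

```julia
# Plain SGD: step against the gradient g of the error at θ.
function sgd_step!(θ, g; η = 0.1)
    θ .-= η .* g
end

# SGD with momentum: accumulate a velocity v that smooths the updates.
function momentum_step!(θ, v, g; η = 0.1, γ = 0.9)
    v .= γ .* v .+ η .* g
    θ .-= v
end
```

With zero initial velocity, a single momentum step coincides with a plain SGD step; the difference appears over many iterations, where the velocity term damps oscillations across steep error-surface directions.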

Backpropagation from scratch in Julia (part I)

Neural networks have been going through a renaissance recently. After the explosion in available computational power (often GPU-based), recent improvements in neural network initialization (pre-training with RBMs or autoencoders), the taming of the vanishing gradient problem in recurrent networks (LSTM), and advances in optimization techniques (AdaGrad, AdaDelta, Adam, and others), neural networks are at the center of attention again. And the attention is tremendous, in fact. Neural networks have won most of the machine learning challenges of the last couple of years. Their bio-inspired nature an...

Random walk vectors for clustering (final)

Hi there. I have finally managed to finish the long series of posts about how to use random walk vectors for the clustering problem. It’s been a long series and I am happy to finish it, as the whole blog suddenly moved away from being about Julia and turned into a random walk weirdo… Anyway, let’s finish what has been started. In the previous post we saw how to use many random walks to cluster a given dataset. The presented approach was evaluated on toy datasets, and our goal for this post is to try it on a more serious one. We will apply the same approach to the MNIST dataset—a set of handwri...

Random walk vectors for clustering (III)

This is the third post about how to use random walk vectors for clustering. The main idea, as stated before, is to represent a point cloud as a graph based on similarities between points. Similarities between points are encoded in the form of a matrix, and that matrix is then treated as the weight matrix of a graph. Such a graph is then traversed randomly, producing a set of random walk vectors, each seeded at a different starting point. Each random walk vector represents similarities between points once again—but this time it encodes global dataset shape aro...
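The walk described above can be sketched as a short power iteration. This is an illustrative simplification under assumed defaults (a fixed number of steps, no restart term), not the exact procedure from the series:

```julia
# Sketch, assuming S is a nonnegative similarity matrix with nonzero column sums.
# Column-normalize S into a transition matrix, then run a short random walk
# seeded at point i.
function random_walk_vector(S, i; steps = 10)
    P = S ./ sum(S, dims = 1)            # column-stochastic transition matrix
    v = zeros(size(S, 1)); v[i] = 1.0    # seed distribution concentrated at point i
    for _ in 1:steps
        v = P * v                        # diffuse probability mass along graph edges
    end
    return v   # similarity of point i to all points, shaped by the graph
end
```

Collecting one such vector per seed point gives the set of random walk vectors the post feeds into clustering.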

Random walk vectors for clustering (part II – perspective switch)

This post is the second part of a series that will result in combining random walk vectors for clustering. So far we have covered what similarity is and how to build a similarity matrix based on a distance matrix. We know the similarity matrix is a structure that encodes similarities between all objects in our dataset. Today we will further motivate the quest for the similarity matrix started in the previous post. We will say a little bit about what a graph is and how to switch from the point cloud perspective to the graph perspective. The bridge between these two worlds is the similarity ...
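The distance-to-similarity step mentioned above is commonly done with a Gaussian kernel; a one-line sketch, where the bandwidth σ is an assumption of ours rather than a value fixed by the post:

```julia
# Gaussian (RBF) similarity from a pairwise distance matrix D:
# zero distance maps to similarity 1, large distances decay toward 0.
gaussian_similarity(D; σ = 1.0) = exp.(-D .^ 2 ./ (2σ^2))

D = [0.0 1.0; 1.0 0.0]
W = gaussian_similarity(D)   # diagonal entries are 1.0
```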

Random walk vectors for clustering (part I – similarity between objects)

This post opens a series of short discussions on how to use multiple random walk vectors (vectors describing probability distributions of a random surfer on a graph—like PageRank) to find homogeneous groups of objects in a dataset. The main idea here is to combine the existing work of many researchers into one method. The mathematics behind these ideas might be complex, but we will try to be as ELI5-oriented as possible—so anyone can read, understand (at least the basics), and implement it themselves. Since the whole concept relies on components that exist on different abstraction levels, w...

Logistic regression (part II – evaluation)

This post is a continuation of a recent post where we implemented the gradient descent method for solving logistic regression problems. Today, we will try to evaluate our algorithm to find out how it works in the wild. Last time, the logistic regression problem was stated strictly as an optimization problem, which suggests our evaluation should consider the goal function itself—which is to some extent correct; indeed, it makes sense to check how J changes while the algorithm runs. But it is not the best idea one can have. Please recall our goal. We aimed for a parametrized labeli...
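One direct alternative to tracking J is to measure labeling quality itself. A minimal sketch, where the sigmoid model, the 0.5 decision threshold, and the function names are our assumptions:

```julia
# Label a point positive when the predicted probability exceeds 0.5,
# then count the fraction of labels that match the ground truth y.
predict(X, β) = (1 ./ (1 .+ exp.(-X * β))) .>= 0.5
accuracy(X, β, y) = sum(predict(X, β) .== (y .>= 0.5)) / length(y)
```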

Basic visualization in Julia – Gadfly

In this post, we will walk through the basics of Gadfly – a visualization package written in Julia. Gadfly is Julia’s implementation of the layered grammar of graphics proposed by Hadley Wickham, who implemented his idea in the ggplot2 package, the main visualization library in R. Interestingly, the original inventor of the “grammar of graphics” (who inspired Wickham) is now employed by Tableau Software – a leading company in data visualization. The main motivation for the grammar of graphics is to formalize visualization for statistics. The authors use the word “grammar” so one can think of...

Solving logistic regression problem in Julia

In this post, we will try to implement a gradient descent approach to solving the logistic regression problem. Please note this is mainly for educational purposes, and the aim here is to learn. If you want to “just apply” logistic regression in Julia, please check out this one. Let’s start with basic background. Logistic regression is a mathematical model for predicting binary outcomes (here we consider the binary case) based on a set of predictors. A binary outcome (meaning an outcome with two possible values, like physiological gender, spam/not-spam email, or fraud/legitimate transaction) is very ...
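The model and one gradient descent step fit in a few lines. This is a hedged sketch under our own conventions (X is an n×d matrix of predictors, y a 0/1 vector, β the parameter vector, η an assumed learning rate), not the exact code from the post:

```julia
# Logistic (sigmoid) function mapping a linear score to a probability.
σ(z) = 1 ./ (1 .+ exp.(-z))

# One gradient descent step on the average negative log-likelihood;
# its gradient for logistic regression is X' * (p - y) / n.
function gd_step(β, X, y; η = 0.1)
    p = σ(X * β)                       # predicted probabilities
    ∇ = X' * (p .- y) ./ length(y)     # gradient of the goal function
    return β .- η .* ∇
end
```

Iterating `gd_step` from an initial β drives the goal function down toward a (here convex, hence global) optimum.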

© int8.io