Int8

Train your own South Park Fanatic AI with Mistral-7B

The purpose of this blog post is to experiment with the recent superstar of the local Large Language Models world, Mistral-7B. Developed by former Meta employees who previously worked on the Llama LLM, Mistral-7B, despite being relatively small in size (“only” 7B parameters), beats several popular larger models in numerous benchmarks. The crucial aspect that sets Mistral-7B apart is its open-source release under the Apache 2.0 license, making it freely available for commercial applications. To showcase exactly what this model can do, we will attempt to create a South Park Fanatic: an ...

Local Large Language Models

Over the past few years, machine learning has become a bit daunting for regular folks. The coolest toys were being created by big companies with massive budgets, immense computational power, and the best talent in the world. Very often, the weights of the large models created by these big players were not disclosed, but rather bragged about in research papers and demo applications, or exposed via a paid API at best. Many regular individual researchers like myself, happily training their humble models with scikit-learn, soon realized it might be a bit hopeless trying to compete with that. The rec...

Attention mechanism in NLP – beginners guide

The field of machine learning has been changing extremely fast for the last couple of years. A growing number of tools and libraries, a fully-fledged academic education offer, MOOCs, great market demand, but also the somewhat sacred, magical nature of the field itself (calling it Artificial Intelligence is pretty much standard right now) – all of these imply enormous motivation and progress. As a result, well-established ML techniques become outdated rapidly. Indeed, methods known from 10 years ago can often be called classical. This sort of revolution has happened recently. The default architectural choice for...
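The mechanism the post builds up to, scaled dot-product attention, can be sketched in a few lines of NumPy. The following is a minimal illustration of the idea, not code from the post itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # scores measure how much each query attends to each key
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the keys, row-wise
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output: attention-weighted mixture of the values
    return weights @ V

# toy example: 2 identical queries, 3 keys/values of dimension 4
Q = np.ones((2, 4))
K = np.eye(3, 4)
V = np.arange(12.0).reshape(3, 4)
out = scaled_dot_product_attention(Q, K, V)
# every query scores all keys equally here, so each output row
# is the uniform average of V's rows: [4., 5., 6., 7.]
print(out)
```

Identical queries necessarily produce identical outputs, which is a quick sanity check that attention is a pure function of queries, keys, and values.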

Are you OK, Cyberpunk? - Transformers diagnosis

At the end of 2020, 8 years after its announcement, the Polish game development studio CDPR released its flagship game, Cyberpunk 2077. The big success of CDPR’s previous game, Witcher 3, and their “gamers-first” approach meant CDPR was perceived as the golden child of the gaming industry. CDPR was seen as one of the few healthy apples in a basket of rotten fruit. Of course, there was only one emotion towards CDPR – love. All this raised expectations for CDPR’s new game very high. The announcement of Keanu Reeves – a persona absolutely loved by the internet – “playing” one of the characters, Johnny ...

Bellman Equations, Dynamic Programming and Reinforcement Learning (part 1)

Reinforcement learning has been on the radar of many recently. It has proven its practical value in a broad range of fields: from robotics through Go, chess, video games, and chemical synthesis, down to online marketing. While being very popular, reinforcement learning seems to require much more time and dedication before one actually gets any goosebumps. Playing around with neural networks in PyTorch for an hour for the first time gives instant satisfaction and further motivation. A similar experience with RL is rather unlikely. If you are new to the field, you are almost guarantee...

Counterfactual Regret Minimization - the core of Poker AI beating professional players

The last 10 years have been full of unexpected advances in artificial intelligence. Among the great improvements in image processing and speech recognition, the thing that got lots of media attention was AI winning against humans in various kinds of games. With OpenAI playing Dota 2 and DeepMind playing Atari games in the background, the most significant achievement was AlphaGo beating the Korean master Lee Sedol in Go. It was the first time a machine presented super-human performance in Go, marking – next to the Deep Blue vs. Kasparov chess match in 1997 – a historic moment in the field of AI. Around the same time,...

Monte Carlo Tree Search – beginners guide

For quite a long time, the common opinion in the academic world was that a machine achieving human-master performance in the game of Go was far from realistic. It was considered a ‘holy grail’ of AI – a milestone we were quite far from reaching within the upcoming decade. Deep Blue had its moment more than 20 years ago, and since then no Go engine had come close to human masters. The opinion about ‘numerical chaos’ in Go became so well established that it was referenced in movies, too. Surprisingly, in March 2016 an algorithm invented by Google DeepMind called AlphaGo defeated the Korean world champion in...

Large Scale Spectral Clustering with Landmark-Based Representation (in Julia)

In this post we will implement and play with a clustering algorithm with a mysterious name: Large Scale Spectral Clustering with Landmark-Based Representation (or LSC for short – corresponding paper here). We will first explain the algorithm step by step and then map it to Julia code (github link). Spectral Clustering Spectral clustering (wikipedia entry) is a term that refers to many different clustering techniques. The core of the algorithm does not differ much between them, though. In essence, it is a method that relies on the spectrum (eigendecomposition) of the input data similarity matrix (or its transformations)....
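The core of plain spectral clustering described above – eigendecomposition of a (normalized) similarity matrix – can be sketched in NumPy. This is a minimal illustration of the vanilla method, not the landmark-based LSC variant from the paper, and it uses a simple sign split instead of k-means on the embedding:

```python
import numpy as np

def spectral_embedding(X, sigma=1.0, k=2):
    # pairwise Gaussian (RBF) similarity matrix
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    # symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # eigenvectors of the k smallest eigenvalues form the embedding
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

# two well-separated blobs on the real line
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
emb = spectral_embedding(X, sigma=2.0, k=2)
# the sign of the second eigenvector (the Fiedler vector)
# separates the two clusters
labels = (emb[:, 1] > 0).astype(int)
print(labels)
```

In practice one runs k-means on the rows of the embedding; the sign split works here only because the toy data has exactly two clean clusters.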

Automatic differentiation for machine learning in Julia

Automatic differentiation is a term I first heard of while working on a (as it turns out now, a bit cumbersome) implementation of the backpropagation algorithm – after all, it caused lots of headaches, as I had to handle all the derivatives myself with an almost pen-and-paper approach. Obviously, I made many mistakes until I got my final solution working. At that time, I was aware that libraries like Theano or TensorFlow handle derivatives in a certain “magical” way for free. I never knew exactly what happens deep in the guts of these libraries, though, and I somehow suspected it is rather painful t...
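The “magic” has a small core. A minimal sketch of forward-mode automatic differentiation using dual numbers (not the post’s Julia code) shows how a value can carry its derivative along with it:

```python
import math

class Dual:
    """Forward-mode AD via dual numbers: each value carries
    its derivative with respect to the input alongside it."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# derivative of f(x) = x*x + sin(x) at x = 0: f'(0) = 0 + cos(0) = 1
x = Dual(0.0, 1.0)  # seed derivative dx/dx = 1
y = x * x + sin(x)
print(y.val, y.dot)  # 0.0 1.0
```

No symbolic manipulation and no numerical approximation happens anywhere: the derivative is computed exactly, operation by operation, as the function evaluates.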

Chess position evaluation with convolutional neural network in Julia

In this post we will tackle the problem of chess position evaluation using a convolutional neural network (CNN) – a neural network type designed to deal with spatial data. We will first explain why we need CNNs, then we will present two fundamental CNN layers. Having some knowledge of the inside of the black box, we will apply a CNN to the binary classification problem of chess position evaluation using the Julia deep learning library Mocha.jl. Introduction – data representation One of the challenges that frequently occurs in machine learning is proper representation of the input da...
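The operation at the heart of a CNN layer can be sketched without any framework. The following NumPy snippet is a minimal illustration (a plain “valid” cross-correlation, not the post’s Mocha.jl code), applied to an 8x8 board-like array:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core of a CNN layer."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output pixel is the dot product of the kernel
            # with the patch of the image it currently covers
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# an 8x8 "board" whose right half is 1, plus a vertical-edge detector
board = np.zeros((8, 8))
board[:, 4:] = 1.0
edge = np.array([[-1.0, 1.0]])
resp = conv2d(board, edge)
print(resp.shape)  # (8, 7)
```

The response is zero everywhere except the column where the board changes value, which is exactly the locality a CNN exploits: the same small kernel detects the same pattern wherever it appears.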

© int8.io