Browsing posts in: Natural Language Processing

Train your own South Park Fanatic AI with Mistral-7B

[GitHub repo]

The purpose of this blog post is to experiment with the recent superstar of the local Large Language Model world, Mistral-7B. Developed by former Meta employees who previously worked on the Llama LLM, Mistral-7B, despite its relatively small size (“only” 7B parameters), beats several popular larger models on numerous benchmarks. What crucially sets Mistral-7B apart is its open-source release under the Apache 2.0 license, which makes it freely available for commercial applications.

---

Local Large Language Models

Over the past few years, machine learning has become a bit daunting for regular folks. The coolest toys were being created by big companies with massive budgets, immense computational power, and the best talent in the world. Very often, the weights of the large models created by these big players were not disclosed; instead, the models were bragged about in research papers and demo applications or, at best, exposed via a paid API. Many individual researchers like myself, happily training our humble models with scikit-learn, soon realized it might be a bit hopeless to try to compete with that. The recent release of ChatGPT and GPT-4 seemed to be the final nail in the coffin.

Some recent breakthroughs have brought some light into that dark tunnel, though. It seems that, with a bunch of tricks and hacks, Large Language Models can be fine-tuned even on everyday consumer hardware. In this blog post we are going to go through the most visible techniques contributing to that.

---

Attention mechanism in NLP – beginner's guide

The field of machine learning has been changing extremely fast over the last couple of years. A growing number of tools and libraries, full-fledged academic programs, MOOCs, strong market demand, but also the somewhat sacred, magical nature of the field itself (calling it Artificial Intelligence is pretty much standard right now): all of these drive enormous motivation and progress. As a result, well-established ML techniques become outdated rapidly. Indeed, methods from 10 years ago can often already be called classical.

One such revolution has happened recently. The default architectural choice for NLP-related problems, the recurrent neural network (RNN), has been seriously challenged, to say the least. This very solid architecture is quickly being replaced by networks based solely on the attention mechanism, which drop recurrence entirely while achieving at least comparable (and often better) performance in both NLP and Computer Vision.

This post is an attempt to go through the most significant papers related to the attention mechanism, with the goal of building basic knowledge and intuition about it. We will start by looking at its very first NLP application, where it was introduced in 2015 to solve neural machine translation. Then we will go through the improvements to attention introduced in the Transformer, a neural network architecture that uses one specific variant of the attention mechanism as its main building block and thus dispenses with the recurrent connections that had until then seemed necessary.
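For a concrete picture of that "specific variant": the Transformer uses scaled dot-product attention, softmax(QKᵀ / √d_k)·V, where each query is compared with every key, the scores are turned into weights, and the output is a weighted sum of the values. The snippet below is a minimal pure-Python sketch of that formula, not code from the post; the function names are illustrative, and it deliberately leaves out masking, multiple heads and the learned Q/K/V projections.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Illustrative sketch: return softmax(q . K^T / sqrt(d_k)) @ V for each query q."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot-product similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        # Attention weights: non-negative and summing to 1.
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # One query that points in the same direction as the first key.
    print(scaled_dot_product_attention(
        queries=[[1.0, 0.0]],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[10.0, 0.0], [0.0, 10.0]],
    ))
```

The query is most similar to the first key, so the first value vector receives the larger weight; no recurrence is involved, which is exactly what lets the Transformer drop the RNN.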

---