
Train your own South Park Fanatic AI with Mistral-7B

[github repo]

The purpose of this blog post is to experiment with the recent superstar of the local Large Language Models world, Mistral-7B. Developed by former Meta employees who previously worked on the Llama LLM, Mistral-7B, despite its relatively small size (“only” 7B parameters), beats several popular larger models on numerous benchmarks. The crucial aspect that sets Mistral-7B apart is its open-source release under the Apache 2.0 license, making it freely available for commercial applications.
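As a point of reference, loading the model for inference takes only a few lines with the Hugging Face transformers library. This is a minimal sketch, not the post's actual setup: it assumes the transformers, torch and accelerate packages, uses the public mistralai/Mistral-7B-v0.1 checkpoint, and the prompt is made up for illustration.

```python
# Minimal sketch: load Mistral-7B for inference with Hugging Face transformers.
# Assumes transformers, torch and accelerate are installed; the checkpoint id
# and the prompt below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B weights around 14 GB
    device_map="auto",          # let accelerate place layers on GPU/CPU
)

prompt = "Cartman storms into the classroom and says:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```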

---

Local Large Language Models

Over the past few years, machine learning has become a bit daunting for regular folks. The coolest toys were being created by big companies with massive budgets, immense computational power, and the best talent in the world. Very often, the weights of the large models created by these big players were not disclosed, but merely bragged about in research papers and demo applications, or exposed via a paid API at best. Lots of individual researchers like myself, happily training their humble models with scikit-learn, soon realized it might be a bit hopeless to try to compete with that. The recent release of ChatGPT and GPT-4 seemed to be the final nail in the coffin.

Some recent breakthroughs sparked a light in that dark tunnel, though. It seems that with a bunch of tricks and hacks, fine-tuning of Large Language Models can run even on everyday consumer hardware. In this blog post we are going to go through the most visible bits contributing to that.
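To give a flavour of what those tricks look like in practice, here is a QLoRA-style sketch: the frozen base model is loaded quantized to 4 bits, and only small low-rank adapter matrices are trained on top of it. It assumes the transformers, peft and bitsandbytes packages; the checkpoint name and every hyperparameter value are illustrative, not this post's actual configuration.

```python
# A QLoRA-style sketch: 4-bit quantized base model plus small trainable
# low-rank adapters. Assumes transformers, peft and bitsandbytes are
# installed; all values below are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Only the adapter weights receive gradients, so the memory needed for training shrinks from tens of gigabytes to something a single consumer GPU can hold.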

---