Over the past few years, machine learning has become a bit daunting for regular folks. The coolest toys were being created by big companies with massive budgets, immense computational power, and the best talent in the world. Very often, the weights of the large models created by these big players were not disclosed; they were bragged about in research papers and demo applications or, at best, exposed via a paid API. Lots of regular individual researchers like myself, happily training their humble models with scikit-learn, soon realized it might be a bit hopeless trying to compete with that. The recent release of ChatGPT and GPT-4 seemed to be the final nail in the coffin.
Some recent breakthroughs sparked a light in that dark tunnel, though. It seems that, with a bunch of tricks and hacks, fine-tuning of Large Language Models can run even on everyday consumer hardware. In this blog post, we are going to go through the most visible bits contributing to that.