GenAI is democratizing. China’s DeepSeek-R1 is available to all and makes “reasoning” AI skills free to use. The hegemony of closed American AI players is further challenged by France’s Mistral AI, which is achieving impressive results with its latest model, Mistral Small 3.
Mistral Small 3 is being released under the Apache 2.0 license, which gives users (almost) a free pass to do as they please with the new model. Equipped with 24 billion parameters (24B), it is quite a bit smaller than Alibaba’s Qwen 2.5 32B and Meta’s Llama 3.3 70B, making it faster to run locally while performing roughly on par with those models, at least according to the human evaluations Mistral has presented. The benchmark results underscore this, although such statistics, as with DeepSeek-R1, are to be taken with a grain of salt.
Mistral does expect end users to put in somewhat more work than was the case with DeepSeek-R1, the new open-source AI superstar. There are two checkpoints of Mistral Small 3 to pick from, one pre-trained and one fine-tuned, and both will require additional tweaking to bring about further improvements, such as reasoning skills or detailed, domain-specific knowledge. It probably won’t be long before someone lets Mistral Small 3 learn from the outputs of DeepSeek-R1, a process known as distillation.
Really small, though?
At 24 billion parameters, Mistral Small 3 can be run on a single Nvidia RTX 4090 or on a MacBook with 32GB of RAM. Those content with roughly 2 to 3 tokens per second, a tad faster than most people’s typing speed, can also run a quantized version on Windows or Linux with, say, 32GB of DRAM and a reasonably powerful GPU. A local AI solution such as Ollama automatically picks the hardware it can use, so anyone can freely give it a whirl and see if it’s usable.
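For those who want to give it that whirl, here is a minimal sketch using Ollama’s Python client. The “mistral-small” model tag is an assumption on our part; check the Ollama model library for the exact name of the 24B release.

```python
# Minimal local-inference sketch with the Ollama Python client. Assumes the
# Ollama daemon is running and that Mistral Small 3 is published under the
# tag "mistral-small" (an assumption; verify the exact tag with `ollama list`
# or in the Ollama model library).
import ollama

# One-time download; Ollama picks a quantization that fits your hardware.
ollama.pull("mistral-small")

# Stream the answer so you can eyeball the tokens-per-second yourself.
stream = ollama.chat(
    model="mistral-small",
    messages=[{"role": "user", "content": "Explain Apache 2.0 in two sentences."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```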
Some would hesitate to call the new model ‘small’, as it happens. It’s possible to draw the line at models with far fewer parameters, which are able to run in edge environments. Mistral itself debuted Mistral 7B in 2023, which can be run on virtually any modern PC, often rapidly. Many small AI models sit around a balancing point of 12 to 14 billion parameters, such as Microsoft’s Phi-4. This size allows some of them to perform remarkably well given the constraints. Those looking to run GenAI in an embedded environment may end up with tiny models like Google’s Gemma 2B or Qwen 2.5 0.5B. Everything is relative, but it is clear that, these days, what Mistral understands by “small” is what’s still workable on a somewhat powerful PC without having to rely on internet access.
Now that AI companies have more hands-on experience, it appears that size is not the most important factor for high-quality outputs. For example, both Mistral AI and Microsoft emphasize that their small models feature particularly high-quality training data. “Garbage in, garbage out” certainly holds true in this regard. Just look at Google’s large Gemini models, which can be led astray by search results because they may uncritically adopt a sarcastic Reddit post as fact.
The goal for a small model like Mistral Small 3 is to minimize its “perplexity”: in other words, the model must be as certain as possible of the next token to be generated. One way to get there is distillation, training the smaller model with a larger model as its ‘teacher,’ in which Mistral Small would pick up from Mistral Large, for example, how it reasons its way to a sensible answer (so-called ‘soft targets’). This technique seems to be getting better and better lately, with DeepSeek’s distillations of smaller Alibaba Qwen and Meta Llama models being powerful examples, displaying behaviour similar to OpenAI’s venerable o1 model.
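A minimal sketch of both ideas in PyTorch may help: perplexity is simply the exponential of the cross-entropy loss, and ‘soft targets’ means the student matches the teacher’s full next-token distribution rather than only the single correct token. The temperature and loss weighting below are illustrative defaults, not Mistral’s or DeepSeek’s actual training recipe.

```python
# Sketch of perplexity and soft-target distillation; illustrative defaults,
# not any vendor's actual training recipe.
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Perplexity = exp(cross-entropy); lower means the model is more
    certain about the next token."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    return ce.exp()

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the usual next-token loss with a KL term that pulls the student's
    distribution toward the teacher's full distribution (the 'soft targets')."""
    s = student_logits.view(-1, student_logits.size(-1))
    t = teacher_logits.view(-1, teacher_logits.size(-1))
    hard = F.cross_entropy(s, targets.view(-1))
    soft = F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.softmax(t / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard scaling (Hinton et al., 2015)
    return alpha * hard + (1 - alpha) * soft
```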
Also read: OpenAI squabbles with DeepSeek: the pot calling the kettle black
Mistral AI in full swing
Looking beyond just the specifications, Mistral Small 3 will be especially popular in Western Europe. After all, the model does not hesitate for a moment to discuss matters such as Taiwan or the Uighurs, topics DeepSeek generally refuses to address. In addition, Small 3 clearly speaks languages other than English better than, say, DeepSeek-R1 or its SLM competitors, which needn’t be surprising given that Mistral AI itself is French. Should a BLEU test (a machine translation benchmark) be unleashed on all the smaller models, we would expect Mistral Small 3 to lead or be near the top.
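For the curious, scoring such a comparison is straightforward with the sacrebleu package; a toy sketch follows, where the sentences are made-up placeholders rather than actual model output.

```python
# Toy BLEU scoring sketch with sacrebleu: hypotheses would be a model's
# translations, references the human gold standard. Sentences are placeholders.
import sacrebleu

hypotheses = ["The cat sits on the mat.", "It is raining heavily in Paris."]
references = [["The cat is sitting on the mat.", "It rains hard in Paris."]]

score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {score.score:.1f}")  # 0-100, higher means closer to the reference
```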
Nevertheless, with some tweaking, Mistral Small 3 may pair well with the lessons DeepSeek-R1 has taught the AI world. In the official announcement of the model, Mistral tells us that it complements R1 well within the overall open-source ecosystem. Such sharing of findings is something Mistral CEO Arthur Mensch has long touted as crucial to GenAI’s progression, and he speaks as a familiar figure in the field: Mensch first worked as a researcher for Google DeepMind, while his co-founders did the same at Facebook.
Meanwhile, Mistral AI is preparing an IPO. The company is “not for sale,” Mensch said, and wants to expand in its native Europe, as well as in the APAC region and the US. By mid-2024, the company was valued at 5.8 billion euros, with investors including Nvidia, Cisco and Samsung. It has already proven to be a pioneer when it comes to AI innovations. For example, Mistral boarded the Mixture-of-Experts (MoE) train early, an approach that has since turned out to be a critical component of DeepSeek-R1’s efficiency push. Specifically, an MoE setup activates only the subset of a model’s parameters (the “experts”) most relevant to a given input, rather than the full network.
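A minimal sketch of that routing idea, with illustrative top-k gating rather than DeepSeek’s or Mistral’s actual architecture:

```python
# Toy top-k Mixture-of-Experts layer (a sketch, not any vendor's design):
# a router scores every expert per token, but only the top-k experts are
# actually executed for that token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); pick the k highest-scoring experts per token
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # only k of n_experts experts ran for each token

moe = TinyMoE(dim=64)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```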
Open source: not the future, but now
The DeepSeek release has caused turmoil in the AI world. Wall Street had a brief freakout over it, as all the massive AI spending suddenly seemed less necessary than before: why spend billions of dollars when you can develop a state-of-the-art model for only a few million? It’s not that simple: OpenAI’s o1 and its Pro offering still deliver the best results, even if they’re expensive. On top of that, the Stargate project in the U.S. will provide lower latency for end users, even more powerful LLMs, and potentially a bigger AI lead for the big U.S. tech conglomerates.
Yet it need not be so. US-based Meta is the odd one out of the Big Tech players with its free-to-run open-weight Llama models, while DeepSeek, Mistral, Alibaba and many others have been sharing their work with the open-source community for roughly two years already. Funnily enough, Mensch notes that this open stance was once the norm.
Indeed, until 2020, there was virtually unlimited access to the milestones of AI labs worldwide, the Mistral CEO has pointed out. He attributes this tipping point to the fact that some companies figured out a market fit was within reach. In late 2022, OpenAI struck gold with ChatGPT, which was a big step ahead of the competition, particularly after the release of GPT-4 in early 2023. This is despite the fact that the underlying technology is still not working properly, Mensch believes. It is possible that now, in early 2025, there is yet another tipping point: is the closed AI movement losing steam? The signs are there, as OpenAI, Google and Anthropic have all converged in benchmarks. DeepSeek has now done the same.
“It’s a cycle between openness and a closed nature that we’ve already observed in software,” Mensch remarked in late 2023. Indeed, we are still seeing it: many an open-source project has been transformed in recent years into a “source-available” solution, such as Red Hat Enterprise Linux and, as a recent example, the Fluent Assertions library. As far as GenAI is concerned, Mensch and his company believe it’s far too early to define a moat. Even staff at Google feared this was the case two years ago, in the leaked “we have no moat” memo. They may have been onto something.
Also read: Local AI is one step closer through Mistral-NeMo 12B