Nvidia repackages powerful AI model into bite-sized format

Through 'pruning' and 'distilling', only the most important bits remain


Following hot on the heels of Microsoft, Nvidia has also released a smaller AI model that can run locally on devices with less computing power. The Mistral-NeMo-Minitron 8B model is a scaled-down version of an earlier model developed in collaboration with French AI startup Mistral. The secret behind it is a pair of techniques called ‘pruning’ and ‘distilling’.

According to Kari Briski, head of Nvidia’s AI and HPC division, this model is small enough to run on RTX workstations, yet powerful enough to score well on benchmarks for AI chatbots, virtual assistants, content generators, and educational tools. It would even be suitable for laptops and edge devices. In other words, you don’t always need a giant LLM for everyday tasks.

Two techniques were employed to keep the model small yet sufficiently capable. These reduced a larger model (the 12-billion-parameter Mistral NeMo 12B, itself only a month old) to a considerably more manageable size. By applying ‘pruning,’ Nvidia removed the parts of the network, such as weights, neurons, and layers, that contributed least to the intended tasks.
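To illustrate the general idea behind pruning: a minimal sketch of magnitude pruning, where the weights with the smallest absolute values are zeroed out. This is only an illustrative example; Nvidia's actual approach prunes whole structural components of the network (neurons, attention heads, layers) rather than individual weights, and the function below is a hypothetical helper, not Nvidia code.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Toy weight matrix: half of the six weights are removed.
W = np.array([[0.9, -0.05, 0.4],
              [0.01, -0.7, 0.1]])
W_pruned = magnitude_prune(W, sparsity=0.5)
```

After pruning, only the three largest-magnitude weights (0.9, 0.4, and -0.7) remain; in a real model, the surviving structure is then retrained to recover accuracy.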

Further training on a specific dataset

The next step involved ‘distilling,’ in which the pruned model is further trained on a smaller, task-specific dataset to recover accuracy. This method is cheaper and yields higher-accuracy output for the tasks at hand than training an entirely new small language model from scratch.
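In knowledge distillation, the smaller "student" model is trained to match the output distribution of the larger "teacher" model. A minimal sketch of the standard distillation loss (KL divergence between temperature-softened teacher and student distributions); the function names and toy logits here are illustrative assumptions, not Nvidia's implementation:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL divergence from the softened teacher distribution to the student's.

    A higher temperature exposes more of the teacher's "dark knowledge"
    (relative probabilities of wrong answers); the T^2 factor keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature ** 2

# Toy example: the student's logits roughly track the teacher's.
teacher = np.array([2.0, 1.0, 0.1])
student = np.array([1.8, 1.1, 0.2])
loss = distillation_loss(student, teacher)
```

The loss is zero only when the student reproduces the teacher's distribution exactly; minimizing it nudges the smaller model toward the larger model's behavior.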

The code for the model is available on Hugging Face under an open-source license. The model itself is available as an Nvidia NIM microservice with an associated API. A downloadable version that can run on any system with a sufficiently powerful GPU is on the way.

Microsoft is also experimenting with models that use hardware efficiently. Yesterday it announced three new variants in the Phi-3.5 line, among them the first model in that line to use Mixture of Experts technology.

Read more: Microsoft has success with Mixture of Experts technology at Phi-3.5