Running LLMs outside a datacenter is usually not a realistic prospect. Nvidia and Mistral are letting PC users run a new model that does work locally.
Mistral NeMo 12B is the name of the new AI model, presented this week by Nvidia and Mistral. “We are fortunate to collaborate with the NVIDIA team, leveraging their top-tier hardware and software,” said Guillaume Lample, cofounder and chief scientist of Mistral AI. “Together, we have developed a model with unprecedented accuracy, flexibility, high-efficiency and enterprise-grade support and security thanks to NVIDIA AI Enterprise deployment.”
The promise of the new AI model is significant. Whereas previous LLMs were tied to datacenters, Mistral NeMo 12B is meant to run on workstations, and to do so without sacrificing performance. At least, that is the promise.
Stumbling blocks
The main stumbling block is not that laptops, desktops and even workstations lack the raw compute for AI; speed isn't really the core issue. The shortcoming is the amount of video memory available. A PC with a discrete GPU usually has 4, 6 or 8 GB at its disposal, with outliers of 12, 16 or 24 GB for gaming or productivity purposes. That's not enough for the full-fledged versions of Meta's Llama 3 or its alternatives, which require more than 80 GB.
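To see why video memory is the bottleneck, consider that the weights alone of a model scale with parameter count and numeric precision. A back-of-the-envelope sketch (the parameter counts and byte sizes below are illustrative assumptions, and real inference needs additional memory for activations and the KV cache on top of the weights):

```python
# Rough VRAM needed just to hold a model's weights, by parameter count
# and numeric precision. These are back-of-the-envelope figures, not
# vendor specifications.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, params in [("Mistral NeMo 12B", 12), ("Llama 3 70B", 70)]:
    for precision, size in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        gb = weight_memory_gb(params, size)
        print(f"{name} @ {precision}: ~{gb:.0f} GB")
```

At 16-bit precision, a 12-billion-parameter model already needs roughly 24 GB for its weights, which explains why only the largest consumer and workstation GPUs come into the picture.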
However, users on the subreddit r/LocalLLaMA have been doing it for months: running AI models on their own PCs that really shouldn't fit. Normally, models from OpenAI, Meta, Google and their competitors have hardware requirements that can only be met in a cloud environment. The trick these Reddit users rely on is called quantization, which allows open-source models to run on lesser hardware. Quantization "blurs" the parameters of an AI model by storing them at lower numerical precision, making the model less accurate. As a result, there is a greater chance of AI hallucinations, or wrong answers.
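A minimal sketch of the idea, using symmetric per-tensor int8 quantization in NumPy. A toy 4×4 matrix stands in for one layer of an LLM; real tools use more sophisticated schemes (per-channel scales, 4-bit formats), but the principle is the same: trade precision for memory.

```python
import numpy as np

# Toy fp32 weight matrix standing in for one layer of an LLM.
weights = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor quantization: map fp32 weights to int8."""
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# int8 needs a quarter of the memory of fp32...
print(weights.nbytes, q.nbytes)              # 64 bytes vs 16 bytes
# ...at the cost of a small rounding error per weight: the "blurring".
print(np.max(np.abs(weights - restored)))
```

Each weight is now off by up to half a quantization step, and those small errors accumulate across billions of parameters, which is where the accuracy loss comes from.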
Small models, big problems
For enterprise purposes, GenAI has yet to prove itself. A go-to solution from cloud vendors is to provide AI offerings that ensure data privacy. This can be done in various ways, through private clouds or, for example, strong security measures on data in use, but nothing beats a local workload: it is the only approach that can be secured in a familiar, relatively simple way.
Mistral NeMo 12B comes as an Nvidia NIM microservice and is, unsurprisingly, optimized for Nvidia hardware. This containerized deployment makes it flexible and fast to set up. "Models can be deployed anywhere in minutes instead of days," Nvidia says.
That said, deployment remains relatively limited. Only the most expensive workstations possess the required Nvidia L40S, RTX 4090 or RTX 4500 GPUs. The cheapest of these, the RTX 4090, can be bought separately for just under 1,700 euros.