Mistral AI showcases its first multimodal LLM to the world

Europe's answer to OpenAI is not twiddling its thumbs

French AI startup Mistral AI is releasing Pixtral 12B, an advanced model that can process both images and text. This newcomer joins the growing number of multimodal AI systems, which already include (versions of) Anthropic’s Claude, OpenAI’s GPT-4o, and Google’s Gemini.

As its name implies, Pixtral 12B has 12 billion parameters. It also includes a 400-million-parameter vision adapter that allows it to ‘see’ images in addition to text. The model builds on Mistral’s earlier Nemo 12B, which could only process text. Users can supply images either via URLs or as base64-encoded data.

Base64 encoding converts an image’s binary data into a string of plain text characters, which can then be embedded in a text-based format such as a JSON request. That string can be decoded back into the original image. The model can handle tasks such as captioning images, counting objects in a photo or illustration, and answering general questions about the content of images.
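The base64 route can be illustrated with Python’s standard library. This is a minimal sketch, not Mistral’s client code: the helper names and the JSON field names (`prompt`, `image`) are hypothetical, and the eight-byte PNG file signature stands in for a real image file.

```python
import base64
import json


def image_to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI (plain ASCII text)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def data_uri_to_bytes(data_uri: str) -> bytes:
    """Recover the original bytes from a base64 data URI."""
    header, _, payload = data_uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


# Stand-in for real image data: the 8-byte PNG file signature.
fake_png = b"\x89PNG\r\n\x1a\n"
uri = image_to_data_uri(fake_png)

# Hypothetical request body; field names are illustrative, not Mistral's API schema.
payload = json.dumps({"prompt": "Describe this image.", "image": uri})

# The receiving side parses the JSON and decodes the string back into bytes.
restored = data_uri_to_bytes(json.loads(payload)["image"])
print(uri)                   # data:image/png;base64,iVBORw0KGgo=
print(restored == fake_png)  # True
```

Because the encoded image is ordinary text, it can travel inside a JSON body alongside the prompt, which is why APIs often accept it as an alternative to a URL.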

Released under Apache 2.0 license

According to Sophia Yang, Head of Developer Relations at Mistral, the model will soon be available on Le Chat, Mistral’s chatbot, and on La Plateforme, its API platform. That means anyone with a user account will be able to try the model through the chatbot or the API. Mistral AI has released the code and parameters of Pixtral 12B on GitHub and Hugging Face, and the company actively encourages developers to download, fine-tune, and further train the model.

The model is 24 GB in size, open source, and freely available under the permissive Apache 2.0 license. That is the same license used for several other Mistral models, such as Mistral 7B, Mixtral 8x22B, Mistral Nemo, and Mistral Embed. Other models from the French AI startup fall under the Research License (such as Mistral Large) or the Non-Production License (Codestral). Both restrict commercial use, and the Research License permits use for research only.
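A quick back-of-the-envelope check relates the 24 GB figure to the parameter count. This sketch assumes, without confirmation from Mistral, that the weights are stored as 16-bit values (2 bytes per parameter), ignores file-format overhead, and uses the rounded parameter counts quoted above.

```python
# Rounded figures from the article.
LANGUAGE_PARAMS = 12_000_000_000       # "12 billion parameters"
VISION_ADAPTER_PARAMS = 400_000_000    # "400 million-parameter vision adapter"

# Assumption: 16-bit (e.g. bfloat16) weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2

total_params = LANGUAGE_PARAMS + VISION_ADAPTER_PARAMS
size_bytes = total_params * BYTES_PER_PARAM
size_gb = size_bytes / 1e9  # decimal gigabytes

print(f"{total_params:,} parameters -> about {size_gb:.1f} GB")
# 12,400,000,000 parameters -> about 24.8 GB
```

Under that assumption the result lands in the same ballpark as the reported 24 GB; storing the same weights as 32-bit floats would roughly double the size.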

Concerns about source material

LLMs, and especially multimodal ones, are often trained on data taken from the Internet and social media. In many cases, this includes copyrighted material. In other cases, the material was posted by people who never suspected it would be used to train artificial intelligence.

For example, OpenAI, Mistral AI’s well-known U.S. rival, gratefully used forums like Reddit to train its AI models, first without paying and later under a deal struck with the platform. That led to brief and ultimately fruitless resistance among users.

A nearly 6 billion euro valuation within a year

French AI startup Mistral AI recently raised the full 600 million euros it had set as the target for its latest funding round. Its estimated enterprise value is now close to 6 billion euros, not bad for a company just over a year old.

This latest financing consists of 468 million euros in equity and 132 million euros in debt, the Financial Times reported. The company is now valued at 5.8 billion euros.