Meta has released Llama 3.2. The new family includes 11- and 90-billion-parameter multimodal vision LLMs and two smaller text-only models of 1 and 3 billion parameters, aimed mainly at on-device and edge applications.
Meta says the two largest Llama 3.2 LLMs, at 11 and 90 billion parameters, are suited to advanced image interpretation. They are also the social media and tech giant’s first multimodal LLMs. They offer document-level understanding of maps and graphs, can convert images to text and handle vision tasks such as locating objects in images in response to questions asked in plain language (for example, “How far is the pot from this kettle?”).
According to Meta, these LLMs can also bridge the gap between image and language: they extract details from an image, understand the scene, and then generate a sentence that could serve as a caption telling the image’s story.
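As a rough illustration of how such image-to-text prompting could be wired up, the sketch below sends an image and a plain-language question to a locally served Llama 3.2 vision model. The endpoint, model tag, and response shape follow an Ollama-style chat API and are assumptions for illustration, not Meta’s official interface.

```python
import base64
import requests

# Assumed local setup: an Ollama-style server hosting a Llama 3.2 vision model.
# Endpoint URL, model tag, and response shape are assumptions; adjust to your runtime.
ENDPOINT = "http://localhost:11434/api/chat"
MODEL = "llama3.2-vision:11b"

def ask_about_image(image_path: str, prompt: str) -> str:
    """Send an image plus a plain-language prompt and return the model's text answer."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
        "stream": False,
    }
    response = requests.post(ENDPOINT, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]  # response shape assumed

if __name__ == "__main__":
    # Visual question answering and captioning use the same call.
    print(ask_about_image("kitchen.jpg", "How far is the pot from this kettle?"))
    print(ask_about_image("kitchen.jpg", "Write a one-sentence caption for this image."))
```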
LLM versions for on-device applications
The small 1 and 3 billion parameter LLMs offer strong multilingual text generation and ‘tool calling’ functionality, allowing developers to build on-device and edge apps with strict privacy guarantees so that data never leaves the device.
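To make ‘tool calling’ concrete, here is a minimal sketch in which a small Llama 3.2 model is asked to schedule a calendar event and may respond with a structured call to a locally defined tool. The server endpoint, model tag, and the create_calendar_event tool are hypothetical stand-ins, not part of Meta’s release.

```python
import requests

# Assumed on-device setup: a small Llama 3.2 model served locally via an
# Ollama-style chat endpoint. Endpoint, model tag, and response shape are
# assumptions; the create_calendar_event tool is a hypothetical example.
ENDPOINT = "http://localhost:11434/api/chat"
MODEL = "llama3.2:3b"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "create_calendar_event",
        "description": "Create a calendar event on the local device",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["title", "start"],
        },
    },
}]

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Schedule lunch with Sam tomorrow at noon."}],
    "tools": TOOLS,
    "stream": False,
}
response = requests.post(ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
message = response.json()["message"]

# If the model chose to call the tool, it returns a structured request that the
# app can execute locally, so the underlying data never leaves the device.
for call in message.get("tool_calls", []):
    print("Tool requested:", call["function"]["name"], call["function"]["arguments"])
```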
Meta sees two benefits here. First, responses to prompts feel more immediate, because processing takes place locally on the device. Second, local processing strengthens privacy: actions involving messages or calendar entries, for example, are never sent to the cloud, keeping the app’s operation private.
Such an app can decide which queries are handled locally on the device and which, if any, are forwarded to the cloud for processing by a larger LLM. The 1 and 3 billion parameter LLMs are optimized for Qualcomm and MediaTek hardware as well as Arm processors, according to Meta.
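A toy version of such a local/cloud router might look like the sketch below; the keyword heuristic and the two handler functions are placeholders for illustration only, not part of Llama 3.2 itself.

```python
# Illustrative only: a toy router that keeps simple, privacy-sensitive queries
# on-device and escalates the rest to a larger cloud-hosted model.

PRIVATE_KEYWORDS = ("calendar", "message", "contact", "reminder")

def looks_private_or_simple(prompt: str) -> bool:
    lowered = prompt.lower()
    return len(prompt) < 280 or any(word in lowered for word in PRIVATE_KEYWORDS)

def run_on_device(prompt: str) -> str:
    # Call the local 1B/3B model here (e.g. via a local inference runtime).
    return f"[on-device answer to: {prompt!r}]"

def run_in_cloud(prompt: str) -> str:
    # Forward to a larger hosted Llama model only when really needed.
    return f"[cloud answer to: {prompt!r}]"

def answer(prompt: str) -> str:
    handler = run_on_device if looks_private_or_simple(prompt) else run_in_cloud
    return handler(prompt)

if __name__ == "__main__":
    print(answer("Summarize my last three calendar invites."))                    # stays local
    print(answer("Write a detailed market analysis of EV batteries in Europe."))  # escalates
```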
Llama Stack distributions
In addition to the models, Meta introduced the first Llama Stack distributions. These should simplify and improve developers’ access to the Llama LLMs in different environments, including single-node, on-premises, cloud, and on-device deployments.
Components of Llama Stack include the Llama CLI for building, configuring, and running Llama Stack distributions; client code in multiple programming languages, such as Python and Node.js; and Docker containers for the Llama Stack Distribution Server and the Agents API Provider.
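For developers, access through the client code could look roughly like the following sketch. It assumes the llama-stack-client Python package and a locally running Llama Stack Distribution Server; the method names, port, and model identifier are assumptions that may differ between releases, so check the current documentation.

```python
# Minimal sketch, assuming the llama-stack-client Python package and a Llama Stack
# Distribution Server running locally. Method names, the port, and the model
# identifier are assumptions and may differ between releases.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize Llama Stack in one sentence."}],
)
print(response.completion_message.content)
```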
Multiple distributions have also been released, including a single-node Llama Stack Distribution via internal Meta deployment and Ollama; cloud-based Llama Stack distributions from AWS, Databricks, Fireworks, and Together; an on-device distribution on iOS via PyTorch ExecuTorch; and a Dell-supported on-premises Llama Stack Distribution.
Azure and Google Cloud availability
Furthermore, Meta’s various Llama 3.2 versions are now available via Microsoft Azure and Google Cloud. The Llama offerings on Azure include Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 1B Instruct, Llama 3.2 3B Instruct, Llama Guard 3 1B, Llama 3.2 11B Vision Instruct, Llama 3.2 90B Vision Instruct, and Llama Guard 3 11B Vision.
The Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct LLMs are also now available in the Azure AI Model Catalog.
Google Cloud offers all four Llama 3.2 LLMs in Vertex AI Model Garden. Only the Llama 3.2 90B LLM is currently available in preview through Google’s Model-as-a-Service (MaaS) product.