Google DeepMind has released Gemma 4, a family of four open-weight AI models designed to run on local devices, from small edge endpoints to workstations. The 31B Dense ranks third among open models on the Arena AI leaderboard despite its limited size. All models are available under an Apache 2.0 license, which removes previous restrictions and broadens the ecosystem for developers.
The four open-weight models are built for advanced reasoning and agentic workflows, Google says. The lineup consists of Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture of Experts (MoE), and a 31B Dense model, all under an Apache 2.0 license.
The 31B Dense currently ranks third among open models on the Arena AI text leaderboard, while the 26B MoE holds sixth. Both outperform models up to twenty times their parameter count, according to Google. Since Google first donated Gemma to the open-source community in February 2024, developers have downloaded the series more than 400 million times and created over 100,000 community variants.
From smartphones to workstations
The E2B and E4B models target edge devices. They run fully offline on phones, Raspberry Pi boards, and the Nvidia Jetson Orin Nano with near-zero latency, and feature native audio input. The edge models support a 128K context window; the larger 26B and 31B variants offer up to 256K tokens. Android developers can prototype agentic flows via the AICore Developer Preview today.
For desktop use, the 31B and 26B models fit on a single 80GB NVIDIA H100 GPU. The 26B MoE activates only 3.8 billion parameters at inference, keeping latency low.
Apache 2.0 and a broader ecosystem
The Apache 2.0 license is arguably the most notable aspect of this model family. It marks a clear break from earlier Gemma releases: Gemma 3 offered multimodal support and four size variants, but under more restrictive terms. Gemma 4 adds day-one compatibility with vLLM, llama.cpp, Ollama, NVIDIA NIM, LM Studio, and more. Hugging Face CEO Clément Delangue has already called the Apache 2.0 release “a huge milestone.”
The license allows not only free use of the models but also modification and redistribution, with the minor proviso that copyright and attribution notices be retained. Third-party benchmarks show Gemma 4 ahead of OpenAI’s open models, but the 31B Dense does not lead uniformly: models like Qwen 3.5 27B score closely on several metrics.
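In practice, the attribution requirement is light: Apache 2.0 asks redistributors to retain copyright and NOTICE text alongside the work. A hypothetical NOTICE fragment for a fine-tuned derivative might look like the following (the wording and copyright line are an illustrative sketch, not Google's actual notice):

```
Gemma 4 (fine-tuned derivative)
This product includes model weights developed by Google DeepMind,
distributed under the Apache License, Version 2.0.
Original copyright and NOTICE contents retained per Section 4 of the license.
```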
The models are available now via Google AI Studio, Kaggle, Ollama, and Hugging Face. For production, Google Cloud offers deployment via Vertex AI and Cloud Run.