JetBrains has released Mellum2 as open source: a 12B-parameter model built for software engineering environments. Thanks to its Mixture-of-Experts architecture, only 2.5 billion parameters are active per token, which, according to JetBrains, cuts inference time by more than half compared to similar models. The model is available under the Apache 2.0 license.
The original Mellum was a 4B-parameter model focused on code completion and deployed as part of JetBrains’ AI Assistant. Mellum2 handles natural language as well as code, making it suitable for routing, summarization, and intermediate reasoning steps in modern AI workflows.
The model is published on Hugging Face, where users can download it to run locally, host it themselves, or fine-tune it for their own applications.
MoE architecture significantly reduces inference costs
Mellum2 uses a Mixture-of-Experts (MoE) design. Of the 12 billion parameters, only 2.5 billion are active per token. Activating only a fraction of the weights cuts per-token compute and enables fast inference for real-time applications. According to JetBrains, inference time is less than half that of comparable dense models, a concrete advantage in production environments. The model is not multimodal: it is trained exclusively on natural language and code, which keeps it fast and specialized. On benchmarks for code generation, mathematics, and reasoning, Mellum2 performs comparably to other models of similar size, according to JetBrains’ technical report.
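The idea behind the "2.5 billion active out of 12 billion" figure can be sketched with a toy top-k router. This is a minimal, hypothetical illustration in plain Python: the article does not describe Mellum2's expert count, top-k value, or router design, so `ToyMoELayer`, its sizes, and its single-matrix experts are all assumptions chosen for readability.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    # Multiply a (rows x cols) matrix, stored as a list of rows, by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyMoELayer:
    """Hypothetical toy top-k MoE layer; sizes are illustrative, not Mellum2's."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_experts = num_experts
        self.top_k = top_k
        # Router: one row of weights per expert, yielding one score per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert is a single dim x dim linear map (real experts are larger MLPs).
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        scores = matvec(self.router, x)
        # Select the top_k highest-scoring experts for this token.
        chosen = sorted(range(self.num_experts), key=lambda i: scores[i], reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * self.dim
        # Only the chosen experts are evaluated; the rest are skipped for this token.
        for g, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        # Router weights are always used; only top_k experts run per token.
        router = self.num_experts * self.dim
        per_expert = self.dim * self.dim
        total = router + self.num_experts * per_expert
        active = router + self.top_k * per_expert
        return total, active


layer = ToyMoELayer(dim=8, num_experts=8, top_k=2)
out, chosen = layer.forward([1.0] * 8)
total, active = layer.param_counts()
print(f"experts used: {chosen}, active/total params: {active}/{total}")

# Ratio reported for Mellum2: 2.5B active out of 12B total parameters.
print(f"Mellum2 active fraction: {2.5 / 12:.1%}")
```

Note that every expert still has to be stored: MoE reduces the compute spent per token, not the memory needed to hold the full set of weights.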
JetBrains positions Mellum2 as a “focal model”: a fast, specialized component for high-frequency, low-latency tasks. Concrete applications include routing AI workloads, building RAG pipelines, controlling sub-agents in complex workflows, and private deployment on a company’s own infrastructure.
“At JetBrains, we believe the future belongs to coordinated systems, not single models,” the company states. Mellum2 is intended as one of those specialized components alongside large frontier models, not as a replacement for them.