IBM has released a new generation of open-source language models, Granite 4. The series combines two neural network architectures and is designed to deliver better performance with less memory usage.
At launch, the Granite 4 family consists of four models ranging in size from 3 to 32 billion parameters. According to IBM, they perform more efficiently than previous generations, thanks to a hybrid design that combines the Transformer architecture with Mamba, a new and hardware-efficient network structure.
One of the smaller models, Granite-4.0-Micro, relies solely on the Transformer architecture, which is known for its attention mechanism: it lets the model weigh and prioritize the most relevant parts of a text. The other three models add elements of the Mamba architecture. Mamba offers similar capabilities but is built on a state space model, a mathematical framework with roots in control theory and signal processing.
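As an illustration only, the sketch below contrasts the two mechanisms in a few lines of NumPy: a scaled dot-product attention step that touches every token in the sequence, and a toy state-space recurrence that folds the sequence into a fixed-size hidden state. The shapes and matrices are invented for the example and do not reflect Granite's actual layers.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every other token,
    so the whole sequence must be kept around."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def ssm_scan(x, A, B, C):
    """Toy state-space recurrence: the sequence is compressed into a
    fixed-size hidden state h, so memory does not grow with length."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one scalar input per step
        h = A @ h + B * x_t       # update the hidden state
        ys.append(C @ h)          # read an output from the state
    return np.array(ys)

rng = np.random.default_rng(0)
T, d, n = 8, 4, 3                 # sequence length, model dim, state dim
Q = K = V = rng.normal(size=(T, d))
print(attention(Q, K, V).shape)   # (8, 4)
print(ssm_scan(rng.normal(size=T), rng.normal(size=(n, n)) * 0.1,
               rng.normal(size=n), rng.normal(size=n)).shape)  # (8,)
```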
Lower memory pressure
One advantage of Mamba is its lower memory pressure with long input prompts. A Transformer's memory usage grows with the length of the context, because it keeps a key-value cache entry for every token it has seen, whereas Mamba compresses the sequence into a fixed-size state. This makes the models cheaper and faster to run, which is particularly useful in real-time applications or on lighter hardware.
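The difference is easy to see in a back-of-the-envelope comparison. The sketch below estimates the memory a Transformer needs for its key-value cache against the fixed-size state a Mamba layer keeps; the layer counts, head sizes and state dimensions are illustrative assumptions, not Granite 4 specifications.

```python
# Rough, illustrative numbers only: layer count, heads, head size and dtype
# width are assumptions for the comparison, not Granite 4 specifications.
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    # A Transformer stores a key and a value vector per token, per layer.
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_value

def mamba_state_bytes(layers=32, channels=4096, state_dim=16, bytes_per_value=2):
    # A Mamba layer keeps a fixed-size state, independent of prompt length.
    return layers * channels * state_dim * bytes_per_value

for tokens in (1_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens: "
          f"KV cache ~{kv_cache_bytes(tokens) / 2**20:8.0f} MiB, "
          f"Mamba state ~{mamba_state_bytes() / 2**20:5.0f} MiB")
```

Under these assumed settings the key-value cache climbs from roughly a hundred megabytes to well over ten gigabytes as the prompt grows, while the state-space side stays constant.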
The Granite 4 series is built on the latest version of the Mamba architecture, Mamba 2, which is more compact and efficient, requiring less hardware for the same calculations. The largest model, Granite-4.0-H-Small, has 32 billion parameters and uses a mixture-of-experts design in which only a fraction of the parameters is activated for each token. IBM positions it as a suitable choice for automated customer support.
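For illustration, the toy sketch below shows the mixture-of-experts idea: a gate scores all experts for a given token, but only the top-scoring ones are actually evaluated, so most of the layer's parameters stay inactive per token. The routing and expert functions are simplified placeholders, not IBM's implementation.

```python
import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Minimal mixture-of-experts sketch: score all experts, run only top_k."""
    scores = x @ gate_w                        # one score per expert
    top = np.argsort(scores)[-top_k:]          # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # normalize the gate weights
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda x, W=rng.normal(size=(d, d)) / np.sqrt(d): x @ W
           for _ in range(n_experts)]          # each expert is a small linear map
gate_w = rng.normal(size=(d, n_experts))
token = rng.normal(size=d)
print(moe_layer(token, experts, gate_w).shape)  # (16,) — only 2 of 8 experts ran
```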
The two smaller hybrid models, Granite-4.0-H-Tiny and Granite-4.0-H-Micro, have 7 billion and 3 billion parameters, respectively. They are intended for applications where speed is more important than maximum accuracy.
According to IBM, Granite-4.0-H-Tiny consumes much less memory than its predecessor, Granite 3.3 8B. In internal tests, the model used only about one-sixth of the RAM while producing better output. An IBM researcher noted that the efficiency of the new architecture explains only part of the gains; refined training methods and a larger training corpus also contribute significantly to the improved performance.
Granite 4 is available through IBM’s watsonx.ai platform and through external services such as Hugging Face. IBM also plans to offer the models through Amazon SageMaker JumpStart and Microsoft Azure AI at a later date, and intends to expand with new variants featuring more advanced reasoning capabilities.
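For those who want to try the models, a minimal sketch of loading one of them with the Hugging Face transformers library might look like the following. The repository name used here is an assumption and should be checked against the ibm-granite organization on Hugging Face; the hybrid models may also require a recent transformers release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-tiny"   # assumed repository ID, verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize the benefits of a hybrid Mamba/Transformer model.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```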