Microsoft is launching three advanced small language models as an extension of its Phi series. All three are reasoning models, trained to work through complex questions step by step rather than answering them in a single pass.
The models, Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, are designed to run locally on a PC with a graphics processor or on mobile devices. They target scenarios where speed and efficiency are essential but reasoning quality cannot be sacrificed.
This launch follows Phi-4-multimodal and Phi-4-mini, which added support for multiple modalities to the compact model series.
According to Microsoft, Phi-4-reasoning contains 14 billion parameters and delivers performance comparable to much larger models on complex tasks. Phi-4-reasoning-plus is the same size but has been further refined through reinforcement learning and processes 1.5 times more tokens to achieve higher accuracy, at the cost of longer response times and greater compute usage.
Mathematical applications
The smallest model, Phi-4-mini-reasoning, contains 3.8 billion parameters and is optimized for mathematical applications. It is primarily intended for mobile devices and other resource-constrained hardware, and targets educational use cases, among other applications.
According to Microsoft, the Phi reasoning models represent a new category of small language models. By combining techniques such as distillation, reinforcement learning, and high-quality training data, the company has found a balance between model size and performance. The models are small enough for systems that demand low latency, yet can compete with much larger models in reasoning ability.
To achieve these capabilities, Phi-4-reasoning was trained on web data and curated examples from OpenAI’s o3-mini model. Phi-4-mini-reasoning was further refined with synthetic training data generated by DeepSeek-R1. This training set contained more than a million math problems of varying difficulty, ranging from high school to PhD level.
Synthetic data is often used to train AI models via a teacher model, which creates and enriches practice material. Such a teacher can generate countless math and physics problems, complete with step-by-step solutions, so the student model learns how to arrive at an answer rather than just what the answer is. Because the problems can be tailored to different curricula, the student model gains both breadth and depth while remaining compact.
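To make the idea concrete, the toy sketch below mimics such a pipeline: it programmatically generates arithmetic problems of varying difficulty together with step-by-step solutions, producing (prompt, target) pairs a student model could be fine-tuned on. A real pipeline would query a large teacher model such as DeepSeek-R1 rather than a hand-written generator; this code is purely illustrative.

```python
import random

def make_example(difficulty: int) -> dict:
    """Generate one arithmetic problem with a worked, step-by-step solution."""
    a, b, c = (random.randint(1, 10 ** difficulty) for _ in range(3))
    problem = f"Compute {a} + {b} * {c}."
    steps = (
        f"Step 1: multiply first: {b} * {c} = {b * c}.\n"
        f"Step 2: add: {a} + {b * c} = {a + b * c}.\n"
        f"Answer: {a + b * c}"
    )
    # The target contains the derivation, not just the answer, so the
    # student learns the reasoning path.
    return {"prompt": problem, "target": steps}

# Build a small synthetic training set spanning several difficulty levels.
dataset = [make_example(d) for d in (1, 2, 3) for _ in range(1000)]
print(dataset[0]["prompt"])
print(dataset[0]["target"])
```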
Better performance than heavier models
Despite their smaller size, Phi-4-reasoning and Phi-4-reasoning-plus outperform models such as OpenAI’s o1-mini and DeepSeek-R1-Distill-Llama-70B on many PhD-level mathematical and scientific benchmarks, according to Microsoft. They also score better than the full DeepSeek-R1 model (671 billion parameters) on the AIME 2025 test, a three-hour math competition that serves as a qualifier for the US team at the International Mathematical Olympiad.
The new Phi-4 models are now available via Azure AI Foundry and Hugging Face.
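For reference, a minimal sketch of running the smallest model locally with the Hugging Face transformers library, assuming the repository id microsoft/Phi-4-mini-reasoning (check the hub for exact model names, chat templates, and hardware requirements):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask a simple math question; the model is tuned to reason step by step.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```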