Phi-4 is a state-of-the-art small language model (SLM) with 14 billion parameters that outperforms even OpenAI's much larger GPT-4 on the MATH and GPQA benchmarks.
According to Neowin, Microsoft attributes the SLM's strong performance in mathematical reasoning to its use of high-quality synthetic datasets, combined with the curation of high-quality organic data and improvements made to the model after training.
The synthetic training data were generated using techniques such as multi-agent prompting, self-revision workflows, and instruction inversion, and they make up the bulk of the training data. In addition, Microsoft used techniques such as rejection sampling to refine the model's outputs during post-training.
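The article does not describe how Microsoft implemented rejection sampling, but the general idea can be sketched: sample several candidate responses and keep only those that pass some acceptance check (for math, often an answer verifier or a reward model). The `generate_candidates` and `verifier` functions below are hypothetical stand-ins, not part of Microsoft's pipeline.

```python
import random

def generate_candidates(prompt, n=4):
    # Stand-in for drawing n responses from a language model
    # (hypothetical; a real pipeline would query the model here).
    return [f"answer-{random.randint(0, 9)}" for _ in range(n)]

def verifier(prompt, response):
    # Stand-in acceptance check: in practice this could be a
    # math-answer checker, unit tests, or a reward-model threshold.
    return response.endswith(("0", "2", "4", "6", "8"))

def rejection_sample(prompt, n=4):
    """Keep only candidate responses that pass the verifier."""
    return [r for r in generate_candidates(prompt, n) if verifier(prompt, r)]

accepted = rejection_sample("What is 2 + 2?", n=8)
```

Only accepted responses would then be used for further fine-tuning, which is what lets the technique sharpen model outputs after the main training run.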
Concerns about leakage of benchmark test sets
In the Phi-4 technical report, Microsoft also addressed concerns about benchmark test sets leaking onto the web. It improved the data decontamination process for Phi-4 to prevent such leakage from unfairly influencing evaluation results. To confirm this, Microsoft tested the model on the November 2024 AMC-10 and AMC-12 math competitions, which took place after its training data were collected.
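The article does not spell out Microsoft's decontamination method. A common approach, shown here purely as an illustrative sketch, is to drop any training document that shares a long n-gram with a benchmark item; the toy corpus, benchmark, and the small n value below are assumptions for the demo.

```python
def ngrams(text, n=13):
    # Long n-gram overlap (e.g. 13-grams) is a common decontamination
    # heuristic; the exact method Microsoft used is not given here.
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_doc, test_set, n=13):
    """Flag a training document that shares any n-gram with a benchmark item."""
    test_grams = set()
    for item in test_set:
        test_grams |= ngrams(item, n)
    return bool(ngrams(train_doc, n) & test_grams)

# Toy example with a short n for readability.
benchmark = ["what is the derivative of x squared with respect to x"]
corpus = [
    "the derivative of x squared with respect to x is two x",  # overlaps
    "cats are small domesticated carnivorous mammals",          # clean
]
clean = [doc for doc in corpus if not is_contaminated(doc, benchmark, n=5)]
```

Filtering like this before training is what makes a post-collection test (such as the November 2024 AMC problems) a meaningful check that high benchmark scores are not memorization.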
Phi-4 not only outperforms similar models of the same size but also larger advanced models, including Gemini 1.5 Pro. Microsoft states that Phi-4's top performance on the MATH benchmark is not due to overfitting or contamination.
Phi-4 has limitations
Phi-4 also has weaknesses, as it remains fundamentally limited by its size. The model can give erroneous information about factual knowledge, and it is less able to strictly follow detailed instructions. To evaluate the model's safety, the Phi-4 team worked with Microsoft's independent AI Red Team (AIRT) to identify safety and security risks posed by Phi-4 in both average and adversarial user scenarios.
Phi-4 is now available on Azure AI Foundry under a Microsoft Research License Agreement (MSRLA). Next week, Microsoft will also release it on Hugging Face.