Microsoft has released a new multimodal reasoning model: Phi-4-reasoning-vision-15B. The model combines two existing models via a mid-fusion approach and can analyze images, scientific graphs, and screen interfaces. Despite its relatively small size of 15 billion parameters, it outperforms comparable models on mathematical and scientific benchmarks.
The model builds on two existing components: SigLIP-2, a vision encoder that converts images into numerical representations that neural networks can process, and Phi-4 Reasoning, a reasoning model that Microsoft released as open source last year.
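To make the encoder's role concrete, here is a toy sketch of what a vision encoder like SigLIP-2 does at a high level: split an image into fixed-size patches and project each patch to an embedding vector, producing a token sequence a language model can consume. This is an illustration with random weights, not the actual SigLIP-2 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an HxWxC image into rows of flattened patch pixels."""
    h, w, _ = image.shape
    rows = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            rows.append(image[y:y + patch, x:x + patch].reshape(-1))
    return np.stack(rows)

def embed(image: np.ndarray, patch: int = 16, dim: int = 64) -> np.ndarray:
    patches = patchify(image, patch)                     # (N, patch*patch*C)
    proj = rng.standard_normal((patches.shape[1], dim))  # learned in a real model
    return patches @ proj                                # (N, dim) image "tokens"

tokens = embed(rng.standard_normal((64, 64, 3)))
print(tokens.shape)  # (16, 64): 16 image tokens of dimension 64
```

A real encoder adds attention layers and trained weights on top of this patch-and-project step, but the interface is the same: pixels in, a sequence of embedding vectors out.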
The two components are combined using a technique called mid-fusion: unlike models in which every layer supports multimodal processing, only some of the layers in Phi-4-reasoning-vision-15B do. This reduces hardware requirements at the cost of some output quality.
Notably, the reasoning functionality can be enabled and disabled via prompts, so users who want to reduce the infrastructure load further can simply turn reasoning off.
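The exact prompt mechanism is not documented here, so the following sketch uses a hypothetical system instruction to represent the toggle; the wording is an assumption, not the model's actual switch:

```python
def build_messages(question: str, reasoning: bool) -> list[dict]:
    """Build a chat-style prompt with a hypothetical reasoning toggle."""
    system = (
        "Think step by step before answering."      # hypothetical "reasoning on"
        if reasoning else
        "Answer directly without showing reasoning."  # hypothetical "reasoning off"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_messages("What does this chart show?", reasoning=False)
print(msgs[0]["content"])  # Answer directly without showing reasoning.
```

Skipping the chain-of-thought output shortens generation, which is why disabling reasoning lowers the infrastructure load.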
Training on open-source data and corrected captions
For training, Microsoft primarily used open-source data consisting of image–text pairs, applying a multi-step process to improve quality: high-quality datasets were set aside, and images with incorrect captions were given new descriptions generated with GPT-4o and o4-mini. Microsoft also added internally generated training data, data from targeted acquisitions, and examples of behavior the model should avoid.
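The curation steps described above can be sketched as a small pipeline. All names here are hypothetical, and the recaptioning function is a stub standing in for the GPT-4o / o4-mini calls in the real process:

```python
def recaption(image_id: str) -> str:
    # Stub: a real pipeline would call GPT-4o or o4-mini here.
    return f"corrected caption for {image_id}"

def curate(pairs: list[dict]) -> list[dict]:
    """Keep high-quality pairs as-is; recaption pairs whose captions are wrong."""
    curated = []
    for p in pairs:
        if p["quality"] == "high" or p["caption_correct"]:
            curated.append(p)
        else:
            curated.append({**p, "caption": recaption(p["id"])})
    return curated

data = [
    {"id": "img1", "quality": "high", "caption": "a cat", "caption_correct": True},
    {"id": "img2", "quality": "low", "caption": "a dog", "caption_correct": False},
]
out = curate(data)
print(out[1]["caption"])  # corrected caption for img2
```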
On the MathVista_Mini benchmark, Phi-4-reasoning-vision-15B scored 17 percent higher than Google’s gemma-3-12b-it. This is a benchmark specific to multimodal mathematics. The model also achieved higher scores on more than half a dozen other evaluations.
Deployable for AI agents and visual analysis
Developers can use the model to build AI agents that interact with applications through their user interfaces. Phi-4-reasoning-vision-15B can deduce the functions of interface elements, such as buttons and menus, from screenshots. The model is also suited to analyzing complex visual files.
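A UI agent built on such a model typically runs a simple loop: capture a screenshot, ask the model which element to act on, then dispatch that action. The sketch below is a hedged illustration with a stubbed model call; every name in it is hypothetical, not an actual Phi-4 API:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # e.g. "click" or "type"
    target: str  # the element the model identified in the screenshot

def model_predict(screenshot: bytes, goal: str) -> Action:
    # Stub: a real agent would send the screenshot and goal to the
    # multimodal model, which infers the function of each UI element.
    return Action(kind="click", target="Submit button")

def agent_step(screenshot: bytes, goal: str) -> Action:
    action = model_predict(screenshot, goal)
    # A real agent would now dispatch the action to the application
    # (e.g. via an OS automation API), capture a new screenshot, and repeat.
    return action

act = agent_step(b"fake-screenshot-bytes", "submit the form")
print(act.kind, act.target)  # click Submit button
```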
Microsoft has made the code available via Hugging Face, GitHub, and Azure.