In March 2024, Microsoft set up its own AI team. The goal was clear: to break away from its reliance on OpenAI and make its own mark against Google, Meta, Anthropic, and others. The first end products are now a reality: MAI-Voice-1 and MAI-1 preview. What does this first move by Microsoft AI mean?
Microsoft’s Copilot offering was made possible largely by a deal with early GenAI frontrunner OpenAI. The ChatGPT maker received $10 billion and, in exchange, gave Microsoft direct access to models such as GPT-4 and now GPT-5. This arrangement always seemed to be a temporary solution for Microsoft, which has a lot of catching up to do before it can build its own state-of-the-art LLMs. Relations have soured, too, as OpenAI is reportedly trying to wriggle its way out of the 2023 agreement and shut off Microsoft’s access to the latest models. Now, it appears the tech giant has set out on its path toward autonomy regardless.
Tuning
Interestingly, Microsoft has chosen a voice-generation model for its debut. MAI-Voice-1 produces AI-generated speech from a simple prompt, up to a minute long in one demo. In Copilot Labs, where the model can be tried out, the audio quality is particularly striking, but how it will be rolled out to a wider audience remains to be seen. Presumably, Microsoft hopes that users of AI PCs will turn on their microphones and start talking to Copilot.
It is still too early to say, but we believe there is a good chance that Microsoft and Google will eventually expand this battleground as voice-driven assistants mature. Just as Gemini on Android is already taking over the functionality of Google Assistant, a similar AI companion could be useful on Windows. The question is whether this really is preferable to text-driven communication with AI tools. After all, talking to one’s device has remained something of a niche for the past 15 years, even though the likes of Siri and Google Assistant popularized the idea.
In the arena
With the second model, MAI-1 preview, Microsoft is quite literally stepping into the AI arena. Specifically, the company has unleashed this model on LMArena, a blind taste test between different LLMs. On this platform, multiple models respond to the same prompt from a user, who then indicates which answer they prefer. The end results should speak for themselves, although reliance on user feedback has to be paired with efforts to make models adhere to logic, facts, ethics, best practices and, well, the actual task they need to perform.
MAI-1 preview is intended to lay the foundation for Microsoft’s first general-purpose AI model. In other words, it should take over the functionality of OpenAI’s models within Copilot and enable more features later on.
From one to many
Microsoft says it has big ambitions, and the tech giant does not believe that a single all-rounder in the form of MAI-1 is enough. In the company’s view, the real step forward would be a range of specialized models. Whether this will end up taking the form of Gemini’s “Gems,” which let users give Gemini a particular persona or behavior, is unclear.
It is just as possible that Microsoft will opt for compact LLMs, also known as Small Language Models. It has already launched a series of Phi models that were as compact as they were limited. The hope within Redmond is that MAI’s talent pool will make enough of a difference to compete with the “big boys” among AI companies, strange as it may seem not to count Microsoft, of all companies, among them.