The company has revealed the extent of its gargantuan engineering efforts to support their investment with OpenAI.
When Microsoft invested $1 billion in an artificial intelligence startup called OpenAI four years ago, it agreed to build a giant, advanced supercomputer for the AI research company. What followed, according to a Bloomberg report, was a massive engineering project that saw Microsoft string together “tens of thousands” of Nvidia GPUs in order to deliver the required computing capacity.
What made ChatGPT possible
Microsoft used Nvidia’s A100 graphics chips, which were the industry standard at the time for training AI models. It also had to change how it positions servers on racks to prevent power outages, according to the article. Scott Guthrie, the Microsoft executive vice president who oversees cloud and AI, wouldn’t give a specific cost for the project, but told Bloomberg “it’s probably larger” than several hundred million dollars.
Nidhi Chappell, Microsoft general manager of Azure AI infrastructure, said that despite the size of the initial project, it’s only the beginning. “We built a system architecture that could operate and be reliable at a very large scale. That’s what resulted in ChatGPT being possible”, she said. “That’s one model that came out of of it. There’s going to be many, many others”.
Building “a better cloud for AI”
Indeed, Microsoft is already seeing a return on that mammoth investment. It uses that same set of resources it built for OpenAI to train and run its own large artificial intelligence models, such as the chatbot-powered Bing search engine and a version of Dynamics 365 featuring ChatGPT. It also sells the system to other customers. Microsoft also recently signed an expanded deal with OpenAI in which to add $10 billion to its investment, and has launched a new family of AI models using the cutting-edge Nvidia H100 GPUs.
Microsoft’s Guthrie told Bloomberg that the plan was always to expand beyond OpenAI’s immediate requirements. “We didn’t build them a custom thing”, he said. “It started off as a custom thing, but we always built it in a way to generalize it so that anyone that wants to train a large language model can leverage the same improvements. That’s really helped us become a better cloud for AI broadly”.