
Breakthroughs in scaling now allow even smaller developers to work with large deep learning models. Microsoft's new DeepSpeed tool is a breakthrough in scaling AI.

Microsoft has released a new version of its open-source DeepSpeed tool that it says will enable the creation of “extremely large” deep learning models with a trillion parameters, more than five times as many as the largest model currently in use. The new tool will also help developers working on smaller projects, according to the company. 

DeepSpeed is a software library for performing artificial intelligence training. The AI tool has already gone through multiple iterations since its release in February. The latest version has effectively increased the maximum size of the models it can train from 100 billion to more than a trillion parameters.
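
For readers who haven't used the library, the sketch below shows how a DeepSpeed training loop is typically wired up. The model, dataset, and config values are placeholders rather than anything from Microsoft's post, the exact argument and config key names can differ between DeepSpeed releases, and a real run is normally started with the deepspeed launcher on one or more GPUs.

```python
import torch
import deepspeed

# Placeholder model and data; any PyTorch module and dataset would work here.
model = torch.nn.Linear(1024, 1024)
dataset = [(torch.randn(1024), torch.randn(1024)) for _ in range(256)]

# Illustrative config; these values are assumptions, not Microsoft's settings.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that manages distributed
# setup, optimizer state, and (optionally) mixed precision.
model_engine, optimizer, loader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config=ds_config,
)

for x, y in loader:
    x, y = x.to(model_engine.device), y.to(model_engine.device)
    loss = torch.nn.functional.mse_loss(model_engine(x), y)
    model_engine.backward(loss)  # engine-managed backward pass
    model_engine.step()          # engine-managed optimizer step
```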

“Democratizing” AI development

Microsoft says DeepSpeed can now train a trillion-parameter language model using 100 of Nvidia Corp.'s previous-generation V100 graphics cards. Previously, such a task would have required 4,000 of Nvidia's current-generation (and more expensive) A100 graphics cards running for 100 days to complete.

These cost reductions will give a boost to AI projects across many sectors. Groups such as OpenAI that are working to push the envelope on the size of neural networks could use the tool to reduce the hardware costs associated with their work. Startups and smaller enterprises working on day-to-day applications of AI could also use DeepSpeed to build more powerful models with more parameters than their limited budgets would otherwise allow.

DeepSpeed “democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models,” Microsoft executives Rangan Majumder and Junhua Wang wrote in a blog post.

New technologies drive cost efficiencies

The giant leap in DeepSpeed’s efficiency comes from new technologies. One, called “ZeRO-Offload,” increases the number of parameters a training server can handle by offloading optimizer state and part of the computation to the server’s CPU memory and host processors. Another innovation, called “3D parallelism,” combines data, pipeline and model parallelism to distribute work among the training servers in a way that increases hardware efficiency.
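
Microsoft's post doesn't include configuration snippets, but ZeRO optimizations, including CPU offload, are switched on through DeepSpeed's JSON config. The fragment below is a rough sketch: the key names have changed across releases (older builds used a plain `cpu_offload` flag), so check the docs for the exact schema of the version you are running.

```python
# Illustrative DeepSpeed config enabling ZeRO with optimizer-state offload to
# host memory. Values and key names are a sketch, not canonical settings.
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # partition optimizer state and gradients
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: keep optimizer state in CPU RAM
    },
}
```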

“3D parallelism adapts to the varying needs of workload requirements to power extremely large models with over a trillion parameters while achieving near-perfect memory-scaling and throughput-scaling efficiency,” Microsoft’s Majumder and Wang wrote. “In addition, its improved communication efficiency allows users to train multi-billion-parameter models 2–7x faster on regular clusters with limited network bandwidth.”
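
The "3D" in 3D parallelism refers to combining data parallelism, pipeline parallelism and model parallelism. Only the pipeline axis is sketched below, using DeepSpeed's PipelineModule with placeholder layers and an illustrative stage count; it assumes a multi-GPU launch whose world size is a multiple of the stage count, and it is not a full 3D setup.

```python
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

# Placeholder layer stack; a real model would list its transformer blocks here.
layers = [torch.nn.Linear(1024, 1024) for _ in range(8)]

# PipelineModule splits the layer list across pipeline stages; replicating the
# pipeline gives data parallelism, and splitting work inside each layer gives
# model parallelism -- together, the three axes of 3D parallelism.
net = PipelineModule(
    layers=layers,
    num_stages=2,              # illustrative: two pipeline stages
    loss_fn=torch.nn.MSELoss(),
)

engine, _, _, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    config={
        "train_batch_size": 16,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    },
)

# train_batch() schedules micro-batches through the pipeline stages;
# my_data_iterator is a hypothetical iterator of (input, label) pairs.
# loss = engine.train_batch(data_iter=my_data_iterator)
```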