Nvidia has released an updated NeMo Megatron AI development toolkit to make AI training faster.

Nvidia announced a new edition of the NeMo Megatron AI toolkit that will allow software teams to train large neural networks more quickly. In particular, the update promises to reduce the time required to train sophisticated NLP models.

GPT-3 and NeMo Megatron

OpenAI LLC – an AI research and development company – introduced an advanced NLP model called ‘Generative Pre-Trained Transformer 3’, or GPT-3. The model can execute various tasks such as translating text and generating software code. OpenAI offers commercial cloud services that allow organizations to access several specialized GPT-3 editions and create customized versions.

NeMo Megatron, the AI toolkit Nvidia upgraded, includes features that can help train GPT-3-style models. The American multinational technology company believes the features will facilitate a 30 percent decrease in the time it takes to train such models.

“Training can now be done on 175 billion-parameter models using 1,024 NVIDIA A100 GPUs in just 24 days — reducing time to results by 10 days, or some 250,000 hours of GPU computing, prior to these new releases”, Nvidia’s researchers stated.
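The quoted savings are consistent with simple arithmetic, assuming the 10 days saved apply across the whole 1,024-GPU cluster:

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: the 10 days saved apply to every GPU in the cluster.
gpus = 1024
days_saved = 10
hours_per_day = 24

gpu_hours_saved = gpus * days_saved * hours_per_day
print(gpu_hours_saved)  # 245760 -- roughly the "some 250,000 hours" quoted
```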

Key features

This speed-up is due to two key features: ‘selective activation recomputation’ and ‘sequence parallelism’. According to Nvidia, each accelerates AI training in a different way.

‘Sequence parallelism’ extends the toolkit’s existing model parallelism. It observes that parts of a transformer layer that were previously run identically on every GPU, such as layer normalization and dropout, are independent along the sequence dimension, so they can instead be split across GPUs. This increases performance and eliminates the need to perform the same calculations several times on different devices.
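A minimal sketch of the idea in plain Python, not Nvidia’s implementation: layer normalization operates on one token at a time, so a sequence of token vectors can be sharded across devices and normalized shard-by-shard with no cross-device communication, giving the same result as processing the whole sequence on one device.

```python
import math

def layer_norm(token, eps=1e-5):
    """Normalize one token's feature vector to mean 0, variance 1."""
    mean = sum(token) / len(token)
    var = sum((x - mean) ** 2 for x in token) / len(token)
    return [(x - mean) / math.sqrt(var + eps) for x in token]

def split_sequence(seq, num_devices):
    """Shard a token sequence along the sequence dimension."""
    chunk = math.ceil(len(seq) / num_devices)
    return [seq[i * chunk:(i + 1) * chunk] for i in range(num_devices)]

# A toy 4-token sequence with 3 features per token, split over 2 "devices".
seq = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 1.0], [2.0, 0.0, 2.0]]
shards = split_sequence(seq, 2)

# Each device normalizes only its own shard; because layer norm depends on
# one token at a time, no communication between shards is needed.
parallel = [layer_norm(t) for shard in shards for t in shard]
serial = [layer_norm(t) for t in seq]
assert parallel == serial
```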

‘Selective activation recomputation’ refines a standard memory-saving technique in which activations – the intermediate values a model produces while processing information – are recomputed during the backward pass instead of being stored. Rather than recomputing everything, NeMo Megatron recomputes only the activations that consume significant memory but are cheap to regenerate, reducing both memory use and wasted computation, and thereby shortening training times.
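The trade-off can be sketched as a simple policy; the activation names and cost numbers below are hypothetical, for illustration only. The rule: recompute activations that are memory-heavy but cheap to regenerate, and store the rest.

```python
# Hypothetical per-activation costs in arbitrary units -- not real profiles.
activations = [
    {"name": "attention_softmax", "memory": 8, "recompute_cost": 1},  # large, cheap to redo
    {"name": "mlp_output",        "memory": 2, "recompute_cost": 6},  # small, costly to redo
]

def selective_plan(activations):
    """Store an activation unless recomputing it is cheaper than holding it."""
    stored, recomputed = [], []
    for act in activations:
        if act["memory"] > act["recompute_cost"]:
            recomputed.append(act["name"])  # drop it; redo it in the backward pass
        else:
            stored.append(act["name"])      # keep it in GPU memory
    return stored, recomputed

stored, recomputed = selective_plan(activations)
memory_used = sum(a["memory"] for a in activations if a["name"] in stored)
print(stored, recomputed, memory_used)  # ['mlp_output'] ['attention_softmax'] 2
```

Under this toy policy, peak activation memory drops from 10 units (storing everything) to 2, at the cost of 1 unit of recomputation instead of the 7 that full recomputation would require.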

“We arrived at the optimal training configuration for a 175B GPT-3 model in under 24 hours”, Nvidia’s researchers said. “Compared with a common configuration that uses full activation recomputation, we achieve a 20 percent to 30 percent throughput speed-up.”