Nvidia’s TensorRT 8 is here to boost AI inference


Nvidia is accelerating artificial intelligence with the launch of its next generation of TensorRT software. On Tuesday, Nvidia launched the eighth iteration of its popular AI software, used in high-performance deep learning inference.

TensorRT 8 pairs a deep learning optimizer with a runtime that delivers low-latency, high-throughput inference for a wide range of AI applications.

In AI, ‘inference’ is the stage at which results are actually produced. Where training refers to developing an algorithm’s ability to understand datasets, inference is about acting on that understanding by inferring answers to specific questions.
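To make the distinction concrete, here is a minimal, framework-free sketch (not TensorRT-specific, and deliberately simplified): training fits a model's parameters to data, and inference applies the fitted model to a new input.

```python
# Conceptual sketch of training vs. inference, using a one-weight
# linear model y = w * x fitted with plain gradient descent.

def train(samples, epochs=200, lr=0.01):
    """Training: adjust the weight w so the model fits the data."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x
            w -= lr * (pred - y) * x  # gradient of squared error w.r.t. w
    return w

def infer(w, x):
    """Inference: use the trained weight to answer a specific question."""
    return w * x

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on the line y = 2x
w = train(data)                               # slow, done once
answer = infer(w, 5.0)                        # fast, done per request
print(round(answer))                          # → 10
```

Products like TensorRT focus on the second step: once a model is trained, the optimizer and runtime make each `infer`-style call as fast as possible in production.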

Inference needs

AI adoption is on an upward trajectory, and with it the demand for inference. Given ever-growing amounts of data, AI needs to work faster, which TensorRT 8 promises to deliver: Nvidia said in a blog post that inference time with the new version will be half the current average.

That means it can be used to develop high-performance search engines, ad recommendation systems, and chatbots deployed in the cloud or at the network edge.

Some transformer optimizations in TensorRT 8 will, according to Nvidia, deliver record-setting speed for language applications.

Exponential complexity

The challenge is that AI models are growing increasingly complex, even as worldwide demand surges for real-time applications that use AI.

TensorRT 8 is therefore timely, bringing new capabilities such as the ability to run BERT-Large, one of the most widely used transformer-based models, in 1.2 milliseconds.

Nvidia said that TensorRT 8 is now generally available and free to all members of the Nvidia Developer Program. New versions of the plug-ins, parsers, and samples are available under an open-source license via the TensorRT GitHub repository.