GPUs waste 50% of their time: fiber offers solution

GPUs are idle much of the time because they are waiting for instructions. This is because connections in data centers still largely consist of copper wire rather than fiber optics.

This is according to IBM research. IBM engineer John Knickerbocker estimates that GPUs spend about half their time idle. “That’s a huge amount of energy being wasted.”

IBM now claims to have made significant progress in solving this problem. The company is unveiling a new process for co-packaged optics, which integrates optical components directly with electronic chips in a single package. This enables connectivity between devices in a data center at the speed of light.

Less signal loss

The company has built and successfully tested interconnections based on polymer optical waveguides. These flexible and lightweight structures, made of polymer materials, conduct light along a path. They reduce signal loss while maintaining signal integrity.

The module reduces energy requirements by more than 80% compared to electrical connections. It also increases the length of cables that can connect components within a data center from the current one meter to hundreds of meters.

According to IBM, this allows large artificial intelligence language models to be trained up to five times faster, saving an estimated amount of energy per trained model equal to the annual consumption of 5,000 U.S. households.

Increasing demand for energy

Before the emergence of generative AI and large language models (LLMs), the demand for computing power doubled every 20 months, reports Mukesh Khare, general manager of IBM’s semiconductor division and vice president of hybrid cloud research at IBM Research. “Since the advent of LLMs, this has doubled every six months.”
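To make the difference between the two doubling rates concrete, the following is a minimal sketch that compounds each one over the same period. The five-year horizon and the function name `growth_factor` are illustrative assumptions, not figures or terms from IBM.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return how many times demand multiplies over `months`
    if it doubles once every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Assumed horizon of five years (60 months), chosen only for illustration.
horizon_months = 60

before_llms = growth_factor(horizon_months, 20)  # doubling every 20 months
since_llms = growth_factor(horizon_months, 6)    # doubling every 6 months

print(f"Doubling every 20 months: {before_llms:.0f}x over {horizon_months} months")
print(f"Doubling every 6 months:  {since_llms:.0f}x over {horizon_months} months")
```

Under these assumptions, the older rate yields three doublings over five years, while the newer rate yields ten.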

Electricity consumption and the associated carbon footprint are often overlooked consequences of AI. The International Energy Agency estimated earlier this year that the power consumption of data centers handling AI and cryptocurrency workloads could double by 2026, equaling Japan’s total electricity consumption.

Polymer optical waveguide technology is widely used in telecommunications, data communications, and sensor applications. However, it has never been economically feasible in data centers. Reasons include the high initial cost, the fragility of the media, the dominance of copper wire in existing systems, and the size of optical fibers.

At about 250 microns in diameter, or three times the width of a human hair, these fibers take up about a quarter of a millimeter of space. That is considerably more than the corresponding space required by electronic circuits.
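The space constraint can be illustrated with a rough calculation of how many such fibers fit side by side along one chip edge. This is a sketch only: the 10 mm edge length is a hypothetical value, and the calculation assumes fibers are packed with no gaps between them, which the article does not state.

```python
def fibers_per_edge(edge_um: int, fiber_diameter_um: int) -> int:
    """Return how many fibers fit side by side along an edge,
    assuming zero spacing between neighbouring fibers."""
    return edge_um // fiber_diameter_um


FIBER_DIAMETER_UM = 250   # diameter cited in the article
CHIP_EDGE_UM = 10_000     # hypothetical 10 mm chip edge (assumption)

count = fibers_per_edge(CHIP_EDGE_UM, FIBER_DIAMETER_UM)
print(f"{count} fibers fit along a {CHIP_EDGE_UM / 1000:.0f} mm edge")
```

At a quarter of a millimeter per fiber, only four fibers fit per millimeter of edge, which is why fiber size limits the number of optical connections a chip can carry.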

Significant gap

“Although the industry has made great progress in building faster chips, the speed at which these chips communicate with each other has not kept up with that pace,” Khare says. “There is a gap of several orders of magnitude.”

IBM researchers used polymer optical waveguide (PWG) technology to position bundles of high-density optical fibers along the edge of a chip so that the chip could communicate directly through the polymer fibers. This approach achieved tolerances of half a micron or less between a fiber and the connector, which is considered the measure of success.

According to the company, the new optical structures make it possible to bundle six times as many optical fibers at the edge of a silicon photonic chip as is currently possible. Each fiber can span only a few centimeters and carry terabits of data per second. When configured to transmit multiple wavelengths per optical channel, co-packaged optics (CPO) technology can increase bandwidth between chips up to 80 times.

Further innovation is possible

IBM says the process has achieved an 80% reduction in size compared with conventional optical channels. Tests indicate that further reduction is possible, yielding a bandwidth increase of up to 1,200%.

The co-packaged optics modules are ready for commercial use and will be produced at IBM’s facility in Bromont, Quebec.