Huawei integrates storage and AI into a five-layer infrastructure

The number of processed tokens and agents in production is growing exponentially. Corporate infrastructure must be built to support this, or else the opportunities offered by data and AI will remain underutilized. According to Huawei, a five-layer AI stack is required to deploy artificial intelligence securely and quickly for heavy workloads. We visited the company during the Innovative Data Infrastructure Forum 2026 in Paris.

With the event theme “Data Awakening, Infra Evolving,” it is clear that Huawei’s focus is shifting toward building infrastructure solutions centered on data. This enables more intelligence to be incorporated into the business environment. Yuan Yuan, President of Huawei Data Storage Product Line, sees this as absolutely necessary due to the token explosion. During a presentation and in conversation with the press, he pointed to the projected figures for tokens and agents. Last year, approximately 6 billion tokens per minute were processed worldwide. This year, that number is expected to grow to 15 billion. Meanwhile, IDC figures put the number of active AI agents at nearly 30 million at the end of last year, a figure expected to rise to an estimated 2.2 billion in four years.
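To put those figures in proportion, the growth factors can be worked out in a few lines. This is only a back-of-the-envelope sketch: the numbers are the rounded ones quoted above, the agent figure is read as growth from roughly 30 million to 2.2 billion, and the helper name `growth_factor` is our own.

```python
# Rounded figures as quoted in the article; the helper name is illustrative only.
def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before

# Tokens processed per minute worldwide: last year vs. this year's expectation.
tokens_per_minute = growth_factor(6e9, 15e9)

# Active AI agents: end of last year vs. the estimate four years on.
active_agents = growth_factor(30e6, 2.2e9)

print(f"Tokens per minute: {tokens_per_minute:.1f}x")
print(f"Active agents: about {active_agents:.0f}x")
```

On these numbers, token volume grows 2.5 times in a year, while the agent count grows by a factor of roughly 73 over four years.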

Tokens, in his words, are becoming as indispensable as air and water. But tokens alone don’t tell the whole story. The question is what infrastructure is needed to make those tokens meaningful for business processes, so that AI becomes truly valuable. Yuan shares an example from the Chinese healthcare sector. One of China’s largest hospitals deployed three AI models for digital pathology. One was used to detect cancer cells, another to emulate microscopic examination, and the last to handle patient communication. The models were trained on a million digital pancreatic images and absorbed knowledge from 300 medical textbooks, processed via 16 GPU cards. By deploying AI here, the time to generate a pathology report dropped from 40 minutes to 15 seconds.

Five layers for an AI stack

Yuan divides the infrastructure required to achieve desired results into five layers. At the top is the Agent Framework, an environment for the independent development and deployment of agents, including secure sandboxes. Below that are the models themselves, involving refinement, alignment with local regulations, and adjustment of model parameters.

The third layer is compute, the infrastructure area where Huawei has built a strong reputation over the years, and it deserves some extra attention. It primarily involves GPU cards and inference processors. These are components for which Huawei has developed its own technology, as geopolitical developments prevent it from relying on industry-standard suppliers such as Intel, AMD, and Nvidia. On the one hand, there are Arm-based Kunpeng processors, which power storage controllers and handle server tasks. On the other hand, there are Ascend NPUs for loading datasets and executing AI workloads.

Two layers remain below compute. The fourth is the AI Data Platform, which includes a knowledge base, a KV cache for memory optimization, and a memory system. The bottom layer is the data lake, designed to consolidate data from various organizational units.
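The five layers described above can be summarized as a simple ordered structure. The sketch below is purely illustrative: the `Layer` class, the `STACK` tuple, and the `layers_below` helper are our own names and not part of any Huawei product or API. The layer names and roles are taken from Yuan's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    position: int  # 1 = top of the stack, 5 = bottom
    name: str
    role: str


# Layer names and roles as described in the article; the structure is hypothetical.
STACK = (
    Layer(1, "Agent Framework", "develop and deploy agents, with secure sandboxes"),
    Layer(2, "Models", "refinement, regulatory alignment, parameter adjustment"),
    Layer(3, "Compute", "Kunpeng processors and Ascend NPUs"),
    Layer(4, "AI Data Platform", "knowledge base, KV cache, memory system"),
    Layer(5, "Data Lake", "consolidated data from across the organization"),
)


def layers_below(name: str) -> list[str]:
    """Return the names of the layers that the given layer sits on top of."""
    position = next(layer.position for layer in STACK if layer.name == name)
    return [layer.name for layer in STACK if layer.position > position]


print(layers_below("Compute"))
```

Reading the stack this way makes the dependency direction explicit: everything above the data lake ultimately relies on the data consolidated there.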

That five-layer approach sounds quite comprehensive and is already highly applicable in production environments. Consider a car manufacturer aiming for fully autonomous vehicles. To train for Level 5 autonomy, it requires more than 100 terabytes of sensor data per vehicle, spread across multiple data centers and globally available. Identifying rare scenarios, such as a dog running across the street at a red light in the rain, requires ultra-fast semantic searches across hundreds of terabytes of video footage per second. This is really only possible with a layered approach.
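To give a feel for what a semantic search does, here is a toy sketch that ranks video clips by cosine similarity between embedding vectors. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-made (real systems use learned embeddings with hundreds or thousands of dimensions), the clip names are invented, and nothing here reflects how Huawei's platform actually implements search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical hand-made embeddings: (animal on road, red light, rain).
clips = {
    "clip_001": [0.9, 0.8, 0.9],  # dog crossing at a red light in the rain
    "clip_002": [0.0, 0.9, 0.1],  # empty junction at a red light
    "clip_003": [0.8, 0.0, 0.0],  # dog on a dry road, no traffic light
}

# Query vector standing in for "dog running across the street at a red light in the rain".
query = [1.0, 1.0, 1.0]


def search(query, clips, top_k=1):
    """Return the ids of the top_k clips most similar to the query."""
    ranked = sorted(clips, key=lambda cid: cosine(query, clips[cid]), reverse=True)
    return ranked[:top_k]


print(search(query, clips))  # ['clip_001']
```

The hard part at production scale is not the similarity measure but running it across hundreds of terabytes quickly, which is where indexing and storage design come in.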

Yet another application involves AI-driven software development. Large coding projects require dozens or even hundreds of iteration rounds. Reloading all context to the GPU repeatedly is inefficient. A KV cache at the storage level provides each user with fast, dedicated access. On top of that, there is a memory system that stores progress, debug results, and experiences, so the AI agent improves with each round.
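The idea of reusing context instead of reloading it can be illustrated with a toy cache. This is a deliberately simplified sketch, not Huawei's implementation: the `PrefixCache` class and its methods are hypothetical, the "expensive prefill" is a placeholder for the real work of building attention key/value state on a GPU or NPU, and a real KV cache stores tensors rather than small lists.

```python
import hashlib


class PrefixCache:
    """Toy stand-in for a storage-level KV cache, keyed by a hash of the context."""

    def __init__(self):
        self._store = {}
        self.prefill_runs = 0  # counts how often the expensive step actually ran

    def _expensive_prefill(self, context: str) -> list[int]:
        # Placeholder for the accelerator work of computing key/value state.
        self.prefill_runs += 1
        return [len(word) for word in context.split()]

    def get_state(self, context: str) -> list[int]:
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._expensive_prefill(context)
        return self._store[key]


cache = PrefixCache()
project_context = "large codebase plus earlier debug results"

# Three iteration rounds over the same context trigger only one prefill.
for _ in range(3):
    cache.get_state(project_context)

print(cache.prefill_runs)  # 1
```

The saving grows with the number of iteration rounds: the context is processed once and then served from the cache on every later round.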

Specific products per layer

Huawei translates the five layers into a series of products. For the data lake, there is the OceanStor Pacific storage system, which scales from a few terabytes to 900,000 petabytes. According to Huawei, 100 petabytes fit into a 2U chassis, with a power consumption of 25 watts per terabyte. The platform supports semantic search on images, video, and text, including cross-modal queries such as image-to-text.

For the AI data platform, Huawei offers a 3+1 solution. This combines a knowledge base with over 95 percent retrieval accuracy, a petabyte-level KV cache that can reduce time-to-first-token by 45-90 percent, and a built-in memory system. Context Memory Storage (CMS) targets hyperscalers and supercomputers and provides a shared key-value cache. Yuan states that the CMS architecture connects heterogeneous computing systems.

For model engineering, there is a one-stop tool for fine-tuning and switching models, supporting over 30 popular models. The agent platform, called Nexus, operates without requiring users to have programming skills. In theory, doctors, finance professionals, and teachers could build agents using a graphical tool without knowing Python. Prompts and skills are continuously optimized through a self-evolving structure.

Security woven throughout the stack

Due to the geopolitical situation, Huawei regularly receives questions about security. In this regard, the company is making a significant commitment by obtaining the highest security certifications and standards. In this way, Huawei aims to assure European organizations that they can use its critical infrastructure products with confidence. To this end, security is woven throughout the entire infrastructure. Yuan identifies four types of threats. At the agent level, there is a risk that agents will perform unwanted actions, such as modifying files or system settings. At the model level, there is the danger of model poisoning: injecting malicious code to manipulate outcomes. At the platform level, the concern is data tampering, or the modification of mission-critical data. And at the data lake level, ransomware is a threat.

Against this backdrop, choosing a private AI stack over exclusively public cloud services can be a good option. Privacy protection, compliance with local regulations, and the ability to generate value independently are the main reasons for this. These factors can be decisive when building a new infrastructure, one that enables data awakening and delivers on the promise of AI in practice.