The AI wave is forcing organizations to rethink their infrastructure

The rapid rise of AI is creating a new reality. Whereas infrastructure has traditionally provided a stable foundation for predictable workloads, AI applications demand scalability, flexibility, and close integration with data and applications. As a result, organizations can no longer stick to familiar models and tooling. The AI wave is forcing them to take a critical look at how their infrastructure is currently set up and what it needs to look like in the future. We delve into this AI infrastructure during a roundtable discussion with experts from AWS, NetApp, Nutanix, Pure Storage, Red Hat, and SUSE.

In the previous article we published following the roundtable, it became clear that AI only works if the infrastructure is right. And that means the entire breadth of the infrastructure: from compute and storage to governance and security. Existing IT foundations need to be adapted or even rebuilt to enable AI and scale it up sustainably and responsibly. Technical and organizational challenges often overlap.

How are organizations dealing with this need for change? What bottlenecks are they encountering in practice? What does the growth of AI mean for their choices regarding cloud, on-premise, and hybrid models? And how are they ensuring that infrastructure does not become a brake on innovation, but rather a catalyst for AI-driven growth?

Between on-premise and cloud

Marco Bal (Pure Storage)

For many organizations, the cloud discussion around AI still revolves around the choice between on-premise and public cloud. But according to Marco Bal of Pure Storage, that is too simplistic a dichotomy. “There’s actually something in between,” he says. “In addition to hyperscalers such as AWS, Azure, and Google, more and more AI factories and specialized service providers are emerging that offer AI infrastructure as a service, without the need to move data to the large public clouds.” For companies that avoid the public cloud for compliance or data management reasons but still need flexibility, these alternatives are valuable.

Bal sees the AI wave further deepening the multicloud strategy. Not only are workloads being spread across multiple platforms, but new forms are also emerging between fully managed SaaS services and fully proprietary infrastructure. Ultimately, the form that fits best depends on the use case. “It’s all about flexibility: can you quickly scale your infrastructure and adapt it to changing business requirements?” Those who manage to build in this flexibility will turn infrastructure into a driver rather than a constraint.

Efficiency requires a new way of thinking

From left to right: Ricardo van Velzen (Nutanix) and Felipe Chies (AWS)

During the roundtable discussion, Ricardo van Velzen of Nutanix shares a provocative thought about the fundamental way organizations deal with data and computing power. “We talk constantly about moving data, but why don’t we ever turn that discussion around? Why do we move data, rather than the CPU?” Van Velzen asks. His point is that moving CPU capacity to where the data is located can be more efficient than the other way around.

This way of thinking invites a rethinking of existing infrastructure models. Instead of moving large amounts of data between systems, which is often slow, expensive, and impractical, it may be smarter to bring computing power closer to the data. This aligns well with the growing demand for flexibility and scalability that AI applications require. Such an approach can help reduce latency and network load, while also enabling faster and more efficient innovation.

By deploying CPU and computing power flexibly, companies can build infrastructures that are better aligned with the dynamics of AI projects. However, this requires a paradigm shift within IT teams, where traditional concepts are abandoned and people are willing to embrace new ways of working.
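
To make this compute-to-data pattern concrete, here is a minimal, purely illustrative sketch in Python. All helpers and data are stand-ins invented for the example, not part of any specific product: pattern A pulls the full dataset across the network before aggregating it, while pattern B ships only a small function to wherever the data lives and gets back just the result.

```python
# Illustrative only: hypothetical helpers, no real platform API.
# Pattern A pulls the whole dataset to the compute side before aggregating;
# pattern B ships a small function to where the data lives and returns
# just the (much smaller) result.
from typing import Callable

REMOTE_DATA = b"x" * 10_000_000  # stand-in for a large dataset "over there"

def pull_data_to_compute(fetch: Callable[[], bytes],
                         aggregate: Callable[[bytes], dict]) -> dict:
    raw = fetch()               # the full dataset crosses the network
    return aggregate(raw)       # compute happens far away from the data

def push_compute_to_data(run_near_data: Callable[[Callable[[bytes], dict]], dict]) -> dict:
    # run_near_data() stands in for whatever remote-execution mechanism the
    # platform offers (a job scheduler, a function co-located with storage,
    # a pushdown-capable query engine, ...).
    return run_near_data(lambda raw: {"size_bytes": len(raw)})  # only the result travels

print(pull_data_to_compute(lambda: REMOTE_DATA, lambda raw: {"size_bytes": len(raw)}))
print(push_compute_to_data(lambda fn: fn(REMOTE_DATA)))
```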

AI requires a redesign of architectures

From left to right: Eric Lajoie (SUSE) and Pascal de Wild (NetApp)

At the table, Eric Lajoie from SUSE shares a concrete practical example of how AI infrastructure often turns out to be more complex in reality than in a vendor blueprint. He refers to a project in which an organization built a GPU-as-a-service environment based on the NVIDIA AI Enterprise architecture. This involved a so-called inner and outer storage ring, in which NVMe storage in an isolated network was separated from a broader data lake. What initially appeared to be best practice turned out to have a significant shortcoming: external parties were unable to import their data easily, because there was no straightforward way to get data from a public SaaS environment or from other tenants into that storage infrastructure.

According to Lajoie, this is an important lesson for organizations building AI infrastructure. Don’t blindly follow a blueprint; instead, consider the entire data flow, including information from outside sources. Not only does network architecture play a role here, but so do the choice of storage protocols, QoS separation between different layers, and compatibility with Kubernetes-based stacks. Many organizations only discover during implementation that some storage providers require kernel modules, while modern, immutable operating systems often ship with read-only root filesystems. This can have a profound impact on performance, especially when models are served from S3 storage. “Sometimes customers have to fall back on NFS for high-performance storage, simply because S3 doesn’t perform well with their OS,” explains Lajoie.
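
The fallback Lajoie describes can be sketched in a few lines. This is a minimal illustration under assumed names: the bucket, object key, mount path, and the time budget are hypothetical, and boto3 is used simply as a common way to reach S3-compatible endpoints.

```python
# Sketch: load model weights from S3-compatible object storage and fall back
# to an NFS mount when the object store is unavailable or too slow.
# Bucket, key, paths, and the time budget are hypothetical.
import os
import time

import boto3
from botocore.exceptions import BotoCoreError, ClientError

S3_BUCKET = "ai-models"                               # hypothetical bucket
S3_KEY = "llm/weights-v1.safetensors"                 # hypothetical object key
NFS_PATH = "/mnt/nfs/models/weights-v1.safetensors"   # hypothetical NFS mount
LOCAL_PATH = "/var/tmp/weights-v1.safetensors"
TIME_BUDGET_S = 120                                   # acceptable load time here

def load_weights() -> str:
    try:
        start = time.monotonic()
        boto3.client("s3").download_file(S3_BUCKET, S3_KEY, LOCAL_PATH)
        if time.monotonic() - start <= TIME_BUDGET_S:
            return LOCAL_PATH
        print("S3 download exceeded the time budget, preferring the NFS copy")
    except (BotoCoreError, ClientError) as exc:
        print(f"S3 unavailable ({exc}), falling back to NFS")
    if os.path.exists(NFS_PATH):
        return NFS_PATH
    raise RuntimeError("no usable copy of the model weights found")

print("loading weights from:", load_weights())
```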

It is precisely these kinds of infrastructure details that are crucial for successful AI implementations. According to Lajoie, organizations need to learn to think in terms of data flows, regardless of whether they are working in a public cloud or on-premises environment. Air-gapped infrastructures also require secure and efficient methods for bringing data in, he emphasizes. AI therefore also requires a well-thought-out storage ecosystem in which scalability, compatibility, and connectivity are taken into account from day one.

Visibility into data movements

Ruud Zwakenberg (Red Hat)

Pascal de Wild of NetApp also sees that many AI challenges can ultimately be traced back to how organizations handle their data. Whether it’s training, inferencing, or gathering insights, data and computing power must be close together. “The data paths and data movements are crucial,” says De Wild. “You don’t want to just copy all your on-prem data to AWS. It’s about having exactly the right data available at the right time, in the right place.” This requires clear agreements between different infrastructure partners, as well as a precise data policy within the organization itself.

According to De Wild, classifying datasets is a necessary first step in this process. He shares the example of an HR dataset. Such a dataset can only be stored in the cloud if you know exactly what data it contains and what can be done with it. For successful AI implementations, organizations must determine in advance which data is suitable for which infrastructure layer and which data must remain within national borders or on-premises for compliance or privacy reasons. Without these insights, data may be moved unnecessarily, resulting in high costs and added risk.
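
De Wild’s point about classification as a first step can be illustrated with a short sketch. The sensitivity flags and placement rules below are hypothetical examples, not NetApp tooling; the idea is simply that placement follows from classification rather than the other way around.

```python
# Sketch: classify datasets first, then derive where they may be placed.
# The flags and placement rules are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    contains_pii: bool          # e.g. an HR dataset with employee records
    must_stay_in_country: bool  # legal or contractual data-residency demand

def allowed_targets(ds: Dataset) -> list[str]:
    if ds.contains_pii and ds.must_stay_in_country:
        return ["on_prem"]                        # keep within national borders
    if ds.contains_pii:
        return ["on_prem", "eu_region_cloud"]     # cloud only in approved regions
    return ["on_prem", "eu_region_cloud", "public_cloud"]

hr = Dataset("hr-records", contains_pii=True, must_stay_in_country=True)
telemetry = Dataset("sensor-telemetry", contains_pii=False, must_stay_in_country=False)

for ds in (hr, telemetry):
    print(ds.name, "->", allowed_targets(ds))
```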

Generative AI requires different infrastructure choices

Felipe Chies of AWS notes that generative AI differs fundamentally from traditional AI workloads, such as computer vision or statistical analysis. “With computer vision, you work with numbers and metrics, and you can determine exactly whether something is correct. But with GenAI, you work with language, human preferences, and semantics. Two answers can mean the same thing but be phrased completely differently,” says Chies. This makes assessing output and model quality a lot more complex, especially if companies want to maintain their own tone or communication style.

To manage this effectively, Chies believes that advanced evaluation tools and model comparison mechanisms are needed. He cites the evaluation options offered by AWS within Bedrock as an example. These allow you to test whether a model is hallucinating by ‘grounding’ answers in facts, and they let customers automatically compare multiple answers, test models, and quickly switch to new versions as soon as they become available. This enables rapid evolution.
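
As a simple way to get a feel for this kind of side-by-side comparison, the sketch below sends the same prompt to two Bedrock-hosted models via the Converse API and prints both answers. It is not the built-in Bedrock evaluation feature Chies refers to; the model IDs, region, and prompt are placeholders.

```python
# Sketch: ask two Bedrock-hosted models the same question and print both
# answers side by side. Model IDs, region, and prompt are placeholders; this
# is a simple illustration, not Bedrock's built-in evaluation feature.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-west-1")

MODEL_IDS = [
    "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    "amazon.titan-text-express-v1",            # example model ID
]

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Summarize our refund policy in two sentences."
for model_id in MODEL_IDS:
    print(f"--- {model_id} ---")
    print(ask(model_id, prompt))
```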

AI agents are colleagues, so treat them as such

An interesting trend that many companies are keeping an eye on when setting up their AI infrastructure is agentic AI. According to Ruud Zwakenberg of Red Hat, there is a parallel between these AI agents and human employees in terms of how they function within the infrastructure. “An AI agent is basically a new colleague. It’s a knowledge worker, but artificial,” he says. Just as with human employees who have customer contact, such as a bank employee, the organization is responsible for what is said on its behalf. Such an AI agent must therefore be trained; you don’t want it responding to customers unchecked. And everything it says on your behalf can be used against you.

According to Zwakenberg, this means that AI systems must be well embedded not only technically, but also organizationally. AI agents that communicate with customers must be monitored just like people, to identify errors or inappropriate responses. You have to treat such an agent as an employee, Zwakenberg emphasizes. That means you must train, supervise, and correct the agent as necessary. You must ensure that it operates within the framework of your brand, policy, and compliance rules.

This requires clear governance around AI. Who is responsible, how do you guarantee quality, and how do you intervene when things go wrong? Zwakenberg advocates an approach in which AI is not seen as an external black box, but as an integral part of the organization. Ultimately, it’s about being able to trust what the AI does on your behalf. In Zwakenberg’s view, you can only achieve this if you organize it as seriously as your other customer channels.
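
What this could look like at the application level is sketched below: every draft answer from an agent is logged for review and checked against policy rules before it reaches a customer. The rules, the blocked-phrase list, and the stubbed agent are hypothetical stand-ins for real guardrail and monitoring tooling.

```python
# Sketch: log and policy-check every draft answer from an agent before it
# reaches a customer. The rules and the stubbed agent are hypothetical
# stand-ins for real guardrail, monitoring, and review tooling.
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent-audit")

FORBIDDEN_PHRASES = ["guaranteed return", "legal advice"]  # example policy rules

def supervised_reply(agent: Callable[[str], str], customer_message: str) -> str:
    draft = agent(customer_message)
    audit_log.info("draft answer for review: %r", draft)    # keep a trail, like any channel
    if any(phrase in draft.lower() for phrase in FORBIDDEN_PHRASES):
        audit_log.warning("draft blocked by policy, escalating to a human")
        return "A colleague will get back to you shortly."  # human takes over
    return draft

# Usage with a stubbed agent; in practice this would call the actual model.
print(supervised_reply(lambda msg: "You are guaranteed return of 20%.", "Should I invest?"))
```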

From technology to trust

The common thread running through all the contributions at the table is clear. AI infrastructure is not simply a matter of more computing power or storage capacity. It requires well-considered choices tailored to the nature of the data, the dynamics of workloads, and the organization’s requirements. Moreover, the impact of AI extends far beyond the IT department. It touches governance, policy, and even how companies define their identity and manage customer relationships.

Anyone who wants to take AI seriously cannot afford to view technology in isolation from its context. The key to sustainable AI innovation lies in the combination of smart infrastructure, robust data foundations, and clear frameworks for responsibility.

This was the second story in our AI infrastructure series. In the next article, we will discuss how AI requires mature choices from companies.