
Data moves. While some data will inevitably come to rest and reside in a storage repository with one or other level of accessibility (depending on mission-criticality, cost of storage or other factors), data mostly moves between application workloads, analytics engines and more. 

This constant movement tempts technology vendors into using the term ‘data-driven workloads’, when really, of course, almost all workloads are data-driven.

Regardless though, we will inevitably be talking about the transport of data in various forms and guises for all eternity, a truth that has driven the tech trade to develop a whole slew of orchestration offerings.

Firmly residing in this space is Alluxio, a company that describes itself as an open source data orchestration platform specialist focused on (yes, you guessed it) data-driven workloads. In this case the term does actually refer to workloads running in environments such as large-scale analytics and AI/ML.

Compute engines – storage systems

Currently celebrating its version 2.9 release with an open bar party and unlimited canapés for all, Alluxio highlights its technology’s position as the ‘key layer’ between compute engines and storage systems. In terms of functionality, this layer supports scale-out, multi-tenant architectures (i.e. essentially horizontal expansions of IT systems that involve adopting more resources) – and it does this with a new cross-environment synchronisation feature.

It also supports that scale-out function through improved manageability, with better tooling and guidelines for deploying Alluxio on Kubernetes. Plus, there is improved security and performance through a strengthened S3 API (Amazon Simple Storage Service) and POSIX API (Portable Operating System Interface).

Variety of regions, engines, storage

Users working with Alluxio have said that the technology works well for environments where an organisation’s infrastructure is spread across regions, compute engines and storage types. 

“We are running one thousand nodes of Alluxio to optimise model training jobs and interactive queries,” said Peng Chen, engineering manager in the big data team at Tencent, a Chinese multinational technology and entertainment conglomerate and holding company.

Adit Madan, director of product management at Alluxio, explains that ‘tenant-dedicated satellite clusters’ have become more common in data platform architectures. He points to Alluxio’s ability to actively synchronise metadata across multiple environments, which is said to make such an architecture easier to adopt.

What is tenant isolation?

Talk of tenant-dedicated clustering is obviously the kind of thing that cloud engineers fill their lunch breaks discussing, but what does it mean, and how would isolating tenants into dedicated zones help a cloud architecture work better, more efficiently or more cost-effectively?

Alluxio explains it quite succinctly and says that tenant isolation provides the scale and economic benefits of a multi-tenant architecture while preventing different teams from competing for access to shared data lake storage. 

With a new cross-environment synchronisation feature, Alluxio evolves its architecture with improved scalability and manageability, enabling data platform teams to deploy multiple per-tenant Alluxio clusters between compute and storage clusters in any environment, based on workload capacity.

According to Madan and team, running Alluxio on Kubernetes helps standardise deployment methodologies across cloud, multi-cloud, hybrid-cloud and on-premises environments. This new release introduces the Alluxio operator, which simplifies deploying, configuring, provisioning and managing multiple Alluxio clusters, reducing DevOps complexity. 

Alluxio on Kubernetes also makes the data stack portable to any environment, preventing vendor lock-in. Lastly, in Alluxio 2.9, authentication and access policies are now centralised at the point where compute engines communicate with Alluxio via the S3 API. As a result, Alluxio provides a unified security experience across heterogeneous storage, whether on-premises or in the cloud.

Cross-cloud reality

There’s an obvious theme here and it’s the rise of the heterogeneous cloud. Alluxio mentioned multi-cloud and hybrid-cloud, but could also have added poly-cloud, i.e. the separation of individual (typically large) application and data workloads across more than one instance with more than one Cloud Services Provider (CSP).

So the trend is clear, which is why Alluxio 2.9 introduces the company’s new cross-environment synchronisation feature. This makes one Alluxio cluster aware of another Alluxio cluster by automatically syncing the metadata between Alluxio clusters. 

Deploying Alluxio clusters across any environment can provide tenant-level isolation while keeping the clusters’ metadata in sync at scale. This feature is particularly useful when adopting a satellite architecture, in which compute clusters are segregated into team-level tenants for isolation.
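Alluxio doesn’t detail its synchronisation internals here. Purely to make the idea of clusters ‘aware’ of each other’s metadata more concrete, here is a toy Python sketch. The `ClusterMetadata` class, the per-entry version numbers and the `sync_all` helper are all hypothetical, and the newest-version-wins rule is an assumption chosen for simplicity, not Alluxio’s actual mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterMetadata:
    """Hypothetical per-tenant cluster holding file metadata keyed by path.

    Each entry carries a version number (an assumption of this toy model)
    so that clusters can tell which copy of an entry is newer.
    """
    name: str
    entries: dict[str, tuple[int, dict]] = field(default_factory=dict)

    def update(self, path: str, meta: dict, version: int) -> None:
        # Record a local metadata change together with its version stamp.
        self.entries[path] = (version, meta)

    def sync_from(self, peer: "ClusterMetadata") -> int:
        """Pull newer entries from a peer; return how many were copied."""
        copied = 0
        for path, (version, meta) in peer.entries.items():
            local = self.entries.get(path)
            # Adopt the peer's entry if we lack it or our copy is older.
            if local is None or local[0] < version:
                self.entries[path] = (version, dict(meta))
                copied += 1
        return copied


def sync_all(clusters: list[ClusterMetadata]) -> None:
    # Every cluster pulls from every other cluster. The first cluster ends up
    # holding the newest version of every entry, and each later cluster pulls
    # from it, so one pass leaves all clusters with identical metadata.
    for a in clusters:
        for b in clusters:
            if a is not b:
                a.sync_from(b)


if __name__ == "__main__":
    tenant_a = ClusterMetadata("tenant-a")
    tenant_b = ClusterMetadata("tenant-b")
    tenant_a.update("/lake/sales/2022.parquet", {"size_bytes": 1024}, version=1)
    sync_all([tenant_a, tenant_b])
    print(tenant_b.entries["/lake/sales/2022.parquet"])  # (1, {'size_bytes': 1024})
```

Each tenant keeps its own cluster, and therefore its isolation, while a change made in one cluster becomes visible in the others after a sync. A real system would also have to handle deletions, concurrent edits and failures, which this sketch ignores.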

The cloud is growing up and growing upwards, but perhaps now, most of all, the cloud is growing outwards and straddling a wider horizontal footprint. That makes themes like tenant isolation more than just Big Bang Theory (the TV show, not the start of the universe) style discussions.

Next time you hear a technologist talk about scalability, ask them: do you mean up, down, left or right?