Open source object storage company MinIO has popped open DataPod, a reference architecture for building data infrastructures that will support exascale AI and large-scale data lake workloads. Object storage is well suited to AI because it handles typically unstructured data (often sitting in expansive data lakes) as discrete units, each stored alongside its metadata and a unique identifier.
A reference architecture acts as a common language for software engineers, setting out recommended structures, integrations and delivery methods for building complex projects, and reference architectures are now used prolifically in new data services built with generative AI. As intelligence services mushroom and snowball, exascale AI is – as it sounds – a measure of ‘supercomputer’ performance, describing software systems capable of processing at least one exaflop (a quintillion floating point operations per second) for AI modeling and analytics.
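To make that "discrete unit plus metadata plus identifier" idea concrete, here is a minimal sketch using MinIO's Python SDK (the `minio` package). The endpoint, credentials, bucket and object names are illustrative placeholders, not part of the DataPod reference architecture itself.

```python
import io
from minio import Minio

# Connect to a MinIO deployment (endpoint and credentials are placeholders).
client = Minio(
    "minio.example.internal:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=False,
)

bucket = "training-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Each object is stored as a discrete unit: the payload, user-defined
# metadata and a unique key (here "datasets/reviews-000.jsonl").
payload = b'{"text": "example record"}\n'
client.put_object(
    bucket,
    "datasets/reviews-000.jsonl",
    io.BytesIO(payload),
    length=len(payload),
    content_type="application/json",
    metadata={"pipeline-stage": "raw", "source": "web-crawl"},
)

# The identifier (bucket + key) is all a downstream AI pipeline needs
# to retrieve the object and its user metadata.
info = client.stat_object(bucket, "datasets/reviews-000.jsonl")
print(info.metadata)
```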
That’s a lot of storage for a lot of compute, which is why MinIO is attempting to lay claim to being the de facto object storage technology for AI.
Exabyte expectations
As deep learning workloads within AI scale towards exabyte levels, the complexity and cost of AI data infrastructure increase. Why? Because AI data infrastructure needs to support high concurrency (many computations running against the same data at the same time) and handle varied Input/Output (I/O) workloads across the different phases of an AI pipeline. It also needs to deliver extremely high throughput while keeping latency low; a sketch of this kind of concurrent, read-heavy access pattern follows below. Perhaps the biggest challenge of all is that, at exascale, the economics of keeping that volume of data in the public cloud, with its data access and egress charges, simply do not add up.
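As a rough illustration of the concurrency point, the sketch below mimics a training job's data loader issuing many parallel reads against the object store. The endpoint, credentials, bucket and shard naming are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from minio import Minio

# Illustrative only: many simultaneous reads against the object store,
# the kind of high-concurrency, read-heavy I/O an AI data
# infrastructure has to sustain during training.
client = Minio(
    "minio.example.internal:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=False,
)

def fetch(key: str) -> bytes:
    # Download one object and return its bytes, releasing the connection.
    response = client.get_object("training-data", key)
    try:
        return response.read()
    finally:
        response.close()
        response.release_conn()

# Hypothetical shard names; a real pipeline would list them from the bucket.
keys = [f"datasets/shard-{i:05d}.jsonl" for i in range(64)]
with ThreadPoolExecutor(max_workers=16) as pool:
    shards = list(pool.map(fetch, keys))
print(f"fetched {len(shards)} shards")
```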
“Looking back, 2023 was the year of experimentation with generative AI, but in 2024, companies will look to move these workflows into production, leaning heavily on the foundational data infrastructure behind them,” said Anand Babu ‘AB’ Periasamy, co-founder and co-CEO at MinIO. “We’re seeing customers increase their storage footprints by 4x to 10x to support AI initiatives, while repatriating workloads back to the private cloud because the financial mathematics dictates it – everything you do in the public cloud, you can do on the private cloud, but at a savings of 60%-70%. MinIO DataPod provides the roadmap for building a data infrastructure that seamlessly scales with AI deployments, while keeping costs in check.”
It seems clear that high-end data management and analytics are two must-haves for deriving the most value from corporate data, and enterprises are constantly collecting and storing data for AI applications. MinIO is built to power analytics at scale, allowing organizations to expand their storage capacity on demand.
Microblink and you’ll miss it
“MinIO is essential to Microblink,” said Filip Suste, platform team engineering manager at Microblink, an AI-powered document scanning and verification company. “Our global clients rely on us for the highest level of data security and MinIO allows us to provide that while maintaining complete control over our infrastructure.”
With the release of MinIO’s Enterprise Object Store earlier this year, the product set is now tailored for large-scale AI and machine learning, data lake and database workloads. The platform itself is software-defined and can run on any cloud or on-premises infrastructure. With AI workloads demanding a combination of hardware and software-defined storage, the new reference architecture is intended to let infrastructure administrators pair commodity, off-the-shelf hardware with MinIO Enterprise Object Store.
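As a rough sketch of what "software-defined, runs anywhere" means in practice, the same S3-compatible client code can target an on-premises cluster built on commodity hardware or a cloud-hosted deployment; only the endpoint configuration changes. The endpoint, bucket and environment variable names below are assumptions for illustration, not part of MinIO's documented setup.

```python
import os
from minio import Minio

# The physical location of the cluster is a configuration detail:
# point the client at whichever endpoint is deployed, on-premises
# or in a cloud. Names below are hypothetical.
endpoint = os.environ.get("OBJECT_STORE_ENDPOINT", "datapod.corp.internal:9000")

client = Minio(
    endpoint,
    access_key=os.environ["OBJECT_STORE_ACCESS_KEY"],
    secret_key=os.environ["OBJECT_STORE_SECRET_KEY"],
    secure=os.environ.get("OBJECT_STORE_TLS", "true") == "true",
)

# List the objects feeding a training job, regardless of where the
# cluster physically runs.
for obj in client.list_objects("training-data", prefix="datasets/", recursive=True):
    print(obj.object_name, obj.size)
```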