
Data is mushrooming. Data sets get bigger every day, and as we push data analytics and data models across increasingly vast and ever more distributed data estates, the reach and scope of the data we can work with has never been greater. Among the myriad methodologies being used to cope with this scale are methods designed to break analytics down into smaller, bite-sized modular chunks. So how does this approach work?

One company that positions itself in this space is Voltron Data, a specialist in accelerating modular and composable data analytics systems.

In an era when many of us may have thought we’d moved beyond the tagline of big data (surely almost all data is on the verge of massiveness if we aggregate all the work being carried out in typical enterprise applications, right?), Voltron is still comfortable talking about it as a core technology paradigm. Indeed, the company was named one of the ‘10 Hottest Big Data Startups of 2022’, if that’s the kind of list that floats your boat.

More charge than Spark?

New from Voltron right now is Theseus, a distributed execution engine built to shoulder data processing challenges at a scale that is said to be ‘beyond the capabilities’ of CPU-based analytics systems like Apache Spark. 

As many will know, open source Apache Spark is a multi-language engine for executing data engineering, data science and machine learning on single-node machines or clusters – so how does Voltron claim to have more electrical charge than Spark?
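To make that concrete, here is a minimal PySpark sketch of the kind of CPU-based data engineering job the article has in mind; the table, column names and data are purely illustrative, not anything Voltron Data has published.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark runs the same code on a single machine or a cluster; here it runs locally.
spark = SparkSession.builder.appName("etl-example").getOrCreate()

# A tiny illustrative dataset standing in for a much larger table.
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["user", "value"])

# A typical transformation step: group and aggregate, executed on CPUs.
df.groupBy("user").agg(F.sum("value").alias("total")).show()

spark.stop()
```

At small scale this is trivial; the argument below is about what happens when such jobs have to feed GPU-hungry AI pipelines at far larger volumes.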

According to Josh Patterson, co-founder and CEO of Voltron Data, organizations at the forefront of AI are constrained by the data processing tasks needed to make it happen in the first place, i.e. the ETL, feature engineering and data transformation work that underpins AI/ML.

This means, he says, that they cannot ramp up AI capabilities efficiently because they cannot afford to build out big data CPU clusters fast enough. The performance divergence between GPUs and CPUs is only growing; this problem is getting exponentially worse.

We don’t need no legacy CPUs

“AI systems are headed straight for The Wall – an inflexion point where CPU-based big data systems reach peak performance and can no longer keep up with GPU-powered AI platforms. We won’t be able to keep up with AI demand at scale until data processing fundamentally changes. Data processing engines must leverage accelerated compute, memory, networking and storage. We are thrilled to introduce Theseus to the world – an engine that is built to leverage the latest hardware innovations and helps companies get over The Wall,” said Patterson.

Voltron Data’s Theseus is a distributed query engine built to take advantage of full system hardware acceleration. The engine enables data system builders to unify data analytics and AI pipelines on GPUs while lowering energy consumption and carbon footprint. Theseus is composable because it plugs into enterprise data platforms through open, modular standards such as Arrow, Ibis, RAPIDS, Substrait and Velox. Users don’t need to move their data; they can swap out engines based on their needs and use their programming language of choice.
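To illustrate what that composability looks like in practice, here is a minimal sketch using the open source Ibis library in Python: the query is written once as an engine-agnostic expression and only bound to a specific backend at execution time. The Theseus-specific wiring is not shown here and is an assumption about where such an engine would slot in; the data is purely illustrative.

```python
import ibis

# A tiny in-memory table so the example is self-contained.
t = ibis.memtable({"user": ["a", "a", "b"], "value": [1, 2, 3]})

# An engine-agnostic expression: group and aggregate.
expr = t.group_by("user").aggregate(total=t.value.sum())

# Execution only becomes backend-specific here (DuckDB by default in Ibis);
# per the composability claim, a GPU engine would plug in at this layer
# without the query above having to change.
print(expr.execute())
```

The point is the separation of concerns: the expression, the interchange format and the execution engine are independent layers, which is what lets an engine be swapped without rewriting pipelines.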

Accelerated & embeddable

The technology here runs on accelerated silicon such as Nvidia GPUs, as well as x86 and ARM processors, with plans to support more semiconductors in the future. It is also embeddable: Voltron Data offers the distributed query engine under a revenue-share model rather than a pay-per-use model. Partners can build new (or enhance existing) data analytics products on top of Theseus, and Voltron Data only sees revenue when partners generate revenue.

“The end goal for organizations is to scale their use of AI,” said Mike Leone, principal analyst at Enterprise Strategy Group. “[They also want to] scale data-driven decision making, empower more stakeholders and enable more complex analytical and predictive models. [This is all] putting pressure on existing data systems. Data growth, analytical/application complexity and data movement will require organizations to rethink their data infrastructures when AI scale challenges and bottlenecks arrive. Voltron Data is solely focused on delivering a scalable data engine that unifies hardware, languages and frameworks to solve the eventual scale problem that organizations definitively will hit with existing data platforms.”

As Bradley Shimmin, chief analyst for AI platforms, analytics and data management at Omdia has said, the constant influx of new technologies, data types and data sources conspires to hold data processing back, turning what should be a crucial, timely asset into a costly performance bottleneck.

Spark has reached its limit

“In the era of AI, enterprises face a proliferation of data sources, abstraction of coding languages and strategic needs for every employee to be more data-driven. At the same time, Spark has reached its limits as an analytic processing system for the generation of big data,” said Hyoun Park, chief analyst of Amalgam Insights. “As the average enterprise now accesses over a thousand data sources, businesses must invest in their data processing capabilities to support the next order of magnitude for analytics and AI demands. Voltron Data has taken an important step forward with this maiden voyage of Theseus to solve all of these data issues for the era of AI.”

Theseus aims to answer the challenges presented here and is available to enterprises and government agencies as well as through partners – HPE is the first partner to embed Theseus as its accelerated data processing engine as part of HPE Ezmeral Unified Analytics Software.
