Confluent: How businesses use data streaming

As a real-time data streaming platform specialist, Confluent is known for its enterprise-grade platform based on open source Apache Kafka. With Jay Kreps – co-creator of Apache Kafka itself – at the helm, it wants to “eradicate the burdensome aspects” of running and maintaining this kind of open source data streaming technology at enterprise level in mission-critical environments. How is that going? We found out during Confluent’s annual user conference in Austin, Texas.

Confluent is working to extend its platform with additional developer tools. Also, it wants to provide a means for Managed Service Providers (MSPs), Original Equipment Manufacturers (OEMs) and Independent Software Vendors (ISVs) to embed Confluent Cloud into their applications in a performant and monetisable way. More on that later, though. Before we dive into Confluent, let’s remind ourselves why an organisation would need to think about using real-time data and streaming technologies in the first place.

Why use data streaming?

Streaming itself (we drop the real-time prefix for readability) can be defined as the continuous flow of data as it is generated by a variety of sources across people, machines and components. The core work of data streaming lies in the processes that ensure those streams can be processed, connected, stored, analysed, acted upon and governed throughout their lifecycle.

The discipline of streaming encompasses both bounded and unbounded data streams. Bounded data streams have a defined beginning and end because they cover a fixed span of data, such as a day’s transactions. Unbounded streams, on the other hand, have a start but no defined end; not because they have any less purpose, but because they never terminate and deliver data continuously as it is generated. Data streams manifest in a variety of volumes and formats and can appear at the application level as well as lower down in the ‘information substrate’ across networking devices, server log files and website activity or transactions.
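To make the unbounded case concrete, here is a minimal sketch of consuming a never-ending stream with the open source confluent-kafka Python client; the broker address, consumer group and ‘events’ topic are illustrative assumptions rather than anything specific to Confluent Cloud.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "demo-reader",              # illustrative consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])  # hypothetical topic name

try:
    # An unbounded stream has no defined end, so the consumer loops
    # indefinitely, handling records as they are produced.
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # nothing arrived within the timeout; keep waiting
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"{msg.topic()}[{msg.partition()}] {msg.value().decode('utf-8')}")
finally:
    consumer.close()
```

A bounded stream, by contrast, would be read with the same mechanics but would reach a known final record, at which point the loop could exit.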

Where does data streaming happen?

We find real-world examples of data streaming in real-time stock trades, retail inventory management systems, social media feeds, multiplayer games and ride-sharing apps… to name just a few. Further use cases appear in remote health monitoring in healthcare, financial markets analysis in banking and personalised shopping experience platforms in retail. Other areas include SIEM (Security Information and Event Management), where logs and real-time event data are analysed for monitoring, metrics and threat detection, and machine learning and AI, where combining past and present data into one central nervous system opens up possibilities for predictive analytics.

Confluent provides further colour here and says that, “For example, when a passenger calls [Uber], real-time streams of data join together to create a seamless user experience. Through this data, the application pieces together real-time location tracking, traffic stats, pricing and real-time traffic data to simultaneously match the rider with the best possible driver, calculate pricing, and estimate time to destination based on both real-time and historical data. In this sense, streaming data is the first step for any data-driven organization, fueling big data ingestion, integration and real-time analytics.”

The ‘opposite’ of real-time streaming is generally agreed to be batch processing i.e. computer storage and analytics systems that require data to be downloaded in batches before it can be worked on. Traditionally this was an overnight process, far from the real-time reality you know from Uber or other ride-hailing apps. For many use cases (though not all), legacy batch data processing is becoming insufficient for modern businesses that need to respond to information in milliseconds, or at least seconds.

CEO Kreps on keynote vision

“Data today – especially for AI applications – has to be the right data in the right place at the right time,” said Kreps, speaking in Austin, Texas this month at Confluent Current, the company’s enterprise-grade Apache Kafka managed services convention. “Imagine we’re working in a consumer bank, we need to connect an intelligent model that has some reasoning abilities to all the data that’s flying around. Some customer questions might be quite simple (like what time does the bank open?), but many questions will be quite specific (queries made that relate to a user’s account), so the training process for compute (and AI) models that serve big batch operations capable of the more complex queries here would traditionally take weeks or months.”

In answer to this need, Kreps says we need to combine stored data with runtime data so we can gather all the information we need about different systems and put it in a place where we can query our data repositories through an indexing pipeline that understands the relevance of contextual information. But how do we connect to all these systems? He says there’s a great use case for data streaming here – Apache Flink is a case in point – where we understand what AI needs at any point in time.

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and perform computations at in-memory speed and at any scale.
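As a flavour of what a stateful computation looks like in practice, here is a minimal sketch using the PyFlink Table API: a running count of events per user. The in-memory rows stand in for a real unbounded source such as a Kafka topic, and all names are illustrative.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

# Streaming mode: input is treated as unbounded and aggregates are
# maintained as state, updated continuously as rows arrive.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# In-memory rows standing in for a real source such as a Kafka topic.
events = t_env.from_elements(
    [("alice", "login"), ("bob", "click"), ("alice", "click")],
    ["user", "action"],
)

# Stateful aggregation: Flink keeps a running count per user.
counts = events.group_by(col("user")).select(
    col("user"), col("action").count.alias("events")
)

counts.execute().print()
```

Against a genuinely unbounded source, the per-user counts would keep updating for as long as the job runs, which is exactly the always-on behaviour Kreps describes.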

“We’re moving from a world where software systems are orientated around people, where we have been used to humans logging in to systems and performing actions… and we’re going to a world where we’re running continuous applications that require always-on data streaming services in order for us to get the most out of them,” said Kreps. “We know that data lakehouses offer a useful means of information management, but they suffer from after-the-fact governance… so if data has moved upstream without being managed at the base layer, the engineers upstream working on data clean-up will not always know about all the data dependencies that exist at base level, so things get broken. As overnight batch processing becomes more archaic, we see those ‘solutions’ as something that is really not in sync with the business. The wider challenge here is that there are many ‘data consumers’ today and it’s not just users, but all the machine-based operational systems that a modern digital business will now use.”

If these upstream elements break, we’re talking about real pain and real revenue risk. So, says Kreps, we need to shift left with our governance so that the information entering the stream is clean and usable for business benefit, and think about the concept of data products, i.e. presenting ‘manufactured and productised’ data and data streaming itself in a way that other teams and individuals inside the business can use… and so that the data itself can flow around the business in real time.
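One common way to shift governance left is to validate records against a registered schema at produce time, so that malformed data never enters the stream. Below is a sketch using the confluent-kafka client’s Schema Registry support; the registry URL, broker, ‘orders’ topic and Order schema are all illustrative assumptions.

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# An illustrative Avro schema for an 'orders' data product.
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # assumed registry
serializer = AvroSerializer(registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

order = {"order_id": "A-1001", "amount": 42.50}
# Serialisation fails here, before the record ever reaches the topic,
# if the payload does not match the registered schema.
payload = serializer(order, SerializationContext("orders", MessageField.VALUE))
producer.produce("orders", value=payload)
producer.flush()
```

The design point is that the check happens at the producer, not in a downstream clean-up job: bad records are rejected before any data dependency can be built on them.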

The question now, of course, is how do we get there? The tools and patterns for stream processing are much more widely understood (and stress-tested where they have been used at scale, at an enterprise-grade level) and Kreps says that the rise of AI is helping to further power the adoption of data streaming.

What did Confluent announce?

Looking across Confluent’s data streaming announcements this year, the organisation’s new support for the Flink Table API makes Apache Flink available to Java and Python developers. Table API is a unified, relational API for stream and batch processing: the same query can be run on batch or streaming input without modification. Elsewhere, Confluent’s private networking for Flink provides enterprise-level protection for use cases with sensitive data; Confluent Extension for Visual Studio Code accelerates the development of real-time use cases; and Client-Side Field Level Encryption encrypts sensitive data for stronger security and privacy.
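To illustrate the ‘without modification’ point, the sketch below runs the same (made-up) Table API aggregation twice, changing only the environment settings between bounded (batch) and unbounded (streaming) execution.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

def trade_totals(settings):
    """Run the same Table API query under the given environment settings."""
    t_env = TableEnvironment.create(settings)
    trades = t_env.from_elements(
        [("ACME", 120.0), ("GLOBEX", 75.5), ("ACME", 80.0)],
        ["symbol", "amount"],
    )
    # The query itself is identical in both modes: total value per symbol.
    return trades.group_by(col("symbol")).select(
        col("symbol"), col("amount").sum.alias("total")
    )

# Bounded input: computed once over the whole (finite) dataset.
trade_totals(EnvironmentSettings.in_batch_mode()).execute().print()

# Unbounded input: the same query, with results updated as rows arrive.
trade_totals(EnvironmentSettings.in_streaming_mode()).execute().print()
```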

“The true strength of using Apache Flink for stream processing is that it empowers developers to create applications that instantly analyse and respond to real-time data, significantly enhancing responsiveness and user experience,” said Stewart Bond, research vice president at IDC. “Managed Apache Flink solutions can eliminate the complexities of infrastructure management while saving time and resources. Businesses must look for a Flink solution that seamlessly integrates with the tools, programming languages and data formats they’re already using for easy implementation into business workflows.”

Confluent also used this year’s show to launch its Confluent OEM Program as a service for original equipment manufacturers and other partners. The new programme for Managed Service Providers (MSPs), Cloud Service Providers (CSPs) and Independent Software Vendors (ISVs) is designed to make it easy to launch and enhance customer offerings with a complete data streaming platform for Apache Kafka and Apache Flink. With a licence to globally redistribute or embed Confluent’s platform, partners are promised the ability to bring real-time products and Kafka offerings to market and monetise customer demand for data streaming with limited risk. The programme offers implementation guidance and certification to help partners launch enterprise-ready offerings, plus ongoing technical support to ensure long-term customer success.

It’s not when, it’s how

“As data-driven technologies like generative AI become essential to enterprise operations, the conversation has shifted from ‘if’ or ‘when’ a business will need data streaming to ‘what’s the fastest, most cost-effective way to get started?’” said Kamal Brar, senior vice president, worldwide ISV and APAC, Confluent. “We help our partners unlock new revenue streams by meeting the growing demand for real-time data within every region they serve. Confluent offers the fastest route to delivering enterprise-grade data streaming, enabling partners to accelerate service delivery, reduce support costs, and minimize overall complexity and risk.”

The need for real-time data has cemented data streaming as a critical business requirement. According to ISG Software Research, by 2026, more than three-quarters of enterprises’ standard information architectures will include streaming data and event processing.

“To meet this need, teams often turn to popular open source technologies like Kafka and Flink,” notes Kreps and team. “However, building and maintaining open source software, especially at scale, quickly becomes prohibitively expensive and time-consuming. On average, self-managing Kafka takes businesses more than two years to reach production scale, with ongoing platform development and operational costs exceeding millions of dollars per year. Over time, solutions built with open source Kafka and Flink consume more and more engineering resources, which impacts a business’s ability to focus on differentiation and maintain a competitive advantage.”

More streaming, more governance, more correctness

Looking ahead, Kreps held a closed press and analyst discussion after his keynote session and opened up completely (as is his affable nature) about what he and the engineering team had either missed and then added, been surprised by, or worked harder than initially expected to get right across the Confluent Cloud services and the organisation’s wider set of platform tools.

“All those data warehouses and data lakehouses aren’t going away,” said Kreps. “But as they continue to provide their various strains of data store, Confluent seeks to be the flow mechanism that enables streaming to exist effectively and securely across them, so that we can enable the real-time interactions that need to happen across an enterprise. Looking onward, governance (and streaming governance in general) will be a key part of that and – if I am honest – I was almost surprised by the amount of governance we were working to apply… but, essentially, this has been immensely popular with the customers, so we are following what industry needs in that sense. As we work across the four central pillars of streaming, processing, connecting and governing, we will continue to ensure that ‘data correctness’ is maintained throughout.”

The stream dream is perhaps no longer a daydream and, if we buy the messages here from Confluent, then it’s certainly no longer a burdensome nightmare either. As the sun sets on Austin, Confluent Current continues next year with convention events in Europe and APAC, plus a US parent show.