Information is accelerating. From the top-tier information services that we use on our desktop and mobile applications every day, through the middle layers that form the interconnecting fabric of the web and the cloud… and downwards to the backend base substrate operations systems that underpin everything, information is accelerating at every level.
With so much of our total information base now fuelled by log file data generated by ‘events’ (in the computer science sense, not the party get-together sense) at the compute edge across the Internet of Things, in some zones information is accelerating exponentially.
Principle, paradigm & process
These realities bring us to the point where data is moving fast enough that we talk about data streaming. Sometimes also written as streaming data, this is a computing principle, paradigm and process that describes the time-ordered movement of (typically real-time) data through an IT system and its associated devices, connections and endpoints.
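For readers who like to see the idea in code, the following is a minimal sketch of time-ordered processing, not anything taken from Kafka or Confluent. The `Event` type, the sensor names and the readings are all hypothetical; the point is only that each event is handled as it arrives, in timestamp order, rather than waiting for a complete batch.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical event record, invented for illustration."""
    timestamp: float  # seconds since an arbitrary start
    device: str
    reading: float


def running_average(events: Iterable[Event]) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average so far) after each event arrives."""
    total = 0.0
    count = 0
    for event in events:
        total += event.reading
        count += 1
        yield event.timestamp, total / count


# A tiny in-memory stand-in for a stream, already in time order.
stream = [
    Event(1.0, "sensor-a", 20.0),
    Event(2.0, "sensor-b", 22.0),
    Event(3.0, "sensor-a", 24.0),
]
averages = list(running_average(stream))
```

In a real deployment the list would be replaced by an unbounded source such as a Kafka topic, but the shape of the loop stays the same.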
Developing applications with data streaming capabilities is one thing. Securing, maintaining and managing data streaming channels is another. Confluent, Inc. does both, with enterprise-level services built upon Apache Kafka. As many will know, Kafka is an open source distributed event streaming platform created by Confluent co-founder and CEO Jay Kreps and his colleagues Neha Narkhede and Jun Rao.
The company has detailed its latest capabilities for developers at its annual user conference staged in Austin, Texas this month. Confluent Stream Designer is a point-and-click user interface designed to be a progression toward making data streaming technologies accessible to developers beyond those who are already specialized Apache Kafka experts.
Pinpointing point-in-time pipelines
Alongside the newly launched Stream Designer service, Confluent has also announced a new tier of capabilities for Stream Governance, its governance suite for streaming data.
This advanced package for Stream Governance, the company says, will help organizations resolve issues within complex data pipelines through ‘point-in-time lineage’ functionality. In other words, users can look into the past, seeing how a data stream changed over a 24-hour period or within any 1-hour window over a 7-day range, in order to answer questions like, “What happened to the pipeline on Friday at 3pm when support tickets started arriving?” or “What did this pipeline look like last week when my manager seemed happier with the configuration?”
Paired with new lineage search capabilities, users can see when and where a corruption, error, mismatch or any other erroneous element exists in order to resolve it as quickly as possible.
Data will also be easier to find with business metadata, which enables users to add helpful context to data like what team owns the data, how it is being used, and who to contact with questions about it — and they can enforce quality controls globally with the Schema Registry function.
The core technology proposition (and therefore also business proposition) here is that if teams are able to safely and confidently access data streams, organizations can build critical applications faster.
“Businesses heavily rely on real-time data to make fast and informed decisions, so it’s paramount that the right teams have quick access to trustworthy data,” said Chad Verbowski, senior vice president of engineering, Confluent. “With Stream Governance, organizations can understand the full scope of streams flowing across their business so they can quickly turn that data into endless use cases.”
Verbowski and team insist that data streaming use cases are rapidly growing as real-time data powers more of the business. This has caused a proliferation of data that holds endless business value if teams are able to confidently share it across the organization.
Building on the suite of features initially introduced with Stream Governance Essentials, Stream Governance Advanced delivers more ways to easily discover, understand, and trust data in motion. With scalable quality controls in place, organizations can democratize access to data streams while achieving always-on data integrity and regulatory compliance.
Data stream history stories
New capabilities include point-in-time playbacks for Stream Lineage. This is designed to make troubleshooting complex data streams faster and easier with the ability to understand where, when and how data streams have changed over time. Point-in-time lineage provides a look back into a data stream’s history over a 24-hour period or within any one-hour window over a seven-day range.
For example, teams can now see what happened on Thursday at 5pm when support tickets started coming in. Paired with the new ability to search across lineage graphs for specific objects such as client IDs or topics, point-in-time lineage makes it easier to identify and resolve issues in order to keep mission-critical services up for customers and new projects on track for deployment.
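The lookback rule described here (any one-hour window within a seven-day range) can be sketched in a few lines. This is an illustration of the arithmetic only, under the assumption that the range is counted back from the current time; the function name `one_hour_window` and the dates are hypothetical and do not come from Confluent's product or API.

```python
from datetime import datetime, timedelta
from typing import Tuple

LOOKBACK = timedelta(days=7)  # seven-day range mentioned in the article
WINDOW = timedelta(hours=1)   # one-hour window mentioned in the article


def one_hour_window(start: datetime, now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) for a one-hour window, or raise if it is out of range."""
    end = start + WINDOW
    if start < now - LOOKBACK:
        raise ValueError("window starts before the seven-day lookback range")
    if end > now:
        raise ValueError("window ends in the future")
    return start, end


# Illustrative dates: a Monday lunchtime looking back at the previous Thursday, 5pm.
now = datetime(2022, 10, 10, 12, 0)
thursday_5pm = datetime(2022, 10, 6, 17, 0)
window = one_hour_window(thursday_5pm, now)
```

A request for a window nine days back would raise a `ValueError` in this sketch, mirroring the seven-day limit.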
Confluent has also tabled business metadata for Stream Catalog so that organizations can improve data discovery with the ability to build more contextual, detail-rich catalogues of data streams. Alongside previously available tagging of objects, business metadata gives individual users the ability to add custom, open-form details represented as key-value pairs to entities they create such as topics. These details, from users who know the platform best, are critical to enabling effective self-service access to data for the larger organization.
While tagging has allowed users to flag a topic as ‘sensitive’, business metadata allows that user to add more context such as which team owns the topic, how it is being used, who to contact with questions about the data, or any other details necessary.
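The difference between a tag and business metadata is easy to picture as a data structure. The sketch below is a hypothetical model, not Confluent's Stream Catalog schema: `TopicEntry`, `find_by_metadata`, the topic names and the contact address are all invented for illustration. A tag is a bare label, while business metadata is a set of open-form key-value pairs that can be searched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TopicEntry:
    """Hypothetical catalog entry for a topic (not Confluent's actual model)."""
    name: str
    tags: Set[str] = field(default_factory=set)
    business_metadata: Dict[str, str] = field(default_factory=dict)


def find_by_metadata(entries: List[TopicEntry], key: str, value: str) -> List[str]:
    """Return the names of topics whose metadata has the given key-value pair."""
    return [e.name for e in entries if e.business_metadata.get(key) == value]


orders = TopicEntry(
    name="orders",
    tags={"sensitive"},  # a tag only flags the topic
    business_metadata={  # metadata carries the surrounding context
        "owner_team": "payments",
        "usage": "billing reconciliation",
        "contact": "payments-team@example.com",
    },
)
clicks = TopicEntry(name="clicks", business_metadata={"owner_team": "web"})
catalog = [orders, clicks]

payments_topics = find_by_metadata(catalog, "owner_team", "payments")
```

Here the tag says only that `orders` is sensitive; the key-value pairs say who owns it, what it is for and who to ask.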
Exploring the catalog is now even easier with a GraphQL API, giving users a simple, declarative method to specify and get the exact data they need while enabling a better understanding of data relationships on the platform.
Data streaming mainstream
For want of a less stream-enriched term, we can perhaps now stand back and ask whether data streaming could be about to become a mainstream element of modern enterprise IT stacks. So will it?
The answer is probably yes, increasingly, but not exclusively.
Not every enterprise software application will require data streaming or streaming data prowess. Some less connected, more static applications, like perhaps a calculator, a human language translation tool and some other more monolithic apps that hail from the legacy stack, can mostly get by without data streaming. But these are fast becoming the exception.
Modern applications will be increasingly cloud-native, increasingly always-on and increasingly driven by real-time data, in streams. Providing create, build and governance controls for these flows should enable organizations to elevate to higher airstreams without turbulence. But still, just to be sure, please fasten your seatbelts.