A new tool for data engineers has just been unveiled at the Snowflake Summit. Openflow, a data integration service based on Apache NiFi, is coming to the AI Data Cloud. Openflow helps get data (structured, unstructured, streaming, and SaaS) from different sources to AI applications quickly and reliably. Users get access to hundreds of ready-made connectors, but can also build their own.
Snowflake recognizes that data movement is one of the biggest challenges companies face when deploying AI. Data is often spread across different systems, both on-premises and in the cloud. This is exactly where Openflow comes in, simplifying the process from data extraction to use in AI applications.
The platform uses Apache NiFi, an open source project for automating data flows between different systems. By integrating this well-known technology into a managed service, Snowflake spares companies from building complex data movement infrastructure themselves. Snowflake gained this open source expertise through its acquisition of Datavolo at the end of 2024. Datavolo's technology uses “data processors” to automate the extraction, cleaning, transformation, and enrichment of data.
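To make the processor concept concrete, here is a minimal sketch of a custom NiFi data processor written against the open source Apache NiFi Java API. The class name, the normalization logic, and the single success relationship are illustrative assumptions, not part of Openflow itself; real processors also declare configurable properties and failure relationships.

```java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

import java.nio.charset.StandardCharsets;
import java.util.Set;

// Illustrative "data processor": reads each FlowFile, normalizes its text
// content, and routes it onward. Names and logic are hypothetical examples.
public class NormalizeTextProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Cleaned records")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return; // nothing queued on the incoming connection
        }
        // Rewrite the FlowFile content in place: trim and lower-case the text.
        flowFile = session.write(flowFile, (in, out) -> {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            out.write(text.trim().toLowerCase().getBytes(StandardCharsets.UTF_8));
        });
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```

In Openflow, processors like this are packaged and run through the managed service rather than a self-hosted NiFi cluster, which is where the operational simplification comes from.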
AI requires a different approach
In recent years, Snowflake has been working to make the AI Data Cloud more widely applicable. Initially, the platform was mainly suited to traditional analytics workloads such as business intelligence, where structured data processed in batches is sufficient. The move into AI workloads, which began two years ago, requires a broader perspective: the data requirements for AI differ significantly from those of traditional business intelligence. AI models need access to both structured and unstructured data, including text, images, and real-time data streams.
“Snowflake Openflow dramatically simplifies data accessibility and AI readiness,” said VP of Product Chris Child at the launch of the new tool. Child emphasized that more companies are embracing an AI-first data strategy, which requires access to all business data in a single platform. The new service supports both streaming and batch processing. For real-time applications, Snowpipe Streaming can process up to 10 gigabytes per second, and data is available for queries within 5 to 10 seconds of ingest. This performance enables inline transformations during streaming: data no longer needs to land in storage before it can be processed, which shortens total processing time.
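For illustration, this is roughly what row-level ingest with Snowpipe Streaming looks like using Snowflake's Java ingest SDK (snowflake-ingest-sdk). A hedged sketch: the connection properties, database, table, and column names are placeholders, and the snippet shows only the basic channel-and-offset pattern, not the throughput figures quoted above.

```java
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class StreamingIngestExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection settings; real values come from your account.
        Properties props = new Properties();
        props.put("url", "https://<account>.snowflakecomputing.com:443");
        props.put("user", "<user>");
        props.put("private_key", "<pkcs8-private-key>");
        props.put("role", "<role>");

        try (SnowflakeStreamingIngestClient client =
                     SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT")
                             .setProperties(props)
                             .build()) {

            // A channel is an ordered stream of rows into a single table.
            OpenChannelRequest request = OpenChannelRequest.builder("MY_CHANNEL")
                    .setDBName("MY_DB")
                    .setSchemaName("PUBLIC")
                    .setTableName("EVENTS")
                    .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
                    .build();
            SnowflakeStreamingIngestChannel channel = client.openChannel(request);

            // Rows carry an offset token so ingestion can resume from the
            // last committed position after a failure.
            Map<String, Object> row = new HashMap<>();
            row.put("EVENT_ID", 42);
            row.put("PAYLOAD", "hello");
            InsertValidationResponse response = channel.insertRow(row, "offset-42");
            if (response.hasErrors()) {
                throw response.getInsertErrors().get(0).getException();
            }
            channel.close().get(); // flush and wait for commit
        }
    }
}
```

The offset token is what lets a client resume exactly where it left off, which is how row-level streaming ingest stays reliable without a separate staging step.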
Hundreds of connectors available immediately
One of the key benefits of Openflow is its hundreds of pre-built connectors. Companies can connect directly to systems such as Salesforce Data Cloud, Microsoft SharePoint, Oracle, Workday, and ServiceNow. Messaging platforms and cloud object stores are also supported.
Openflow thus covers most major enterprise systems. If no connector is available for a given system, data engineers can build a custom one in minutes, along the lines of the NiFi processor sketch shown earlier. Snowflake emphasizes that this flexibility matters for companies with specific IT architectures or legacy systems.
Open source as a foundation
Openflow builds on Apache NiFi, an open source framework that thousands of enterprise organizations use as an integration and automation platform for designing, visualizing, and managing data flows between systems. By embracing NiFi's open standards, Openflow aims to prevent vendor lock-in. Snowflake has extended the NiFi foundation with enterprise features, including governance, security, and observability.
All data integration is centralized in a single platform with extensible connectivity to various data sources. Users can choose between deployment in their own cloud environment via Bring Your Own Cloud (BYOC) or inside Snowflake via Snowpark Container Services. Both options are offered as managed services to reduce the operational burden.
Unlike legacy data platforms, which often cause vendor lock-in, Snowflake positions Openflow as an open architecture. Users can move data to different data lakes and lakehouses and adopt open table formats such as Apache Iceberg.
The service is now generally available in all AWS commercial regions via BYOC deployment; the Snowpark Container Services option is still in private preview.