Snowflake announces the public preview of Snowpark Connect for Spark. The new architecture enables Apache Spark code to run directly in Snowflake warehouses without maintaining separate Spark clusters.
Until now, many Snowflake customers have used the Spark Connector to process Snowflake data with Spark code. However, that approach moves data out of Snowflake, adding cost, latency, and governance complexity.
Snowpark Connect removes these issues by executing the processing directly in Snowflake. This makes data movement unnecessary and reduces latency, while governance stays unified on a single platform.
The solution works with Apache Iceberg tables, including externally managed Iceberg tables and catalog-linked databases. Organizations can leverage the power of the Snowflake platform without moving data or rewriting Spark code.
Tip: Snowflake moves further into open data via Apache Iceberg updates
Spark Connect as a foundation
Spark Connect, introduced with Apache Spark 3.4, is a client-server architecture that decouples user code from the Spark cluster that executes it. This separation forms the foundation of Snowpark Connect.
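To illustrate the pattern: under Spark Connect, a thin client builds a query plan locally and ships it to a remote server for execution. A minimal PySpark sketch follows; the endpoint `sc://localhost:15002` is Spark Connect's default local port and is used here purely as a placeholder (Snowpark Connect points the same client at a Snowflake-hosted endpoint instead of a self-managed cluster).

```python
# Placeholder endpoint: Spark Connect's default local port. With Snowpark
# Connect, a Snowflake-hosted endpoint takes this role.
CONNECT_URL = "sc://localhost:15002"


def run_remote_count() -> int:
    """Connect to a Spark Connect server and run a trivial job.

    Requires pyspark (3.4+) on the client and a reachable server; the
    import is kept local so merely defining this sketch needs neither.
    """
    from pyspark.sql import SparkSession

    # The client builds the DataFrame plan locally and sends it over gRPC;
    # execution happens entirely on the server side.
    spark = SparkSession.builder.remote(CONNECT_URL).getOrCreate()
    return spark.range(100).count()
```

Because the client holds no JVM and no cluster state, swapping the execution backend only means changing the endpoint, which is exactly what Snowpark Connect exploits.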
The new solution eliminates the complexity of managing separate Spark environments. Organizations no longer have to struggle with dependencies, version compatibility, and Spark infrastructure upgrades.
Performance and cost benefits
Snowflake claims significant benefits for customers using Snowpark: on average, 5.6 times faster performance than managed Spark solutions, along with 41 percent cost savings.
With Snowpark Connect, organizations get these benefits without rewriting their existing Spark code. The solution supports the modern Spark DataFrame API, Spark SQL, and user-defined functions (UDFs). Snowflake's elastic compute runtime with virtual warehouses provides automatic performance tuning and scaling.
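As an illustration of those supported surfaces, the sketch below exercises the DataFrame API, Spark SQL, and a Python UDF. It assumes an already-created session `spark`, obtained exactly as in any standard PySpark 3.5 program; the function and its sample data are illustrative, not part of any Snowflake API.

```python
def supported_surfaces_demo(spark):
    """Exercise the three surfaces the article names as supported.

    `spark` is an active SparkSession; imports are kept local so this
    sketch can be defined without pyspark installed.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # DataFrame API: construct and filter a small frame.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])
    filtered = df.filter(df.id > 1)

    # Spark SQL: query the same data through a temporary view.
    df.createOrReplaceTempView("t")
    counted = spark.sql("SELECT COUNT(*) AS n FROM t")

    # Python UDF: applied row-wise wherever the engine executes it.
    shout = udf(lambda s: s.upper(), StringType())
    tagged = df.withColumn("tag_upper", shout(df.tag))
    return filtered, counted, tagged
```

The point of "no rewrite" is that code like this runs unchanged; only the session's endpoint differs.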
Current limitations
Snowpark Connect currently supports only Spark 3.5.x versions and is limited to Python environments. Java and Scala support is in development.
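A client can guard against unsupported versions up front. The helper below is a hypothetical illustration (not part of Snowpark Connect) of checking a Spark version string against the supported 3.5.x range:

```python
def is_supported_spark_version(version: str) -> bool:
    """Return True for Spark 3.5.x, the only line the preview supports."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[:2] == ["3", "5"]


print(is_supported_spark_version("3.5.1"))  # True
print(is_supported_spark_version("3.4.0"))  # False
```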
Key Spark functionality such as the RDD, Spark ML, MLlib, Streaming, and Delta APIs is not yet part of Snowpark Connect. Semantic differences may also exist between the supported APIs and standard Spark implementations.
The solution is available through various clients, including Snowflake Notebooks, Jupyter notebooks, Snowflake stored procedures, VSCode, Airflow, and Snowpark Submit.