Microsoft and the .NET Foundation released version 1.0 of .NET for Apache Spark. The package enables developers to use Spark in .NET programming languages such as C# and F#.
The package was developed by a team drawn from Azure Data Engineering, the Mobius project and .NET.
Big data with C#
Developers had been asking for a way to build big data applications without having to know Scala or Python. Until now, those were the two main languages with support for Spark.
In response to that demand, at Microsoft Build 2019, the .NET Foundation announced it was working on support for Apache Spark. After twelve pre-release versions, the project is ready for use.
When the organization showed the first preview of the package, it claimed that the .NET integration would deliver more than twice the performance of Python. According to Microsoft, this gain has been maintained in the release version.
With version 1.0 of .NET for Apache Spark, developers can write Spark applications and user-defined functions (UDFs) in .NET. Applications need to be compatible with .NET Standard 2.0, although .NET Core 3.1 or later is recommended.
In addition, there is support for the DataFrame APIs, including Spark SQL, based on Apache Spark 2.4 and 3.0. There is also an API extension framework for adding support for other Spark libraries.
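To give an impression of what this looks like in practice, here is a minimal C# sketch that combines a DataFrame, a .NET UDF and a Spark SQL query. It assumes the Microsoft.Spark NuGet package is referenced and that the program is launched through spark-submit with the .NET worker installed. The names `people`, `greet` and `HelloSpark` are made up for illustration and do not come from the announcement.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace HelloSpark
{
    internal static class Program
    {
        private static void Main()
        {
            // Start (or attach to) a Spark session.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("hello-dotnet-spark")
                .GetOrCreate();

            // Build a small DataFrame from literal rows with Spark SQL,
            // so the sketch does not depend on an input file.
            DataFrame people = spark.Sql(
                "SELECT 'Ada' AS name UNION ALL SELECT 'Grace' AS name");

            // Wrap an ordinary C# lambda as a .NET user-defined function.
            Func<Column, Column> greet =
                Udf<string, string>(name => $"Hello, {name}");

            // Apply the UDF through the DataFrame API.
            people.Select(greet(people["name"]).Alias("greeting")).Show();

            // Query the same data with Spark SQL via a temporary view.
            people.CreateOrReplaceTempView("people");
            spark.Sql("SELECT name FROM people WHERE name LIKE 'A%'").Show();

            spark.Stop();
        }
    }
}
```

The UDF runs in a separate .NET worker process, so how it performs depends on the cluster setup and the data involved.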
Future versions will add support for Language Integrated Query (LINQ). The team is also prioritizing additional deployment options and integration with DevOps pipelines.
Version 1.0 of .NET for Apache Spark will be built into the next major release of Azure Synapse and Azure HDInsight. It works on Windows, macOS and Linux.