Astronomer is the company behind Astro, a managed service for Apache Airflow. Now seeing growing adoption across software application development communities, Apache Airflow is an open source platform for developing, scheduling and monitoring batch-oriented data pipelines or workflows. This summer’s latest platform release at Astronomer adds new capabilities to Astro for pipeline resilience and release velocity. New support for dbt on Astro marks the platform’s first expansion ‘beyond’ Airflow as it now seeks to truly widen its platform engineering playbook.
As a technical aide-mémoire, dbt is a SQL-first transformation workflow designed to allow developers to deploy analytics code according to best practices that cover aspects of software engineering including modularity, portability, CI/CD and documentation.
According to Analytics8, “[As a technology], dbt (data build tool) makes data engineering activities accessible to people with data analyst skills to transform the data in the warehouse using simple select statements, effectively creating the entire transformation process with code. [Users] can write custom business logic using SQL, automate data quality testing, deploy the code and deliver trusted data with data documentation side-by-side with the code.”
According to data and analytics consultant Simon Collis, that freedom to perform analytics functions is more important today than ever due to the shortage of data engineering professionals. He suggests that anyone who knows SQL can now build production-grade data pipelines, lowering the barrier to entry that previously limited staffing capabilities for legacy technologies.
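To make the ‘simple select statements’ idea concrete, here is a minimal sketch in plain Python using the standard library’s sqlite3 module. It is not dbt itself and does not use any dbt API. The helper names (build_model, not_null_failures), the table names and the sample rows are all hypothetical, chosen only to illustrate how a transformation can be defined as a select statement, materialised as a table and then checked with an automated data quality test.

```python
import sqlite3


def build_model(conn, name, select_sql):
    """Materialise a SELECT statement as a table.

    Hypothetical helper: loosely mirrors the idea of a table-materialised
    model, where the transformation is defined purely as a SELECT.
    """
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def not_null_failures(conn, table, column):
    """Return the number of rows where `column` is NULL (0 means the check passes)."""
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()
    return row[0]


# An in-memory database stands in for the warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)],
)

# The 'model' is just business logic expressed as a SELECT statement.
build_model(
    conn,
    "customer_revenue",
    "SELECT customer, SUM(amount) AS revenue FROM raw_orders GROUP BY customer",
)

revenue = dict(conn.execute("SELECT customer, revenue FROM customer_revenue"))
failures = not_null_failures(conn, "customer_revenue", "customer")
```

In a real dbt project the select statement would live in its own SQL file and the test would be declared alongside it. The sketch only shows why SQL knowledge alone can be enough to express both the transformation and its quality check.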
Pipelines & data flows
Apache Airflow is said to be the industry’s de facto standard for expressing pipelines and orchestrating data flows as code. Widely recognised as an advanced workflow management and orchestration solution, Airflow provides more than 1,600 data integrations with most of the popular databases, applications, AI frameworks and tools, as well as hundreds of cloud services.
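For readers new to the idea of ‘pipelines as code’, the following is a minimal sketch of the underlying concept, written with Python’s standard graphlib module and not with Airflow’s own API. The task names, the run_pipeline helper and the dependency map are hypothetical. The point is simply that tasks and their dependencies are declared in code, and an orchestrator works out a valid execution order from that declaration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once all of its upstream tasks have run.

    `tasks` maps a task name to a callable; `dependencies` maps a task
    name to the set of task names it depends on. Returns the order used.
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


log = []

# Hypothetical three-step batch pipeline; each task just records that it ran.
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# transform needs extract to finish first; load needs transform.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
}

executed = run_pipeline(tasks, dependencies)
```

A production orchestrator layers scheduling, retries, monitoring and integrations on top of this basic dependency-ordering idea, which is where a platform such as Airflow comes in.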
Since its launch in 2023, Astro has been a way to run Airflow with enterprise-grade features. This commercially managed option costs money, yes… but it also allows teams to focus on data instead of pipelines, consolidate their data stack and confidently upgrade and roll back versions of Airflow.
“For years, Astro has given our customers the best way to run Airflow and many have told us they want us to bring unified orchestration and observability to other technologies that are critical to their data stack,” said Andy Byron, chief executive officer, Astronomer. “If it adds value and lowers the cost of ownership for our customers, Astronomer will continue to build out a unified platform to support whatever enterprises are building, from analytics to AI.”
Demand from users of Airflow’s orchestration and observability functions on Astro has prompted Astronomer to extend support to other popular open source software, building out a unified platform for every facet of data-driven enterprises. Astronomer insists that it has begun executing on that vision with this summer’s release, allowing customers to orchestrate and run dbt Core on Astro in a single pipeline with a few lines of code.
Astronomer will continue to add support for additional open source software in the coming years.
Cosmos code cometh
In 2023, due to demand from customers and the Airflow community, Astronomer released Cosmos, an open source package that allows users to integrate dbt projects into Airflow in a few lines of code. One year later, Cosmos is downloaded 1.3M times per month, making it the most popular way to orchestrate dbt with Airflow. Now, software engineers have a fully managed way to deploy and run dbt and Airflow together on Astro.
“Our open-source integration became incredibly popular very quickly, but it doesn’t solve all the challenges of running dbt and Airflow together,” said Julian LaNeve, chief technology officer, Astronomer. “In a time where organisations are being asked to do the same or more with the same budgets, we’ve had customers come to us with concerns about the price of other approaches at the enterprise scale, and we’re seeing them move their dbt transformations onto Astro where it’s 10x cheaper while still solving the operational challenges.”
As the so-called ‘fully-managed layer of enterprise-grade’ technologies permeates deeper into the world of open source data management, there appears to be little rebuttal or negative reaction from the actual user base itself. That may mean the data-developer cognoscenti places great trust in firms like Astronomer to deliver robust functions and still give back to open source, or it may mean they just need ‘stuff that securely works’ in real-world deployments – let’s hope it’s a bit of both.