Note

This page collects opinions from different stakeholders as input to planning the Airflow 3.0 release. It was forked off the Devlist discussion: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach. It is just a set of points; there is no concrete planning yet and nothing is agreed. Please treat this page as a playground to collect items to discuss and notes to share.

As this page is intended to collect thoughts and ideas, please feel free to adjust the content and leave your name as a vote/stakeholder. The intent is to use this space for collaboration between the Airflow maintainer group and interested stakeholders.

If an argument fits into multiple sections, please add it to just one. We might move content and re-align it over time; redundancy does not help in the first place.

Airflow 2.x Pain Points (Which Require Making a 3.0)

Note

In the Airflow 2.0 planning (see Airflow 2.0 - Planning [Archived]), a couple of learnings from Airflow 1.x were considered and a plan for 2.0 was made. It was a great effort to carry it over. Times are changing, of course: complexity and requirements have shifted since 2.0. In this chapter/table, please describe and collect the points that hinder you/us from changing the existing 2.x chain to meet the needed requirements.

Item / Description
Describe your Airflow 2.x pain point that needs to be fixed in 3.0
Stakeholder backing
Please add your name if you agree - so we see how many people share this pain
Open Questions to discuss
Please post questions that need to be discussed to understand or clarify the raised point (to avoid too many comments, please add them directly here as text)
Any Tactical Alternatives?
Please add/describe any tactical alternative that would be an option to prevent a 3.0

LLM/Gen-AI mainly as the important trigger

Reiterating the fact that this needs more work, I do believe this can be incremental to Airflow. As Astronomer, we have worked on the LLM Providers, which we contributed to Airflow late last year. But clearly there is so much more to do, both in building awareness of the patterns / templates to use and in the patterns Airflow should support to make these easier to use and adopt.

  • Q by Jens Scheffler: Can you please describe which important trigger cannot be satisfied with Airflow 2.x? Is it the flexibility of workflows, speed of releases, complexity of setup, missing integrations?
  • Incrementally improve providers w/o a breaking change to Airflow 3.0?

Cloud Native is the "way to go"

  • Q by Jens Scheffler: Is this point rather about Airflow not being "just offered as a cloud-native PaaS" by all major hyperscalers, with the cloud-native offering not being as good as that of other engines? Or do you rather think Airflow lacks features for better native integration with the hyperscaler services that Airflow needs to interact with?

Need to submit DAGs in other ways than dropping them to a shared DAG folder

Different DAG distribution processes

  • Q by Jens Scheffler: I had understood the discussions differently: that DAG files are a core concept, also with regard to security. If we get the feeling that this approach is too cumbersome for a lot of users, I believe the DAG dropping approach could still be changed with a tactical alternative, w/o the need to define Airflow 3.0 for it. This is also connected to AIP-63: DAG Versioning.
  • Provide an alternative to Git Sync and let the DAGs be deployed by other means.

Local testing and fast iteration on developing pipelines

  • Q by Jens Scheffler: I also see that local development is sometimes complex, and we also struggle to have scripts that allow easy local DAG development. But I see the main burden in the fact that the needed back-ends are not all available at the same level, or are not directly reachable for security reasons from the place where you develop. I'd like to understand this in more detail; can you please elaborate on this pain?

  • Would it help to develop a better "dry run" for all major operators such that local development can test better based on mocks?
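
To make the "dry run based on mocks" idea a bit more concrete, here is a minimal sketch using only the Python standard library. `HypotheticalPodOperator`, `HypotheticalBackendClient` and `dry_run` are made-up names for illustration and are not Airflow APIs; the assumption is simply that an operator talks to its backend through one replaceable client object.

```python
from unittest import mock


class HypotheticalBackendClient:
    """Stand-in for a real backend client (e.g. a cluster API). Hypothetical."""

    def submit(self, image: str, command: list[str]) -> str:
        # On a developer machine the real backend is typically not reachable.
        raise ConnectionError("backend not reachable from the developer machine")


class HypotheticalPodOperator:
    """Toy operator for illustration; this is NOT Airflow's KubernetesPodOperator."""

    def __init__(self, task_id: str, image: str, command: list[str]):
        self.task_id = task_id
        self.image = image
        self.command = command
        self.client = HypotheticalBackendClient()

    def execute(self) -> str:
        return self.client.submit(self.image, self.command)


def dry_run(operator: HypotheticalPodOperator) -> dict:
    """Run the operator with its backend mocked and report what it would have sent."""
    with mock.patch.object(operator, "client") as fake_client:
        fake_client.submit.return_value = "dry-run-ok"
        result = operator.execute()
        # call_args is an (args, kwargs) pair; submit() was called positionally.
        (image, command), _ = fake_client.submit.call_args
    return {
        "task_id": operator.task_id,
        "image": image,
        "command": command,
        "result": result,
    }
```

Calling `dry_run(HypotheticalPodOperator("t1", "python:3.12", ["echo", "hi"]))` returns the parameters the operator would have submitted without touching any backend, and the real client is restored afterwards.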

Ability to run tasks within a workflow with "affinity" so that they can share inputs/outputs in shared CPU/GPU memory

  • Q by Jens Scheffler: Do you mean something similar to Kubernetes, but w/o the burden of off-loading these functions and features into a K8s cluster to implement affinity? Or what pain do you refer to?
  • Allow templating of the "executor" and "queue" fields so that workflow logic can route tasks to a specific endpoint/runner

Ability to integrate seamlessly with other workflow engines - making Airflow a "workflow of workflows"

  • Q by Jens Scheffler: What explicitly are you missing here? You could already integrate or enhance via provider packages. What would you expect "on top" that requires Airflow to be 3.0? (I would find it cool, but not critical, to expose other workflow engines' tasks in the Grid with their status, but this may be too much of a dream.)
  • Enhancements to provider packages with deeper integration (Github/Gitlab/ADO/which?)

Maintaining 700+ Python dependencies across the full complexity of provider packages

Because the integrated provider packages form a complex ecosystem, we need to fix CI almost every second day. It would be good to think about modularizing provider packages and code, even while keeping a monorepo, so that the dependencies of provider packages can be modeled independently (e.g. a venv per provider) instead of in one global Python site-packages tree. This was very visible during the migration to Python 3.12.

  • Allow provider packages to be executed in a venv so that not all Python dependencies must be installed together in the same site-packages tree. On the other hand, operators need to be parseable in the DAG, so operator parsing and execution might need to be un-bundled.

First user experience: Most common failures in DAG authoring

Many users fail at editing their first DAG (unless they start from a copy&paste template) because they do not understand the difference between the DAG parsing stage and the execution stage of the Python code. It is not directly visible when "global code" and code between operators/tasks is executed. The compensation is to go via Jinja templating, which in most cases also adds complexity. The TaskFlow approach has made this better in many cases, but it is not viable for the many operators that just consume "properties" as configuration.
Compare how easily users can define pipelines in YAML in Github or ADO: for many users a YAML-based definition might be a much easier entry point.
I propose we re-discuss how DAGs are modelled so that "pitfalls" and "first user problems" do not need to be answered via TaskFlow.
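
The parsing vs. execution confusion described above can be illustrated without Airflow at all. The following is only a toy sketch: `HypotheticalTask` and `define_dag` are invented names that mimic the shape of a DAG file, under the assumption that "parsing" means running the file's top-level code and "execution" means running a task's callable later.

```python
events = []  # records when each piece of code actually runs


class HypotheticalTask:
    """Toy stand-in for an operator: stores a callable, runs it only on execute()."""

    def __init__(self, task_id, python_callable):
        self.task_id = task_id
        self.python_callable = python_callable

    def execute(self):
        return self.python_callable()


def define_dag():
    """Plays the role of a DAG file: everything in this body runs at parse time."""
    events.append("parse: top-level code")  # runs on every parse of the "file"

    def extract():
        events.append("execute: extract")  # runs only when the task is executed
        return 42

    return [HypotheticalTask("extract", extract)]


tasks = define_dag()          # "parsing": only the top-level code has run so far
parsed_events = list(events)  # snapshot taken before any task executes
result = tasks[0].execute()   # "execution": the callable body runs now
```

After the "parse" step only the top-level entry is recorded; the task body leaves its trace only once `execute()` is called. This is the distinction new DAG authors tend to miss.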

  • Vikram Koka - I don't know whether I hit the same point you wanted to raise? If not, please split this into a second item.
  • For YAML, maybe integrate something like "DAG Factory" into the core as an alternative when no real Python code is needed for DAGs. Still leave the Pythonic approach of DAG creation to power users, of course.
  • Today the UI shows the "Rendered Template" fields only deep in the hidden detail pages. Users need to manually assemble the "final" properties which are applied to the operator at the point of execution (together with the non-templated properties on another page). It would be cool to have a better view of all parameters (templated or not) that were actually used to execute a task in the graph (mainly for non-TaskFlow tasks).
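
As a rough illustration of what a declarative entry point could look like, here is a standard-library-only sketch. The spec layout (`dag_id`, `tasks`, `depends_on`) is an assumption made up for this example and is not the format of "DAG Factory" or of any Airflow feature; JSON stands in for YAML only to avoid a third-party parser.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical declarative pipeline definition (JSON used in place of YAML).
SPEC = """
{
  "dag_id": "example_declarative",
  "tasks": {
    "extract":   {"command": "echo extract",   "depends_on": []},
    "transform": {"command": "echo transform", "depends_on": ["extract"]},
    "load":      {"command": "echo load",      "depends_on": ["transform"]}
  }
}
"""


def load_dag_spec(text: str) -> dict:
    """Parse the spec and reject dependencies on tasks that are not defined."""
    spec = json.loads(text)
    tasks = spec["tasks"]
    for task_id, task in tasks.items():
        unknown = set(task["depends_on"]) - set(tasks)
        if unknown:
            raise ValueError(f"{task_id} depends on unknown tasks: {sorted(unknown)}")
    return spec


def execution_order(spec: dict) -> list[str]:
    """Return one valid run order; graphlib raises CycleError if the spec has a loop."""
    graph = {task_id: set(task["depends_on"]) for task_id, task in spec["tasks"].items()}
    return list(TopologicalSorter(graph).static_order())
```

With such a layout a new user never writes top-level Python, so the parse-time pitfalls do not arise, while validation (unknown tasks, cycles) can give clear error messages up front.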

Airflow User Improvements

  • Q by Jens Scheffler: Can you please describe which important UI improvement cannot be satisfied with Airflow 2.x? Is it the flexibility of UIs, missing integration points?
  • Add capacity to address UI shortcomings. The UI team is "under-staffed" anyway; Airflow 3.0 may well suffer from this too, and raising a 3.0 discussion w/o UI capacity will not make the UI better than today.

Easy adoption of Airflow by new users

We have discussed this many times, but we absolutely need to make the individual first-time adoption of Airflow better.

I think the most common term I recall here is the notion of "Airflow Standalone", but whatever the term may be, an ultra quick, simple install of Airflow and the getting started experience is something we owe our community.

  • Jens Scheffler: I actually think Airflow Standalone is quite cool. The main breaking point is that for "Connections" and for modelling the DAG a lot of other dependencies need to be installed, and in most cases, w/o backend access (e.g. K8s for running KubernetesPodOperator), it is hard to test what you code. I don't know whether structural changes will help here?
  • Would it help to develop a better "dry run" for all major operators such that local development can test better based on mocks?

Integration improvements / Provider maintainability

The changes we made as part of Airflow 2.0 to split the Core Airflow releases from the Provider releases were clearly a good choice and made a huge impact. However, integration maintainability balanced with growth still seems like it could use a significant set of improvements. Elad and I spoke about this a couple of days ago as well. I don't have a clear set of next steps here, but this is definitely worth exploring.

  • Q by Jens Scheffler: Can you please elaborate on what needs to be improved, so that we understand the concern?

Airflow 3.0 Fundamental (Breaking) Concept Demand / Wish List

Note

Airflow 2.x focused on keeping a Semantic Versioning promise, so we piled up a lot of things (e.g. technical interfaces) which we did not change and declared as "will be cleaner in Airflow 3.0", or where we could not change structures because of our backwards-compatibility promise. Please list the things you have in mind that need to change and are a (mainly technical) reason to spin a 3.0 version. Besides technical items, this should also list fundamental concepts.

Item / Description
Describe the Airflow 3.0 Breaking Point that can not be achieved non-breaking in 2.x. Also try to sketch the "Value" it brings to user or product.
Describe the Pressure
Please describe what the impact would be if we would not go with this, e.g. competitive/comparable products that carry this and Airflow has a gap because of existing 2.x concepts
Stakeholder backing
Please add your name if you agree - so we see how many people share this demand
Open Questions to discuss
Please post questions that need to be discussed to understand or clarify the raised point (to avoid too many comments, please add them directly here as text)

DAG / Workflow Support for non-linear complexity.

DAGs are one-way only. Fundamentally this is because loops would call a task multiple times, and you would need to keep the context and execution history of each call separate (see also AIP-64: Keep TaskInstance try history). But there is real demand for supporting "experimental approaches" that call for loops, e.g. training a network repeatedly until the desired state is reached. For such cases a DAG is not the right thing; a workaround would be a long-running task that calls a second DAG in a loop.

Support non-linear workflows of an experimental character, which cannot be modelled as a DAG.
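
The workaround mentioned above (a long-running task that calls a second DAG in a loop) can be sketched in plain Python. `trigger_inner_run` is a hypothetical callable standing in for "trigger the second DAG and wait for its result"; how that would actually be done in Airflow is deliberately left out here.

```python
from typing import Callable


def run_until_converged(
    trigger_inner_run: Callable[[int], float],
    target_loss: float,
    max_iterations: int = 10,
) -> list[float]:
    """Repeat the inner workflow until the metric reaches the target or the budget runs out."""
    history: list[float] = []
    for iteration in range(max_iterations):
        # Hypothetical: trigger the second DAG and wait for its resulting metric.
        loss = trigger_inner_run(iteration)
        history.append(loss)  # keep the outcome of every attempt separately
        if loss <= target_loss:
            break
    return history


def fake_training_run(iteration: int) -> float:
    """Made-up stand-in for one training attempt; the loss shrinks each iteration."""
    return 1.0 / (iteration + 1)
```

The `history` list hints at the core difficulty named above: each loop iteration needs its own context and execution record, which the outer long-running task has to track by itself today.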

What we must Keep from 2.x Approach

Note

We know that no software is perfect, and we always have more wishes. Assuming we move to Airflow 3.0, which things from 2.x are a "must have" to keep? Listing them here makes sure we do not forget about them.

Item / Description
Describe the Airflow 2.x approach or feature we must not break.
Rationale
Describe the reason
Stakeholder backing
Please add your name if you agree - so we see how many people share this requirement
Open Questions to discuss
Please post questions that need to be discussed to understand or clarify the raised point (to avoid too many comments, please add them directly here as text)

Continuing to have the option of using the many thousands of operators with 90+ providers

If we lost all providers with 3.0, the whole ecosystem would need to start from scratch. This would make Airflow unusable.

  • Q by Jens Scheffler: Can we assume that small adjustments must be made so that providers become compatible with 3.0? Or would you assume that no breaking changes are allowed on this side?

Allowing the scale and complexity of DAGs that we have with Airflow 2.x today

Because setups existing today must continue to be supported with Airflow 3.0 as well


Things we need to consider as "Promise" for Migration

Note

Most of the contributors have been with Airflow since 1.x. If not, most of us have at least gone through the migration from 1.x to 2.x as users. In this chapter/area, please list the things that need to be assured and considered in 3.x planning. We know we need to make the user transition from 2.x to 3.0 "easy", assuming that with a 3.0 version we do not want to lose a large user base.

Item / Description
Describe the Airflow 2.x approach or feature we must not break.
Rationale
Describe the reason
Stakeholder backing
Please add your name if you agree - so we see how many people share this requirement
Open Questions to discuss
Please post questions that need to be discussed to understand or clarify the raised point (to avoid too many comments, please add them directly here as text)

With a 3.0 version we move a lot of existing installs out of their comfort zone. Users might be scared to migrate and will stay on a 2.x release for a long time, until all stability and functionality is available in a 3.x branch. We also might lose users, as the effort of migration will make many consider moving to other products / solutions.

In support channels (Slack/Github) we still see a lot of requests coming in for Airflow 1.x and 2.3-2.5; it seems a lot of people are not upgrading regularly. If we release a 3.0 with a lot of breaking changes, we might lose a lot of users and installs.

Ideas for Target Airflow 3.0 Design

Note

In this section please sketch ideas and designs we should consider. This is the most "playground-like" area. Rather drop in more than less; see it as a "brainstorming" field.


Item / Description
Describe the Airflow 2.x approach or feature we must not break.
Stakeholder backing
Please add your name if you agree - so we see how many people share this requirement
Open Questions to discuss
Please post questions that need to be discussed to understand or clarify the raised point (to avoid too many comments, please add them directly here as text)


