Status

State: Completed
Discussion Thread: [DISCUSS] AIP-3: Drop Python2 support in Airflow 2.0
JIRA: AIRFLOW-4196
In Release: 2.0.0

Motivation

Having to support Python 2 and 3 concurrently adds maintenance and development burden (lessened a bit by the six and backports modules) and significant extra test time on Travis.

Python 2 reaches End of Life on January 1, 2020 and will receive no updates, not even security fixes, past this date.

Django dropped support for Python 2 with their 2.0 release in December 2017, and this proposal has us follow suit. Airflow 2.0 is already a fairly major breaking change, so this could be an opportune time to do this.

Considerations

Many people are still on Python 2.7, and we will need to consider how we announce this change, and how long we give people to migrate their installs.

We have at least one hook that is Python 2 only: AIRFLOW-2697 (specifically the HDFS hook, which uses a Python 2 only module, snakebite).

RHEL may not ship an "officially" packaged version of Python 3 (it is hard, perhaps impossible, to find out if you aren't already a Red Hat customer). An RPM of Python 3 is available via EPEL, but that is not an "official" package from Red Hat Inc. My answer to this problem is to encourage companies to pay us to continue supporting Airflow on Python 2.7 :)

12 Comments

  1. Thanks for drafting this, Ash.  I feel like it covers our discussion thread well.

    It's really concerning to me how many people are still on Python 2.7.  While there are a small number of edge cases (snakebite etc / http://py3readiness.org/), I think we agree that most people are better off running on Python 3.

    I like the idea of encouraging companies that depend on Python 2 to fund its continued support.


  2. The community recently started adding types to the Airflow code to help new contributors and to make the code more readable and maintainable: https://github.com/apache/airflow/pull/4926/files
    Right now we're limited to setting these types in comments to maintain Python 2.7 compatibility, which is a pity.
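
To illustrate the limitation, here is a minimal sketch of the same signature written first as a Python 2 compatible type comment and then as a Python 3 annotation. The function names and bodies are hypothetical and are not taken from the Airflow code base.

```python
from typing import List, Optional, get_type_hints


# Python 2 compatible: the types live in a comment that only tools such as mypy read.
def task_ids_py2(dag_id, limit=None):
    # type: (str, Optional[int]) -> List[str]
    ids = ["{}.task_{}".format(dag_id, i) for i in range(3)]
    return ids if limit is None else ids[:limit]


# Python 3 only: the types are part of the signature and are visible at runtime.
def task_ids_py3(dag_id: str, limit: Optional[int] = None) -> List[str]:
    ids = [f"{dag_id}.task_{i}" for i in range(3)]
    return ids if limit is None else ids[:limit]
```

Both functions behave identically; only the annotated version exposes its types through `get_type_hints`.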

  3. I'll kick off the list of things to fix/improve when dropping Python 2 support. From this list we could create JIRA tickets. Probably missing a lot, feel free to add.

    Intermediate:

    • Place big warning in readme that Python 2 support will be dropped
    • If the interface of something changes, place deprecation warning

    To be resolved:

    • Remove all "try: import the py2 version, except: import the py3 version" import fallbacks
    • Remove all imports that are conditional on sys.version_info[0] == 3
    • Remove all __future__ imports
    • Remove half the CI pipeline, which runs tests on Python 2
    • Replace unicode strings by "normal" strings
    • Make setup.py compatible with Python 3 only
    • Replace os.path by pathlib (exists since Python 3.4)
    • In all classes, replace super(MyClass, self).__init__(...) by super().__init__(...) (in Python 3, the zero-argument super() is equivalent to super(MyClass, self))
    • Fix all mypy annotations which are currently comments because Python 2
    • Replace class MyClass(object) by class MyClass (in Python 3 there's no need to state object)
    • Replace __metaclass__ attribute by metaclass as a class keyword argument
    • Replace @abstractproperty by @property combined with @abstractmethod (see https://docs.python.org/3/library/abc.html#abc.abstractproperty)
    • Remove references to the imp module, which is replaced by importlib
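
Several of the items above can be sketched together in one small example. This is only an illustration under assumptions: `AbstractHook`, `ExampleHook`, `dag_file` and `load_module` are hypothetical names, not Airflow APIs, and the comments show the Python 2 era form each line would replace.

```python
import importlib
from abc import ABC, abstractmethod
from pathlib import Path


class AbstractHook(ABC):  # replaces `__metaclass__ = ABCMeta` and the explicit `object` base
    def __init__(self, conn_id):
        self.conn_id = conn_id

    @property
    @abstractmethod  # replaces the deprecated @abstractproperty
    def conn_type(self):
        """Subclasses must expose their connection type."""


class ExampleHook(AbstractHook):
    def __init__(self, conn_id, timeout=10):
        super().__init__(conn_id)  # was: super(ExampleHook, self).__init__(conn_id)
        self.timeout = timeout

    @property
    def conn_type(self):
        return "example"


def dag_file(dags_folder, name):
    # pathlib replaces os.path.join(dags_folder, name + ".py")
    return Path(dags_folder) / f"{name}.py"


def load_module(name):
    # importlib replaces the imp module for programmatic imports
    return importlib.import_module(name)
```

Instantiating `AbstractHook` directly raises `TypeError` because `conn_type` is still abstract, which is the same guarantee `@abstractproperty` used to give.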

  4. Snakebite has been a real issue, especially in the latest release (1.10.3), where extending the BaseSensorOperator initializes all sensors including HDFS. I know there is a Python 3 version of snakebite (snakebite-py3 on pip, https://pypi.org/project/snakebite-py3/), but I would rather not hack my instance; it should resolve the correct dependencies by itself.

    from airflow.operators.sensors import BaseSensorOperator

    from airflow.hooks.hdfs_hook import HDFSHook

    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
    File "/usr/local/lib/python3.6/site-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
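
The line quoted in the traceback uses the Python 2 long-integer suffix `L`, which Python 3 no longer accepts. A minimal sketch that reproduces this with the standard library only (the helper name `parses_as_python3` is hypothetical):

```python
import ast


def parses_as_python3(source):
    """Return True if `source` is syntactically valid for the running Python 3 interpreter."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The snakebite line from the traceback, and the same line without the `L` suffix.
py2_line = "baseTime = min(time * (1L << retries), cap)"
py3_line = "baseTime = min(time * (1 << retries), cap)"
```

Calling `parses_as_python3(py2_line)` fails to parse on Python 3, while the version without the suffix parses cleanly, since Python 3 integers have arbitrary precision and need no separate long type.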

    1. I think we already have a related PR, https://github.com/apache/airflow/pull/3560, and we decided to move to PyArrow.

      1. Good point, and that's fine, but there is an issue here. Judging by the last comment (and this PR is almost a year old now), the PR has accumulated a bunch of conflicts and might warrant a rewrite or a heavy refactor. Are we going to keep things broken in the interim? IMHO simply updating the dependency would be an easy patch, though not necessarily the end solution.

        1. Yes, the PR was submitted a year ago, but the author was still working on it as of 2019-02-28: https://github.com/apache/airflow/pull/3560#issuecomment-461791862

  5. MySQL-python does not support Python3 and because Apache-Airflow has removed support for Python2, it is difficult to find a workaround for MySQL-python.

    1. I don't see MySQL-python in https://github.com/apache/airflow/blob/master/setup.py. Instead we have mysqlclient, which should suffice, right?

    2. We run our tests against mysql on python3 using https://pypi.org/project/mysqlclient/ "This is a fork of MySQLdb1. (that adds Py3 support)" - you should use that instead.

  6. Ash Berlin-Taylor Fokko Driesprong Kaxil Naik Bas Harenslak
    What is the current state of this AIP? If you are planning to do any further work, could I ask you to migrate your ticket to a GitHub Issue? If all the work is done, could I request a status update for this AIP?