About externalization

The Flink community has created and maintains multiple Flink connectors, which can be found in multiple locations.

The Flink community wants to improve the overall connector ecosystem. This includes moving existing connectors out of Flink's main repository and, as a result, decoupling the release cycle of Flink from the release cycles of the connectors. Among other things, this should result in connector releases that are independent of Flink's release cycle.

The following rules have been discussed and agreed upon by the Flink community with regard to connectors.

This document outlines common rules for connectors that are developed & released separately from Flink (otherwise known as "externalized").

Versioning

Source releases:

<major>.<minor>.<patch>

Jar artifacts:

<major>.<minor>.<patch>-<flink-major>.<flink-minor>

For example, 1.0.0-1.15

This may imply releasing the exact same connector jar multiple times under different versions.
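
For instance, a user depending on a connector picks the artifact whose suffix matches their Flink version. The artifact id below (flink-connector-foo) is purely hypothetical; only the version pattern matters:

	<dependency>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-connector-foo</artifactId>
		<!-- connector release 1.0.0, built against Flink 1.15 -->
		<version>1.0.0-1.15</version>
	</dependency>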

Branch model

The default branch is called main and is used for the next major iteration.

Remaining branches are called v<major>.<minor>, for example v3.2.

Branches are not specific to a Flink version, i.e., no v3.2-1.15.

Flink compatibility

The connector must support all Flink versions that are supported by the Flink project (at the time of writing, the last 2 minor Flink versions).

How this is achieved is left to the connector, as long as it conforms to the rest of the proposal.


The flink.version property set in the root pom.xml should be the lowest supported Flink version. You can't use the highest, because there is no guarantee that something that works in e.g. Flink 1.18 also works in Flink 1.17.
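
As a minimal sketch (the version number is illustrative), the root pom.xml would pin the lowest supported Flink version:

	<properties>
		<!-- lowest supported Flink version; illustrative value -->
		<flink.version>1.17.2</flink.version>
	</properties>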


Since branches are not specific to a particular Flink version, this may require the creation of dedicated modules for each supported Flink version.

The architecture of such modules is up to the connector.

For example, there could be a dedicated module per supported Flink version; a hypothetical layout is sketched below.
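
A minimal sketch of such a multi-module setup, with entirely hypothetical module names, could look like this in the root pom.xml:

	<modules>
		<!-- version-agnostic connector code -->
		<module>flink-connector-foo</module>
		<!-- code that only compiles/runs against a specific Flink version -->
		<module>flink-connector-foo-1.17</module>
		<module>flink-connector-foo-1.18</module>
	</modules>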

Support

The last 2 major connector releases are supported, with only the latter receiving additional features, subject to a few exceptions (see the examples below).

For a given major connector version, only the latest minor version is supported.

This means that once 1.1.x is released, there will be no more 1.0.x releases.

Examples

New minor Connector version

Initial state:

Connector version   Supported Flink version   Support
v1.0                1.14-1.15                 patch
v2.0                1.14-1.15                 feature

Final state:

Connector version   Supported Flink version   Support
v1.0                1.14-1.15                 none
v1.1                1.14-1.15                 patch
v2.0                1.14-1.15                 feature


New major Connector version

Initial state:

Connector version   Supported Flink version   Support
v1.0                1.14-1.15                 patch
v2.0                1.14-1.15                 feature

Final state:

Connector version   Supported Flink version   Support
v1.0                1.14-1.15                 none
v2.0                1.14-1.15                 patch
v3.0                1.14-1.15                 feature


New major Connector version

In this scenario the last 2 major connector versions do not cover all supported Flink versions.

Initial state:

Connector version   Supported Flink version   Support
v1.0                1.14                      patch
v2.0                1.15                      feature

Final state:

Connector version   Supported Flink version   Support
v1.0                1.14                      patch
v2.0                1.15                      patch
v3.0                1.15                      feature


New minor Flink version

In this scenario an older connector version no longer supports any of the currently supported Flink versions.

Initial state:

Connector version   Supported Flink version   Support
v1.0                1.14                      patch
v2.0                1.15                      feature

Final state:

Connector version   Supported Flink version   Support
v1.0                1.14                      none
v2.0                1.15-1.16                 feature


Externalization guide

https://github.com/apache/flink-connector-elasticsearch/ is the most complete example of an externalized connector.

Git history

When moving a connector out of the Flink repo the git history should be preserved.

Use the git-filter-repo tool to extract the relevant commits.

As an example, the externalization of the Cassandra connector required these commands to be run (in a fresh copy of the Flink repository!):

python3 git-filter-repo --path docs/content/docs/connectors/datastream/cassandra.md --path docs/content.zh/docs/connectors/datastream/cassandra.md --path flink-connectors/flink-connector-cassandra/
python3 git-filter-repo --path-rename flink-connectors/flink-connector-cassandra:flink-connector-cassandra

The result should be that only the desired modules related to the connector exist in your local branch.

Then rebase this branch on top of the bootstrapped externalized connector repository and apply whatever changes are needed to make things work.
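
A rough sketch of that rebase, assuming the filtered Flink clone lives next to the new connector repository (paths and branch names are hypothetical):

cd flink-connector-cassandra
# make the filtered history available in the connector repository
git remote add filtered-flink ../flink
git fetch filtered-flink
# create a branch with the imported history and rebase it onto the bootstrapped repo
git checkout -b import-history filtered-flink/master
git rebase main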

Parent pom

We have a parent pom that connectors should use.

	<parent>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-connector-parent</artifactId>
		<version>1.1.0</version>
	</parent>

It handles various things, from setting up essential build plugins (like the compiler plugin) to QA (including license checks!), testing, and Java 11/17 support.

(Almost) everything is opt-in, requiring the project to put a plugin into the <build> section.

See the bottom of the parent pom's <properties> section for properties that sub-projects should define.
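
As an illustration of the opt-in mechanism, listing a plugin in <build> without a version picks up the configuration managed by the parent; the shade plugin below is an assumed example of such a plugin:

	<build>
		<plugins>
			<!-- opt in to a plugin managed by flink-connector-parent;
			     version and configuration are inherited from the parent -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
			</plugin>
		</plugins>
	</build>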

Making changes to the parent pom

Making changes to the parent pom requires releasing the org.apache.flink:flink-connector-parent artifact. Before releasing it, the changes can be tested in CI with the test project hosted in the ci branch. As the 2 components are not hosted in the same branch, a workaround that lets the test project use the updated parent without releasing it is to add the following steps to the test project's workflow:


steps:
      # check out the branch that contains the modified parent pom
      - name: Temp check out parent_pom code
        uses: actions/checkout@v3
        with:
          ref: "my_parent_pom_branch"

      # install the unreleased parent pom into the local Maven repository
      - name: Temp install parent_pom
        run: mvn clean install


CI utilities

We have a collection of ci utilities that connectors should use.

https://github.com/apache/flink-connector-shared-utils/tree/ci_utils

The CI utilities require maintainers to decide which Flink versions the connector should be tested against. Most likely this will look something like the following:

  1. CI for PRs

    The push_pr.yml workflow can be used like this:

    jobs:
      compile_and_test:
        strategy:
          matrix:
            flink: [ 1.17.2 ]
            jdk: [ '8, 11' ]
            include:
              - flink: 1.18.1
                jdk: '8, 11, 17'
        uses: apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils
        with:
          flink_version: ${{ matrix.flink }}
          jdk_version: ${{ matrix.jdk }}
      python_test:
        strategy:
          matrix:
            flink: [ 1.17.2, 1.18.1 ]
        uses: apache/flink-connector-shared-utils/.github/workflows/python_ci.yml@ci_utils
        with:
          flink_version: ${{ matrix.flink }}


  2. CI for nightly/weekly checks

    The weekly.yml can be used like this:

name: Nightly
on:
  schedule:
    - cron: "0 0 * * 0"
  workflow_dispatch:
jobs:
  compile_and_test:
    if: github.repository_owner == 'apache'
    strategy:
      matrix:
        flink_branches: [{
          flink: 1.17-SNAPSHOT,
          branch: main
        }, {
          flink: 1.18-SNAPSHOT,
          jdk: '8, 11, 17',
          branch: main
        }, {
          flink: 1.19-SNAPSHOT,
          jdk: '8, 11, 17, 21',
          branch: main
        }, {
          flink: 1.17.1,
          branch: v3.0
        }, {
          flink: 1.18.0,
          branch: v3.0
        }]
    uses: apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils
    with:
      flink_version: ${{ matrix.flink_branches.flink }}
      connector_branch: ${{ matrix.flink_branches.branch }}
      jdk_version: ${{ matrix.flink_branches.jdk || '8, 11' }}
      run_dependency_convergence: false


Release utilities

We have a collection of release scripts that connectors should use.

https://github.com/apache/flink-connector-shared-utils/tree/release_utils

See the contained README.md for details.

Documentation integration

The documentation should follow this structure:

<root>/docs/content/<english_content>
<root>/docs/content.zh/<chinese_content>

See https://github.com/apache/flink/tree/master/docs#include-externally-hosted-documentation for more information on how to integrate the docs into Flink.

For generating a Maven dependency pom snippet, use the connector_artifact shortcode instead of artifact. This allows the Flink docs to inject the Flink version suffix.
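
For illustration only (the exact parameters are defined by the shortcode in the Flink docs, so treat the arguments below as an assumption), an invocation with a hypothetical artifact id and connector version could look like:

	{{< connector_artifact flink-connector-foo 1.0.0 >}}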

Common review issues

Lack of production architecture tests

Within Flink the architecture tests for production code are centralized in flink-architecture-tests-production, while the architecture tests for test code are spread out into each module. When externalizing a connector, separate architecture tests for production code must be added to the connector module(s).
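
A minimal sketch of such a test, assuming the connector depends on flink-architecture-tests-production and reuses its common rule set (the Flink class and package names below reflect that assumption):

	package org.apache.flink.architecture;

	import com.tngtech.archunit.core.importer.ImportOption;
	import com.tngtech.archunit.junit.AnalyzeClasses;
	import com.tngtech.archunit.junit.ArchTest;
	import com.tngtech.archunit.junit.ArchTests;

	/** Runs Flink's common production-code architecture rules against the connector. */
	@AnalyzeClasses(
	        packages = "org.apache.flink.connector",
	        importOptions = {ImportOption.DoNotIncludeTests.class})
	public class ProductionCodeArchitectureTest {

	    @ArchTest
	    public static final ArchTests COMMON_TESTS =
	            ArchTests.in(ProductionCodeArchitectureBase.class);
	}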

Dependency convergence errors on transitive Flink dependencies

Flink transitively pulls in different versions of dependencies like Kryo or Objenesis, which must be converged in the connector.
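
One way to converge them (a sketch; the affected dependencies and versions depend on the Flink version you build against) is to pin them in dependencyManagement:

	<dependencyManagement>
		<dependencies>
			<!-- converge versions pulled in transitively via Flink; versions are illustrative -->
			<dependency>
				<groupId>com.esotericsoftware.kryo</groupId>
				<artifactId>kryo</artifactId>
				<version>2.24.0</version>
			</dependency>
			<dependency>
				<groupId>org.objenesis</groupId>
				<artifactId>objenesis</artifactId>
				<version>2.1</version>
			</dependency>
		</dependencies>
	</dependencyManagement>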

Excess test dependencies

Flink defines several default test dependencies, like JUnit 4 or Hamcrest. These may not be required by the connector if it has already been migrated to JUnit 5 / AssertJ.

DockerImageVersions usages

The DockerImageVersions class is a central listing of docker images used in Flink tests. Since connector-specific entries will be removed once the externalization is complete, connectors shouldn't rely on this class but handle this on their own (either by creating a trimmed-down copy, hard-coding the version, or deriving it from a Maven property).
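
The trimmed-down copy could simply be a connector-local constants class; the class content and image below are hypothetical:

	/** Connector-local replacement for the docker image listing removed from Flink. */
	public class DockerImageVersions {
	    public static final String FOO = "foo:1.2.3";
	}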

Bundling of flink-connector-base

Connectors should not bundle the flink-connector-base module from Flink and should instead set it to provided, as the contained classes may rely on internal Flink classes.
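
A minimal sketch of the corresponding dependency declaration:

	<dependency>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-connector-base</artifactId>
		<version>${flink.version}</version>
		<scope>provided</scope>
	</dependency>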