Abstract

Linkis builds a computation middleware layer to decouple the upper applications from the underlying data engines. It provides standardized interfaces (REST, JDBC, WebSocket, etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.), while enabling cross-engine context sharing and unified job and engine governance and orchestration.

Linkis codebase: https://github.com/WeBankFinTech/Linkis


Proposal

Linkis is designed to solve computation governance problems in complex distributed environments (typically in a big data platform), where you have to deal with different types, versions, or clusters of underlying data engines and hundreds of diversified engine clients at the upper application layer as well.

Linkis acts as a proxy between the upper application layer and the underlying engine layer. By abstracting and implementing the 3 common phases of a job/request (submit, prepare and execute), Linkis provides connectivity, governance and orchestration capabilities for different kinds of engines, such as OLAP, OLTP (in development) and Streaming, and handles all these "computation governance" affairs in a standardized, reusable way.
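The three phases can be illustrated with a minimal sketch. It is not taken from the Linkis codebase: the class names `Job` and `ComputationMiddleware`, their methods, and the engine callables are all hypothetical, chosen only to show how a proxy layer might walk every job through submit, prepare and execute regardless of the engine behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """Hypothetical job record; the field names are illustrative only."""
    code: str
    engine_type: str
    history: List[str] = field(default_factory=list)
    result: str = ""


class ComputationMiddleware:
    """Toy proxy that walks every job through submit -> prepare -> execute."""

    def __init__(self) -> None:
        # Maps an engine type to a callable that runs code on that engine.
        self._engines: Dict[str, Callable[[str], str]] = {}

    def register_engine(self, engine_type: str, runner: Callable[[str], str]) -> None:
        self._engines[engine_type] = runner

    def submit(self, job: Job) -> Job:
        # Phase 1: accept the request and reject unknown engine types early.
        if job.engine_type not in self._engines:
            raise ValueError(f"unsupported engine type: {job.engine_type}")
        job.history.append("submitted")
        return job

    def prepare(self, job: Job) -> Callable[[str], str]:
        # Phase 2: select the engine that will run the job.
        job.history.append("prepared")
        return self._engines[job.engine_type]

    def execute(self, job: Job, runner: Callable[[str], str]) -> Job:
        # Phase 3: run the code on the selected engine and keep the result.
        job.result = runner(job.code)
        job.history.append("executed")
        return job

    def run(self, job: Job) -> Job:
        # The caller sees one entry point; the phases stay engine-agnostic.
        self.submit(job)
        runner = self.prepare(job)
        return self.execute(job, runner)
```

In this sketch an upper application only calls `run`; adding a new engine means registering one more callable, without changing the caller.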

We are actively operating the Linkis community and look forward to continuously increasing community activity.

We propose to contribute the Linkis codebase to the Apache Software Foundation. We believe that bringing Linkis into the Apache Software Foundation and following the community-led development of the "Apache Way" will continuously improve project quality and community vitality.


Background

In today's complex, distributed environments, mature solutions have emerged for the communication, coordination and governance of application services, evolving from SOA to microservices, along with many practices, from ESB to Service Mesh, for decoupling services from one another.

However, things are different when an application service needs to communicate with the underlying engines. Engines are isolated from each other, and the tightly coupled client-server pattern is everywhere. Every upper application has to connect to and access the various underlying engines directly, and solve the "computation governance" problems on its own, including maintaining different client environments, submitting jobs, monitoring job status, fetching output, handling a large number of concurrent client instances, watching for bad jobs, adapting to engine version changes, etc.

What is missing is a common "computation middleware" layer between the numerous upper-layer applications and the countless underlying engines to handle all these "computation governance" affairs in a standardized, reusable way. That is why we started the Linkis project.

Firstly, Linkis reduces the complexity of connectivity. Instead of maintaining a variety of engine client environments, users now only need to install the Linkis client, or even just an HTTP client when using the REST interface. Routing a query to the desired cluster can be done by simply providing a tag.

Secondly, Linkis provides governance capabilities such as multi-tenancy, concurrency control, resource management, query validation, privilege enhancement and auditing.

Meanwhile, Linkis enables orchestration strategies such as routing, load balancing, active-active and hybrid computation across engines (some are still under development).
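As an illustration of routing a query by tag, here is a minimal sketch. It does not reflect the actual Linkis routing implementation: the `TagRouter` class, its methods, and the tag and cluster names are hypothetical and only show the idea that a caller supplies a tag and the middleware resolves the target cluster.

```python
from typing import Dict, Optional


class TagRouter:
    """Hypothetical router that maps a user-supplied tag to a cluster name."""

    def __init__(self, default_cluster: str) -> None:
        self._default = default_cluster
        self._routes: Dict[str, str] = {}

    def add_route(self, tag: str, cluster: str) -> None:
        # Register which cluster serves queries carrying this tag.
        self._routes[tag] = cluster

    def route(self, tag: Optional[str] = None) -> str:
        # Queries without a tag go to the default cluster.
        if tag is None:
            return self._default
        # An unknown tag is an error rather than a silent fallback.
        if tag not in self._routes:
            raise KeyError(f"no cluster registered for tag: {tag}")
        return self._routes[tag]
```

Raising on an unknown tag, instead of falling back to the default cluster, is a design choice of this sketch; a real deployment might prefer a different policy.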


Rationale

Linkis is built on a distributed microservice architecture with great scalability and extensibility. Its enhancements for high concurrency and fault tolerance make it more stable and reliable. It has already supported many production environments with a large number of daily jobs over a long period.

Linkis's microservices are divided into 3 groups: Computation Governance Services, Public Enhancement Services, and Microservice Governance Services.

  • The Computation Governance Services (CGS) group is responsible for the core process of job/request submission, preparation and execution, as well as lifecycle management, resource management, validation and orchestration.

  • The Public Enhancement Services (PES) group provides basic public functions, including job context sharing, material management and data source management, to serve other Linkis services and upper application systems.

  • The Microservice Governance Services (MGS) group includes customized Spring Cloud Gateway, Eureka and OpenFeign, to provide basic functions such as routing, service registration and discovery, and the RPC framework.

By providing multi-tenancy, high concurrency, job dispatching/management policies, unified resource control and orchestration, Linkis makes the submission, preparation and execution of computation jobs more flexible, reliable and controllable, and returns the results successfully. It can greatly reduce overall development, operation and maintenance costs, as well as architectural complexity.

With Linkis as the computation middleware, new upper-layer applications can be developed quickly by reusing the Linkis computation governance functions, as is done in the open source big data platform suite “WeDataSphere” (https://github.com/WeBankFinTech/WeDataSphere).

Linkis currently mainly supports OLAP and Streaming engines, and we are planning to support OLTP engines better. Containerization is also one of the important development directions of Linkis.


Initial Goal

- Migrate the existing codebase, website, and documentation to Apache-hosted infrastructure.

- Work with the infrastructure team to implement and approve our code review, build, and testing workflows in the context of the ASF.

- Incremental development and release under Apache guidelines.

- Grow and diversify the Linkis community in the Apache Way.


Current Status

Meritocracy

The Linkis project was started at WeBank and has been an open-source project on GitHub since July 2019. Linkis has been quickly adopted by many organizations: according to our sandbox application records, more than 500 organizations have tested Linkis, and according to spontaneous user feedback, dozens of them have put Linkis into production. These organizations span various industries, including banking, telecommunications, insurance, manufacturing, education, internet, etc.

Linkis already has contributors and users from different companies. We have set up the committer team and are constantly looking for potential new committers. New contributors are always welcome and are guided by existing committers. Users can get timely support from community IM groups and GitHub.


Community

Linkis now has 15 committers from 6 companies: WeBank, China Telecom, Kanzhun Ltd., iQIYI Inc., HONOR Mobile Phone, and Samoyed Digital. We have a developer IM group with more than 100 people from different organizations, and 9 user IM groups with more than 4,500 people.


Core Developers

The core developers of Linkis work in the big data teams of different companies, mainly at WeBank, since the project was initiated there.

- Shuai Di (WeBank)

- Qiang Yin (WeBank)

- Heping Wang (WeBank)

- Yongkun Yang (WeBank)

- Zhiyue Yang (WeBank)

- You Liu (WeBank)

- Xiaogang Wang (China Telecom)

- Hui Zhu (Kanzhun)

- Zhen Wang (iQiyi)

- Rong Zhang (Honor)


Releases

Linkis has released multiple versions as listed here: https://github.com/WeBankFinTech/Linkis/releases

We will follow the ASF guidelines more closely, and adopt the ASF source release process upon joining the incubator.


Code Reviews

Linkis’s code reviews are currently public on GitHub: https://github.com/WeBankFinTech/Linkis/pulls .


Alignment

As Linkis was built to address connectivity and other computation governance issues with various underlying engines, it depends on multiple ASF projects such as Spark, Flink, Hive and Hadoop. Linkis’s Engine Connector Manager service starts different Engine Connectors to connect to different underlying engines, providing computation governance abilities that benefit the usage and maintenance of these engines. Linkis will continue to expand the range of ASF engines it supports, such as HBase, Kylin, and more.


Known Risks

Orphaned Products

The risk of Linkis becoming an orphaned product is very low, because it is already a core infrastructure component in the production environments of dozens of companies' big data platforms, including large companies like WeBank, China Telecom, Ping An Insurance Company, Hikvision, etc. Hundreds of thousands of computation jobs are performed through Linkis in these companies every day. Developers from these companies are increasingly joining the Linkis community as contributors.

Linkis has had 12 major releases so far and has received 355 PRs from contributors, which indicates the activity and vitality of the Linkis community. Linkis is also the core component of the open source big data platform suite “WeDataSphere”, and even more users and developers are already active in this larger community. We look forward to further expanding and diversifying the community by joining Apache. We are also further improving our adherence to the community-led development pattern, and the standardization and transparency of community governance.

Inexperience with Open Source

Linkis’s core developers have been running Linkis as a community-oriented open source project for some time, and some of them already have experience working with other open source communities. The current Linkis user group of more than 4,500 people is also evidence of our commitment to and passion for operating the open source community.

Meanwhile, we’ve begun to refine our community governance efforts under the guidance of Apache mentors, and we’ll learn more about how to operate the open source community effectively and properly by following the Apache Way during our incubator journey.


Homogenous Developers

Most of the current core developers work at WeBank, where the Linkis project started. Developers from China Telecom, Kanzhun, iQiyi and Honor Mobile Phone have also been elected to the committer group and have already led the release of several versions of Linkis. The most recently nominated committer is from Samoyed Digital, in recognition of solid contributions to the Linkis data source management module.

Though the Linkis community may not be diverse enough yet, we are constantly looking for new contributors and potential committers to enhance the diversity of the community and the vitality of the project.


An Excessive Fascination with the Apache Brand

We acknowledge that the Apache brand would add a lot of value and reputation to Linkis, and would benefit cooperation and promotion on a global scale. However, our primary purpose in submitting Linkis to Apache is to build a more diverse and viable community and to gain stability for long-term development. We will also strictly follow the ASF's rules and policies under the guidance of the Incubator PMC.


Documentation

Documentation about Linkis can be found at https://github.com/WeBankFinTech/Linkis-Doc . The following links provide more information:

- Codebase on GitHub: https://github.com/WeBankFinTech/Linkis

- Issue Tracking: https://github.com/WeBankFinTech/Linkis/issues

- Releases: https://github.com/WeBankFinTech/Linkis/releases



Initial Source

https://github.com/WeBankFinTech/Linkis



External Dependencies

Back-end:



| Dependencies | License | Comment |
|---|---|---|
| caffeine | Apache 2.0 | |
| cglib | Apache 2.0 | |
| commons-beanutils | Apache 2.0 | |
| commons-codec | Apache 2.0 | |
| commons-collections | Apache 2.0 | |
| commons-dbcp | Apache 2.0 | |
| commons-exec | Apache 2.0 | |
| commons-io | Apache 2.0 | |
| commons-lang3 | Apache 2.0 | |
| commons-math3 | Apache 2.0 | |
| commons-net | Apache 2.0 | |
| commons-text | Apache 2.0 | |
| dozer-core | Apache 2.0 | |
| druid | Apache 2.0 | |
| fastjson | Apache 2.0 | |
| gson | Apache 2.0 | |
| guava | Apache 2.0 | |
| hadoop-auth | Apache 2.0 | |
| hadoop-client | Apache 2.0 | |
| hadoop-common | Apache 2.0 | |
| hadoop-hdfs | Apache 2.0 | |
| hadoop-yarn-client | Apache 2.0 | |
| hive-common | Apache 2.0 | |
| hive-exec | Apache 2.0 | |
| hive-jdbc | Apache 2.0 | |
| httpclient | Apache 2.0 | |
| httpmime | Apache 2.0 | |
| jackson-annotations | Apache 2.0 | |
| jackson-databind | Apache 2.0 | |
| jackson-module-scala | Apache 2.0 | |
| javacsv | LGPL | |
| jaxrs-ri | CDDL, GPL 1.1 | will remove |
| jersey-container-servlet | CDDL, GPL 1.1 | will remove |
| jersey-container-servlet-core | CDDL, GPL 1.1 | will remove |
| jersey-entity-filtering | CDDL, GPL 1.1 | will remove |
| jersey-json | CDDL, GPL 1.1 | will remove |
| jersey-media-json-jackson | CDDL, GPL 1.1 | will remove |
| jersey-media-multipart | CDDL, GPL 1.1 | will remove |
| jersey-server | CDDL, GPL 1.1 | will remove |
| jersey-servlet | CDDL, GPL 1.1 | will remove |
| jersey-spring3 | CDDL, GPL 1.1 | will remove |
| jetty-server | Apache 2.0, EPL 1.0 | |
| jetty-webapp | Apache 2.0, EPL 1.0 | |
| json4s-jackson | Apache 2.0 | |
| jsp-api | CDDL, GPL 2.0 | will remove |
| junit | EPL 1.0 | |
| libthrift | Apache 2.0 | |
| log4j-1.2-api | Apache 2.0 | |
| log4j-api | Apache 2.0 | |
| log4j-core | Apache 2.0 | |
| log4j-slf4j-impl | Apache 2.0 | |
| mockito-all | MIT | |
| mybatis-plus-boot-starter | Apache 2.0 | |
| mysql-connector-java | GPL 2.0 | will remove |
| netty-all | Apache 2.0 | |
| pagehelper | MIT | |
| poi-ooxml | Apache 2.0 | |
| protostuff-api | Apache 2.0 | |
| protostuff-core | Apache 2.0 | |
| protostuff-runtime | Apache 2.0 | |
| py4j | BSD 2-clause | |
| reactor-netty | Apache 2.0 | |
| reflections | BSD 2-clause | |
| scalacheck | BSD 3-clause | |
| scalacheck-shapeless | Apache 2.0 | |
| scala-compiler | Apache 2.0 | |
| scala-library | Apache 2.0 | |
| scalamock-scalatest-support | MIT | |
| scalap | Apache 2.0 | |
| scala-reflect | Apache 2.0 | |
| scalatest | Apache 2.0 | |
| slf4j-api | MIT | |
| spark-core | Apache 2.0 | |
| spark-hive | Apache 2.0 | |
| spark-repl | Apache 2.0 | |
| spark-sql | Apache 2.0 | |
| spark-testing-base | Apache 2.0 | |
| spoiwo | MIT | |
| spring-boot | Apache 2.0 | |
| spring-boot-actuator-autoconfigure | Apache 2.0 | |
| spring-boot-starter | Apache 2.0 | |
| spring-boot-starter-actuator | Apache 2.0 | |
| spring-boot-starter-aop | Apache 2.0 | |
| spring-boot-starter-cache | Apache 2.0 | |
| spring-boot-starter-jetty | Apache 2.0 | |
| spring-boot-starter-log4j2 | Apache 2.0 | |
| spring-boot-starter-quartz | Apache 2.0 | |
| spring-boot-starter-reactor-netty | Apache 2.0 | |
| spring-boot-starter-web | Apache 2.0 | |
| spring-cloud-commons | Apache 2.0 | |
| spring-cloud-config-client | Apache 2.0 | |
| spring-cloud-context | Apache 2.0 | |
| spring-cloud-gateway-core | Apache 2.0 | |
| spring-cloud-starter | Apache 2.0 | |
| spring-cloud-starter-config | Apache 2.0 | |
| spring-cloud-starter-gateway | Apache 2.0 | |
| spring-cloud-starter-netflix-eureka-client | Apache 2.0 | |
| spring-cloud-starter-netflix-eureka-server | Apache 2.0 | |
| spring-cloud-starter-openfeign | Apache 2.0 | |
| spring-core | Apache 2.0 | |
| spring-jdbc | Apache 2.0 | |
| spring-security-crypto | Apache 2.0 | |
| spring-test | Apache 2.0 | |
| spring-tx | Apache 2.0 | |
| spring-web | Apache 2.0 | |
| websocket-client | Apache 2.0, EPL 1.0 | |
| websocket-server | Apache 2.0, EPL 1.0 | |
| xlsx-streamer | Apache 2.0 | |
| xstream | BSD 3-clause | |



Front-end:



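| Dependencies | License |
|---|---|
| axios | MIT |
| highlight.js | BSD-3-Clause |
| iview | MIT |
| lodash | MIT |
| moment | MIT |
| monaco-editor | MIT |
| sql-formatter | MIT |
| svgo | MIT |
| vue | MIT |
| vue-i18n | MIT |
| vue-router | MIT |
| vuedraggable | MIT |
| vuescroll | MIT |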





Required Resources

Mailing List

Currently Linkis has no mailing list. The usual mailing lists are expected to be set up when entering incubation:

private@linkis.incubator.apache.org for PPMC discussions;

dev@linkis.incubator.apache.org for development discussions;

notification@linkis.incubator.apache.org for user notifications, and notifications from GitHub.


Git Repositories

Upon entering incubation, we request to move the existing repository from https://github.com/WeBankFinTech/Linkis to Apache infrastructure, such as https://github.com/apache/Incubator-Linkis.


Issue Tracking

The Linkis community would like to continue using GitHub Issues if possible.


Other Resources

Apache Jenkins


Source and Intellectual Property Submission Plan

Most of the current code is Apache 2.0 licensed and the copyright is assigned to WeBank. If the project enters the incubator, WeBank will transfer the source code and trademark ownership to the ASF via a Software Grant Agreement.


Initial Committers

- Shuai Di (shuaidi1024@gmail.com)

- Qiang Yin (enjoyyin91@gmail.com)

- Heping Wang (wpeace1212@gmail.com)

- Yongkun Yang (wimkunkun@gmail.com)

- Zhiyue Yang (zjyzy19920513@gmail.com)

- You Liu (liuyou181020@gmail.com)

- Deyi Hua (david_hua1996@hotmail.com)

- Le Bai (blgg931026@gmail.com)

- Xiaogang Wang (Adamyuanyuan@gmail.com)

- Hui Zhu (huashuizhuhui@gmail.com)

- Zhen Wang (wangzhen077@gmail.com)

- Rong Zhang (brian.rongzhang@gmail.com)

- Xiaohua Yi (yixiaohuamax@gmail.com)

- Ke Zhou (bleachzk@gmail.com)

- Jian Xie (Jackyxxie@gmail.com)


Affiliations

Shuai Di, Qiang Yin, Heping Wang, Yongkun Yang, Zhiyue Yang, You Liu, Deyi Hua, Le Bai, Ke Zhou and Jian Xie of the initial committers are employees of WeBank.

Xiaogang Wang of the initial committers is an employee of China Telecom.

Hui Zhu of the initial committers is an employee of Kanzhun.

Zhen Wang of the initial committers is an employee of iQiyi.

Rong Zhang of the initial committers is an employee of HONOR Mobile Phone.

Xiaohua Yi of the initial committers is an employee of Samoyed Digital.


Sponsors

Champion

Junping Du (ASF Member, IPMC Member), junping_du@apache.org


Nominated Mentors

Duo Zhang (ASF Member, IPMC Member), zhangduo@apache.org

Jerry Shao (ASF Member, IPMC Member), jshao@apache.org

Junping Du (ASF Member, IPMC Member), junping_du@apache.org

Lidong Dai (IPMC Member), lidongdai@apache.org

Shao Feng Shi (ASF Member, IPMC Member), shaofengshi@apache.org


Sponsoring Entity

We request the Apache Incubator to sponsor this project.

