Abstract

Fury is a high-performance, multi-language, and automatic serialization framework powered by JIT and zero-copy.

Proposal

Fury provides a fast and easy-to-use serialization framework for multiple languages with the following:

  • Efficient protocol:
    • Zero-copy: cross-language out-of-band serialization; a row format whose fields can be accessed directly without parsing.
    • Meta share: minimizes schema/tag overhead across multiple objects and serialization calls.
    • Multiple binary protocols: object graph, flat graph, and row format for different scenarios.
  • Efficient implementation:
    • Generate highly efficient serializer code at runtime to speed up serialization.
    • Use static codegen when dynamic codegen is not feasible.
  • Easy-to-use API:
    • No IDL; operates on language-native objects directly.
    • Supports polymorphism and (optionally) references.
    • 100% compatible with the JDK serialization API with a much faster implementation; a drop-in replacement for JDK/Kryo/Fst/Hessian that requires no code changes.
    • Supports GraalVM native image without meta config.
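
The "meta share" idea in the list above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the function names and the JSON layout are hypothetical and are not Fury's wire format or API. It shows the general technique of writing each distinct field layout once and letting every record refer to it by index.

```python
import json

def encode_with_shared_meta(records):
    """Encode dict records, writing each distinct field layout only once.

    Hypothetical helper for illustration; not part of Fury.
    """
    metas = []      # distinct field-name lists, in first-seen order
    meta_ids = {}   # field-name tuple -> index into metas
    rows = []
    for record in records:
        field_names = tuple(record)  # field names in insertion order
        if field_names not in meta_ids:
            meta_ids[field_names] = len(metas)
            metas.append(list(field_names))
        # Each row carries only a meta index plus the bare values.
        rows.append([meta_ids[field_names], [record[n] for n in field_names]])
    return json.dumps({"metas": metas, "rows": rows})

def decode_with_shared_meta(payload):
    """Rebuild the original records from the shared metas and the value rows."""
    doc = json.loads(payload)
    return [dict(zip(doc["metas"][meta_id], values))
            for meta_id, values in doc["rows"]]

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
payload = encode_with_shared_meta(records)
restored = decode_with_shared_meta(payload)
```

The saving grows with the number of records that share a layout, since field names are no longer repeated per record.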


Currently, Fury supports cross-language serialization among Java/Python/Golang/Rust/Scala/TypeScript. It is up to 170x faster than JDK serialization and is the fastest serialization framework in the jvm-serializers benchmark.

Background

Fury was developed at Ant Group in 2019 as a component of distributed computing engines and was open-sourced on GitHub in July 2023.

A number of systems at Ant Group, including big data (Flink/Spark), microservices (SOFA/Dubbo), and AI (online learning, search recommendation), use Fury for serialization every day. Alibaba uses Fury in many scenarios, too.

Rationale

Fury provides fast performance by generating highly efficient serializer code at runtime and easy-to-use serialization by operating on language-native objects directly.
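
As a rough illustration of runtime code generation, the sketch below builds a specialized encoder for a dataclass by generating Python source and compiling it with exec. This is a toy under stated assumptions, not Fury's JIT: generate_encoder, FORMAT_CODES, and the fixed-width binary layout are hypothetical names and choices made for the example.

```python
import struct
from dataclasses import dataclass, fields

# Hypothetical mapping from annotated field type to a struct format code.
FORMAT_CODES = {int: "q", float: "d", bool: "?"}

def generate_encoder(cls):
    """Generate and compile an encoder specialized for one dataclass."""
    field_list = fields(cls)
    # Little-endian, unpadded layout derived once from the class definition.
    layout = "<" + "".join(FORMAT_CODES[f.type] for f in field_list)
    args = ", ".join(f"obj.{f.name}" for f in field_list)
    # The generated function reads each field directly, with no per-call reflection.
    source = f"def encode(obj):\n    return pack({args})\n"
    namespace = {"pack": struct.Struct(layout).pack}
    exec(source, namespace)
    return namespace["encode"]

@dataclass
class Point:
    x: int
    y: float
    visible: bool

encode_point = generate_encoder(Point)
data = encode_point(Point(3, 1.5, True))
```

The cost of inspecting the class is paid once, when the encoder is generated; every later call runs straight-line code over a native object.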

We believe the performance provided by Fury will be increasingly useful for big data and serving systems compared to other frameworks, such as Kryo/Fst/Hessian/JDK/Pickle, as data volumes and the need for faster transfer/processing/loading continue to grow.

On the other hand, we think there can be an easier-to-use serialization programming model for building cross-language systems.

Today, new developers are forced to learn the IDL, schema compiler, and generated serialization API of Protobuf or FlatBuffers for every language involved, even to build simple applications. For advanced cross-language features such as circular/shared references and polymorphism, developers are forced to write tedious, cumbersome, and error-prone code, which becomes a disaster for deeply nested object graphs. The automatic cross-language serialization provided by Fury frees developers from these complexities.
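
To make the reference-handling burden concrete, here is a minimal sketch of the bookkeeping a developer has to write by hand when the serialization layer does not track references. The Node class and the flatten/rebuild helpers are hypothetical and exist only for this illustration; they are not Fury code.

```python
class Node:
    """A graph node that may be shared or may take part in a cycle."""
    def __init__(self, name):
        self.name = name
        self.children = []

def flatten(root):
    """Turn a Node graph into (name, child_indexes) rows, one per distinct node."""
    index_of = {}   # id(node) -> row index
    order = []      # nodes in first-visit order (keeps them alive while indexing)
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index_of:
            continue  # already recorded: a shared or circular reference
        index_of[id(node)] = len(order)
        order.append(node)
        stack.extend(node.children)
    return [(n.name, [index_of[id(c)] for c in n.children]) for n in order]

def rebuild(rows):
    """Recreate the graph, restoring shared and circular references by index."""
    nodes = [Node(name) for name, _ in rows]
    for node, (_, child_indexes) in zip(nodes, rows):
        node.children = [nodes[i] for i in child_indexes]
    return nodes[0]

a, b = Node("a"), Node("b")
a.children = [b, b]   # b is referenced twice (shared reference)
b.children = [a]      # b points back to a (circular reference)
rows = flatten(a)
copy = rebuild(rows)
```

Every object type that can appear in the graph needs this kind of index tracking on both the write and the read side, which is the effort an automatic framework is meant to remove.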

A detailed description of the design and target use cases can be found on the Fury blog.

Initial Goals

  • Build a more diverse community with contributors for multiple language implementations.
  • Facilitate the adoption and integration of Fury by ensuring its neutrality and longevity.
  • Get more feedback to standardize our protocols and provide binary compatibility.

Current Status

Meritocracy

Fury was inner-sourced within Ant Group by Chaokun in 2022. Weipeng, who is the maintainer of Fury NodeJS and Rust, joined later. Since being open-sourced in July 2023, Fury has gained strong interest from companies and individuals. We will continue supporting new contributors, and those who make quality contributions will be invited to become committers.

Community

Fury was inner-sourced last year, with its initial community built up inside Ant Group. Since Fury was open-sourced on GitHub in July 2023, we have put more effort into building the community. We have received positive feedback (link1, link2, link3) as well as through the issue list.

Users

Currently, Fury has a group of individual users, and organizations such as Alibaba/Vipshop are Fury users, too. Here are some of Fury’s use cases:

  • Ant Ray uses Fury for serialization on more than one million (100W+) CPU cores every day at Ant Group.
  • Ant Mars replaced its serialization with Fury, which increased task scheduling TPS by 2.5x and sped up data transfer by 4x.
  • Ant real-time graph computing systems replaced Kryo with Fury to speed up graph serialization.
  • A few companies replaced Kryo with Fury in Flink jobs for faster data transfer and state persistence.
  • Lindorm at Aliyun uses Fury for serialization between clients and servers.
  • Taobao Android app, which has 880 million monthly active users, is considering using Fury for IPC serialization between Android processes.
  • Vipshop replaced Protobuf with Fury, reducing the end-to-end latency by 30ms.

Developers

As for developers, although Fury has attracted 20+ developers in the last four months, we have only three core developers: Chaokun, Weipeng, and Mingyang.

This is indeed a risk; the community is not yet diverse or large enough. We expect to address this by attracting more contributors as the software evolves and by following The Apache Way.

The need for serialization is tremendous - it provides the potential for a large community.

Core Developers

  • Chaokun Yang: He founded this project, working at Ant Group. He is an open-source enthusiast. (GitHub ID: chaokunyang)
  • Weipeng Wang: He is the author of Fury NodeJS and Rust, working at Ant Group. He is a NodeJS & Rust wizard. (GitHub ID: wangweipeng2)
  • Mingyang Liu: He is the core developer of Fury C++. He is a PMC member of Apache Kvrocks. (GitHub ID: PragmaTwice)

Alignment

Fury can be used to speed up serialization for data transfer, state persistence, and task scheduling in Apache projects such as Dubbo, Spark, Flink, Fluo, and so on.

Known Risks

Project Name

The name Fury is short, easy to remember, and also indicates that Fury is fast. Ant Group has registered a trademark for Fury and will donate it to the ASF.

Orphaned Products

Fury is used widely in Ant Group; many core products use Fury every day. Developers in Ant Group will continue improving this project to provide better support for current and future requirements.

Besides, other organizations, such as Alibaba/Vipshop, also use Fury for their core products, and several of them have already contributed to Fury. Serialization is used widely; we believe the developer and user communities will continue to grow.

Inexperience with Open Source

The creator of Fury, Chaokun Yang, is an open-source enthusiast who has actively participated in the open-source community for over five years. He has contributed to many open-source projects, including Ray and Mars.

In addition, ASF Members tison and PJ Fanning also contribute to Fury and will serve as mentors of this project.

Length of Incubation

We expect to enter incubation within two months and graduate in about two years.

Homogenous Developers

Currently, Fury has only three core developers, but they are not homogenous: although Chaokun and Weipeng work at the same company, they know each other only through their common interest in Fury. Mingyang Liu joined the Fury community recently and has mainly contributed to the C++ part of Fury.

We don’t have enough diversity for now. It’s a risk, although we’re optimistic about future developer diversity. Since Fury was open-sourced, more than 20 developers have contributed. We will keep building community diversity following The Apache Way.

Reliance on Salaried Developers

Although Fury was created during working hours at Ant Group, Chaokun and Weipeng contribute to Fury in their spare time. They love the process of building such a versatile framework and the value it brings to all users and organizations, and they will continue to work on Fury even if they leave their current employer. Mingyang Liu also contributes to Fury in his spare time. We plan to attract more committers to address this risk.

Relationships with Other Apache Products

Fury can be integrated with many Apache projects for faster serialization:

  • Dubbo: Dubbo supports using Fury for RPC serialization.
  • Flink: Flink at Ant Group uses Fury to serialize DataStream record data, state, and tasks.
  • RocketMQ: rocketmq-streams uses Fury for serialization at Alibaba.
  • HBase: Alibaba has an HBase-compatible system called Lindorm that uses Fury for serialization between clients and servers.
  • Spark: Spark can use Fury to serialize RDD record data, state, task, and RPC messages.
  • Arrow: Fury provides a row format and automatic conversion from/to the Arrow columnar format, which can complement Arrow.

We believe such integration could also be applied to, and benefit, open-source Flink/HBase, and we plan to discuss upstreaming it with the Flink/HBase communities.

An Excessive Fascination with the Apache Brand

We believe The Apache Way and its neutrality, not only the brand, will help Fury grow. Multi-language serialization is a community-driven effort. A neutral organization will be better for the community than a single company in the long run.

Documentation

The documentation is hosted at https://www.furyio.org/ and generated from https://github.com/fury-project/fury-sites.

Initial Source

Fury was created in July 2019 and open-sourced in July 2023 under the Apache License 2.0.

Source and Intellectual Property Submission Plan

External Dependencies

Release Dependencies:

  • BSD Zero Clause (0BSD)
    • tslib:2.4.0
  • BSD-3-Clause license
    • numpy
    • cloudpickle
  • Python License
    • pickle5
  • MIT
    • chrono:0.4.31
    • lazy_static:1.4.0
    • quote:1.0.33
    • syn:2.0.39
    • proc_macro2:1.0.69
    • node-gyp:9.4.0
  • Apache-2.0
    • org.slf4j:slf4j-api:jar:1.7.30
    • com.google.guava:guava:jar:32.1.2-jre
    • org.codehaus.janino:janino:jar:3.1.10
    • org.javassist:javassist:jar:3.28.0-GA
    • org.apache.arrow:arrow-vector:jar:5.0.0
    • org.apache.arrow:arrow-memory-core:jar:5.0.0
    • org.apache.arrow:arrow-memory-unsafe:jar
    • https://github.com/abseil/abseil-cpp
    • cython >= 0.29.14
    • wheel
    • pyarrow 6.0.1


Test Dependencies:

  • Apache 2.0
    • org.testng:testng:jar:7.5.1
    • org.projectlombok:lombok:jar:1.18.30
    • com.github.olivergondza:maven-jdk-tools-wrapper:jar:0.1
    • org.apache.commons:commons-lang3:jar:3.12.0
    • com.esotericsoftware:kryo:jar:4.0.0
    • com.esotericsoftware:reflectasm:jar:1.11.3
    • com.esotericsoftware.minlog:minlog:jar:1.2
    • de.ruedigermoeller:fst:jar:2.57
    • org.apache.logging.log4j:log4j-api:jar:2.20.0
    • org.apache.logging.log4j:log4j-core:jar:2.20.0
    • org.apache.logging.log4j:log4j-slf4j-impl:jar:2.20.0
    • io.timeandspace:smoothie-map:jar:2.0.2
    • org.apache.avro:avro:jar:1.11.3
    • com.google.flatbuffers:flatbuffers-java:jar:2.0.3
    • com.google.protobuf:protobuf-java:jar:3.16.3
    • org.openjdk.jmh:jmh-core:jar:1.33
    • org.openjdk.jmh:jmh-generator-annprocess:jar:1.33
    • com.caucho:hessian:jar:4.0.63
    • io.protostuff:protostuff-core:jar:1.7.2
    • io.protostuff:protostuff-runtime:jar:1.7.2
    • com.alibaba.fastjson2:fastjson2:jar:2.0.34
    • jest-junit: 16.0.0
    • typescript: 4.8.4
    • https://github.com/apache/arrow
  • BSD-3-Clause license

Cryptography

N/A

Required Resources

Mailing lists


Subversion Directory

N/A

Git Repositories

Issue Tracking

The community would like to continue using GitHub Issues.

Other Resources

The community has already chosen GitHub Actions as its continuous integration tool.

Initial Committers

tison's comment: Although only three initial committers are listed above, PJ (who also contributes to Jackson) and I, as mentors, would participate in the development. Also, another podling that I mentored, OpenDAL, started with four initial committers and has so far invited nine committers (with a tenth days away) and two PPMC members, and has made eight releases (the ninth is in progress). From my experience with Fury's initial committers, I see several characteristics they share with OpenDAL's members. So I will invest effort to help this project grow within the ASF Incubator.

Sponsors

Champion:

tison [tison@apache.org]

Nominated Mentors:

tison [tison@apache.org]
PJ Fanning [fanningpj@apache.org]
Yu Li [liyu@apache.org]
Xin Wang [xinwang@apache.org]
Enrico Olivelli [eolivelli@apache.org]

Sponsoring Entity:

The Incubator

