Pulsar Proposal

Abstract

Pulsar is a highly scalable, low-latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication.

Proposal

Pub-sub messaging is a very common design pattern that is increasingly found in distributed systems powering Internet applications. These applications provide real-time services and need publish latencies of 5 ms on average and no more than 15 ms at the 99th percentile. At Internet scale, these applications require a messaging system with ordering, strong durability, and delivery guarantees. To meet the “five 9s” durability requirements of a production environment, messages have to be committed to multiple disks or nodes.

Pulsar was developed at Yahoo to address these specific requirements by providing a hosted service that supports millions of topics for multiple tenants. The current incarnation of Pulsar was open-sourced under the Apache License in September 2016 and is the direct evolution of systems developed at Yahoo since 2011.

We believe there is currently no other system that provides a multi-tenant hosted messaging platform capable of supporting a huge number of topics while maintaining strict guarantees for durability, ordering, and low latency. Current solutions would require running multiple individual clusters, with additional operational work and capacity overhead.

Since Pulsar was open-sourced, development has taken place exclusively on the public GitHub repository, and two major releases (1.15 and 1.16) have shipped, along with multiple minor ones. Several other companies have expressed interest in the project and its future direction.

Rationale

Pulsar is a platform built on top of several other Apache projects. In particular, Apache BookKeeper is used to store the data, and Apache ZooKeeper is used for coordination and metadata storage. Pulsar also interoperates out of the box with Apache Storm to provide an easy-to-use stream processing solution.

We want to establish a community beyond the initial core developers at Yahoo, and we believe that the Apache Software Foundation is a great fit and long-term home for Pulsar, as it provides an established process for community-driven development and decision making by consensus. This is exactly the model we want to adopt for future Pulsar development.

Initial Goals

The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. Furthermore, we plan for incremental development and releases in accordance with the Apache guidelines.

Current Status

Pulsar has been in service at large scale at Yahoo for more than two years. During this time, around 60 different applications have been integrated with Pulsar. Other companies are evaluating it as well and have been contributing code to the project.

Meritocracy

We value meritocracy and understand that it is the basis for forming an open community that encourages multiple companies and individuals to contribute and become invested in the project's future. We will encourage and monitor participation and make sure to extend privileges and responsibilities to all contributors.

Community

Through the interest demonstrated by Pulsar users at Yahoo, we have validated that a reliable hosted pub-sub messaging platform represents a very important building block for web-scale distributed applications. We believe that many companies can benefit from applying the same model and that bringing Pulsar to Apache will help the community grow stronger.

Core Developers

Pulsar was initially developed at Yahoo and has received significant contributions from Yahoo Japan. Since the project was open-sourced, developers from several external companies have also contributed.

Alignment

Pulsar builds upon other Apache projects such as ZooKeeper and BookKeeper, along with a number of other Apache libraries. We have already integrated with Storm, and we envision integrating with multiple other systems in the streaming and big data space.

Known Risks

Orphaned Products

Yahoo has been doing most of the development and, given that many internal platforms depend on Pulsar, it is heavily invested in the long-term success of the project. Yahoo has a long history of participating in open-source projects and has also been a long-time contributor to the Apache community.

Inexperience with Open Source

Many Pulsar contributors are already familiar with the open-source process, and several of them are committers on other Apache projects. We will be actively working with experienced Apache community members to improve our project.

Homogeneous Developers

The initial committers are employed by large companies including Yahoo, Yahoo! Japan, Salesforce and MercadoLibre. We hope to grow the community and to include additional committers based on their contributions to the project.

Reliance on Salaried Developers

It is expected that Pulsar development will occur both on salaried time and on volunteer time, after hours. The majority of initial committers are paid by their employers to contribute to this project. However, they are all passionate about the project, and we are confident that it will continue even if no salaried developers contribute to it.

Relationships with Other Apache Products

As mentioned in the Rationale section, Pulsar depends on BookKeeper and ZooKeeper and is closely integrated with Storm. There are ongoing efforts to integrate with other projects, such as Apache Spark. We look forward to collaborating with those communities, as well as other Apache communities.

An Excessive Fascination with the Apache Brand

We are applying to the Incubator process because we think it is the next logical step for the Pulsar project after open-sourcing the code in 2016. This proposal is not for the purpose of generating publicity. Rather, we want to create a very inclusive and meritocratic community outside the umbrella of a single company. Yahoo has a long-standing history of contributing to Apache projects, and the Pulsar developers and contributors understand the implications of making it an Apache project.

Documentation

Initial Source

The Pulsar codebase is currently hosted on GitHub: https://github.com/yahoo/pulsar. This is the exact codebase that we would migrate to the Apache Software Foundation.

Source and Intellectual Property Submission Plan

The Pulsar source code on GitHub is currently licensed under the Apache License v2.0, and the copyright is assigned to Yahoo. All contributions from external parties have been received under an Apache-style CLA. If Pulsar fulfills the conditions for becoming an Incubator project in the ASF, Yahoo will transfer ownership of the source code to the Apache Software Foundation via the Software Grant Agreement.

External Dependencies

To the best of our knowledge, all of Pulsar's dependencies are distributed under Apache-compatible licenses.

External dependencies licensed under Apache License 2.0:

Athenz, JCommander, HPPC (High Performance Primitive Collections for Java), FasterXML Jackson, Caffeine (async cache), Gson, Guava, Netty, DataSketches, Joda-Time, JNA (Java Native Access), lz4-java, AsyncHttpClient, Jetty, SnakeYAML

ASF Projects:

BookKeeper, ZooKeeper, Storm, Log4J, Commons (BeanUtils, CLI, Codec, Collections, Configuration, Digester, IO, Lang, Lang3, Logging)

Others:

Required Resources

Mailing lists

Git Repository

Issue Tracking

Initial Committers

Affiliations

Sponsors

Champion

Nominated Mentors

Sponsoring Entity

PulsarProposal (last edited 2017-05-17 02:11:01 by BryanCall)