PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks.
The PredictionIO platform consists of the following components:
The PredictionIO community also maintains a Template Gallery, a place to publish and download (free or proprietary) engine templates for different types of machine learning applications. The gallery is a complementary part of the project. At this point we exclude the Template Gallery from the proposal, as it has a separate set of contributors and we are not familiar with an Apache-approved mechanism to maintain such a gallery.
PredictionIO was started with a mission to democratize and bring machine learning to the masses.
Machine learning has traditionally been a luxury for big companies like Google, Facebook, and Netflix. There are ML libraries and tools scattered across the internet, but putting them all together as a production-ready infrastructure is a very resource-intensive task that is barely within reach of individuals or small businesses.
PredictionIO is a production-ready, full stack machine learning system that allows organizations of any scale to quickly deploy machine learning capabilities. It comes with official and community-contributed machine learning engine templates that are easy to customize.
As the usage of PredictionIO and the number of its contributors have grown larger and more diverse, we have sought an independent framework in which the project can keep thriving. We believe the Apache Software Foundation is a great fit. Joining Apache would ensure that tried and true processes and procedures are in place for the growing number of organizations interested in contributing to PredictionIO. PredictionIO is also a good fit for the Apache Software Foundation: it was built on top of several Apache projects (HBase, Spark, Hadoop). We are familiar with the Apache process and believe that the democratic and meritocratic nature of the foundation aligns with the project goals.
The initial milestones will be to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines, as well as growing our developer and user communities.
PredictionIO has undergone nine minor releases and many patches. PredictionIO is being used in production by Salesforce.com as well as many other organizations and apps. The PredictionIO codebase is currently hosted at GitHub, which will form the basis of the Apache git repository.
We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We intend to invite additional developers to participate. We will encourage and monitor community participation so that privileges can be extended to those that contribute.
Acceptance into the Apache foundation would bolster the already strong user and developer community around PredictionIO. That community includes many contributors from various other companies, and an active mailing list composed of hundreds of users.
The core developers of our project are listed in our contributors and initial PPMC below. Though many are employed at Salesforce.com, the list also includes engineers from ActionML and independent developers.
The ASF is the natural choice to host the PredictionIO project because the project's goal is to democratize machine learning by making it more easily accessible to every user and developer. PredictionIO is built on top of several top-level Apache projects, as outlined above.
PredictionIO has a solid and growing community. It is deployed in production environments by companies of all sizes to run various kinds of predictive engines.
In addition to contributing to the PredictionIO framework, the community is actively contributing new engines to the Template Gallery, as well as SDKs and documentation for the project. Salesforce is committed to utilizing and advancing the PredictionIO code base and supporting its user community.
PredictionIO has existed as a healthy open source project for almost two years and is the most starred Scala project on GitHub. All of the proposed committers have contributed to ASF and Linux Foundation open source projects. Several current committers on Apache projects and Apache Members are involved in this proposal and intend to provide mentorship.
The initial list of committers includes developers from several institutions, including Salesforce, ActionML, Channel4, and USC, as well as unaffiliated developers.
Like most open source projects, PredictionIO receives substantial support from salaried developers. PredictionIO development is partially supported by Salesforce.com, but there are many contributors from various other companies, and an active mailing list composed of hundreds of users. We will continue our efforts to ensure that stewardship of the project is independent of salaried developers by meritocratically promoting those contributors to committers.
PredictionIO relies heavily on top-level Apache projects such as Apache Spark, HBase, and Hadoop. However, it offers distinct functionality rather than just an abstraction: machine learning in a plug-and-play fashion.
Compared to Apache Mahout, which focuses on the development of a wide variety of algorithms, PredictionIO offers a platform to manage the whole machine learning workflow, including data collection, data preparation, modeling, deployment and management of predictive services in production environments.
PredictionIO is already a widely known open source project. This proposal is not for the purpose of generating publicity. Rather, the primary benefits to joining Apache are those outlined in the Rationale section.
PredictionIO boasts rich, live documentation, which is included in the code repo (docs/manual directory), built with Middleman, and publicly hosted at https://docs.prediction.io
Currently, the PredictionIO codebase is distributed under the Apache 2.0 License and hosted on GitHub: https://github.com/PredictionIO/PredictionIO
PredictionIO has the following external dependencies:
Upon acceptance to the incubator, we would begin a thorough analysis of all transitive dependencies to verify this information and introduce license checking into the build and release process by integrating with Apache RAT.
PredictionIO does not include cryptographic code. We utilize standard JCE and JSSE APIs provided by the Java Runtime Environment.
We request that the following resources be created for the project to use:
predictionio-private@incubator.apache.org (with moderated subscriptions)
predictionio-dev
predictionio-user
predictionio-commits
We will migrate the existing PredictionIO mailing lists.
The PredictionIO team would like to use Git for source control, due to our current use of GitHub.
git://git.apache.org/incubator-predictionio
https://predictionio.incubator.apache.org/docs/
PredictionIO currently uses the GitHub issue tracking system associated with its repository: https://github.com/PredictionIO/PredictionIO/issues. We will migrate to Apache JIRA.
JIRA PREDICTIONIO https://issues.apache.org/jira/browse/PREDICTIONIO
Andrew Purtell <apurtell at apache dot org>
Apache Incubator PMC