NLPCraft Apache Incubator Proposal

Abstract

NLPCraft is a Java-based library for adding a natural language interface to any application.

Proposal

NLPCraft (https://nlpcraft.org) provides an API and a runtime that allow adding a natural language interface to any application with REST APIs. NLPCraft is simple to use: define a semantic model and intents to interpret user input, securely deploy this model, and use the REST API to explore the data using natural language from any application. NLPCraft can work with any underlying data sources, devices, or services, public or private, and it has no hardware lock-in. Models and intents for NLPCraft can be developed in any JVM-based language such as Java, Scala, Kotlin, or Groovy. NLPCraft exposes REST APIs for integration with client applications.

Rationale

The initial impulse behind NLPCraft was a desire to have a modern NLP toolkit that is targeted squarely at enterprise developers rather than at academic interests. Most, if not all, current NLP projects in both the Python and Java ecosystems are geared heavily towards academic interests and concentrate on low-level functionality such as tokenization, lemmatization/stemming, named entity recognition (NER), part-of-speech tagging, co-referencing, etc. These are all important underlying mechanisms for modern NLP systems but largely represent a “bucket of bolts” rather than a cohesive toolkit that someone could efficiently use to build production-ready, enterprise-grade NLP applications.

NLPCraft is built on top of this low-level tooling and concentrates on a single mission: providing an easy-to-use API and a runtime for domain-specific natural language understanding and for translating it into actions. NLPCraft is closer in its functionality to Google DialogFlow, Amazon Alexa or Microsoft LUIS but takes a different approach. Instead of relying on ML/DL statistical approaches with massive pre-existing corpora and training phases, it uses modern semantic modeling and deterministic intent matching, which require none of the above. It introduces comprehensive intent and NER DSLs and familiar Java annotation-based integration.

NLPCraft also addresses the enterprise deployment concerns by cleanly separating REST endpoints from secure data model hosting. That allows applications to use NLPCraft to provide a secure natural language interface to enterprise private data sources.

Another unique feature of NLPCraft is its use of the model-as-a-code approach. Unlike Google DialogFlow, Amazon Alexa or Microsoft LUIS, it doesn’t require the use of proprietary online tools, custom languages and other tools that spread the logic across multiple systems and mediums. Everything in NLPCraft is code, including the models and intents, which significantly reduces the complexity of development.

Initial Goals

We have three initial goals that we plan on completing while in incubation: (1) move the current GitHub codebase to ASF infrastructure and integrate with ASF development process and practices, (2) grow and diversify the NLPCraft community; we are well aware that the current team is small but we aim to grow and expand it during the incubation, and (3) deliver the initial Apache release.

Current Status

NLPCraft has been under active development for the last 5 years and can be found at https://github.com/nlpcrafters

Meritocracy

We understand the role and importance of meritocracy in the ASF. We have practiced the same principle while working on NLPCraft for the last 5 years. The principle of strict meritocracy was introduced by one of the original project members (Nikita Ivanov, Apache Ignite PMC). We also understand that meritocracy isn’t a one-time decision but a process and a continuous practice within the community - something that we aim to establish during the incubation.

Community

The current community consists of the current project committers and a number of “friends and family members”. One of the driving goals of joining the ASF is to build and grow a healthy community of users and developers around this project to broaden its appeal and technical capabilities - well beyond those set by the initial project members.

Core Developers

The core developers of NLPCraft are Aaron Radzinski, Sergey Makov, Nikita Ivanov (Apache Ignite PMC), Dmitriy Monakhov and Sergey Kamov, all of whom have contributed ideas, code, or documentation. All core developers will be initial committers in the current proposal. Over the last 5 years, other developers have worked on the project in various capacities and we intend to ask them to contribute further to the project.

Alignment

There are several alignment points between NLPCraft and ASF:

  • NLPCraft is built around Apache Ignite and Apache OpenNLP, and we are intimately familiar with the development/release/discussion principles behind these projects and the Apache Way overall.

  • NLPCraft is already using the Apache 2.0 license.

  • We have been guided by an existing Apache member (Nikita Ivanov) on most of our internal processes such as the use of compatible licenses and consensus-based project management.

Known Risks

Project Name

The name “NLPCraft” is free from branding or any other conflicts to the best of our knowledge.

Orphaned Products

We don’t anticipate being at risk of becoming an orphaned product. Initial committers have worked on the project for years in their spare time and have proven their commitment to making this open-source project a success. All initial committers are committed to continuing to work on this project.

Inexperience With Open Source

NLPCraft has been licensed under an Apache 2.0 license since its inception. As mentioned before, we have enjoyed the guidance from the mentor of the Apache Ignite project (Nikita Ivanov) from the beginning.

Length Of Incubation

We are aware that the key challenge for us during the incubation will be community building. While we don’t have any artificial timeline for becoming a top-level project, community growth is an absolute priority for us during the incubation period.

Homogenous Developers

The proposed initial committers span 3 countries and 12 hours of time zone differences. They are all experienced in working in a geographically distributed environment.

Reliance On Salaried Developers

From its inception, NLPCraft has been a product of spare time, long nights and genuine interest in building something new and unique. We don’t expect reliance on salaried developers to be a problem for this project.

Relationship With Other ASF Projects

NLPCraft architecture is based heavily on Apache Ignite and Apache OpenNLP (and uses many other ASF projects in its implementation). We expect a strong and tight collaboration between NLPCraft and Ignite/OpenNLP projects.

Excessive Fascination With ASF Brand

We view the ASF less as a brand per se and more as a platform where we can find and attract like-minded open source developers, and together build and grow a healthy community around this project, further develop it with new ideas and widen its use.

Documentation

Documentation is available at https://nlpcraft.org/docs.html

Initial Source

The initial source for this project is a collection of GitHub repos at https://github.com/nlpcrafters

Source and IP Submission Plan

All initial committers will sign an ICLA with the ASF. The current NLPCraft project already uses the Apache 2.0 license, so we don’t expect any IP-related issues.

External Dependencies

The external dependencies all have Apache 2.0 compatible licenses.

Cryptography

NLPCraft provides an implementation of a Blowfish password hasher for encryption functionality. This implementation is based on “A Future-Adaptable Password Scheme” by Niels Provos and David Mazieres.

Required Resources

Mailing lists

We plan to use the following mailing lists:

Git & JIRA

We would like to continue using Git for source code management and we’d like to enable GitHub mirroring. We will need the following source repositories:

  • nlpcraft (main project)

  • nlpcraft-java-client (Java client)

  • nlpcraft-ui (test/management UI)

JIRA ID: NLPCraft

Initial Committers & Affiliation

List of initial committers in alphabetical order:

  • Aaron Radzinski (aradzinski at datalingvo dot com)

  • Dave Fisher (wave at apache dot org)

  • Dmitriy Monakhov (dmonakhov at datalingvo dot com)

  • Evans Ye (evansye at apache dot org)

  • Furkan Kamaci (furkankamaci at gmail dot com)

  • Konstantin Boudnik (cos at apache dot org)

  • Nikita Ivanov (mk61hacker at gmail dot com)

  • Paul King (paulk at asert dot com dot au)

  • Roman Shaposhnik (rvs at apache dot org)

  • Sergey Kamov (skhdlemail at gmail dot com)

  • Sergey Makov (hekate dot dev at gmail dot com)

Sponsors

Apache Champion

  • Konstantin Boudnik (cos at apache dot org)

Nominated Mentors

  • Konstantin Boudnik (cos at apache dot org)

  • Roman Shaposhnik (rvs at apache dot org)

  • Evans Ye (evansye at apache dot org)

  • Dave Fisher (wave at apache dot org)

  • Paul King (paulk at asert dot com dot au)

  • Furkan Kamaci (furkankamaci at gmail dot com)

Sponsoring Entity

  • Apache Incubator