Defining Apache Hadoop

This is a draft; please send feedback to the hadoop-general mailing list.

Excluding quoted text, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

This document aims to clarify the ways in which other organizations may choose to incorporate Apache Hadoop software into their products or services and, in particular, to provide guidance on appropriate naming styles for third-party software related to Apache Hadoop software.

Apache Product Naming

The source code of the Apache™ Hadoop® software is released under the Apache License, as is the source code for the many other Hadoop-related Apache products.

The trademark policy for all Apache Software Foundation (ASF) projects, including Hadoop, is defined by the Apache Trademark Policy. In particular, much like any other organization's trademark for software products, it is important to understand:

The key point is that the only products that may be called Apache Hadoop or Hadoop are the official releases by the Apache Hadoop project as managed by that Project Management Committee (PMC). It's also important to remember that HADOOP is a registered trademark of the Apache Software Foundation.

Derivative Works

All products that include the official Apache Hadoop artifacts, or that include artifacts that are in some way based on the source code used to generate these artifacts, are derivative works. The Apache License enumerates the licensing conditions you must comply with for such derivative works.

Products that are derivative works of Apache Hadoop are not Apache Hadoop, and may not call themselves versions of Apache Hadoop, nor Distributions of Apache Hadoop.

Derivative works may choose to declare that they are Powered by Apache™ Hadoop®. Please see our FAQ entry on Powered By naming styles.

There have been cases in the past where this policy has been unclear, and some products were given names like "XYZ distribution of Hadoop". Existing vendors of such derivative works have been required to change their product names to comply with the current Apache Trademark Policy; most are in the process of doing so. No other supplier of derivative works of Apache Hadoop may describe their products in such a way.

Domain Names

The use of the name Hadoop in domain names is covered by the Apache Third Party Domain Name Branding Policy.

Compatibility

Some products have been released that have been described as "compatible" with Hadoop, even though parts of the Hadoop codebase have either been changed or replaced. The Apache™ Hadoop® developer team are not a standards body: they do not qualify such (derivative) works as compatible. Nor do they feel constrained by the requirements of external entities when changing the behavior of Apache Hadoop software or related Apache software.

The key point is this: the Apache Hadoop codebase defines what Apache Hadoop is, so only that codebase can both assert that it is 100% compatible with Apache Hadoop and back that assertion up implicitly. Apache Hadoop is compatible with Apache Hadoop, because it is Apache Hadoop.

Other entities may claim that other products (including derivative works) are compatible with Apache Hadoop. The Apache Hadoop development team is not a standards body, and cannot confirm or deny such assertions. All that we can say is "there is no official certification that a product is compatible with Hadoop, other than when a release of the Apache source tree is declared a new release of Apache Hadoop itself".

For background on this, please consult the email thread Defining Hadoop Compatibility in the hadoop-general list, bearing in mind that the participants in the discussion are engineers, not lawyers or trademark experts, so their opinions cannot be considered normative.

Examples

Here are some example naming/branding options and their issues. "Automotive Joe" is an entirely fictional character and bears no resemblance to any individual or company.

INAPPROPRIATE: Automotive Joe's Hadoop

Bad. Hadoop is used in a product name, which infringes on Apache's Hadoop mark. Additionally, there are no "versions of Hadoop," except those that Apache releases. What Joe is selling is something "powered by Hadoop", or a derivative work, but it isn't Hadoop. Finally, the acronym "AJH" is derived from Hadoop. Even if the product title is changed, that acronym is likely to be infringing.

INAPPROPRIATE: Automotive Joe's Hadoop Distribution

There is only one distribution of Hadoop: Apache Hadoop. Everything else is a derivative work.

Yes, we know about CDH, but that acronym has been grandfathered in; its product description no longer describes the product as a distribution of Hadoop. It is a distribution "that includes Apache Hadoop".

INAPPROPRIATE: "Automotive Hadoop(TM)" by Joe's Automotive

Bad: Unless this is for a wrench or other product completely unrelated to computer software, this is a clear infringement on Apache's Hadoop registered mark.

INAPPROPRIATE: Camshaft: it's a Hadoop for the Automotive industry

It's good that Joe has created his own product name and brand, but saying "a Hadoop" is trouble. If the product does contain Hadoop-related artifacts, then it breaks the trademark rules. If it doesn't contain ASF code, then it falls foul of the generic trademark problem: the ASF doesn't want its product names to become generic terms, and will send a note reminding Joe of their rights and obligations.

APPROPRIATE: Camshaft: Joe's datamining solution for the Automotive industry

Good: it defines a new product "Camshaft", and opts to use the Apache Hadoop brand to emphasize its heritage. The marketing text sells the product.

APPROPRIATE: Automotive Joe's "Hadoop for Automotive Engineers"

Good: provided it credits Apache properly inside, this appears to be a good book title. Furthermore, because it's the "Automotive Joe" book series, and not "Automotive Joe's Hadoop" series, the series doesn't infringe anything. Please see our FAQ entry on using Apache marks in book titles.

INAPPROPRIATE: Automotive Joe: Hadooping the motor industry

Bad: This is into the world of generic trademarks again. Hadoop is a noun, not a verb or an adjective.

INAPPROPRIATE: Crankshaft: Automotive Joe's complete rewrite of Hadoop

Bad, because it isn't Apache Hadoop. A claim of "rewrite" is hard to justify; it is better to discuss the features.

Better to say "Crankshaft: a Big Data engine and filesystem that resembles Apache Hadoop"

INAPPROPRIATE: Automotive Joe's Hadoop filesystem

Hadoop does support multiple filesystems, and has a reasonably stable interface for them. Example filesystems include Amazon S3, POSIX-compatible native filesystems, IBM's GPFS, HP IBRIX and others. For this reason, the only "Hadoop" filesystem that should use Hadoop in its brand name is "HDFS", the Hadoop Distributed File System, but any other filesystem is welcome to declare its support for Hadoop.

The Apache Hadoop project has attempted to define the behavior of HDFS, and so inform other filesystem implementors what they need to do to integrate with Hadoop, with tests for this in the Hadoop source tree and Apache Bigtop. Following the HCFS (Hadoop Compatible File System) work will help you integrate with Hadoop, but it does not grant any rights to product naming, or the right to state categorically that your filesystem is totally compatible with the behavior of HDFS expected by programs.

APPROPRIATE: Automotive Joe's JoeFS filesystem, with support for Apache Hadoop

This is good because it creates its own filesystem brand (which is wider than just Hadoop), shows how easy it is to switch to it, and states some clear benefits of using JoeFS rather than HDFS.

APPROPRIATE: Crankshaft: Joe's filesystem and MapReduce Engine

Provided that it really is the Apache 0.21 distribution's JARs that run against the Gearbox filesystem, this seems good. If Automotive Joe's development team has had to make changes (rather than just add a new filesystem support JAR), then the derivative work naming rules will kick in.

INAPPROPRIATE: Automotive Joe's Crankshaft: 100% compatible with Hadoop

Bad, because "100% compatible" is a meaningless statement. Even Apache releases have regressions: cases where versions are incompatible *even when the Java interfaces don't change*. A statement about compatibility ought to be qualified, for example: "Certified by Joe's brother Bob as 100% compatible with Apache Hadoop(TM)". In the US, the marketing team may be able to get away with the "100% compatible" claim, but in some EU countries, putting that statement on your web site makes a claim that residents can demand the vendor either justify or take down.

OK: Automotive Joe's Crankshaft: like Apache Hadoop only faster

This could be a defensible statement, though saying "3.4X faster" isn't. Yes, your app may have terasorted on your 20-node cluster faster than the published benchmarks for Hadoop 0.20.1, but remember that many of the big Hadoop users don't publicise the fact, let alone their benchmarks, and terasorting isn't the primary role of their clusters. Furthermore, Hadoop is evolving, and your statements may soon be invalid.

Finally, criticising Hadoop is not polite. If you ever need to work with the Hadoop developers, this isn't a good way to start building a relationship.

INAPPROPRIATE: Automotive Joe's Hadoop to Go!

No Hadoop-as-a-Service offering can use Hadoop in its product name. Notice how Amazon call theirs "Elastic MapReduce"? That lets them create their own brand, and stops them trying to trade off the Apache Hadoop brand.

INAPPROPRIATE: Automotive Joe's Hadoop Storage Infrastructure

There are a number of filesystems that work with Apache Hadoop as well as the location-aware HDFS: anyone is free to implement the Hadoop filesystem interfaces, and so provide a binding to their new or existing filesystem. However, this doesn't mean that the vendor can use Hadoop in the product name.

Ask First: Automotive Joe's Hadoop Party Fest

Review the Apache third-party events page and talk to the Apache Conference Planning team (concom@) before using Hadoop in conference titles.

Ask First: Automotive Joe's Certification Program for Apache Hadoop

Talk to the ASF's VP, Brand Management, at trademarks@ first.

TALK TO THE ASF FIRST: Press Release Automotive Joe: donating their crankshaft technologies to Hadoop

It's not appropriate to make press releases about Apache software unless you have the permission of the ASF Press Team. They can advise you on content, and will help you co-ordinate a press release from the ASF too. Announcing some contribution or milestone in an Apache project without co-ordinating it with them isn't a good way to collaborate with the ASF.

Additionally, it's only when code is checked in to the source tree that something becomes part of Hadoop. Until then, it's a JIRA issue that runs a serious risk of being ignored unless it is compelling enough to the entire Hadoop community. To get your patches in:

PLEASE TALK TO THE ASF FIRST: Press Release Automotive Joe's crankshaft technologies now support Apache Hadoop

It is, of course, perfectly reasonable to make a press release about your own software, as long as you don't imply any endorsement from the Apache Hadoop project.

Even so, the ASF Press Team love to be involved in such announcements, as they can provide some advice and co-ordinate your announcement with matching announcements from the ASF itself.

Defining Hadoop (last edited 2014-09-13 10:41:13 by SteveLoughran)