Products that include Apache Hadoop or derivative works and Commercial Support
The following companies provide products that include Apache Hadoop, a derivative work thereof, commercial support, and/or tools and utilities related to Hadoop.
Please see Defining Hadoop for the Apache Hadoop project's copyright, naming, trademark and compatibility policies.
This listing is provided as a reference only. No endorsements are given or implied.
The sole products that can be called a release of Apache Hadoop come from apache.org. Some companies release or sell products that include the official Apache Hadoop release files, and/or their own and other useful tools. Other companies or organizations release products that include artifacts built from modified or extended versions of the Apache Hadoop source tree. Such derivative works are not supported by the Apache Team: all support issues must be directed to the suppliers themselves.
The Apache Software Foundation strongly encourages users of Hadoop, in any form, to get involved in the Apache-hosted mailing lists. Even though you may only get support through the supplier of any derivative work of Apache Hadoop, by participating in the Hadoop user and developer lists, you can become an active part of the Hadoop community. Your needs may be addressed in future versions of the code, and you will be able to get in touch with many other users of the technology.
Entries are listed alphabetically by company name.
Amazon offers a version of Apache Hadoop on their EC2 infrastructure, sold as Amazon Elastic MapReduce.
Cascading - Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on an Apache Hadoop cluster. Cascading 2.0 is Apache-licensed.
Cloudera distributes a platform of open source Apache projects called Cloudera's Distribution including Apache Hadoop, or CDH. In addition, Cloudera offers its enterprise customers a family of products and services that complement the open-source Apache Hadoop platform. These include comprehensive training sessions, architectural services and technical support for Hadoop clusters in development or in production. Cloudera serves a wide range of customers including retail, government, financial services, healthcare, life sciences, digital media, advertising, networking and telephony enterprises.
- Cloudspace has been a web technology consulting company since 1996. Cloudspace uses Apache Hadoop to scale client and internal projects on Amazon's EC2 and bare metal architectures.
- Datameer Analytics Solution (DAS) is a Hadoop-based solution for big data analytics that includes data source integration, storage, an analytics engine and visualization.
- DAS Log File Aggregator is a plug-in to DAS that makes it easy to import large numbers of log files stored on disparate servers.
Data Mine Lab is a London-based consultancy developing solutions based on Hadoop, Mahout, HBase and Amazon Web Services. Data Mine Lab uses a combination of cloud computing, MapReduce, columnar databases and open source Business Intelligence tools to develop solutions that add value to their customers' businesses and the data they collect.
A Debian package of Apache Hadoop is available. Please see the Debian Wiki on Hadoop.
- Greenplum HD offers two products based on Apache Hadoop for big data analytics. Available in Community and Enterprise editions, Greenplum HD software provides a complete platform, including installation, training, and global support. In addition, the Greenplum HD Module combines Hadoop and the Greenplum Database in one purpose-built Data Computing Appliance. Greenplum HD makes Hadoop faster, more dependable, and easier to use.
Twelve technology companies have partnered with Greenplum to offer additional business intelligence, data transfer, and other technology capabilities on top of Greenplum’s HD products. The Greenplum HD partners include Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware.
- The Greenplum HD Enterprise Edition is based on technology from MapR Technologies.
- Major contributors to Apache Hadoop and dedicated to working with the community to make Apache Hadoop more robust and easier to install, manage, use, integrate and extend.
Provides Hortonworks Data Platform Powered by Apache Hadoop, which is a 100% open source distribution of Apache Hadoop. Version 1 is based upon Hadoop-0.20.205 and Version 2 will be based upon Hadoop-0.23.
Provider of expert technical support, training and partner-enablement services for both end-user organizations and technology vendors.
- HStreaming offers real-time stream processing and continuous advanced analytics built into Hadoop. Scales to 1000+ cluster nodes and processes millions of events per second.
- Single Hadoop platform for both stream and batch processing jobs with common code base including Apache Pig. Compatible with all major Hadoop distributions.
Available as free community edition, enterprise edition, and cloud service.
IBM offers a derivative version of Apache Hadoop that IBM supports on IBM JVMs on a number of platforms/operating systems. Its IBM BigInsights product (http://www-01.ibm.com/software/data/infosphere/biginsights/) is built upon Apache Hadoop.
Impetus' LADAP system is built for large enterprises and Websites to effectively derive intelligence out of raw data from discrete sources. With LADAP, an in-depth analysis can be undertaken on data from many different sources including social networks, to find out the patterns and structures within it. More info about LADAP @Impetus
The HCM (Hadoop Cluster Management) tool is a solution that automates the cluster setup and management activities, thus reducing the overall time, cost and effort required for setting up and managing Hadoop clusters. More info about HCM @Impetus
With a strong focus, established thought leadership and open source contributions in the area of Big Data analytics and consulting services, Impetus uses its Global Delivery Model to help technology businesses and enterprises evaluate and implement solutions tailored to their specific context, without being biased towards a particular solution. More info about BigData @Impetus
Distributes Karmasphere Studio for Hadoop, which allows cross-version development and management of Apache Hadoop jobs in a familiar integrated development environment.
- Another Apache project using Hadoop to build scalable machine learning algorithms like canopy clustering, k-means and many more.
MapR sells a high-performance MapReduce framework based on Apache Hadoop that includes the standard ecosystem components. A significant amount of re-engineering of the file system and the MapReduce components allows significantly higher performance than standard Hadoop while eliminating Hadoop's single points of failure (the NameNode and JobTracker) and allowing full read-write access to the cluster file store via NFS.
Nutch - Apache Nutch: flexible web search engine software
Pentaho – Open Source Business Intelligence
- Pentaho provides a complete, end-to-end open-source BI alternative to proprietary offerings like Oracle, SAP and IBM.
- Offers an easy-to-use, graphical ETL tool that is integrated with Apache Hadoop for managing data and coordinating Hadoop related tasks in the broader context of your ETL and Business Intelligence workflow.
- Provides Reporting and Analysis capabilities against big data in Hadoop.
Learn more at http://www.pentaho.com/hadoop.
Provides Pervasive DataRush, a parallel dataflow framework which improves performance of Apache Hadoop and MapReduce jobs by exploiting fine-grained parallelism on multicore servers.
Platform Computing provides an enterprise-class MapReduce solution for Big Data Analytics with high scalability and fault tolerance. Platform MapReduce provides unique scheduling capabilities and its architecture is based on almost two decades of distributed computing research and development. Based on the same low-latency distributed architecture deployed in the leading financial institutions on Wall Street, the solution meets the needs of the most demanding enterprise customers. It comes with comprehensive GUI management tools, commercial support is available for HDFS, and the solution also supports other distributed file systems.
- Provides consulting services around Apache Hadoop and Apache HBase, along with large-scale search using Apache Lucene, Apache Solr, and Elastic Search.
Runs the popular search-hadoop.com search service.
Talend - The Open Source Integration Company
Talend Integration Suite MPx includes support for Apache Hadoop's distributed file system (HDFS) that provides high throughput access to application data.
- Supports Apache Hive for data summarization and ad-hoc querying.
Think Big Analytics - Flexible Data Solution Services
Think Big Analytics offers expert consulting services specializing in Apache Hadoop, MapReduce and related data processing architectures.
- Offers 1-week Brainstorm Workshops and 6-week Deployment Iterations delivered collaboratively with its customers.
Tresata - Big Data Analytics Platform for the Financial Services Industry
- Big Data as a Service Platform built for data used in the Financial Services Industry
- Financial Industry's first software platform architected from the ground up on Hadoop
- Analyzes structured (transactions, balances) and unstructured (online, mobile, voice, social) data
- Data storage, processing, analytics and visualization all done on Hadoop