Differences between revisions 301 (2013-12-14 05:30:08, size 8619, editor ChenHe) and 302 (2014-01-13 09:49:44, size 8932)

Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that the framework handles node failures automatically.
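The idea of dividing work into small fragments, each of which may be executed or re-executed after a failure, can be sketched in a single process. This is a hypothetical illustration, not Hadoop code: `RetryingTaskRunner`, `split`, and `runAll` are invented names, a thrown exception stands in for a node failure, and a retry in the same JVM stands in for rescheduling the fragment on another node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: not part of Hadoop's API.
final class RetryingTaskRunner {
    // Divide the input into independent fragments of at most `size` elements.
    static <T> List<List<T>> split(List<T> input, int size) {
        List<List<T>> fragments = new ArrayList<>();
        for (int start = 0; start < input.size(); start += size) {
            int end = Math.min(start + size, input.size());
            fragments.add(new ArrayList<>(input.subList(start, end))); // copy, not a view
        }
        return fragments;
    }

    // Run `task` on every fragment; a failed attempt (an exception) is retried,
    // standing in for re-executing the fragment on another node.
    static <I, O> List<O> runAll(List<I> fragments, Function<I, O> task, int maxAttempts) {
        List<O> results = new ArrayList<>();
        for (I fragment : fragments) {
            RuntimeException lastFailure = null;
            O output = null;
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    output = task.apply(fragment);
                    done = true;
                } catch (RuntimeException e) {
                    lastFailure = e; // record the failure and try again
                }
            }
            if (!done) {
                throw new IllegalStateException(
                        "fragment failed after " + maxAttempts + " attempts", lastFailure);
            }
            results.add(output);
        }
        return results;
    }
}
```

Because each fragment depends only on its own input, re-running it yields the same result, which is what makes transparent re-execution safe.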


General Information

  • HBase, a Bigtable-like structured storage system for Hadoop HDFS

  • Apache Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.

  • Hive, a data warehouse infrastructure that allows SQL-like ad hoc querying of data (in any format) stored in Hadoop

  • ZooKeeper is a high-performance coordination service for distributed applications.

  • Hama, a distributed computing framework similar to Google's Pregel, based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.

  • Mahout, scalable machine learning algorithms using Hadoop

  • Hadoop Compatible FileSystems (HCFS)

  • Apache Gora (http://gora.apache.org), an open source framework that provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores, and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

User Documentation

Setting up a Hadoop Cluster

Tutorials

MapReduce

The MapReduce algorithm is the foundational algorithm of Hadoop, and understanding it is critical.
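As a minimal sketch of the model, the word count below simulates the map, shuffle, and reduce phases in one process using plain Java collections. It deliberately does not use Hadoop's API (the real framework exposes `Mapper` and `Reducer` classes in `org.apache.hadoop.mapreduce` and distributes tasks across a cluster); `WordCountSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of MapReduce word count.
final class WordCountSketch {
    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) out.add(new SimpleEntry<>(token, 1));
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) intermediate.addAll(map(line)); // each line is an independent map task
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

In a real cluster the map calls run in parallel on different nodes, and the shuffle moves each key's values to the node running its reducer; the sketch keeps the same three-phase structure without the distribution.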

Contributed parts of the Hadoop codebase

  • These are independent modules that are in the Hadoop codebase but are not yet tightly integrated with the main project.

Developer Documentation



FrontPage (last edited 2014-10-24 12:27:46 by DevopamMittra)