Differences between revisions 305 and 306
Revision 305 as of 2014-01-24 05:44:31
Size: 8993
Editor: ArpitAgarwal
Comment:
Revision 306 as of 2014-10-24 12:27:46
Size: 9178
Comment: added a tutorial link for hadoop installation on Apple/Macintosh OSX (Lion)
 * [[http://www.slideshare.net/devopam/hadoop-on-osx/| Running Hadoop on Mac OSX (Multi-Node Cluster)]] Tutorial on how to set up a multi-node Hadoop cluster on Macintosh OSX (Lion).

Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are automatically handled by the framework.

General Information

  • HBase, a Bigtable-like structured storage system for Hadoop HDFS

  • Apache Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.

  • Hive, a data warehouse infrastructure that allows SQL-like ad hoc querying of data (in any format) stored in Hadoop

  • ZooKeeper is a high-performance coordination service for distributed applications.

  • Hama, a distributed computing framework similar to Google's Pregel, based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.

  • Mahout, scalable machine learning algorithms using Hadoop

  • Hadoop Compatible FileSystems (HCFS)

  • Apache Gora, an open source framework that provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

User Documentation

Setting up a Hadoop Cluster

Tutorials

MapReduce

MapReduce is the foundational programming model of Hadoop, and understanding it is critical to using the framework well.
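To make the model concrete, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) using the classic word-count example. It is plain Python and does not use any Hadoop API; the function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are made up for illustration. On a real cluster the framework, not the application, would distribute the map and reduce calls across nodes and re-run any that fail.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent fragment of work. Because map_words has no
    # side effects, a fragment can safely be executed again if a node fails.
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The map and reduce functions only see their own inputs, which is what allows a framework such as Hadoop to schedule them on any node and to repeat them after a failure without changing the result.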

Contributed parts of the Hadoop codebase

  • These are independent modules that are in the Hadoop codebase but are not yet tightly integrated with the main project.

Developer Documentation


CategoryHomepage

FrontPage (last edited 2014-10-24 12:27:46 by DevopamMittra)