This HADOOP2 space was migrated from the old Hadoop wiki. Please check https://cwiki.apache.org/confluence/display/HADOOP for the current information.

Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, where an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are automatically handled by the framework.
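
As a concrete illustration of the fragments-of-work model described above, here is a minimal sketch of the classic word-count job, written against the org.apache.hadoop.mapreduce API. The class names (WordCount, TokenizerMapper, IntSumReducer) and the use of args[0] and args[1] for the input and output paths follow the well-known tutorial example; treat them as placeholders to adapt to your own job.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each mapper processes one fragment (input split) of the data,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: the framework groups all values for the same word,
  // and the reducer sums them into a final count.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A compiled jar of this class would typically be submitted to the cluster with the hadoop jar command, e.g. hadoop jar wordcount.jar WordCount <input dir> <output dir>, where the input and output directories live in HDFS.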

General Information

Related-Projects

User Documentation

Setting up a Hadoop Cluster

Tutorials

MapReduce

MapReduce is the foundational programming model of Hadoop, and understanding it is essential.

Contributed parts of the Hadoop codebase

These are independent modules that live in the Hadoop codebase but are not yet tightly integrated with the main project.

Developer Documentation

Related Resources
