Overview of Hadoop

Hadoop is a collection of code libraries and programs useful for creating very large distributed systems. Much of the code was originally part of the Nutch search engine project.

Hadoop includes the following parts:

  • conf, an assortment of classes for handling key-value pairs used in system configuration. A HadoopMapReduce job is described with an XML Job Configuration File (JobConfFile); a small configuration sketch follows this list.

  • DFS, the Hadoop Distributed Filesystem.

  • io, an assortment of IO-related classes. Includes a compressed UTF8 string implementation, code for performing external sorts, and a "poor-man's B-Tree" implementation for looking up items in large key-value sets (see the SequenceFile sketch after this list).

  • ipc, a fast and easy remote procedure call system.

  • HadoopMapReduce, a distributed job allocation system built on top of DFS. It employs a MapReduce-like programming model; a word-count sketch appears at the end of this page.
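
To make the conf package concrete, here is a minimal sketch of its key-value handling, assuming the org.apache.hadoop.conf.Configuration class found in later Hadoop releases. The resource file name and property keys below are hypothetical and used only for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfDemo {
      public static void main(String[] args) {
        // A Configuration is a set of key-value pairs, loaded from XML
        // resource files and overridable in code.
        Configuration conf = new Configuration();

        // Layer an additional XML configuration file on top of the defaults.
        // "my-job.xml" is a hypothetical file name used only for illustration.
        conf.addResource(new Path("my-job.xml"));

        // Set and read values programmatically; the property names below
        // are illustrative, not ones defined by this page.
        conf.set("example.job.name", "demo job");
        int tasks = conf.getInt("example.map.tasks", 2);  // 2 is the fallback value

        System.out.println("job name  = " + conf.get("example.job.name"));
        System.out.println("map tasks = " + tasks);
      }
    }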

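The io package can be illustrated with SequenceFile, a flat file of binary key-value pairs (MapFile, its indexed variant, is presumably the "poor-man's B-Tree" referred to above). The sketch below uses early, since-deprecated constructors, so the exact signatures are assumptions about the version in use; the output path is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class IoDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("counts.seq");   // hypothetical output path

        // Write a few key-value pairs (Text key, IntWritable value).
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, file, Text.class, IntWritable.class);
        writer.append(new Text("apple"), new IntWritable(3));
        writer.append(new Text("banana"), new IntWritable(7));
        writer.close();

        // Read the pairs back in the order they were written.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        Text key = new Text();
        IntWritable value = new IntWritable();
        while (reader.next(key, value)) {
          System.out.println(key + " = " + value);
        }
        reader.close();
      }
    }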
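
The MapReduce-like programming model is easiest to see in the classic word-count job, sketched below. This sketch is written against the later org.apache.hadoop.mapreduce API rather than the JobConf-based interfaces contemporary with this page, so the class names are assumptions about a newer Hadoop release; the model is the same either way: a map step that emits (word, 1) pairs and a reduce step that sums them.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map step: emit (word, 1) for every token in the input.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce step: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation of map output
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in DFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in DFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }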