Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are handled automatically by the framework.
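The Map/Reduce paradigm described above can be illustrated with a small word-count sketch. This is not the Hadoop API: it runs in a single Python process, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration. It only shows how input is split into independent fragments, mapped to key/value pairs, grouped by key, and reduced.

```python
from collections import defaultdict


def map_phase(fragment):
    """Map step: emit a (word, 1) pair for each word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(fragments):
    """Run the three steps in sequence over a list of text fragments.

    Each fragment is mapped independently of the others, which is what
    would let a real cluster run (or re-run) the work on any node.
    """
    pairs = []
    for fragment in fragments:
        pairs.extend(map_phase(fragment))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop moves data"]))
```

Because each call to `map_phase` depends only on its own fragment, a failed fragment could simply be mapped again elsewhere; in this sketch that property is only implied, since everything runs locally.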
Official Apache Hadoop Website: download, bug-tracking, mailing-lists, etc.
Overview of Apache Hadoop
Distributions and Commercial Support for Hadoop (RPMs, Debs, AMIs, etc)
PoweredBy, a list of sites and applications powered by Apache Hadoop
GettingStartedWithHadoop (lots of details and explanation)
QuickStart (for those who just want it to work now)
Command Line Options for the hadoop shell script
Troubleshooting: what to do when things go wrong
Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster) (tutorial on installing, configuring and running Hadoop on a single machine)
HowToConfigure Hadoop software
Performance: getting extra throughput
Hadoop Windows/Eclipse Tutorial: how to set up and configure a Hadoop development cluster for Windows and Eclipse
HBase, a Bigtable-like structured storage system for Hadoop HDFS
Apache Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.
Hive, a data warehouse infrastructure that allows SQL-like ad hoc querying of data (in any format) stored in Hadoop
ZooKeeper is a high-performance coordination service for distributed applications.
HadoopStreaming (Useful for using Hadoop with other programming languages)
DistributedLucene, a proposal for a distributed Lucene index in Hadoop
MountableHDFS, Fuse-DFS & other Tools to mount HDFS as a standard filesystem on Linux (and some other Unix OSs)
HDFS-APIs in Perl, Python, PHP, etc.
Chukwa, a data collection, storage, and analysis framework
HDFS-RAID, erasure coding in HDFS
Roadmap, listing release plans.
Jira usage guidelines
Nutch Hadoop Tutorial (Useful for understanding Hadoop in an application context)
IBM MapReduce Tools for Eclipse (out of date; use the Eclipse plugin in MapReduce/Contrib instead)
The Hadoop IRC channel is #hadoop at irc.freenode.net.
Using Spring and Hadoop (Discussion of possibilities to use Hadoop and Dependency Injection with Spring)
Hama, a Distributed Matrix Computational Package based on Hadoop Map/Reduce
Heart, a Planet-Scale RDF Data Store and a Distributed Processing Engine
Mahout, scalable Machine Learning algorithms using Hadoop
Live Hadoop: a three-node, distributed Hadoop cluster running on an OpenSolaris live CD
Grid Engine integration: Oracle Grid Engine product documentation on the built-in Hadoop integration
SGE Integration: a guide to tight integration of Hadoop with Sun Grid Engine
Hadoop Tutorial Series: learn core Hadoop concepts progressively, with hands-on experiments using the Cloudera Virtual Machine
Hadoop distributed file system: a Hadoop connector for fast data transfer between Hadoop and Aster Data's MPP data warehouse