Differences between revisions 1 and 2
Revision 1 as of 2007-03-13 22:22:57
Size: 1842
Editor: pool-70-16-0-4
Comment:
Revision 2 as of 2007-03-20 02:33:05
Size: 2970
Editor: pool-70-16-0-4
Comment:
This page is a collection of information that is useful for new developers. Some of it will eventually need to move to the Hadoop Wiki, but I am putting it here first as I assemble it. Please feel free to add, comment, and make corrections.

Steve

To new developers: if you want to start developing on Nutch, begin by reading the Hadoop source code. Hadoop is the platform that Nutch is implemented on, so to understand how Nutch works you also need to understand Hadoop.

What are the Hadoop primitives and how do I use them? Why are they there (what functionality do they add over regular primitives)?

These primitives implement the Hadoop Writable interface (or WritableComparable), which gives Hadoop control over how the objects are serialized. If you look at the higher-level Hadoop file objects such as ArrayFile, you will see that they use the same interfaces for serialization. Using these primitive types means that values are serialized in the same way as in higher-order data structures such as MapFile.
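
As a rough illustration of what implementing Writable involves, the sketch below round-trips a small record through bytes. It is not Hadoop code: the `Writable` interface here is a local stand-in declaring the same two methods as `org.apache.hadoop.io.Writable`, so the example runs without Hadoop on the classpath, and `PageCount` and `WritableSketch` are hypothetical names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// declared here only so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type: a page URL and a counter.
class PageCount implements Writable {
    String url = "";
    int count;

    // No-argument constructor, so an empty instance can be created and then filled.
    PageCount() {}

    PageCount(String url, int count) {
        this.url = url;
        this.count = count;
    }

    // The object itself decides how its fields are written...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(count);
    }

    // ...and reads them back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        count = in.readInt();
    }
}

class WritableSketch {
    // Serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    // Fill an existing instance from previously serialized bytes.
    static void fromBytes(Writable w, byte[] bytes) throws IOException {
        w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes(new PageCount("http://example.com/", 3));
        PageCount copy = new PageCount();
        fromBytes(copy, bytes);
        System.out.println(copy.url + " " + copy.count);
    }
}
```

Because every type exposes the same `write` and `readFields` pair, a container only needs to know that it holds Writables in order to store and reload them, which is the uniformity described above.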

How does the Hadoop implementation of MapReduce work?

  1. First you need a JobConf. This class contains all the relevant information for the job. Information that you need to make sure the JobConf includes:

  2. Then you need to submit your job to Hadoop to be run. This is done by calling JobClient.runJob. JobClient.runJob submits the job and handles receiving status updates back from it. It starts by creating an instance of JobClient, then pushes the job toward execution by calling JobClient.submitJob.

  3. JobClient.submitJob handles splitting the input files and generating the MapReduce tasks.
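
The steps above cover how a job is submitted. To give a feel for what the resulting tasks compute, here is a minimal in-memory word count in plain Java. It is only a conceptual sketch and uses no Hadoop classes: `MapReduceSketch` and its `map`, `reduce` and `runJob` methods are hypothetical names, and this `runJob` merely imitates the split, map, group-by-key and reduce sequence inside a single process.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // Map phase: turn one input split (here, one line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that were emitted for the same key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: map every split, group the pairs by key,
    // then call reduce once per key.
    static Map<String, Integer> runJob(List<String> splits) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String split : splits) {
            for (Map.Entry<String, Integer> pair : map(split)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("nutch uses hadoop", "hadoop runs jobs");
        System.out.println(runJob(splits));
        // prints {hadoop=2, jobs=1, nutch=1, runs=1, uses=1}
    }
}
```

In Hadoop the splits are processed by separate tasks, possibly on different machines, and the grouping step happens between the map and reduce tasks; the sketch collapses all of that into one loop to keep the data flow visible.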

How do I open Nutch's data files?

You will need to interact with Nutch's files using Hadoop's MapFile and SequenceFile classes. This simple code sample shows opening a file and reading the values.

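import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

// Assumed to exist in the surrounding method: fs (a FileSystem), seqFile (the
// name of the MapFile directory), conf (a Configuration) and out (a PrintStream
// such as System.out). The method must also declare or handle IOException.
MapFile.Reader reader = new MapFile.Reader(fs, seqFile, conf);
try {
    // Ask the file which key and value types it stores.
    Class keyC = reader.getKeyClass();
    Class valueC = reader.getValueClass();

    // Create one key and one value; next() refills them on every call.
    WritableComparable key = (WritableComparable) keyC.newInstance();
    Writable value = (Writable) valueC.newInstance();

    // next() returns false once the end of the file is reached.
    while (reader.next(key, value)) {
        out.println(key);
        out.println(value);
    }
} catch (Exception e) {
    e.printStackTrace();
    out.println("Exception occurred. " + e);
} finally {
    // Release the underlying file handles.
    reader.close();
}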

Tutorials

Getting_Started (last edited 2015-02-24 01:35:10 by LewisJohnMcgibbney)