...

  • A plugin.xml file that tells Nutch about your plugin.
  • A build.xml file that tells Ant how to build your plugin.
  • The source code of your plugin in the directory structure recommended/src/java/org/apache/nutch/parse/recommended/[Source_Here].

Plugin.xml

Your plugin.xml file should look like this:

...

In order to build it, change to your plugin's directory where you saved the build.xml file (probably [YourCheckoutDir]/src/plugin/recommended), and simply type:

ant

If the build succeeds, Ant prints a long stream of output that ends with a BUILD SUCCESSFUL message.

...

We'll need to create two files for unit testing: a page to test against, and a class to do the testing. Again, let's assume the plugin directory of your checkout is [YourCheckoutDir]/src/plugin and that your plugin lives under it. Create the directory recommended/data, and in it make a new file called recommended.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">

<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>recommended</title>
    <meta name="generator" content="TextMate http://macromates.com/">
    <meta name="author" content="Ricardo J. Méndez">
    <meta name="recommended" content="recommended-content"/>
    <!-- Date: 2007-02-12 -->
</head>
<body>
    Recommended meta tag test.
</body>
</html>

This file contains the meta tag we're currently parsing for, with the value recommended-content. After that gratuitous bit of free publicity for my current favorite editor, let's move on to the testing class.
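To make concrete what the test will be checking, here is a minimal standalone sketch that pulls the content of a named meta tag out of an HTML string. It is not Nutch code and Nutch does not work this way: the class and method names (MetaTagSketch, metaContent) are hypothetical, and the regular expression assumes the name attribute comes before content and that both are double-quoted, as in the page above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real plugin is driven by Nutch's HTML parser.
final class MetaTagSketch {

  // Returns the content attribute of <meta name="NAME" content="...">, if present.
  // Assumption: name precedes content and both attributes use double quotes.
  static Optional<String> metaContent(String html, String name) {
    Pattern p = Pattern.compile(
        "<meta\\s+name=\"" + Pattern.quote(name) + "\"\\s+content=\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(html);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String html =
        "<head><meta name=\"recommended\" content=\"recommended-content\"/></head>";
    System.out.println(metaContent(html, "recommended").orElse("(none)"));
  }
}
```

A regex is used here only to keep the sketch free of dependencies; it is far too brittle for real-world HTML.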

...

Create a new tree structure, this time for the test code, for example recommended/src/test/org/apache/nutch/parse/recommended/[Test_Source_Here]. There you'll create a file called TestRecommendedParser.java:

package org.apache.nutch.parse.recommended;

import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.parse.ParseUtil;
import org.apache.nutch.protocol.Content;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.util.NutchConfiguration;

import java.io.*;

import junit.framework.TestCase;

/**
 * Loads test page recommended.html and verifies that the recommended
 * meta tag has recommended-content as its value.
 */
public class TestRecommendedParser extends TestCase {

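  // Prefer the test.data system property when the build sets it;
  // otherwise fall back to a hard-coded path (adjust it to your checkout).
  private static final File testDir =
    new File(System.getProperty("test.data",
                                "/work/nutch-1.2/src/plugin/recommended/data"));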

  public void testPages() throws Exception {
    pageTest(new File(testDir, "recommended.html"), "http://foo.com/",
             "recommended-content");

  }


  public void pageTest(File file, String url, String recommendation)
    throws Exception {

    String contentType = "text/html";
    InputStream in = new FileInputStream(file);
    ByteArrayOutputStream out = new ByteArrayOutputStream((int)file.length());
    byte[] buffer = new byte[1024];
    int i;
    while ((i = in.read(buffer)) != -1) {
      out.write(buffer, 0, i);
    }
    in.close();
    byte[] bytes = out.toByteArray();
    Configuration conf = NutchConfiguration.create();

    Content content =
      new Content(url, url, bytes, contentType, new Metadata(), conf);
    ParseResult parseResult = new ParseUtil(conf).parseByExtensionId("parse-html",content);
    Metadata metadata = parseResult.get(url).getData().getContentMeta();
    assertEquals(recommendation, metadata.get("Recommended"));
    // Compare with equals(); != on strings only compares object references.
    assertFalse("somesillycontent".equals(metadata.get("Recommended")));
  }
}

As you can see, this code parses the document, looks up the Recommended item in the contentMeta object (which RecommendedParser saved there), and verifies that its value is recommended-content.
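The read loop in pageTest closes its input stream only when no exception is thrown along the way. As a minimal sketch, assuming Java 7 or later is available, the same loop can be moved into a helper that uses try-with-resources so the stream is always closed. The names ReadFileSketch and readFully are hypothetical and not part of Nutch.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Nutch: reads a whole file into a byte array.
final class ReadFileSketch {

  static byte[] readFully(File file) throws IOException {
    // try-with-resources closes both streams even if read() throws.
    try (InputStream in = new FileInputStream(file);
         ByteArrayOutputStream out = new ByteArrayOutputStream((int) file.length())) {
      byte[] buffer = new byte[1024];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```

With such a helper, the body of pageTest could obtain its bytes with a single call, for example byte[] bytes = ReadFileSketch.readFully(file);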

...

Now add some lines to the build.xml file located in the [YourCheckoutDir]/src/plugin/recommended directory, so that at a minimum its contents are:

<?xml version="1.0"?>

<project name="recommended" default="jar">

  <import file="../build-plugin.xml"/>

  <!-- for junit test -->
  <mkdir dir="${build.test}/data"/>
  <copy file="data/recommended.html" todir="${build.test}/data"/>

</project>

These lines will copy the test data to the proper directory for testing.

...