...
- A plugin.xml file that tells nutch about your plugin.
- A build.xml file that tells ant how to build your plugin.
- The source code of your plugin in the directory structure recommended/src/java/org/apache/nutch/parse/recommended/[Source_Here].
Plugin.xml
Your plugin.xml file should look like this:
...
In order to build it, change to your plugin's directory where you saved the build.xml file (probably [YourCheckoutDir]/src/plugin/recommended), and simply type

    ant

Hopefully you'll get a long string of text, followed by a message telling you of a successful build.
...
We'll need to create two files for unit testing: a page we'll do the testing against, and a class to do the testing with. Again, let's assume your plugin directory is [YourCheckoutDir]/src/plugin and that your test plugin is under that directory. Create the directory recommended/data, and under it make a new file called recommended.html:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <html lang="en">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <title>recommended</title>
      <meta name="generator" content="TextMate http://macromates.com/">
      <meta name="author" content="Ricardo J. Méndez">
      <meta name="recommended" content="recommended-content"/>
      <!-- Date: 2007-02-12 -->
    </head>
    <body>
      Recommended meta tag test.
    </body>
    </html>

This file contains the meta tag we're currently parsing for, with the value recommended-content. After that gratuitous bit of free publicity for my current favorite editor, let's move on to the testing class.
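
Before involving Nutch at all, it can help to confirm that the test page really carries the tag. The following is only a quick sanity check, not how the plugin or Nutch's HTML parser works: the class name MetaTagCheck and the method metaContent are made up for illustration, and the regular expression assumes the name attribute comes before content, as it does in the page above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaTagCheck {
  // Assumption: attributes appear in the order name, then content.
  private static final Pattern META = Pattern.compile(
      "<meta\\s+name=\"([^\"]+)\"\\s+content=\"([^\"]*)\"\\s*/?>",
      Pattern.CASE_INSENSITIVE);

  /** Returns the content of the first meta tag with the given name, or null if none matches. */
  public static String metaContent(String html, String name) {
    Matcher m = META.matcher(html);
    while (m.find()) {
      if (m.group(1).equalsIgnoreCase(name)) {
        return m.group(2);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String html = "<head>\n"
        + "<meta name=\"generator\" content=\"TextMate http://macromates.com/\">\n"
        + "<meta name=\"recommended\" content=\"recommended-content\"/>\n"
        + "</head>";
    System.out.println(metaContent(html, "recommended")); // prints recommended-content
  }
}
```

If this prints null for your own page, the fixture is the problem rather than the plugin.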
...
Create a new tree structure, this time for the test code, for example recommended/src/test/org/apache/nutch/parse/recommended/[Test_Source_Here]. There you'll create a file called TestRecommendedParser.java:

    package org.apache.nutch.parse.recommended;

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import junit.framework.TestCase;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.ParseResult;
    import org.apache.nutch.parse.ParseUtil;
    import org.apache.nutch.protocol.Content;
    import org.apache.nutch.util.NutchConfiguration;

    /**
     * Loads test page recommended.html and verifies that the recommended
     * meta tag has recommended-content as its value.
     */
    public class TestRecommendedParser extends TestCase {

      // Uses the test.data system property when it is set; otherwise falls
      // back to a hardcoded path, which you should adjust to your checkout.
      private static final File testDir = new File(
          System.getProperty("test.data",
              "/work/nutch-1.2/src/plugin/recommended/data"));

      public void testPages() throws Exception {
        pageTest(new File(testDir, "recommended.html"), "http://foo.com/",
            "recommended-content");
      }

      public void pageTest(File file, String url, String recommendation)
          throws Exception {
        String contentType = "text/html";

        // Read the whole test page into memory.
        InputStream in = new FileInputStream(file);
        ByteArrayOutputStream out = new ByteArrayOutputStream((int) file.length());
        try {
          byte[] buffer = new byte[1024];
          int i;
          while ((i = in.read(buffer)) != -1) {
            out.write(buffer, 0, i);
          }
        } finally {
          in.close();
        }
        byte[] bytes = out.toByteArray();

        // Parse the page with parse-html, which runs the HTML parse filters.
        Configuration conf = NutchConfiguration.create();
        Content content =
            new Content(url, url, bytes, contentType, new Metadata(), conf);
        ParseResult parseResult =
            new ParseUtil(conf).parseByExtensionId("parse-html", content);

        // The value saved by RecommendedParser ends up in the content metadata.
        Metadata metadata = parseResult.get(url).getData().getContentMeta();
        assertEquals(recommendation, metadata.get("Recommended"));
        // Compare by value; != on strings would only compare references.
        assertFalse("somesillycontent".equals(metadata.get("Recommended")));
      }
    }

As you can see, this code first parses the document, then looks up the Recommended item in the contentMeta object (which we saved in RecommendedParser) and verifies that it is set to the value recommended-content.
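
One detail worth keeping in mind when checking metadata values: in Java, equals compares the characters of two strings, while == and != only compare object references. The sketch below is standalone and does not use Nutch. The class StringCompare and the helper sameValue are hypothetical names, and the string built with new String merely stands in for a value returned by a parser.

```java
public class StringCompare {
  /** Null-safe comparison of two strings by value. */
  public static boolean sameValue(String expected, String actual) {
    return expected == null ? actual == null : expected.equals(actual);
  }

  public static void main(String[] args) {
    // A distinct String object holding the same characters as the literal.
    String fromParser = new String("recommended-content");

    // Reference comparison: true here even though the text is identical.
    System.out.println(fromParser != "recommended-content"); // prints true

    // Value comparison: what a test normally wants.
    System.out.println(sameValue("recommended-content", fromParser)); // prints true
    System.out.println(sameValue("somesillycontent", fromParser));    // prints false
  }
}
```

A negative check written with != can therefore pass for the wrong reason, so assertions on metadata are safer when they go through equals (as assertEquals does).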
...
Now add some lines to the build.xml file located in the [YourCheckoutDir]/src/plugin/recommended directory, so that at a minimum its contents are:

    <?xml version="1.0"?>
    <project name="recommended" default="jar">

      <import file="../build-plugin.xml"/>

      <!-- for junit test -->
      <mkdir dir="${build.test}/data"/>
      <copy file="data/recommended.html" todir="${build.test}/data"/>

    </project>

These lines will copy the test data to the proper directory for testing.
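
Once the page is copied, a test can look it up without an absolute path. The sketch below assumes the shared build script passes the copied directory to the test JVM as the test.data system property; that is the property name that appears in the test class, but whether your build sets it is an assumption to verify. TestDataDir and resolve are hypothetical names used only for illustration.

```java
import java.io.File;

public class TestDataDir {
  /**
   * Resolves the fixture directory from the test.data system property,
   * falling back to the given path when the property is not set.
   */
  public static File resolve(String fallback) {
    return new File(System.getProperty("test.data", fallback));
  }

  public static void main(String[] args) {
    // Fallback "data" matches the plugin's own data directory when run from there.
    File page = new File(resolve("data"), "recommended.html");
    System.out.println(page.getPath() + " exists: " + page.exists());
  }
}
```

Running the class from the plugin directory with and without -Dtest.data=... shows which location a test would read from.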
...