Adding a New Language to Nutch

If you want the Nutch search interface in your own language, the steps below should help. They come from searching the web and digging through some of the source code.

  • Unzip Nutch 1.0 to any folder
  • Translate the .properties files that you find in src/web/locale/org/nutch/jsp.
  • For each file, make sure that you have your own version whose name ends in _<langcode>.properties, e.g. _fa.properties. OmegaT is an excellent translation memory program that helps keep terminology consistent.
  • Make a folder src/web/include/<langcode> with a file header.xml. This file needs to be translated as well.
  • Make a folder src/web/pages/<langcode> and copy the .xml files from the English folder and then translate them. In search.xml look for the line:
<input type="hidden" name="lang" value="fa"/>

Change the value of lang to the code of the language you are adding (e.g. fa).
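To see why the _<langcode> suffix on the .properties files matters, here is a minimal sketch. It assumes the pages load their strings through Java's standard ResourceBundle lookup. The class and method names (BundleFileName, resourceFor) are made up for illustration and are not part of Nutch.

```java
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper, not part of Nutch.
class BundleFileName {
    // Returns the classpath resource that Java's default properties lookup
    // derives from a bundle base name and a language code.
    static String resourceFor(String baseName, String langCode) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        // e.g. "org.nutch.jsp.search" + "fa" -> "org.nutch.jsp.search_fa"
        String bundleName = control.toBundleName(baseName, Locale.forLanguageTag(langCode));
        // Dots become slashes and the ".properties" suffix is appended.
        return control.toResourceName(bundleName, "properties");
    }

    public static void main(String[] args) {
        // prints org/nutch/jsp/search_fa.properties
        System.out.println(resourceFor("org.nutch.jsp.search", "fa"));
    }
}
```

If a translated file does not follow this naming, the lookup would most likely fall back to the untranslated base file.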

  • Add your language to src/web/include/footer.html
  • Look in build.xml in the base directory and find the lines that look like the following. Add an entry for your language:
    <antcall target="generate-locale">
      <param name="doc.locale" value="fa"/>
    </antcall>

Here fa is the code of the language you are adding.
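Before running the build, it can help to check that nothing from the steps above was forgotten. The following is only a sketch of such a check. LocaleFileCheck is a hypothetical helper, not part of Nutch. It assumes the paths named above and treats any .properties file without an underscore in its name as an untranslated base file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: lists files from the steps above
// that are still missing for one language code.
class LocaleFileCheck {
    static List<String> missing(Path nutchHome, String lang) throws IOException {
        List<String> missing = new ArrayList<>();

        // The translated header and search page.
        String[] required = {
            "src/web/include/" + lang + "/header.xml",
            "src/web/pages/" + lang + "/search.xml"
        };
        for (String rel : required) {
            if (!Files.isRegularFile(nutchHome.resolve(rel))) {
                missing.add(rel);
            }
        }

        // Assumption: a name without '_' is a base bundle that needs a _<lang> sibling.
        String bundleDir = "src/web/locale/org/nutch/jsp";
        Path dir = nutchHome.resolve(bundleDir);
        if (Files.isDirectory(dir)) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.properties")) {
                for (Path file : files) {
                    String name = file.getFileName().toString();
                    if (name.contains("_")) {
                        continue; // already a translation for some language
                    }
                    String base = name.substring(0, name.length() - ".properties".length());
                    String wanted = base + "_" + lang + ".properties";
                    if (!Files.isRegularFile(dir.resolve(wanted))) {
                        missing.add(bundleDir + "/" + wanted);
                    }
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        String lang = args.length > 1 ? args[1] : "fa";
        for (String path : missing(home, lang)) {
            System.out.println("missing: " + path);
        }
    }
}
```

Run it with the Nutch base directory and the language code as arguments. It does not look inside footer.html or build.xml, so those two edits still need a manual check.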

  • In the Nutch base directory, run:
ant generate-docs
  • search.jsp seems to need some changes to behave as users would expect: the original appears to let the browser's language take precedence over the language the user selected. In src/web/jsp/search.jsp, after out.flush() at about line 160, add the following:
  // Decide which locale to use: a language selected by the user
  // takes precedence over the browser's locale.
  Locale ourLocale = null;
  if (!queryLang.equals("")) {
    ourLocale = new Locale(queryLang);
    language = queryLang;
  } else {
    ourLocale = request.getLocale();
  }

Then change the line:

<i18n:bundle baseName="org.nutch.jsp.search"/>

to:

<i18n:bundle baseName="org.nutch.jsp.search" locale="<%=ourLocale%>"/>
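The intent of the search.jsp change can be tried outside the servlet container. This is only a sketch: LocaleChoice and choose are made-up names, and the null check is an extra precaution that the snippet above does not have.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the locale decision, not part of Nutch.
class LocaleChoice {
    // A language selected by the user wins; otherwise use the browser's locale.
    static Locale choose(String queryLang, Locale browserLocale) {
        if (queryLang != null && !queryLang.isEmpty()) {
            return new Locale(queryLang); // same constructor as in the snippet above
        }
        return browserLocale;
    }

    public static void main(String[] args) {
        // A selected language overrides an English browser: prints fa
        System.out.println(choose("fa", Locale.ENGLISH).getLanguage());
        // Nothing selected, so the browser decides: prints en
        System.out.println(choose("", Locale.ENGLISH).getLanguage());
    }
}
```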
  • Now we are ready to build it:
ant war
  • Copy the .war file to your servlet container's webapp directory. If everything went well, you will see your language code at the bottom of the page. Select it, and the search interface will come back with the localisation you just added.