Nutch Scoring

This page is dedicated to Scoring implementations within Apache Nutch. The language used in this document is intended to reflect that used within the Nutch community, so vocabulary may vary over time and different words may be used interchangeably to refer to the same thing. If you find a discrepancy in this document, please let us know.

Introduction

Amongst other things, Apache Nutch is pluggable and modular, with extensible interfaces: Parse, Index and ScoringFilter can all use custom implementations. This document explains the basics of scoring in Apache Nutch: what scoring is, where it takes place within the crawl cycle, which extension points are available, some examples, and open development issues.

What Scoring is and what it means in Nutch

A scoring filter manipulates the scoring variables in CrawlDatum and in the resulting search indexes. Filters can be chained in a specific order to provide multi-stage scoring adjustments.
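The chaining idea can be illustrated with a minimal, self-contained sketch. Note that `SimpleScoringFilter`, `applyChain`, `QUERY_PENALTY` and `CAP` are hypothetical names invented for this illustration; this is not the Nutch ScoringFilter interface, which has several methods and operates on CrawlDatum objects rather than on a bare float. The sketch only shows why the order of the chain matters.

```java
import java.util.Arrays;
import java.util.List;

public class ScoringChainDemo {

    /** Hypothetical, simplified stand-in for a scoring filter (not the Nutch API). */
    interface SimpleScoringFilter {
        float adjust(String url, float score);
    }

    /** Illustrative stage: halve the score of URLs that carry a query string. */
    static final SimpleScoringFilter QUERY_PENALTY =
        (url, score) -> url.contains("?") ? score * 0.5f : score;

    /** Illustrative stage: cap the score at 1.0. */
    static final SimpleScoringFilter CAP =
        (url, score) -> Math.min(score, 1.0f);

    /** Applies the filters in list order, feeding each result into the next filter. */
    static float applyChain(List<SimpleScoringFilter> chain, String url, float initialScore) {
        float score = initialScore;
        for (SimpleScoringFilter filter : chain) {
            score = filter.adjust(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        String url = "http://example.org/a?b=1";

        // Penalty first, then cap: 3.0 -> 1.5 -> 1.0
        System.out.println(applyChain(Arrays.asList(QUERY_PENALTY, CAP), url, 3.0f));

        // Cap first, then penalty: 3.0 -> 1.0 -> 0.5
        System.out.println(applyChain(Arrays.asList(CAP, QUERY_PENALTY), url, 3.0f));
    }
}
```

The two calls use the same filters and the same input but produce different scores, which is why a chain of scoring filters needs a well-defined order.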

Where Scoring takes place within the Nutch Crawl cycle

Scoring occurs in numerous places throughout the Nutch codebase and consequently within the crawl cycle. This section describes the point of occurrence and the functional purpose Scoring serves at each step. The steps are ordered to follow the logical and typical progression of a Nutch crawl cycle.
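As a rough orientation, the sketch below pairs each step of a typical crawl cycle with the ScoringFilter hook names that are usually associated with it. The hook names are given as recalled from the Nutch 1.x ScoringFilter interface and the step labels are informal; both are assumptions to be checked against the Javadoc of the release you are running. The class `ScoringHookOverview` and its method `hooksByStep` are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScoringHookOverview {

    /**
     * Returns crawl steps in cycle order, each mapped to the names of the
     * scoring hooks assumed to run during that step (Nutch 1.x naming).
     */
    static Map<String, String> hooksByStep() {
        // LinkedHashMap keeps insertion order, so iteration follows the crawl cycle.
        Map<String, String> hooks = new LinkedHashMap<>();
        hooks.put("inject", "injectedScore");
        hooks.put("generate", "generatorSortValue");
        hooks.put("fetch", "passScoreBeforeParsing");
        hooks.put("parse", "passScoreAfterParsing, distributeScoreToOutlinks");
        hooks.put("updatedb", "initialScore, updateDbScore");
        hooks.put("index", "indexerScore");
        return hooks;
    }

    public static void main(String[] args) {
        // Print one line per crawl step, in cycle order.
        for (Map.Entry<String, String> entry : hooksByStep().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```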

Scoring extension points

Examples

Development Issues

FixingOpicScoring - In planning.