
This list of Nutch configuration properties is intended for development. It includes deprecated properties and properties used only "internally". The list is generated from nutch-default.xml and Java sources.
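How such a list could be generated is sketched below. This is only an illustration, not the script actually used for this page. It assumes the usual Hadoop-style layout of nutch-default.xml (`<property>` elements with a `<name>` child) and finds "used" properties with a simple regular expression over Java source text. The function names and the regular expression are hypothetical, and the pattern will also match unrelated calls such as `map.get("key")`, so real output would need manual review.

```python
import re
import xml.etree.ElementTree as ET

# Matches accessor calls such as conf.get("db.max.inlinks") or
# conf.setInt("fetcher.threads.fetch", 10). Which accessor names to
# match is an assumption made for this sketch.
ACCESSOR = re.compile(r'\.(?:get|set)[A-Za-z]*\(\s*"([^"]+)"')


def defined_properties(xml_text):
    """Return the property names defined in a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    names = set()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name and name.strip():
            names.add(name.strip())
    return names


def used_properties(java_source):
    """Return the string literals passed to get*/set* calls in Java source text."""
    return set(ACCESSOR.findall(java_source))


def classify(xml_text, java_source):
    """Map each property name to a (defined, used) pair of booleans."""
    defined = defined_properties(xml_text)
    used = used_properties(java_source)
    return {name: (name in defined, name in used)
            for name in sorted(defined | used)}
```

Calling `classify` with the text of nutch-default.xml and the concatenated Java sources would give one (Def., Used) pair per property name, which corresponds to the first two columns of the table below.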

Legend:

Def.: defined in nutch-default.xml

Used: read or set from Java code

Temp.: temporarily used to pass settings (e.g., from command-line arguments) to map or reduce jobs

Depr.: deprecated

(owr.): some properties are defined in nutch-default.xml (and may be set in nutch-site.xml) but are overwritten by a command-line argument (tests and benchmarks are excluded)

(test): overwritten only in tests and benchmarks
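The (owr.) marker implies a precedence order: a value from nutch-default.xml can be replaced by nutch-site.xml, and both can be replaced by a command-line argument. The following minimal sketch models that layering with plain dictionaries. It is not Nutch or Hadoop code; the function name and all values are illustrative assumptions.

```python
from collections import ChainMap


def effective_config(defaults, site, cli_overrides):
    """Layer three sources of settings; maps listed first in the ChainMap win."""
    return ChainMap(cli_overrides, site, defaults)


# Illustrative values only, not taken from a real Nutch installation.
defaults = {"fetcher.threads.fetch": "10", "http.timeout": "10000"}  # nutch-default.xml
site = {"fetcher.threads.fetch": "20"}                               # nutch-site.xml
cli = {"fetcher.threads.fetch": "50"}                                # command-line argument

conf = effective_config(defaults, site, cli)
print(conf["fetcher.threads.fetch"])  # value from the command-line layer
print(conf["http.timeout"])           # falls through to the default layer
```

A property marked (owr.) behaves like `fetcher.threads.fetch` in this sketch: whatever the configuration files say, the value supplied on the command line is the one a job sees.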


 

 


1.X (master Branch)

2.x (deprecated codebase)

Property

Def.

Used

Temp.

Depr.

Def.

Used

Temp.

Depr.

CrawlDBScanner.regex


X

X

 

 

 

 

 






CrawlDBScanner.status


X

X

 

 

 

 

 






anchorIndexingFilter.deduplicate

X

X

(test)

 


X

X

(test)

 


arc.url.version


X

 

 

 

 

 

 







content.server.port


X

 

 




X

 

 



crawl.gen.delay

X

X

 

 



X

X

 

 



crawldb.url.filters


X

X

 

 

 




X

 


crawldb.url.normalizers


X

 

 

 

 

 

 








crawldb.url.normalizers.scope


X

 

 

 

 

 

 







creativecommons.exclude.unlicensed


X

 

 




X

 

 



db.default.fetch.interval

X

X

 


NUTCH-1409

X

X

 


NUTCH-1409

db.fetch.interval.default

X

X

(test)

 


X

X

 

 



db.fetch.interval.max

X

X

 

 



X

X

 

 



db.fetch.retry.max

X

X

 

 



X

X

 

 



db.fetch.schedule.adaptive.dec_rate

X

X

 

 



X

X

 

 



db.fetch.schedule.adaptive.inc_rate

X

X

 



X

 

X

X

 

 



db.fetch.schedule.adaptive.max_interval

X

X

 

 



X

X

 

 



db.fetch.schedule.adaptive.min_interval

X

X

 

 



X

X

 

 



db.fetch.schedule.adaptive.sync_delta

X

X

 

 



X

X

 

 



db.fetch.schedule.adaptive.sync_delta_rate

X

X

 

 



X

X

 

 



db.fetch.schedule.class

X

X

(test)

 


X

X

 

 



db.fetch.schedule.mime.file

X

X

 

 

 

 

 

 







db.ignore.external.links

X

X

 

 



X

X

 

 



db.ignore.internal.links

X

X

 

 

X

 

 



X




db.injector.overwrite

X

X

 

 

 

 

 

 







db.injector.update

X

X

(test)

 

 

 

 

 






db.max.anchor.length

X

X

 

 



X

 

 




db.max.fetch.interval


X

 


NUTCH-1409


X

 


NUTCH-1409

db.max.inlinks

X

X

 

 



X

 

 




db.max.outlinks.per.page

X

X

 

 



X

X

 

 



db.parsemeta.to.crawldb

X

X

 

 



X

 

 




db.preserve.backup

X

X

 

 

 

 

 

 








db.reader.stats.sort


X

X

 



X

X

 


db.reader.topn


X

X

 

 

 

 

 






db.reader.topn.min


X

X

 

 

 

 

 






db.score.count.filtered

X

X

 

 



X

X

 

 



db.score.injected

X

X

 

 



X

X

 

 



db.score.link.external

X

X

 

 



X

X

 

 



db.score.link.internal

X

X

 

 



X

X

 

 



db.signature.class

X

X

 

 



X

X

 

 



db.signature.text_profile.min_token_len

X

X

 

 



X

X

 

 



db.signature.text_profile.quant_rate

X

X

 

 



X

X

 

 



db.update.additions.allowed

X

X

 

 



X

X

 

 



db.update.max.inlinks

X

X

 

 



X

X

 

 



db.update.purge.404

X

X

 

 

 

 

 

 







dc.language


X

 

 

 

 

 

 







domain.statistics.mode


X

X

 



X

X

 


elastic.index

 

 

 

 





X

 

 




elastic.max.bulk.docs

 

 

 

 





X

 

 




elastic.max.bulk.size

 

 

 

 





X

 

 




encodingdetector.charset.min.confidence

X

 

 




X

 

 




fail.on.job.failure

 

 

 

 






X

 

 



fetcher.exit


X

 

 

 

 

 

 







fetcher.follow.outlinks.depth

X

X

 

 

 

 

 

 







fetcher.follow.outlinks.depth.divisor

X

X

 

 

 

 

 

 







fetcher.follow.outlinks.ignore.external

X

X

 

 

 

 

 

 







fetcher.follow.outlinks.num.links

X

X

 

 

 

 

 

 







fetcher.job.resume

 

 

 

 






X

 

 



fetcher.max.crawl.delay

X

X

 

 



X

X

 

 



fetcher.max.exceptions.per.queue

X

X

 

 



X

 

 




fetcher.parse

X

X

(test)

 


X

X

 

 



fetcher.queue.depth.multiplier

X

X

 

 



X

X

 

 



fetcher.queue.mode

X

X

 

 



X

X

 

 



fetcher.queue.use.host.settings

 

 

 

 





X

X

 

 



fetcher.server.delay

X

X

 

 



X

X

 

 



fetcher.server.min.delay

X

X

 

 



X

X

 

 



fetcher.store.content

X

X

 

 



X

X

 

 



fetcher.threads.fetch

X

X

(owr.)

 


X

X

 

 



fetcher.threads.per.host

 

 

 




NUTCH-1409

 

 

 




NUTCH-1409

fetcher.threads.per.host.by.ip

 

 

 

 






X

 

 



fetcher.threads.per.queue

X

X

 

 



X

X

 

 



fetcher.threads.timeout.divisor

X

X

 

 

 

 

 

 







fetcher.throughput.threshold.check.after

X

X

(owr.)

 


X

X

 

 



fetcher.throughput.threshold.pages

X

X

 

 



X

X

 

 



fetcher.throughput.threshold.retries

X

X

 

 

 

 

 

 








fetcher.throughput.threshold.sequence

 

 

 

 





X

X

 

 



fetcher.timelimit


X

X

 



X

X

 


fetcher.timelimit.mins

X

X

 

 



X

X

 

 



fetcher.verbose

X

X

 

 



X

 

 




file.content.ignored

X

 

 




X

 

 




file.content.limit

X

X

(test)

 


X

X

(test)

 


file.crawl.parent

X

X

 

 



X

X

 

 



free.generator.filter


X

 

 

 

 

 

 








free.generator.normalize


X

 

 

 

 

 

 







ftp.content.limit

X

X

 

 



X

X

 

 



ftp.follow.talk

X

X

 

 



X

X

 

 



ftp.keep.connection

X

X

 

 



X

X

 

 



ftp.password

X

X

 

 



X

X

 

 



ftp.server.timeout

X

X

 

 



X

X

 

 



ftp.timeout

X

X

 

 



X

X

 

 



ftp.username

X

X

 

 



X

X

 

 



generate.batch.id

 

 

 

 






X

 

 



generate.count.mode

X

X

 

 



X

X

 

 



generate.curTime


X

 

 




X

 

 



generate.filter


X

 

 




X

 

 



generate.max.count

X

X

 

 



X

X

 

 



generate.max.distance

 

 

 

 





X

X

 

 



generate.max.num.segments


X

 

 

 

 

 

 







generate.max.per.host

X

X

 


NUTCH-1409

 

 

 




NUTCH-1409

generate.max.per.host.by.ip


X

 


NUTCH-1409

 

 

 




NUTCH-1409

generate.min.interval

X

X

 

 

 

 

 

 







generate.min.score

X

X

 

 




X

 

 



generate.normalise


X

 

 




X

 

 



generate.partition.seed

 

 

 

 






X

 

 



generate.restrict.status


X

 

 

 

 

 

 







generate.topN


X

 

 




X

 

 



generate.update.crawldb

X

X

 

 



X

X

 

 



hostdb.concurrency.level

 

 

 

 






X

 

 



hostdb.lru.size

 

 

 

 






X

 

 



htmlparsefilter.order

X

X

 

 



X

X

 

 



http.accept

X

X

 

 



X

X

 

 



http.accept.language

X

X

 

 



X

X

 

 



http.agent.description

X

X

 

 



X

X

 

 



http.agent.email

X

X

 

 



X

X

 

 



http.agent.host

X

X

 

 



X

X

 

 



http.agent.name

X

X

(test)

 


X

X

(test)

 


http.agent.url

X

X

 

 



X

X

 

 



http.agent.version

X

X

 

 



X

X

 

 



http.auth.file

X

X

 

 



X

X

 

 



http.auth.verbose


X

 

 




X

 

 



http.content.limit

X

X

 

 



X

X

 

 



http.max.delays

X

 

 




X

 

 




http.proxy.host

X

X

(test)

 


X

X

(test)

 


http.proxy.password

X

X

 

 



X

X

 

 



http.proxy.port

X

X

(test)

 


X

X

(test)

 


http.proxy.realm

X

X

 

 



X

X

 

 



http.proxy.username

X

X

 

 



X

X

 

 



http.redirect.max

X

X

 

 

 

 

 

 







http.robots.403.allow

X

X

 

 



X

X

 

 



http.robots.agents

X

X

(test)

 


X

X

(test)

 


http.timeout

X

X

 

 



X

X

 

 



http.useHttp11

X

X

 

 



X

X

 

 



http.verbose

X

X

 

 



X

X

 

 



index.content.md

X

X

 

 

 

 

 

 







index.db.md

X

X

 

 

 

 

 

 







index.parse.md

X

X

(test)

 

 

 

 

 






index.replace.regexp

X

X

 

index.static

X

X

 

 

 

 

 

 







index.static

 

 

 

 

 

X

X







indexer.add.domain

X

X

 

 

 

 

 

 







indexer.delete


X

 

 

 

 

 

 







indexer.delete.robots.noindex


X

 

 

 

 

 

 







indexer.max.content.length

X

X

 

 

 

 

 

 








indexer.max.title.length

X

X

 

 



X

X

(test)

 


indexer.score.power

X

X

 

 



X

X

 

 



indexer.skip.notmodified

X

X

 

 

 

 

 

 







indexer.url.filters


X

X

 

 

 




X

 


indexer.url.normalizers


X

 

 

 

 

 

 







indexer.writer.classes


X

X

 



X

X

 


indexingfilter.order

X

X

 

 



X

X

 

 




injector.current.time


X

X

 



X

X

 


lang.analyze.max.length

X

X

 

 



X

 

 




lang.extraction.policy

X

X

 

 



X

X

 

 



lang.identification.only.certain

X

X

 

 



X

X

 

 



lang.ngram.max.length

 

 

 

 





X

 

 




lang.ngram.min.length

 

 

 

 





X

 

 




link.analyze.damping.factor

X

X

 

 

 

 

 

 







link.analyze.initial.score

X

X

 

 

 

 

 

 







link.analyze.iteration


X

X

 

 

 

 

 






link.analyze.normalize.score


X

 

 




X

 

 



link.analyze.num.iterations

X

X

 

 

 

 

 

 







link.analyze.rank.one


X

X

 

 

 

 

 






link.delete.gone

X

X

 

 

 

 

 

 







link.ignore.internal.domain

X

X

 

 

 

 

 

 








link.ignore.internal.host

X

X

 

 

 

 

 

 







link.ignore.limit.domain

X

X

 

 

 

 

 

 







link.ignore.limit.page

X

X

 

 

 

 

 

 







link.loops.depth

X

X

 

 

 

 

 

 

linkdb.url.filters

X

X

 

 

 

X

 







link.score.updater.clear.score

X

X

 

 

 

 

 

 







linkdb.url.filters


X

X




X


linkdb.url.normalizer


X

 

 

 

 

 

 







linkdb.url.normalizer.scope


X

 

 

 

 

 

 







metatag.description


X

 

 

 

 

 

 







metatag.keywords


X

 







metatags.names

 

X

 

 

 

 

metatags.names

X

X

X

(test)

 

 

 

 

 






mime.type.magic

X

X

 

 



X

X

 

 



mime.types.file

X

X

 

 



X

X

 

 



moreIndexingFilter.indexMimeTypeParts

X

X

(test)

 


X

X

(test)

 


moreIndexingFilter.mapMimeTypes

X

X

 

 

 

 

 

 







nutch.conf.uuid


X

 

 




X

 

 




parse.filter.urls

X

X

(owr.)

 

 

 

 

 






parse.job.force

 

 

 

 






X

 

 



parse.job.resume

 

 

 

 






X

 

 



parse.normalize.urls

X

X

(owr.)

 

 

 

 

 






parse.plugin.file

X

X

(test)

 


X

X

(test)

 


parser.caching.forbidden.policy

X

X

 

 



X

X

 

 



parser.character.encoding.default

X

X

 

 



X

X

 

 



parser.fix.embeddedparams

X

 

 

 

 

 

 








parser.html.form.use_action

X

X

(test)

 


X

X

(test)

 


parser.html.impl

X

X

 

 



X

X

 

 



parser.html.outlinks.ignore_tags

X

X

 

 



X

X

 

 



parser.skip.truncated

X

X

 

 



X

X

 

 



parser.timeout

X

X

 

 



X

X

 

 



partition.url.mode

X

X

 

 



X

X

 

 



partition.url.seed


X

X

 



X

 

 



plugin.auto-activation

X

X

 




X

X

 

 



plugin.excludes

X

X

 

 



X

X

 

 



plugin.folders

X

X

 

 



X

X

 

 



plugin.includes

X

X

(test)

 


X

X

(test)

 


schema.prefix

 

 

 

 

 

 







X

 


scoring.filter.order

X

X

 

 



X

X

 

 



segment.dump.dir


X

 

 

 

 

 

 







segment.merger.filter


X

X

 

 

 

 

 






segment.merger.normalizer


X

X

 

 

 

 

 






segment.merger.segmentName


X

X

 

 

 

 

 






segment.merger.slice


X

X

 

 

 

 

 






segment.proxy.port


X

 

 




X

 

 



segment.reader.co


X

X

 

 

 

 

 






segment.reader.fe


X

X

 

 

 

 

 






segment.reader.ge


X

X

 

 

 

 

 






segment.reader.pa


X

X

 

 

 

 

 






segment.reader.pd


X

X

 

 

 

 

 






segment.reader.pt

X

X

 

 

 

 

 

sftp.password

 






X

 



sftp.port

 






X

 



sftp.server






X

 



sftp.user

 






X



solr.auth

X

X

 

 

 

 

 

 







solr.auth.password


X

 

 

 

 

 

 







solr.auth.username


X

 

 

 

 

 

 







solr.commit.index

X

X

 

 

X



X

 

 

X



solr.commit.size

X

X

 

 



X

X

 

 



solr.mapping.file

X

X

 

 



X

X

 

 



solr.params


X

 

 

 

 

 

 







solr.server.url


X

 

 




X

 

 



storage.crawl.id

 

 

 

 





X

X

 

 



storage.data.store.class

 

 

 

 





X

X

(test)

 


storage.schema.host

 

 

 

 





X

X

 

 



storage.schema.webpage

 

 

 

 





X

X

 

 



subcollection.default.field

X

 

 

 

 

 

 








subcollection.default.fieldname


X

 

 

 

 

 

 







subcollections.config


X

 

 




X

 

 



subcollections.xml


X

 

 




X

 

 



tika.config.file


X

X

 

 

 

 

 

 







urlfilter.automaton.file

X

X

 

 



X

X

 

 



urlfilter.automaton.rules


X

 

 




X

 

 



urlfilter.domain.file

X

X

 

 



X

X

 

 



urlfilter.domain.rules


X

 

 




X

 

 



urlfilter.domainblacklist.file


X

 

 

 

 

 

 







urlfilter.domainblacklist.rules


X

 

 

 

 

 

 







urlfilter.order

X

X

 

 



X

X

 

 



urlfilter.prefix.file

X

X

 

 



X

X

 

 



urlfilter.prefix.rules


X

 

 




X

 

 



urlfilter.regex.file

X

X

 

 



X

X

 

 



urlfilter.regex.rules


X

 

 




X

 

 



urlfilter.suffix.file

X

X

(test)

 


X

X

(test)

 


urlfilter.suffix.rules


X

 

 




X

 

 



urlmeta.tags

X

X

 

 

 

 

 

 







urlnormalizer.hosts.file


X

X

 

 

 

 

 

 







urlnormalizer.hosts.rules


X

 

 

 

 

 

 







urlnormalizer.loop.count

X

X

 

 



X

X

 

 



urlnormalizer.order

X

X

(test)

 


X

X

(test)

 


urlnormalizer.regex.file

X

X

 

 



X

X

 

 



urlnormalizer.regex.rules


X

 

 




X

 

 



webgraph.url.filters

X


X

 

 

 

X

 

webgraph.url.normalizers




X

 

 

 

 

 


webgraph.url.normalizers


X

 







webtable.dump.content

 

 

 

 






X

 

 



webtable.dump.headers

 

 

 

 






X

 

 



webtable.dump.links

 

 

 

 






X

 

 



webtable.dump.text

 

 

 

 






X

 

 



webtable.url.regex

 

 

 

 






X

 

 


