AndyHedges - 02 Jul 2004

bin/nutch readdb

called Java class

net.nutch.db.WebDBReader

command line options

bin/nutch readdb <db> [-pageurl url] | [-pagemd5 md5] | [-dumppageurl] | [-dumppagemd5] | [-toppages ] | [-linkurl url] | [-linkmd5 md5] | [-dumplinks] | [-stats]

-stats

Displays statistics on the contents of the database: the number of pages and the number of links.

example

$nutch readdb data/db -stats
040702 100856 loading file:/C:/csp2/nutch/conf/nutch-default.xml
040702 100857 loading file:/C:/csp2/nutch/conf/nutch-site.xml
Stats for net.nutch.db.WebDBReader@2bb514
Number of pages: 1886
Number of links: 21201
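If the counts are needed in a script, they can be pulled out of the captured -stats output. The following is a minimal Python sketch, not part of Nutch: the function name parse_stats is hypothetical, and the two labels it looks for are taken from the sample output above, so it assumes the installed version prints them in the same form.

```python
import re


def parse_stats(output: str) -> dict:
    """Extract the page and link counts from captured `readdb -stats` text.

    Assumes the labels "Number of pages" and "Number of links" appear
    exactly as in the sample output; other versions may differ.
    """
    counts = {}
    for label, key in (("Number of pages", "pages"), ("Number of links", "links")):
        match = re.search(rf"{label}:\s*(\d+)", output)
        if match:
            counts[key] = int(match.group(1))
    return counts


# Sample text copied from the example above (log lines omitted).
sample = """Stats for net.nutch.db.WebDBReader@2bb514
Number of pages: 1886
Number of links: 21201"""

stats = parse_stats(sample)
print(stats)  # {'pages': 1886, 'links': 21201}
```

The search is done per label rather than per line, so it also works if the output has been captured as a single line.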

-pageurl url

Displays information on a particular URL in the database. N.B. the URL must be given in exactly the form in which it is stored in the database, including trailing slashes, query strings (and the order of their parameters), case and so on.

example

$nutch readdb data/db -pageurl http://www.example.com/
Version: 4
URL: http://www.example.com/
ID: 5ef4623d0b61f32c5677695a4bbb86d6
Next fetch: Sun Aug 01 09:40:47 BST 2004
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 42
Score: 1780542.2
NextScore: 1823969.9
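The page record printed by -pageurl (and by -pagemd5) is a list of labelled fields, which makes it straightforward to turn into a dictionary for further processing. The sketch below is an illustration in Python, not part of Nutch: parse_page and FIELDS are hypothetical names, and the field labels are simply those seen in the sample output above, so a different Nutch version may need a different list.

```python
import re

# Field labels as they appear in the sample output above (an assumption
# about the format; adjust if your version prints different labels).
FIELDS = [
    "Version", "URL", "ID", "Next fetch", "Retries since fetch",
    "Retry interval", "Num outlinks", "Score", "NextScore",
]

# Match any known label followed by a colon. The word boundary keeps
# "Score" from matching inside "NextScore".
_LABEL = re.compile(
    r"\b(" + "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True)) + r"):\s*"
)


def parse_page(text: str) -> dict:
    """Split a page record into {label: value}; values are kept as strings."""
    parts = _LABEL.split(text)
    # parts[0] is any text before the first label; after that,
    # labels and values alternate.
    return {label: value.strip() for label, value in zip(parts[1::2], parts[2::2])}


# Sample record copied from the example above.
sample = (
    "Version: 4 URL: http://www.example.com/\n"
    "ID: 5ef4623d0b61f32c5677695a4bbb86d6 Next fetch: Sun Aug 01 09:40:47 BST 2004 "
    "Retries since fetch: 0 Retry interval: 30 days Num outlinks: 42 "
    "Score: 1780542.2 NextScore: 1823969.9"
)

page = parse_page(sample)
print(page["URL"], page["Num outlinks"])
```

Splitting on known labels rather than on every colon keeps values such as the URL and the fetch time, which contain colons themselves, intact.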

-pagemd5 md5

Displays information on the page with a particular ID (MD5) in the database.

example

$nutch readdb data/db -pagemd5 5ef4623d0b61f32c5677695a4bbb86d6
Version: 4
URL: http://www.example.com/
ID: 5ef4623d0b61f32c5677695a4bbb86d6
Next fetch: Sun Aug 01 09:40:47 BST 2004
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 42
Score: 1780542.2
NextScore: 1823969.9
