Attachment 'nutch-site.xml'

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>http.agent.name</name>
  <value>Nutch test</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.agent.description</name>
  <value></value>
  <description>Further description of our bot. This text is used in
  the User-Agent header. It appears in parentheses after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value></value>
  <description>A URL to advertise in the User-Agent header. This will
  appear in parentheses after the agent name. Custom dictates that this
  should be the URL of a page explaining the purpose and behavior of this
  crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value></value>
  <description>An email address to advertise in the HTTP 'From' request
  header and User-Agent header. A good practice is to mangle this
  address (e.g. 'info at example dot com') to avoid attracting spam.
  </description>
</property>

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|language-identifier|analysis-(fr|de)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need to include at least the nutch-extensionpoints plugin.
  By default Nutch includes plugins for crawling just HTML and plain text
  via HTTP, plus basic indexing and search plugins.
  </description>
</property>

</configuration>
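
The attached file leaves http.agent.description, http.agent.url, and http.agent.email empty, and does not set the http.agent.version or http.robots.agents properties that the http.agent.name description asks you to check. A minimal sketch of a filled-in set follows. Every value here is a hypothetical placeholder (ExampleBot, the example.com URL, the mangled address, the version number), not something taken from the attachment, and the http.robots.agents value assumes that property takes a comma-separated list with the agent name first.

```xml
<?xml version="1.0"?>
<!-- Sketch only: all values below are illustrative placeholders. -->
<configuration>

<property>
  <name>http.agent.name</name>
  <!-- A single word, as the property description requests. -->
  <value>ExampleBot</value>
</property>

<property>
  <name>http.agent.description</name>
  <value>Example research crawler</value>
</property>

<property>
  <name>http.agent.url</name>
  <!-- A page explaining the purpose and behavior of the crawler. -->
  <value>http://www.example.com/bot.html</value>
</property>

<property>
  <name>http.agent.email</name>
  <!-- Mangled, following the advice in the property description. -->
  <value>info at example dot com</value>
</property>

<property>
  <name>http.agent.version</name>
  <value>0.1</value>
</property>

<property>
  <name>http.robots.agents</name>
  <!-- Assumption: comma-separated list, agent name first. -->
  <value>ExampleBot,*</value>
</property>

</configuration>
```

With values like these, the descriptions above suggest the description, URL, and email would appear in parentheses after the agent name in the User-Agent header. The exact format depends on the Nutch version in use.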

Attached Files

  • nutch-site.xml (2007-02-03 16:45:53, 2.0 KB)