SimplePostTool, also called post.jar, is a simple, self-contained command-line tool for indexing data into Solr. It is not meant for production use; it is a quick way to get up to speed.

post.jar resides inside the Solr distribution, in the folder "example/exampledocs". It is built from a single .java file (see SVN) without dependencies, so it deliberately does not use SolrJ.

The tool can index structured XML, JSON and CSV files as well as a file tree of rich text documents. It also includes a simple web crawler.

Note that you do not *need* to use this tool to index data to Solr. Solr uses the standard HTTP protocol, so you can use any tool or library capable of communicating over HTTP GET/POST, such as the popular curl tool.
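
As a rough illustration of the curl route, the sketch below wraps a plain HTTP POST of an XML update message. The endpoint http://localhost:8983/solr/update is an assumption about a default local install, and the names SOLR_UPDATE_URL, DRY_RUN and solr_post_xml are made up for this example; adjust them to your setup.

```shell
# Assumed update endpoint of a local Solr; override via the environment.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/update}"

# Hypothetical helper: POST one XML update message to Solr with curl.
# With DRY_RUN set, it only prints what would be sent.
solr_post_xml() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'POST %s %s\n' "$SOLR_UPDATE_URL" "$1"
    return 0
  fi
  curl -sS -H 'Content-Type: application/xml' --data-binary "$1" "$SOLR_UPDATE_URL"
}

# Usage (needs a running Solr, so left commented out):
# solr_post_xml '<delete><id>42</id></delete>'
# solr_post_xml '<commit/>'
```

Unlike this bare request, a separate commit message is sent here explicitly so that the change becomes visible to searches.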

Usage

  java [SystemProperties] -jar post.jar
    [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Examples

Get full help:

cd solr/example/exampledocs
java -jar post.jar -h

Post all XML files in the current folder, in Solr's Update XML format:

java -jar post.jar *.xml

Send XML instructions directly on the command line, e.g. to delete a document:

java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'

Post JSON documents, specifying the content type:

java -Dtype=application/json -jar post.jar *.json

Post all CSV, XML, JSON and PDF documents using AUTO mode, which detects the type based on the file name:

java -Dauto -jar post.jar *.csv *.xml *.json *.pdf
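
To make the idea of detecting the type from the file name concrete, here is a minimal sketch of an extension-to-content-type lookup. It is not the tool's actual table: the function name guess_type, the sample file names and the fallback type are assumptions chosen for illustration.

```shell
# Hypothetical helper: map a file name to a content type by its extension.
# Illustrative subset only; the real tool may recognise other types.
guess_type() {
  case "${1##*.}" in
    xml)  echo "application/xml" ;;
    json) echo "application/json" ;;
    csv)  echo "text/csv" ;;
    pdf)  echo "application/pdf" ;;
    *)    echo "application/octet-stream" ;;  # assumed fallback
  esac
}

# Show the mapping for a few made-up file names.
for f in books.csv books.json manual.pdf; do
  printf '%s -> %s\n' "$f" "$(guess_type "$f")"
done
```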

Post all content of a folder recursively, with auto detection of file type and selection of the correct handler:

java -Dauto -Drecursive -jar post.jar my-folder

As above, post a folder recursively, but index only PPT and HTML file types:

java -Dauto -Drecursive -Dfiletypes=ppt,html -jar post.jar my-folder
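
Before posting a large folder it can help to preview which files a file type filter would select. The sketch below approximates that with find. The function name list_by_filetypes is hypothetical, and the matching is a simple case-sensitive comparison of extensions that may differ from what the tool does.

```shell
# Hypothetical helper: list files under a folder (recursively) whose
# extension appears in a comma-separated list such as "ppt,html".
list_by_filetypes() {
  dir=$1
  types=$2
  find "$dir" -type f | while IFS= read -r f; do
    name=${f##*/}                 # strip the directory part
    case "$name" in
      *.*) ext=${name##*.} ;;     # text after the last dot
      *)   continue ;;            # no extension, skip
    esac
    case ",$types," in
      *",$ext,"*) printf '%s\n' "$f" ;;
    esac
  done
}

# Preview the selection for the folder used in the example, if it exists.
if [ -d my-folder ]; then
  list_by_filetypes my-folder ppt,html
fi
```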

Send the contents of a URL:

java -Ddata=web -jar post.jar http://example.no/

Crawl a web site recursively (default 1 level):

java -Ddata=web -Drecursive -jar post.jar http://example.no/