Introduction
Solr 1.4
The Term Vector Component (TVC) is a SearchComponent that returns the term vector information stored for a document when the termVectors attribute is set on a field:
<field name="features" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
For each document, the TVC can return the term vector, the term frequency, the inverse document frequency, and position and offset information. As with most components, a number of options are available; they are outlined in the samples below.
Sample Usage
All examples are based on the Solr example.
Enabling the TVC
Changes required in solrconfig.xml
You need to enable the TermVectorComponent in your Solr configuration:
<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
A RequestHandler configuration using this component could look like this:
<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
HTTP Requests
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true
In the example, the component is associated with a request handler named tvrh, but you can associate it with any RequestHandler. To turn on the component for a request, add the tv=true parameter (or add it to your RequestHandler defaults configuration).
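As a minimal sketch of how a client might assemble the request shown above, the following Python snippet builds the URL from its individual parameters. The base URL and the handler name tvrh come from the example; the function name `build_tv_url` and the constant `SOLR_SELECT_URL` are illustrative only and not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr example server; adjust for your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select/"


def build_tv_url(query="*:*", handler="tvrh", rows=10):
    """Build a select URL that routes to `handler` and turns the TVC on."""
    params = [
        ("q", query),
        ("start", 0),
        ("rows", rows),
        ("indent", "on"),
        ("qt", handler),  # request handler that has tvComponent attached
        ("tv", "true"),   # enables the Term Vector Component for this request
    ]
    # urlencode percent-escapes reserved characters such as ':' in the query.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_tv_url()
print(url)
```

If tv=true is already set in the handler's defaults, the ("tv", "true") pair can be dropped from the list.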
Example output: See TermVectorComponentExampleEnabled.
Options
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true
- tv.tf - Return the term frequency (TF) of each term in the document.
- tv.df - Return the Document Frequency (DF) of the term in the collection. This can be expensive.
- tv.positions - Return position information.
- tv.offsets - Return offset information for each term in the document.
- tv.tf_idf - Calculate tf*idf for each term. Requires the parameters tv.tf and tv.df to be "true". This can be expensive. (Not shown in the example output.)
Alternatively, a shortcut to turn all options on is:
- tv.all=true
Example output: See TermVectorComponentExampleOptions.
For schema requirements, see FieldOptionsByUseCase.
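The options above can be combined programmatically. The sketch below is one possible helper, not a Solr API: `tv_option_params` and `TV_OPTIONS` are hypothetical names. It turns option names into tv.* parameters and, because tv.tf_idf requires tv.tf and tv.df, switches those two on whenever tf_idf is requested.

```python
from urllib.parse import urlencode

# Option names taken from the list above; each maps to a tv.* request parameter.
TV_OPTIONS = ("tf", "df", "positions", "offsets", "tf_idf")


def tv_option_params(*options, all_options=False):
    """Return (name, value) pairs for the requested TVC options."""
    if all_options:
        # tv.all is the documented shortcut for every option.
        return [("tv", "true"), ("tv.all", "true")]
    unknown = set(options) - set(TV_OPTIONS)
    if unknown:
        raise ValueError(f"unknown TVC options: {sorted(unknown)}")
    wanted = set(options)
    if "tf_idf" in wanted:
        # tv.tf_idf needs tv.tf and tv.df, so enable them implicitly.
        wanted |= {"tf", "df"}
    params = [("tv", "true")]
    params += [(f"tv.{name}", "true") for name in TV_OPTIONS if name in wanted]
    return params


print(urlencode(tv_option_params("tf_idf", "offsets")))
```

The resulting query string fragment can be appended to a request that already carries q and qt. Rejecting unknown option names early is a design choice of this sketch that catches typos before a request is sent.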
Other Options
- tv.fl - List of fields to get term vector information from. Optional. If not specified, the fl parameter is used.
- tv.docIds - List of Lucene document ids (not the Solr Unique Key) to get term vectors for.
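To illustrate how tv.fl and tv.docIds might be added to a request, here is a small sketch. It assumes that multiple values are passed as a comma-separated list, which the text above does not state explicitly; the helper name `tv_scope_params` is hypothetical.

```python
from urllib.parse import urlencode


def tv_scope_params(fields=None, doc_ids=None):
    """Return tv.fl / tv.docIds pairs; omitted arguments add no parameter."""
    params = []
    if fields:
        # Assumption: multiple field names are joined with commas.
        params.append(("tv.fl", ",".join(fields)))
    if doc_ids:
        # These are internal Lucene document ids, not Solr unique keys.
        params.append(("tv.docIds", ",".join(str(i) for i in doc_ids)))
    return params


query = urlencode(
    [("qt", "tvrh"), ("tv", "true")] + tv_scope_params(["features"], [0, 3])
)
print(query)
```

Leaving `fields` unset adds no tv.fl parameter, so the fl parameter is used as described above.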
SolrJ
Neither the SolrQuery class nor the QueryResponse class offers specific methods to set TermVectorComponent parameters or to read the "termVectors" output. However, there is a patch for this: SOLR-949.
History
SOLR-651 covers the development history.