Differences between revisions 2 and 3
Revision 2 as of 2011-03-01 14:47:25
Size: 1443
Editor: JanHoydahl
Comment: Indenting
Revision 3 as of 2012-09-24 16:10:39
Size: 1444
Comment:

Payloads are byte arrays that can optionally be stored with each term occurrence in a field. In Solr they serve several purposes, such as boosting certain terms over others. Another use is part-of-speech (POS) tagging.
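
To make the "byte array" idea concrete, here is a minimal sketch of turning a boost value into payload bytes and back. It assumes a 4-byte big-endian IEEE 754 float layout; that layout, and the helper names encode_boost and decode_boost, are illustrative assumptions rather than a statement of what your Solr version stores.

```python
import struct


def encode_boost(boost: float) -> bytes:
    """Pack a boost into 4 bytes (assumed layout: big-endian IEEE 754 float)."""
    return struct.pack(">f", boost)


def decode_boost(payload: bytes) -> float:
    """Unpack the 4-byte payload back into a float."""
    return struct.unpack(">f", payload)[0]


if __name__ == "__main__":
    payload = encode_boost(2.0)
    print(payload.hex(), decode_boost(payload))
```

The point is only that a payload is opaque bytes: whatever encodes it at index time must agree with whatever decodes it at scoring time.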

Term boosting with Payloads

Imagine the following use cases:

  • You index a large HTML page in your body field, but want to boost words that appear in headings or in boldface
  • You want to boost all nouns (say you have a clever client-side parser which detects nouns)
  • When indexing German content, you want to boost proper names more than other (capitalized) nouns
  • You have a product called "Word" and want to boost all its occurrences over the lowercase "word" :)

All these cases may be solved by introducing payloads in your scoring, provided you have the client-side logic to detect the words. The beauty is that since the boost is stored with the term, you do not need any heavy parsing or calculation at query time. All documents originally containing "Word" would surface higher in your results, even if everything is lowercased on both the index and the query side. And even if HTML markup is lost after parsing, your parser would already have tagged the title and boldface words before removing the markup.
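
The client-side tagging described above could be sketched as follows for the HTML use case. This is a hedged illustration, not the method from the referenced blog post: it assumes the index side accepts tokens in a "word|boost" form (for example through a delimited payload token filter configured with "|" as the delimiter and a float encoder), and the boost values in TAG_BOOSTS as well as the names PayloadAnnotator and annotate are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical boost values per tag; tune these for your own ranking needs.
TAG_BOOSTS = {"h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}


class PayloadAnnotator(HTMLParser):
    """Strips markup and appends '|boost' to words found inside boosted tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # boosted tags that are currently open
        self.tokens = []     # output tokens, annotated where applicable

    def handle_starttag(self, tag, attrs):
        if tag in TAG_BOOSTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recently opened occurrence of this tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # When tags are nested, the highest applicable boost wins.
        boost = max((TAG_BOOSTS[t] for t in self.open_tags), default=None)
        for word in data.split():
            self.tokens.append(word if boost is None else f"{word}|{boost}")


def annotate(html: str) -> str:
    """Return whitespace-separated tokens with the markup removed."""
    parser = PayloadAnnotator()
    parser.feed(html)
    parser.close()
    return " ".join(parser.tokens)


if __name__ == "__main__":
    print(annotate("<h1>Solr Payloads</h1><p>Store a <b>boost</b> per term</p>"))
```

The annotated string would then be sent as the field value. Note that this sketch does not escape a literal "|" occurring in the source text, which a real client would need to handle.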

For a step-by-step description on how to enable Payload boosting, see Lucid's blog post here.

Payloads (last edited 2012-09-24 16:10:39 by BradfordDabbs)