User Tagging Design

Data Set

A10: title = Lucene in Action (LIA)
A11: title = Lucene in Action Deux
A12: title = Solr Flare in Action
A13: title = Practical Perl 

erik tagged 
  A10 with "lucene"
  A11 with "lucene","solr"
  A12 with "ruby","solrflare"
  A13 with (no tags)

yonik tagged
  A10 with (no tags)
  A11 with "lucene","solr","excellent"
  A12 with "solr"
  A13 with "foo"

Use Cases

U-addUserTag

Allow a user to tag a book

  • Example: Allow erik to tag A10 with "lucene"
U-delUserTag

Allow a user to remove a tag from a book, or all instances of a tag they have used.

U-delUser

Remove all tags that were added by a specific user

U-tagFacet

Show the number of books tagged with each tag, restricted to the user's current search results and filters.

  • Example: user submits a search of "title:lucene", and the resulting tag counts are "lucene(2), solr(1), excellent(1)"
    Notice that the count is the number of books with that tag, not the number of tags: there are 3 "lucene" tags on the books, but only 2 books are tagged "lucene".
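The difference between counting books and counting tag instances can be checked with a small in-memory Python sketch over the sample data set. The names `TITLES`, `TAGGINGS` and the helper functions are illustrative only and are not part of Solr. `title_search` is a crude stand-in for a `title:` query that matches whole words.

```python
from collections import Counter

# Sample data set: book id -> title, plus (user, book, tag) triples.
TITLES = {
    "A10": "Lucene in Action (LIA)",
    "A11": "Lucene in Action Deux",
    "A12": "Solr Flare in Action",
    "A13": "Practical Perl",
}
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def title_search(word):
    """Books whose title contains `word` as a whole word (stand-in for title:word)."""
    return {b for b, t in TITLES.items() if word.lower() in t.lower().split()}

def tag_facet(books):
    """Distinct books per tag: a (book, tag) pair counts once, however many users applied it."""
    pairs = {(b, tag) for _, b, tag in TAGGINGS if b in books}
    return Counter(tag for _, tag in pairs)

def tag_instances(books):
    """Raw number of times each tag was applied to the given books."""
    return Counter(tag for _, b, tag in TAGGINGS if b in books)

hits = title_search("lucene")
print(sorted(hits))                     # ['A10', 'A11']
print(sorted(tag_facet(hits).items()))  # [('excellent', 1), ('lucene', 2), ('solr', 1)]
print(tag_instances(hits)["lucene"])    # 3
```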
U-userFacet

Show the number of books tagged by each user.

  • Example: user submits a search of "title:lucene", and the resulting user counts are "erik(2), yonik(1)"
    Notice that the count is the number of books tagged, not the number of tags on books.
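The same kind of in-memory sketch works for per-user counts. A user contributes at most one count per book, no matter how many tags they put on it. The data layout and function names are illustrative assumptions, not a Solr API.

```python
from collections import Counter

TITLES = {
    "A10": "Lucene in Action (LIA)",
    "A11": "Lucene in Action Deux",
    "A12": "Solr Flare in Action",
    "A13": "Practical Perl",
}
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def title_search(word):
    """Books whose title contains `word` as a whole word (stand-in for title:word)."""
    return {b for b, t in TITLES.items() if word.lower() in t.lower().split()}

def user_facet(books):
    """Distinct books per user within `books`."""
    pairs = {(user, b) for user, b, _ in TAGGINGS if b in books}
    return Counter(user for user, _ in pairs)

print(sorted(user_facet(title_search("lucene")).items()))  # [('erik', 2), ('yonik', 1)]
```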
U-tagSuggest

When a user is tagging a book, allow them to type in the first few letters and then show a dropdown list of existing tags to choose from, sorted by tag popularity and optionally showing counts.
Tag popularity: number of users using that tag, number of books with that tag, or number of times the tag has been applied? Any of these could work if necessary.

  • Example1: user types "so" into the textbox when tagging a book, and is automatically shown "solr(2), solrflare(1)" (uses #books tagged)
  • Example2: user types "so" into the textbox when tagging a book, and is automatically shown "solr(3), solrflare(1)" (uses #tag instances)
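The candidate popularity measures differ only in what gets de-duplicated before counting. The following Python sketch ranks prefix matches under each measure. `popularity`, `suggest` and the `by` values are hypothetical names chosen for illustration.

```python
from collections import Counter

# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def popularity(by):
    """Tag popularity as distinct books, distinct users, or raw tag instances."""
    if by == "books":
        return Counter(tag for _, tag in {(b, t) for _, b, t in TAGGINGS})
    if by == "users":
        return Counter(tag for _, tag in {(u, t) for u, _, t in TAGGINGS})
    if by == "instances":
        return Counter(t for _, _, t in TAGGINGS)
    raise ValueError(f"unknown popularity measure: {by!r}")

def suggest(prefix, by="books"):
    """Existing tags starting with `prefix`, most popular first, ties broken by name."""
    counts = popularity(by)
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    return sorted(matches, key=lambda tn: (-tn[1], tn[0]))

print(suggest("so"))                  # [('solr', 2), ('solrflare', 1)]
print(suggest("so", by="instances"))  # [('solr', 3), ('solrflare', 1)]
```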

U-tagNarrow

User selects an existing tag to narrow their search results by.
Any displayed results (including facet counts) must have all tags that
have been selected by the user.

  • Example: narrow search results by the tag "solr"
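Narrowing by several selected tags is a subset test: a book survives only if its combined tag set, across all users, contains every selected tag. The following Python sketch shows the idea over the sample data. The function name is illustrative.

```python
# Sample data set: book ids and (user, book, tag) triples.
TITLES = {"A10": "Lucene in Action (LIA)", "A11": "Lucene in Action Deux",
          "A12": "Solr Flare in Action", "A13": "Practical Perl"}
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def narrow_by_tags(books, selected):
    """Keep only books that carry every selected tag (applied by any user)."""
    tags_of = {}
    for _, b, tag in TAGGINGS:
        tags_of.setdefault(b, set()).add(tag)
    wanted = set(selected)
    return {b for b in books if wanted <= tags_of.get(b, set())}

all_books = set(TITLES)
print(sorted(narrow_by_tags(all_books, ["solr"])))            # ['A11', 'A12']
print(sorted(narrow_by_tags(all_books, ["solr", "lucene"])))  # ['A11']
```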
U-userTagNarrow

Show all books a specific user tagged with a specific tag, or restrict search results by the same.

  • Example: restrict matches to books with erik's "solr" tag => restricts to A11
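Matching a user/tag pair has to keep the two values together. Checking "tagged by erik" and "tagged solr" separately would also match A12, where only yonik applied "solr". A minimal Python sketch follows; the function name and the optional `books` argument are illustrative.

```python
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def books_with_user_tag(user, tag, books=None):
    """Books to which `user` applied `tag`, optionally intersected with existing results."""
    found = {b for u, b, t in TAGGINGS if u == user and t == tag}
    return found if books is None else found & set(books)

print(sorted(books_with_user_tag("erik", "solr")))  # ['A11']
```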
U-tagNarrowSuggest

Allow the user to narrow their search results by typing in
a tag instead of selecting it from a list. When the user has typed
one or two letters, automatically pop up a list of tags starting
with that prefix. Optionally sort tags by the number of books they apply to
in the current search results.

  • Example: search "title:lucene", user types "so" and is presented with solr(1)
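Unlike the tagging-time suggestion, this suggestion counts only books in the current result set. That is why "solrflare", found only on A12, does not appear for a "title:lucene" search. The following Python sketch illustrates this; `title_search` is a whole-word stand-in for a Solr title query, and all names are illustrative.

```python
from collections import Counter

TITLES = {"A10": "Lucene in Action (LIA)", "A11": "Lucene in Action Deux",
          "A12": "Solr Flare in Action", "A13": "Practical Perl"}
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def title_search(word):
    """Books whose title contains `word` as a whole word (stand-in for title:word)."""
    return {b for b, t in TITLES.items() if word.lower() in t.lower().split()}

def narrow_suggest(books, prefix):
    """Tags starting with `prefix`, counted by distinct books within `books`."""
    pairs = {(b, t) for _, b, t in TAGGINGS if b in books and t.startswith(prefix)}
    counts = Counter(t for _, t in pairs)
    return sorted(counts.items(), key=lambda tn: (-tn[1], tn[0]))

print(narrow_suggest(title_search("lucene"), "so"))  # [('solr', 1)]
```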
U-userNarrow

Restrict books to those tagged by a specific user.

  • Example: search "title:*" restrict to books tagged by erik => A10,A11,A12
  • Example2: search "title:*", facet by tag, restrict to books tagged by erik:
    facet counts={lucene(2),solr(2),ruby(1),solrflare(1),excellent(1)}
    (note that this does *not* restrict shown tag counts to erik's tags)
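This use case restricts which *books* are shown but still counts every user's tags on those books. The following Python sketch shows that combination. Treating "title:*" as "all books" and the helper names are assumptions made for the example.

```python
from collections import Counter

TITLES = {"A10": "Lucene in Action (LIA)", "A11": "Lucene in Action Deux",
          "A12": "Solr Flare in Action", "A13": "Practical Perl"}
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def books_tagged_by(user):
    """Books on which `user` placed at least one tag."""
    return {b for u, b, _ in TAGGINGS if u == user}

def tag_facet(books):
    """Distinct books per tag, counting tags from *all* users."""
    pairs = {(b, t) for _, b, t in TAGGINGS if b in books}
    return Counter(t for _, t in pairs)

erik_books = set(TITLES) & books_tagged_by("erik")  # title:* matches every book
print(sorted(erik_books))  # ['A10', 'A11', 'A12']
print(sorted(tag_facet(erik_books).items()))
# [('excellent', 1), ('lucene', 2), ('ruby', 1), ('solr', 2), ('solrflare', 1)]
```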
U-userTagsNarrow

Restrict *tags* to those of a specific user.

  • Example: search "lucene", facet by tag, restrict to erik's tags:
    facet counts={lucene(2),solr(1)}
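Restricting the *tags* means filtering the (user, book, tag) triples before counting, not filtering the books. The following Python sketch illustrates this. It assumes, for illustration only, that the search "lucene" behaves like a whole-word title search, which matches A10 and A11 in the sample data.

```python
from collections import Counter

TITLES = {"A10": "Lucene in Action (LIA)", "A11": "Lucene in Action Deux",
          "A12": "Solr Flare in Action", "A13": "Practical Perl"}
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def title_search(word):
    """Assumed stand-in for the search: whole-word match on the title."""
    return {b for b, t in TITLES.items() if word.lower() in t.lower().split()}

def user_tag_facet(books, user):
    """Distinct books per tag within `books`, counting only tags applied by `user`."""
    pairs = {(b, t) for u, b, t in TAGGINGS if b in books and u == user}
    return Counter(t for _, t in pairs)

print(sorted(user_tag_facet(title_search("lucene"), "erik").items()))  # [('lucene', 2), ('solr', 1)]
```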
U-userNarrowMulti

Restrict books to those tagged by any, or all, of a set of users.

  • Example: search "title:*" restrict to books tagged by erik or yonik => A10,A11,A12,A13
  • Example2: search "title:*" restrict to books tagged by erik and yonik => A11,A12
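The OR and AND variants correspond to the union and intersection of each user's set of tagged books. In the sample data, erik never tagged A13 and yonik never tagged A10. The following Python sketch uses illustrative function names and treats "title:*" as matching every book.

```python
# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def tagged_by_any(users):
    """Books tagged by at least one of `users` (OR)."""
    return {b for u, b, _ in TAGGINGS if u in users}

def tagged_by_all(users):
    """Books tagged by every one of `users` (AND)."""
    per_user = [{b for u, b, _ in TAGGINGS if u == user} for user in users]
    return set.intersection(*per_user) if per_user else set()

print(sorted(tagged_by_any({"erik", "yonik"})))  # ['A10', 'A11', 'A12', 'A13']
print(sorted(tagged_by_all(["erik", "yonik"])))  # ['A11', 'A12']
```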
U-tagRelevance

When searching for a specific tag, increase the relevance of books that have more instances of that tag.

  • Example: search for tag "lucene" and show A11 before A10
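As a deliberately naive stand-in for relevancy scoring, books can be ordered by the raw number of instances of the tag. This Python sketch is not Lucene's scoring formula; it only illustrates the ordering the use case asks for. The function name is illustrative.

```python
from collections import Counter

# (user, book, tag) triples from the sample data set
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"), ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"), ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"), ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]

def rank_by_tag_instances(tag):
    """Books carrying `tag`, most instances first (ties broken by book id)."""
    counts = Counter(b for _, b, t in TAGGINGS if t == tag)
    return sorted(counts, key=lambda b: (-counts[b], b))

print(rank_by_tag_instances("lucene"))  # ['A11', 'A10']
```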
U-tagTimeliness

Restrict to tags added in the last year (or another time period).

Machine Tags or Triple Tags

http://www.flickr.com/groups/api/discuss/72157594497877875

Tag Hierarchies

Implementations

Flat Schema #1

Add tags directly to the documents as a single user/tag token.

U-addUserTag
add to A10, field utag="~erik#lucene"   // single token
add to A10, field utag2="~erik","#lucene" // two tokens, added via copyField with a tokenizer that splits the original
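The following Python sketch shows the token layout: one combined utag value per user/tag pair, and the two utag2 tokens derived from it. In Solr the split would be done by whatever tokenizer the utag2 field type uses after the copyField. `utag_value` and `split_utag` are hypothetical helpers that only mimic that split. The sketch assumes user names never contain "~" or "#".

```python
def utag_value(user, tag):
    """Single-token value for the utag field, e.g. '~erik#lucene'."""
    if "#" in user or "~" in user:
        raise ValueError(f"user name may not contain '~' or '#': {user!r}")
    return f"~{user}#{tag}"

def split_utag(value):
    """Mimic the utag2 tokenizer: '~erik#lucene' -> ['~erik', '#lucene']."""
    user, sep, tag = value.partition("#")
    if not user.startswith("~") or not sep:
        raise ValueError(f"not a user/tag token: {value!r}")
    return [user, sep + tag]

value = utag_value("erik", "lucene")
print(value, split_utag(value))  # ~erik#lucene ['~erik', '#lucene']
```

Splitting on the first "#" keeps a tag that itself contains "#" intact, which is why the restriction is placed on user names rather than tags.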

Alternative:

add to A10, field utag="erik#lucene"   // or "erik lucene", single token
add to A10, field user="erik"  // via copyField
add to A10, field tag="lucene" // via copyField

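The latter looks simpler, but the former allows a single phrase query on utag2 to match different components of a user/tag pair. Note that a Lucene PhraseQuery requires all of its terms to come from the same field, so it cannot span the separate user and tag fields of the latter; that would need something beyond a plain PhraseQuery if the capability is needed.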

Relevancy Calculations for Tags

To take advantage of relevancy scoring, include the tag in the main query (q) rather than only adding it as a filter query (fq); filter queries restrict the result set but do not affect scores.

If multiple users have tagged a document with "lucene", the term's frequency in that field will be higher, so its relevancy will also tend to be higher. However, a document with only one tag, which happens to be "lucene", will likely still rank higher than a heavily tagged document on which only 40% of the tags are "lucene", because Lucene's default relevancy formulas normalize for field length.
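The effect of field length can be seen with the classic TF-IDF factors (Lucene's former default similarity, now available as ClassicSimilarity). Under that formula, tf is the square root of the term frequency and the length norm is one over the square root of the number of terms in the field. The following Python sketch assumes a 10-tag document for the 40% case. It ignores idf (identical for both documents), boosts, and the lossy encoding of norms. Recent Lucene versions default to BM25, which also penalizes long fields, with a different formula.

```python
import math

def tf_length_norm(freq, field_length):
    """tf * lengthNorm from the classic TF-IDF formula: sqrt(freq) / sqrt(field_length)."""
    return math.sqrt(freq) / math.sqrt(field_length)

only_tag = tf_length_norm(freq=1, field_length=1)  # a single tag, and it is "lucene"
heavy = tf_length_norm(freq=4, field_length=10)    # 10 tags, 4 of them "lucene" (40%)
print(f"{only_tag:.3f} {heavy:.3f}")  # 1.000 0.632
```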

More advanced relevancy models would need more sophisticated implementations, for example a custom Similarity class.

U-delUserTag

remove A10.utag="~erik#lucene"

U-delUser

q=utag:\~erik#*, get the set of matching documents, and remove from each all tags starting with ~erik#

U-tagFacet

q="title:lucene" facet.field=utag2 facet.prefix=#

U-userFacet

q="title:lucene" facet.field=utag2 facet.prefix=~
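When these facet requests are sent over HTTP, note two points. First, Solr only returns facet counts when facet=true is set. Second, a literal "#" in a URL starts the fragment and is not sent to the server, so facet.prefix=# must be percent-encoded as %23. The following Python sketch builds such a request URL with the standard library. The endpoint URL and `facet_url` are hypothetical; q, rows, facet, facet.field and facet.prefix are standard Solr parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"

def facet_url(q, field, prefix):
    """Build a faceting request URL; urlencode percent-encodes ':' and '#'."""
    params = [
        ("q", q),
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),        # required for facet.* parameters to apply
        ("facet.field", field),
        ("facet.prefix", prefix),
    ]
    return SOLR_SELECT + "?" + urlencode(params)

print(facet_url("title:lucene", "utag2", "#"))
# http://localhost:8983/solr/select?q=title%3Alucene&rows=0&facet=true&facet.field=utag2&facet.prefix=%23
```

The user facet is the same call with a prefix of "~".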

U-tagSuggest
  • Example1: facet.field=utag2 facet.prefix=#so
  • Example2: not easily doable; it would require more work within Solr to count up term frequencies (tf) rather than documents
U-tagNarrow

fq=utag2:#solr

U-userTagNarrow
  • fq=utag:\~erik#solr
  • OR fq=utag2:"~erik #solr"
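In the standard Lucene/Solr query syntax "~" is an operator (fuzzy and proximity searches). A term that starts with it therefore has to be backslash-escaped in q and fq, unless it sits inside a quoted phrase. "#" needs no escaping. The following Python sketch is a small escaping helper. The character list follows the classic query parser's special characters, and `escape_term` and `user_tag_fq` are illustrative names.

```python
import re

# Characters with special meaning in the classic Lucene/Solr query syntax.
LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_term(term):
    """Backslash-escape query-syntax characters so the term is matched literally."""
    return LUCENE_SPECIAL.sub(r"\\\1", term)

def user_tag_fq(field, user, tag):
    """Filter for one user's use of one tag via the single-token utag field."""
    return f"{field}:{escape_term('~' + user + '#' + tag)}"

print(user_tag_fq("utag", "erik", "solr"))  # utag:\~erik#solr
```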
U-tagNarrowSuggest

q=title:lucene facet.field=utag2 facet.prefix=#so

U-userNarrow

q=title:* fq=utag2:\~erik

U-userTagsNarrow

q=lucene fq=utag2:\~erik facet.field=utag facet.prefix=~erik#

U-userNarrowMulti
  • Example: q=title:* fq=utag2:(\~erik OR \~yonik)
  • Example2: q=title:* fq=utag2:(+\~erik +\~yonik)
U-tagRelevance

q=utag2:#lucene

U-tagTimeliness

??? Perhaps reserve another token prefix for other fields, such as time
