User Tagging Design

Data Set

A10: title = Lucene in Action (LIA)
A11: title = Lucene in Action Deux
A12: title = Solr Flare in Action
A13: title = Practical Perl 

erik tagged 
  A10 with "lucene"
  A11 with "lucene","solr"
  A12 with "ruby","solrflare"
  A13 with

yonik tagged
  A10 with
  A11 with "lucene","solr","excellent"
  A12 with "solr"
  A13 with "foo"

Use Cases

U-addUserTag

Allow a user to tag a book

U-delUserTag

Allow a user to remove a tag from a book, or all instances of a tag they have used.

U-delUser

Remove all tags that were added by a specific user

U-tagFacet

Show number of books tagged with each tag, restricted by the user's current search results and filters.

U-userFacet

Show number of books tagged by each user.
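Both facet use cases (U-tagFacet and U-userFacet) amount to counting distinct books per tag or per user, optionally limited to the current result set. The following is a minimal sketch over the data set above, not a Solr implementation; TAGGINGS and facet_counts are hypothetical names introduced here for illustration.

```python
from collections import defaultdict

# (user, book, tag) triples transcribed from the Data Set section.
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"),
    ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"),
    ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"),
    ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]


def facet_counts(taggings, key, result_ids=None):
    """Count distinct books per facet value; key is "tag" or "user".

    result_ids, if given, is the set of book ids in the current search
    results; taggings on other books are ignored.
    """
    column = {"user": 0, "tag": 2}[key]
    books = defaultdict(set)
    for row in taggings:
        book = row[1]
        if result_ids is None or book in result_ids:
            books[row[column]].add(book)
    return {value: len(ids) for value, ids in books.items()}
```

For example, facet_counts(TAGGINGS, "tag") counts "lucene" on two books (A10 and A11), while restricting to the result set {"A11", "A12"} drops that count to one.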

U-tagSuggest

When a user is tagging a book, allow them to type in the first few letters and then give a dropdown list of existing tags to choose from. Sort by tag popularity, optionally show counts.

With the data set above, a user typing a prefix that matches "solr" and "solrflare" is automatically shown "solr(2), solrflare(1)" if counts use the number of books tagged,

or "solr(3), solrflare(1)" if counts use the number of tag instances.
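The two counting modes can be checked against the data set with a short sketch. It assumes the typed prefix is "so" (the prefix is not stated above); TAGGINGS and suggest are hypothetical names, and a real implementation would more likely query the index than scan a list.

```python
from collections import Counter

# (user, book, tag) triples transcribed from the Data Set section.
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"),
    ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"),
    ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"),
    ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]


def suggest(taggings, prefix, by="books"):
    """Return (tag, count) pairs for tags starting with prefix.

    by="books" counts distinct books carrying the tag;
    by="instances" counts every (user, book, tag) tagging.
    Results are sorted by descending count, then alphabetically.
    """
    if by == "books":
        pairs = {(book, tag) for _, book, tag in taggings}
        counts = Counter(tag for _, tag in pairs)
    else:
        counts = Counter(tag for _, _, tag in taggings)
    matches = [(t, n) for t, n in counts.items() if t.startswith(prefix)]
    return sorted(matches, key=lambda m: (-m[1], m[0]))
```

Here suggest(TAGGINGS, "so", by="books") gives solr 2 and solrflare 1, and by="instances" gives solr 3 and solrflare 1, matching the two displays described above.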

U-tagNarrow

User selects an existing tag to narrow their search results by.

U-userTagNarrow

Show all books a specific user tagged with a specific tag, or restrict search results by the same.
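The narrowing use cases (by tag, by user, or by both) share one shape: keep the books that have at least one matching tagging. A minimal sketch over the data set above follows; TAGGINGS and books_for are hypothetical names, and in Solr this would presumably be expressed as a filter query rather than a scan.

```python
# (user, book, tag) triples transcribed from the Data Set section.
TAGGINGS = [
    ("erik", "A10", "lucene"),
    ("erik", "A11", "lucene"),
    ("erik", "A11", "solr"),
    ("erik", "A12", "ruby"),
    ("erik", "A12", "solrflare"),
    ("yonik", "A11", "lucene"),
    ("yonik", "A11", "solr"),
    ("yonik", "A11", "excellent"),
    ("yonik", "A12", "solr"),
    ("yonik", "A13", "foo"),
]


def books_for(taggings, user=None, tag=None):
    """Return the set of books with a tagging matching user and/or tag.

    Passing None for an argument leaves that dimension unrestricted.
    Both conditions must hold on the same tagging, so user="erik",
    tag="solr" does not match a book where only yonik used "solr".
    """
    return {
        book
        for u, book, t in taggings
        if (user is None or u == user) and (tag is None or t == tag)
    }
```

For example, books_for(TAGGINGS, user="erik", tag="lucene") selects A10 and A11, while books_for(TAGGINGS, tag="solr") selects A11 and A12.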

U-tagNarrowSuggest

Allow the user to narrow their search results by typing in

U-userNarrow

Restrict books to those tagged by a specific user.

U-userTagsNarrow

Restrict *tags* to those of a specific user.

U-userNarrowMulti

Restrict books to those tagged by a specific set of users.

U-tagRelevance

When searching for a specific tag, increase the relevance of books that have more instances of that tag.

U-tagTimeliness

Restrict to tags added in the last year (or another time period).

Machine Tags or Triple Tags

http://www.flickr.com/groups/api/discuss/72157594497877875

Tag Hierarchies

Implementations

Flat Schema #1

Add tags directly to the documents as a single user/tag token.

U-addUserTag

add to A10, field utag="~erik#lucene"   // single token
add to A10, field utag2="~erik","#lucene" // two tokens, added via copyField with a tokenizer that splits the original

Alternative:

add to A10, field utag="erik#lucene"   // or "erik lucene", single token
add to A10, field user="erik"  // via copyField
add to A10, field tag="lucene" // via copyField

The latter looks simpler, but the former allows phrase queries to match different components of a tag with a single query. Note that a standard Lucene PhraseQuery operates on a single field, so the latter would need a custom query spanning the user and tag fields if this capability is needed.
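The token shapes of the two variants can be modelled in a few lines. This is only a sketch of the encoding: in Solr the splitting would be done by copyField plus a tokenizer, the function names are hypothetical, and it assumes user names never contain "#".

```python
def encode_utag(user, tag):
    """Combined single token for the first variant, e.g. '~erik#lucene'."""
    return f"~{user}#{tag}"


def split_utag(token):
    """Split '~erik#lucene' into the two utag2 tokens ['~erik', '#lucene']."""
    user, sep, tag = token.partition("#")
    return [user, sep + tag]


def split_alternative(token):
    """Split 'erik#lucene' into separate user and tag field values."""
    user, _, tag = token.partition("#")
    return {"user": user, "tag": tag}


def phrase_match(tokens, phrase):
    """True if phrase occurs as consecutive tokens, like a phrase query."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# utag2 tokens for a book that erik and yonik have both tagged "solr".
utag2 = (split_utag(encode_utag("erik", "solr"))
         + split_utag(encode_utag("yonik", "solr")))
```

With these tokens, phrase_match(utag2, ["~yonik", "#solr"]) succeeds because the user and tag tokens are adjacent in one field, which is the property the first variant relies on.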

Relevancy Calculations for Tags

To leverage relevancy calculations, include the tag as part of the regular full-text search (q) rather than just adding it as a filter (fq).
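The difference shows up in the request parameters. The sketch below builds the two query strings; q and fq are standard Solr parameters, while the field name tag (from the alternative schema above), the sample query, and the function names are assumptions made for illustration.

```python
from urllib.parse import urlencode


def tag_as_filter(user_query, tag):
    """Tag restricts the result set via fq but does not affect scoring."""
    return urlencode([("q", user_query), ("fq", f"tag:{tag}")])


def tag_as_scored(user_query, tag):
    """Tag is part of q, so its term statistics contribute to the score."""
    return urlencode([("q", f"({user_query}) AND tag:{tag}")])
```

Either string would be appended to a Solr select URL. Only the second form lets the number of "lucene" tag instances influence ranking.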

If multiple users have tagged a document with "lucene", the term's density in that field is higher, so relevancy will also tend to be higher. However, given Lucene's default relevancy formulas, a document with only one tag, which happens to be "lucene", will likely still rank higher than a heavily tagged document in which only 40% of the tags are "lucene".
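A rough worked example shows why the single-tag document wins. It assumes the classic TF-IDF similarity, where tf is the square root of the term frequency and the length norm is one over the square root of the number of terms in the field. The idf factor is the same for both documents and is omitted, and the lossy encoding of stored norms is ignored. The function name and the ten-tag document are hypothetical.

```python
import math


def classic_field_score(term_freq, field_length):
    """tf * lengthNorm under the assumed classic TF-IDF formulas."""
    tf = math.sqrt(term_freq)
    length_norm = 1.0 / math.sqrt(field_length)
    return tf * length_norm


# One tag in total, and it is "lucene".
single = classic_field_score(term_freq=1, field_length=1)

# Ten tags in total, four of them "lucene" (40%).
heavy = classic_field_score(term_freq=4, field_length=10)
```

Under these assumptions single is 1.0 and heavy is about 0.63, so the sparsely tagged document scores higher even though the other has four times as many "lucene" tags.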

More advanced relevancy models would need more sophisticated implementations, for example a custom Similarity class.

U-delUserTag

U-delUser

U-tagFacet

U-userFacet

U-tagSuggest

U-tagNarrow

U-userTagNarrow

U-tagNarrowSuggest

U-userNarrow

U-userTagsNarrow

U-userNarrowMulti

U-tagRelevance

U-tagTimeliness

Open question: reserve another prefix for fields like time?

UserTagDesign (last edited 2011-01-17 22:30:57 by MarkBennett)