...

  • CrawlBase
  • CrawlList
  • FetchedContent
  • ParsedContent
    • Url → MapWritable
    • MapWritable would contain Text → Writable or Writable[], which would allow every type of element within the content (href, headers, etc.) to be parsed.
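
As a rough illustration of the ParsedContent shape described above, the following Python sketch models the Url → MapWritable mapping with plain dictionaries. It is not Hadoop code: `dict`, `str`, and ordinary Python values stand in for MapWritable, Text, and Writable. The helper `add_element` and the key names (`title`, `href`) are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Stand-in for Url -> MapWritable: each URL maps to a dict whose keys play
# the role of Text and whose values play the role of Writable or Writable[].
parsed_content = defaultdict(dict)


def add_element(url, name, value):
    """Record one parsed element; a repeated name grows into a list (Writable[])."""
    elements = parsed_content[url]
    if name not in elements:
        elements[name] = value                    # first occurrence: single value
    elif isinstance(elements[name], list):
        elements[name].append(value)              # already a list: extend it
    else:
        elements[name] = [elements[name], value]  # second occurrence: promote to list


add_element("http://example.com/", "title", "Example Domain")
add_element("http://example.com/", "href", "http://example.com/a")
add_element("http://example.com/", "href", "http://example.com/b")

print(parsed_content["http://example.com/"])
```

Keeping single values and arrays under the same key space is what would let one structure hold very different element types (a single title, many hrefs) without a fixed schema.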

  • Processing
    • Processing would take the ParsedContent and translate that into multiple specific data parts. These data parts aren't used by any part of the system except Scoring.
    • Processing would consist of specific functions, including updating the CrawlBase, performing analysis on ParsedContent, and integrating data from other sources.
    • Some processors would translate content into formats needed by scorers.
    • Processors are not constrained to specific data structures, which allows flexibility in analysis, updating, blocking or removal, and other forms of data processing. The only requirement is that scoring programs must be able to access the processing output data structures in a one-to-one relationship.
  • Scoring
    • Url → Field
    • Url → Float
    • A Field consists of a name, value(s), and a score, being Text, Text, and Float respectively.
    • These Fields become the fields that are indexed, and their scores become the field boosts.
    • Scoring takes the specific data parts from analysis and outputs the above formats.
    • Field needs Lucene semantics.
    • Indexing
      • Indexing indexes Fields for a document according to the field values and boosts. Indexing does not determine either field values or boost values.
      • Indexing aggregates document boosts to create a final document score.
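
The hand-off from Scoring to Indexing outlined above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Nutch or Lucene API: the `Field` class and the `build_document` function are hypothetical names. The outline does not say how document boosts are aggregated, so multiplication is assumed here purely for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Field:
    """A scored field: name, value(s), and score (Text, Text, Float in the outline)."""
    name: str
    values: tuple
    score: float


def build_document(url, fields, doc_boosts):
    """Assemble an index-ready record without changing any value or boost.

    fields     -- Field objects emitted by scorers for this URL (Url -> Field)
    doc_boosts -- floats emitted by scorers for this URL (Url -> Float)
    """
    return {
        "url": url,
        # Field scores are carried over unchanged as field boosts.
        "fields": [
            {"name": f.name, "values": list(f.values), "boost": f.score}
            for f in fields
        ],
        # ASSUMPTION: document boosts are aggregated by multiplication.
        "score": prod(doc_boosts),
    }


doc = build_document(
    "http://example.com/",
    [Field("title", ("Example Domain",), 2.0),
     Field("anchor", ("example", "sample"), 1.5)],
    [1.5, 2.0],
)
print(doc["score"])
```

The indexing step only copies and combines what scorers produced. It never computes a field value or a boost itself, which mirrors the separation of duties stated in the outline.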

...