Differences between revisions 17 and 18
Revision 17 as of 2013-09-03 15:18:25
Size: 4691
Editor: 216
Comment:
Revision 18 as of 2013-10-02 21:14:55
Size: 4726
Editor: 24
Comment: fix "Solr Info" link for Nutch
Line 21 (revision 17):
 * [[http://lucene.apache.org/nutch/|Nutch]] (web) [[http://wiki.apache.org/nutch/RunningNutchAndSolr|Solr Info]]
Line 21 (revision 18):
 * [[http://lucene.apache.org/nutch/|Nutch]] (web) [[http://wiki.apache.org/nutch/NutchTutorial|Solr Info (included as part of the Nutch Tutorial)]]

This page documents projects that relate to the broader Solr ecosystem of search and information retrieval (IR) and may be of interest to Solr users. Some of these projects have features that can be used directly in conjunction with Solr. You may also be interested in the IntegratingSolr page, which documents client libraries and projects for integrating Solr with other applications not specific to search/IR.

Solr distributions / forks

Data acquisition

There are numerous ways to bring data into Solr. Many people roll their own solution or use the DataImportHandler.
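For readers who roll their own loader, the core of it is usually an HTTP POST of documents to Solr's update handler. The following is a minimal sketch, not a complete client. It assumes a Solr 4.x instance at `http://localhost:8983/solr` with the default `collection1` core, an update handler that accepts a JSON array of documents, and the hypothetical field names `id` and `title`. It only builds the request; sending it is left to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumption: a local Solr 4.x core named "collection1"; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST that adds `docs` to Solr as JSON.

    `docs` is a list of dicts; each dict becomes one Solr document whose
    keys must match fields defined in the schema.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical documents; "id" and "title" are example field names.
    docs = [
        {"id": "doc1", "title": "First document"},
        {"id": "doc2", "title": "Second document"},
    ]
    req = build_update_request(docs)
    print(req.get_method(), req.full_url)
    # To actually index, run this with Solr up:
    # urllib.request.urlopen(req)
```

Separating request construction from sending keeps the sketch testable without a server and makes it easy to batch documents before posting.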

Crawlers And Connectors

Web, email, and file crawlers.

Pipelines / Document Processing

Frameworks for flexible document processing. See DocumentProcessing for more background and for the criteria of a proposal. Some crawlers and connectors have their own pipeline capabilities; those are not repeated here.

A common problem with the ETL tools is that each step in the pipeline accepts and emits records in a fixed, flat schema; they don't support dynamic name-value pairs. They are also not document-oriented: if you want to pass a DOM of some kind, you have to serialize it into a field. However, the ETL tools are all far more mature than the nascent document- or XML-oriented pipelines.
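To make the contrast concrete, here is a small, hypothetical Python illustration. A fixed-schema step (modelled with a frozen dataclass) can only carry the fields it declares, so a nested XML structure has to be flattened into a string field; a document-oriented step (modelled as a plain dict) can accept and emit arbitrary name-value pairs, including the DOM object itself. The class, function, and field names are invented for illustration and do not correspond to any particular ETL tool or pipeline framework.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedRecord:
    """A record with a fixed, flat schema, as an ETL step might require.

    frozen=True models the fixed schema: assigning any attribute,
    including a new one, raises an error.
    """
    id: str
    title: str
    body_xml: str  # a nested structure must be serialized into a string field


def etl_step(record: FixedRecord) -> FixedRecord:
    # The step can transform declared fields, but its output type has
    # no slot for extra name-value pairs.
    return FixedRecord(record.id, record.title.strip(), record.body_xml)


def document_step(doc: dict) -> dict:
    # A document-oriented step may add any name-value pair it derives.
    out = dict(doc)
    out["title_length"] = len(doc.get("title", ""))
    return out


dom = ET.fromstring("<body><p>Hello</p><p>World</p></body>")

# Fixed schema: the DOM is flattened to text before it can travel.
rec = etl_step(FixedRecord("doc1", "  Example  ", ET.tostring(dom, encoding="unicode")))

# Dynamic document: the DOM object itself and new fields can be passed along.
doc = document_step({"id": "doc1", "title": "Example", "body": dom})
```

The downstream cost of the fixed-schema approach is that every later step wanting the structure must parse `body_xml` again.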

Indexing

Generating the Lucene/Solr Index

Hadoop:

Monitoring

Tools or services for monitoring Solr-specific performance metrics.
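Many monitoring tools ultimately read statistics that Solr itself exposes. As a rough sketch, assuming a Solr 4.x core at the hypothetical URL below, the MBeans request handler (`admin/mbeans?stats=true&wt=json`) returns a `solr-mbeans` entry that, in Solr's default flat JSON rendering of named lists, alternates category names and objects. The helpers below pair those entries up and pull out one statistic. The category, bean, and statistic names vary by Solr version and configuration, so treat them as assumptions.

```python
import json
import urllib.request

# Assumption: a local Solr 4.x core; adjust host, port, and core name.
MBEANS_URL = "http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json"


def pair_flat_list(flat):
    """Turn a flat list like ["A", {...}, "B", {...}] into {"A": {...}, "B": {...}}."""
    return dict(zip(flat[0::2], flat[1::2]))


def get_stat(response, category, bean, stat):
    """Return one statistic from a parsed mbeans response, or None if absent."""
    categories = pair_flat_list(response.get("solr-mbeans", []))
    info = categories.get(category, {}).get(bean, {})
    return info.get("stats", {}).get(stat)


def fetch_mbeans(url=MBEANS_URL):
    """Fetch and parse the mbeans JSON; requires a running Solr instance."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

With Solr running, `get_stat(fetch_mbeans(), "QUERYHANDLER", "/select", "requests")` would return the request count for the `/select` handler, provided that handler and statistic exist in your version. Polling such a value periodically and recording the deltas is the basic idea behind many metric collectors.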

SolrEcosystem (last edited 2013-10-02 21:14:55 by 24)