Revision 15 as of 2013-03-11 20:13:10
Size: 4525
Comment: Indent fix
This page documents projects that relate to the broader Solr ecosystem of search and information retrieval and may be of interest to Solr users. Some of these projects have features that can be used directly in conjunction with Solr. You may also be interested in the IntegratingSolr page, which documents client libraries and projects for integrating Solr with other applications not specific to search/IR.
Solr distributions / forks
LucidWorks Enterprise (LWE) (not free)
Solandra - A tight integration of Solr and Cassandra. The result is Solr with the awesome scalability properties of Cassandra.
LucidWorks Certified Distribution for Solr - Pure open-source Solr; more stable than the latest Apache release. The code is on GitHub (see the branches & tags).
Data acquisition
There are numerous ways to bring data into Solr. Many people roll their own solution or use the DataImportHandler.
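As a rough illustration of what a "roll your own" solution can look like, the following minimal sketch builds a Solr XML update message and posts it to the update handler. The URL, the field names, and the helper names `build_add_message` and `post_xml` are assumptions made for this example; adjust them to your own installation and schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed URL of a local Solr instance; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of dicts (hypothetical helper)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple value is written as repeated fields (multi-valued).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_xml(xml_text, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a Solr instance running, `post_xml(build_add_message([{"id": "1", "title": "Hello"}]))` followed by `post_xml("<commit/>")` would add one document and make it searchable, provided the schema defines those fields.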
Crawlers And Connectors
Web, email, and file crawlers.
Heritrix (web)
Crawl-Anywhere (web) Solr Info
DataImportHandler (email, file)
ManifoldCF (web, file) Solr Info
Aperture (web, email, file)
- Presently, more of a framework for a crawler.
Pipelines / Document Processing
Frameworks for flexible document processing. See DocumentProcessing for more background and criteria for a proposal. Some crawlers/connectors have their own pipeline capability and they are not repeated here.
Aspire (by Search Technologies) - integrates with Solr. Not open-source but free.
Hydra (by Findwise) - integrates with Solr.
- ETL (Extract, Transform, Load) -- many are applicable; these are a couple of notable ones:
Custom Talend components for Solr can be found on Talend Forge Exchange, with associated documentation and a tutorial.
A common problem among the ETLs is that each step in the pipeline accepts and emits records in a fixed, flat schema; they don't support dynamic name-value pairs. They are also not document-oriented: if you want to pass a DOM of some kind, you have to serialize it into a field. However, the ETLs are all far more mature than the nascent document- or XML-oriented pipelines.
- ESBs (Enterprise Service Buses) -- not listed; various
One of the XProc implementations (an XML pipeline spec) such as Calabash
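The flat-schema limitation noted for the ETL tools above can be sketched as follows. This is only an illustration, not the behavior of any particular ETL product: the schema `FIXED_SCHEMA` and the helpers `to_flat_record` and `from_flat_record` are hypothetical names, and the point is simply that a structured document has to travel through the pipeline as a serialized string in one field.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed, flat schema that every pipeline step accepts and emits.
FIXED_SCHEMA = ("id", "title", "body_xml")


def to_flat_record(doc_id, title, dom):
    """Pack a document into the flat schema by serializing its DOM into one field."""
    record = {
        "id": doc_id,
        "title": title,
        "body_xml": ET.tostring(dom, encoding="unicode"),
    }
    # No extra name-value pairs can be attached: the key set is fixed.
    if set(record) != set(FIXED_SCHEMA):
        raise ValueError("record does not match the fixed schema")
    return record


def from_flat_record(record):
    """Recover the DOM in a later step by parsing the serialized field again."""
    return ET.fromstring(record["body_xml"])
```

Each step that needs the document structure has to parse the field again, which is the overhead a document-oriented pipeline avoids.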
Indexing
Generating the Lucene/Solr Index
Hadoop:
Monitoring
Tools or services for monitoring Solr-specific performance metrics.
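Independently of the tools listed here, one Solr-specific metric is available in every query response: the `QTime` value in the response header, which reports the query time in milliseconds. The sketch below extracts it from a JSON response (as returned with `wt=json`); the helper name `summarize_response` and the shape of the returned dict are assumptions for this example, and it is not a substitute for a monitoring tool.

```python
import json


def summarize_response(raw_json):
    """Extract basic metrics from a Solr JSON query response (hypothetical helper)."""
    body = json.loads(raw_json)
    header = body["responseHeader"]
    return {
        "status": header["status"],      # 0 means the request succeeded
        "qtime_ms": header["QTime"],     # query time reported by Solr, in ms
        "num_found": body.get("response", {}).get("numFound"),
    }
```

Feeding the body of each query response through such a function and logging the result gives a simple latency record over time.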