Work in progress

This document is not yet finished, but it may be useful for people getting started with Solr development.

SOLR-130 Open questions

1. The schema.xml and solrconfig.xml files are very well explained in parts, but some areas, for example <indexDefaults>, have no explanation at all. It would be nice to get more information there. For example: what happens if <mergeFactor> is increased to 1000? What is the highest value for each property? What is a "safe" value?
2. It would be nice to describe deployment scenarios, e.g. a single-server install with XXX CPUs and YYY memory running only Solr with AAA thousand docs: what should the config look like, and why? And roughly how many queries/sec (xxx) can be expected?
3. It would be nice to have a multi-server deployment scenario with a given server spec and a description of how the deployment should be laid out.
4. It would also be nice to have more information on the use of stopwords, synonyms, facets, etc.


Solr is an open source enterprise search server based on the Lucene Java search library. Solr tries to expose all of the Lucene goodness through an easy-to-use, easy-to-configure HTTP interface. Besides the configuration, Solr adds value through its IndexReader management, its caching, and its plugin support for mixing and matching request handlers, output writers, and field types as easily as you can mix and match Analyzers.


It is typically not recommended to have your front-end users/clients hit Solr directly as part of an HTML form submit. The more conventional way to think of it is that Solr is a backend service which your application talks to over HTTP.

Compare it with a database: you wouldn't generate an HTML form for your clients and then have them submit it in a way that made their browser use JDBC (or ODBC) to communicate directly with your database. Their client would communicate with your app, which would validate their input, impose some security checks on it, and then execute the underlying query against your database. Working with Solr should be very similar. It just so happens that instead of JDBC or some other binary protocol Solr uses HTTP, so you *can* talk to it directly from a web browser, but that is really more of a debugging feature than anything else.

Getting started

If you followed the tutorial, you may now be asking yourself how to create your own application based on Solr. The easiest way is to start from the example webapp and strip out everything you do not need.

First we will set up a basic directory structure:

|-- solr
|   |-- README.txt
|   `-- conf
|       |-- admin-extra.html
|       |-- protwords.txt
|       |-- schema.xml
|       |-- scripts.conf
|       |-- solrconfig.xml
|       |-- stopwords.txt
|       |-- synonyms.txt
|       `-- xslt
|           `-- example.xsl
`-- webapps
    `-- mySolr

Now we need content in webapps/mySolr/. One way to get it is to build the core application as a WAR file and then extract that file into webapps/mySolr/ (with any zip application):

ant dist-war

Alternatively, we can copy all the (compiled) resources referenced by that ant target ourselves.

Basic structure with jetty

|-- etc
|   |-- LICENSE.javax.servlet.txt
|   |-- LICENSE.javax.xml.html
|   |-- LICENSE.jsse.txt
|   |-- admin.xml
|   |-- jetty-jmx.xml
|   |-- jetty.xml
|   `-- webdefault.xml
|-- ext
|   |-- ant.jar
|   |-- commons-el.jar
|   |-- commons-logging.jar
|   |-- jasper-compiler.jar
|   |-- jasper-runtime.jar
|   |-- mx4j-remote.jar
|   |-- mx4j-tools.jar
|   `-- mx4j.jar
|-- lib
|   |-- javax.servlet.jar
|   |-- jsp
|   |-- org.mortbay.jetty.jar
|   `-- org.mortbay.jmx.jar
|-- solr
|   |-- README.txt
|   `-- conf
|       |-- admin-extra.html
|       |-- protwords.txt
|       |-- schema.xml
|       |-- scripts.conf
|       |-- solrconfig.xml
|       |-- stopwords.txt
|       |-- synonyms.txt
|       `-- xslt
|           `-- example.xsl
|-- start.jar
`-- webapps
    `-- mySolr
        |-- META-INF
        |   |-- LICENSE.txt
        |   |-- MANIFEST.MF
        |   `-- NOTICE.txt
        |-- WEB-INF
        |   |-- lib
        |   |   |-- apache-solr-1.2-dev-incubating.jar
        |   |   |-- commons-fileupload-1.1.1.jar
        |   |   |-- commons-io-1.2.jar
        |   |   |-- lucene-core-nightly.jar
        |   |   |-- lucene-highlighter-nightly.jar
        |   |   |-- lucene-snowball-nightly.jar
        |   |   `-- xpp3-
        |   `-- web.xml
        `-- index.html

This is the basic structure when you are developing with the bundled standalone Jetty. The etc/, ext/ and lib/ directories and start.jar are Jetty-specific and are not needed when developing for Tomcat. In our small example, however, we will use Jetty and copy these files from the example.

The most important directory is solr/. This is where we store all the configuration parameters Solr needs.

Setting up your own fields

As soon as you use your own fields, you have to define them in schema.xml. Since we used the example schema as the basis for our app, remove the following fields and add your own:

<field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="manu" type="text" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="true"/>

<field name="weight" type="sfloat" indexed="true" stored="true"/>
<field name="price"  type="sfloat" indexed="true" stored="true"/>
<field name="popularity" type="sint" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="manu_exact" type="string" indexed="true" stored="false"/>
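
As an illustration of what "your own" fields might look like, here is a minimal sketch for a hypothetical article index. The field names (title, body, tags, published) are invented for this example. The sketch assumes that the field types string, text, text_ws and date from the example schema are still defined, and that the example's id field and <uniqueKey> are left in place. The lines go inside the <fields> element of schema.xml.

```xml
<!-- Hypothetical fields for a small article index; all names are illustrative only. -->

<!-- Full-text searchable and returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Searchable but not stored, which keeps the index smaller -->
<field name="body" type="text" indexed="true" stored="false"/>

<!-- Whitespace-tokenized, may occur several times per document -->
<field name="tags" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>

<!-- Assumes the "date" field type from the example schema is still present -->
<field name="published" type="date" indexed="true" stored="true"/>
```

A field that only needs to be searched can use stored="false", and a field that is only displayed can use indexed="false".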

Add your documents to the index

Add your own documents (conforming to the schema definitions you used) and test.
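
For reference, a document is sent to Solr as an XML <add> message, the same format used by the files in the tutorial. The sketch below is only an example: title and tags are hypothetical field names and must be replaced with fields that actually exist in your schema.xml, and id is assumed to still be the unique key as in the example schema.

```xml
<!-- Sketch of an update message; every field name must exist in your schema.xml. -->
<add>
  <doc>
    <!-- Assumes "id" is still the uniqueKey, as in the example schema -->
    <field name="id">article-1</field>
    <!-- Hypothetical field -->
    <field name="title">Getting started with Solr</field>
    <!-- A multiValued field is simply repeated (hypothetical field) -->
    <field name="tags">solr</field>
    <field name="tags">search</field>
  </doc>
</add>
```

The message is posted to the webapp's update handler in the same way the tutorial posts its example documents; with the layout above the path would presumably be under /mySolr/ rather than /solr/. The new document only becomes visible to searches after a separate <commit/> message has been posted.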

Changing the result output

FIXME: add more information