Differences between revisions 3 and 4
Revision 3 as of 2007-02-05 12:10:50
Size: 6434
Editor: proxycache
Comment: Fixing typo and ask for collaboration on this page
Revision 4 as of 2007-09-07 08:03:30
Size: 7134
Editor: proxycache
Comment: Focusing on using ant for generate our mySolr

Work in progress - Help wanted

This document is not fully finished, but it may be useful for people getting started with Solr development. Please share your experience by collaborating on this document.

Open questions

  1. The schema.xml and solrconfig.xml are well explained in parts, but some areas, for example <indexDefaults>, have no explanation. It would be nice to have more information there. For example, what happens if <mergeFactor> is increased to 1000? What is the highest value for each property? What is a "safe value"?

  2. It would be nice to describe deployment scenarios, e.g. a single-server install with XXX CPUs and YYY memory running only Solr with AAA thousand docs: what should the config look like and why, and roughly how many queries/sec can you expect?
  3. It would be nice to have a multi-server deployment scenario with server specs and a description of how the deployment should look.
  4. It would also be nice to have more information on the usage of stopwords, synonyms, facets, etc.

Introduction

Solr is an open source enterprise search server based on the Lucene Java search library. Solr tries to expose all of the Lucene goodness through an easy-to-use, easy-to-configure HTTP interface. Besides the configuration, Solr adds value through its IndexReader management, its caching, and its plugin support for mixing and matching request handlers, output writers, and field types as easily as you can mix and match Analyzers.

Architecture

Production

It is typically not recommended to have your front-end users/clients hitting Solr directly as part of an HTML form submit. The more conventional way to think of it is that Solr is a backend service which your application talks to over HTTP. If you were dealing with a database, you would not expect to generate an HTML form for your clients and have them submit it in a way that made their browser use JDBC (or ODBC) to communicate directly with your database. Their client would communicate with your app, which would validate their input, impose some security checks on it, and then execute the underlying query against your database. Working with Solr should be very similar. It just so happens that instead of JDBC or some other binary protocol Solr uses HTTP, and you *can* talk to it directly from a web browser, but that is really more of a debugging feature than anything else.

Development

This document describes how to create and deploy your custom Solr server based on an out-of-the-box Solr distribution. We will use Apache Ant for deployment and for any patching of the standard Solr files.

Getting started

If you followed the tutorial you may now ask yourself how to create your own application based on Solr. We use the directory structure of the example webapp and strip everything that we do not need or do not want to duplicate. The files you most likely need are in the solr/conf directory, since they control the behavior of our application.

First we will set up a basic directory structure (assuming we only want to change some fields) and create a blank build.xml:

$mySolr      
|-- solr
|   `-- conf
|       |-- schema.xml
|       `-- solrconfig.xml
`-- build.xml

Later, build.xml will generate a fully functional web application using Jetty as the servlet engine. The structure shown further below is the basic layout when developing with a standalone Jetty included. The etc/, ext/ and lib/ directories and start.jar are Jetty-specific and are not needed when developing for Tomcat. In our small example, however, we use Jetty and copy these files from the example via our Ant script.

The most important directory is solr/. Here we store all configuration parameters needed by Solr. The idea is that the files we defined above take priority over the core Solr files, which our Ant script copies over.

Basic structure with jetty after build

$mySolr
|-- etc
|   |-- LICENSE.javax.servlet.txt
|   |-- LICENSE.javax.xml.html
|   |-- LICENSE.jsse.txt
|   |-- admin.xml
|   |-- jetty-jmx.xml
|   |-- jetty.xml
|   `-- webdefault.xml
|-- ext
|   |-- ant.jar
|   |-- commons-el.jar
|   |-- commons-logging.jar
|   |-- jasper-compiler.jar
|   |-- jasper-runtime.jar
|   |-- mx4j-remote.jar
|   |-- mx4j-tools.jar
|   `-- mx4j.jar
|-- lib
|   |-- javax.servlet.jar
|   |-- jsp
|   |-- org.mortbay.jetty.jar
|   `-- org.mortbay.jmx.jar
|-- solr
|   |-- README.txt
|   `-- conf
|       |-- admin-extra.html
|       |-- protwords.txt
|       |-- schema.xml
|       |-- scripts.conf
|       |-- solrconfig.xml
|       |-- stopwords.txt
|       |-- synonyms.txt
|       `-- xslt
|           `-- example.xsl
|-- start.jar
`-- webapps
    `-- solr
        |-- META-INF
        |   |-- LICENSE.txt
        |   |-- MANIFEST.MF
        |   `-- NOTICE.txt
        |-- WEB-INF
        |   |-- lib
        |   |   |-- apache-solr-1.2-dev-incubating.jar
        |   |   |-- commons-fileupload-1.1.1.jar
        |   |   |-- commons-io-1.2.jar
        |   |   |-- lucene-core-nightly.jar
        |   |   |-- lucene-highlighter-nightly.jar
        |   |   |-- lucene-snowball-nightly.jar
        |   |   `-- xpp3-1.1.3.4.O.jar
        |   `-- web.xml
        `-- index.html

Ant script

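<project>
  <property environment="env"/>
  <!-- SOLR_HOME must be set as an environment variable. -->
  <!-- Test env.SOLR_HOME itself: solr.home would always count as set, -->
  <!-- even if it only held the unexpanded text ${env.SOLR_HOME}. -->
  <condition property="solr.set" >
    <isset property="env.SOLR_HOME" />
  </condition>
  <fail unless="solr.set">Please set SOLR_HOME in your environment.
  export SOLR_HOME=~/src/apache/solr/
  </fail>
  <!-- Safe to define now that the environment variable is known to exist -->
  <property name="solr.home" value="${env.SOLR_HOME}"/>
  <!-- Pull in the targets of the Solr build file -->
  <import file="${solr.home}/build.xml"/>
</project>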
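
The script above only checks SOLR_HOME and imports the Solr build file; it does not yet copy anything. As a rough sketch, a target that assembles the structure shown earlier could look like the following. It would go inside the <project> element, after the import. The target name mysolr and the properties mysolr.build.dir and solr.war are assumptions made for this illustration and are not part of the Solr build. It also assumes that the Solr war has already been built (for example with the dist-war target) and that its location is passed on the command line.

```xml
<!-- Sketch only: "mysolr", "mysolr.build.dir" and "solr.war" are made-up names. -->
<property name="mysolr.build.dir" location="build"/>

<target name="mysolr" description="Assemble a runnable mySolr with Jetty">
  <!-- Stop early if the caller did not say where the Solr war is -->
  <fail unless="solr.war">Pass -Dsolr.war=/path/to/the/solr.war</fail>

  <!-- 1. Jetty files and the default Solr configuration from the example -->
  <copy todir="${mysolr.build.dir}">
    <fileset dir="${solr.home}/example">
      <include name="etc/**"/>
      <include name="ext/**"/>
      <include name="lib/**"/>
      <include name="solr/**"/>
      <include name="start.jar"/>
    </fileset>
  </copy>

  <!-- 2. Our own solr/ files overwrite the defaults copied in step 1 -->
  <copy todir="${mysolr.build.dir}/solr" overwrite="true">
    <fileset dir="solr"/>
  </copy>

  <!-- 3. Unpack the web application into webapps/solr -->
  <unwar src="${solr.war}" dest="${mysolr.build.dir}/webapps/solr"/>
</target>
```

Copying the defaults first and our own files second, with overwrite enabled, is what gives our schema.xml and solrconfig.xml priority over the files shipped with Solr.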

Extending

Setting up your own fields

As soon as you use your own fields, you have to define them in schema.xml. Since we used the example schema as the basis for our app, remove the following fields and add your own:

<field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="manu" type="text" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="true"/>

<field name="weight" type="sfloat" indexed="true" stored="true"/>
<field name="price"  type="sfloat" indexed="true" stored="true"/>
<field name="popularity" type="sint" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="manu_exact" type="string" indexed="true" stored="false"/>
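
As an illustration, replacement fields for a small collection of books might look like the sketch below. The field names (title, author, tag, pages) are invented for this example; the types are the ones already used by the example fields listed above. If your copy of the example schema contains copyField entries that refer to the removed fields, those entries presumably need to be removed or adapted as well.

```xml
<!-- Hypothetical fields; only the type names come from the example schema -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="tag" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="pages" type="sint" indexed="true" stored="true"/>
```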

Add your documents to the index

Add your own documents (conforming to the schema definitions you used) and test.
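
A document is sent to Solr's update handler as an XML add message. The sketch below assumes a schema that keeps the id field of the example schema and defines the hypothetical fields title, author and a multi-valued tag; replace them with the fields you actually defined.

```xml
<!-- Field names other than "id" are assumptions; use the ones from your schema.xml -->
<add>
  <doc>
    <field name="id">book-001</field>
    <field name="title">An example title</field>
    <field name="author">Jane Doe</field>
    <!-- A multi-valued field is simply repeated -->
    <field name="tag">search</field>
    <field name="tag">lucene</field>
  </doc>
</add>
```

The added document only becomes searchable after a commit, which is sent to the same handler as a separate <commit/> message.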

Changing the result output

FIXME: add more information

mySolr (last edited 2009-09-20 22:05:28 by localhost)