DASL Configuration

The default implementation scans the complete resource tree within the scope of the DASL query and tests each resource against the search condition.

This works, but is quite slow.
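Why the default search is slow can be seen in a minimal sketch of a full scan. The classes ResourceNode and FullScanSearch are hypothetical stand-ins invented for this illustration; they are not Slide classes. Every resource below the scope is visited and tested, so the cost grows with the size of the tree rather than with the number of hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a resource in the tree (not a Slide class). */
final class ResourceNode {
    final String path;
    final long contentLength;
    final List<ResourceNode> children = new ArrayList<>();

    ResourceNode(String path, long contentLength) {
        this.path = path;
        this.contentLength = contentLength;
    }

    /** Adds a child and returns this node so that trees can be built fluently. */
    ResourceNode add(ResourceNode child) {
        children.add(child);
        return this;
    }
}

final class FullScanSearch {
    /** Visits every node at or below the scope and collects the paths that match. */
    static List<String> search(ResourceNode scope, Predicate<ResourceNode> condition) {
        List<String> hits = new ArrayList<>();
        collect(scope, condition, hits);
        return hits;
    }

    private static void collect(ResourceNode node, Predicate<ResourceNode> condition,
                                List<String> hits) {
        if (condition.test(node)) {
            hits.add(node.path);
        }
        for (ResourceNode child : node.children) {
            collect(child, condition, hits);
        }
    }
}
```

An index avoids this walk by looking up matching resources directly.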

To avoid this, you currently have the following options:

Searching meta-data using an RDBMS

If you are using a JDBCStore/J2EEStore, you can use the database to search the meta-data. To enable this, add the parameter use-rdbms-expression-factory to your store definition.

Searching meta-data with the Lucene based properties indexer

Note that this is under development and will be part of Slide 2.2. To try it, check out CVS HEAD.

As of 7/19/2006, the <propertiesindexer> is known to have issues with bindings. It is recommended that you set org.apache.slide.binding=false in your slide.properties file (add the file if it does not exist). See SlidePropertiesFile.

Searching the meta data.

Enabling

To use this indexer add the following to your store definition.

Parameter

||parameter||description||required/default||
||indexpath||Directory where the index data is stored.||yes/none||
||asynchron||If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated.||no/false||
||priority||Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY.||no/Thread.NORM_PRIORITY||
||includes||A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed.||no||
||optimization-threshold||The number of write accesses to the index after which the index is optimized.||no/100||
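The interplay of asynchron and priority can be sketched as follows. IndexUpdateDispatcher is a hypothetical class written for this illustration and is not the indexer's actual implementation; it only uses standard java.util.concurrent calls. With asynchron set to false the index write runs on the calling (transaction) thread; with true it is handed to a single background thread that runs at the configured priority.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical dispatcher illustrating the asynchron and priority parameters. */
final class IndexUpdateDispatcher implements AutoCloseable {
    private final ExecutorService worker; // null when updates run inline

    IndexUpdateDispatcher(boolean asynchron, int priority) {
        if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        if (asynchron) {
            // One background thread keeps index writes in submission order.
            worker = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "index-updater");
                t.setPriority(priority);
                t.setDaemon(true);
                return t;
            });
        } else {
            worker = null;
        }
    }

    /** Runs the index write inline, or queues it for the background thread. */
    CompletableFuture<Void> update(Runnable indexWrite) {
        if (worker == null) {
            indexWrite.run(); // the caller waits for the index update
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(indexWrite, worker); // the caller can continue
    }

    @Override
    public void close() {
        if (worker != null) {
            worker.shutdown();
        }
    }
}
```

In the asynchronous case a search issued right after a commit may not yet see the change, which is the price for the shorter transaction.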

supported DASL operators and data types

The indexer currently supports the datatypes:

string indexed without any modification

date indexed as a normalized date string (without seconds)

integer indexed as a normalized integer string (between Long.MIN_VALUE and Long.MAX_VALUE)

text indexed in a tokenized and normalized form (normalized using Lucene analyzers)
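Dates and integers are normalized to strings because a term index compares terms as strings, so the string order has to agree with the natural order of the values. The following Java sketch shows one possible encoding. The class SortableKeys and both formats (16 hex digits with a flipped sign bit, and yyyyMMddHHmm in UTC) are assumptions chosen for illustration; the exact format used by the indexer is not documented here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper: encodes values so that string order equals value order. */
final class SortableKeys {
    // Minute resolution, i.e. seconds are dropped (format is an assumption).
    private static final DateTimeFormatter MINUTES_UTC =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    /**
     * Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
     * 0..2^64-1; fixed-width hex then sorts like the original numbers.
     */
    static String normalizeInteger(long value) {
        return String.format(Locale.ROOT, "%016x", value ^ Long.MIN_VALUE);
    }

    /** Formats the instant without seconds, so later dates sort after earlier ones. */
    static String normalizeDate(Instant instant) {
        return MINUTES_UTC.format(instant);
    }
}
```

With such keys, range operators like lt, gt and between reduce to string range lookups, for example normalizeInteger(-5) sorts before normalizeInteger(3).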

||operator||string||date||integer||text||
||eq||*||*||*||-||
||lt||+||*||*||-||
||gt||+||*||*||-||
||lte||+||*||*||-||
||gte||+||*||*||-||
||like||*||~||~||-||
||is-defined||*||*||*||*||
||between||+||*||*||-||
||propcontains||-||-||-||*||

* supported (if indexing for the property is enabled)

+ supported like *, but strings are ordered by character code only

~ supported but not executed with the index (so will be slow)

- unsupported (will return an error)
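The restriction marked with + matters whenever upper-case and lower-case strings are mixed. A small Java sketch of the effect, using String.compareTo as a stand-in for character code ordering (CharCodeOrdering is a hypothetical class for illustration, not part of the indexer):

```java
/** Hypothetical demo of character code ordering versus case-insensitive ordering. */
final class CharCodeOrdering {
    /** lt as a character code comparison: 'Z' (90) sorts before 'a' (97). */
    static boolean lt(String a, String b) {
        return a.compareTo(b) < 0;
    }

    /** lt as many users would expect it, ignoring case. */
    static boolean ltIgnoreCase(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b) < 0;
    }
}
```

Here lt("Zebra", "apple") is true while ltIgnoreCase("Zebra", "apple") is false, so an lt or between query on a string property can return resources that a dictionary ordering would exclude.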

Also supported are the boolean operators and, or, not and the special operators is-collection and is-principal.

Configuring what properties are indexed

TODO

To reduce the indexing overhead, not all properties are indexed by default. For properties that are not indexed, the default search implementation will be called.

By default the following properties are indexed:

||namespace||property||type||
||DAV:||displayname||string||
||DAV:||getcontenttype||string||
||DAV:||getcontentlanguage||string||
||DAV:||getcontentlength||integer||
||DAV:||getlastmodified||date||
||DAV:||creationdate||date||

User defined text properties

You can add additional properties to the indexing, including user defined properties.

The following sample defines two user defined properties in the namespace http://any.domain/test/. Both are text properties, each analyzed with a different analyzer.

Operators (extensions)

Operator property-contains

This operator is an extension to the RFC. It works like the contains operator, but for properties. It is intended for use with properties that contain abstracts, keyword lists etc.

Usage

1. Search for a single word

2. Search for words with wildcards

3. Search for phrases
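The three kinds of search can be illustrated with a toy matcher over a tokenized property value. This is only a sketch of the idea: ToyPropertyContains is a hypothetical class, it does not use Lucene, and the wildcard characters * and ? are an assumption modelled on Lucene's query syntax. The real operator delegates tokenizing and matching to the configured Lucene analyzer and index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Toy illustration of word, wildcard and phrase matching on a text property. */
final class ToyPropertyContains {
    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** 1. Single word: some token equals the word. */
    static boolean word(String propertyValue, String word) {
        return tokenize(propertyValue).contains(word.toLowerCase(Locale.ROOT));
    }

    /** 2. Wildcards: '*' matches any run of characters, '?' exactly one. */
    static boolean wildcard(String propertyValue, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        for (String token : tokenize(propertyValue)) {
            if (compiled.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    /** 3. Phrase: the phrase tokens occur consecutively and in order. */
    static boolean phrase(String propertyValue, String phrase) {
        List<String> needle = tokenize(phrase);
        return !needle.isEmpty()
                && Collections.indexOfSubList(tokenize(propertyValue), needle) >= 0;
    }
}
```

For the value "Indexing WebDAV properties with Lucene", word matches "lucene" but not "index", wildcard matches "index*", and phrase matches "webdav properties" but not "properties webdav".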

Searching content with the Lucene based content indexer

Enabling

To use this indexer add the following to your store definition.

Parameter

||parameter||description||required/default||
||indexpath||Directory where the index data is stored.||yes/none||
||asynchron||If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated.||no/false||
||priority||Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY.||no/Thread.NORM_PRIORITY||
||includes||A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed.||no||
||optimization-threshold||The number of write accesses to the index after which the index is optimized.||no/100||

analyzer

Search for a single word

Extractors

The content indexer only processes resources that match a configured content extractor, so remember to configure the content extractors according to your needs. If you want to include text, PDF and Word documents in your search, your extractor configuration could look like this:

{{{
<!-- Extractor configuration -->
<extractors>

</extractors>
}}}

DaslConfiguration (last edited 2009-09-20 22:02:45 by localhost)