DASL Configuration
The default implementation scans the complete resource tree within the scope of the DASL query and tests each resource against the search condition.
This works, but is quite slow.
To avoid this, you currently have the following options:
If you are using JDBCStore/J2EEStore, you can enable metadata searching using the database.
You can enable metadata searching using the integrated Lucene search engine.
You can enable content searching using the integrated Lucene search engine.
Searching meta-data using RDBMS
If you are using a JDBCStore/J2EEStore, you can use the database to search the metadata. To enable this, add the parameter use-rdbms-expression-factory to your store definition.
-
<store name="myStore">
  <parameter name="cache-mode">full</parameter>
  <nodestore classname="org.apache.slide.store.impl.rdbms.JDBCStore">
    ... your JDBCStore configuration ..
    <parameter name="use-rdbms-expression-factory">true</parameter>
  </nodestore>
  <securitystore><reference store="nodestore"/></securitystore>
  <lockstore><reference store="nodestore"/></lockstore>
  <revisiondescriptorsstore><reference store="nodestore"/></revisiondescriptorsstore>
  <revisiondescriptorstore><reference store="nodestore"/></revisiondescriptorstore>
  <contentstore><reference store="nodestore"/></contentstore>
</store>
Searching meta-data with the Lucene based properties indexer
Note that this is under development and will be part of Slide 2.2. To try it out, you can use CVS HEAD.
As identified on 7/19/2006, the <propertiesindexer> has issues with bindings. The recommended workaround is to add or modify a slide.properties file so that it contains the org.apache.slide.binding=false parameter. See SlidePropertiesFile.
Searching the meta data.
Enabling
To use this indexer, add the following to your store definition.
-
<propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
  <parameter name="indexpath">store/index/metadata</parameter>
</propertiesindexer>
Parameters
| parameter | description | required/default |
| indexpath | Directory where the index data is stored. | yes/none |
| asynchron | If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated. | no/false |
| priority | Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY. | no/Thread.NORM_PRIORITY |
| includes | A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed. | no |
| optimization-threshold | The number of write accesses to the index after which the index is optimized. | no/100 |
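As a rough sketch of how the optional parameters from the table might be combined, the following definition updates the index on a separate thread and restricts indexing to two paths. The paths /files and /projects and all values are hypothetical. The priority is written as a plain number on the assumption that the parameter takes the integer behind the Thread constants (in Java, Thread.MIN_PRIORITY is 1, Thread.NORM_PRIORITY is 5 and Thread.MAX_PRIORITY is 10); check this against your Slide version.

```xml
<propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
  <parameter name="indexpath">store/index/metadata</parameter>
  <!-- update the index on a separate thread, outside the transaction -->
  <parameter name="asynchron">true</parameter>
  <!-- assumed numeric form; 3 lies between MIN_PRIORITY (1) and NORM_PRIORITY (5) -->
  <parameter name="priority">3</parameter>
  <!-- hypothetical paths; only resources below them would be indexed -->
  <parameter name="includes">/files,/projects</parameter>
  <!-- optimize after 200 write accesses instead of the default 100 -->
  <parameter name="optimization-threshold">200</parameter>
</propertiesindexer>
```

With asynchron set to true, a search issued immediately after a write may not yet see the change, because the transaction can finish before the index is updated.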
Supported DASL operators and data types
The indexer currently supports the following data types:
string: indexed without any modification
date: indexed as a normalized date string (without seconds)
integer: indexed as a normalized integer string (between Long.MIN_VALUE and Long.MAX_VALUE)
text: indexed in a tokenized and normalized form (normalized using Lucene analyzers)
| | string | date | integer | text |
| eq | * | * | * | - |
| lt | + | * | * | - |
| gt | + | * | * | - |
| lte | + | * | * | - |
| gte | + | * | * | - |
| like | * | ~ | ~ | - |
| is-defined | * | * | * | * |
| between | + | * | * | - |
| propcontains | - | - | - | * |
* supported (if indexing for the property is enabled)
+ supported like *, but the ordering of strings is limited to character code ordering
~ supported but not executed with the index (so will be slow)
- unsupported (will return an error)
Also supported are the boolean operators and, or, not and the special operators is-collection and is-principal.
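To illustrate how these operators fit into a query, here is a minimal sketch of a search request that combines eq on a string property with gt on an integer property using the boolean operator and. It follows the standard DASL basicsearch layout; the scope /files and the literal values are hypothetical. Both properties are in the default indexed set, so under the rules above the condition should be answerable from the index.

```xml
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <D:select>
      <D:prop><D:displayname/><D:getcontentlength/></D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- hypothetical collection to search below -->
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <D:and>
        <!-- eq on a string property -->
        <D:eq>
          <D:prop><D:getcontenttype/></D:prop>
          <D:literal>text/plain</D:literal>
        </D:eq>
        <!-- gt on an integer property -->
        <D:gt>
          <D:prop><D:getcontentlength/></D:prop>
          <D:literal>1024</D:literal>
        </D:gt>
      </D:and>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```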
Configuring what properties are indexed
TODO
To reduce the indexing overhead, not all properties are indexed by default. For properties that are not indexed, the default search implementation will be called.
By default the following properties are indexed:
| namespace | property | type |
| DAV: | displayname | string |
| DAV: | getcontenttype | string |
| DAV: | getcontentlanguage | string |
| DAV: | getcontentlength | integer |
| DAV: | getlastmodified | date |
| DAV: | creationdate | date |
User defined text properties
You can add additional properties to the indexing, including user defined properties.
The following sample defines two user defined properties in the namespace http://any.domain/test/. Both are text properties, each analyzed with a different analyzer.
-
<propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
  <parameter name="indexpath">${datapath}/store1/index/metadata</parameter>
  <configuration name="indexed-properties">
    <property name="abstract" namespace="http://any.domain/test/">
      <text analyzer="org.apache.lucene.analysis.de.GermanAnalyzer"/>
      <is-defined/>
    </property>
    <property name="keywords" namespace="http://any.domain/test/">
      <text analyzer="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
      <is-defined/>
    </property>
  </configuration>
</propertiesindexer>
Operators (extensions)
Operator property-contains
This operator is an extension to the RFC. It works like the contains operator, but for properties. It is intended for use with properties that contain abstracts, keyword lists, etc.
Usage
-
<searchrequest xmlns:D="DAV:" xmlns:S="http://jakarta.apache.org/slide/" xmlns:u="http://any.domain/test/">
  ...
  <D:where>
    <S:property-contains>
      <D:prop><u:abstract/></D:prop>
      <D:literal>Server</D:literal>
    </S:property-contains>
  </D:where>
  ...
</searchrequest>
1. Search for a single word
-
<S:property-contains>
  <D:prop><u:abstract/></D:prop>
  <D:literal>Word</D:literal>
</S:property-contains>
2. Search for words with wildcards
-
<S:property-contains>
  <D:prop><u:abstract/></D:prop>
  <D:literal>prefix*</D:literal>
</S:property-contains>
<S:property-contains>
  <D:prop><u:abstract/></D:prop>
  <D:literal>wild?ard</D:literal>
</S:property-contains>
3. Search for phrases
-
<S:property-contains>
  <D:prop><u:abstract/></D:prop>
  <D:literal>a longer phrase of text</D:literal>
</S:property-contains>
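The forms above might also be combined. The following where clause is only a sketch: it assumes that property-contains can be nested inside the boolean operators in the same way as the standard comparison operators, which this page does not confirm. It uses the abstract and keywords properties from the sample indexer configuration and the same D, S and u namespace prefixes as the usage example; the literal values are hypothetical.

```xml
<D:where>
  <D:and>
    <!-- wildcard search on the abstract property -->
    <S:property-contains>
      <D:prop><u:abstract/></D:prop>
      <D:literal>serv*</D:literal>
    </S:property-contains>
    <!-- single-word search on the keywords property -->
    <S:property-contains>
      <D:prop><u:keywords/></D:prop>
      <D:literal>webdav</D:literal>
    </S:property-contains>
  </D:and>
</D:where>
```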
Searching content with the Lucene based content indexer
Enabling
To use this indexer, add the following to your store definition.
-
<contentindexer classname="org.apache.slide.index.lucene.LuceneContentIndexer">
  <parameter name="indexpath">store/index/content</parameter>
</contentindexer>
Parameters
| parameter | description | required/default |
| indexpath | Directory where the index data is stored. | yes/none |
| asynchron | If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated. | no/false |
| priority | Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY. | no/Thread.NORM_PRIORITY |
| includes | A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed. | no |
| optimization-threshold | The number of write accesses to the index after which the index is optimized. | no/100 |
| analyzer | | |
Search for a single word
-
<D:contains>Word</D:contains>
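For context, the contains element sits inside the where clause of a search request. A minimal sketch of a complete request, following the standard DASL basicsearch layout and using a hypothetical scope /files, could look like this:

```xml
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <D:select>
      <D:prop><D:displayname/></D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- hypothetical collection to search below -->
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <!-- matches resources whose extracted content contains the word -->
      <D:contains>Word</D:contains>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```

Only resources handled by a configured content extractor end up in the content index, so a resource of an unsupported type would not be expected to match.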
Extractors
The content indexer only processes resources that match a content extractor, so remember to configure the content extractors according to your needs. If you want to include text, PDF and Word documents in your search, your extractor configuration could look like this:
<!-- Extractor configuration -->
<extractors>
  <extractor classname="org.apache.slide.extractor.PDFExtractor"/>
  <extractor classname="org.apache.slide.extractor.MSWordExtractor"/>
  <extractor classname="org.apache.slide.extractor.TextContentExtractor"/>
</extractors>