Persistence Manager (PM) FAQ
What is a Persistence Manager (PM)?
The PM is an *internal* Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content in that workspace. The Jackrabbit version handler also uses a separate persistence manager.
The persistence manager implementations to use are configured using the PersistenceManager configuration element within the Workspace configuration template (and the instantiated workspace configuration files) and the Versioning configuration element. The layout of a PersistenceManager configuration element is shown below.
<PersistenceManager class="...">
  <param name="..." value="..."/>
  ...
</PersistenceManager>
The class attribute of the PersistenceManager element contains the fully qualified class name of the persistence manager implementation class. The class must implement the org.apache.jackrabbit.core.persistence.PersistenceManager interface and have an empty default constructor. String properties can be assigned using the param elements according to the JavaBean conventions.
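As a concrete illustration of the layout described above, here is a minimal sketch of a workspace persistence manager entry. It assumes the embedded Apache Derby bundle persistence manager; the `url` and `schemaObjectPrefix` values are illustrative and should be adapted to your setup.

```xml
<!-- Sketch only: class is the fully qualified implementation class name -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
  <!-- Each param sets a JavaBean-style String property on the implementation -->
  <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```

Each `param` name corresponds to a setter on the implementation class, so `schemaObjectPrefix` is applied through `setSchemaObjectPrefix(String)`.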
See also
http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/core/persistence/PersistenceManager.html
The PM sits at the very bottom layer of Jackrabbit's system architecture. Reliability, integrity and performance of the PM are *crucial* to the overall stability and performance of the repository. If, for example, the data that a PM is based upon is allowed to change through external means, the integrity of the repository would be at risk (think of referential integrity and node references).
Which Persistence Manager is the fastest?
The bundle persistence managers are usually the fastest. Bundle persistence managers store each node together with all the properties as one unit. Large binary properties are stored to the BLOBStore by default (or DataStore if configured). Setting the minimum blob size for bundle persistence managers very high decreases the performance.
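The blob size threshold mentioned above is set like any other parameter. The following is a sketch that assumes the `minBlobSize` parameter of the bundle persistence managers; the value 4096 (bytes) is only an illustrative choice, not a recommendation.

```xml
<!-- Sketch only: binaries larger than minBlobSize go to the BLOBStore (or DataStore) -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager">
  <!-- Illustrative threshold in bytes; very high values keep large binaries inside the bundles -->
  <param name="minBlobSize" value="4096"/>
</PersistenceManager>
```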
Storing the data in the file system does not require a database. Depending on the file system and database, database persistence managers are sometimes slower and sometimes faster than the BundleFsPersistenceManager. When using a database, please note that embedded Java databases do not have network overhead.
Consistency / Atomicity
The database persistence managers are atomic if the database is atomic.
The current file-based persistence managers are not always atomic. They do support transactions in Jackrabbit, with one exception: after a crash. When the process is stopped while a transaction is being written to disk (power failure, process killed, Runtime.halt() called, VM crash), some data of the transaction may be committed and some not. Theoretically, some nodes may be corrupt (depending on how and when the system crashed). The algorithms used minimize this risk; for example, the parent node is written last, so in most cases there is no problem even after a crash.
What is the PM's responsibility?
The PM interface was never intended to be a general SPI that you could implement in order to integrate external data sources with proprietary formats (e.g. a customer's database). The reason why we abstracted the PM interface was to leave room for future performance optimizations that would not affect the rest of the implementation (e.g. by storing the raw data in a b-tree based database instead of individual files).
How smart should a PM be?
A PM should not be 'intelligent'. It should not 'interpret' the data. The only thing a PM should care about is to efficiently, consistently and reliably store and read the data encapsulated in the passed nodeState & propertyState objects. Though it might be feasible to write a custom persistence manager to represent existing legacy data in a level-1 (read-only) repository, I don't think the same is possible for a level-2 repository. At a minimum, it certainly would not be recommended.
File System (FS)
Jackrabbit uses the org.apache.jackrabbit.core.fs.FileSystem interface as a file system abstraction. Although this interface does not cover all direct file system use of Jackrabbit, it still allows for flexibility in selecting where and how to store various parts of the repository. For example, because it is possible with Jackrabbit to configure separate file systems for different system components (e.g., global repository state, workspaces, search indexes, versioning, etc), it might make sense to store the search indexes on a fast disk and the archived node versions on a slower disk.
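The fast-disk/slow-disk idea above might be sketched as follows. The `/mnt/fast` and `/mnt/slow` paths are hypothetical mount points, and the persistence manager inside `Versioning` is left elided, as in the layout shown earlier.

```xml
<!-- Inside a Workspace element: search index on a (hypothetical) fast disk -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="/mnt/fast/index/${wsp.name}"/>
</SearchIndex>

<!-- Inside the Repository element: versioning file system on a (hypothetical) slower disk -->
<Versioning rootPath="/mnt/slow/version">
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="/mnt/slow/version"/>
  </FileSystem>
  <PersistenceManager class="...">
    ...
  </PersistenceManager>
</Versioning>
```

Note that where the version data itself ends up still depends on the persistence manager configured inside `Versioning`; with a database persistence manager it is stored in the database rather than under this path.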
What combination of FS and PM is the best choice?
It depends on your priorities. If you want to store your data in an RDBMS, use BundleDbPersistenceManager in conjunction with either a LocalFileSystem or DbFileSystem. If you want to store your data in a more readily accessible format (just in case ;), you might want to try an XMLPersistenceManager paired with a LocalFileSystem.
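A workspace that pairs a database persistence manager with a LocalFileSystem could look roughly like the sketch below. It assumes MySQL and the MySQL-specific bundle persistence manager; the host, database name, user and password are placeholders, not values from this FAQ.

```xml
<Workspace name="${wsp.name}">
  <!-- Workspace file system on local disk -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${wsp.home}"/>
  </FileSystem>
  <!-- Content stored in an RDBMS; connection details are placeholders -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="secret"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
  </PersistenceManager>
</Workspace>
```

Using the workspace name as the `schemaObjectPrefix` keeps the tables of different workspaces apart when they share one database.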
Which are the current options? What are the status, pros and cons of each implementation?
Bundle Database PM
Depending on the database, one of the following:
org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager (Apache Derby; Java)
org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager (H2 Database Engine; Java)
org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager (MySQL)
org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager (PostgreSQL)
org.apache.jackrabbit.core.persistence.bundle.MSSqlPersistenceManager (MS SQL Server)
org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager (Oracle 10 or newer)
org.apache.jackrabbit.core.persistence.bundle.Oracle9PersistenceManager (Oracle 9)
org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager (generic database)
Status: mature (the default persistence manager)
JDBC based; zero-deployment: schema is automatically created
Atomic
Very fast if used with DataStore or BLOBStore
As of Jackrabbit 1.4, this persistence manager supports auto-reconnect to the database.
As of Jackrabbit 1.4, the connection can be created using a JNDI data source as well. See also UsingJNDIDataSource.
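A JNDI-based setup might be sketched as follows, assuming the convention of passing `javax.naming.InitialContext` as the driver and the JNDI name as the URL. The data source name `jdbc/jackrabbit` and the `schema` value are hypothetical; check UsingJNDIDataSource for the details that apply to your version.

```xml
<!-- Sketch only: the connection is looked up from JNDI instead of being opened through a JDBC driver class -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager">
  <param name="driver" value="javax.naming.InitialContext"/>
  <!-- Hypothetical JNDI name of a container-managed data source -->
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <!-- Hypothetical: selects the database-specific schema definition -->
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```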
Bundle File-System PM
If the JVM process is killed the repository might turn inconsistent
Not meant to be used in production environments (except for read-only uses)
org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager
Status: mature
Very fast if used with DataStore or BLOBStore
In-Memory PM
All data is lost as soon as the repository is closed
org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager
Status: mature
For testing and small (read-only) workspaces
Keeps all content in memory
A custom binary serializer is optionally used to load and store the persistent state during initialization and shutdown of the persistence manager.
Very fast
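For a throwaway test workspace, the in-memory persistence manager might be configured as in this sketch. It assumes the `persistent` parameter controls whether the state is serialized on shutdown and reloaded on initialization; treat that parameter as an assumption to verify against your Jackrabbit version.

```xml
<!-- Sketch only: all content lives in memory and is discarded when the repository is closed -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
  <!-- Assumed parameter: "false" skips loading/storing the serialized state -->
  <param name="persistent" value="false"/>
</PersistenceManager>
```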
Simple Database PM
Subclasses of org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager
Status: mature
JDBC based; zero-deployment: schema is automatically created
Atomic
Fast
ObjectPersistenceManager
If the JVM process is killed the repository might turn inconsistent
Not meant to be used in production environments
Status: obsolete, mature
Persists data in an abstract FileSystem using a simple binary serialization format
XMLPersistenceManager
If the JVM process is killed the repository might turn inconsistent
Status: obsolete, mature
Persists data in an abstract FileSystem using XML serialization format
ORMPersistenceManager
Status: obsolete, experimental & unfinished, still being maintained?
Referential integrity is possible, but not implemented
Not so easy to configure.
LocalFileSystem:
Status: mature
Slow on Windows machines
MemoryFileSystem:
All data is lost as soon as the repository is closed
Status: mature
For testing and small (read-only) workspaces
Keeps all content in memory
Very fast
DbFileSystem:
Status: mature
Atomic
Meant to be used in combination with a Database Persistence Manager as repository & workspace file system
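A DbFileSystem entry to accompany a database persistence manager could be sketched like this. The class name `org.apache.jackrabbit.core.fs.db.DbFileSystem` and the parameter names follow the generic database file system; the MySQL connection details and the `fs_` prefix are placeholders.

```xml
<!-- Sketch only: file system contents are stored in database tables -->
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <param name="driver" value="com.mysql.jdbc.Driver"/>
  <param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
  <param name="user" value="jackrabbit"/>
  <param name="password" value="secret"/>
  <param name="schema" value="mysql"/>
  <!-- Placeholder prefix; keeps file system tables distinct from persistence manager tables -->
  <param name="schemaObjectPrefix" value="fs_"/>
</FileSystem>
```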