This page collects tips and tricks for improving LMF performance in different situations. The LMF default configuration is meant to run on moderate machines and laptops and is therefore somewhat conservative. If you have a high-end server with a lot of resources (processors, memory), you should consider the following improvements:
Disabling Unnecessary Components
The LMF distribution comes with a large collection of components. If you don't need some of this functionality, consider turning off or removing the corresponding components. Removing a component is as easy as removing its jar file from the WEB-INF/lib directory. The following components are good candidates:
- lmf-versioning: carries out versioning after each transaction and thus roughly doubles the time needed for committing data to the database; can be turned off by setting the configuration option "versioning.enabled" to false
- lmf-search: sends all updated resources to indexing in SOLR, which can be very I/O intensive and also require lookups to the Linked Data Cloud; can be turned off by setting the configuration option "solr.enabled" to false
- lmf-reasoner: if reasoning rules are installed, the reasoner is triggered by each transaction to run the rules over the updated triples; can be turned off by setting the configuration option "reasoner.enabled" to false or by not installing any rules
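As a rough illustration, the three switches above can be kept in one place and rendered as key/value lines. This is only a sketch: the option names come from the list above, but the properties-style output format and the helper name render_overrides are assumptions made for this example and are not part of the LMF. Apply the values through whatever configuration mechanism your installation uses.

```python
# Option names are taken from the list above; everything else is illustrative.
DISABLE_OPTIONAL_COMPONENTS = {
    "versioning.enabled": False,  # skip versioning after each transaction
    "solr.enabled": False,        # do not send updated resources to SOLR
    "reasoner.enabled": False,    # do not run reasoning rules per transaction
}


def render_overrides(options):
    """Render options as 'key = value' lines (hypothetical properties-style format)."""
    lines = []
    for key in sorted(options):
        value = options[key]
        # Booleans are written in lowercase, as is usual in Java-style properties.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key} = {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_overrides(DISABLE_OPTIONAL_COMPONENTS))
```

Only disable the components you really do not need; each entry can be removed from the dictionary independently.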
Using PostgreSQL
In production environments with large amounts of data, you should use the PostgreSQL database instead of the embedded H2 database. PostgreSQL itself offers considerably better performance, and the LMF implements a number of optimizations that make use of PostgreSQL features, e.g. native SPARQL querying.
The database can be changed in the Configuration section of the LMF Core module. Create a PostgreSQL database and a database user for the LMF, and then reconnect the LMF to this database. A restart is not required. Currently, however, the data from the embedded database is NOT copied to the new database, so you need to re-import your data.
In its default configuration, PostgreSQL is not tuned for good performance. To get decent database performance, adjust the PostgreSQL performance settings (see http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server). At a minimum, try setting shared_buffers to 256MB and work_mem to a higher value than its default; optionally, also experiment with the synchronous_commit option.
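To see whether a server already meets the shared_buffers suggestion, you can compare the value reported by the SQL command SHOW shared_buffers with 256MB. The following is a minimal sketch, assuming the value is given with a unit (as SHOW prints it); the helper names parse_pg_size and shared_buffers_ok are made up for this example.

```python
import re

# Multipliers for the memory units PostgreSQL uses in setting values.
_UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_pg_size(text):
    """Convert a value such as '256MB' or '16384kB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB|TB)\s*", text)
    if match is None:
        raise ValueError(f"unsupported size value: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def shared_buffers_ok(current, suggested="256MB"):
    """Return True if the current shared_buffers value reaches the suggested one."""
    return parse_pg_size(current) >= parse_pg_size(suggested)


if __name__ == "__main__":
    # Example input; on a real server, use the output of SHOW shared_buffers.
    print(shared_buffers_ok("128MB"))
```

Keep in mind that turning synchronous_commit off trades durability for speed: after a server crash, the most recently committed transactions may be lost.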
Multithreaded Task Execution
The LMF is highly parallelized and thread safe. On appropriate machines, running tasks in parallel (e.g. imports, indexing, querying) can result in a considerable performance improvement.
For the GeoNames import, I partition the data into 8 parts that are imported by 8 threads. 8 is a good starting point, but on high-end machines you can also experiment with up to 20 parallel imports. The improvement is more or less linear until other hardware limits are reached (I/O, number of processors/cores), i.e. 8 threads are roughly 8 times faster.
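The partition-and-import pattern described above can be sketched as follows. This is an illustration under assumptions, not the actual import code: import_chunk is a hypothetical placeholder that only counts items, and a real version would hand each partition to the LMF importer of your choice.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` chunks of nearly equal size, preserving order."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def import_chunk(chunk):
    """Hypothetical placeholder: a real version would send the chunk to the LMF."""
    return len(chunk)


def parallel_import(items, threads=8):
    """Import each partition in its own worker thread; return the total item count."""
    chunks = partition(items, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(import_chunk, chunks))


if __name__ == "__main__":
    print(parallel_import(list(range(1000)), threads=8))
```

Using one partition per thread keeps the workers independent of each other, so the number of threads is the only value to change when experimenting with higher parallelism.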
Note that the Semantic Search Indexer by default already uses 4 parallel threads for indexing resources stored in the LMF. In extreme situations you can improve performance a bit by setting the solr.threads option to a higher value.
Memory and Caching
The default installation configures 1GB of memory for the LMF. If you have enough memory, increasing this value gives the LMF the opportunity to work with bigger transactions and to do more caching, which reduces the number of I/O requests needed. You can increase the value by changing the -Xmx setting in setenv.sh. On machines with plenty of RAM, you can set it to 12GB or even higher.
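One way to pick a value for -Xmx is to take the machine's RAM, leave a reserve for the operating system and the database, and cap the result. The sketch below does this; the 4GB reserve and the 12GB cap are assumptions chosen for illustration (the cap mirrors the figure mentioned on this page), and suggest_xmx is a made-up helper name.

```python
def suggest_xmx(total_ram_gb, reserved_gb=4, cap_gb=12):
    """Return a JVM -Xmx flag: RAM minus a reserve, at least 1 GB, at most cap_gb."""
    # Never go below the 1GB default and never above the cap.
    heap_gb = max(1, min(total_ram_gb - reserved_gb, cap_gb))
    return f"-Xmx{heap_gb}g"


if __name__ == "__main__":
    for ram in (2, 8, 64):
        print(ram, suggest_xmx(ram))
```

The resulting flag is what you would put in place of the existing -Xmx value in setenv.sh.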
To tune the caching settings, look for the two files ehcache-lmf.xml and ehcache-db.xml and update the settings for the different caches there. You can monitor cache usage in the Admin Interface, LMF Core, System menu.
Hardware
Given the tuning options described above, the LMF can benefit significantly from faster hardware if configured properly. If you intend to run on high-end hardware, take the following parameters into account:
- number of processors/cores: can affect the number of threads you can run in parallel to perform tasks
- I/O performance: this is the most important parameter; run the PostgreSQL database and LMF Home directory on a Solid State Disk and you will get dramatic performance improvements
- memory: up to a certain level, increasing your memory can help, but the effect will not be as big as with the previous two options. More than 12GB of RAM for the LMF is rarely reasonable.