Differences between revisions 1 and 2
Revision 1 as of 2013-02-05 15:31:06
Size: 4726
Comment: imported content
Revision 2 as of 2013-11-27 19:41:59
Size: 4334
Comment: -XX:+DisableExplicitGC, plus some general revisions
Line 1: Line 1:
#summary A collection of tips and tricks for improving LMF performance in different contexts #summary A collection of tips and tricks for improving Marmotta performance in different contexts
Line 5: Line 5:
This page is intended as a collection of tips and tricks on how to improve the LMF performance for different situations. The LMF default configuration is meant to run on moderate machines and laptops, and therefore a bit conservative. If you have a high-end server with a lot of resources (processors, memory), you should consider the following improvements: This page is intended as a collection of tips and tricks on how to improve Marmotta performance for different situations. The default configuration is meant to run on moderate machines and laptops, and therefore a bit conservative. If you have a high-end server with a lot of resources (processors, memory), you should consider the following improvements:
Line 9: Line 9:
The LMF distribution comes with a big collection of components. If you don't need these functionalities, consider turning off or removing them. Removing a component is as easy as removing the jar file from the WEB-INF/lib directory. The following components are good candidates:
  * lmf-versioning: carries out versioning after each transaction and thus roughly doubles the time needed for committing data to the database; can be turned off by setting the configuration option "versioning.enabled" to false
  * lmf-search: sends all updated resources to indexing in SOLR, which can be very I/O intensive and also require lookups to the Linked Data Cloud; can be turned off by setting the configuration option "solr.enabled" to false
  * lmf-reasoner: if reasoning rules are installed, the reasoner will be triggered by each transaction to run the rules over updated triples; can be disabled by setting reasoner.enabled to false or by not installing any rules
The Marmotta distribution comes with a big collection of components. If you don't need these functionalities, consider turning off or removing them. Removing a component is as easy as removing the dependency in the {{{pom.xml}}} or removing the jar file from the {{{WEB-INF/lib}}} directory. The following components are good candidates:
  * marmotta-versioning: carries out versioning after each transaction and thus roughly doubles the time needed for committing data to the database; can be turned off by setting the configuration option "versioning.enabled" to false
  * marmotta-reasoner: if reasoning rules are installed, the reasoner will be triggered by each transaction to run the rules over updated triples; can be disabled by setting reasoner.enabled to false or by not installing any rules
Line 16: Line 15:
In production environments with big amounts of data, you should use the PostgreSQL database instead of the embedded H2 database. Postgres itself offers considerably better performance, and the LMF implements a number of optimizations that make use of PostgreSQL features, e.g. native SPARQL querying. In production environments with big amounts of data, you should use the PostgreSQL database instead of the embedded H2 database. PostgreSQL itself offers considerably better performance, and Marmotta implements a number of optimizations that make use of PostgreSQL features, e.g. native SPARQL querying. Therefore, '''''PostgreSQL is the recommended database''''' for real usages.
Line 18: Line 17:
The database can be changed in the Configuration section of the LMF Core module. Please create a Postgres database and LMF user and then reconnect the LMF to this database. A restart is not required. Currently, however, the data from the embedded database is NOT copied to the new database, so you would need to re-import your data. The database can be changed in the configuration section at the core module. . A restart is not required, the connection is changed in live. Currently, however, the data from the embedded database is NOT copied to the new database, so you would need to manually re-import your data.
Line 24: Line 23:
The LMF is highly parallelized and thread safe and on appropriate machines making use of parallel task execution (e.g. imports, indexing, querying) may result in a considerable performance improvement.

For the GeoNames import, I partition my data into 8 parts that are imported by 8 threads, so 8 is a good number, but on high-end machines you can also experiment with 20 parallel imports. The improvement will be more or less linear until other hardware limits are reached (I/O, number of processors/cores), i.e. 8 threads = 8 times faster.

Note that the Semantic Search Indexer by default already uses 4 parallel threads for indexing resources stored in the LMF. In extreme situations you can improve performance a bit by setting the solr.threads option to a higher value.
Marmotta is highly parallelized and thread safe and on appropriate machines making use of parallel task execution (e.g. imports, indexing, querying) may result in a considerable performance improvement.
Line 33: Line 27:
The default installation for the LMF configures 1GB of memory for the LMF. If you have enough memory, increasing this value will give the LMF the opportunity to work with bigger transactions and do more caching and thus reduce the amount of I/O requests needed. You can increase the value by  changing the -Xmx setting in setenv.sh. On good machines, you can set it to 12GB or even higher. The minimum recommended, and what usually comes by default in the installed, is 1GB of memory for Marmotta. If you have enough memory, increasing this value will give Marmotta the opportunity to work with bigger transactions and do more caching and thus reduce the amount of I/O requests needed. You can increase the value by changing the -Xmx setting. On good machines, you can set it to 8GB or even higher.
Line 35: Line 29:
To tune the caching settings, look for the two files ehcache-lmf.xml and ehcache-db.xml and update the settings for the different caches there. You can monitor cache usage in the Admin Interface, LMF Core, System menu. To tune the caching settings, take a look to {{{ehcache*.xml}}} files and update the settings for the different caches there. You can monitor cache usage in the Admin Interface: Core, System menu.

For avoiding issues in some big data scenarios, due we guess some third-party libraries could be doing a [[http://sumitpal.wordpress.com/2010/05/06/how-we-solved-gc-every-1-minute-on-tomcat/|wrong usage of explicit garbage collection]], we recommend to add {{{-XX:+DisableExplicitGC}}} to your JVM options.
Line 39: Line 35:
Given the tuning options described above, it is obvious that the LMF can benefit significantly from faster hardware if configured properly. If you intend to run on high-end hardware, take into account the following parameters: Given the tuning options described above, it is obvious that Marmotta can benefit significantly from faster hardware if configured properly. If you intend to run on high-end hardware, take into account the following parameters:
Line 41: Line 37:
  * I/O performance: this is the most important parameter; run the PostgreSQL database and LMF Home directory on a Solid State Disk and you will get dramatic performance improvements
  * memory: up to a certain level, increasing your memory can help, but the effect will not be as big as the previous two options. More than 12GB or RAM for the LMF are rarely reasonable.
  * I/O performance: this is the most important parameter; run the PostgreSQL database and Marmotta Home directory on a Solid State Disk and you will get dramatic performance improvements
  * memory: up to a certain level, increasing your memory can help, but the effect will not be as big as the previous two options. More than 12GB or RAM for Marmotta are rarely reasonable.

Introduction

This page is a collection of tips and tricks for improving Marmotta performance in different situations. The default configuration is meant to run on moderate machines and laptops, and is therefore a bit conservative. If you have a high-end server with a lot of resources (processors, memory), you should consider the following improvements:

Disabling Unnecessary Components

The Marmotta distribution comes with a large collection of components. If you don't need all of this functionality, consider turning off or removing the components you don't use. Removing a component is as easy as removing its dependency from the pom.xml or removing its jar file from the WEB-INF/lib directory. The following components are good candidates:

  • marmotta-versioning: carries out versioning after each transaction and thus roughly doubles the time needed for committing data to the database; can be turned off by setting the configuration option "versioning.enabled" to false
  • marmotta-reasoner: if reasoning rules are installed, the reasoner will be triggered by each transaction to run the rules over updated triples; can be disabled by setting the configuration option "reasoner.enabled" to false or by not installing any rules
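
As a minimal sketch, the two options named above could be switched off by editing the configuration file directly. The file name system-config.properties, its location in the Marmotta home directory, and the helper set_property are assumptions made for illustration; the demo writes to a scratch file, so point CONFIG_FILE at your own installation (and restart Marmotta if it does not pick up the change) before relying on it.

```shell
# Demo target: a scratch file. In a real setup this would be the
# configuration file of your installation (assumed name: system-config.properties).
CONFIG_FILE="$(mktemp -d)/system-config.properties"

# Hypothetical helper: set key = value in a Java properties file,
# replacing an existing entry for the same key if there is one.
set_property() {
    key="$1"; value="$2"; file="$3"
    touch "$file"
    # Copy every line except an existing entry for this key ...
    grep -v "^${key}[[:space:]]*=" "$file" > "${file}.tmp" || true
    # ... then append the new value and swap the files.
    printf '%s = %s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

set_property versioning.enabled false "$CONFIG_FILE"
set_property reasoner.enabled false "$CONFIG_FILE"
cat "$CONFIG_FILE"
```

The same options can also be changed through the admin interface, which avoids touching the file by hand.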

Using PostgreSQL

In production environments with large amounts of data, you should use the PostgreSQL database instead of the embedded H2 database. PostgreSQL itself offers considerably better performance, and Marmotta implements a number of optimizations that make use of PostgreSQL features, e.g. native SPARQL querying. Therefore, PostgreSQL is the recommended database for real-world use.

The database can be changed in the configuration section of the core module. A restart is not required; the connection is switched live. Currently, however, the data from the embedded database is NOT copied to the new database, so you would need to manually re-import your data.

Out of the box, PostgreSQL is not configured for good performance. To get decent database performance, adjust the PostgreSQL performance settings (see http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server). At a minimum, set shared_buffers to 256MB and raise work_mem; optionally, also experiment with the synchronous_commit option.
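
The settings mentioned above could be collected in a small fragment like the one generated below. This is only a sketch: the file name marmotta-tuning.conf and the work_mem value are assumptions, and the right numbers depend on your hardware and workload.

```shell
# Write the suggested overrides to a fragment file (hypothetical name).
TUNING_FILE="$(mktemp -d)/marmotta-tuning.conf"

cat > "$TUNING_FILE" <<'EOF'
# Illustrative starting points, not measured recommendations.
shared_buffers = 256MB       # the minimum suggested in the text
work_mem = 16MB              # assumed value; raise it from the PostgreSQL default
synchronous_commit = off     # optional: faster commits, but the most recent
                             # transactions may be lost if the server crashes
EOF

cat "$TUNING_FILE"
```

Copy the lines into postgresql.conf and restart PostgreSQL afterwards; a change to shared_buffers only takes effect after a restart.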

Multithreaded Task Execution

Marmotta is highly parallelized and thread safe. On appropriate machines, running tasks in parallel (e.g. imports, indexing, querying) may result in a considerable performance improvement.
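
One way to exploit this for imports is to partition the data and upload the partitions concurrently. The sketch below starts one background job per partition. The endpoint URL, the partition count of 8, the file names, and the helper import_part are all assumptions for illustration; by default the script runs in dry-run mode and only logs what it would upload.

```shell
# Hypothetical import endpoint; check the URL of your own installation.
IMPORT_URL="${IMPORT_URL:-http://localhost:8080/marmotta/import/upload}"

# Upload one Turtle file. With DRY_RUN=1 (the default here) nothing is sent.
import_part() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would import $1"
    else
        curl -sS -X POST -H "Content-Type: text/turtle" \
             --data-binary "@$1" "$IMPORT_URL"
    fi
}

# Demo input: 8 tiny Turtle files standing in for the partitions of a large dataset.
PARTS_DIR="$(mktemp -d)"
for i in 1 2 3 4 5 6 7 8; do
    printf '<http://example.org/s%s> <http://example.org/p> "o" .\n' "$i" \
        > "$PARTS_DIR/part-$i.ttl"
done

# Start one background job per partition, then wait for all of them to finish.
LOG="$PARTS_DIR/import.log"
for f in "$PARTS_DIR"/part-*.ttl; do
    import_part "$f" >> "$LOG" &
done
wait
cat "$LOG"
```

The useful number of partitions is bounded by the available cores and by I/O throughput, so it is worth measuring a few values rather than assuming that more jobs are always faster.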

Memory and Caching

The recommended minimum, and the usual default in the installer, is 1GB of memory for Marmotta. If you have enough memory, increasing this value lets Marmotta work with bigger transactions and do more caching, which reduces the number of I/O requests needed. You can increase the value by changing the -Xmx JVM setting. On good machines, you can set it to 8GB or even higher.

To tune the caching settings, take a look at the ehcache*.xml files and update the settings for the different caches there. You can monitor cache usage in the Admin Interface: Core, System menu.

To avoid issues in some big data scenarios, we recommend adding -XX:+DisableExplicitGC to your JVM options. We suspect that some third-party libraries misuse explicit garbage collection (calls to System.gc()); with this flag the JVM ignores such calls.
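
Both JVM settings from this section can be combined in the startup environment. The sketch below assumes a servlet container that reads JAVA_OPTS (for example from Tomcat's bin/setenv.sh); the variable MARMOTTA_HEAP is a hypothetical name introduced here, and 8g is simply the value mentioned above.

```shell
# Heap size; 8g follows the suggestion above for well-equipped machines.
MARMOTTA_HEAP="${MARMOTTA_HEAP:-8g}"

# Append to any options that are already set instead of overwriting them.
JAVA_OPTS="${JAVA_OPTS:-} -Xmx${MARMOTTA_HEAP} -XX:+DisableExplicitGC"
export JAVA_OPTS

echo "$JAVA_OPTS"
```

Keep the heap below the physical memory of the machine so that PostgreSQL and the operating system's file cache still have room.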

High-Performance Hardware

Given the tuning options described above, Marmotta can benefit significantly from faster hardware if configured properly. If you intend to run on high-end hardware, take the following parameters into account:

  • number of processors/cores: can affect the number of threads you can run in parallel to perform tasks
  • I/O performance: this is the most important parameter; run the PostgreSQL database and Marmotta Home directory on a Solid State Disk and you will get dramatic performance improvements
  • memory: up to a certain level, increasing your memory can help, but the effect will not be as big as with the previous two options. More than 12GB of RAM for Marmotta is rarely reasonable.

PerformanceTuning (last edited 2013-11-27 19:41:59 by SergioFernandez)