Differences between revisions 15 and 16
Revision 15 as of 2013-10-21 10:41:06
Size: 5229
Editor: HyunsikChoi
Comment:
Revision 16 as of 2014-03-06 04:52:19
Size: 0
Editor: HyunsikChoi
Comment: moved to new doc.
= Preliminary =
== catalog-site.xml and tajo-site.xml ==
Tajo's configuration is based on Hadoop's configuration system. Tajo uses two config files:

 * catalog-site.xml - configuration for the catalog server.
 * tajo-site.xml - configuration for the other Tajo modules.

Each config is a name/value pair. To set the config a.b.c to the value 123, add the following element to the appropriate file.

{{{
  <property>
    <name>a.b.c</name>
    <value>123</value>
  </property>
}}}

Tajo has a variety of internal configs. If you don't set a config explicitly, its default value is used. Tajo is designed to need only a few configs in typical cases, so you may not need to change the configuration at all.

By default, there is no tajo-site.xml in the $TAJO_HOME/conf directory. To set configs, first copy $TAJO_HOME/conf/tajo-site.xml.templete to tajo-site.xml, then add the configs to your tajo-site.xml.
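
As a minimal sketch, a complete tajo-site.xml might look like the following. It assumes the standard Hadoop layout, in which every property sits inside a single configuration root element. The name a.b.c is the same placeholder used above, not a real Tajo config.
{{{
<?xml version="1.0"?>
<!-- Assumed Hadoop-style layout: one configuration root element. -->
<configuration>
  <!-- Placeholder property; replace with a real config name and value. -->
  <property>
    <name>a.b.c</name>
    <value>123</value>
  </property>
</configuration>
}}}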

== tajo-env.sh ==
'''''tajo-env.sh''''' is a shell script file. Its main purpose is to set shell environment variables for the TajoMaster and TajoWorker Java programs. You can set a variable as follows:
{{{
VARIABLE=value
}}}

If the value is a literal string, quote it as follows:
{{{
VARIABLE='value'
}}}

= TajoMaster Configuration =

== Tajo Rootdir Setting ==
Tajo uses HDFS as its primary storage layer, so each Tajo cluster instance should have one Tajo rootdir. You can specify the Tajo rootdir as follows:
{{{
  <property>
    <name>tajo.rootdir</name>
    <value>hdfs://namenode_hostname:port/path</value>
  </property>
}}}

The Tajo rootdir must be a URL of the form scheme://hostname:port/path. The current implementation supports only the hdfs:// and file:// schemes. The default value is ''file:///tmp/tajo-${user.name}/''.
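
For a single-machine setup, the rootdir could instead point at the local file system with the file:// scheme. This is only a sketch; the path /data/tajo is a hypothetical example, not a Tajo default.
{{{
  <property>
    <name>tajo.rootdir</name>
    <!-- Hypothetical local path; a file:// URL has an empty host part. -->
    <value>file:///data/tajo</value>
  </property>
}}}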

== TajoMaster Heap Memory Size ==
The environment variable ''TAJO_MASTER_HEAPSIZE'' in '''''conf/tajo-env.sh''''' sets the heap memory size that TajoMaster is allowed to use.

To adjust the heap memory size, set the TAJO_MASTER_HEAPSIZE variable in '''''conf/tajo-env.sh''''' to a suitable size as follows:
{{{
TAJO_MASTER_HEAPSIZE=2000
}}}

The default size is 1000 (1GB).


= Tajo Worker Configuration =

== Worker Heap Memory Size ==
The environment variable ''TAJO_WORKER_HEAPSIZE'' in '''''conf/tajo-env.sh''''' sets the heap memory size that TajoWorker is allowed to use.

To adjust the heap memory size, set the TAJO_WORKER_HEAPSIZE variable in '''''conf/tajo-env.sh''''' to a suitable size as follows:
{{{
TAJO_WORKER_HEAPSIZE=8000
}}}

The default size is 1000 (1GB).

== Temporary Data Directory ==
TajoWorker stores temporary data on the local file system because it uses out-of-core algorithms. You can specify one or more directories where this temporary data will be stored.

'''''tajo-site.xml'''''
{{{
  <property>
    <name>tajo.worker.tmpdir.locations</name>
    <value>/disk1/tmpdir,/disk2/tmpdir,/disk3/tmpdir</value>
  </property>
}}}

== Maximum number of parallel running tasks for each worker ==
Each worker can execute multiple tasks at a time. Tajo allows users to specify the maximum number of parallel running tasks for each worker.

'''''tajo-site.xml'''''
{{{
  <property>
    <name>tajo.worker.parallel-execution.max-num</name>
    <value>12</value>
  </property>
}}}

= Catalog Configuration =
If you want to customize the catalog service, copy $TAJO_HOME/conf/catalog-site.xml.templete to catalog-site.xml. Then, add the following configs to catalog-site.xml. Note that the default configs are enough to launch a Tajo cluster in most cases.

 * tajo.catalog.master.addr - If you want to launch a catalog server separately, specify its address here. This config has the form hostname:port. Its default value is 0.0.0.0:9002.
 * tajo.catalog.store.class - If you want to change the persistent storage of the catalog server, specify the class name here. Its default value is tajo.catalog.store.DerbyStore. In the current version, Tajo provides the following three storage classes:

  * tajo.catalog.store.DerbyStore - this storage class uses Apache Derby.
  * tajo.catalog.store.MySQLStore - this storage class uses MySQL.
  * tajo.catalog.store.MemStore - this is the in-memory storage. It is used only in unit tests, to shorten their running time.
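
For illustration, a catalog-site.xml that runs the catalog server on its own host and switches to the MySQL store could be sketched as follows. The host name catalog-host is hypothetical, the configuration root element assumes the standard Hadoop layout, and MySQLStore will likely need additional connection settings that this page does not cover.
{{{
<!-- Assumed Hadoop-style layout: one configuration root element. -->
<configuration>
  <property>
    <name>tajo.catalog.master.addr</name>
    <!-- Hypothetical host name; 9002 is the default port given above. -->
    <value>catalog-host:9002</value>
  </property>
  <property>
    <name>tajo.catalog.store.class</name>
    <value>tajo.catalog.store.MySQLStore</value>
  </property>
</configuration>
}}}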

<<Anchor(DefaultPortNumbers)>>
= RPC/HTTP Service Configuration and Default Addresses =
== Tajo Master ==
||Service Name||Config Property Name||Description||Default Address||
||Tajo Master Umbilical Rpc||tajo.master.umbilical-rpc.address|| ||localhost:26001||
||Tajo Master Client Rpc||tajo.master.client-rpc.address|| ||localhost:26002||
||Tajo Master Info Http||tajo.master.info-http.address|| ||0.0.0.0:26080||
||Tajo Catalog Client Rpc||tajo.catalog.client-rpc.address|| ||localhost:26005||

== Worker ==
||Service Name||Config Property Name||Description||Default Address||
||Tajo Worker Peer Rpc||tajo.worker.peer-rpc.address|| ||0.0.0.0:28091||
||Tajo Worker Client Rpc||tajo.worker.client-rpc.address|| ||0.0.0.0:28092||
||Tajo Worker Info Http||tajo.worker.info-http.address|| ||0.0.0.0:28080||
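
These addresses can presumably be overridden like any other config. The sketch below assumes the properties belong in tajo-site.xml and uses a hypothetical host name, master-host, together with the default ports from the tables.
{{{
  <!-- Bind the client RPC service to a hypothetical host name. -->
  <property>
    <name>tajo.master.client-rpc.address</name>
    <value>master-host:26002</value>
  </property>
  <!-- Keep the worker info HTTP service on all interfaces, default port. -->
  <property>
    <name>tajo.worker.info-http.address</name>
    <value>0.0.0.0:28080</value>
  </property>
}}}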