Differences between revisions 18 and 19
Revision 18 as of 2013-10-30 04:06:59
Size: 3959
Editor: HyunsikChoi
Comment:
Revision 19 as of 2014-03-06 04:56:47
Size: 0
Editor: HyunsikChoi
Comment: moved to new doc.
= Prerequisites =
 * Hadoop 2.0.3-alpha or 2.0.5-alpha
 * Java 1.6 or higher
 * Protocol Buffers 2.4.1

= Build Tajo from Source Code =
 
Download the source code and build Tajo as follows:

{{{
$ git clone https://git-wip-us.apache.org/repos/asf/incubator-tajo.git tajo
$ cd tajo
$ mvn package -DskipTests -Pdist -Dtar
$ ls tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
}}}

If you encounter errors or want more detailed build instructions, read [[BuildInstruction|Build Instruction]].

= Unpack tarball =

Unpack the tarball produced by the build (see the build instructions).

{{{
$ tar xzvf tajo-0.2.0-SNAPSHOT.tar.gz
}}}

This creates a subdirectory named tajo-x.y.z-SNAPSHOT. You MUST copy this directory to the same path on every cluster node.
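One way to plan the copy is to print the command for each node before running anything. The following is a minimal sketch, not part of Tajo: the helper name `print_copy_commands`, the host names, and the `/opt` install path are all placeholders, and it assumes `rsync` over SSH is available between the nodes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Tajo): print one rsync command per host.
# The source directory and host names are placeholders for your own cluster.
print_copy_commands() {
    src_dir=$1      # unpacked Tajo directory on the local node
    shift
    for host in "$@"; do
        # The remote path equals the local path, so every node uses the same directory.
        printf 'rsync -a %s/ %s:%s/\n' "$src_dir" "$host" "$src_dir"
    done
}

# Dry run: nothing is copied, the commands are only printed for review.
print_copy_commands /opt/tajo-x.y.z-SNAPSHOT node1 node2
```

Because the helper only prints, its output can be reviewed first and then piped to `sh` once the host list looks right.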

<<Anchor(Configuration)>>
= Configuration =
First, add the required environment variables to '''''conf/tajo-env.sh''''':
{{{
# Hadoop home. Required
export HADOOP_HOME= ...

# The java implementation to use. Required.
export JAVA_HOME= ...
}}}
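Since both variables are required, it can help to confirm that each one points to an existing directory before starting Tajo. The check below is a hypothetical sketch, not a Tajo script; the function name `check_home_dir` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sanity check (not part of Tajo): report whether a variable
# is set and names an existing directory.
check_home_dir() {
    name=$1     # variable name, used only in the message
    value=$2    # the variable's current value
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name ok: $value"
}

# Check the two variables that conf/tajo-env.sh must export.
check_home_dir HADOOP_HOME "${HADOOP_HOME:-}" || true
check_home_dir JAVA_HOME "${JAVA_HOME:-}" || true
```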

= Running Tajo =
To launch the Tajo master, execute start-tajo.sh:
{{{
$ $TAJO_HOME/bin/start-tajo.sh
}}}

After that, you can use tsql to access Tajo's command-line interface. To learn how to use tsql, read the [[https://wiki.apache.org/tajo/tsql|Tajo Interactive Shell]] document.
{{{
$ $TAJO_HOME/bin/tsql
}}}

Typing \? in tsql displays the help documentation.

= First Query Execution =
First of all, we need to prepare some data for query execution. For example, you can make a simple text-based table as follows:
{{{
$ mkdir /home/x/table1
$ cd /home/x/table1
$ cat > data.csv
1|abc|1.1|a
2|def|2.3|b
3|ghi|3.4|c
4|jkl|4.5|d
5|mno|5.6|e
<CTRL + D>
}}}
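The same file can also be written non-interactively with a here-document, which is easier to script. The sketch below additionally counts rows that do not have four '|'-separated fields. `TABLE_DIR` and `count_bad_rows` are illustrative names; the example writes to a temporary directory unless `TABLE_DIR` is set, whereas the walkthrough uses /home/x/table1.

```shell
#!/bin/sh
# TABLE_DIR is a placeholder; set it to /home/x/table1 to follow the walkthrough.
TABLE_DIR=${TABLE_DIR:-$(mktemp -d)}

# Write the five sample rows without interactive input.
cat > "$TABLE_DIR/data.csv" <<'EOF'
1|abc|1.1|a
2|def|2.3|b
3|ghi|3.4|c
4|jkl|4.5|d
5|mno|5.6|e
EOF

# Hypothetical check: count rows whose number of '|'-separated fields is not 4.
count_bad_rows() {
    awk -F'|' 'NF != 4 { bad++ } END { print bad + 0 }' "$1"
}

# Prints 0 when every row has four fields.
count_bad_rows "$TABLE_DIR/data.csv"
```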

The schema of this table is (int, text, float, text).

{{{
$ $TAJO_HOME/bin/tsql

tajo> create external table table1 (id int, name text, score float, type text) using csv with ('csvfile.delimiter'='|') location 'file:/home/x/table1';
}}}

To load an external table, use the 'create external table' statement. In the location clause, give the absolute directory path with the appropriate scheme. If the table resides in HDFS, use 'hdfs' instead of 'file'.
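To make the scheme swap concrete, the sketch below prints the same statement for a local and an HDFS location. The function `table1_ddl` is a hypothetical helper, and the HDFS URI keeps `hostname:port` as an unfilled placeholder for your NameNode address.

```shell
#!/bin/sh
# Hypothetical helper: print the 'create external table' statement for table1
# with the given location URI. Only the location changes between the two cases.
table1_ddl() {
    # $1 = location URI, including the scheme (file: or hdfs:)
    printf "create external table table1 (id int, name text, score float, type text) using csv with ('csvfile.delimiter'='|') location '%s';\n" "$1"
}

# Local file system, as in the walkthrough.
table1_ddl 'file:/home/x/table1'

# HDFS; hostname:port and the path are placeholders, not real values.
table1_ddl 'hdfs://hostname:port/path/to/table1'
```

The printed statement can be pasted at the tajo> prompt.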

For more detail on DDL statements, see [[QueryLanguage|Query Language]].
{{{
tajo> \d
table1
}}}

The '\d' command shows the list of tables.

{{{
tajo> \d table1

table name: table1
table path: file:/home/x/table1
store type: CSV
number of rows: 0
volume (bytes): 78 B
schema:
id INT
name TEXT
score FLOAT
type TEXT
}}}

The '\d [table name]' command shows the description of the given table.

You can also execute SQL queries as follows:

{{{
tajo> select * from table1 where id > 2;
final state: QUERY_SUCCEEDED, init time: 0.069 sec, response time: 0.397 sec
result: file:/tmp/tajo-hadoop/staging/q_1363768615503_0001_000001/RESULT, 3 rows ( 35B)

id, name, score, type
- - - - - - - - - - - - -
3, ghi, 3.4, c
4, jkl, 4.5, d
5, mno, 5.6, e

tajo>
}}}
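As a quick cross-check that does not involve Tajo at all, the same filter can be applied to the raw file with awk. This is only a sketch for comparing against the rows shown above; `filter_id_gt` is a made-up name, and the sample data is rewritten to a temporary file so the example is self-contained.

```shell
#!/bin/sh
# Recreate the sample rows in a temporary file (placeholder location).
sample=$(mktemp)
cat > "$sample" <<'EOF'
1|abc|1.1|a
2|def|2.3|b
3|ghi|3.4|c
4|jkl|4.5|d
5|mno|5.6|e
EOF

# Hypothetical local equivalent of "select * from table1 where id > N":
# keep rows whose first field is numerically greater than the threshold.
filter_id_gt() {
    # $1 = threshold, $2 = data file
    awk -F'|' -v min="$1" '$1 + 0 > min + 0 { print $1 ", " $2 ", " $3 ", " $4 }' "$2"
}

# Prints the rows with id 3, 4, and 5, matching the query result above.
filter_id_gt 2 "$sample"
```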

= Distributed mode on HDFS cluster =
Add the following properties to the tajo-site.xml file:

{{{
  <property>
    <name>tajo.rootdir</name>
    <value>hdfs://hostname:port/tajo</value>
  </property>

  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>hostname:26001</value>
  </property>

  <property>
    <name>tajo.catalog.client-rpc.address</name>
    <value>hostname:26005</value>
  </property>
}}}
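The fragment above can be generated from two values so that the host names stay consistent across the three properties. The sketch below assumes tajo-site.xml uses the Hadoop-style `<configuration>` root element; that wrapper, the function name `print_tajo_site`, and its arguments are assumptions not stated on this page.

```shell
#!/bin/sh
# Hypothetical generator: print a tajo-site.xml containing the three properties.
# Assumption: the file is wrapped in a Hadoop-style <configuration> element.
print_tajo_site() {
    namenode=$1   # HDFS NameNode address as host:port (placeholder)
    master=$2     # host name of the Tajo master (placeholder)
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tajo.rootdir</name>
    <value>hdfs://${namenode}/tajo</value>
  </property>
  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>${master}:26001</value>
  </property>
  <property>
    <name>tajo.catalog.client-rpc.address</name>
    <value>${master}:26005</value>
  </property>
</configuration>
EOF
}

# Prints the file with the same placeholders used above; redirect to a file once
# real values are filled in.
print_tajo_site hostname:port hostname
```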

For more detail on Tajo's configuration, see the [[Configuration]] page.

Before launching Tajo, create the Tajo root directory in HDFS and set its permissions as follows:
{{{
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tajo
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tajo
}}}

Then, execute start-tajo.sh:
{{{
$ $TAJO_HOME/bin/start-tajo.sh
}}}

Enjoy Apache Tajo!