Differences between revisions 12 and 13
Revision 12 as of 2013-09-30 11:14:30
Size: 6962
Editor: HyunsikChoi
Comment:
Revision 13 as of 2014-03-06 04:53:03
Size: 0
Editor: HyunsikChoi
Comment: moved to new doc.
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= Data Types =
Tajo provides SQL data types. TajoDataTypes describes the detailed data types.

= DDL =
== CREATE TABLE ==
Synopsis

{{{
CREATE TABLE <table_name> [(<column_name> <data_type>, ... )]
  [using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>]
}}}

== CREATE EXTERNAL TABLE ==
{{{
CREATE EXTERNAL TABLE <table_name> (<column_name> <data_type>, ... )
  using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>'
}}}

=== Compression ===

If you want to add an external table that contains compressed data, you should give 'compression.code' parameter to CREATE TABLE statement.
{{{
create EXTERNAL table lineitem (
  L_ORDERKEY bigint,
  L_PARTKEY bigint,
  ...
  L_COMMENT text)

USING csv WITH ('csvfile.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
}}}

'compression.codec' parameter can have one of the following compression codecs:
 * org.apache.hadoop.io.compress.BZip2Codec
 * org.apache.hadoop.io.compress.DeflateCodec
 * org.apache.hadoop.io.compress.GzipCodec
 * org.apache.hadoop.io.compress.SnappyCodec

== DROP TABLE ==
{{{
DROP TABLE <table_name>
}}}

= SQL Expressions =
== Arithmetic Expressions ==
== Type Casts ==
A type cast converts a specified-typed data to another-typed data. Tajo has two type cast syntax:
{{{
CAST ( expression AS type )
expression::type
}}}

== String Expressions ==
== Function Call ==
{{{
function_name ([expression [, expression ... ]] )
}}}

= DML =
== SELECT ==

{{{
SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...]
  [FROM <table name> [[AS] <table alias name>] [, ...]]
  [WHERE <condition>]
  [GROUP BY <expression> [, ...]]
  [HAVING <condition>]
  [ORDER BY <expression> [ASC|DESC] [NULL FIRST|NULL LAST] [, ...]]
}}}

=== WHERE ===

==== IN Predicate ====
''IN predicate'' provides row and array comparison.

{{{
column_reference IN (val1, val2, ..., valN)
column_reference NOT IN (val1, val2, ..., valN)
}}}

Examples are as follows:

{{{
-- this statement filters lists down all the records where col1 value is 1, 2 or 3:
SELECT col1, col2 FROM table1 WHERE col1 IN (1, 2, 3);

-- this statement filters lists down all the records where col1 value is neither 1, 2 nor 3:
SELECT col1, col2 FROM table1 WHERE col1 NOT IN (1, 2, 3);
}}}

You can use 'IN clause' on text data domain as follows:
{{{
SELECT col1, col2 FROM table1 WHERE col2 IN ('tajo', 'hadoop');

SELECT col1, col2 FROM table1 WHERE col2 NOT IN ('tajo', 'hadoop');
}}}

<<Anchor(PatternMatching)>>
==== String Pattern Matching Predicates ====

'''''LIKE'''''

''LIKE'' operator returns true or false depending on whether its pattern matches the given string. An underscore (_) in pattern matches any single character. A percent sign (%) matches any sequence of zero or more characters.

{{{
string LIKE pattern
string NOT LIKE pattern
}}}

'''''ILIKE'''''

''ILIKE'' is the same to ''LIKE'', but it is a case insensitive operator. It is not in the SQL standard. We borrow this operator from PostgreSQL.

{{{
string ILIKE pattern
string NOT ILIKE pattern
}}}

'''''Similar To'''''

{{{
string SIMILAR TO pattern
string NOT SIMILAR TO pattern
}}}

It returns true or false depending on whether its pattern matches the given string. Also like LIKE, 'SIMILAR TO' uses '_' and '%' as metacharacters denoting any single character and any string, respectively.

In addition to these metacharacters borrowed from LIKE, 'SIMILAR TO' supports more powerful pattern-matching metacharacters borrowed from regular expressions:

||&#124;||denotes alternation (either of two alternatives).||
||*||denotes repetition of the previous item zero or more times.||
||+||denotes repetition of the previous item one or more times.||
||?||denotes repetition of the previous item zero or one time.||
||{m}||denotes repetition of the previous item exactly m times.||
||{m,}||denotes repetition of the previous item m or more times.||
||{m,n}|| denotes repetition of the previous item at least m and not more than n times.||
||[]||A bracket expression specifies a character class, just as in POSIX regular expressions.||
||()||Parentheses can be used to group items into a single logical item.||

Note that '.' is not used as a metacharacter in 'SIMILAR TO' operator.

'''''Regular expressions'''''

Regular expressions provide a very powerful means for string pattern matching. In the current Tajo, regular expressions are based on [[http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html|Java-style regular expressions]] instead of POSIX regular expression. The main difference between java-style one and POSIX's one is [[http://en.wikipedia.org/wiki/Regular_expression#Character_classes|character class]].

{{{
string ~ pattern
string !~ pattern

string ~* pattern
string !~* pattern
}}}

||Operator||Description||
||~||It returns true if a given regular expression is matched to string. Otherwise, it returns false.||
||!~||It returns false if a given regular expression is matched to string. Otherwise, it returns true.||
||~*||It is the same to '~', but it is case insensitive.||
||!~*||It is the same to '!~', but it is case insensitive.||

Some examples
{{{
'abc' ~ '.*c' true
'abc' ~ 'c' false
'aaabc' ~ '([a-z]){3}bc true
'abc' ~* '.*C' true
'abc' !~* 'B.*' true
}}}

Regular expressions operator is not in the SQL standard. We borrow this operator from PostgreSQL.

'''''REGEXP''''' and '''''RLIKE''''' are synonym operators:
{{{
string REGEXP pattern
string NOT REGEXP pattern
}}}

{{{
string RLIKE pattern
string NOT RLIKE pattern
}}}

But, they do not support case-insensitive operators.

<<Anchor(InsertInto)>>
=== INSERT (OVERWRITE) INTO ===

INSERT OVERWRITE statement overwrites a table data of an existing table or a data in a given directory. Tajo's INSERT OVERWRITE statement follows 'INSERT INTO SELECT' statement of SQL. The examples are as follows:

{{{
create table t1 (col1 int8, col2 int4, col3 float4);

-- when a target table schema and output schema are equivalent to each other
INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem;
-- or
INSERT OVERWRITE INTO t1 SELECT * FROM lineitem;

-- when the output schema are smaller than the target table schema
INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem;

-- when you want to specify certain target columns
INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem;
}}}

In addition, INSERT OVERWRITE statement overwrites table data as well as a specific directory.

{{{
INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem;
}}}