Apache Kylin : Analytical Data Warehouse for Big Data

...

Code Block
git clone https://github.com/apache/kylin.git -b kylin-on-parquet-v2
# Compile 
mvn clean install -DskipTests

...

The environment on the dev machine

Install Maven

The latest Maven can be found at http://maven.apache.org/download.cgi. Create a symbolic link so that mvn can be run from anywhere.
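The symbolic link step can be sketched as below. This is only an illustration: the function name link_mvn and the paths in the usage comment are assumptions, not something the Kylin documentation prescribes, so adjust them to wherever Maven was unpacked.

```bash
# Hypothetical helper: link the mvn launcher from an unpacked Maven
# directory into a directory that is already on PATH.
link_mvn() {
  local maven_home="$1"   # e.g. /usr/local/apache-maven (assumed location)
  local bin_dir="$2"      # e.g. /usr/local/bin
  # -s: symbolic link, -f: replace an existing link, -n: do not follow an old link
  ln -sfn "$maven_home/bin/mvn" "$bin_dir/mvn"
}

# Example usage (may require root for system directories):
#   link_mvn /usr/local/apache-maven /usr/local/bin
#   mvn -version
```

Running "mvn -version" afterwards is a quick way to confirm that the link resolves.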

...

Manually install the Spark binary in a local folder like /usr/local/spark. Kylin supports the community version of Spark. You can go to Apache Spark's official website and download Spark 2.4.6.
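A minimal sketch of the manual install, assuming the Spark archive has already been downloaded from the official website. The function name install_spark and the archive file name in the usage comment are illustrative assumptions; only the target folder /usr/local/spark comes from the text above.

```bash
# Hypothetical helper: unpack a downloaded Spark archive into a target folder.
install_spark() {
  local tarball="$1"   # path to the downloaded .tgz archive
  local dest="$2"      # e.g. /usr/local/spark
  mkdir -p "$dest"
  # --strip-components=1 drops the top-level directory inside the archive,
  # so the bin/, jars/ and conf/ folders end up directly under $dest.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example usage (archive name is an assumption; may require root):
#   install_spark spark-2.4.6-bin-hadoop2.7.tgz /usr/local/spark
#   export SPARK_HOME=/usr/local/spark
```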

How to Debug

There are two ways to debug the source code: debug with local metadata (recommended), or debug with a Hadoop sandbox.

Configuration

Debug with local metadata 

...

   The VM option "-Dspark.local=true" is for the query engine.

Debug with Hadoop sandbox

Local configuration must be modified to point to your Hadoop sandbox (or CLI) machine.

  • In examples/test_case_data/sandbox/kylin.properties
    • Find sandbox and replace it with your Hadoop hosts (if you’re using the HDP sandbox, this can be skipped)
    • Find kylin.job.use-remote-cli and change it to “true” (in the code repository the default is false, which assumes Kylin is running on the Hadoop CLI)
    • Find kylin.job.remote.cli.username and kylin.job.remote.cli.password, and fill in the user name and password used to log in to the Hadoop cluster for Hadoop command execution; if you’re using the HDP sandbox, the default username is root and the password is hadoop.
  • In examples/test_case_data/sandbox
    • For each configuration XML file, find all occurrences of sandbox and sandbox.hortonworks.com, and replace them with your Hadoop hosts (if you’re using the HDP sandbox, this can be skipped)
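
The host replacement in the XML files can be scripted. The following is a rough sketch, not an official Kylin tool: the function name replace_sandbox_host and the example host name are hypothetical. It rewrites both sandbox.hortonworks.com and the bare sandbox in a single pass and keeps a .bak copy of each file. It replaces every occurrence of the word, so review the result before using it.

```bash
# Hypothetical helper: point the sandbox configuration at another Hadoop host.
replace_sandbox_host() {
  local conf_dir="$1"   # e.g. examples/test_case_data/sandbox
  local new_host="$2"   # your Hadoop host (must not contain "/" or "&")
  # One expression matches "sandbox" with an optional ".hortonworks.com"
  # suffix, so the long form is never left half-replaced.
  find "$conf_dir" -maxdepth 1 -name '*.xml' \
    -exec sed -i.bak "s/sandbox\(\.hortonworks\.com\)\{0,1\}/${new_host}/g" {} +
}

# Example usage (host name is an assumption):
#   replace_sandbox_host examples/test_case_data/sandbox hadoop-dev.example.com
```

The same function could be pointed at kylin.properties by adjusting the -name pattern; the remote CLI settings still need to be edited by hand.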

...

Code Block
cd webapp
npm install -g bower
bower --allow-root install


If you encounter a network problem when running “bower install”, you may try:

...

Note: on Windows, after installing bower, you need to add the path of “bower.cmd” to the system environment variable ‘PATH’, and then run:

...

In the IDE, launch org.apache.kylin.rest.DebugTomcat. Set the path of the “server” module as the “Working directory”, set “kylin-server” for “Use classpath of module”, and check the “Include dependencies with ‘Provided’ scope” option in IntelliJ IDEA 2018. If you’re using IntelliJ IDEA 2017 or older, you need to modify the “server/kylin-server.iml” file and replace every “PROVIDED” with “COMPILE”; otherwise a “java.lang.NoClassDefFoundError: org/apache/catalina/LifecycleListener” error may be thrown.

You may also need to tune the VM options:

Code Block
-Dhdp.version=2.4.0.0-169 -DSPARK_HOME=/usr/local/spark -Dkylin.hadoop.conf.dir=/workspace/kylin/examples/test_case_data/sandbox -Xms800m -Xmx800m -XX:PermSize=64M -XX:MaxNewSize=256m -XX:MaxPermSize=128m 

Also remember that if you debug in local mode, you should add a VM option for the query engine:

Code Block
-Dspark.local=true

...

By default, the Kylin server listens on port 7070. If you want to use another port, specify it as a parameter when running DebugTomcat.

...