If you do not find the answer to your question, feel free to leave a comment under this wiki page.


Question List

The RowKey determines the sort-by columns. For details, see How to improve cube building and query performance.

                Sparder is the implementation of the new distributed query engine, which is backed by a Spark application. If Sparder is dead, all your queries will fail. After the Kylin instance (Query Server) has started, you can check Sparder's liveness in the application list of the Resource Manager Web UI. The SparderCanary tool was added in 4.0-beta to monitor the liveness of Sparder. When Sparder dies, SparderCanary will try to restart it automatically.
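The same liveness check can be scripted against the Resource Manager instead of opening the Web UI. The following is a minimal sketch, not part of Kylin: it assumes the YARN Resource Manager REST endpoint `/ws/v1/cluster/apps` is reachable, and it assumes the Sparder application name contains the string "sparder". Verify the actual name in your application list. The function names and the `rm_url` value are hypothetical.

```python
import json
import urllib.request


def find_running_apps(payload, name_fragment):
    """Return running YARN apps whose name contains name_fragment (case-insensitive)."""
    # The Resource Manager returns {"apps": null} when no application matches.
    apps = (payload.get("apps") or {}).get("app") or []
    fragment = name_fragment.lower()
    return [
        app for app in apps
        if app.get("state") == "RUNNING" and fragment in app.get("name", "").lower()
    ]


def sparder_is_alive(rm_url, name_fragment="sparder"):
    """Ask the Resource Manager REST API whether a matching app is running."""
    # rm_url is a placeholder such as "http://resource-manager-host:8088".
    # The default name_fragment is an assumption about how Sparder is named.
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return bool(find_running_apps(payload, name_fragment))
```

Calling `sparder_is_alive("http://resource-manager-host:8088")` from a monitoring job would give a simple yes/no signal. SparderCanary remains the built-in mechanism for restarting Sparder.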

                Hadoop 3 is supported in Kylin 4.0-beta, which has been verified on CDH 5.7, CDH 6.2, EMR 5.31, EMR 6.0.0, and HDP 2.4. Hadoop 3 and EMR environments require additional configuration; please check the Installation Guide.

1. Copy all files under /etc/hadoop/conf to one directory ("/path/to/hadoop_conf").

2. Copy hive-site.xml to "/path/to/hadoop_conf".

3. Edit kylin.properties to set kylin.env.hadoop-conf-dir=/path/to/hadoop_conf, then restart Kylin.
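The three steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function name and its arguments are hypothetical, the locations of hive-site.xml and kylin.properties depend on your installation, and restarting Kylin is still done separately.

```python
import shutil
from pathlib import Path


def prepare_hadoop_conf(hadoop_conf, hive_site, target_dir, kylin_properties):
    """Collect Hadoop and Hive client configs in target_dir and point Kylin at it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Step 1: copy every regular file from the Hadoop conf directory.
    for item in Path(hadoop_conf).iterdir():
        if item.is_file():
            shutil.copy2(item, target / item.name)

    # Step 2: copy hive-site.xml next to them.
    shutil.copy2(hive_site, target / "hive-site.xml")

    # Step 3: set kylin.env.hadoop-conf-dir, replacing an active entry if present.
    # Commented-out entries (lines starting with '#') are left untouched.
    key = "kylin.env.hadoop-conf-dir"
    props = Path(kylin_properties)
    lines = props.read_text().splitlines() if props.exists() else []
    lines = [ln for ln in lines if not ln.strip().startswith(key + "=")]
    lines.append(f"{key}={target}")
    props.write_text("\n".join(lines) + "\n")
```

For example, `prepare_hadoop_conf("/etc/hadoop/conf", "/path/to/hive-site.xml", "/path/to/hadoop_conf", "/path/to/kylin.properties")` would mirror the manual procedure; the last two paths are placeholders.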

                Please refer to Read-Write Separation Deployment for Kylin 4.0.

                It will be refreshed automatically at the next build.

                Please refer to How to clean up storage in Kylin 4.

                Cube Planner Phase 1 is supported in 4.0.0-beta. Please refer to How to use Cube Planner in Kylin 4.

                The dimension dictionary has been removed. The only dictionary remaining in Kylin 4.0 is the Global Dictionary.

                Check How to improve cube building and query performance.

                Please refer to How to improve cube building and query performance and Improve query performance by setting shard by column.

                Almost fully compatible. However, you must purge the segments of your cubes, because HBase storage has been removed. Kylin 4.0 recommends using an RDBMS as the metastore; please refer to Use MySQL as Metastore and How to use HBase metastore in Kylin 4.0.

                The pre-calculated cuboid data is completely incompatible, and there is no migration plan for the time being because of the relatively large development effort required.

                Spark 2.4.6 is currently supported. Other Spark distributions are not officially supported.

                Please refer to Kylin 4.X Feature List.

                To be updated

                To be updated

                To be updated

                It is not supported in Kylin 4.0.0-alpha and Kylin 4.0.0-beta.

                Use different Spark pools for different queries.

                Please refer to Global Dictionary on Spark.

                No. The results differ in two cases:

1. When the cube contains a 'COUNT_DISTINCT' measure based on HLL, Spark SQL still calculates the exact value from the source data, while the HLL measure is an approximation.

2. When the cube contains a 'PERCENTILE' measure, the algorithm Kylin 4.0 uses to calculate the values differs from the one used by Spark SQL.

                No. In Kylin 4.0, if a cube contains a TopN measure, the data of the 'TopN' measure is saved in the parquet file as 'ArrayType'. This leads to low read performance, because Spark cannot use 'VectorizedParquetRecordReader' to read a parquet file when the returned schema includes 'ArrayType'. Please use the original design (dimension + sum measure) directly to execute TopN-style SQL.
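To make "dimension + sum measure" concrete, here is a hedged sketch of what a TopN-style query looks like when written as a plain GROUP BY over a dimension with a SUM measure. The table `sales` and the columns `seller_id` and `price` are hypothetical, and SQLite is used only so that the example runs on its own; with Kylin, a query of the same shape would be sent to a cube that defines `seller_id` as a dimension and `SUM(price)` as a measure.

```python
import sqlite3

# Hypothetical TopN-style query: top sellers by total price, expressed as
# a dimension (seller_id) plus a SUM measure (price) rather than a TopN measure.
TOPN_STYLE_SQL = """
SELECT seller_id, SUM(price) AS total_price
FROM sales
GROUP BY seller_id
ORDER BY total_price DESC
LIMIT 2
"""


def run_demo():
    """Run the TopN-style query on a tiny in-memory table and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (seller_id INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 10.0), (2, 40.0), (1, 15.0), (3, 5.0), (2, 1.0)],
    )
    rows = conn.execute(TOPN_STYLE_SQL).fetchall()
    conn.close()
    return rows
```

With the sample rows above, `run_demo()` returns the two sellers with the largest totals, seller 2 followed by seller 1.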