Apache Kylin : Analytical Data Warehouse for Big Data




Background

Kylin generates cuboid statistics for each segment of a cube while the cube is being built. The statistics are also updated when a cube is optimized or merged.

These statistics show users the precise or estimated row count and size of each cuboid in a segment, which helps them design the cube.


The tree structure of cuboid statistics

Statistics of {cube name}[{segment name}]

Total cuboids: {num}
Total precise rows: {num}
Total precise size(MB): {num}
Sampling percentage:  {num}
Mapper overlap ratio: {num}
Mapper number: {num}
Length of dimension {dimension name} is {num}
...

|---- Cuboid 111111111111111111, [precise or est] row: {num}, [precise or est] MB: {num}
    |---- Cuboid 000111111111111111, [precise or est] row: {num}, [precise or est] MB: {num}, shrink: {num}%
        |---- Cuboid 000101111111111111, [precise or est] row: {num}, [precise or est] MB: {num}, shrink: {num}%
            |---- Cuboid 000101101111111111, [precise or est] row: {num}, [precise or est] MB: {num}, shrink: {num}%
                ...
        |---- Cuboid 000111110011111111, [precise or est] row: {num}, [precise or est] MB: {num}, shrink: {num}%
            |---- Cuboid 000101110011111111, [precise or est] row: {num}, [precise or est] MB: {num}, shrink: {num}%
                ...
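
For readers who want to post-process this report, the following is a minimal Python sketch that parses a single cuboid line of the tree. It is not part of Kylin: the pattern is modeled only on the template above, and the function name, the dictionary keys, and the assumption that each tree level is indented by four spaces are all hypothetical.

```python
import re

# Pattern modeled on the template lines above (an assumption, not Kylin's own format spec).
CUBOID_LINE = re.compile(
    r"\|---- Cuboid (?P<cuboid_id>[01]+), "
    r"(?P<row_kind>precise|est) row: (?P<rows>\d+), "
    r"(?P<size_kind>precise|est) MB: (?P<size_mb>[\d.]+)"
    r"(?:, shrink: (?P<shrink>-?[\d.]+) ?%)?"
)


def parse_cuboid_line(line):
    """Return a dict for one tree line, or None if the line is not a cuboid line."""
    match = CUBOID_LINE.search(line)
    if match is None:
        return None
    # Assumption: every tree level adds four leading spaces, as in the template.
    indent = len(line) - len(line.lstrip(" "))
    shrink = match.group("shrink")
    return {
        "cuboid_id": match.group("cuboid_id"),
        "depth": indent // 4,
        "row_kind": match.group("row_kind"),
        "rows": int(match.group("rows")),
        "size_kind": match.group("size_kind"),
        "size_mb": float(match.group("size_mb")),
        # The root line of the tree carries no shrink value.
        "shrink_pct": None if shrink is None else float(shrink),
    }
```

Calling `parse_cuboid_line` on every line of the command output and keeping the non-`None` results gives a flat list of cuboids, with `depth` indicating each cuboid's level in the tree.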


Summary

CubeStatsReader prints the statistics of the cuboids in each segment using the tree structure above. If a segment has not been built but estimated statistics have already been generated for it with the "HyperLogLog" algorithm, the estimated statistics are shown.

If the cuboids of a segment have been built and the segment metadata contains precise statistics, the precise statistics are shown.

Each value is prefixed with "est" or "precise" to indicate which kind of statistics is shown.


How to use

Check whether the cube contains precise statistics

Check the cube.

Check that each segment contains "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" and that neither is null.

 


Building jobs are not the only ones that write these fields: merging and optimizing a cube also update "cuboid_statics_rows_bytes" and "cuboid_statics_size_bytes" in the segments.
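
The check described above can be scripted. The sketch below assumes the cube metadata is available as JSON text with a top-level "segments" list whose entries have a "name" field next to the two statistics fields. That layout and the function name are assumptions made for illustration, so adjust them to the metadata you actually see.

```python
import json

# Field names taken from the text above.
STAT_FIELDS = ("cuboid_statics_rows_bytes", "cuboid_statics_size_bytes")


def segments_missing_precise_stats(cube_json_text):
    """Return the names of segments where either statistics field is absent or null."""
    cube = json.loads(cube_json_text)
    missing = []
    # Assumption: segments live in a top-level "segments" list.
    for segment in cube.get("segments", []):
        if any(segment.get(field) is None for field in STAT_FIELDS):
            missing.append(segment.get("name"))
    return missing
```

An empty result means every segment in the given metadata carries both fields.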

Show statistics of segments in a cube

bin/kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader {cube name}
  • If Kylin 4 does not have a base cuboid for a cube, the row count of the base cuboid "1111...111" is shown as 0 and its size as 0.0 MB.
  • If a cuboid does not exist, its children show a shrink percentage of "-0.0 %".
  • The command can check only one cube at a time.
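
Because the command accepts a single cube, checking several cubes means invoking it once per cube. The following is a minimal Python sketch of such a loop. The helper names are hypothetical, and it assumes the script is run from the Kylin home directory and that the report is written to standard output.

```python
import subprocess

# Class name taken from the command shown above.
READER_CLASS = "org.apache.kylin.engine.mr.common.CubeStatsReader"


def build_command(cube_name, kylin_sh="bin/kylin.sh"):
    """Build the argument list for one CubeStatsReader invocation."""
    return [kylin_sh, READER_CLASS, cube_name]


def read_stats(cube_names, kylin_sh="bin/kylin.sh"):
    """Run the reader once per cube and collect whatever each run prints to stdout."""
    outputs = {}
    for cube_name in cube_names:
        result = subprocess.run(
            build_command(cube_name, kylin_sh),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one cube fails
        )
        outputs[cube_name] = result.stdout
    return outputs
```

For example, `read_stats(["kylin_sales_cube"])` returns a dictionary with the captured report for that cube.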


Example of estimated statistics

# bin/kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader kylin_sales_cube
Retrieving hadoop conf dir...
Retrieving Spark dependency...
...
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
...
============================================================================
Statistics of kylin_sales_cube[20120101000000_20130201000000]

Total cuboids: 160
Total precise rows: 828158
Total precise size(MB): 30.2807559967041
Sampling percentage:  100
Mapper overlap ratio: -1.0
Mapper number: -1
Length of dimension DEFAULT.KYLIN_SALES.BUYER_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.SELLER_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.TRANS_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.PART_DT is 3
...
|---- Cuboid 111111111111111111, est row: 5402, precise MB: 0.22
    |---- Cuboid 000111111111111111, est row: 5402, est MB: 0.22, shrink: 100%
        |---- Cuboid 000101111111111111, est row: 5402, est MB: 0.21, shrink: 100%
            |---- Cuboid 000101101111111111, est row: 5402, est MB: 0.2, shrink: 100%
                |---- Cuboid 000101001111111111, est row: 5402, est MB: 0.19, shrink: 100%
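
The summary lines that precede the tree in output like the above can be collected with a small parser. This is an illustrative Python sketch only: the two patterns are inferred from the sample output, and the function name and return shape are hypothetical.

```python
import re

# "Key: number" lines such as "Total cuboids: 160" (pattern inferred from the sample output).
HEADER_LINE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z ()]*?):\s+(?P<value>-?[\d.]+)\s*$")
# "Length of dimension <name> is <n>" lines.
DIMENSION_LINE = re.compile(r"^Length of dimension (?P<name>\S+) is (?P<length>\d+)\s*$")


def parse_header(lines):
    """Split the summary lines into numeric totals and per-dimension lengths."""
    summary = {}
    dimension_lengths = {}
    for line in lines:
        dim = DIMENSION_LINE.match(line)
        if dim:
            dimension_lengths[dim.group("name")] = int(dim.group("length"))
            continue
        head = HEADER_LINE.match(line)
        if head:
            summary[head.group("key")] = float(head.group("value"))
    return summary, dimension_lengths
```

Lines that match neither pattern, such as the startup messages and the cuboid tree itself, are ignored.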


Example of precise statistics

# bin/kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader kylin_sales_cube
Retrieving hadoop conf dir...
Retrieving Spark dependency...
...
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
...
============================================================================
Statistics of kylin_sales_cube[20120101000000_20130201000000]

Total cuboids: 160
Total precise rows: 828158
Total precise size(MB): 30.2807559967041
Sampling percentage:  100
Mapper overlap ratio: -1.0
Mapper number: -1
Length of dimension DEFAULT.KYLIN_SALES.BUYER_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.SELLER_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.TRANS_ID is 4
Length of dimension DEFAULT.KYLIN_SALES.PART_DT is 3
...
|---- Cuboid 111111111111111111, precise row: 0, precise MB: 0
    |---- Cuboid 000111111111111111, precise row: 5402, precise MB: 0.22, shrink: -0%
        |---- Cuboid 000101111111111111, precise row: 5402, precise MB: 0.21, shrink: 100%
            |---- Cuboid 000101101111111111, precise row: 5402, precise MB: 0.2, shrink: 100%
                |---- Cuboid 000101001111111111, precise row: 5402, precise MB: 0.19, shrink: 100%



