Apache Kylin : Analytical Data Warehouse for Big Data

Page tree

Welcome to Kylin Wiki.

1. After successfully connecting Kylin node, click the icon in the upper right corner to exit the management mode:

After exiting the management mode, the display interface is as follows, and you can create a dataset:

2. MDX for Kylin will automatically read kylin's metadata information. Here, we choose to create a dataset named coved_trip_dataset  under project covid_trip_project :

3. Select the cube in kylin and define the cube relationship.

we put covid19_cube  and  newyork_trip_cube  under the same dataset to establish the connection relationship:

The connection relationship between cubes is realized through the common dimension table between two cubes, covid_cube  and newyork_trip_cube  all use the same dimension table lookup_calendar :

4. In the Define Semantics  step, you can modify the name of atomic indicators, define business indicators, and add hierarchical dimensions, etc.

- First, in order to avoid confusion in the data analysis layer of the count measurement in the two cubes, we can modify the indicator name to a different name:

- According to the business problems mentioned above, you can define MTD, YTD, MOM and YOY for the measure in the cube by adding calculated measure in `MDX for Kylin`:

Before defining these business indicators, we first define the hierarchical for the time dimension in dimension table lookup_calender. After defining the hierarchical dimension, we can roll up and drill down on this dimension to facilitate business personnel to analyze and observe data from different granularity of the same dimension.
Add dimensions lookup_calender.year_start , lookup_calender.quarter_start 、lookup_calender.month_start and lookup_calender.day_start  to a hierarchical dimension in turn:

Similarly, we can also add hierarchical dimension for the regional dimensions in table COVID_19_ACTIVITY :

- Because we need to calculate MTD and YTD at the time level, we need to correctly set the types of each time dimension. For example, set the type of lookup_calender.day_start  to Day :


Similarly, set the type of lookup_calender.month_start  to Month , the type of lookup_calender.quarter_start to Quarter , and the type of lookup_calender.year_start  to Year .

- Then, taking SUM_PASSENGER_COUNT  as an example, MTD, YTD, MOM and YOY can be easily defined through the built-in template expression of  MDX for Kylin :

Similar to SUM_PASSENGER_COUNT, create business indicators of MTD, YTD, MOM and YOY for other atomic indicators in newyork_trip_cube .

-  After the above indicators are added, it is necessary to define semi-additive measure  for two measures SUM_DEATH_COUNT  and SUM_POSITIVE_CASES  in covid19_cube . Semi-additive measure refer to measures that are not cumulative in the time dimension but can be normally accumulated in dimensions other than time. Since the number of positive cases and the number of death count has accumulated over time, they are semi-additive measure like commodity inventory or bank balance. MDX for Kylin  supports defining semi-additive measures through calculated measure .
For example, SUM_DEATH_COUNT  in COVID_19_ACTIVITY  is defined as a semi-additive measure, and its mdx expression is:

sum([LOOKUP_CALENDAR].[TIME_HIERARCHY-Hierarchy].CurrentMember.LastChild, 
[Measures].[SUM_DEATH_COUNT])

Similarly, SUM_POSITIVE_CASES  is also defined as a semi cumulative measure:

Then define the indicators of MOM and YOY on Semiadditive_DEATH_COUNT, Semiadditive_POSITIVE_CASE, SUM_NEW_DEATH_COUNT and SUM_NEW_POSITIVE_CASES. Since the number of positive cases and the number of deaths themselves are cumulative values, there is no need to define the indicators of MTD and YTD.

- According to the indicator design of the business scenario, in addition to the indicators of YTD, MTD, YOY and MOM, some other indicators need to be defined:

1. Covid19 case fatality rate:Semiadditive_DEATH_COUNT/Semiadditive_POSITIVE_CASES

2. Average distance traveled by Newyork taxi:SUM_TRIP_DISTANCE/ORDER_COUNT

After all the above business indicators are defined, the next step is to add more translations in different languages to the indicator names, which can facilitate analysts from different countries around the world to analyze data. If there is no multilingual requirement, you can skip this step and save the dataset. After the dateset is saved successfully, the next step is to use the BI tool to connect MDX for Kylin  for convenient and fast data analysis.



  • No labels