Apache Kylin : Analytical Data Warehouse for Big Data


Kylin 4.0 uses the fair scheduler as the Spark scheduler mode in its query module. The fair scheduler supports grouping jobs into pools and setting different scheduling options (e.g. weight) for each pool. This can be useful to create a "high-priority" pool for more important jobs, for example, or to group the jobs of each user together and give users equal shares regardless of how many concurrent jobs they have, instead of giving jobs equal shares. For more about the Spark fair scheduler, see https://spark.apache.org/docs/latest/job-scheduling.html.
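To make the effect of pool weights concrete, here is a minimal Python sketch of a weighted split between pools. It is a simplified model and not Spark's scheduler code: it ignores the minShare option and the scheduling order inside each pool, and the weights in the usage note are hypothetical values chosen for illustration, not Kylin's actual pool settings.

```python
from typing import Dict


def pool_shares(weights: Dict[str, float],
                active_jobs: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of resources each active pool would receive.

    Simplified model: only pools with at least one running job compete,
    and each one gets a share proportional to its weight.
    """
    # Keep only the pools that currently have jobs to run.
    active = {name: w for name, w in weights.items()
              if active_jobs.get(name, 0) > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    # A pool's share depends on its weight, not on its number of jobs.
    return {name: w / total for name, w in active.items()}
```

With the hypothetical weights `{"vip_tasks": 2, "lightweight_tasks": 1}` and one running job in 'vip_tasks' against four in 'lightweight_tasks', the function gives 'vip_tasks' two thirds of the resources and 'lightweight_tasks' one third. The number of jobs in a pool does not change that pool's share, which is the property the paragraph above describes.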

In Kylin 4.0, users can configure a Spark pool at the project level and at the SQL level; a SQL-level setting takes priority over a project-level one. Kylin has four Spark pools: 'query_pushdown', 'heavy_tasks', 'lightweight_tasks' and 'vip_tasks'.

If the user specifies a Spark pool at neither the SQL level nor the project level, Kylin automatically chooses the Spark pool for the SQL according to some rules.

Here are some examples of usage:

  • 1. Configure the Spark pool at the project level

     Then all queries in this project will use the 'vip_tasks' pool to execute:

    


  • 2. Override the Spark pool at the SQL level

     Then this SQL will use the 'heavy_tasks' pool to execute.

      


  • 3. SQL pushed down to Spark will use the 'query_pushdown' pool to execute


  • 4. Without any configuration, the SQL will be allocated to the 'lightweight_tasks' pool
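The precedence described on this page can be sketched as a small Python function. This is an illustration only and not Kylin's implementation: the function name and parameters are hypothetical, it assumes that an explicit SQL-level or project-level setting also applies to pushed-down queries, and it leaves out whatever rules Kylin uses to pick 'heavy_tasks' automatically, since this page does not describe them.

```python
from typing import Optional


def resolve_spark_pool(sql_level_pool: Optional[str] = None,
                       project_level_pool: Optional[str] = None,
                       is_pushdown: bool = False) -> str:
    """Pick a Spark pool name following the precedence described above.

    Hypothetical helper: all names here are illustrative, not Kylin APIs.
    """
    # 1. A pool set at the SQL level overrides everything else.
    if sql_level_pool is not None:
        return sql_level_pool
    # 2. Otherwise fall back to the project-level setting.
    if project_level_pool is not None:
        return project_level_pool
    # 3. With no explicit setting, pushed-down SQL uses its own pool
    #    (assumption: explicit settings win over this rule).
    if is_pushdown:
        return "query_pushdown"
    # 4. Default for queries with no configuration at all.
    return "lightweight_tasks"
```

For example, `resolve_spark_pool("heavy_tasks", "vip_tasks")` returns `'heavy_tasks'`, because the SQL-level value wins over the project-level one, while `resolve_spark_pool()` returns `'lightweight_tasks'`.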

     


