---
layout: default
title: Docs
---

What should I use Kylin for?

If you need multi-dimensional analysis on large data sets (a billion rows or more) with low query latency (sub-second), Kylin is a good option. Kylin also integrates well with existing BI tools (e.g., Tableau).


Why do existing SQL-on-Hadoop solutions fall short?

Existing SQL-on-Hadoop solutions need to scan part or all of the data set to answer a user query. Because of these large scans, many queries are very slow (latency of a minute or more).


What is MOLAP/ROLAP?

MOLAP (Multidimensional OLAP) pre-computes data along the dimensions of interest and stores the resulting values in a cube. MOLAP is much faster but inflexible. ROLAP (Relational OLAP) uses a star or snowflake schema and aggregates at query time. ROLAP is flexible but much slower.
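
The trade-off can be illustrated with a minimal Python sketch. It is a toy model only, not Kylin code: the rows, the dimension names and the helper functions (`rolap_sum`, `build_cube`, `molap_sum`) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy fact rows: (site, category, price). All names and values are illustrative.
ROWS = [
    ("US", "Books", 10.0),
    ("US", "Toys", 5.0),
    ("UK", "Books", 7.5),
    ("UK", "Books", 2.5),
]
DIMENSIONS = ("site", "category")


def rolap_sum(rows, group_by):
    """ROLAP style: scan every row at query time and aggregate on the fly."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def build_cube(rows):
    """MOLAP style: pre-compute the sums for every combination of dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            cube[dims] = rolap_sum(rows, dims)
    return cube


def molap_sum(cube, group_by):
    """Query time is a dictionary lookup; the raw rows are not scanned."""
    return cube[tuple(d for d in DIMENSIONS if d in group_by)]


CUBE = build_cube(ROWS)
print(molap_sum(CUBE, ("site",)))
```

Both paths return the same totals. The difference is where the work happens: `build_cube` pays the aggregation cost once, up front, and can only answer questions about the dimensions it was built with, while `rolap_sum` can group by anything in the rows but rescans them on every query.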


How does Kylin support ROLAP/MOLAP?

Kylin builds a data cube (MOLAP) from Hive tables (ROLAP) according to the metadata definition. If a query can be fulfilled by the data cube, Kylin routes it to the cube (MOLAP). If it cannot, Kylin routes it to the Hive tables (ROLAP). In effect, you can think of Kylin as HOLAP on top of MOLAP and ROLAP.
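
The routing rule described above can be sketched as a simple coverage check. This is a simplified illustration, not Kylin's actual query planner: `route_query`, `CUBE_DIMS` and `CUBE_MEASURES` are hypothetical names, and the assumption is that "can be fulfilled by the cube" means every dimension and measure in the query is defined in the cube.

```python
# Hypothetical cube definition: the dimensions and measures it was built with.
CUBE_DIMS = {"week_beg_dt", "lv1_categ", "site_name"}
CUBE_MEASURES = {"sum(price)", "count(*)"}


def route_query(query_dims, query_measures,
                cube_dims=CUBE_DIMS, cube_measures=CUBE_MEASURES):
    """Return "cube" when the cube covers the query, otherwise "hive"."""
    covered = (set(query_dims) <= set(cube_dims)
               and set(query_measures) <= set(cube_measures))
    return "cube" if covered else "hive"


# Covered by the cube: answered from pre-computed values (MOLAP).
print(route_query({"week_beg_dt", "site_name"}, {"sum(price)"}))

# Uses a dimension the cube does not have: falls back to Hive (ROLAP).
print(route_query({"seller_id"}, {"sum(price)"}))
```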


What does a Kylin query look like?

Kylin supports joins, projections, filters, aggregations, group by and sub-queries. For example:

    select
        test_cal_dt.week_beg_dt,
        test_category.lv1_categ,
        test_category.lv2_categ,
        test_kylin_fact.format_name,
        test_sites.site_name,
        sum(test_kylin_fact.price) as total_price,
        count(*) as total_count
    from test_kylin_fact
    left join test_cal_dt
        on test_kylin_fact.cal_dt = test_cal_dt.cal_dt
    left join test_category
        on test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
        and test_kylin_fact.site_id = test_category.site_id
    left join test_sites
        on test_kylin_fact.site_id = test_sites.site_id
    where test_kylin_fact.seller_id = 123456
        or test_kylin_fact.format_name = 'New'
    group by
        test_cal_dt.week_beg_dt,
        test_category.lv1_categ,
        test_category.lv2_categ,
        test_kylin_fact.format_name,
        test_sites.site_name

What Hadoop components does it work with?

Kylin depends on HDFS, MapReduce, Hive and HBase. Hive and MapReduce are used for cube building: Hive for the pre-join and MapReduce for the pre-aggregation. HDFS stores intermediate files during cube building. HBase stores the data cube and answers queries. HBase coprocessors are also used for query processing.
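
The division of labour above can be pictured as three stages. The sketch below mimics them in plain Python with in-memory data; it does not call Hive, MapReduce or HBase, and the table contents, the function names and the `site_name=...` key format are assumptions made purely for illustration.

```python
# Toy inputs: a fact table and a lookup table (illustrative values only).
FACT = [
    {"site_id": 1, "price": 10.0},
    {"site_id": 1, "price": 5.0},
    {"site_id": 2, "price": 7.5},
]
SITES = {1: "US", 2: "UK"}


def pre_join(fact, sites):
    """Stage 1 (Hive's role): join fact rows with the lookup table into a flat table."""
    return [{"site_name": sites.get(r["site_id"]), "price": r["price"]} for r in fact]


def pre_aggregate(flat_rows):
    """Stage 2 (MapReduce's role): group by dimension value, reduce to (sum, count)."""
    totals = {}
    for row in flat_rows:
        total, count = totals.get(row["site_name"], (0.0, 0))
        totals[row["site_name"]] = (total + row["price"], count + 1)
    return totals


def store(aggregates):
    """Stage 3 (HBase's role): keep the aggregates in a key-value table."""
    return {f"site_name={name}": value for name, value in aggregates.items()}


TABLE = store(pre_aggregate(pre_join(FACT, SITES)))

# A query for one site is now a single key lookup rather than a scan.
print(TABLE["site_name=US"])
```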


Where can I find the technical details about Kylin?

Kylin OLAP