OLAP operations and the SQL standard
The SQL-2003 standard includes definitions for a number of new features, which are collectively referred to as OLAP Operations. The features are grouped into several classifications, as follows:
- T611: Elementary OLAP operations
- T612: Advanced OLAP operations
- T431: Extended grouping capabilities
- T433: Multiargument GROUPING function
This page, and the other pages to which it links, are intended to contain information about OLAP operations and Derby.
T611: Elementary OLAP operations
Feature T611 of the 2003 SQL standard specifies the following items:
Window functions (ROW_NUMBER, RANK, and DENSE_RANK)
- Unnamed window specifications:
- window partitioning (PARTITION BY)
- window ordering (ORDER BY)
- window framing (ROWS/RANGE PRECEDING/FOLLOWING)
T612: Advanced OLAP operations
Feature T612 of the 2003 SQL standard specifies the following items:
- Window functions PERCENT_RANK and CUME_DIST
- Window naming
- Window clauses
- Window frame exclusions
- Width buckets
T431: Extended grouping capabilities
Feature T431 of the 2003 SQL standard specifies the following items:
T433: Multiargument GROUPING function
Feature T433 of the 2003 SQL standard specifies the following items:
If this is your first exposure to OLAP Operations in SQL, you may find it useful to start by studying some reference material. Among the documents that I have found particularly valuable are:
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, by Gray, Chaudhuri, Bosworth, Layman, Reichart, Venkatrao, Pellow, and Pirahesh. http://citeseer.ist.psu.edu/gray96data.html
Introduction to OLAP functions, by Zemke, Kulkarni, Witkowski, and Lyle. http://www.ncb.ernet.in/education/modules/dbms/SQL99/OLAP-99-154r2.pdf
Concepts and Terminology
Windows and Window Functions
A windowed table is a table together with one or more windows.
A window is a subset of the rows in the result set of the query to which a window function should be applied. A window describes how to partition the rows of the result set into one or more partitions, how to order the rows within each partition, and which particular rows to include within that window.
(PARTITION BY region ORDER BY date_of_sale ROWS UNBOUNDED PRECEDING)
Every window defines a partitioning of the rows of the <table expression>. Window partitioning is similar to forming groups of a grouped table. However, unlike grouped tables, each row is retained in the transient window data structure. The window partition of a row R is the multi-set of rows that are not distinct from R for all columns in the PARTITION BY clause (if the PARTITION BY clause is missing, the window contains all the rows in the result).
A window function is a function which can be applied to a window. Window functions include the traditional set aggregation functions (SUM, MIN, MAX, COUNT, AVG) as well as the new numbering function ROW_NUMBER, and the new ranking functions (RANK, DENSE_RANK, PERCENT_RANK, CUME_RANK).
ROW_NUMBER () OVER (PARTITION BY department ORDER BY salary DESC)
Rolling-Up and Drilling-Down
People who are analyzing transaction or event data often wish to aggregate data at multiple levels, such as providing totals, sub-totals, sub-sub-totals, etc. For example, you might wish to aggregate sales data by Territory, by Month, by Salesman, and by Product_line. This is often called multi-dimensional aggregation, because each grouping attribute (Territory, Month, Salesman, Product_line) provides a different dimension by which to analyze the data. At different occasions in the analysis you may wish to aggregate larger amounts of data into coarser groups, or you may wish to aggregate smaller amounts of data into finer groups. Reducing the number of dimensions to aggregate at a coarser level is called rolling-up the data, while increasing the number of dimensions to aggregate at a finer level is called drilling-down the data.
With traditional SQL, each SELECT statement can only specify a single GROUP BY grouping. The OLAP operations make it easier to specify these multi-dimensional aggregation queries. In addition to being easier for the user to write, such queries can be run more efficiently because the database can compute all the aggregates in a single scan of the data, rather than one scan per grouping.