Data Aggregation

You can use functions such as SUM and COUNT to aggregate the results of GROUP BY queries at one or more levels.

Aggregating Data at a Single Level

The simplest GROUP BY queries aggregate data at a single level. For example, a table might contain the following information about family expenses:

Table data might look like this:

=> SELECT * FROM expenses ORDER BY Category;
 Year |  Category  | Amount
------+------------+--------
2005  | Books      |  39.98
2007  | Books      |  29.99
2008  | Books      |  29.99
2006  | Electrical | 109.99
2005  | Electrical | 109.99
2007  | Electrical | 229.98

You can use aggregate functions to get the total expenses per category or per year:

=> SELECT SUM(Amount), Category FROM expenses GROUP BY Category;
 SUM     | Category
---------+------------
  99.96  | Books
 449.96  | Electrical
=> SELECT SUM(Amount), Year FROM expenses GROUP BY Year;
 SUM    | Year
--------+------
 149.97 | 2005
 109.99 | 2006
  29.99 | 2008
 259.97 | 2007

Aggregating Data at Multiple Levels

Over time, tables that are updated frequently can contain large amounts of data. Using the simple table shown earlier, suppose you want a multilevel query, like the number of expenses per category per year.

The following query uses the ROLLUP aggregation with the SUM function to calculate the total expenses by category and the overall expenses total. The NULL fields indicate subtotal values in the aggregation.

Using the ORDER BY clause orders the results by expense category, the year the expenses took place, and the GROUP BY level that the GROUPING_ID function creates:

=> SELECT Category, Year, SUM(Amount) FROM expenses 
   GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
 Category   | Year |  SUM
------------+------+-------- 
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99 
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96
            |      | 549.92

Similarly, the following query calculates the total sales by year and the overall sales total and then uses the ORDER BY clause to sort the results:

=> SELECT Category, Year, SUM(Amount) FROM expenses  
   GROUP BY ROLLUP(Year, Category) ORDER BY 2, 1, GROUPING_ID();
  Category  | Year |  SUM 
------------+------+-------- 
 Books      | 2005 |  39.98
 Electrical | 2005 | 109.99
            | 2005 | 149.97
 Electrical | 2006 | 109.99
            | 2006 | 109.99
 Books      | 2007 |  29.99
 Electrical | 2007 | 229.98
            | 2007 | 259.97
 Books      | 2008 |  29.99
            | 2008 |  29.99
            |      | 549.92
(11 rows)

You can use the CUBE aggregate to perform all possible groupings of the category and year expenses. The following query returns all possible groupings, ordered by grouping:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
 Category   | Year |  SUM 
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
            |      | 549.92

The results include subtotals for each category and each year and a total ($549.92) for all transactions, regardless of year or category.

ROLLUP, CUBE, and GROUPING SETS generate NULL values in grouping columns to identify subtotals. If table data includes NULL values, differentiating these from NULL values in subtotals can sometimes be challenging.

In the preceding output, the NULL values in the Year column indicate that the row was grouped on the Category column, rather than on both columns. In this case, ROLLUP added the NULL value to indicate the subtotal row.

To distinguish subtotal rows from NULL values that are part of the input data, use the GROUPING function.