GROUP BY Clause

Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT.

ROLLUP is an extension to the GROUP BY clause. ROLLUP performs subtotal aggregations.

Syntax

GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate‑expression }[,…]

Arguments

/*+GBYTYPE(algorithm)*/

Specifies which algorithm has precedence for implementing this GROUP BY clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm to one of the following values:

  • HASH: GROUPBY HASH algorithm
  • PIPE: GROUPBY PIPELINED algorithm

For more information about both algorithms, see GROUP BY Implementation Options.

expression

Any expression, including constants and column references in the tables specified in the FROM clause. For example:

column,… column, (expression)
aggregate‑expression

An ordered list of columns, expressions, CUBE, GROUPING SETS, or ROLLUP aggregates.

You can include CUBE and ROLLUP aggregates within a GROUPING SETS aggregate. CUBE and ROLLUP aggregates can result in a large amount of output. In that case, use GROUPING SETS to return only certain results.

You cannot include any aggregates within a CUBE or ROLLUP expression.

You can append multiple GROUPING SETS, CUBE, or ROLLUP aggregates in the same query. Examples:

GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())

Usage Considerations

  • expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP to return summary values for each group.
  • When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT list.
  • If the GROUP BY clause includes a WHERE clause, Vertica ignores all rows that do not satisfy the WHERE clause.

Examples

This example shows how to use the WHERE clause with GROUP BY. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY clause uses the ILIKE function to retrieve only last names beginning with S. The aggregate function SUM computes the total vacation days for each group.

=> SELECT employee_last_name, SUM(vacation_days)
   FROM employee_dimension
   WHERE employee_last_name ILIKE 'S%'
   GROUP BY employee_last_name;
 employee_last_name | SUM
--------------------+------
 Sanchez            | 2892
 Smith              | 2672
 Stein              | 2660
(3 rows)

The GROUP BY clause in the following example groups results by vendor region, and vendor region's biggest deal:

=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 MidWest       |       699163
 NorthWest     |        76101
 South         |       854136
 SouthWest     |       609807
 West          |       964005
(6 rows)

The following query modifies the previous one with a HAVING clause, which specifies to return only groups whose maximum deal size exceeds $900,000:

=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region
   HAVING MAX(deal_size) > 900000;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 West          |       964005
(2 rows)

You can use the GROUP BY clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].

=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);

=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.

=> SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department;
 department |          grants          |  SUM
------------+--------------------------+--------
 Physics    | ["US-7376","DARPA-1567"] | 235000
 Astronomy  | ["US-7376","DARPA-1567"] |   9000
 Physics    | ["US-7376"]              |  30000
(3 rows)	

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT. For example, the following two queries return the same results:

=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;

See Also