GROUP BY Clause
Use the GROUP BY
clause with aggregate functions in a SELECT
statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
.
ROLLUP
is an extension to the GROUP BY
clause. ROLLUP
performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate‑expression }[,...]
Arguments
/*+GBYTYPE(algorithm)*/
|
Specifies which algorithm has precedence for implementing this
For more information about both algorithms, see GROUP BY Implementation Options. |
expression |
Any expression, including constants and column references in the tables specified in the FROM clause. For example: column,... column, (expression) |
aggregate‑expression |
An ordered list of columns, expressions, You can include You cannot include any aggregates within a You can append multiple GROUP BY a,b,c,d, ROLLUP(a,b) GROUP BY a,b,c,d, CUBE((a,b),c,d) GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d) GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c) GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b)) GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),()) |
Usage Considerations
- expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and
ROLLUP
to return summary values for each group. - When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the
SELECT
list. - If the
GROUP BY
clause includes aWHERE
clause, Vertica ignores all rows that do not satisfy theWHERE
clause.
Examples
This example shows how to use the WHERE
clause with GROUP BY
. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY
clause uses the ILIKE
function to retrieve only last names beginning with S. The aggregate function SUM
computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days) FROM employee_dimension WHERE employee_last_name ILIKE 'S%' GROUP BY employee_last_name; employee_last_name | SUM --------------------+------ Sanchez | 2892 Smith | 2672 Stein | 2660 (3 rows)
The GROUP BY
clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal" FROM vendor_dimension GROUP BY vendor_region; vendor_region | Biggest Deal ---------------+-------------- East | 990889 MidWest | 699163 NorthWest | 76101 South | 854136 SouthWest | 609807 West | 964005 (6 rows)
The following query modifies the previous one with a HAVING
clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal" FROM vendor_dimension GROUP BY vendor_region HAVING MAX(deal_size) > 900000; vendor_region | Biggest Deal ---------------+-------------- East | 990889 West | 964005 (2 rows)
You can use the GROUP BY
clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]); => COPY employees FROM STDIN; 42|Physics|[US-7376,DARPA-1567]|[65000,135000] 36|Physics|[US-7376,DARPA-1567]|[10000,25000] 33|Physics|[US-7376]|[30000] 36|Astronomy|[US-7376,DARPA-1567]|[5000,4000] \. => SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department; department | grants | SUM ------------+--------------------------+-------- Physics | ["US-7376","DARPA-1567"] | 235000 Astronomy | ["US-7376","DARPA-1567"] | 9000 Physics | ["US-7376"] | 30000 (3 rows)
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;