COUNT (DISTINCT) and Other DISTINCT Aggregates
Computing a DISTINCT
aggregate generally requires more work than other aggregates. Also, a query that uses a single DISTINCT
aggregate consumes fewer resources than a query with multiple DISTINCT
aggregates.
Vertica executes queries with multiple distinct aggregates more efficiently when all distinct aggregate columns have a similar number of distinct values.
Examples
The following query returns the number of distinct values in the primary_key
column of the date_dimension
table:
=> SELECT COUNT (DISTINCT date_key) FROM date_dimension;
COUNT ------- 1826 (1 row)
This example returns all distinct values of evaluating the expression x+y
for all inventory_fact
records.
=> SELECT COUNT (DISTINCT date_key + product_key) FROM inventory_fact;
COUNT ------- 21560 (1 row)
You can create an equivalent query using the LIMIT
keyword to restrict the number of rows returned:
=> SELECT COUNT(date_key + product_key) FROM inventory_fact GROUP BY date_key LIMIT 10;
COUNT ------- 173 31 321 113 286 84 244 238 145 202 (10 rows)
This query returns the number of distinct values of date_key
in all records with the specific distinct product_key
value.
=> SELECT product_key, COUNT (DISTINCT date_key) FROM inventory_fact GROUP BY product_key LIMIT 10;
product_key | count -------------+------- 1 | 12 2 | 18 3 | 13 4 | 17 5 | 11 6 | 14 7 | 13 8 | 17 9 | 15 10 | 12 (10 rows)
This query counts each distinct product_key
value in inventory_fact
table with the constant 1.
=> SELECT product_key, COUNT (DISTINCT product_key) FROM inventory_fact GROUP BY product_key LIMIT 10;
product_key | count -------------+------- 1 | 1 2 | 1 3 | 1 4 | 1 5 | 1 6 | 1 7 | 1 8 | 1 9 | 1 10 | 1 (10 rows)
This query selects each distinct date_key
value and counts the number of distinct product_key
values for all records with the specific product_key
value. It then sums the qty_in_stock
values in all records with the specific product_key
value and groups the results by date_key
.
=> SELECT date_key, COUNT (DISTINCT product_key), SUM(qty_in_stock) FROM inventory_fact GROUP BY date_key LIMIT 10;
date_key | count | sum ----------+-------+-------- 1 | 173 | 88953 2 | 31 | 16315 3 | 318 | 156003 4 | 113 | 53341 5 | 285 | 148380 6 | 84 | 42421 7 | 241 | 119315 8 | 238 | 122380 9 | 142 | 70151 10 | 202 | 95274 (10 rows)
This query selects each distinct product_key
value and then counts the number of distinct date_key
values for all records with the specific product_key
value. It also counts the number of distinct warehouse_key
values in all records with the specific product_key
value.
=> SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key) FROM inventory_fact GROUP BY product_key LIMIT 15;
product_key | count | count -------------+-------+------- 1 | 12 | 12 2 | 18 | 18 3 | 13 | 12 4 | 17 | 18 5 | 11 | 9 6 | 14 | 13 7 | 13 | 13 8 | 17 | 15 9 | 15 | 14 10 | 12 | 12 11 | 11 | 11 12 | 13 | 12 13 | 9 | 7 14 | 13 | 13 15 | 18 | 17 (15 rows)
This query selects each distinct product_key
value, counts the number of distinct date_key
and warehouse_key
values for all records with the specific product_key
value, and then sums all qty_in_stock
values in records with the specific product_key
value. It then returns the number of product_version
values in records with the specific product_key
value.
=> SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key), SUM (qty_in_stock), COUNT (product_version) FROM inventory_fact GROUP BY product_key LIMIT 15;
product_key | count | count | sum | count -------------+-------+-------+-------+------- 1 | 12 | 12 | 5530 | 12 2 | 18 | 18 | 9605 | 18 3 | 13 | 12 | 8404 | 13 4 | 17 | 18 | 10006 | 18 5 | 11 | 9 | 4794 | 11 6 | 14 | 13 | 7359 | 14 7 | 13 | 13 | 7828 | 13 8 | 17 | 15 | 9074 | 17 9 | 15 | 14 | 7032 | 15 10 | 12 | 12 | 5359 | 12 11 | 11 | 11 | 6049 | 11 12 | 13 | 12 | 6075 | 13 13 | 9 | 7 | 3470 | 9 14 | 13 | 13 | 5125 | 13 15 | 18 | 17 | 9277 | 18 (15 rows)
The following example returns the number of warehouses from the warehouse
dimension table:
=> SELECT COUNT(warehouse_name) FROM warehouse_dimension;
COUNT ------- 100 (1 row)
This next example returns the total number of vendors:
=> SELECT COUNT(*) FROM vendor_dimension;
COUNT ------- 50 (1 row)