COUNT (DISTINCT) and Other DISTINCT Aggregates

Computing a DISTINCT aggregate generally requires more work than other aggregates. Also, a query that uses a single DISTINCT aggregate consumes fewer resources than a query with multiple DISTINCT aggregates.

Vertica executes queries with multiple distinct aggregates more efficiently when all distinct aggregate columns have a similar number of distinct values.

Examples

The following query returns the number of distinct values in the primary_key column of the date_dimension table:

=> SELECT COUNT (DISTINCT date_key) FROM date_dimension;
 COUNT 
-------
  1826
(1 row)

This example returns all distinct values of evaluating the expression x+y for all inventory_fact records.

=> SELECT COUNT (DISTINCT date_key + product_key) FROM inventory_fact;
 COUNT 
-------
 21560
(1 row)

You can create an equivalent query using the LIMIT keyword to restrict the number of rows returned:

=> SELECT COUNT(date_key + product_key) FROM inventory_fact GROUP BY date_key LIMIT 10;
 COUNT 
-------
   173
    31
   321
   113
   286
    84
   244
   238
   145
   202
(10 rows)

This query returns the number of distinct values of date_key in all records with the specific distinct product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key)  FROM  inventory_fact 
   GROUP BY product_key LIMIT 10;
 product_key | count 
-------------+-------
           1 |    12
           2 |    18
           3 |    13
           4 |    17
           5 |    11
           6 |    14
           7 |    13
           8 |    17
           9 |    15
          10 |    12
(10 rows)

This query counts each distinct product_key value in inventory_fact table with the constant 1.

=> SELECT product_key, COUNT (DISTINCT product_key) FROM inventory_fact 
   GROUP BY product_key LIMIT 10; 
 product_key | count 
-------------+-------
           1 |     1
           2 |     1
           3 |     1
           4 |     1
           5 |     1
           6 |     1
           7 |     1
           8 |     1
           9 |     1
          10 |     1
(10 rows)

This query selects each distinct date_key value and counts the number of distinct product_key values for all records with the specific product_key value. It then sums the qty_in_stock values in all records with the specific product_key value and groups the results by date_key.

=> SELECT date_key, COUNT (DISTINCT product_key), SUM(qty_in_stock) FROM inventory_fact 
   GROUP BY date_key LIMIT 10;
 date_key | count |  sum   
----------+-------+--------
        1 |   173 |  88953
        2 |    31 |  16315
        3 |   318 | 156003
        4 |   113 |  53341
        5 |   285 | 148380
        6 |    84 |  42421
        7 |   241 | 119315
        8 |   238 | 122380
        9 |   142 |  70151
       10 |   202 |  95274
(10 rows)

This query selects each distinct product_key value and then counts the number of distinct date_key values for all records with the specific product_key value. It also counts the number of distinct warehouse_key values in all records with the specific product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key) FROM inventory_fact
   GROUP BY product_key LIMIT 15;
 product_key | count | count 
-------------+-------+-------
           1 |    12 |    12
           2 |    18 |    18
           3 |    13 |    12
           4 |    17 |    18
           5 |    11 |     9
           6 |    14 |    13
           7 |    13 |    13
           8 |    17 |    15
           9 |    15 |    14
          10 |    12 |    12
          11 |    11 |    11
          12 |    13 |    12
          13 |     9 |     7
          14 |    13 |    13
          15 |    18 |    17
(15 rows)

This query selects each distinct product_key value, counts the number of distinct date_key and warehouse_key values for all records with the specific product_key value, and then sums all qty_in_stock values in records with the specific product_key value. It then returns the number of product_version values in records with the specific product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key),
      COUNT (DISTINCT warehouse_key),
      SUM (qty_in_stock),
      COUNT (product_version)
      FROM inventory_fact GROUP BY product_key LIMIT 15;
 product_key | count | count |  sum  | count 
-------------+-------+-------+-------+-------
           1 |    12 |    12 |  5530 |    12
           2 |    18 |    18 |  9605 |    18
           3 |    13 |    12 |  8404 |    13
           4 |    17 |    18 | 10006 |    18
           5 |    11 |     9 |  4794 |    11
           6 |    14 |    13 |  7359 |    14
           7 |    13 |    13 |  7828 |    13
           8 |    17 |    15 |  9074 |    17
           9 |    15 |    14 |  7032 |    15
          10 |    12 |    12 |  5359 |    12
          11 |    11 |    11 |  6049 |    11
          12 |    13 |    12 |  6075 |    13
          13 |     9 |     7 |  3470 |     9
          14 |    13 |    13 |  5125 |    13
          15 |    18 |    17 |  9277 |    18
(15 rows)

The following example returns the number of warehouses from the warehouse dimension table:

=> SELECT COUNT(warehouse_name) FROM warehouse_dimension;
 COUNT 
-------
   100
(1 row)

This next example returns the total number of vendors:

=> SELECT COUNT(*) FROM vendor_dimension;
 COUNT 
-------
    50
(1 row)