
Analytic functions handle complex analysis and reporting tasks. Here are some example use cases for Vertica analytic functions:
• Rank the longest standing customers in a particular state
• Calculate the moving average of retail volume over a specific time
• Find the highest score among all students in the same class
For a full list of Vertica analytic functions, see Analytic Functions in the Vertica documentation.
Analytic vs. Aggregate Functions
Some analytic functions aggregate data, so it’s important to distinguish between analytic functions that perform aggregations and aggregate functions. The similarities and differences are summarized below:Analytic Functions | Aggregate Functions |
---|---|
Return the same number of rows as the input | Return a single summary value |
The groups of rows on which an analytic function operate are defined by window partitioning and window frame clauses. | The groups of rows on which aggregate functions operate are defined by the SQL GROUP BY clause. |
Working with the Window Clauses
In an analytic function’s OVER clause, you can specify a window (or group of rows) to analyze.The window partition clause (PARTITION BY) groups input rows before the function processes them. Window partitioning is similar to the GROUP BY clause, except that it returns only one result row per input row. If you omit the window partition clause, all input rows are treated as a single partition:
…OVER(PARTITION BY expression)
The window order clause (ORDER BY) specifies how to sort rows that are supplied to the analytic function. The clause only specifies order within a window result set. The query can have its own ORDER BY clause outside the OVER clause. The additional OVER BY clause has precedence over the window order clause and orders the final result set:
…OVER(ORDER BY expression [ASC | DESC])
The window frame of an analytic function comprises a set of rows relative to the row that is currently being evaluated. After the function processes that row and its window frame, Vertica advances the current row and adjusts the frame boundaries accordingly. If the OVER clause also specifies a partition, Vertica also checks that the frame boundaries do not cross partition boundaries. This process repeats until the function evaluates the last row of the last partition.
…OVER ( {ROWS | RANGE} {BETWEEN start-point AND end-point} | start-point)
Function Execution
Vertica executes an analytic function as follows:1. Takes the input rows that the query returns after it performs all joins and evaluates FROM, WHERE, GROUP BY, and HAVING clauses.
2. Groups the input rows according to the window partition clause (PARTITION BY). If this clause is omitted, all input rows are treated as a single partition.
3. Sorts rows within each partition according to the window order clause (ORDER BY).
4. If the OVER clause includes a window order clause, the function checks for a window frame clause and executes it as it processes each input row. If the OVER clause omits a window frame clause, the function treats the entire partition as a window frame.
Examples
The following examples use the Vertica VMart example database that ships with all Vertica versions.Within each window partition, the RANK analytic function ranks all rows in the query results set according to the order specified in the window’s ORDER BY clause. For example, let’s say we want to rank the longest-standing customers in a particular state. This example uses the customer_dimension table in the VMart sample database to rank by state all companies that have been customers since 2007:
=> SELECT customer_state, customer_name, customer_since,
RANK() OVER(PARTITION BY customer_state ORDER BY customer_since) AS rank
FROM customer_dimension
WHERE customer_type='Company' AND customer_since > '01/01/2007'
ORDER BY customer_state
LIMIT 10;
customer_state | customer_name | customer_since | rank
---------------+---------------+----------------+------
AZ | Foodshop | 2007-01-20 | 1
AZ | Goldstar | 2007-08-11 | 2
CA | Metahope | 2007-01-05 | 1
CA | Foodgen | 2007-02-05 | 2
CA | Infohope | 2007-02-09 | 3
CA | Foodcom | 2007-02-19 | 4
CA | Amerihope | 2007-02-22 | 5
CA | Infostar | 2007-03-05 | 6
CA | Intracare | 2007-03-14 | 7
CA | Infocare | 2007-04-07 | 8
(10 rows)
The AVG analytic function computes an average of an expression in a group within a window. For example, the following query uses various tables in the VMart database to find the sales for that calendar month and returns a running average using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW.
=> SELECT calendar_month_number_in_year, SUM(product_price) AS sales,
AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year)
as average_sales
FROM product_dimension, date_dimension, inventory_fact
WHERE date_dimension.date_key = inventory_fact.date_key
AND product_dimension.product_key = inventory_fact.product_key
GROUP BY calendar_month_number_in_year;
calendar_month_number_in_year | sales | average_sales
------------------------------+----------+------------------
1 | 23869547 | 23869547
2 | 19604661 | 21737104
3 | 22877913 | 22117373.6666667
4 | 22901263 | 22313346
5 | 23670676 | 22584812
6 | 22507600 | 22571943.3333333
7 | 21514089 | 22420821.2857143
8 | 24860684 | 22725804.125
9 | 21687795 | 22610469.7777778
10 | 23648921 | 22714314.9
11 | 21115910 | 22569005.3636364
12 | 24708317 | 22747281.3333333
(12 rows)
The MAX analytic function returns the maximum value of an expression within a window. For example, the following query finds the maximum employee salary in Massachusetts and then computes the deviation between an employee’s annual salary and the maximum employee salary. This example uses the employee_dimension table from the VMart database:
=>SELECT employee_state, annual_salary,
MAX(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) max,annual_salary- MAX(annual_salary)
OVER(PARTITION BY employee_state
ORDER BY employee_key) diff
FROM employee_dimension
WHERE employee_state = 'MA' LIMIT 10;
employee_state | annual_salary | max | diff
---------------+---------------+--------+---------
MA | 625887 | 625887 | 0
MA | 395688 | 625887 | -230199
MA | 396174 | 625887 | -229713
MA | 380373 | 625887 | -245514
MA | 196768 | 625887 | -429119
MA | 146312 | 625887 | -479575
MA | 192084 | 625887 | -433803
MA | 141933 | 625887 | -483954
MA | 193966 | 625887 | -431921
MA | 131931 | 625887 | -493956
(10 rows)
For more information, see Analytic Functions.