PERCENTILE_CONT [Analytic]
An inverse distribution function where, for each row, PERCENTILE_CONT
returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT
assumes a continuous distribution data model. NULL
values are ignored.
PERCENTILE_CONT
computes the percentile by first computing the row number where the percentile row would exist. For example:
row‑number = 1 + percentile‑value * (num‑partition‑rows -1)
If row‑number
is a whole number (within an error of 0.00001), the percentile is the value of row row‑number
.
Otherwise, Vertica interpolates the percentile value between the value of the CEILING(row‑number)
row and the value of the FLOOR(row‑number)
row. In other words, the percentile is calculated as follows:
( CEILING( row‑number) - row‑number ) * ( Value of FLOOR(row‑number) row )
+ ( row‑number - FLOOR(row‑number) ) * ( Value of CEILING(row‑number) row)
If the percentile value is 0.5, PERCENTILE_CONT
returns the same result set as the function MEDIAN
.
Behavior Type
Syntax
PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC | DESC ] ) OVER ( [ window-partition-clause ] )
Parameters
percentile |
Percentile value, a |
WITHIN GROUP(ORDER BY expression)
|
Specifies how to sort data within each group. The |
ASC | DESC
|
Specifies the ordering sequence as ascending (default) or descending. Specifying |
OVER()
|
Examples
This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income) OVER (PARTITION BY customer_state) AS PERCENTILE_CONT FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300 ORDER BY customer_state, customer_key; customer_state | customer_key | annual_income | PERCENTILE_CONT ----------------+--------------+---------------+----------------- DC | 52 | 168312 | 483266.5 DC | 118 | 798221 | 483266.5 WI | 62 | 283043 | 377691 WI | 139 | 472339 | 377691 (4 rows)
This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income) OVER (PARTITION BY customer_state) AS PERCENTILE_CONT FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key; customer_state | customer_key | annual_income | PERCENTILE_CONT ----------------+--------------+---------------+----------------- DC | 52 | 168312 | 483266.5 DC | 118 | 798221 | 483266.5 DC | 622 | 220782 | 555088 DC | 951 | 178453 | 555088 DC | 972 | 961582 | 555088 DC | 1286 | 760445 | 555088 DC | 1434 | 44836 | 555088 . . .
WI | 62 | 283043 | 377691 WI | 139 | 472339 | 377691 WI | 359 | 42242 | 517717 WI | 364 | 867543 | 517717 WI | 403 | 509031 | 517717 WI | 455 | 32000 | 517717 WI | 485 | 373129 | 517717 . . .
(1353 rows)