PERCENTILE_CONT [Analytic]
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT assumes a continuous distribution data model. NULL values are ignored.
PERCENTILE_CONT computes the percentile by first computing the row number where the percentile row would exist. For example:
row‑number = 1 + percentile‑value * (num‑partition‑rows -1)
If row‑number is a whole number (within an error of 0.00001), the percentile is the value of row row‑number.
Otherwise, Vertica interpolates the percentile value between the value of the CEILING(row‑number)
row and the value of the FLOOR(row‑number)
row. In other words, the percentile is calculated as follows:
( CEILING( row‑number) - row‑number ) * ( value of FLOOR(row‑number) row ) + ( row‑number - FLOOR(row‑number) ) * ( value of CEILING(row‑number) row)
If the percentile value is 0.5, PERCENTILE_CONT returns the same result set as the function MEDIAN.
Behavior Type
Syntax
PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC | DESC ] ) OVER ( [ window-partition-clause ] )
Parameters
percentile |
Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive). |
WITHIN GROUP (ORDER BY expression)
|
Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded. The |
ASC | DESC
|
Specifies the ordering sequence as ascending (default) or descending. Specifying ASC or DESC in the |
OVER()
|
Examples
This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income) OVER (PARTITION BY customer_state) AS PERCENTILE_CONT FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300 ORDER BY customer_state, customer_key; customer_state | customer_key | annual_income | PERCENTILE_CONT ----------------+--------------+---------------+----------------- DC | 52 | 168312 | 483266.5 DC | 118 | 798221 | 483266.5 WI | 62 | 283043 | 377691 WI | 139 | 472339 | 377691 (4 rows)
This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income) OVER (PARTITION BY customer_state) AS PERCENTILE_CONT FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key; customer_state | customer_key | annual_income | PERCENTILE_CONT ----------------+--------------+---------------+----------------- DC | 52 | 168312 | 483266.5 DC | 118 | 798221 | 483266.5 DC | 622 | 220782 | 555088 DC | 951 | 178453 | 555088 DC | 972 | 961582 | 555088 DC | 1286 | 760445 | 555088 DC | 1434 | 44836 | 555088 ...
WI | 62 | 283043 | 377691 WI | 139 | 472339 | 377691 WI | 359 | 42242 | 517717 WI | 364 | 867543 | 517717 WI | 403 | 509031 | 517717 WI | 455 | 32000 | 517717 WI | 485 | 373129 | 517717 ...
(1353 rows)