PERCENTILE_DISC [Analytic]
An inverse distribution function where, for each row, PERCENTILE_DISC
returns the value that would fall into the specified percentile among a set of values in each partition within a window. PERCENTILE_DISC()
assumes a discrete distribution data model. NULL
values are ignored.
PERCENTILE_DISC
examines the cumulative distribution values in each group until it finds one that is greater than or equal to the specified percentile. Vertica computes the percentile where, for each row, PERCENTILE_DISC
outputs the first value of the WITHIN GROUP(ORDER BY)
column whose CUME_DIST
(cumulative distribution) value is >= the argument FLOAT
value—for example, 0.4
:
PERCENTILE_DIST(0.4) WITHIN GROUP (ORDER BY salary) OVER(PARTITION BY deptno)...
Given the following query:
SELECT CUME_DIST() OVER(ORDER BY salary) FROM table-name;
The smallest CUME_DIST
value that is greater than 0.4 is also the PERCENTILE_DISC
.
Behavior Type
Syntax
PERCENTILE_DISC ( percentile ) WITHIN GROUP ( ... ORDER BY expression [ ASC | DESC ] ) OVER ( ... [ window-partition-clause ] )
Parameters
percentile |
Percentile value, a |
WITHIN GROUP(ORDER BY expression)
|
Specifies how to sort data within each group. The |
ASC | DESC
|
Specifies the ordering sequence as ascending (default) or descending. |
OVER()
|
Example
This query computes the 20th percentile annual income by group for first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_DISC(.2) WITHIN GROUP(ORDER BY annual_income) OVER (PARTITION BY customer_state) AS PERCENTILE_DISC FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300 ORDER BY customer_state, customer_key; customer_state | customer_key | annual_income | PERCENTILE_DISC ----------------+--------------+---------------+----------------- DC | 104 | 658383 | 417092 DC | 168 | 417092 | 417092 DC | 245 | 670205 | 417092 WI | 106 | 227279 | 227279 WI | 127 | 703889 | 227279 WI | 209 | 458607 | 227279 (6 rows)