PERCENTILE_CONT
PERCENTILE_CONT is an inverse distribution function that assumes a continuous distribution model in SQL. It calculates the value which is the nth percentile in sorted values of a particular column.
PERCENTILE_CONT( percentile ) WITHIN GROUP ( ORDER BY expr ) OVER ( [ query_partition_clause ] [ order_by_clause ] )
- percentile: This is a number that specifies the percentile to compute. It must be a value between 0 and 1.
- within group (order by expr): expr here represents the column or expression which is sorted to determine percentile values. It determines the order of the data. This clause is mandatory to define order for percentile calculations.
- over ([query_partition_clause]): This portion of the statement allows you to partition the data into specific groups or chunks for calculated percentiles. The query_partition_clause allows you to specify how to partition rows into groups.
- [order_by_clause]: This is an optional parameter which allows further ordering of your data within each partition or group. This means after partitioning using query_partition_clause, if we still have further criteria to arrange our data, then we use order_by_clause.
Example
Output
Explanation
In the given example, the PERCENTILE_CONT
function is used to calculate the median salary of all employees. Here, 0.5
represents the 50th percentile or median. The WITHIN GROUP
clause sorts the salary in ascending order for this computation.
PERCENTILE_CONT(FLOAT) WITHIN GROUP (ORDER BY expression) OVER ([PARTITION BY partition_expression])
- float: Denotes the percentile to compute. It specifies the exact percentile desired and can range from 0 to 1. The values should be a decimal fraction between 0 and 1, with 0 representing the minimum value in a set of values, and 1 representing the maximum value.
- order by expression: Defines the column or columns based on which the percentile computation is done. The expression can reference any of the table’s columns.
- partition by partition_expression: An optional clause that divides the result set produced by FROM clause into partitions to which the PERCENTILE_CONT function is applied independently. This could refer to one or more existing columns.
- over(): Optional clause used to partition data into smaller sets to operate the percentile calculation over. The PERCENTILE_CONT function is applied to each partition separately and computation is restricted to the current partition. If no partition is specified, the function treats all rows in the query result set as a single group.
Example
Output
Explanation
In the example, PERCENTILE_CONT
function is used to find the 30th percentile of the scores registered within the Scores
table. The percentile is computed as a continuous value within the total. A score of 86.45 represents the 30th percentile across the dataset.
PERCENTILE_CONT( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ PARTITION BY partition_by_expression ] )
- numeric_literal: This is a required parameter representing a percentile value between 0 and 1. It defines the position for percentile calculation within the group. For example, 0.5 represents the median.
- within group: This clause is mandatory and specifies the order in which rows are sorted in the calculation of the percentile.
- order by expression: In this section, you provide an expression to sort the data. This influences how PERCENTILE_CONT will calculate the percentile.
- asc | desc: These are optional parameters allowing the user to set the direction of sorting, ascending or descending. If not specified, ascending order is assumed by default.
- over: This operand is used to compute the function over a set of rows, termed as a window.
- partition by expression: This clause is optional and partitions the data into groups on which the PERCENTILE_CONT is computed. If no PARTITION BY is specified, PERCENTILE_CONT treats all rows of the query result set as a single group.
Example
Output
Explanation
In the given example, the PERCENTILE_CONT
function is used to find the median sales amount from the ‘SalesAmount’ column of the ‘SalesOrderDetail’ table in the ‘Sales’ schema. The percentile to compute is specified as 0.5, which corresponds to the median. The WITHIN GROUP (ORDER BY SalesAmount ASC)
clause is used to sort the sales amounts in ascending order within each group.