PERCENTILE_DISC
PERCENTILE_DISC is a function in SQL that computes the specific percentile for a group of values in a distribution. Specifically, it returns the smallest value that is greater than or equal to the percentile value in the ordered set. This function applies to a rowset and requires an ORDER BY clause to define the ordering of the rows. The output value would be the actual data point from the dataset. Unlike PERCENTILE_CONT, PERCENTILE_DISC will return an actual data point from the dataset. It uses a discrete calculation, meaning that it will return a value from the dataset, even if the value doesn't exist in the data grouping.
PERCENTILE_DISC(numeric_literal) WITHIN GROUP (ORDER BY expr1 [ASC | DESC] [NULLS {FIRST | LAST}]) OVER ([query_partition_clause] order_by_clause)
- numeric_literal: This is a numeric value that falls between 0 and 1 to represent the percentile that will be calculated. The number specified is treated as a Continuous Distribution measure to calculate observations or numbers in a distribution.
- order by expr1 [asc | desc]: This is used to determine the order in which values within the group will be sorted. It sorts based on ‘expr1’, which is a column or expression that data should be ordered by. ‘ASC’ means data is sorted in ascending order whereas ‘DESC’ means data is sorted in descending order.
- nulls {first | last}: This is optional and specifies the position of null values in the ordered list. NULLS FIRST will place nulls at the start of the list whereas NULLS LAST will place the nulls at the end of the list.
- query_partition_clause: It distributes the result set into partitions or groups of rows to which the percentile_disc function is applied. If not specified, the function treats all rows of the query result set as a single group.
- order_by_clause: This is the ORDER BY clause in the OVER function. It sorts the rows in each partition. If not specified, the function treats all rows of the partition as a single group.
Example
Output
Explanation
In this example, the PERCENTILE_DISC
function is used to calculate the median salary (0.5
corresponds to the median) of all employees from the Employee_Salary_Table
. The ORDER BY
clause ensures the salaries are ordered in ascending order before percentile calculation. The result, 45000
, represents the median salary.
PERCENTILE_DISC( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ PARTITION BY partition_by_expression ] )
- numeric_literal: The percentile value that must be between 0 and 1. This specifies the percentile to be computed - for instance, specifying 0.5 would compute the median, which is the 50th percentile.
- order by order_by_expression: Determines the column, or columns, by which the data will be ordered before the percentile is computed. May include `ASC` (ascending) or `DESC` (descending) keywords to specify sort order.
- partition by partition_by_expression: Divides the result set into partitions. This clause is optional and if specified, separates the input into groups where the function is applied separately. Each partition is processed as a separate calculation for percentile.
Example
Output
Explanation
The provided SQL Server code calculates the median of the SalesAmount
column from the SalesOrderDetail
table in the Sales
schema. PERCENTILE_DISC
function with 0.5 as an argument is being used to find the median value which is the middle value when ordering by SalesAmount
.
PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY expression) OVER (partition_by_clause)
- fraction: Represents the percentile value. It must be a constant numeric value ranging between 0 and 1 inclusive. A value of 0.5 will return the median of the data set.
- within group (order by expression): Specifies a sort order. In this syntax, the ORDER BY expression set the subset of data used in calculating the percentile.
- over (partition_by_clause): Determines how the input data set should be split. This clause distributes the rows of the result set into partitions to which the `PERCENTILE_DISC` function is applied independently. Each partition forms a subset of data.
- percentile_disc: It is an inverse distribution function that assumes a continuous distribution model. This function returns the value corresponding to the percentile specified by the `fraction` value. For example, `PERCENTILE_DISC(0.5)` computes the median value in a set.
Example
Output
median_sales |
---|
500 |
Explanation
In the given example, the PERCENTILE_DISC function calculates the median sales amount from sales_data
table. The function takes in a percentile value 0.5
(which represents the 50th percentile or the median) and operates ‘within group’ ordered by sales_amount
. The result is 500
, which is the median sales amount in the sales data.