NTILE
NTILE is a window function in SQL that evenly distributes ordered dataset into specified number of buckets or groups. It provides a way to categorize rows into a certain number of groups ranging from highest to lowest or vice versa.
NTILE(numeric_expr) OVER ([PARTITION BY column_expr [, …]] [ORDER BY column_expr [ASC | DESC] [, …]])
- numeric_expr: Represents the integer value which would be used to divide the result set into equal portions. If the number passed is more than the number of rows in the window, then some of these partitions are empty.
- partition by column_expr: This is an optional parameter used to break the result set into smaller groups, based on the column expression(s) provided. Rows in each partition are processed separately for the functions.
- order by column_expr: Another optional parameter used to sort the rows within each partition. This is based on the column expression(s) provided.
- asc | desc: These optional parameters are used to specify whether the sorting of rows within each partition would be in ascending or descending order. ‘ASC’ for ascending order and ‘DESC’ for descending order. If no option is provided, the default is ‘ASC’.
- [, …]: This represents that multiple column expressions can be provided for ‘PARTITION BY’ and ‘ORDER BY’.
Example
Output
student_id | score | quartile |
---|---|---|
1 | 98 | 1 |
2 | 95 | 1 |
3 | 92 | 2 |
4 | 90 | 2 |
5 | 85 | 3 |
6 | 82 | 3 |
7 | 75 | 4 |
8 | 70 | 4 |
Explanation
In the given example, the NTILE function is used to divide the exam scores of students into four groups or quartiles ordered by score in descending order. The groups are approximately of equal size. The function will assign a quartile value (1-4) based on the score percentile of each student.
NTILE(integer_expression) OVER ([PARTITION BY value_expression, … [n]] ORDER BY sort_expression [ASC | DESC], … [n])
- integer_expression: This represents the number of groups that should be created by the NTILE() function. It must be an integer value greater than zero.
- partition by value_expression: Optional clause, that segments the result set into smaller groups or partitions based on one or more column values. If not specified, the function treats all rows of the query result set as a single group.
- order by sort_expression: This clause determines the logical order of rows within each partition group. The sort_expression specifies the column or columns by which the function will sort.
- asc | desc: These optional keywords specify the direction of sorting. ASCARRanges the data in ascending order, DESC in descending order. The default is ASC if not specified.
- n: This optional parameter specifies the nth variable in the ORDER BY or PARTITION BY clauses, where n is a positive integer.
Example
Output
SalesRep | OrderAmount | SalesQuartile |
---|---|---|
Tom | 200 | 1 |
Jerry | 400 | 1 |
Spike | 600 | 2 |
Tyke | 1000 | 3 |
Butch | 1200 | 4 |
Muscles | 1400 | 4 |
Explanation
The NTILE function in SQL Server splits the ordered dataset into a specified number of groups. In the example provided, the data is split into four groups (quartiles) based on the order amount. The NTILE function then assigns an appropriate group number to each row.
NTILE(buckets NUMBER) OVER ([query_partition_clause] order_by_clause)
- buckets number: This is an argument that specifies the number of buckets into which the NTILE() function is to divide the sorted row set.
- query_partition_clause: This optional parameter divides the result set into partitions. Rows within the same partition are ranked. If not specified, the function treats all rows as a single partition.
- order_by_clause: This parameter is used to order the rows in each partition. The NTILE() function then assigns a unique rank to each row within a partition. The first row selected within a partition is assigned a rank of 1; the second row selected within a partition is assigned a rank of 2, and so on.
Example
Output
firstName | salary | salary_group |
---|---|---|
Jim | 80000 | 1 |
Bob | 75000 | 1 |
Alice | 70000 | 2 |
Charlie | 64000 | 2 |
David | 60000 | 3 |
Eve | 55000 | 3 |
Frank | 50000 | 4 |
Grace | 45000 | 4 |
Explanation
The above SQL statement divides the data set into four groups using the NTILE(4)
function, based on the descending order of salary
. Each group represents a quartile of the overall distribution of salaries. For example, Jim and Bob are in the top 25% earners, Alice and Charlie are in the next 25% earners, and so on.