LAG

LAG is an analytical function in SQL that provides access to a row at a given physical offset preceding the current row within a partition. It's mainly used to compare current row values with values of a previous row.

LAG(value_expression, offset, default_value) OVER (PARTITION BY column_name ORDER BY column_name)

  • value_expression: This refers to the expression that will be returned as the result. It can be a column or an arithmetic operation between columns.
  • offset: This is the position of the previous row to return in comparison to the current row. The position is 1-based, meaning 1 refers to the first row, 2 refers to the second row, and so on. The offset must be a non-negative integer.
  • default_value: If no row is found at the offset position or the value is NULL, the default value will be returned. The type of this default value must match or be implicitly convertible to the value expression type.
  • partition by column_name: This clause divides the result set into partitions. The LAG function is applied within each partition independently.
  • order by column_name: This clause specifies the order of the rows in each partition. The LAG function uses this order to determine which row is the ‘lagging’ row.

Example

CREATE TABLE sales (
id INT AUTO_INCREMENT,
year INT NOT NULL,
sale_amount INT NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO sales(year, sale_amount)
VALUES(2018, 100),
(2019, 200),
(2020, 300);
SELECT year,
LAG(sale_amount) OVER (ORDER BY year) as previous_year_sale
FROM sales;

Output

year | previous_year_sale
-----|-------------------
2018 | NULL
2019 | 100
2020 | 200

Explanation

The LAG() function is a window function in SQL that allows you to look at a row that is a certain number of rows behind the current one. In the given example, the LAG() function returns the sale_amount from the previous year. For the first row (year 2018), there is no previous sale_amount, so the value is NULL. The sale_amount for the year 2019 is 100 (from the year 2018), and the sale_amount for the year 2020 is 200 (from the year 2019).

LAG(value, offset, default) OVER ( [ PARTITION BY partition_expression ] [ ORDER BY order_expression ] )

  • value: The column or expression whose values are to be returned by the LAG function. It is the data that will be used in the calculation of the function.
  • offset: An optional parameter, defining the number of steps back from the current row from which to retrieve data. If not explicitly mentioned, the default is 1.
  • default: Another optional parameter which determines the value to be returned by the LAG function when it cannot fetch a value, typically when the offset surpasses the limit of the data set.
  • partition by partition_expression: This clause divides the result set produced by the FROM clause into partitions to which the LAG function is applied. They can be column(s) or expression(s) that involve columns.
  • order by order_expression: This clause determines the order in which rows will be used by the LAG function within the partition. It can be a list of expressions that may involve columns.

Example

SELECT
name, salary,
LAG(salary) OVER (ORDER BY salary) as prev_salary
FROM employees;

Output

| name | salary | prev_salary |
|---------|---------|-------------|
| John | 40000 | NULL |
| Emma | 50000 | 40000 |
| Harry | 70000 | 50000 |
| Olivia | 75000 | 70000 |

Explanation

In the given example, the PostgreSQL LAG() function is used to fetch the previous row’s salary for each row within the employees table when the rows are ordered by salary. As a result, the prev_salary for the first row is NULL because there is no previous row. For each ensuing row, prev_salary contains the salary of the preceding row.

LAG(value_expression, offset, default) OVER ([partition_by_clause] order_by_clause)

  • value_expression: The value expression defines the column or expression from which the LAG function extracts a value. This parameter is mandatory.
  • offset: The offset specifies the number of rows backward from the current row from which the LAG function fetches the value. It’s optional and if not specified, the OFFSET defaults to one.
  • default: This parameter is also optional. It specifies the value the LAG function returns if the offset goes beyond the extent of the window frame or there’s no row to lag (when the offset is more than the number of rows). If not specified, the default value is NULL.
  • partition_by_clause: It sub-divides the window into smaller sets or partitions. Rows in the same partition are processed in the same manner while applying the window function. This clause is optional for the OVER clause.
  • order_by_clause: It specifies how to arrange the rows in a window in a specific order. This is necessary to determine the position of the current row within its partition and thus is required for the OVER clause.

Example

SELECT
EmployeeName,
Salary,
LAG(Salary) OVER (ORDER BY Salary) AS PreviousSalary
FROM
Employee

Output

| EmployeeName | Salary | PreviousSalary |
|--------------|--------|----------------|
| Mary | 5000 | NULL |
| John | 6000 | 5000 |
| Peter | 7000 | 6000 |
| Katie | 8000 | 7000 |

Explanation

The LAG() function in SQL Server allows us to access the data of a previous row in the same result set without the need to join the table to itself. In the given example, we are showing each employee’s salary and the salary of the person who earns the next lowest amount. The result set indicates that after sorting by salary, there is no person with a lower salary than Mary, John’s salary is the one next higher from Mary, and so on.

LAG(value_expr, offset, default) OVER ([query_partition_clause] order_by_clause)

  • value_expr: This parameter is the column or expression from which the LAG function gets the value. The function will find the value of the value_expr from the previous row which is defined by the offset.
  • offset: This parameter is a positive integer that determines the ‘lag’ between the current row and the row to retrieve. The offset defines how many rows back from the current row to retrieve the value_expr. If not specified, the function treats the offset as 1.
  • default: This optional parameter provides a value to return when the offset exceeds the number of rows in the window. If it’s not specified and offset is greater than the number of rows, the default value will be null.
  • query_partition_clause: This optional parameter is used to divide the result set into groups or partitions of rows to which the LAG function is applied. If this parameter is not specified, the function is applied to the entire result set.
  • order_by_clause: This parameter determines the logical order of rows within each partition of the result set. The LAG function calculates its result based on the order_by_clause. The order_by_clause is required for the LAG function to operate.

Example

SELECT employee_id, salary,
LAG(salary, 1, 0) OVER (ORDER BY employee_id) as prev_salary
FROM employees;

Output

EMPLOYEE_ID SALARY PREV_SALARY
----------- ------ -----------
100 24000 0
101 17000 24000
102 29000 17000
103 9000 29000

Explanation

The LAG function provides access to data from a previous row in the same result set without the use of a self-join. Here, it retrieves the salary of the previous employee when all are sorted by employee ID. The ‘1’ in LAG specifies to look one row back, and the ‘0’ specifies that it will return 0 for the first row, as there is no row before it to pull salary data from.

For in-depth explanations and examples SQL keywords where you write your SQL, install our extension.