THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

SQL Server 2012 Windowing Functions: LAG

SQL Server 2012 introduces new windowing functions beyond the basic ranking functions of RANK, DENSE_RANK, NTILE, and ROW_NUMBER. The LAG function is particularly useful when comparing data in the current time period to the previous period. Period to period changes in sales, nosocomial infection rates, and manufacturing defects are just some examples of where a LAG function can simplify your code. The purpose of this post is to provide a basic introduction. Subsequent posts will discuss more advanced queries.

The traditional approach to comparing a previous period to a current period is to use a self-join, which is illustrated in the first SELECT statement shown below. Notice that the condition on the Tempus column reduces the 12 months (rows) of data to 11 rows. This is expected for the data and logic in the complete example shown further down the page. Also notice that the logic for matching data in the Tempus column with the previous time period works only within the same year when using dates formatted as shown. For example, 201212 + 1 won’t match with 201301. This could be fixed by using date formatting functions including DATEADD. The point here is that dates typically require additional work if you want a robust, generic solution.

SELECT c.Employee
, c.Tempus
, p.Quantity AS PreviousQuantity
, c.Quantity AS CurrentQuantity
FROM #Sales p
INNER JOIN #Sales c
ON c.Employee = p.Employee
WHERE c.Tempus = p.Tempus + 1;

Employee Tempus PreviousQuantity CurrentQuantity
-------- ------ ---------------- ---------------
1        201302 100              200
1        201303 200              150
1        201304 150              175
1        201305 175              200
1        201306 200              250
1        201307 250              275
1        201308 275              300
1        201309 300              350
1        201310 350              400
1        201311 400              200
1        201312 200              300

(11 row(s) affected)

Now look at the output from using the LAG function. Notice that 12 rows are returned. This is important when you want to add aggregations to your result set, but this is a topic for a future post on more advanced techniques. The logic for processing the Tempus column is intrinsically more robust and simpler when using the LAG function.

SELECT Employee
,Tempus
,LAG(Quantity) OVER (PARTITION BY Employee ORDER BY Tempus ASC) AS PreviousQuantity
,Quantity AS CurrentQuantity
FROM #Sales;

Employee Tempus PreviousQuantity CurrentQuantity
-------- ------ ---------------- ----------------
1        201301 NULL             100
1        201302 100              200
1        201303 200              150
1        201304 150              175
1        201305 175              200
1        201306 200              250
1        201307 250              275
1        201308 275              300
1        201309 300              350
1        201310 350              400
1        201311 400              200
1        201312 200              300

(12 row(s) affected)

Now that you can see the quantities for each time period and its immediate predecessor, you’re probably going to want to compute percentage changes from one time period to another.

SELECT Employee
,Tempus
,PreviousQuantity
,CurrentQuantity
,(CurrentQuantity - PreviousQuantity) / PreviousQuantity * 100.0 AS PercentageChange
FROM
(SELECT Employee
,Tempus
,LAG(Quantity) OVER (PARTITION BY Employee ORDER BY Tempus ASC) AS PreviousQuantity
,Quantity AS CurrentQuantity
FROM #Sales) AS Sales;

Employee Tempus PreviousQuantity CurrentQuantity PercentageChange
-------- ------ ---------------- --------------- -------------------
1        201301 NULL             100             NULL
1        201302 100              200             100.000000000000000
1        201303 200              150             -25.000000000000000
1        201304 150              175             16.666666666666667
1        201305 175              200             14.285714285714286
1        201306 200              250             25.000000000000000
1        201307 250              275             10.000000000000000
1        201308 275              300             9.090909090909091
1        201309 300              350             16.666666666666667
1        201310 350              400             14.285714285714286
1        201311 400              200             -50.000000000000000
1        201312 200              300             50.000000000000000

(12 row(s) affected)

I highly recommend that you purchase Itzik Ben Gan’s book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions to acquire an in-depth knowledge of windowing functions. Itzik reviewed my code and suggested the addition of the index commented out in the code sample shown below. I recommend running the queries with and without the index to see how the index causes the SORT operator to drop out of the query plan.

CREATE TABLE #Sales
(
     Employee NVARCHAR(30)
    ,Tempus DECIMAL
    ,Quantity DECIMAL
);

INSERT INTO #Sales
(Employee, Tempus, Quantity)
VALUES
(1, 201301, 100)
,(1, 201302, 200)
,(1, 201303, 150)
,(1, 201304, 175)
,(1, 201305, 200)
,(1, 201306, 250)
,(1, 201307, 275)
,(1, 201308, 300)
,(1, 201309, 350)
,(1, 201310, 400)
,(1, 201311, 200)
,(1, 201312, 300)
;


--Suggestion from Itzik Ben-Gan to eliminate the sort operation shown in the query plan.
--CREATE INDEX IDX_Sales_Employee_tempus ON #Sales(Employee, Tempus) INCLUDE (Quantity);

-- traditional solution using a self-join
-- notice a different approach is required if Tempus data included other years
SELECT c.Employee
, c.Tempus
, p.Quantity AS PreviousQuantity
, c.Quantity AS CurrentQuantity
FROM #Sales p
INNER JOIN #Sales c
ON c.Employee = p.Employee
WHERE c.Tempus = p.Tempus + 1;

-- LAG function used to get date from the previous period
-- easier and more robust logic for handling the Tempus column
SELECT Employee
,Tempus
,LAG(Quantity) OVER (PARTITION BY Employee ORDER BY Tempus ASC) AS PreviousQuantity
,Quantity AS CurrentQuantity
FROM #Sales;

-- Create an alias for the previous query and computer percentage change
SELECT Employee
,Tempus
,PreviousQuantity
,CurrentQuantity
,(CurrentQuantity - PreviousQuantity) / PreviousQuantity * 100.0 AS PercentageChange
FROM
(SELECT Employee
,Tempus
,LAG(Quantity) OVER (PARTITION BY Employee ORDER BY Tempus ASC) AS PreviousQuantity
,Quantity AS CurrentQuantity
FROM #Sales) AS Sales;

DROP TABLE #Sales;

Windowing functions provide a layer of abstraction that can simplify many coding tasks. Complicated tasks become simple tasks, often with a significant performance improvement.

Published Thursday, January 23, 2014 7:22 PM by John Paul Cook

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Mike Spizos said:

Remember, you can filter NULL in instances like this.  Or at the very least sort so it is at the bottom.  It makes it easier to glance at, for example 'PercentageChange' and see what show up most.

http://mssqlonline.com/?p=18

This will also work.

Thanks.

January 27, 2014 8:32 PM
 

Pablo said:

What about performance on big tables?

February 6, 2014 5:26 PM

Leave a Comment

(required) 
(required) 
Submit

About John Paul Cook

John Paul Cook is both a Registered Nurse and a Microsoft SQL Server MVP experienced in Microsoft SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences including Microsoft TechEd and the SQL PASS Summit. He has worked in oil and gas, financial, manufacturing, and healthcare industries. Experienced in systems integration and workflow analysis, John is passionate about combining his IT experience with his nursing background to solve difficult problems in healthcare. He sees opportunities in using business intelligence and Big Data to satisfy healthcare meaningful use requirements and improve patient outcomes. John graduated from Vanderbilt University with a Master of Science in Nursing Informatics and is an active member of the Sigma Theta Tau nursing honor society. Contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement