Windows functions are one of the most powerful T-SQL programming elements. These functions return a single aggregated value for each row within a set of rows (or window). I’ve spoken with several good programmers who have not used SQL extensively. They are often surprised by the power of Windows functions and how performant they can be on larger datasets. This post is not designed to teach T-SQL or the intricacies of Windows function. It will provide 3 practical ways to effectively use them.
TSQL Windows Function to Calculate Running Totals
Being able to calculate running totals is super useful. In the following example, I have a project with a start and end month… and I’m curious to see how many workdays (total days less weekends and holidays) that represents. If I know the current month, I can then easily work out how many workdays to complete and other useful things.
So, let’s assume there is a project table called DimProject that has the project name, project start and end months, and also the current month (Jul 2021 in this case). And a yearMonth table that has each month and the number of workdays. See the image below.
I can now join the DimProject table to the YearMonth table to see the project by month, workdays each month and importantly, I use a windows function to calculate the running total of workdays each month.
Here is the TSQL:
SELECT P.yearMonthCurrent, YM.yearMonth, P.projectName, Ym.workDays,
SUM(YM.workDays) OVER(
PARTITION BY P.projectKey
ORDER BY YM.yearMonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS workDaysRunningTotal
FROM dbo.DimProject P
LEFT JOIN dbo.DimDateYearMonthCount YM on YM.yearMonth >= P.yearMonthStart AND YM.yearMonth <= P.yearMonthEnd
Here is the result:
For each month a window is created between the first row and the current month and then we simply sum the WorkDays. Perfect.
TSQL Windows Function to Calculate Rolling Averages
Calculating and visualising rolling averages can be very useful in understanding a given data set. Fortunately, calculating them is also straightforward.
Let's assume we have a simple table that contains sales by month. See Below.
Let's say we want to calculate a 3-month rolling average. In Nov 2021 or 202111 using the YYYYMM format the 3 months are Sep = 11, Oct = 9, and Nov = 10... and the average is 10. To calculate this for all months we can use the following T-SQL... and of course the output we could push to a table, or show in a view, or use in a larger code block (stored procedure) to help calculate other relevant information.
SELECT YearMonth, QuantitySold,
AVG(QuantitySold) OVER (
ORDER BY YearMonth
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as ThreeMthRollingAvg
FROM [dbo].[QuantityByMonth]
Here is the result, and if you review 202111 you'll see the 3 month rolling average is 10.
Effectively the code is creating a window for each month that includes the current row and the 2 preceding rows. This window provides the 3 quantities and the AVG function generates the desired result. Simple.
TSQL Windows Function to Remove Duplicates
It would be remiss to not mention removing duplicates. The ROW_NUMBER() windows function is perhaps the best-known windows function, and using it to identify and/or remove duplicates is a common use-case.
The following code block creates and populates a table with multiple Department records. Initially, the table contains 11 records but several of these records have the same Department code.
The code assumes we need to clean this up and display just the current (or most recent) updates. As such it identifies duplicate codes and deletes all but the most recent. It achieves this by creating a partition for each department ordered by update date with the most recent first. The most recent is given the ROW_NUMBER value 1, the next 2 and so on. The code deletes all records that have been given a value greater than 1.
DROP TABLE IF EXISTS dbo.DimDepartment_DeDupeExample
CREATE TABLE dbo.DimDepartment_DeDupeExample (
DepartmentCode nvarchar(10) NOT NULL,
DepartmentDesc nvarchar(50) NOT NULL,
UpdateDate datetime NOT NULL
)
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D002', N'Marketing', '2022-01-01')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D003', N'Sales', '2022-01-01')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D001', N'IT', '2022-01-01')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D004', N'Finance', '2022-01-01')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D001', N'Information Technology', '2022-03-20')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D005', N'Executive', '2022-03-22')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D003', N'Sales Direct', '2022-04-05')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D001', N'Technology', '2022-04-05')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D006', N'Sales Indirect', '2022-04-05')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D007', N'Support', '2022-04-05')
INSERT dbo.DimDepartment_DeDupeExample (DepartmentCode, DepartmentDesc, UpdateDate)
VALUES (N'D007', N'Sales Support', '2022-04-06')
SELECT * FROM dbo.DimDepartment_DeDupeExample
DELETE DeDupe FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY DepartmentCode
ORDER BY UpdateDate DESC)
FROM dbo.DimDepartment_DeDupeExample
) Dedupe
WHERE RN > 1
SELECT * FROM dbo.DimDepartment_DeDupeExample
ORDER BY DepartmentCode
Initial Table
Result Table
Rolling Averages code. You saved me! Thankyou