Wednesday, March 25, 2026

The SQL Server Deadlock Fix I Never Would Have Thought Of

Intro

In my previous post, I shared how a deadlock incident changed how I think about using AI in my day-to-day work as a DBA. Here I'll walk through the incident itself. The data and queries aren't the real ones; I recreated the scenario with my own made-up tables, but the deadlock behavior is the same.

Set up a deadlock demo

Below is the T-SQL to create the table and load it with test data. A couple of things worth knowing about the data:

  • For every ProcessId, there is exactly one active record, the one where EndDate is NULL.
  • The StartDate and EndDate pattern reflects a soft-delete approach: instead of updating a record directly, we close the current one by setting its EndDate, then insert a new record with the updated values.
USE [MyBlogDB];

DROP TABLE IF EXISTS [dbo].[Configuration];
CREATE TABLE [dbo].[Configuration] (
	[Id] INT IDENTITY(1,1),
	[ProcessId] INT NOT NULL,
	[Value1] VARCHAR(50) NOT NULL,
	[Value2] VARCHAR(50) NULL,
	[StartDate] DATETIME NOT NULL,
	[EndDate] DATETIME NULL,
	[IsDefaultConfigOption] BIT NOT NULL DEFAULT 0,
	CONSTRAINT [PK_Configuration_Id] PRIMARY KEY ([Id])
);


-- CLAUDE wrote this code to generate 500,000 rows of test data for the Configuration table 
;WITH Numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n &lt 500000
)
INSERT INTO [dbo].[Configuration] ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption])
SELECT
    -- ProcessId cycles through 1–5000
    ((n - 1) % 5000) + 1,

    -- Value1: e.g. 'Value_1', 'Value_2', ...
    'Value_' + CAST(n AS VARCHAR(10)),

    -- Value2: nullable, every 3rd row is NULL
    CASE WHEN n % 3 = 0 THEN NULL ELSE 'Option_' + CAST((n % 10) + 1 AS VARCHAR(10)) END,

    -- StartDate: spread over the past ~3 years
    DATEADD(DAY, -(n % 1095), GETDATE()),

    -- EndDate: NULL only on the last occurrence of each ProcessId (n > 495000),
-- all prior rows get an end date 1 year after their start CASE WHEN n > 495000 THEN NULL ELSE DATEADD(DAY, -(n % 1095) + 365, GETDATE()) END, -- IsDefaultConfigOption: alternates in groups of 4 CASE WHEN n % 4 IN (1, 2) THEN 1 ELSE 0 END FROM Numbers OPTION (MAXRECURSION 0); -- Required to allow recursion beyond the default 100 limit SELECT * FROM [dbo].[Configuration] where processid=999;

I also added an index similar to the one that existed in the real-world case. It doesn't perfectly cover the WHERE clause, but it's good enough to be useful. I kept it that way intentionally, because in the real world, I wouldn't rush to throw more indexes at the problem.

DROP INDEX IF EXISTS I1_ProcessId_StartDate_EndDate ON [dbo].[Configuration];
CREATE INDEX I1_ProcessId_StartDate_EndDate ON [dbo].[Configuration] ([ProcessId], [StartDate], [EndDate]);

Here's the stored procedure at the center of it all. You might wonder why there is only one? in the real-world scenario, two colleagues were accidentally running the same operation on the same ProcessId at nearly the same time, which is what caused the deadlock.

CREATE OR ALTER PROCEDURE [dbo].[Configuration_Update_IsDefaultConfigOption]
	@ProcessId INT,
	@IsDefaultConfigOptionInput BIT
AS
BEGIN
	SET NOCOUNT ON;
	DECLARE @CurrentDate DATETIME = GETDATE();
	DECLARE @ID INT;
	DECLARE @Value1 VARCHAR(50);
	DECLARE @Value2 VARCHAR(50);

	BEGIN TRY
		BEGIN TRANSACTION;
		SELECT 
			@ID = [Id],
			@Value1 = [Value1],
			@Value2 = [Value2]
		FROM [dbo].[Configuration]
		WHERE [ProcessId] = @ProcessId
		      AND [EndDate] IS NULL;

		UPDATE [dbo].[Configuration]
		SET [EndDate] = @CurrentDate
		WHERE [ProcessId] = @ProcessId
		      AND [EndDate] IS NULL;

		INSERT INTO [dbo].[Configuration] ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption])
		VALUES (@ProcessId, @Value1, @Value2, @CurrentDate, NULL, @IsDefaultConfigOptionInput);

		COMMIT TRANSACTION;
	END TRY
	BEGIN CATCH
	IF @@TRANCOUNT > 0
		ROLLBACK TRANSACTION;
		THROW;
	END CATCH
END
GO

The proc is straightforward. Given a ProcessId, it fetches the current active record (where EndDate is NULL), closes it with a soft delete by setting EndDate, then inserts a new record with the updated IsDefaultConfigOption value and everything else carried over.

To reproduce a deadlock, I used SQLQueryStress with 4 threads all hammering this same query and ProcessID simultaneously (for 10 iterations each to increase the chance of getting the deadlock). In the real world there were only two competing sessions, but I bumped it up to 4 here just to make the deadlock easier to trigger reliably.

Understand the deadlock

Here is the deadlock graph I obtained from running the sp_BlitzLock proc (from First Responder Kit scripts).

<deadlock>
  <victim-list>
    <victimProcess id="process2b5c1d01048" />
  </victim-list>
  <process-list>
    <process id="process2b5c1d01048" spid="87"
▶ more... lockMode="S" waitresource="KEY: 5:72057594058899456 (294556843a07)" isolationlevel="read committed (2)" currentdbname="MyBlogDB" transactionname="user_transaction" lasttranstarted="2026-03-12T16:06:10.557" clientapp="SQLQueryStress" hostname="1NDWJD3"
> <executionStack> <frame procname="MyBlogDB.dbo.Configuration_Update_IsDefaultConfigOption" line="17"> SELECT @ID = [Id], @Value1 = [Value1], @Value2 = [Value2] FROM [dbo].[Configuration] WHERE [ProcessId] = @ProcessId AND [EndDate] IS NULL </frame>
▶ <frame procname="adhoc">...<frame procname="adhoc" line="1"> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </frame>
</executionStack>
▶ <stackFrames> (39 frames, collapsed)... <stackFrames> <frame id="00" address="0x7FFA979F4694" module="ntdll" /> <frame id="01" address="0x7FFA94FBEB29" module="kernelbase" /> <frame id="02" address="0x7FFA77AA65CA" module="SqlDK" /> <frame id="03" address="0x7FFA77AA64ED" module="SqlDK" /> <frame id="04" address="0x7FFA77AA216C" module="SqlDK" /> <frame id="05" address="0x7FFA77AA1C33" module="SqlDK" /> <frame id="06" address="0x7FFA77AA3B66" module="SqlDK" /> <frame id="07" address="0x7FFA4659855A" module="sqlmin" /> <frame id="08" address="0x7FFA465987D0" module="sqlmin" /> <frame id="09" address="0x7FFA4650D84D" module="sqlmin" /> <frame id="10" address="0x7FFA46515599" module="sqlmin" /> <frame id="11" address="0x7FFA4652016E" module="sqlmin" /> <frame id="12" address="0x7FFA465160DD" module="sqlmin" /> <frame id="13" address="0x7FFA46515DFE" module="sqlmin" /> <frame id="14" address="0x7FFA465343AE" module="sqlmin" /> <frame id="15" address="0x7FFA780E1955" module="SqlTsEs" /> <frame id="16" address="0x7FFA46534419" module="sqlmin" /> <frame id="17" address="0x7FFA46533EBE" module="sqlmin" /> <frame id="18" address="0x7FFA465342B4" module="sqlmin" /> <frame id="19" address="0x7FFA46530020" module="sqlmin" /> <frame id="20" address="0x7FFA46543ECE" module="sqlmin" /> <frame id="21" address="0x7FFA4369D4D0" module="sqllang" /> <frame id="22" address="0x7FFA4369D702" module="sqllang" /> <frame id="23" address="0x7FFA43B022D9" module="sqllang" /> <frame id="24" address="0x7FFA44619DFF" module="sqllang" /> <frame id="25" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="26" address="0x7FFA43697700" module="sqllang" /> <frame id="27" address="0x7FFA436CB62F" module="sqllang" /> <frame id="28" address="0x7FFA436CB451" module="sqllang" /> <frame id="29" address="0x7FFA436CB278" module="sqllang" /> <frame id="30" address="0x7FFA43B022D9" module="sqllang" /> <frame id="31" address="0x7FFA44619DFF" module="sqllang" /> <frame id="32" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="33" address="0x7FFA43697700" module="sqllang" /> <frame id="34" address="0x7FFA436A2E8B" module="sqllang" /> <frame id="35" address="0x7FFA436956F1" module="sqllang" /> <frame id="36" address="0x7FFA43695402" module="sqllang" /> <frame id="37" address="0x7FFA77AA9165" module="SqlDK" /> <frame id="38" address="0x7FFA77AA9880" module="SqlDK" /> <frame id="39" address="0x7FFA77AA94E3" module="SqlDK" /> </stackFrames>
▶ <inputbuf>...<inputbuf> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </inputbuf>
</process> <process id="process2b5c1acc478" spid="88"
▶ more... lockMode="X" waitresource="KEY: 5:72057594058964992 (3963a1a319be)" isolationlevel="read committed (2)" currentdbname="MyBlogDB" transactionname="user_transaction" lasttranstarted="2026-03-12T16:06:10.557" clientapp="SQLQueryStress" hostname="1NDWJD3"
> <executionStack> <frame procname="MyBlogDB.dbo.Configuration_Update_IsDefaultConfigOption" line="25"> UPDATE [dbo].[Configuration] SET [EndDate] = @CurrentDate WHERE [ProcessId] = @ProcessId AND [EndDate] IS NULL </frame>
▶ <frame procname="adhoc">...<frame procname="adhoc" line="1"> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </frame>
</executionStack>
▶ <stackFrames> (39 frames, collapsed)... <stackFrames> <frame id="00" address="0x7FFA979F4694" module="ntdll" /> <frame id="01" address="0x7FFA94FBEB29" module="kernelbase" /> <frame id="02" address="0x7FFA77AA65CA" module="SqlDK" /> <frame id="03" address="0x7FFA77AA64ED" module="SqlDK" /> <frame id="04" address="0x7FFA77AA216C" module="SqlDK" /> <frame id="05" address="0x7FFA77AA1C33" module="SqlDK" /> <frame id="06" address="0x7FFA77AA3B66" module="SqlDK" /> <frame id="07" address="0x7FFA4659855A" module="sqlmin" /> <frame id="08" address="0x7FFA465987D0" module="sqlmin" /> <frame id="09" address="0x7FFA4650D84D" module="sqlmin" /> <frame id="10" address="0x7FFA46515599" module="sqlmin" /> <frame id="11" address="0x7FFA4652016E" module="sqlmin" /> <frame id="12" address="0x7FFA4671B463" module="sqlmin" /> <frame id="13" address="0x7FFA4671B147" module="sqlmin" /> <frame id="14" address="0x7FFA4671AE9E" module="sqlmin" /> <frame id="15" address="0x7FFA4671A770" module="sqlmin" /> <frame id="16" address="0x7FFA780E1B97" module="SqlTsEs" /> <frame id="17" address="0x7FFA4657E38E" module="sqlmin" /> <frame id="18" address="0x7FFA46530020" module="sqlmin" /> <frame id="19" address="0x7FFA46543ECE" module="sqlmin" /> <frame id="20" address="0x7FFA4369D4D0" module="sqllang" /> <frame id="21" address="0x7FFA436B4AB8" module="sqllang" /> <frame id="22" address="0x7FFA436B44D9" module="sqllang" /> <frame id="23" address="0x7FFA43B022D9" module="sqllang" /> <frame id="24" address="0x7FFA44619DFF" module="sqllang" /> <frame id="25" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="26" address="0x7FFA43697700" module="sqllang" /> <frame id="27" address="0x7FFA436CB62F" module="sqllang" /> <frame id="28" address="0x7FFA436CB451" module="sqllang" /> <frame id="29" address="0x7FFA436CB278" module="sqllang" /> <frame id="30" address="0x7FFA43B022D9" module="sqllang" /> <frame id="31" address="0x7FFA44619DFF" module="sqllang" /> <frame id="32" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="33" address="0x7FFA43697700" module="sqllang" /> <frame id="34" address="0x7FFA436A2E8B" module="sqllang" /> <frame id="35" address="0x7FFA436956F1" module="sqllang" /> <frame id="36" address="0x7FFA43695402" module="sqllang" /> <frame id="37" address="0x7FFA77AA9165" module="SqlDK" /> <frame id="38" address="0x7FFA77AA9880" module="SqlDK" /> <frame id="39" address="0x7FFA77AA94E3" module="SqlDK" /> </stackFrames>
▶ <inputbuf>...<inputbuf> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </inputbuf>
</process> </process-list> <resource-list> <keylock objectname="MyBlogDB.dbo.Configuration" indexname="PK_Configuration_Id" mode="X"> <owner-list> <owner id="process2b5c1acc478" mode="X" /> </owner-list> <waiter-list> <waiter id="process2b5c1d01048" mode="S" requestType="wait" /> </waiter-list> </keylock> <keylock objectname="MyBlogDB.dbo.Configuration" indexname="I1_ProcessId_StartDate_EndDate" mode="U"> <owner-list> <owner id="process2b5c1d01048" mode="S" /> </owner-list> <waiter-list> <waiter id="process2b5c1acc478" mode="X" requestType="convert" /> </waiter-list> </keylock> </resource-list> </deadlock>

The deadlock occurred while session 1 (process2b5c1d01048) was processing the SELECT and session 2 (process2b5c1acc478) was processing the UPDATE. Session 1 holds an S lock on the I1 index and wants an S lock on the PK index. Session 2, conversely, holds an X lock on the PK index and wants an X lock on the I1 index. Both sessions want the index that the other already holds a lock on, and neither side is willing to release. Thus, a deadlock occurs.

If the two sessions were updating different ProcessIDs, the key locks would target different rows, and no deadlock would occur. No fix needed if the two sessions never contend for the same rows, but since users expect a clean experience instead of an abrupt failure, let's explore our options.

First, though, an important question: why do the two queries acquire locks on the two indexes in opposite orders? Session 1 performs a seek on I1 first, then does a key lookup on PK to retrieve columns not covered by I1. Session 2 also seeks on I1, but when updating, it applies changes to PK first, then maintains I1. That's what creates the inversion. See their actual execution plans below:


Before I asked AI 

I came up with a few ideas to fix:

  • Tune the queries to finish faster: Both queries are simple and already have a reasonable index.  I could have reordered I1 to (ProcessID, EndDate, StartDate) or added a new index I2 with (ProcessID, EndDate) as key to better match the WHERE clause. However, reordering I1 might affect other queries, and adding a new index means additional maintenance overhead.
  • Create a covering index: adding columns to I1 to make it a covering index would eliminate the key lookup on PK for the SELECT query. But that adds even more maintenance overhead, in my real-world scenario, that would have meant adding 10 more columns.
  •  Add retry logic: Retry the operation automatically when it fails due to a deadlock.

Every option I could think of was adding overhead somewhere, so I landed on retry logic. It's perfectly reasonable; users won't see a strange deadlock error they don't know how to handle, and since two users competing to update the same row is expected behavior, it's fine for one to fail and have the app gracefully retry. I messaged my coworker about what I thought we should do.

Then I asked Claude.

AI as my advisor on the problem

I was curious how the AI would approach this problem, so I gave it the deadlock report, the query text, and the execution plans. The sp_BlitzLock proc collects all of these in one place, which means I didn't have to do that work myself. That's one of the lessons I took from this: when the right tooling makes it easy to gather data, you're much more likely to actually bring that data to AI in the first place, enhancing the quality of the answers.

The AI correctly identified the deadlock and its root cause:

  • "Classic circular wait: A holds NCI S-lock and wants CI; B holds CI X-lock and wants NCI. SQL Server kills A as the victim."  ✓
  • "The lock ordering inversion is driven by two things: (1) The key lookup in the SELECT plan forces an NCI → CI lock acquisition sequence (2) The UPDATE acquires locks in CI → NCI order (it updates the clustered index first, then maintains the non-clustered index)" ✓
For the fixes, it did a better job this time around: it didn't flag the covering index approach as the "Best Fix," which it had in the real-world scenario I mentioned in my last post. My guess is that I used a better prompt this time (borrowed from First Responder Kit - Using AI documentation). 


Here's the fix Claude suggested:
" Fix 1 — Rewrite the SP (Primary Fix):
Eliminate the separate SELECT entirely by using UPDATE ... OUTPUT. This collapses the SELECT + UPDATE into a single statement, removing the interleaved lock acquisition that causes the circular wait. Two concurrent sessions doing UPDATE ... OUTPUT will conflict on the NCI U-lock upfront and serialize cleanly — no deadlock possible."
CREATE OR ALTER PROCEDURE [dbo].[Configuration_Update_IsDefaultConfigOption]
    @ProcessId INT,
    @IsDefaultConfigOptionInput BIT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @CurrentDate DATETIME = GETDATE();
    DECLARE @ID INT;
    DECLARE @Value1 VARCHAR(50);
    DECLARE @Value2 VARCHAR(50);

    -- Table variable to capture the pre-update row values via OUTPUT
    DECLARE @Captured TABLE
    (
        Id     INT,
        Value1 VARCHAR(50),
        Value2 VARCHAR(50)
    );

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Single statement replaces the separate SELECT + UPDATE.
        -- OUTPUT captures the values we need before the row is modified.
        UPDATE [dbo].[Configuration]
        SET [EndDate] = @CurrentDate
        OUTPUT
            DELETED.[Id],
            DELETED.[Value1],
            DELETED.[Value2]
        INTO @Captured (Id, Value1, Value2)
        WHERE [ProcessId] = @ProcessId
          AND [EndDate] IS NULL;

        SELECT
            @ID     = Id,
            @Value1 = Value1,
            @Value2 = Value2
        FROM @Captured;

        INSERT INTO [dbo].[Configuration]
            ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption])
        VALUES
            (@ProcessId, @Value1, @Value2, @CurrentDate, NULL, @IsDefaultConfigOptionInput);

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH
END
GO

The fix is simple and clean in hindsight, but it never crossed my mind, simply because UPDATE...OUTPUT isn't something I reach for often. That's exactly why I think AI is something all of us should learn to use well. I now treat Claude as an advisor I consult for second opinions (one who draws from a much wider surface area than I can hold in my head at any one time XD). And just like any working relationship, I'm still learning how to get the most out of it.

*Of course, I tested the UPDATE...OUTPUT fix, it works 🎉.

 

Thursday, March 12, 2026

What a Deadlock Taught Me About Working with AI

 Why I’m Writing This?

Last week, my coworker sent me a report about a deadlock and asked me to help troubleshoot it. The event gave me so many thoughts that it made me feel like writing a blog post sharing them and exploring how AI has been for me in my day-to-day job as a DBA. 

I was afraid the blog would go on too long by mixing technical demos with personal reflections, so I'm splitting this into two posts. In this one, I will share my perspective on AI and my realizations after the deadlock event. In a separate future post, I will cover the deadlock event itself with a demo and how I used AI as a second opinion to solve the problem. 

 (Spoiler Alert: Claude gave me several options to solve the deadlock. Annoyingly, the first option, according to it, the least risky solution, was to create a covering index spanning every single column, and you know it's against best practice. But the second option that it suggested, using “UPDATE…OUTPUT”, was the one that worked, and that impressed me because I hadn't even considered it, having barely seen or used that syntax in my four years working with T-SQL.)

From Ignorance to Awareness

Amid the chaos and excitement of the AI boom, 

I gradually found myself incorporating AI more into my daily work.

I had been using AI since the ChatGPT’s release, and later my company adopted Claude, but only to ask them to help me with things like writing a DMV query, proofreading my emails and messages, or comparing two pieces of code to make sure an improved version was doing the same thing as the original. These are tasks I believe AI can do better than me. 

I didn’t trust it to help me rewrite or tune queries, troubleshoot deadlocks, tasks I thought I would handle better myself, since I can collect more data about an issue than the AI can, I know my system better than it does, and sometimes when I did try, the AI just gave me wrong answers that I felt would only waste my time. (I still now resist asking it to rewrite production queries for me, though I can accept its tuning ideas and rewrite them myself.)

Thankfully, I kept myself updated; I followed Brent’s newsletters and read every tech blog he shared. And of course, among those were posts about AI and how DBAs are using it in their work (everyone is crazy about AI now). I learned about writing good prompts, giving the AI as much useful information as I would need myself (like the query text, the query plan, the deadlock report, ect.), and establishing a workflow that makes it simple enough to collect that data in the first place. For example, the sp_BlitzCache proc (First Responder Kit) has incorporated AI. I haven't tried that functionality yet, but my understanding is that you provide it the stored procedure name you're trying to tune along with an AI configuration (e.g., the model, a tailored prompt for the problem at hand). And the proc will automatically collect the query text and execution plan, sends them to the AI with your prompt, and returns a response. It is very neat. There is some setup involved, I believe SQL Server 2025 is required, but it should be straightforward and one time setup. I'll definitely give it a try sometime.

I picked up all of this gradually over time. But it wasn't until last week that I actually used Claude to troubleshoot a deadlock issue. And honestly, I was reluctant at first. I had already analyzed the deadlock report myself and exhausted every solution I could think of, ruling each one out due to its limitations. I sent my findings to my coworker, a lengthy write-up explaining how the deadlock occurred, what the potential solutions were and why none of them would work, and concluding that adding retry logic might be our only fix.

Then, about five minutes later, I had an idea: why not give AI a try? I was confident there wouldn't be any workable solution beyond retry logic, but I was also curious whether the AI would even understand how the deadlock had occurred in the first place. (Looking back, I think the flood of posts about AI replacing white-collar workers, and all the advice to start using AI before it replaces you, had quietly gotten to me. That fear had probably crept in.) So I gave it everything that I used to understand the problem: the query text, the query plans, and the deadlock report. 

Claude understood the problem very well. It gave me an answer, not a perfect one (it recommended a covering index with almost every columns in the table as included columns, and referred to this solution as “Best fix, lowest risk”). Then I read the second suggestion.

I stopped for a moment and thought:

“Oh… why didn’t I think of that?”

Claude suggested rewriting the logic using UPDATE…OUTPUT, collapsing the SELECT and UPDATE into a single statement and eliminating the deadlock path entirely. I realized AI has been getting smarter. Or had it always been, and I simply hadn't been using it correctly?

The Advisor

I think people relate to AI differently. Some see it as nothing more than a fancy search engine, others treat it like a personal assistant, and some have even started calling it a friend. For me, the word that feels most right is advisor.

You know those advisors in Civilization? I've been playing that game recently, and they pop up to offer help and suggestions for production and research choices based on where your civilization stands. I almost always ignored them, stubbornly convinced that I should figure everything out myself. And honestly, that might be exactly why I never play well.

But here's the thing: AI is a genuinely smart advisor. It proposed an elegant solution to my deadlock problem, one I never would have found on my own, and probably wouldn't have stumbled across online either. I fully understood the problem. I knew what needed fixing. I just hadn't thought of UPDATE…OUTPUT because I barely use it, and it simply didn't cross my mind. That's exactly what a good advisor is for, not to replace your thinking, but to catch the blind spots you didn't know you had.

So from now on, that's how I plan to use it: as a second opinion from someone smarter than me in ways I haven't figured out yet.

Of course, a good advisor still needs the right information to work with. In my case, that means a well-crafted prompt with everything I would gather if I were solving the problem myself. 

Maybe AI Won't Replace Me

Remember that covering index recommendation I mentioned? That one has more than 10 included columns and that goes against general best practices. The more columns you pack into an index, the heavier your modification costs (INSERT, UPDATE, DELETE) and the more painful your maintenance becomes. I still haven't figured out why Claude flagged that as the "Best fix, lowest risk" solution. But honestly? I'm not complaining. It's a reminder that someone needs to catch these things, and that someone is us.

I genuinely believe the best setup for any company right now is a knowledgeable human working alongside a capable AI. Not one replacing the other, but each covering the other's blind spots.

Here’s another story. Before my coworker brought the deadlock to me, he had already tried fixing it by adding a query hint. It didn’t work. My guess is that he had asked Claude for help and applied the suggestion it gave him. Since he’s a developer and not a DBA, he likely pasted only the query into Claude without including the deadlock report or execution plans. I tested that theory by doing the same thing myself and got a similar answer.

The lesson isn’t that AI failed him. It simply worked with the information it was given. Without the right context, even a smart advisor will point you in the wrong direction. That’s where domain knowledge matters. A developer might not know what a deadlock report is, why an execution plan matters, or what we can give so AI has a better understanding of the problem at hand. A DBA does. And that gap is why we're still in the room.

Control what you can control

There's a Stoic idea I keep coming back to: control what you can control.

I can't control whether AI will eventually change my role or reshape my industry. But I can control whether I stay curious, keep learning, and actually engage with the tools that are changing the world around me. Worrying about replacement is wasted energy. Using that same energy to grow; that's a choice I can make every day.

That mindset shift is really what this whole blog is about. I went from ignoring AI for anything serious, to letting it surprise me with a creative solution for my deadlock problem. That's not a small thing. It changed how I think about what AI is for and what my role alongside it looks like.

I'm also genuinely grateful. Grateful to still have a job when so many others haven't been so fortunate. Grateful for my employer. And grateful for the DBA community, for the newsletters, the blog posts, the shared knowledge that quietly kept me moving forward even when I didn't realize it. 


Monday, February 2, 2026

Wait Stats First: How I Start my SQL Server Performance Tuning

The Theme Park Analogy

A friend once told me about his trip to Universal Studios: how packed it was, and how it took forever just to get on anything.

His experience basically came down to two kinds of time: the time that actually felt fun and worthwhile (walking around and being on the rides), and the time that felt like dead time (standing around waiting to get in).

When a user application submits a request to SQL Server for data, the same dynamic plays out. Query response time includes CPU processing time, when the query is actually being productive (reading data, processing it, and sending it back), and time spent waiting on a particular resource (disk I/O, memory, network, locking, and so on). In a concurrent database system, with hundreds or thousands of user requests competing for limited resources, waiting is inevitable.

Performance Tuning Starts with Wait Statistics

Sticking with the analogy: there are things visitors and the theme park can do to enhance the overall experience. Visitors may plan their trip on weekdays when things are less busy and plan ahead so that they can get the most out of the day. The theme park can study what contributes most to customers’ wait times and research solutions accordingly. 

In the SQL Server world, to reduce response time, there are two parallel levers we can pull:

- Reduce processing time by tuning slow queries: queries can't tune themselves yet (maybe someday, we hope), but for now that's still our job. Maybe the execution plan is inefficient because the query is too complex, maybe there’s a missing index, or maybe it reads so much data that it’s time to consider pagination.

- Reduce wait time: This means finding the bottleneck, what resources the server is waiting on, and directing our tuning efforts accordingly.

When SQL Server as a whole is slow, analyzing wait statistics is an effective way to find the primary reasons queries are waiting, so we can focus our tuning efforts on the right problem. Once we’ve identified the bottlenecks, there are many things we can do to relieve them, depending on the problems we find: add resources, tune queries and indexes, or reconfigure settings.

The Resource Waits

For every wait that happens, SQL Server records the length of the wait and its cause, aka the wait type, which generally tells us what resource the request was waiting on. We can find this data in the sys.dm_os_wait_stats DMV. This DMV records cumulative totals for each wait type since the last SQL Server restart (or since wait stats were manually cleared). I use it to establish a wait baseline chart, which I’ll discuss in more detail below.

Not all waits are meaningful. There are problematic waits and benign waits, and we should filter the benign ones out during analysis. The "More to Read" section at the end links to a blog post by Brent Ozar and a white paper by SQLSkills that cover different wait types in more detail.

Identifying the top waits is only half the battle. Once you know what your server is waiting on, the next question is: what do you actually do about it? I'll be dedicating separate blog posts to the most common wait types I encounter and how I approach troubleshooting each one. Stay tuned for those as this series grows.

The Baseline

If you use First Responder Kit , you can get a quick snapshot of the top waits since the last SQL Server startup with a single call:

exec sp_blitzfirst @sincestartup=1

The downside is that you lose that history when SQL Server restarts. For long-term tracking, I needed something that would survive restarts and let me compare across days, weeks, and upgrades. So, I built my own wait stats chart.

The Chart


The chart tracks two performance metrics side by side:

  • Batch requests/second (the red line): how many queries the server is handling.
  • Wait Time (Second) per Core per Second (the bars): how much time (per core, per second) SQL Server spends waiting on specific wait types.

Having both metrics in one view gives me a few things I find valuable:

  • More context: It’s not just “what are we waiting on?"; the Batch Requests/sec line shows how busy the server actually is, which helps separate "high waits because the server is slammed" from "high waits because something is broken."
  • Baseline and comparison:  I can use it to establish a server performance baseline for things like upgrading SQL Server to a newer version. I can spot anomaly periods where wait time is significantly higher and quickly see which waits are driving the change. For example, at the end of Jan 27 and the beginning of Jan 28, the server had an abnormal increase in CXPACKET waits. After tuning the query that contributed to it, we see a drop by mid-day on Jan 28.

Building the chart

The chart above is powered by a SQL Server Agent job that snapshots wait stats and batch request counters at regular intervals. Here's what the script does at a high level:

It creates a set of tables under a perftune schema to store snapshots and computed deltas. Every time the job fires, it captures the current cumulative counters, computes the difference from the last run, and normalizes the results. One year of history is retained by default.

If you want to build a similar chart:

  1. Create a SQL Server Agent job that runs every 15 minutes (or whatever interval you want; just make sure to change the @n value in the script too).
  2. Replace <YourDatabase> with the database where you want to store the monitoring tables
  3. The script below keeps one year of history by default; you can change that retention window
For reference, this is how the script computes each metric:

  • Batch Requests/sec: The script reads the current cumulative “Batch Requests/sec” perf counter from sys.dm_os_performance_counters, subtracts the last saved counter value, then divides by the elapsed time window (@delta = @n * 60).                                               
  • BatchRequestsPerSecond = (NewCounterValue - LastCounterValue) / @delta_seconds
  • Wait per core per second: The script snapshots sys.dm_os_wait_stats, computes resource wait time as wait_time_ms - signal_wait_time_ms, then takes the delta from the prior snapshot. That delta is converted to seconds, divided by CPU core count, and divided by the time window.
  • WaitSecondsPerCorePerSecond = ((NewResourceWaitMs - LastResourceWaitMs) / 1000) / @delta_seconds / cores

For the visualization itself, I use Power BI connected directly to the monitoring tables. Once the Agent job is running and data starts accumulating, it's straightforward to point Power BI at the perftune.WaitStatsTop10 and perftune.BatchRequestsPerSecond tables and build the stacked bar chart and line overlay from there.

Below is the full script for the agent job. I've commented it inline so you can follow along:


USE <YourDatabase>;
GO

/*------------------------------------------------------------------------------
PERFTUNE COLLECTION SCRIPT

Runs as a recurring job (every @n minutes; default 15).

Captures:
  1) Batch Requests/sec (per-interval delta from prior run)
  2) Top 10 *resource* waits (seconds/core/second) using deltas from the
     previous wait snapshot

How deltas are calculated:
  - Batch Requests/sec:
      (NewCounterValue - LastCounterValue) / @delta_seconds
  - Resource waits:
      ((NewResourceWaitMs - LastResourceWaitMs) / 1000) / @delta_seconds / cores

Notes:
  - First run seeds baseline counters; delta-based results require a prior run.
  - CollectionDate is rounded down to the minute (HH:mm:00) for consistent keys.
  - History is retained for 365 days (see cleanup section).
------------------------------------------------------------------------------*/


/*------------------------------------------------------------------------------
1) BOOTSTRAP PERSISTENT OBJECTS (schema + tables)
------------------------------------------------------------------------------*/
IF NOT EXISTS (
    SELECT 1
    FROM sys.schemas
    WHERE name = N'perftune'
)
BEGIN
    EXEC('CREATE SCHEMA perftune');
END
GO

IF OBJECT_ID(N'perftune.LastBatchRequestsCount', N'U') IS NULL
BEGIN
  /* Stores last observed cumulative Batch Requests/sec counter value
     to compute a per-interval delta on subsequent runs. */
  CREATE TABLE perftune.LastBatchRequestsCount
  (
      LastBatchRequestsCount bigint
  )
  INSERT INTO perftune.LastBatchRequestsCount VALUES (0)
END
GO

IF OBJECT_ID(N'perftune.BatchRequestsPerSecond', N'U') IS NULL
BEGIN
  /* Time series of calculated Batch Requests/sec per collection interval. */
  CREATE TABLE perftune.BatchRequestsPerSecond
  (
      CollectionDate datetime NOT NULL PRIMARY KEY,
      BatchRequestsPerSecond int
  )
END 
GO

IF OBJECT_ID(N'perftune.LastWaitStats', N'U') IS NULL
BEGIN
  /* Stores the most recent wait snapshot (one row per wait type).
     Truncated/reloaded each run; used to compute per-interval deltas. */
  CREATE TABLE perftune.LastWaitStats
  (
      CollectionDate datetime NOT NULL,
      WaitType NVARCHAR(100) NOT NULL, 
      WaitTimeMs numeric(19,6),
      PRIMARY KEY (WaitType)
  )
END 
GO

IF OBJECT_ID(N'perftune.WaitStatsTop10', N'U') IS NULL
BEGIN
  /* Time series of top 10 resource-wait deltas per interval,
     normalized to seconds/core/second. */
  CREATE TABLE perftune.WaitStatsTop10
  (
      CollectionDate datetime not null,
      WaitType NVARCHAR(100) not null, 
      WaitTimeSecondPerCorePerSecond numeric(19,6),
      PRIMARY KEY (CollectionDate, WaitType)
  )
END 
GO


/*------------------------------------------------------------------------------
2) COLLECT CURRENT INTERVAL SNAPSHOT
------------------------------------------------------------------------------*/
DECLARE @n as int = 15;
DECLARE @delta as decimal(10,2) = @n*60.;

-- Number of logical CPU cores (used for per-core normalization)
DECLARE @NumberOfCores INT =
(
    SELECT cpu_count
    FROM sys.dm_os_sys_info
);

-- Collection date/time (rounded down to the minute boundary)
DECLARE @CollectionDate DATETIME2(0) = SYSDATETIME();
SET @CollectionDate = DATEADD(MINUTE, DATEDIFF(MINUTE, 0, @CollectionDate), 0);

SELECT @CollectionDate AS CollectionDate;


----------------------------------------------------------------
-- Batch Requests/sec (delta from prior run)
----------------------------------------------------------------
DECLARE @OldBatchRequestsCount BIGINT =
(
    SELECT TOP (1) LastBatchRequestsCount
    FROM perftune.LastBatchRequestsCount
);

DECLARE @NewBatchRequestsCount BIGINT =
(
    SELECT cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%:SQL Statistics%'
      AND counter_name = 'Batch Requests/sec'
);

IF (@OldBatchRequestsCount>0)
BEGIN
  INSERT INTO perftune.BatchRequestsPerSecond
  (
      CollectionDate,
      BatchRequestsPerSecond
  )
  SELECT
      CAST(@CollectionDate AS DATETIME) AS CollectionDate,
      (@NewBatchRequestsCount - @OldBatchRequestsCount) / @delta AS BatchRequestsPerSecond
  WHERE (@NewBatchRequestsCount - @OldBatchRequestsCount) / @delta > 0;
END

-- Persist the latest cumulative counter value for the next run’s delta
UPDATE perftune.LastBatchRequestsCount
SET LastBatchRequestsCount = @NewBatchRequestsCount;

----------------------------------------------------------------
-- Wait Stats (resource waits only); capture current snapshot
-- ResourceWaitTimeMs = wait_time_ms - signal_wait_time_ms
----------------------------------------------------------------
DROP TABLE IF EXISTS #NewWaitStats;

CREATE TABLE #NewWaitStats
(
    CollectionDate DATETIME2(0)   NOT NULL,
    WaitType       NVARCHAR(100)  NOT NULL,
    WaitTimeMs     NUMERIC(19,6) NOT NULL,
    CONSTRAINT PK_NewWaitStats PRIMARY KEY (WaitType)
);

INSERT INTO #NewWaitStats
(
    CollectionDate,
    WaitType,
    WaitTimeMs
)
SELECT
    @CollectionDate AS CollectionDate,
    wait_type       AS WaitType,
    (wait_time_ms - signal_wait_time_ms) AS ResourceWaitTimeMs
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count <> 0
  AND wait_type NOT IN
  (
        -- These wait types are almost 100% never a problem and so they are
        -- filtered out to avoid them skewing the results. Click on the URL
        -- for more information.
        N'BROKER_EVENTHANDLER',        -- https://www.sqlskills.com/help/waits/BROKER_EVENTHANDLER
        N'BROKER_RECEIVE_WAITFOR',     -- https://www.sqlskills.com/help/waits/BROKER_RECEIVE_WAITFOR
        N'BROKER_TASK_STOP',           -- https://www.sqlskills.com/help/waits/BROKER_TASK_STOP
        N'BROKER_TO_FLUSH',            -- https://www.sqlskills.com/help/waits/BROKER_TO_FLUSH
        N'BROKER_TRANSMITTER',         -- https://www.sqlskills.com/help/waits/BROKER_TRANSMITTER
        N'CHECKPOINT_QUEUE',           -- https://www.sqlskills.com/help/waits/CHECKPOINT_QUEUE
        N'CHKPT',                      -- https://www.sqlskills.com/help/waits/CHKPT
        N'CLR_AUTO_EVENT',             -- https://www.sqlskills.com/help/waits/CLR_AUTO_EVENT
        N'CLR_MANUAL_EVENT',           -- https://www.sqlskills.com/help/waits/CLR_MANUAL_EVENT
        N'CLR_SEMAPHORE',              -- https://www.sqlskills.com/help/waits/CLR_SEMAPHORE

        -- Maybe comment this out if you have parallelism issues
        N'CXCONSUMER',                 -- https://www.sqlskills.com/help/waits/CXCONSUMER

        -- Maybe comment these four out if you have mirroring issues
        N'DBMIRROR_DBM_EVENT',         -- https://www.sqlskills.com/help/waits/DBMIRROR_DBM_EVENT
        N'DBMIRROR_EVENTS_QUEUE',      -- https://www.sqlskills.com/help/waits/DBMIRROR_EVENTS_QUEUE
        N'DBMIRROR_WORKER_QUEUE',      -- https://www.sqlskills.com/help/waits/DBMIRROR_WORKER_QUEUE
        N'DBMIRRORING_CMD',            -- https://www.sqlskills.com/help/waits/DBMIRRORING_CMD

        N'DIRTY_PAGE_POLL',            -- https://www.sqlskills.com/help/waits/DIRTY_PAGE_POLL
        N'DISPATCHER_QUEUE_SEMAPHORE', -- https://www.sqlskills.com/help/waits/DISPATCHER_QUEUE_SEMAPHORE
        N'EXECSYNC',                   -- https://www.sqlskills.com/help/waits/EXECSYNC
        N'FSAGENT',                    -- https://www.sqlskills.com/help/waits/FSAGENT
        N'FT_IFTS_SCHEDULER_IDLE_WAIT',-- https://www.sqlskills.com/help/waits/FT_IFTS_SCHEDULER_IDLE_WAIT
        N'FT_IFTSHC_MUTEX',            -- https://www.sqlskills.com/help/waits/FT_IFTSHC_MUTEX

        -- Maybe comment these six out if you have AG issues
        N'HADR_CLUSAPI_CALL',                   -- https://www.sqlskills.com/help/waits/HADR_CLUSAPI_CALL
        N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',  -- https://www.sqlskills.com/help/waits/HADR_FILESTREAM_IOMGR_IOCOMPLETION
        N'HADR_LOGCAPTURE_WAIT',                -- https://www.sqlskills.com/help/waits/HADR_LOGCAPTURE_WAIT
        N'HADR_NOTIFICATION_DEQUEUE',           -- https://www.sqlskills.com/help/waits/HADR_NOTIFICATION_DEQUEUE
        N'HADR_TIMER_TASK',                     -- https://www.sqlskills.com/help/waits/HADR_TIMER_TASK
        N'HADR_WORK_QUEUE',                     -- https://www.sqlskills.com/help/waits/HADR_WORK_QUEUE

        N'KSOURCE_WAKEUP',              -- https://www.sqlskills.com/help/waits/KSOURCE_WAKEUP
        N'LAZYWRITER_SLEEP',            -- https://www.sqlskills.com/help/waits/LAZYWRITER_SLEEP
        N'LOGMGR_QUEUE',                -- https://www.sqlskills.com/help/waits/LOGMGR_QUEUE
        N'MEMORY_ALLOCATION_EXT',       -- https://www.sqlskills.com/help/waits/MEMORY_ALLOCATION_EXT
        N'ONDEMAND_TASK_QUEUE',         -- https://www.sqlskills.com/help/waits/ONDEMAND_TASK_QUEUE
        N'PARALLEL_REDO_DRAIN_WORKER',  -- https://www.sqlskills.com/help/waits/PARALLEL_REDO_DRAIN_WORKER
        N'PARALLEL_REDO_LOG_CACHE',     -- https://www.sqlskills.com/help/waits/PARALLEL_REDO_LOG_CACHE
        N'PARALLEL_REDO_TRAN_LIST',     -- https://www.sqlskills.com/help/waits/PARALLEL_REDO_TRAN_LIST
        N'PARALLEL_REDO_WORKER_SYNC',   -- https://www.sqlskills.com/help/waits/PARALLEL_REDO_WORKER_SYNC
        N'PARALLEL_REDO_WORKER_WAIT_WORK', -- https://www.sqlskills.com/help/waits/PARALLEL_REDO_WORKER_WAIT_WORK

        N'PREEMPTIVE_OS_FLUSHFILEBUFFERS', -- https://www.sqlskills.com/help/waits/PREEMPTIVE_OS_FLUSHFILEBUFFERS
        N'PREEMPTIVE_XE_GETTARGETSTATE',   -- https://www.sqlskills.com/help/waits/PREEMPTIVE_XE_GETTARGETSTATE
        N'PVS_PREALLOCATE',                -- https://www.sqlskills.com/help/waits/PVS_PREALLOCATE

        N'PWAIT_ALL_COMPONENTS_INITIALIZED',     -- https://www.sqlskills.com/help/waits/PWAIT_ALL_COMPONENTS_INITIALIZED
        N'PWAIT_DIRECTLOGCONSUMER_GETNEXT',      -- https://www.sqlskills.com/help/waits/PWAIT_DIRECTLOGCONSUMER_GETNEXT
        N'PWAIT_EXTENSIBILITY_CLEANUP_TASK',     -- https://www.sqlskills.com/help/waits/PWAIT_EXTENSIBILITY_CLEANUP_TASK

        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',     -- https://www.sqlskills.com/help/waits/QDS_PERSIST_TASK_MAIN_LOOP_SLEEP
        N'QDS_ASYNC_QUEUE',                      -- https://www.sqlskills.com/help/waits/QDS_ASYNC_QUEUE
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', -- https://www.sqlskills.com/help/waits/QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP
        N'QDS_SHUTDOWN_QUEUE',                   -- https://www.sqlskills.com/help/waits/QDS_SHUTDOWN_QUEUE

        N'REDO_THREAD_PENDING_WORK',     -- https://www.sqlskills.com/help/waits/REDO_THREAD_PENDING_WORK
        N'REQUEST_FOR_DEADLOCK_SEARCH',  -- https://www.sqlskills.com/help/waits/REQUEST_FOR_DEADLOCK_SEARCH
        N'RESOURCE_QUEUE',               -- https://www.sqlskills.com/help/waits/RESOURCE_QUEUE
        N'SERVER_IDLE_CHECK',            -- https://www.sqlskills.com/help/waits/SERVER_IDLE_CHECK

        N'SLEEP_BPOOL_FLUSH',        -- https://www.sqlskills.com/help/waits/SLEEP_BPOOL_FLUSH
        N'SLEEP_DBSTARTUP',          -- https://www.sqlskills.com/help/waits/SLEEP_DBSTARTUP
        N'SLEEP_DCOMSTARTUP',        -- https://www.sqlskills.com/help/waits/SLEEP_DCOMSTARTUP
        N'SLEEP_MASTERDBREADY',      -- https://www.sqlskills.com/help/waits/SLEEP_MASTERDBREADY
        N'SLEEP_MASTERMDREADY',      -- https://www.sqlskills.com/help/waits/SLEEP_MASTERMDREADY
        N'SLEEP_MASTERUPGRADED',     -- https://www.sqlskills.com/help/waits/SLEEP_MASTERUPGRADED
        N'SLEEP_MSDBSTARTUP',        -- https://www.sqlskills.com/help/waits/SLEEP_MSDBSTARTUP
        N'SLEEP_SYSTEMTASK',         -- https://www.sqlskills.com/help/waits/SLEEP_SYSTEMTASK
        N'SLEEP_TASK',               -- https://www.sqlskills.com/help/waits/SLEEP_TASK
        N'SLEEP_TEMPDBSTARTUP',      -- https://www.sqlskills.com/help/waits/SLEEP_TEMPDBSTARTUP

        N'SNI_HTTP_ACCEPT',              -- https://www.sqlskills.com/help/waits/SNI_HTTP_ACCEPT
        N'SOS_WORK_DISPATCHER',          -- https://www.sqlskills.com/help/waits/SOS_WORK_DISPATCHER
        N'SP_SERVER_DIAGNOSTICS_SLEEP',  -- https://www.sqlskills.com/help/waits/SP_SERVER_DIAGNOSTICS_SLEEP

        N'SQLTRACE_BUFFER_FLUSH',            -- https://www.sqlskills.com/help/waits/SQLTRACE_BUFFER_FLUSH
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', -- https://www.sqlskills.com/help/waits/SQLTRACE_INCREMENTAL_FLUSH_SLEEP
        N'SQLTRACE_WAIT_ENTRIES',            -- https://www.sqlskills.com/help/waits/SQLTRACE_WAIT_ENTRIES

        N'VDI_CLIENT_OTHER',         -- https://www.sqlskills.com/help/waits/VDI_CLIENT_OTHER
        N'WAIT_FOR_RESULTS',         -- https://www.sqlskills.com/help/waits/WAIT_FOR_RESULTS
        N'WAITFOR',                  -- https://www.sqlskills.com/help/waits/WAITFOR
        N'WAITFOR_TASKSHUTDOWN',     -- https://www.sqlskills.com/help/waits/WAITFOR_TASKSHUTDOWN

        N'WAIT_XTP_RECOVERY',        -- https://www.sqlskills.com/help/waits/WAIT_XTP_RECOVERY
        N'WAIT_XTP_HOST_WAIT',       -- https://www.sqlskills.com/help/waits/WAIT_XTP_HOST_WAIT
        N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', -- https://www.sqlskills.com/help/waits/WAIT_XTP_OFFLINE_CKPT_NEW_LOG
        N'WAIT_XTP_CKPT_CLOSE',      -- https://www.sqlskills.com/help/waits/WAIT_XTP_CKPT_CLOSE

        N'XE_DISPATCHER_JOIN',       -- https://www.sqlskills.com/help/waits/XE_DISPATCHER_JOIN
        N'XE_DISPATCHER_WAIT',       -- https://www.sqlskills.com/help/waits/XE_DISPATCHER_WAIT
        N'XE_TIMER_EVENT'            -- https://www.sqlskills.com/help/waits/XE_TIMER_EVENT
  );

----------------------------------------------------------------
-- Top 10 waits by delta since last snapshot
-- Convert ms delta to seconds, then normalize by interval duration and cores:
--   (delta_WaitTimeMs / 1000) / @delta / cores
----------------------------------------------------------------
IF ((SELECT COUNT(*) FROM perftune.LastWaitStats) <> 0)
BEGIN
  INSERT INTO perftune.WaitStatsTop10
  (
      CollectionDate,
      WaitType,
      WaitTimeSecondPerCorePerSecond
  )
  SELECT TOP (10)
      n.CollectionDate,
      n.WaitType,
      (n.WaitTimeMs - l.WaitTimeMs) / 1000.0 / @delta / NULLIF(@NumberOfCores, 0)
          AS WaitTimeSecondPerCorePerSecond
  FROM #NewWaitStats AS n
  INNER JOIN perftune.LastWaitStats AS l
	ON n.WaitType = l.WaitType
  WHERE (n.WaitTimeMs - l.WaitTimeMs) > 0
  ORDER BY (n.WaitTimeMs - l.WaitTimeMs) DESC
END
GO

----------------------------------------------------------------
-- Persist latest wait snapshot for next run
----------------------------------------------------------------
TRUNCATE TABLE perftune.LastWaitStats;

INSERT INTO perftune.LastWaitStats
(
    CollectionDate,
    WaitType,
    WaitTimeMs
)
SELECT
    CollectionDate,
    WaitType,
    WaitTimeMs
FROM #NewWaitStats;


/*------------------------------------------------------------------------------
3) HISTORY RETENTION (delete rows older than 365 days)
------------------------------------------------------------------------------*/
DECLARE @CleanupNDatesOlder datetime;
SET @CleanupNDatesOlder = DATEADD(dd,-365,GETDATE());

-- perftune.WaitStatsAllTime
IF OBJECT_ID('perftune.WaitStatsTop10', 'U') IS NOT NULL
    DELETE FROM perftune.WaitStatsTop10 WHERE CollectionDate < @CleanupNDatesOlder;

-- perftune.BatchRequestsPerSecond
IF OBJECT_ID('perftune.BatchRequestsPerSecond', 'U') IS NOT NULL
    DELETE FROM perftune.BatchRequestsPerSecond WHERE CollectionDate < @CleanupNDatesOlder;

  

The Signal Wait

Every wait in SQL Server has two components: resource wait time and signal wait time. Resource wait time (wait_time_ms - signal_wait_time_ms) is how long a task waited for an external resource, a disk read, a lock, a memory grant, and so on. Signal wait time (signal_wait_time_ms) is how long the task waited to get back on a CPU after the resource became available. In other words, it was ready to run but had to wait in line for a CPU to pick it up.

Both matter, but they point to different problems. High resource waits tell you what the server is waiting on: slow storage, lock contention, memory pressure, etc. High signal waits tell you the CPUs are too busy to keep up; tasks are ready to work but there's no core available to run them. 

In this post, we focus on resource waits. I'll cover signal waits and CPU scheduling pressure in a future post.

Conclusion

SQL Server performance tuning has a lot in common with surviving a packed theme park: some time is real work (riding the rides), and some is just waiting in line. Wait stats let us see where SQL Server is spending that "line time," so we can stop guessing and focus on the actual bottleneck.

The baseline chart is my way of keeping that visibility over time, so when things change, new workload, code deploys, upgrades, or just a bad day, I can quickly spot what shifted and why.

The takeaway: if your SQL Server is slow and you don't know where to start, start with wait stats. They won't tell you everything, but they'll almost always point you in the right direction.

More to read

Wait Stats - Brent Ozar Unlimited®

sp_BlitzFirst @SinceStartup = 1 Shows You Wait Stats Since, Uh, Startup - Brent Ozar Unlimited®

How to Measure SQL Server Workloads: Wait Time per Core per Second - Brent Ozar Unlimited®

sql-server-performance-tuning-using-wait-statistics-whitepaper.pdf


If you spot anything off in the explanation (or have ideas to improve the script), I'd appreciate your feedback in the comments or via email. Thank you for reading and have a good day!