Intro
In my previous post, I shared how a deadlock incident
changed how I think about using AI in my day-to-day work as a DBA. Here I'll
walk through the incident itself. The data and queries aren't the real ones; I
recreated the scenario with my own made-up tables, but the deadlock behavior is
the same.
Set up a deadlock demo
Below is the T-SQL to create the table and load it with test
data. A couple of things worth knowing about the data:
- For
every ProcessId, there is exactly one active record, the one where EndDate
is NULL.
- The
StartDate and EndDate pattern reflects a soft-delete approach: instead of
updating a record directly, we close the current one by setting its
EndDate, then insert a new record with the updated values.
USE [MyBlogDB];
DROP TABLE IF EXISTS [dbo].[Configuration];
CREATE TABLE [dbo].[Configuration] (
[Id] INT IDENTITY(1,1),
[ProcessId] INT NOT NULL,
[Value1] VARCHAR(50) NOT NULL,
[Value2] VARCHAR(50) NULL,
[StartDate] DATETIME NOT NULL,
[EndDate] DATETIME NULL,
[IsDefaultConfigOption] BIT NOT NULL DEFAULT 0,
CONSTRAINT [PK_Configuration_Id] PRIMARY KEY ([Id])
);
-- CLAUDE wrote this code to generate 500,000 rows of test data for the Configuration table
;WITH Numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n < 500000
)
INSERT INTO [dbo].[Configuration] ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption])
SELECT
-- ProcessId cycles through 1–5000
((n - 1) % 5000) + 1,
-- Value1: e.g. 'Value_1', 'Value_2', ...
'Value_' + CAST(n AS VARCHAR(10)),
-- Value2: nullable, every 3rd row is NULL
CASE WHEN n % 3 = 0 THEN NULL ELSE 'Option_' + CAST((n % 10) + 1 AS VARCHAR(10)) END,
-- StartDate: spread over the past ~3 years
DATEADD(DAY, -(n % 1095), GETDATE()),
-- EndDate: NULL only on the last occurrence of each ProcessId (n > 495000),
-- all prior rows get an end date 1 year after their start
CASE WHEN n > 495000 THEN NULL ELSE DATEADD(DAY, -(n % 1095) + 365, GETDATE()) END,
-- IsDefaultConfigOption: alternates in groups of 4
CASE WHEN n % 4 IN (1, 2) THEN 1 ELSE 0 END
FROM Numbers
OPTION (MAXRECURSION 0); -- Required to allow recursion beyond the default 100 limit
SELECT * FROM [dbo].[Configuration] where processid=999;
I also added an index similar to the one that existed in the
real-world case. It doesn't perfectly cover the WHERE clause, but it's good
enough to be useful. I kept it that way intentionally, because in the real
world, I wouldn't rush to throw more indexes at the problem.
DROP INDEX IF EXISTS I1_ProcessId_StartDate_EndDate ON [dbo].[Configuration];
CREATE INDEX I1_ProcessId_StartDate_EndDate ON [dbo].[Configuration] ([ProcessId], [StartDate], [EndDate]);
Here's the stored procedure at the center of it all. You
might wonder why there is only one? in the real-world scenario, two colleagues
were accidentally running the same operation on the same ProcessId at nearly the same time, which is what caused the deadlock.
CREATE OR ALTER PROCEDURE [dbo].[Configuration_Update_IsDefaultConfigOption]
@ProcessId INT,
@IsDefaultConfigOptionInput BIT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @CurrentDate DATETIME = GETDATE();
DECLARE @ID INT;
DECLARE @Value1 VARCHAR(50);
DECLARE @Value2 VARCHAR(50);
BEGIN TRY
BEGIN TRANSACTION;
SELECT
@ID = [Id],
@Value1 = [Value1],
@Value2 = [Value2]
FROM [dbo].[Configuration]
WHERE [ProcessId] = @ProcessId
AND [EndDate] IS NULL;
UPDATE [dbo].[Configuration]
SET [EndDate] = @CurrentDate
WHERE [ProcessId] = @ProcessId
AND [EndDate] IS NULL;
INSERT INTO [dbo].[Configuration] ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption])
VALUES (@ProcessId, @Value1, @Value2, @CurrentDate, NULL, @IsDefaultConfigOptionInput);
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
END
GO
The proc is straightforward. Given a ProcessId, it fetches
the current active record (where EndDate is NULL), closes it with a soft delete by
setting EndDate, then inserts a new record with the updated
IsDefaultConfigOption value and everything else carried over.
To reproduce a deadlock, I used SQLQueryStress with 4 threads all hammering this same query and ProcessID simultaneously (for 10 iterations each to increase the chance of getting the deadlock). In the real world there were only two competing sessions, but I bumped it up to 4 here just to make the deadlock easier to trigger reliably.
Understand the deadlock
Here is the deadlock graph I obtained from running the sp_BlitzLock proc (from First Responder Kit scripts).
<deadlock> <victim-list> <victimProcess id="process2b5c1d01048" /> </victim-list> <process-list> <process id="process2b5c1d01048" spid="87"> <executionStack> <frame procname="MyBlogDB.dbo.Configuration_Update_IsDefaultConfigOption" line="17"> SELECT @ID = [Id], @Value1 = [Value1], @Value2 = [Value2] FROM [dbo].[Configuration] WHERE [ProcessId] = @ProcessId AND [EndDate] IS NULL </frame>▶ more...
lockMode="S" waitresource="KEY: 5:72057594058899456 (294556843a07)" isolationlevel="read committed (2)" currentdbname="MyBlogDB" transactionname="user_transaction" lasttranstarted="2026-03-12T16:06:10.557" clientapp="SQLQueryStress" hostname="1NDWJD3"</executionStack>▶ <frame procname="adhoc">...
<frame procname="adhoc" line="1"> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </frame>▶ <stackFrames> (39 frames, collapsed)...
<stackFrames> <frame id="00" address="0x7FFA979F4694" module="ntdll" /> <frame id="01" address="0x7FFA94FBEB29" module="kernelbase" /> <frame id="02" address="0x7FFA77AA65CA" module="SqlDK" /> <frame id="03" address="0x7FFA77AA64ED" module="SqlDK" /> <frame id="04" address="0x7FFA77AA216C" module="SqlDK" /> <frame id="05" address="0x7FFA77AA1C33" module="SqlDK" /> <frame id="06" address="0x7FFA77AA3B66" module="SqlDK" /> <frame id="07" address="0x7FFA4659855A" module="sqlmin" /> <frame id="08" address="0x7FFA465987D0" module="sqlmin" /> <frame id="09" address="0x7FFA4650D84D" module="sqlmin" /> <frame id="10" address="0x7FFA46515599" module="sqlmin" /> <frame id="11" address="0x7FFA4652016E" module="sqlmin" /> <frame id="12" address="0x7FFA465160DD" module="sqlmin" /> <frame id="13" address="0x7FFA46515DFE" module="sqlmin" /> <frame id="14" address="0x7FFA465343AE" module="sqlmin" /> <frame id="15" address="0x7FFA780E1955" module="SqlTsEs" /> <frame id="16" address="0x7FFA46534419" module="sqlmin" /> <frame id="17" address="0x7FFA46533EBE" module="sqlmin" /> <frame id="18" address="0x7FFA465342B4" module="sqlmin" /> <frame id="19" address="0x7FFA46530020" module="sqlmin" /> <frame id="20" address="0x7FFA46543ECE" module="sqlmin" /> <frame id="21" address="0x7FFA4369D4D0" module="sqllang" /> <frame id="22" address="0x7FFA4369D702" module="sqllang" /> <frame id="23" address="0x7FFA43B022D9" module="sqllang" /> <frame id="24" address="0x7FFA44619DFF" module="sqllang" /> <frame id="25" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="26" address="0x7FFA43697700" module="sqllang" /> <frame id="27" address="0x7FFA436CB62F" module="sqllang" /> <frame id="28" address="0x7FFA436CB451" module="sqllang" /> <frame id="29" address="0x7FFA436CB278" module="sqllang" /> <frame id="30" address="0x7FFA43B022D9" module="sqllang" /> <frame id="31" address="0x7FFA44619DFF" module="sqllang" /> <frame id="32" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="33" address="0x7FFA43697700" module="sqllang" /> <frame id="34" address="0x7FFA436A2E8B" module="sqllang" /> <frame id="35" address="0x7FFA436956F1" module="sqllang" /> <frame id="36" address="0x7FFA43695402" module="sqllang" /> <frame id="37" address="0x7FFA77AA9165" module="SqlDK" /> <frame id="38" address="0x7FFA77AA9880" module="SqlDK" /> <frame id="39" address="0x7FFA77AA94E3" module="SqlDK" /> </stackFrames></process> <process id="process2b5c1acc478" spid="88"▶ <inputbuf>...
<inputbuf> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </inputbuf>> <executionStack> <frame procname="MyBlogDB.dbo.Configuration_Update_IsDefaultConfigOption" line="25"> UPDATE [dbo].[Configuration] SET [EndDate] = @CurrentDate WHERE [ProcessId] = @ProcessId AND [EndDate] IS NULL </frame>▶ more...
lockMode="X" waitresource="KEY: 5:72057594058964992 (3963a1a319be)" isolationlevel="read committed (2)" currentdbname="MyBlogDB" transactionname="user_transaction" lasttranstarted="2026-03-12T16:06:10.557" clientapp="SQLQueryStress" hostname="1NDWJD3"</executionStack>▶ <frame procname="adhoc">...
<frame procname="adhoc" line="1"> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </frame>▶ <stackFrames> (39 frames, collapsed)...
<stackFrames> <frame id="00" address="0x7FFA979F4694" module="ntdll" /> <frame id="01" address="0x7FFA94FBEB29" module="kernelbase" /> <frame id="02" address="0x7FFA77AA65CA" module="SqlDK" /> <frame id="03" address="0x7FFA77AA64ED" module="SqlDK" /> <frame id="04" address="0x7FFA77AA216C" module="SqlDK" /> <frame id="05" address="0x7FFA77AA1C33" module="SqlDK" /> <frame id="06" address="0x7FFA77AA3B66" module="SqlDK" /> <frame id="07" address="0x7FFA4659855A" module="sqlmin" /> <frame id="08" address="0x7FFA465987D0" module="sqlmin" /> <frame id="09" address="0x7FFA4650D84D" module="sqlmin" /> <frame id="10" address="0x7FFA46515599" module="sqlmin" /> <frame id="11" address="0x7FFA4652016E" module="sqlmin" /> <frame id="12" address="0x7FFA4671B463" module="sqlmin" /> <frame id="13" address="0x7FFA4671B147" module="sqlmin" /> <frame id="14" address="0x7FFA4671AE9E" module="sqlmin" /> <frame id="15" address="0x7FFA4671A770" module="sqlmin" /> <frame id="16" address="0x7FFA780E1B97" module="SqlTsEs" /> <frame id="17" address="0x7FFA4657E38E" module="sqlmin" /> <frame id="18" address="0x7FFA46530020" module="sqlmin" /> <frame id="19" address="0x7FFA46543ECE" module="sqlmin" /> <frame id="20" address="0x7FFA4369D4D0" module="sqllang" /> <frame id="21" address="0x7FFA436B4AB8" module="sqllang" /> <frame id="22" address="0x7FFA436B44D9" module="sqllang" /> <frame id="23" address="0x7FFA43B022D9" module="sqllang" /> <frame id="24" address="0x7FFA44619DFF" module="sqllang" /> <frame id="25" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="26" address="0x7FFA43697700" module="sqllang" /> <frame id="27" address="0x7FFA436CB62F" module="sqllang" /> <frame id="28" address="0x7FFA436CB451" module="sqllang" /> <frame id="29" address="0x7FFA436CB278" module="sqllang" /> <frame id="30" address="0x7FFA43B022D9" module="sqllang" /> <frame id="31" address="0x7FFA44619DFF" module="sqllang" /> <frame id="32" address="0x7FFA4462D5EA" module="sqllang" /> <frame id="33" address="0x7FFA43697700" module="sqllang" /> <frame id="34" address="0x7FFA436A2E8B" module="sqllang" /> <frame id="35" address="0x7FFA436956F1" module="sqllang" /> <frame id="36" address="0x7FFA43695402" module="sqllang" /> <frame id="37" address="0x7FFA77AA9165" module="SqlDK" /> <frame id="38" address="0x7FFA77AA9880" module="SqlDK" /> <frame id="39" address="0x7FFA77AA94E3" module="SqlDK" /> </stackFrames></process> </process-list> <resource-list> <keylock objectname="MyBlogDB.dbo.Configuration" indexname="PK_Configuration_Id" mode="X"> <owner-list> <owner id="process2b5c1acc478" mode="X" /> </owner-list> <waiter-list> <waiter id="process2b5c1d01048" mode="S" requestType="wait" /> </waiter-list> </keylock> <keylock objectname="MyBlogDB.dbo.Configuration" indexname="I1_ProcessId_StartDate_EndDate" mode="U"> <owner-list> <owner id="process2b5c1d01048" mode="S" /> </owner-list> <waiter-list> <waiter id="process2b5c1acc478" mode="X" requestType="convert" /> </waiter-list> </keylock> </resource-list> </deadlock>▶ <inputbuf>...
<inputbuf> exec [dbo].[Configuration_Update_IsDefaultConfigOption] 999, 1 </inputbuf>
The deadlock occurred while session 1 (process2b5c1d01048) was processing the SELECT and session 2 (process2b5c1acc478) was processing the UPDATE. Session 1 holds an S lock on the I1 index and wants an S lock on the PK index. Session 2, conversely, holds an X lock on the PK index and wants an X lock on the I1 index. Both sessions want the index that the other already holds a lock on, and neither side is willing to release. Thus, a deadlock occurs.
If the two sessions were updating different ProcessIDs, the key locks would target different rows, and no deadlock would occur. No fix needed if the two sessions never contend for the same rows, but since users expect a clean experience instead of an abrupt failure, let's explore our options.
First, though, an important question: why do the
two queries acquire locks on the two indexes in opposite orders? Session 1 performs a seek on I1 first, then does a key lookup on PK
to retrieve columns not covered by I1. Session 2 also seeks on I1, but when updating, it applies changes to PK first, then maintains I1. That's what creates the inversion. See their actual execution plans below:
Before I asked AI
I came up with a few ideas to fix:
- Tune the queries to finish faster: Both queries are simple and already have a reasonable index. I could have reordered I1 to (ProcessID, EndDate, StartDate) or added a new index I2 with (ProcessID, EndDate) as key to better match the WHERE clause. However, reordering I1 might affect other queries, and adding a new index means additional maintenance overhead.
- Create a covering index: adding columns to I1 to make it a covering index would eliminate the key lookup on PK for the SELECT query. But that adds even more maintenance overhead, in my real-world scenario, that would have meant adding 10 more columns.
- Add retry logic: Retry the operation automatically when it fails due to a deadlock.
Every option I could think of was adding overhead somewhere, so I landed on retry logic. It's perfectly reasonable; users won't see a strange deadlock error they don't know how to handle, and since two users competing to update the same row is expected behavior, it's fine for one to fail and have the app gracefully retry. I messaged my coworker about what I thought we should do.
Then I asked Claude.
AI as my advisor on the problem
I was curious how the AI would approach this problem, so I gave it the deadlock report, the query text, and the execution plans. The sp_BlitzLock proc collects all of these in one place, which means I didn't have to do that work myself. That's one of the lessons I took from this: when the right tooling makes it easy to gather data, you're much more likely to actually bring that data to AI in the first place, enhancing the quality of the answers.
The AI correctly identified the deadlock and its root cause:
- "Classic circular wait: A holds NCI S-lock and wants CI; B holds CI X-lock and wants NCI. SQL Server kills A as the victim." ✓
- "The lock ordering inversion is driven by two things: (1) The key lookup in the SELECT plan forces an NCI → CI lock acquisition sequence (2) The UPDATE acquires locks in CI → NCI order (it updates the clustered index first, then maintains the non-clustered index)" ✓
CREATE OR ALTER PROCEDURE [dbo].[Configuration_Update_IsDefaultConfigOption] @ProcessId INT, @IsDefaultConfigOptionInput BIT AS BEGIN SET NOCOUNT ON; DECLARE @CurrentDate DATETIME = GETDATE(); DECLARE @ID INT; DECLARE @Value1 VARCHAR(50); DECLARE @Value2 VARCHAR(50); -- Table variable to capture the pre-update row values via OUTPUT DECLARE @Captured TABLE ( Id INT, Value1 VARCHAR(50), Value2 VARCHAR(50) ); BEGIN TRY BEGIN TRANSACTION; -- Single statement replaces the separate SELECT + UPDATE. -- OUTPUT captures the values we need before the row is modified. UPDATE [dbo].[Configuration] SET [EndDate] = @CurrentDate OUTPUT DELETED.[Id], DELETED.[Value1], DELETED.[Value2] INTO @Captured (Id, Value1, Value2) WHERE [ProcessId] = @ProcessId AND [EndDate] IS NULL; SELECT @ID = Id, @Value1 = Value1, @Value2 = Value2 FROM @Captured; INSERT INTO [dbo].[Configuration] ([ProcessId], [Value1], [Value2], [StartDate], [EndDate], [IsDefaultConfigOption]) VALUES (@ProcessId, @Value1, @Value2, @CurrentDate, NULL, @IsDefaultConfigOptionInput); COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH END GO
The fix is simple and clean in hindsight, but it never crossed my mind, simply because UPDATE...OUTPUT isn't something I reach for often. That's exactly why I think AI is something all of us should learn to use well. I now treat Claude as an advisor I consult for second opinions (one who draws from a much wider surface area than I can hold in my head at any one time XD). And just like any working relationship, I'm still learning how to get the most out of it.
*Of course, I tested the UPDATE...OUTPUT fix, it works 🎉.
No comments:
Post a Comment