Covering Index = Epic Win, Part I
Posted by Josh | Posted in SQL Server, The Rookie DBA | Posted on 10-06-2010
Tags: indexes, rookie DBA, SQL Server, tuning
1
As if I needed yet another reason to push all of the products I support on to SQL 2005 and up, but in the last few days I’ve been saved at least three times by the wonder that is the INCLUDE columns feature.
From BOL:
- INCLUDE (column [ ,... n ] )
- Specifies the non-key columns to be added to the leaf level of the nonclustered index. The nonclustered index can be unique or non-unique.
Column names cannot be repeated in the INCLUDE list and cannot be used simultaneously as both key and non-key columns. Nonclustered indexes always contain the clustered index columns if a clustered index is defined on the table. For more information, see Index with Included Columns.
All data types are allowed except text, ntext, and image. The index must be created or rebuilt offline (ONLINE = OFF) if any one of the specified non-key columns are varchar(max), nvarchar(max), or varbinary(max) data types.
Computed columns that are deterministic and either precise or imprecise can be included columns. Computed columns derived from image, ntext, text, varchar(max), nvarchar(max), varbinary(max), and xml data types can be included in non-key columns as long as the computed column data types is allowable as an included column.
In all three cases that I worked, the issue at hand was that some older, rather dusty stored procedures were performing badly. In one case, the procedure took almost thirty minutes to return a mere six thousand rows to be used in a report. Needless to say, because we all know how important reports are to those who use them, this was suddenly deemed totally unacceptable and had to be investigated at once. Luckily, I do love performance tuning.
So upon examining the execution plan of the query in question, the main points of note were:
- The majority of the total query cost was taken up by clustered index scan and bookmark lookup operations against large tables.
- In all cases the predicate of those operations was a date range (these are tables that hold various kinds of transactions).
- In one case, the query executed using a loop join with the transactional table being both the lower (multiple execution) and upper part of the join, courtesy of an index scan and clustered index scan. Notably, the estimated number of rows from the top portion was only one, whereas the actual number was something in the thousands.
In this post (and probably the next one or two), I’m going to go about reproducing the issues described above and show how with some simple index tuning, problems were all but eliminated. That’s right, no code changes. While I am certainly a capable T-SQL coder, I am just as loathe as anyone to crack open these ancient stored procs, whose owners are long gone and yet still are highly critical in daily operations. I don’t know about other DBAs, but my work is in production support, not development and testing code.
Setup Work
First we’re going to create a test table, put an index on it, and put some data in it:
IF OBJECT_ID('dbo.tmpA') IS NOT NULL
DROP TABLE dbo.tmpA;
GO
CREATE TABLE tmpA
(
ID INT IDENTITY(1,1),
TransDate DATETIME,
AccountID VARCHAR(255),
Amount MONEY,
CONSTRAINT PK_CI_tmpA PRIMARY KEY CLUSTERED (ID)
);
GO
CREATE NONCLUSTERED INDEX NCI_tmpA_TransDate ON dbo.tmpA (TransDate);
-- Insert some records --
INSERT tmpA
(
TransDate,
AccountID,
Amount
)
SELECT TOP 100
'1-1-2010',
b.name,
CAST(a.object_id AS MONEY)
FROM sys.columns a, sys.columns b
ORDER BY b.object_id;
INSERT tmpA
(
TransDate,
AccountID,
Amount
)
SELECT TOP 1000
'1-2-2010',
b.name,
CAST(a.object_id AS MONEY)
FROM sys.columns a, sys.columns b
ORDER BY b.object_id;
INSERT tmpA
(
TransDate,
AccountID,
Amount
)
SELECT TOP 10000
'1-3-2010',
b.name,
CAST(a.object_id AS MONEY)
FROM sys.columns a, sys.columns b
ORDER BY b.object_id;
INSERT tmpA
(
TransDate,
AccountID,
Amount
)
SELECT TOP 100000
'1-4-2010',
b.name,
CAST(a.object_id AS MONEY)
FROM sys.columns a, sys.columns b
ORDER BY b.object_id;
GO
Now, let’s execute a little query. Make sure you have the “Display actual execution plan” option in SSMS turned on.
SELECT AccountID, Amount FROM dbo.tmpA WHERE TransDate = '1/1/2010';
Now, take a look at the execution plan. What you’ll see is probably something similar to this:
See that little “Key Lookup” operator? Notice how it is consuming a whopping 99% of the total query cost? That my friends, is a bad little son of a…. well, you get my drift.
From Grant Fritchey (twitter | blog) and Sajal Dam’s (anyone know if Sajal is on Twitter or has a blog? I couldn’t find one but will gladly put it here) outstanding book SQL 2008 Query Performance Tuning Distilled (p. 163 to be exact and give proper credit):
A major overhead associated with nonclustered indexes is the cost of excessive key lookups, commonly known as bookmark lookups, which are a mechanism to navigate from a nonclustered index row to the corresponding data row in the clustered index or the base table.
…
When a SQL query returns a small number of rows, the optimizer can use the nonclustered index, if available, on the column(s) in the WHERE or JOIN clause to retrieve the data. If the query refers to columns that are not part of the nonclustered index used to retrieve the data, then navigation is required from the index row to the data row in the table to access these columns.
Now this behavior was only seen in a few of the execution cases I was studying, namely those with smaller result sets (that makes sense I suppose given what was stated above about “small number of rows”). In most of the other examples, instead of this bookmark lookup, I was seeing a simple clustered index scan. Nonetheless, I did want to point out this (commonly seen in my somewhat small experience) kind of behavior, since we know it can be a major performance bottleneck. It’s also a good example of how you cannot just through an index on a oft-queried column and assume that everything will be fine from there out.
Next time we’ll see how even more useless indexes can be, but also begin talking about how we can make them worth their weight in gold.
