Covering Index = Epic Win, Part 2

In part one of this series, we looked at how a non-clustered index can be made into a real performance bottleneck because of a nefarious little fellow called a “bookmark lookup”. We showed how this operation, which is how SQL Server retrieves the requested columns not part of an index, can result in excess IOs and generally poorly scaling performance.

In this part, we’ll look at how SQL Server will even go so far as to ignore an index completely, just to avoid our little friend the lookup.

First off, let’s look again at our query execution plan from our SELECT query shown in part one:

We see how the most expensive part of the query is the “Key Lookup” operation, by a margin on 99 to 1. Pretty obvious that it would be the bottleneck in this case.

Now this particular query returned exactly one-hundred rows. Now let’s see what happens if we increase the number of returned rows a little bit.


SELECT AccountID, Amount FROM dbo.tmpA WHERE TransDate = '1/2/2010';

This query returns around one thousand results. Let’s look at the execution plan here:


Whoa, wait a second. Why is SQL Server using a clustered index scan operation? Don’t we have an index on the column being filtered on?

Well, yes we do. But see, SQL is generally pretty smart about choosing how to execute it’s queries (though at times I might debate that statement). And while I can’t speak for how it’s making these decisions, I would hedge a bet that one of the deciding factors in constructing an execution plan is IO cost. So, let’s look at the output of STATISTICS IO for this query.

Table 'tmpA'. Scan count 1, logical reads 549, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

549 logical reads total. Now, let’s force SQL to use our index by putting an index hint in the query:


SELECT AccountID, Amount FROM dbo.tmpA WITH (INDEX = NCI_tmpA_TransDate) WHERE TransDate = '1/2/2010';

Now you’ll have to take my word for it, but the execution plan now looks identical to the one we saw in part one. However, let’s look at the STATISTICS IO output:

Table 'tmpA'. Scan count 1, logical reads 2076, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Well that looks pretty bad. The number of logical IOs increased from 549 to over two-thousand. Now we see why SQL might choose to simply scan the clustered index.

So, why would it choose to use the index in the first case? Well, let’s look at STATISTICS IO for our original query:

SELECT AccountID, Amount FROM dbo.tmpA WHERE TransDate = '1/1/2010';
Table 'tmpA'. Scan count 1, logical reads 217, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Ok, so a total of 217 logical reads, and the index is being used. Now, let’s force it to use the clustered index and see what we get.

SELECT AccountID, Amount FROM dbo.tmpA WITH (INDEX = PK_CI_tmpA) WHERE TransDate = '1/1/2010';
Table 'tmpA'. Scan count 1, logical reads 549, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

See? In the case of our first query, SQL again made the right choice, choosing the operation which resulted in only 249 logical IOs, versus 549 for the clustered index scan.

Somewhere, a SQL developer is probably yelling “See?! Why do you doubt me?!”

OK SQL, we’ve now shown that you make the right call on what operation will be the least expensive to service the query we gave you. But, that doesn’t mean we can’t give you better tools to do the job. That’s what we’ll be covering in part three of our series. I promise we’ll get to the “epic win” part there too.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

Covering Index = Epic Win, Part I

As if I needed yet another reason to push all of the products I support on to SQL 2005 and up, but in the last few days I’ve been saved at least three times by the wonder that is the INCLUDE columns feature.

From BOL:

INCLUDE (column [ ,... n ] )
Specifies the non-key columns to be added to the leaf level of the nonclustered index. The nonclustered index can be unique or non-unique.

Column names cannot be repeated in the INCLUDE list and cannot be used simultaneously as both key and non-key columns. Nonclustered indexes always contain the clustered index columns if a clustered index is defined on the table. For more information, see Index with Included Columns.

All data types are allowed except text, ntext, and image. The index must be created or rebuilt offline (ONLINE = OFF) if any one of the specified non-key columns are varchar(max), nvarchar(max), or varbinary(max) data types.

Computed columns that are deterministic and either precise or imprecise can be included columns. Computed columns derived from image, ntext, text, varchar(max), nvarchar(max), varbinary(max), and xml data types can be included in non-key columns as long as the computed column data types is allowable as an included column.

In all three cases that I worked, the issue at hand was that some older, rather dusty stored procedures were performing badly. In one case, the procedure took almost thirty minutes to return a mere six thousand rows to be used in a report. Needless to say, because we all know how important reports are to those who use them, this was suddenly deemed totally unacceptable and had to be investigated at once. Luckily, I do love performance tuning.

So upon examining the execution plan of the query in question, the main points of note were:

  1. The majority of the total query cost was taken up by clustered index scan and bookmark lookup operations against large tables.
  2. In all cases the predicate of those operations was a date range (these are tables that hold various kinds of transactions).
  3. In one case, the query executed using a loop join with the transactional table being both the lower (multiple execution) and upper part of the join, courtesy of an index scan and clustered index scan. Notably, the estimated number of rows from the top portion was only one, whereas the actual number was something in the thousands.

In this post (and probably the next one or two), I’m going to go about reproducing the issues described above and show how with some simple index tuning, problems were all but eliminated. That’s right, no code changes. While I am certainly a capable T-SQL coder, I am just as loathe as anyone to crack open these ancient stored procs, whose owners are long gone and yet still are highly critical in daily operations. I don’t know about other DBAs, but my work is in production support, not development and testing code.

Setup Work

First we’re going to create a test table, put an index on it, and put some data in it:


IF OBJECT_ID('dbo.tmpA') IS NOT NULL
 DROP TABLE dbo.tmpA;
GO

CREATE TABLE tmpA
(
 ID INT IDENTITY(1,1),
 TransDate DATETIME,
 AccountID VARCHAR(255),
 Amount MONEY,
 CONSTRAINT PK_CI_tmpA PRIMARY KEY CLUSTERED (ID)
);
GO

CREATE NONCLUSTERED INDEX NCI_tmpA_TransDate ON dbo.tmpA (TransDate);

-- Insert some records --
INSERT  tmpA
(
 TransDate,
 AccountID,
 Amount
)
SELECT  TOP 100
 '1-1-2010',
 b.name,
 CAST(a.object_id AS MONEY)
FROM    sys.columns a, sys.columns b
ORDER BY b.object_id;

INSERT  tmpA
(
 TransDate,
 AccountID,
 Amount
)
SELECT  TOP 1000
 '1-2-2010',
 b.name,
 CAST(a.object_id AS MONEY)
FROM    sys.columns a, sys.columns b
ORDER BY b.object_id;

INSERT  tmpA
(
 TransDate,
 AccountID,
 Amount
)
SELECT  TOP 10000
 '1-3-2010',
 b.name,
 CAST(a.object_id AS MONEY)
FROM    sys.columns a, sys.columns b
ORDER BY b.object_id;

INSERT  tmpA
(
 TransDate,
 AccountID,
 Amount
)
SELECT  TOP 100000
 '1-4-2010',
 b.name,
 CAST(a.object_id AS MONEY)
FROM    sys.columns a, sys.columns b
ORDER BY b.object_id;
GO

Now, let’s execute a little query. Make sure you have the “Display actual execution plan” option in SSMS turned on.


SELECT AccountID, Amount FROM dbo.tmpA WHERE TransDate = '1/1/2010';

Now, take a look at the execution plan. What you’ll see is probably something similar to this:

See that little “Key Lookup” operator? Notice how it is consuming a whopping 99% of the total query cost? That my friends, is a bad little son of a…. well, you get my drift.

From Grant Fritchey (twitter | blog) and Sajal Dam’s (anyone know if Sajal is on Twitter or has a blog? I couldn’t find one but will gladly put it here) outstanding book SQL 2008 Query Performance Tuning Distilled (p. 163 to be exact and give proper credit):

A major overhead associated with nonclustered indexes is the cost of excessive key lookups, commonly known as bookmark lookups, which are a mechanism to navigate from a nonclustered index row to the corresponding data row in the clustered index or the base table.

When a SQL query returns a small number of rows, the optimizer can use the nonclustered index, if available, on the column(s) in the WHERE or JOIN clause to retrieve the data. If the query refers to columns that are not part of the nonclustered index used to retrieve the data, then navigation is required from the index row to the data row in the table to access these columns.

Now this behavior was only seen in a few of the execution cases I was studying, namely those with smaller result sets (that makes sense I suppose given what was stated above about “small number of rows”). In most of the other examples, instead of this bookmark lookup, I was seeing a simple clustered index scan. Nonetheless, I did want to point out this (commonly seen in my somewhat small experience) kind of behavior, since we know it can be a major performance bottleneck. It’s also a good example of how you cannot just through an index on a oft-queried column and assume that everything will be fine from there out.

Next time we’ll see how even more useless indexes can be, but also begin talking about how we can make them worth their weight in gold.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

Narrow Focus: Blessing Or Curse?

I think pretty much everyone who knows me would say that one of my defining characteristics is my singular focus. Once I put my mind to solving some problem or accomplishing something, I damn well poke at it until it’s finished. Now, in many cases this trait has served me quite well; it’s what allowed me to learn programming and SQL Server with no formal training or even so much as a seminar (thank you Internet and amazon.com), and why I consider myself an above average problem solver.

One of my favorite teachers once told me that at times I “missed the forest for the trees.” He went on to explain that at times my work tended to find it’s conclusion too early, then ignore all other points of view. Narrow focus is often good in writing, but not when it excludes other angles of approach completely.

The same could be said of how I function as a DBA. Take a project I am working on: the design and build of an enterprise data warehouse, storing performance data for every SQL server in our environment. It’s been fascinating and a tremendous learning experience, but also at time a real challenge.

Early on one of the requirements was to collect performance data on a second-by-second basis, essentially forming a “baseline” of data to review and use for later comparison. Now being the aggressive lad I am, I expanded upon this as follows: the warehouse should now store second-by-sec0nd data for every server at all times, with a retention of 2 years. Yes, you read that correctly. Two years of data, with points at every second for fifty plus servers. That’s a lot of data folks (think 1,800 rows per minute, per server, or around 129,600,000 rows per day).

So, I set out to prove it could be done. Here’s a brief list of some of the challenges I’ve hit along the way:

  • PerfMon’s built-in direct-to-SQL functionality scales downright awfully. The structure of the tables would make any decent DBA shudder (the primary key of the main, and largest, table has a GUID as it’s first column, as an example), the use of bulk inserts (not using MS’s own optimization schemes), and statistics issues to name a few.
  • After abandoning that approach we are going with dumping the performance data into flat files, which are then picked up by a SSIS package at intervals, which required a custom script component acting as a data flow source. This is due to the fact that PerfMon records its data in .csv files with a dynamic and unpredictable number and order of columns. Custom logic had to be written to essentially perform a reverse pivot of the data into a more normalized form.
  • Even fully optimized, the package simply could not keep up with the incoming flood of files. They would eventually pile up and consume disk space, not to mention having stale data in the warehouse. This was attacked by custom coding a script task in a parent SSIS package that would dynamically spawn multiple threads running a child package to import one file each, up to ten at once (I’m shocked by the way that this isn’t either a standard part of SSIS or that a suitable aftermarket component could not be found). Naturally this promptly brought the server to its knees, being a poor little VM with only four CPUs. But the files did import faster!

Now, throughout all of this, keep in mind that I was going far beyond the original and still stated requirements given to me. Why? Because dammit, I was out to prove that such a huge amount of data could be handled by a little VM running SQL 2008 Standard Edition.

Well, today I cried Uncle. In an effort to actually get some usable data into the system, we’re (read, I’m) reducing our goals to only baselining one server at a time, with others collecting a mere four data points per minute. And damn, I am disappointed we haven’t gotten there yet. But you can believe I’m not giving up. But at the same time, I look back and consider all the time I spent chasing what, pride?

My old teacher was right: I need to stop staring at the one big tree in front of me and enjoy the woods a bit more. Luckily, there’s this little methodology I follow called GTD, and something tells me it’s going to be very useful in this upward movement of focus.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

What Is The Role Of A DBA?

Lately this question has been a bit of a hot discussion topic at work. Lots of folks have differing opinions, many of which have merit. So I’m curious, what do folks think the true primary role of a DBA is?

For those of you who might answer “it depends on what the client wants”; that’s a perfectly true answer.  If that’s your opinion, I’d ask that you pick the poll answer closest to what is true for you and your clients.

And if anyone has suggestions on other roles, please leave them in the comments and I’ll gladly look at adding them.

What is the primary role of a DBA?

View Results

Loading ... Loading ...
Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

The Rookie DBA

So in a bit of personal news, I was recently moved into a new role at work, namely that of a full time database administrator. In my previous role I had quite a bit of interaction with my new team, and gathered a decent amount of knowledge around the workings of SQL Server, at least from the application interacting parts.

It’s been two weeks now, and I must say the change is startling. Already I’ve learned more than I have in the last six months in my old role. There is always something new to learn about SQL, be it the inner workings of the underlying engine, the new features such as compression, encryption, and partitioning (not a new feature, but made much better from what I can see), or digging into some new problem we are facing. I am kept constantly busy, with new demands being thrown at me at a pace even higher than the one I previously faced. A month ago I would have said that wasn’t possible; now I see that I was barely seeing the tip of the amount of effort these guys put into their job when I was working with them.

I’m blessed to have a team of seasoned veterans to work with. Combined they have more than 30 years of experience with SQL Server, networking, server administration, and other areas of IT. They have welcomed me in and made sure to teach me at every possible turn. I’m tremendously grateful to them, and can’t wait to continue learning.

As I do learn and grow into this new job, I thought it might be interesting to post my thoughts and “lessons learned” as time goes on. Who knows; maybe years from now when I myself am a grizzled old pro, I’ll look back on these musing and laugh.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

Linked Servers – Good or Bad?

From Linchi Shea’s blog:

In SQL Server, it is rather handy to retrieve data from a different SQL Server instance and use the result locally in another SQL statement for further processing. In theory and in the set purists’ fantasy land, it shouldn’t matter where you get your data or even how you get the data as long as you can use them to further compose a set-based solution.

That is all fine. After all, in a distributed environment the data you want may not be on the same instance where the processing takes place. Well, that is fine until it comes to performance and troubleshooting. If you use linked servers indiscriminately, sooner or later you’ll run into these issues, and they are not pretty.

We have used linked server functionality at various points at work. As Linchi points out, it can be quite useful when you have data distributed across multiple systems. But I have seen cases where it’s caused issues, such as excess network traffic, authentication problems (anyone actually been able to get Kerberos auth to work for a two-hop Windows auth’d linked server?), and general performance malaise.

So what are some alternatives?

  • Replication – true, it may not solve the network issue, and depending upon the type used, it might cause a delay in data transfer. But it would allow for queries to execute against a single server.
  • Static data transfer using SSIS – I would say this would be fine for offline use of data such as reporting. There have been very few cases in my experience where true real-time data has been a hard business requirement, though at times the point has been, shall we say, vigorously argued.
Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter

Getting worked up over nothing?

Today I caught myself getting rather ticked off after spending around 2.5 hours trying to script out changes to around 25 SQL Server stored procedures. To explain in a non-geeky (if that’s possible coming from me) way, I had to comment out a common line in all 25 procs that needed to be disabled in order to use the proc in my test system. (OK, that failed only slightly in the non-geeky department.)

I’ll be the first to admit that once I get focused on solving an issue, it pretty much consumes my attention, to the point of near obsessiveness. This was no exception, with me getting more and more worked up as attempts to automate this change (rather than cracking the code open on all twenty plus one at a time) failed. Finally I threw up my hands in disgust and walked away, having undoubtedly raised my blood pressure a notch or two in the process.

A short time later, it occurred to me that most of what I was trying to accomplish was completely outside of the work necessary to accomplish my pre-stated goals for this project. In effect, I was trying to change twenty plus bits of code, when this particular effort required just one change. Yep, you heard me, just one. So why even make the attempt to automate the process?

I suppose it’s mostly because I’m a lazy coder by nature; if chances are better than 50-50 that I’ll need to repeat some action in the future, I’ll probably at least take a stab at scripting or otherwise automating the process. Plus, in this case, it presented a worthy challenge, which is always more than enough to entice me into jumping in over my head. I’m a sucker for challenge, almost to the point of it being a character flaw.

In hindsight, perhaps if I’d stopped at the onset and thought things out in a rational, methodical manner, I would have saved myself not only a brooding headache, but almost two hours of time as well. Two hours. Think of all the things you could get done in two hours time.

Next time, I’m going to follow my own advice, and make sure I properly define what my work is going to be, lay out all the time and effort needed, and only then consider adding scope to my plate. Sure, automation is a wonderful thing, but only when used with discretion. Spending two hours to avoid doing something that ultimately took around 10 minutes to accomplish manually just isn’t worth it.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • LinkedIn
  • Google Bookmarks
  • Technorati
  • FriendFeed
  • Slashdot
  • Twitter