T-SQL Tuesday #20 – Best Practices (And Those Who Don’t Follow Them)

It’s T-SQL Tuesday time again! This month is being hosted by Amit Banerjee (Blog | Twitter), and the topic is “Best Practices.”

I could talk all day about the various best practices I require in my environments, but I thought a more novel topic would be why people don’t follow them. I’ve only been in the business a little over two years (four in IT), and I’ve already heard enough excuses to fill a small volume. But we’re all busy people, so I’ve decided to post just what I’d consider the top two most frequent whines (and they definitely make up the lion’s share).

Without further ado…

We Don’t Have The <Time, Money, Manpower>

I’m a firm believer in the triangle of project management; you can have something cheap and fast (but poorly written), you can have it good (as in high quality) and fast (at a high cost), or you can have something good and cheap (but it will take a long time). All too often the people holding the cash choose the first one: fast and cheap. I would bet this is especially true in startups or new business ventures, where being first to market at 80% quality is better than being fourth to market at 100%.

Honestly, this is one that we, as SQL professionals, usually can’t do much about. There have been many cases in my career where I reviewed some SQL code and immediately flagged it as bad for various reasons, only to be told “We don’t have time to fix it before it goes live,” or “It’s not in our scope to make the changes you’ve requested.” Early on I would kick and scream, telling my boss there was no way I’d let that code into my systems. Almost inevitably I was overruled and the code was released, and my tantrums only served to increase the bad blood between my group and the development teams. Sometimes the code would blow up and we’d be forced to fix it ourselves (because we were the only ones who knew how; more on that later), or release a hastily constructed patch that half the time probably only made things worse. It is incredibly frustrating, to say the least. Eventually I learned that this is largely out of my control, and all I can do is explicitly document my findings and make it known that if the code is released, it is supported in a limited, if-it-breaks-you’re-SOL manner. We’ve also tried to engage our groups earlier in the process, before the project scope is set, so that we can ensure enough time, money, and manpower is thrown at the problem to solve it correctly.

We Don’t Know How To Do That

I’ve often heard the excuse of “We don’t know how to follow your standard,” or “We don’t know how to follow that best practice.” Many times when I ask why not, the answer is “We’re not SQL developers.”

Yes, you read correctly. They’re not SQL developers, so they don’t know how to write SQL properly. I’m sure you can guess my next question: “So then why are you writing T-SQL code? You wouldn’t expect me to write C# multi-threaded applications, would you? So why should you write T-SQL when you don’t know the difference between an index scan and a seek?”

Inevitably this is met with some level of blank stare, followed by more excuses of the “We can’t afford a SQL developer” or “SQL developers are hard to come by” variety.

To be honest I don’t know why this is the case, but it’s been my experience (and from what I’ve read I’m not alone) that SQL Server knowledge is often a huge gap on most development teams. The assumption is that someone who is a good .NET programmer and happens to have written a few queries can suddenly become an enterprise data architect, designing a scalable, secure, and high-performing system to process hundreds of thousands of transactions an hour. One would think that the database layer would be held sacred, since it is responsible for ensuring the enterprise’s data is kept safe; instead it often seems to end up more like an experimental lab, with structures designed on Etch-A-Sketches and napkins, not ER modeling tools.

So what’s a DBA to do? In our case, we’re attacking this on three levels:

  1. Bring in one or more true SQL Server developers, and have them handle the really complex stuff.

    Can’t afford to have a SQL guru on every group? At least bring in one or two and let them handle the foundational work up front, such as designing the database structure and the complex retrieval patterns. Maybe the script-kiddie in the next cubicle can handle some really simple CRUD procedures, but leave the fancy stuff to the pros.
  2. Have regular sessions where staff are invited to get free in-house training on SQL Server.

    We have bi-weekly Lunch ‘n Learns, where members of development teams are invited to come and participate in sessions that include watching webinars, conference lectures, presentations by in-house SQL gurus, or round-table discussions. The participation waxes and wanes, but overall the response has been positive and at the very least, we can begin to see who really wants to learn.
  3. Grade developers on their code, and make the results public.

    We’ve begun collecting detailed grades on submitted T-SQL code, on dimensions such as security, efficiency, readability (because who wants to read a single-line, 5,000-word query? Yes, I’ve seen one), and documentation. These grades are collected in (of all things) a data warehouse, where they are reviewed and published to the development managers. This helps us in two ways: 1) spotting diamonds in the rough, those folks who, given some encouragement and guidance, could develop into good SQL developers; and 2) finding repeat offenders who, despite assistance, are not improving and are costing the company time and money in fixing their poor code. The system has only been in place for a short time, but we’re excited about the transparency and intelligence it will offer.

There are many others…

Certainly there are many other reasons teams don’t follow standards. I’d love to hear what others think are the top reasons why standards don’t get followed in their company (or in the SQL world in general)!

Covering Index = Epic Win, Part 2

In part one of this series, we looked at how a non-clustered index can become a real performance bottleneck thanks to a nefarious little fellow called the “bookmark lookup”. We showed how this operation, which SQL Server uses to retrieve requested columns that are not part of the index, can result in excess I/O and generally poorly scaling performance.

In this part, we’ll look at how SQL Server will even go so far as to ignore an index completely, just to avoid our little friend the lookup.

First off, let’s look again at the execution plan for the SELECT query shown in part one:
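The original screenshot isn’t reproduced here, but the shape of the part-one setup can be sketched as follows. The table, column, and index names are hypothetical stand-ins, not the actual schema from part one:

```sql
-- Hypothetical stand-in: a table with a clustered primary key on OrderID
-- and a non-clustered index on the filter column only.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);

SELECT OrderID, CustomerID, OrderTotal   -- OrderTotal is not in the index
FROM dbo.Orders
WHERE CustomerID = 42;
-- Plan shape: Index Seek on IX_Orders_CustomerID, plus a Key Lookup against
-- the clustered index for each matching row to fetch OrderTotal.
```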

We can see that the most expensive part of the query is the “Key Lookup” operation, by a margin of 99 to 1. Pretty obvious that it would be the bottleneck in this case.

This particular query returned exactly one hundred rows. Now let’s see what happens if we increase the number of returned rows a little bit.

This query returns around one thousand results. Let’s look at the execution plan here:
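For reference, the only change from the first query is a wider predicate. Using the same hypothetical schema as before, it might look like:

```sql
-- Same hypothetical table; the wider range matches roughly a thousand rows.
SELECT OrderID, CustomerID, OrderTotal
FROM dbo.Orders
WHERE CustomerID BETWEEN 42 AND 99;
```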

Whoa, wait a second. Why is SQL Server using a clustered index scan operation? Don’t we have an index on the column being filtered on?

Well, yes, we do. But see, SQL Server is generally pretty smart about choosing how to execute its queries (though at times I might debate that statement). And while I can’t speak to exactly how it makes these decisions, I’d wager that one of the deciding factors in constructing an execution plan is I/O cost. So, let’s look at the output of STATISTICS IO for this query.
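If you want to follow along, STATISTICS IO is enabled per session. Using the hypothetical query from above:

```sql
SET STATISTICS IO ON;

SELECT OrderID, CustomerID, OrderTotal
FROM dbo.Orders
WHERE CustomerID BETWEEN 42 AND 99;

-- The Messages tab then reports per-table I/O along the lines of:
-- Table 'Orders'. Scan count 1, logical reads 549, physical reads 0, ...
```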

549 logical reads total. Now, let’s force SQL to use our index by putting an index hint in the query:
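With the same hypothetical names, the hint goes in the FROM clause as a table hint:

```sql
SELECT OrderID, CustomerID, OrderTotal
FROM dbo.Orders WITH (INDEX(IX_Orders_CustomerID))  -- force the non-clustered index
WHERE CustomerID BETWEEN 42 AND 99;
```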

Now you’ll have to take my word for it, but the execution plan now looks identical to the one we saw in part one. However, let’s look at the STATISTICS IO output:

Well, that looks pretty bad. The number of logical reads increased from 549 to over two thousand. Now we see why SQL Server might choose to simply scan the clustered index.

So, why would it choose to use the index in the first case? Well, let’s look at STATISTICS IO for our original query:

Ok, so a total of 217 logical reads, and the index is being used. Now, let’s force it to use the clustered index and see what we get.
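Forcing the clustered index works the same way; in an index hint, INDEX(1) always refers to the clustered index (again using the hypothetical table from above):

```sql
SELECT OrderID, CustomerID, OrderTotal
FROM dbo.Orders WITH (INDEX(1))   -- 1 = the clustered index
WHERE CustomerID = 42;
```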

See? In the case of our first query, SQL Server again made the right choice, picking the operation that resulted in only 217 logical reads, versus 549 for the clustered index scan.

Somewhere, a SQL developer is probably yelling “See?! Why do you doubt me?!”

OK, SQL Server, we’ve now shown that you make the right call about which operation will be the least expensive way to service the query we gave you. But that doesn’t mean we can’t give you better tools to do the job. That’s what we’ll be covering in part three of this series. I promise we’ll get to the “epic win” part there, too.