Unit Testing T-SQL – Some Opening Thoughts

Posted by Josh | Posted in Life As A SQL Developer, SQL Server, T-SQL Programming | Posted on 06-09-2012

Tags: , ,

1

One of the first goals I have in my new role as a SQL Server developer is to learn how to write unit tests for SQL Server database code. I’ve started using the open source TSQLT framework, and I thought as I go I’d record my thoughts in an ongoing series.

What have I learned so far (in about a week’s time)?

Retro-fitting Unit Testing Is A Lot Of Work

This isn’t TSQLT specific, but I think it’s a valid point. One of my first projects is to add units tests to a number of existing stored procedures. It’s a lot of work, for a number of reasons. At least several of them actually don’t have much to do with unit testing, per say, but are just indicative of other issues. For example, creating unit tests is very difficult without solid and documented business requirements. In their absence, all you can do is try and reverse-engineer the code. This is even more difficult if the code is complex and not well commented.

In addition, if the stored procedures involve multiple tables, the necessary setup code can be lengthy. Just making sure all the tables have correct values is a significant amount of trial and error work. This alone accounted for hours of getting my first unit test up and running. When you do this, make a list of all the tables and go through each one methodically. Depending on the framework you are using, and how you are organizing your tests, you may want to do this work in “setup” code (meaning, code that executes once at the beginning of testing). TSQLT has this functionality; you just include a procedure called “SetUp” in your test suite.

FakeTable Is Your Friend

TSQLT has a stored procedure called “tsqlt.FakeTable”, whose function is to “[create] an empty version of the table without the constraints in place of the specified table.” This makes isolated testing very easy, without dealing with foreign keys or constraints. Naturally, if you’re actually testing those constraints (which, apparently is also helped by more TSQLT functionality) this is a bad thing. But if you’re just testing data returns or calculations, this can be immensely helpful.

Perseverance Is Key

This is tiring work, especially when having to create all the tests after the fact. But the feeling you get when you get one working is well worth it. It’s amazingly calming to know that you can make any change you want to a piece of code and instantly know if it breaks key functionality. It’s taken a lot of the anxiety out of working on some very critical code, especially as a relative newcomer to the field of database development. And once I’m done, I’m thoroughly convinced that the amount of time this will save in the future is well worth the effort.

That’s all for now. I’ll continue to record my thoughts in this tag as I get further down the road. In the mean time, if anyone has tips for a newbie, please feel free to share!

Finding All Access Groups For A Windows Login With sys.login_token

Posted by Josh | Posted in SQL Server, T-SQL Programming | Posted on 09-03-2012

Tags: ,

0

Recently I had a requirement to determine how a certain Windows user gained access to an instance of SQL. Let’s say, for example, you want to audit certain DDL code executions for some users, but not for others. We could do this at the individual login level, but that would be tedious to keep up. Instead, we want to use Windows domain groups to selectively enable the audit process. The trouble is, when a user is logged in all we see is their individual user name, not the name of the groups that, but virtue of them being a member, give them access to the server. Or do we?

I was reading a post by Erland Sommarskog on the subject of permission granting through stored procedures, and noticed his use of a system view called sys.user_tokens. This view (according to BOL) “Returns one row for every database principal that is part of the user token.” In essence, it shows one row per role or group that the user is part of. For example, if a user is a member of a Windows group that has access to the database, a row will be present for that.

This got me thinking, was there an equivalent for the server level? Sure enough, there is the sys.login_tokens view. This shows one row per server level authentication token. For some reason it seems to return duplicate rows at times, but when joined on the sys.server_principals table it does indeed seem to work as I hoped.

This way I can tell what domain groups a user is a member of that gives them access to the server.

Fun With Aggregation – TSQL2SDay #16

Posted by Josh | Posted in SQL Server, T-SQL Programming | Posted on 08-03-2011

4


It’s that time again! This month’s T-SQL Tuesday is being hosted by Jes Borland (blog | twitter), and the topic is: Aggregation!

When I think of aggregation as a T-SQL programmer, my first thought is usually “Why aggregate in T-SQL at all? Why not use Analysis Services?” Now, understand I’m in no way saying that one should never use functions like SUM, AVG, COUNT, etc in T-SQL; they can be very useful and perform well when used appropriately. But when I’m in a conversation and people start talking in phrases such as “Average over time” or “Comparison with the previous period”, I will usually jump in and say “Why don’t we use SSAS for this?”

One of my biggest projects at work was the construction of a data warehouse to store collected PerfMon data across all our SQL servers. Here are two common scenarios for how that data would be used:

  1. Baseline analysis and comparison – how busy is the server compared to others in the farm? Compared to itself last month? Last year?
  2. Point-in-time troubleshooting – The users complained that the application was slow at 11:21 AM on Tuesday; was the database server slow or was it something else? Using the collected data we could easily compare one specific period to a rolling average of that same time frame, and see if there were any appreciable differences.

For each of these use cases, I constructed some MDX scripts to easily allow my fellow DBAs to drill into the details of the cubes and get meaningful information back. So, I thought I’d share some of the tricks I used in writing them!

A Little Background

So that you can best understand the scripts, I wanted to briefly summarize the structure of the PerfMon cubes:
The cube consists of one measure and three dimensions:

  • The CounterData measure, which contains all the actual PerfMon data in decimal form.
  • The Calendar dimension, used to tie a piece of counter data back to the time at which it was captured. Contains information such as Date, Day Of Week, Month, Quarter, etc.
  • The Counter dimension, used to tie a piece of counter data back to a specific counter instance across all servers. Contains information such as Server Name, Object Name (e.g. Processor, Memory, Logical Disk, etc), Instance Name (e.g. ‘M:’ for logical drives), and Counter Name (e.g. “Avg Secs / Write”).
  • The Time dimension, used to tie a counter fact back to the specific time of day it was captured at.

Now, with that out of the way, let’s write some MDX!

Compare Two Periods

In this scenario, what we’re doing is looking at averaged data across two distinct periods of time, and finding the differences. We used this both for baseline comparison and for troubleshooting (i.e. to see if a server really was performing worse than average at a given time).

First, we’re going to create two named members:


WITH MEMBER	[CurrentPeriod] AS 
		ROUND
		(
			AVG
			(
				(
					[dim Calendar].[Date Key].[2010-08-08 00:00:00],
					{[dim Time].[Hour].[17]:[dim Time].[Hour].[19]},
					[dim Calendar].[Week Day].[All],
					[Measures].[Average Counter Value]
				)
			), 3
		), FORMAT_STRING = "#,0.000"
	 MEMBER	[ComparePeriod] AS 
		ROUND
		(
			AVG
			(
				(
					{[dim Calendar].[Date Key].[2010-06-01 00:00:00]:[dim Calendar].[Date Key].[2010-08-09 00:00:00]},
					{[dim Time].[Hour].[17]:[dim Time].[Hour].[19]},
					[dim Calendar].[Week Day].[6],
					[Measures].[Average Counter Value]
				)
			), 3
		), FORMAT_STRING = "#,0.000"

These two members are used to find the two average counter values from the two periods desired for comparison. In this case, we’re comparing Friday August 8th from 5PM – 7PM with all Fridays from 5PM – 7PM ranging from June 1st to August 9th. This might be used to compare a perceived “trouble spot” with the average across a long time span of similar periods. You could easily modify this to compare, say, the last three months to the same three month period the year before:

Next, we construct a member that compares the two time periods:

Finally, we can use these calculated members to construct a nice clean grid showing us a bunch of common PerfMon counters over both periods:

I used this query many times to justify adding additional capacity to our farm, or to tell an application team to take a hike and stop blaming my database server for their app’s poor performance.

Comparison With The Farm

What about finding out how one server compares to all the others in the farm? We can do that too.

First, construct a series of calculated members that find the sum total of counter values across the farm, then the count of servers, then calculates the average counter value:

Then, use that calculated member to look at how a particular counter value compares to the rest of the farm:

In this case, I was comparing processor use between the hours of 9AM and 10AM.

In Summary…

So there you have it: two great ways we can use the power of MDX and SSAS to get some very useful data out of an otherwise massive collection of numbers (the main fact table contains over 4 billion rows and counting). That, I think, is the essence of a good aggregation strategy: slim down an overwhelming amount of data into meaningful chunks.