Comparing SQL Result Sets With Powershell

Posted by Josh | Posted in Life As A SQL Developer, Powershell, SQL Server | Posted on 23-10-2012

Tags: , , ,

0

Lately I’ve been doing a lot of work re-writing some reporting code that was performing terribly. You know the old joke “Start it, then go get a cup of coffee, then come back”? Well in this case, that would be more like “Start it, get a cup of coffee, go for a run, read the morning news, eat lunch, take a nap, eat dinner, watch an hour of TV, then come back.” Ugh…

Obviously it’s important when modifying and tuning code to ensure that you don’t affect the results that come out of the procedure. Normally I like to do this using unit tests, but in this case the logic of the procedure was so complex (and relied on a bunch of underlying vendor code) that to create tests for it would have taken weeks. So instead, I opted for the strategy of simply comparing the old and new output, given a set of standard parameters. At first I did this manually, but after comparing 200+ columns a few times, I said to myself, “Self! This is silly, why not use Powershell to compare the results automatically?”

The result is the Compare-SQLResultSet function. It’s rough and currently doesn’t handle differently shaped results sets at all, but it has already been a huge time saver. I hope to improve it as time goes on, but wanted to get it out so others in the community could use it. Because if you’re doing this kind of comparison work manually… well, perhaps you should find a different line of work, because clearly you’re not getting it.

Introducing Execute-RunspaceJob

Posted by Josh | Posted in Powershell | Posted on 01-08-2012

Tags: ,

5

Recently I began experimenting in earnest with Powershell runspaces as an alternative to background jobs. My interest was mostly keyed by an excellent blog post on the subject by Boe Prox (blog | twitter). I’ve found that, in general, runspaces work really well for multi-threaded workloads, and, once you get past some initial hiccups, are easier to use than background jobs.

During my experimentation I had a bit of a realization around how runspaces are generally used. I found that I was generally doing the same things over and over again:

  1. Creating a script block to do the background work, which had one or more parameters.
  2. Creating a series of parameter sets containing the information about the individual “entities” to be processed.
  3. Instantiating various objects and settings for the runspace configuration, such as the number of concurrent runspaces.
  4. Starting the background threads and waiting for them to finish.
  5. Getting the data back from the runspaces, and returning warnings for any errors that occurred during processing.

After copying and slightly modifying code a few times, I said to myself, “Self! This is not good practice copying and pasting code all over the place. Why not write a reusable and generic function implementing this work and simply re-use it?” Thus Execute-RunspaceJob was born.

The function basically encapsulates and makes generic the work required to setup and execute a parameterized script block within background runspaces. It also handles tricky items such as error handling and parsing return data. Let’s look at a quick example of how to use it.

Let’s say that you have a function called Get-DiskspaceInfo, which retrieves, well, information about disk space on remote machines using WMI calls. You have a series of servers, say, twenty, that you want to collect this information from. You could certainly simply get this list of servers and pipe them into the function (because you are writing functions that accept pipeline input, right?), but that approach would not scale very well since it operates in a one-at-a-time mode. Instead, using Execute-RunspaceJob, you could have any number (up to overwhelming your computer, naturally) of concurrent background threads collecting this information, then have it return the resulting data to you after it was finished.

First, you need to construct the script block which actually executes the work:

Next, we need to construct a hashtable of parameter values. The function expects a hashtable where the key is some unique identifier for the row to be processed (like a server name, database name, file name, etc), and the value is a nested hashtable of “parameter name”=”parameter value” pairs.

Finally, we execute the function, letting the results be placed into an array variable. We use the “-ThrottleLimit” parameter to specify the number of concurrent operations that are allowed.

If any of the background operations fail, a warning message will be printed out on the screen. Once all the data is collected you can treat the array just like you would the set of data returned by a normal pipleine-style operation.

I’ve started using this all over the place, and found that it greatly increases performance of most “collect and return” type operations. For example, it will be used in my forthcoming adaptation of Alan Renouf’s excellent vCheck framework for SQL Server.

To download the latest version of the function, go here. And please, tell me if you see something wrong or have suggestions for enhancements!

The Value Of Logging–#TSQL2sDay 31

Posted by Josh | Posted in Powershell, SQL Server | Posted on 12-06-2012

Tags: , ,

0

TT150x150This month’s T-SQL Tuesday is being hosted by Aaron Nelson (blog | twitter), and is all about logging. As it happens my recent presentation at SQL Saturday 121 included a discussion around this subject in the context of keeping track of administrative activities. As a result, the topic is fairly fresh in my mind.

Rather than re-hash the whole thing, I’ll simply restate what I consider the most important point from that section of the presentation.

bart-simpson-generator.php

You can be logging everything and anything under the sun, but if you’re not reviewing and being alerted as needed, there’s really no point. How you do so is really up to you; use whatever tool you are most comfortable with. For some folks that might be SSRS, others Powershell (shameless T-SQL2sDay host brown-nose: this is my preference for collecting logs from multiple sources and displaying them all in one place), or even SSIS. It’s the act, not the method, that matters.

Personally, I like to review everything first thing in the morning, after I’ve filled my coffee cup but before I start reading over my e-mail. That way, any issues are immediately noticed and can be dealt with before I get distracted. Taking a tip I once heard from Sean McCown of MidnightDBA fame, I collect everything in one place and then send a single summary of problems, rather than dealing with individual e-mails. This makes the task seem much less daunting.

How do you collect and review your logs (of all sorts)? Because, of course, you are reviewing them. Right?

Adapting vCheck for SQL Server

Posted by Josh | Posted in Powershell, SQL Server | Posted on 11-06-2012

Tags: , ,

2

For those of you not already aware, VMware and PowerCLI guru Alan Renouf (blog | twitter) recently upgraded his excellent vCheck Powershell script framework to have a plugin friendly approach. (The framework was originally written to provide a daily report of issues identified throughout a VMWare environment.) As a result, numerous forks have been popping up for getting daily reports on all sorts of systems, from Exchange to System Center. I noticed that there didn’t appear to be anything for SQL Server, and pinged Alan to confirm if anyone had starting working on one. When he responded that no one had, I volunteered to take a stab at it. The framework is an impressive piece of work, with some robust HTML reporting features and extensibility. If you’ve got a need to do any kind of centralized, scheduled reporting on some infrastructure, I highly recommend you check out his work.

And here, dear reader, is where I need your help. I want to make this as useful and complete a daily health check as possible for all of us DBAs out there. I have my own list of items that I want to check on, but I want to get more input to make sure I’m covering all the bases.

The scripts will connect to a CMS server, and iterate over all the servers contained within while performing various health checks. At the end, a nicely formatted HTML report is delivered listing all the problems identified. Thanks to Alan’s work, all the thresholds will be completely configurable.

Here is a list of the checks I’ve thought of so far:

  1. Ping test (is the server responding to a ping)
  2. Last backup date (full, differential, log)
  3. Last DBCC date
  4. Disk space free percent
  5. Services running (SQL Server, SQL Agent)
  6. Database file space free percent
  7. High severity errors (17+)
  8. Failed logins (over a certain threshold of counts)
  9. Failed agent jobs

This is a short list, which is why I’d love to hear from you. What would you like to see in your mailbox every morning that will give you the best view of your SQL Server infrastructure?

One (Very Important) Goal For 2012 – AUTOMATE EVERYTHING

Posted by Josh | Posted in Life As A DBA, Powershell, SQL Server | Posted on 02-01-2012

Tags: , , ,

2

Yep, you heard me right. I am going to automate everything I do. Or, at least, try to.

I’m not going to talk in detail about why I’m doing this, other than to say:

  1. I have better things to do with my time than perform mundane, repeatable tasks.
  2. I recently re-read John Sansom’s excellent blog post on the subject, and, as is usually the case with John’s work, found it rather inspiring. Bravo John!

Instead, I’m going to talk a little about two kinds of “automation”, and why I generally pick one that might not, at first glance, be the right choice.

Full versus Partial Automation

I will define something that is fully automated as a task or operation which requires absolutely zero manual effort on my part. That is, I don’t have to push a button, flip a switch, type a command, or even know that it is happening, unless something goes wrong. An example of something that is already fully automated in my world is the weekly and daily database maintenance that I have installed, using Ola Hallengren’s excellent (and freely available) scripts. These run every night and ensure that my databases are backed up, corruption free, and optimized. Unless something goes *bump* and the jobs fail, I don’t get so much as an e-mail upon completion.

By contrast, a process that is partially automated requires some manual effort, but is still optimized such as to require as little work on my part as possible. I may need to open a script, enter a few parameters, or run a Powershell command, but outside of that, all the logic and processing is done behind the scenes, with perhaps a pretty little progress bar showing that work is, well, progressing. An example of this might be setting up of a server side SQL trace; I have a series of scripts ready that, given a database name and a path at which to place the trace files, will setup a standardized trace, grant rights on some signed stored procedures to allow non-DBAs to read the trace data, and even generate a README file to be sent to the requesting team with a series of instructions.

In an ideal world, I suppose that all of the mundane daily work I have to take care of would be in the former (fully automated) category. But in reality, I find that there’s a bit of a diminishing return on putting more and more effort into fully automating tasks. Consider this scenario / requirement: the request and creation of new databases.

At a basic level, the process we follow is something like this:

  1. A developer requests a new database via our in-house ticketing system.
  2. We (my team) ensure that the request at least contains the following details:
    1. The name of the database.
    2. The server on which to create it (or at least a version of SQL Server).
    3. The initial size of the database.
    4. The domain users or group that requires access to the database.
  3. Assuming we’ve got all the information required, we will go out to the database server where the databases will be placed, and determine a suitable location to place the data files. This is usually determined by simple the simple rule of “whichever volume has the most space”, with a few exceptions. If there is not enough space on the server, we’ll inform the requestor that there will be a delay, and start the process of ordering more storage.
  4. Once we know where the data and log files will be placed, we will create the database, take a full backup (to initialize the differential backup chain), and grant the access requested.

Now, I could certainly envision writing some kind of application that would allow developers to submit requests for new databases and process everything straight through based on the rules outlines above, similar to what I see on my web host’s control panel for MySQL databases. But, at the same time, constructing such as system would take a lot of time and effort. Contrast that with a partial automation solution, such as:

  • Provide an InfoPath form for the requests, forcing people to enter the necessary details in before submitting the request, thereby eliminating back-and-forth chatter.
  • Use a Powershell script that, when given a server name, database, and log file size, will connect via WMI and determine a list of suitable locations for placing the database files, then prompt the user to select one. When selections are made, the database is created and a full backup taken. This eliminates logging into the computer (to look at drives) and stepping through a bunch of GUI screens to create the database.
  • Have a second Powershell script that can accept a text file containing domain groups / users and role names, which will then grant the access required on a specified database. The file could be generated from the InfoPath form. Again, this eliminates some GUI clicks, and minimizes the chance of fat-fingering group / user names.

While this will not prevent someone from having to manually process the request, it will significantly cut down on the pain to do so, without a great deal of effort / complexity. Especially considering that these requests are fairly infrequent (perhaps one to three a month), I see no reason to spend additional time on providing a fully automated solution, when the partial one provides almost as much value.

Over the course of the year, I’ll be blogging about the various processes / means I use to accomplish this goal.

How do you automate your processes, and how do you determine a “break-even” point, if you will?

Averaging PerfMon With Powershell

Posted by Josh | Posted in Powershell | Posted on 10-07-2011

Tags: ,

0

I just wanted to post this quickly before I somehow lost the syntax I developed. What I was trying to do is gather some PerfMon counters for a set period of time and return some aggregated (averaged) results. Here’s what I ended up with:

$counters = (Get-Counter -ListSet "LogicalDisk");
$counterValues = ($counters.PathsWithInstances | Where {$_ -like "*Sec/*"} | get-counter -MaxSamples 10 | select -ExpandProperty CounterSamples);
$groupedValues = $counterValues | select Path,InstanceName,CookedValue,Timestamp | Group-Object Path;
$avgValues = $groupedValues | % {New-Object PSObject -Property @{CounterName = $_.Name; AvgValue = ($_.Group | Measure-Object CookedValue -Average).Average}};
$avgValues | Out-GridView;

This ends up working really well for what I needed. Now I just need to wrap it in a function, parameterize it, and enable it to interrogate remote machines via PowerShell remoting (less firewall hassles that way).

Invoke-Sqlcmd + Invoke-Command = Posh Awesomness

Posted by Josh | Posted in Powershell | Posted on 05-07-2011

0

I have a database that contains metadata for all the servers I support, such as server and domain names, instance names, SQL versions, etc. It proves very handy when trying to determine who owns what, or find out which servers need to be upgraded, etc.

Unfortunately, it’s also extremely frustrating to maintain. Right now all data is entered manually via T-SQL commands (I do at least have procedures that encapsulate some check logic) and it’s up to the operator to enter the data correctly. Given that we humans are prone to mistakes and slip-ups (not to mention laziness) that means that inevitably stuff gets out of date.

To try and mitigate this problem (and lighten my workload in the mean time), I’m in the process of writing some Powershell scripts that will iterate over the list of servers, attack them with some Posh remoting commands, and update anything in the database that is stale.

Let’s say we want to get back a list of servers and get some information from each of them. At first I thought this would be as simple as something like this:

Invoke-Sqlcmd -ServerInstance $ServerName -Query $SqlQuery -Database dbarepos | Invoke-Command -Cred $Cred -ScriptBlock $ScriptBlock

But then when I tried this, I got a rather vague error:

Invoke-Command : Parameter set cannot be resolved using the specified named parameters.

So apparently the Invoke-Command doesn’t immediately like getting that piped input. I have a hunch it was related to something Don Jones pointed out here, about how Invoke-Command binds piped input to it’s parameters.

In any case, my workaround was to just rewrite the code a little differently, which also gave me some more verbose progress output:


$servers = Invoke-Sqlcmd -ServerInstance $server -Query $sqlcmd -Database $db;
$servers | Foreach-Object {
Write-Host Querying $Server.Name;
Invoke-Command -ComputerName $Server.Name -Cred $Cred -ScriptBlock $ScriptBlock;
}

This ends up accomplishing the same thing, albeit in a bit more code (but not much, so I’m fine). I tested this and in about 2 minutes I could pull back a tremendous amount of information on all the servers, ranging from memory to PerfMon stats to even SQL Server specific information like database names and sizes. Truly a timesaver; I really should use Powershell more often!