Posted by Josh | Posted in GTD, SQL Server, The Rookie DBA | Posted on 06-04-2010
Tags: GTD, perspective, rookie DBA, SQL Server, SSIS
I think pretty much everyone who knows me would say that one of my defining characteristics is my singular focus. Once I put my mind to solving some problem or accomplishing something, I damn well poke at it until it’s finished. Now, in many cases this trait has served me quite well; it’s what allowed me to learn programming and SQL Server with no formal training or even so much as a seminar (thank you Internet and amazon.com), and why I consider myself an above average problem solver.
One of my favorite teachers once told me that at times I “missed the forest for the trees.” He went on to explain that at times my work tended to find it’s conclusion too early, then ignore all other points of view. Narrow focus is often good in writing, but not when it excludes other angles of approach completely.
The same could be said of how I function as a DBA. Take a project I am working on: the design and build of an enterprise data warehouse, storing performance data for every SQL server in our environment. It’s been fascinating and a tremendous learning experience, but also at time a real challenge.
Early on one of the requirements was to collect performance data on a second-by-second basis, essentially forming a “baseline” of data to review and use for later comparison. Now being the aggressive lad I am, I expanded upon this as follows: the warehouse should now store second-by-sec0nd data for every server at all times, with a retention of 2 years. Yes, you read that correctly. Two years of data, with points at every second for fifty plus servers. That’s a lot of data folks (think 1,800 rows per minute, per server, or around 129,600,000 rows per day).
So, I set out to prove it could be done. Here’s a brief list of some of the challenges I’ve hit along the way:
- PerfMon’s built-in direct-to-SQL functionality scales downright awfully. The structure of the tables would make any decent DBA shudder (the primary key of the main, and largest, table has a GUID as it’s first column, as an example), the use of bulk inserts (not using MS’s own optimization schemes), and statistics issues to name a few.
- After abandoning that approach we are going with dumping the performance data into flat files, which are then picked up by a SSIS package at intervals, which required a custom script component acting as a data flow source. This is due to the fact that PerfMon records its data in .csv files with a dynamic and unpredictable number and order of columns. Custom logic had to be written to essentially perform a reverse pivot of the data into a more normalized form.
- Even fully optimized, the package simply could not keep up with the incoming flood of files. They would eventually pile up and consume disk space, not to mention having stale data in the warehouse. This was attacked by custom coding a script task in a parent SSIS package that would dynamically spawn multiple threads running a child package to import one file each, up to ten at once (I’m shocked by the way that this isn’t either a standard part of SSIS or that a suitable aftermarket component could not be found). Naturally this promptly brought the server to its knees, being a poor little VM with only four CPUs. But the files did import faster!
Now, throughout all of this, keep in mind that I was going far beyond the original and still stated requirements given to me. Why? Because dammit, I was out to prove that such a huge amount of data could be handled by a little VM running SQL 2008 Standard Edition.
Well, today I cried Uncle. In an effort to actually get some usable data into the system, we’re (read, I’m) reducing our goals to only baselining one server at a time, with others collecting a mere four data points per minute. And damn, I am disappointed we haven’t gotten there yet. But you can believe I’m not giving up. But at the same time, I look back and consider all the time I spent chasing what, pride?
My old teacher was right: I need to stop staring at the one big tree in front of me and enjoy the woods a bit more. Luckily, there’s this little methodology I follow called GTD, and something tells me it’s going to be very useful in this upward movement of focus.