Posted by Josh | Posted in SQL Server | Posted on 04-04-2011
This is part three of a series cataloging my search for the ideal high availability (HA) solution for my development and QA environments. The other parts:
- Part 1 – an introduction to why dismissing the need for HA in development / test systems is not necessarily the right decision.
- Part 2 – a quick poll of the community to see what others are doing.
In part three, we’re going to take a look at the first option for providing HA: clustering.
So what exactly is a cluster? From MSDN:
A cluster is a group of independent computer systems, referred to as nodes, working together as a unified computing resource. A cluster provides a single name for clients to use and a single administrative interface, and it guarantees that data is consistent across nodes.
Windows Clustering encompasses two different clustering technologies. These technologies implement the following two types of clusters.
- A network load balancing cluster filters and distributes TCP/IP traffic across a range of nodes, regulating connection load according to administrator-defined port rules.
- A failover cluster provides high availability for services, applications, and other resources through an architecture that maintains a consistent image of the cluster on all nodes and that allows nodes to transfer resource ownership on demand.
In this post we’re going to look exclusively at failover clustering. Network load balancing will be looked at in a later post, combined with the log-shipping HA methodology.
Briefly, a SQL Server failover cluster is composed of one or more nodes (yes, you can build a one-node cluster, though it obviously negates the HA benefit), each with SQL Server installed on it. The nodes are backed by a common set of shared storage, such as a series of SAN-provided LUNs. The shared storage houses all SQL Server-related files (except the binaries, which are installed locally on each node), as well as any other items that are part of the cluster, such as backup volumes or custom components. An example of the latter might be custom PowerShell or other scripts used by SQL Server Agent jobs or whatnot.
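To make the moving parts concrete, here's a deliberately simplified Python model of that arrangement: several nodes, one set of shared storage, and a single SQL Server instance (resource group) owned by exactly one node at a time. When the owning node fails, ownership moves to a surviving node. The class and node names (`FailoverCluster`, `SQLNODE1`, and so on) are hypothetical, and the model ignores everything the real Windows Failover Clustering service actually handles, such as quorum, heartbeats, and storage arbitration. It's a sketch of the idea, not of the implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A cluster node. SQL Server binaries live on each node's local disk."""
    name: str
    online: bool = True


@dataclass
class FailoverCluster:
    """Toy model: one SQL Server resource group, owned by one node at a time."""
    nodes: List[Node]
    # Shared LUNs every node can reach: data, logs, backups, Agent job scripts.
    shared_volumes: List[str] = field(default_factory=list)
    owner: Optional[Node] = None

    def bring_online(self) -> Optional[Node]:
        # Start the resource group on the first node that is up.
        self.owner = self._first_online()
        return self.owner

    def fail_node(self, name: str) -> Optional[Node]:
        """Mark a node as down; if it owned the resource group, move ownership."""
        for node in self.nodes:
            if node.name == name:
                node.online = False
        if self.owner is not None and self.owner.name == name:
            # In a real cluster the shared disks fail over with the group,
            # so the new owner sees the same data files.
            self.owner = self._first_online()
        return self.owner

    def _first_online(self) -> Optional[Node]:
        # None means no surviving node, so the instance is simply down.
        return next((n for n in self.nodes if n.online), None)


if __name__ == "__main__":
    cluster = FailoverCluster([Node("SQLNODE1"), Node("SQLNODE2")],
                              shared_volumes=["Data", "Logs", "Backups"])
    print(cluster.bring_online().name)         # SQLNODE1
    print(cluster.fail_node("SQLNODE1").name)  # SQLNODE2

    solo = FailoverCluster([Node("SOLO")])     # the one-node cluster
    solo.bring_online()
    print(solo.fail_node("SOLO"))              # None
```

The one-node case returns `None` after the failure: there's nowhere for the instance to go, which is exactly why a single-node cluster gives you the cluster plumbing without the HA.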
Prior to Windows Server 2008, Microsoft only supported clustering between identical hardware, and that hardware had to be listed in the Windows Server Catalog. Basically, this is a huge list of hardware configurations that Microsoft has certified as supported. If your hardware isn’t on that list and you call Microsoft for support with a cluster issue, they will “offer troubleshooting tips” (reference this KB), but a resolution is not guaranteed.
Now, in my environments (and I would guess in many other development environments), the vast majority of systems are virtual machines rather than physical servers. That being the case, the place to start is this KB article from VMWare, which lists all the documents and links to information about clustering under VMWare. To me, the most useful one is the link to the Cluster team’s blog, wherein we find this tidbit:
Windows Server 2003
For a cluster solution to be supported by Microsoft it must be a tested solution which has been qualified and verified to function properly with the Failover Clustering feature. The full Windows Server 2003 cluster support policy is documented here: http://support.microsoft.com/kb/309395
When a cluster solution has been qualified it will receive a ‘Designed for Microsoft® Windows® Server 2003’ logo and be listed on the Windows Server Catalog under “Cluster Solutions”.
Two separate VMware configurations have received a logo and are supported in Windows Server 2003 with vSphere 4.0 and EMC storage. One configuration is with EMC V-Max storage and the other with EMC CLARiiON CX4 storage. Details can be found here:
These are the only two supported Windows Server 2003 guest clustering configurations. The Windows Server 2003 cluster logo program stopped accepting new submissions as of 12/31/09, no additional configurations will be added in the future.
Basically, for Windows Server 2003 guest clustering, only the two configurations listed are supported. Outside of those, you’re on your own.
For Windows Server 2008, the outlook is slightly better:
Windows Server 2008 / Windows Server 2008 R2
The Microsoft support policy for Failover Clustering radically changed with Windows Server 2008 to become much more flexible. The following criteria must be met for a solution to be supported by Microsoft:
1. On a host Windows Server Failover Cluster all hardware and software components must meet the qualifications to receive the appropriate “Certified for Windows Server 2008” or “Certified for Windows Server 2008 R2” logo.
a. If a guest cluster is running inside a virtual machine on non-Microsoft hardware virtualization software, the virtual machine must be hosted by a virtualization solution that is listed in the Server Virtualization Validation Program
2. The solution must not fail any of the tests in the cluster Validation tool. With virtualized servers in a cluster, run the cluster validation wizard as you would with any other new cluster. The requirement for running the wizard is the same regardless of whether you have a “host cluster” (where failover will occur between two physical computers), a “guest cluster” (where failover will occur between guest operating systems all on the same physical computer), or some other configuration that includes one or more virtualized servers.
I read this as “so long as your hosts and VM provider are certified, and the cluster validation wizard passes you, you’re supported.”
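Condensing the two quoted policies (plus my reading of the 2008 one) into a checklist gives something like the Python sketch below. It's my paraphrase for illustration, not an official Microsoft or VMWare tool. The `GuestCluster` fields and the `is_supported` function are hypothetical; the certification and SVVP flags stand in for checks you'd do by hand against the Windows Server Catalog and the SVVP list. It also assumes a non-Microsoft hypervisor (VMWare), since that's the case where the SVVP requirement applies.

```python
from dataclasses import dataclass

# The only two logoed Windows Server 2003 guest-cluster configurations,
# per the Cluster team's blog post quoted above.
LOGOED_2003_CONFIGS = {
    ("vSphere 4.0", "EMC V-Max"),
    ("vSphere 4.0", "EMC CLARiiON CX4"),
}


@dataclass
class GuestCluster:
    windows_version: str         # "2003", "2008" or "2008 R2"
    hypervisor: str              # e.g. "vSphere 4.0"
    storage: str                 # e.g. "EMC CLARiiON CX4"
    components_certified: bool   # all components carry the 2008 / 2008 R2 logo
    hypervisor_in_svvp: bool     # listed in the Server Virtualization Validation Program
    validation_failures: int     # failed tests reported by the cluster validation wizard


def is_supported(c: GuestCluster) -> bool:
    """Hypothetical paraphrase of the quoted support policies; not an official tool."""
    if c.windows_version == "2003":
        # Windows Server 2003: one of the logoed configurations, or nothing.
        return (c.hypervisor, c.storage) in LOGOED_2003_CONFIGS
    if c.windows_version in ("2008", "2008 R2"):
        # Windows Server 2008 / 2008 R2: certified components, an SVVP-listed
        # hypervisor (assumed non-Microsoft, e.g. VMWare), and no failed validation tests.
        return (c.components_certified
                and c.hypervisor_in_svvp
                and c.validation_failures == 0)
    return False  # anything else is outside what the quoted policies cover


if __name__ == "__main__":
    print(is_supported(GuestCluster("2003", "vSphere 4.0", "Other SAN", True, True, 0)))     # False
    print(is_supported(GuestCluster("2008 R2", "vSphere 4.1", "Other SAN", True, True, 0)))  # True
```

For Windows Server 2003 the certification and validation fields are ignored entirely: only the exact hypervisor/storage pairing matters, which is what makes that branch so unforgiving.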
There is also an excellent whitepaper from VMWare on the subject, available here. Some points of note:
- It includes a detailed, step-by-step set of instructions on how to configure VMs to allow for clustering.
- Only Windows Server 2003 and Windows Server 2008 R2 are considered supported.
- By configuring your VMs in this manner, you lose several VMWare features, including DRS and live vMotion.
Note: my VMWare admins tell me that these settings also break snapshots and vRanger backups, though that’s not mentioned in the whitepaper.
For SQL Server 2008 R2 / Windows Server 2008 R2, the cluster solution appears valid, albeit with a somewhat complicated setup and at the cost of DRS and live vMotion capability. I can vouch that the configuration works as advertised once you get all the pieces right. If you’re looking to implement this, do yourself (and your VMWare admins) a big favor: read the [fantastic] manual.
On the other hand, for Windows Server 2008 / SQL Server 2008 and Windows Server 2003 / SQL Server 2005 (our standard builds), this doesn’t appear to be a viable solution. Windows Server 2008 (non-R2) isn’t among the versions the VMWare whitepaper considers supported, and the two specific Windows Server 2003 setups listed in the Windows Server Catalog (linked above) aren’t in use where I work. Short of purchasing physical hardware or relying on hand-me-downs as other setups are decommissioned (neither of which I would consider a good approach), I don’t see a way to get a supportable configuration.