Upgrading the OS of a SQL Server Failover Cluster – Same Hardware, Same Name

Upgrading the Operating System, or Windows Server, currently can seem a bit daunting because there is no rolling upgrade option available.   In the next version of Windows Server there will be a Windows Cluster Rolling Upgrade feature, but you will need to already be on Windows Server 2012 R2 (http://windowsitpro.com/windows-server/failover-clustering-changes-and-improvements-windows-server-vnext). In this post I will walk you through the a strategy on migrating you SQL Server failover cluster to Windows Server 2012 R2 from Windows Server 2008 R2 to that you a ready to take advantage of the rolling upgrade.

Scenario Objective

In this scenario I am:

  • Using an existing failover cluster where my space is limited so I want to use the same hardware.
  • Retain the same name.
  • Avoid creating as much as work possible for my SAN, network and domain admin guys.

I would like to note that the setup I did this on has some key elements in it that make it easier for the transition, some of which you can do yourself if you choose to.   I’ll use this opportunity now to outline the environment that I’m doing this for so that you can keep in mind of slight variances that may affect you in your environment.

  • 3 Physical nodes in a Majority Node quorum configuration
  • Fiber Channel connectivity to a SAN
  • 10+ instance of SQL Server 2012 installed running the nodes in an Active-Active-Passive
  • Over 100 LUNs (disks from the SAN) are used
  • A letter assigned drives are under 1GB and used only as a landing point for the rest as mount points.
  • Each installation is assigned its own drive letter.
  • Separate OUs in Active Directory for different OS with SQL server having its own OU under each

This may or may not be a bit beyond what you have setup but there are some challenges I must face.   First of all, I am going to have to wipe each machine and do a fresh install of Windows 2012 R2, which in itself could be challenging just on software compatibility, never mind hardware compatibility.   Assuming that that is ok you cannot just rejoin the box to the cluster, you have to setup a new cluster.   I also have to face the fact I am going to somehow move over 100 disks, which would be a fun conversation with your SAN guy. In this case I am also dealing with about 12 instance of SQL Server. Luckily in this setup I have several advantages which will come to light later on.

 

Preparation Checklist

Some of these might have already been done due to best practices:

  • Make sure your SAN admin has labeled your LUNs with the paths that they are assigned on the host nodes it will make it easier for the transition

FCAdvantage

 

  • Get the IP address and names used for each SQL instance
  • Create a new DNS entry for the new failover cluster name
    • This is just the name used to manage the cluster, but we are setting up a new cluster so you will need a new entry for it.
  • Different OUs are ideal because there will be some settings just easier to manage through Group Policy

 

Rebuilding the First Node

In this case I have 3 machines in this scenario, you may have less or you may have more so adjust accordingly.   The other cool thing I know is that I can have a one node cluster if I need to and in this case I will be removing one node from the old cluster and creating a new cluster with it using a new DNS entry for it. This also means that I will need to add a witness share to the old one after the eviction so that I have an odd number of votes in my quorum.

  1. The previous step would have created an AD object based off of the DNS name. Make sure the following is done
    • Make sure it has the ability to create child objects in the OU these servers are going to and make sure it is in that OU as well
    • Go to each SQL Failover Cluster name AD object and make sure it has full control
    • FC_OU
  2. In the Failover Cluster Manager on the Windows Server 2012 R2, in this case the evicted node, and create a new cluster using the new DNS entry that you created earlier.
  3. Assuming you only have one disk which is Disk 0 and C:\ is on that you can to the following to take the disks offline using Powershell (you can create a loop around it which I eventually did because I had over 100 disks to take offline
    • Set-Disk [-Number] –IsOffline $True
    • After doing this you will find that Disk Management GUI interface will work.
  4. Re-enable the fiber channel
    • FCNtfserrors
  5. Install all the vendor applications related to your SAN (thank the SAN admin)
  6. Move the server object in Active Directory to the appropriate OU (ie Windows2012R2\SQL)
  7. Add on your .Net Features, Failover Clustering, Multipath IO, etc.
  8. Wipe evicted node and install Windows 2012 R2 (see vendor docs for this) as well as setup any permissions that the SQLS service accounts may need for your setup.
  9. Disable the fiber channel ports on the switch for the evicted node or simply pull the cables out
    • You do not want the fiber channel talking to the SAN just yet otherwise you installation of Windows will be a difficult task.
  10. Change the quorum of old cluster so that you have an odd number of votes.
  11. Evict a node from the cluster
    • Can be done from the Failover Cluster manager by simply going under nodes, pick your passive node, right click and Evict node.
    • You will also need to make sure that the cluster AD object has access to the SQL instance DNS entry
    • FC_DNS
  12. Now you can validate the newly rebuilt node and add it to the new Failover Cluster and are now ready for some SQL Server migration.

 

Migrating a SQL Server Failover Cluster Instance

In this section we will experience outage during this process. Depending on comfort level of the steps and the speed of you systems you can migrate an instance in under an hour. It greatly helps if you already have a downloaded copy of the SQL Server build you want that already has the latest service pack rolled into it, applying a CU after the install will not require a reboot.

  1. Take the entire failover cluster resource of the SQL instance you want to move. This includes services, disk, IP address, and name.
  2. Move the SQL Active Directory object to the OU of where the new Windows 2012 R2 objects are.
  3. In the Failover Cluster Manager on the new node in the new cluster go to the Disk section and select add disk
    • FC_adddisk
  4. This list should consist of the disk that you just took offline for that instance. Which makes it easier to identify what you need for that instance. I can also immediately tell which disk will be my root one because my root ones are less than 1GB, otherwise I’d have to match up the signature ids.
  5. You now either through PowerShell or the Disk Manager you need to find the root and assign it a drive letter. Since I used mount points for the rest of the disk they will appear along with the root.
  6. Back in the Failover Manager, now would be a good time to rename the cluster disks so you can easily identify them later.
  7. At this point you should be able to access the disk and the mount points underneath.   You will want to traverse to the location of where the system databases are and make a copy of them elsewhere.
  8. If this was an upgraded system (your system database are in an old path) then just leave the databases there. If this came from an original install (never inline upgraded before) then you will have to delete the systems database files including tempdb.
    • NOTE: it is important to know where the paths are for the system databases are and where the installation is going to. In this example I will be using a previously inline upgraded instance.
    • FC_paths
  9. Start the SQL Server installation. Change the instance path to match what it was on the old system. Do not change the paths of the system databases or anything else. So in this case I will be using the MSSQL11 path for the installation even though the system databases are in the MSSQL10 path.
    1. You are doing add Node to a Failover Cluster installation
    2. Use the same names you have in the other failover cluster
      • FCInstallNames
    3. The root drive should be available during the install if not then something has gone wrong with the cluster disk itself.
      • FCFindDisk
    4. Use the same IP address you use for the failover cluster
    5. Make sure the collation matches the old install. If you forgot to get this information you can go on the old node, open up the registry (regedit) and go to HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>Setup>Collation
    6. Add yourself for local admin and make sure you match Windows only or Mixed Mode for Authentication.
    7. In the Data Directories this is where we will point it to the same installation path. Again you can get an idea if you go to the registry and go to HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>Setup>SQLDataPath
      • FCInstallPaths
    8. You may need to setup FileStream here if you had it previously setup as well
  1. Install all the SQL Server patches to bring it to the same level as it was before. If you had the service pack already rolled in it will save you a lot of time and a reboot.
  2. Make all the other drives dependent on the parent drive in failover cluster
  3. Change the TCP ports to match what it was on the old system.
    1. This is only relevant if it not the default port and you have firewall rules and/or SPN for delegation setup
  1. Take the SQL Server service offline.
  2. Depending on to location of the system database you can 2 scenarios
    1. If the instances was ever previously involved in an inline upgrade then you can simple go to the registry and change the location of the Master Database files and Errorlog to where the older one is. ***Below shows example of major build version 11 vs 10
    2. HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>MSSQLServer>Parameters
    3. If the instance was never previously involved in an inline upgrade then you will take the copy of the system databases you previously backed up and overwrite the new installed ones.
  1. Bring the SQL Server Service back online.
  2. Done. Enjoy a beer.

 

Rolling Back

If you are seasoned you probably already know that nothing always works perfectly and you should have a roll back plan. With this scenario even at the very end I still have my old cluster with everything needed to run I could just shut down this instance, or uninstall, or maybe it did not install all the way, but I could turn the old instance back on as long as I do one thing, move the AD object back into the same OU it was in previously. That may work without a hitch or you may have to recreate the name in the failover cluster which is just reading the DNS entry to the failover cluster. Get past that and it should come back online no worries.

 

Finishing it off

Basically I repeat the process until I have enough moved over that I want to evict my next node, rebuild that and join it to the cluster and redo all the installations by adding it to the cluster, then finish moving any leftover instance and rebuild the last node. And at the end of it all you can clean up the left over AD object and DNS entry for the old cluster manager object and you are no on a fully new functioning system.

 

Final Thoughts

This process I was able to reuse the system databases, the existing LUN mappings and the utilization of map points to make this transition easier.   In my next iteration of this topic I will be looking at transferring to new hardware using a similar technique by reusing system databases and names but causing my SAN admin grief by mapping my 100s of LUNs so it will cost me in beer. I would like to stress that you need to be at the same SQL Server build level to be able to reuse the system database technique, this is not a SQL Server upgrade but simply an OS upgrade.

 

 

 

 

 

 

Standard

Techfest Vancouver 2009

Well I presented at TechFest Vancouver 2009 at BCIT on Saturday. My presentation was on T-SQL in SQL Server 2008. I was glad to be the first one up because there were a couple of SQL Server MVPs (Greg Larsen and Todd McDermid) following afterwards.

The presentation itself went well and I think the timing went smoothly. Luckily I only had one Americano otherwise I would have been twitchy.

I did contemplate installing SQL Server 2008 R2 on my laptop before the presentation; more or less because I was curious if much has changed in the T-SQL realm when jumping from 2008 to 2008 R2. Of course I didn’t because Presentation Skills 101 says that you shouldn’t mess with your demo environment just before the presentation. Now that the presentation is over I intend to install R2 (Nov CTP) to see what has changed.

For those of you looking for my presentation and demo sample you can find them at http://www.vancouvertechfest.com/Downloads/tabid/450/language/en-US/Default.aspx .

Also, the session was recorded just keep going back to http://www.vancouvertechfest.com to see if it has been uploaded yet.

Standard

Case of the Non-Yielding Schedulers in SQL Server 2005

The worst kind of issues are the kind that leave you with almost nothing to go on.  On September  15th, I came upon this issue, with my co-DBA, on a production cluster, SQL Server 2005 x64 Enterprise SP3 and then some, where it would seem at all connections to the server would stop for about 10 minutes and then SQL Server would restart itself and everything would work.  Granted, 10 minutes is not a lot of time, but if it occurs at the wrong time it can be devastating to just about any business.

 Needless to say after going through the logs all that was left was a SQL mini-dump log of this:Capture7

A first we thought perhaps it was a fluke, but then it occurred a couple week later.  This is when we opened a call to Microsoft who then told us to put the trace flag -T2330 into the Startup Parameters of the clustered instance.

 Then we thought that is was perhaps a NUMA issue introduced in one of the Cumulative Hotfix Updates.   As it turns out the trace flag actually turns of  tracking if Index Usage stats collection.  In this case an index was being created on a temp table in the tempdb, furthermore this is a known issue in SQL Server 2005, but it said to have been fixed in SQL Server 2008.

 So for anyone seeing mysterious non-yielding scheduler in the error logs try adding -T2330 to the startup parameters or upgrade to SQL Server 2008.

 I did recently notice however that recently this KB article came out on November 2nd, 2009 http://support.microsoft.com/kb/974205 which suggests that CU6 should fix the issue, personally, I’m going to leave that trace flag in until I migrate to SQL Server 2008, because my users notice 10 minutes of outage!!!

 – Rich Baumet

Standard

PASS Summit 2009 Reflection

This years’ PASS Summit was from beginning to end filled with information and fun.   Even during tough economic times it seems that PASS seems to thrive!   Ok, so the booze was no longer covered and there was no mass Guitar Hero time, but the community itself thrived!   You could still feel the spirit of the summit as you talked to new people and reunited with the same crew.

This year had a few first’s for me:

                1. Twitter – I try not to bend to the will of popularity but there is some sustenance to this “mini-blog” (quoting someone one this but fail to remember who…) My twitter name is @RichBaumet

                2. Karaoke – Again, I bent to the will of popularity but for good reason.  There are some serious voices in the PASS community (not mine)

                3. Itzik Ben-Gan – I thought I understood most thing about T-SQL until Itzik.  Never been to one of his sessions until pre-con and I may very well see his session every chance I get.

                4. Kimball – This year I tried to go to something different and this is one of those that I will go back to the recordings to watch again. 

                5. Women in Technology Luncheon – Pleasantly surprised by the passion and the turn out.  I think it was important for me to take for I can be oblivious to some issues.

So all in all I learned something new, got answers to some questions I had, met new people,  and reconnected with old friends.  Summit 2009 was the best one yet in terms of content and community, and it didn’t need Guitar Hero to achieve that! 

So what can top that next year?  Tweets suggest a duo presentation of Buck Woody and Paul Randal in kilts!  So if you missed out in 2009, then you better signup for 2010

On a final note, hats off to PASS and the entire community that came out because it rocked!  And I apologize for not listing off everyone I met and had fun with for fear of forgetting someone, but I’ll see you all again next year.

 – Rich Baumet

Standard