Upgrading the operating system of a Windows Server failover cluster can currently seem a bit daunting because there is no rolling upgrade option available. The next version of Windows Server will have a Cluster Rolling Upgrade feature, but you will need to already be on Windows Server 2012 R2 (http://windowsitpro.com/windows-server/failover-clustering-changes-and-improvements-windows-server-vnext). In this post I will walk you through a strategy for migrating your SQL Server failover cluster from Windows Server 2008 R2 to Windows Server 2012 R2 so that you are ready to take advantage of the rolling upgrade.
Scenario Objective
In this scenario I want to:
- Use the existing failover cluster hardware, since my space is limited.
- Retain the same name.
- Create as little work as possible for my SAN, network, and domain admin guys.
I would like to note that the setup I did this on has some key elements that make the transition easier, some of which you can adopt yourself if you choose to. I'll use this opportunity to outline the environment I'm doing this in, so that you can keep in mind the slight variances that may affect you in your environment.
- 3 Physical nodes in a Majority Node quorum configuration
- Fiber Channel connectivity to a SAN
- 10+ instances of SQL Server 2012 installed, running across the nodes in an Active-Active-Passive configuration
- Over 100 LUNs (disks from the SAN) are used
- The letter-assigned drives are under 1GB and used only as landing points for the rest of the disks, which are attached as mount points.
- Each installation is assigned its own drive letter.
- Separate OUs in Active Directory for each OS version, with SQL Server having its own OU under each
This may or may not be a bit beyond what you have set up, but there are some challenges I must face. First of all, I am going to have to wipe each machine and do a fresh install of Windows 2012 R2, which in itself could be challenging just on software compatibility, never mind hardware compatibility. Assuming that is OK, you cannot just rejoin the box to the cluster; you have to set up a new cluster. I also have to face the fact that I am somehow going to move over 100 disks, which would be a fun conversation with your SAN admin. In this case I am also dealing with about 12 instances of SQL Server. Luckily this setup has several advantages, which will come to light later on.
Preparation Checklist
Some of these might have already been done due to best practices:
- Make sure your SAN admin has labeled your LUNs with the paths they are assigned on the host nodes; it will make the transition easier
- Get the IP addresses and names used for each SQL instance
- Create a new DNS entry for the new failover cluster name
- This is just the name used to manage the cluster, but since we are setting up a new cluster you will need a new entry for it.
- Different OUs are ideal because some settings are just easier to manage through Group Policy
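On the old cluster, the instance names and IP addresses can be gathered in one pass with the FailoverClusters PowerShell module. A sketch; resource names will vary by installation:

```powershell
# Run on a node of the old cluster (module ships with the Failover Clustering feature)
Import-Module FailoverClusters

# Every network name resource and its DNS name
Get-ClusterResource | Where-Object { $_.ResourceType -eq 'Network Name' } |
    Get-ClusterParameter -Name DnsName

# Every IP address resource and its address
Get-ClusterResource | Where-Object { $_.ResourceType -eq 'IP Address' } |
    Get-ClusterParameter -Name Address
```

Save the output somewhere off the cluster; you will need these values during each instance install.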
Rebuilding the First Node
I have 3 machines in this scenario; you may have fewer or you may have more, so adjust accordingly. The other thing working in my favor is that a cluster can run with a single node if needed, so I will be removing one node from the old cluster and creating a new cluster with it, using the new DNS entry. This also means that after the eviction I will need to add a witness share to the old cluster so that I have an odd number of votes in my quorum.
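Adjusting the old cluster's quorum after the eviction is a single cmdlet. A sketch; the share path is a made-up example:

```powershell
# Two remaining nodes + a file share witness = 3 votes (odd)
Set-ClusterQuorum -NodeAndFileShareMajority '\\fileserver\ClusterWitness$'
```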
- Evict a node from the cluster
- This can be done from the Failover Cluster Manager: go under Nodes, pick your passive node, right-click, and choose Evict.
- Change the quorum of the old cluster so that you have an odd number of votes.
- Disable the fiber channel ports on the switch for the evicted node, or simply pull the cables out
- You do not want the fiber channel talking to the SAN just yet, otherwise your installation of Windows will be a difficult task.
- Wipe the evicted node and install Windows 2012 R2 (see vendor docs for this), and set up any permissions that the SQL Server service accounts may need for your setup.
- Add your .NET Features, Failover Clustering, Multipath IO, etc.
- Move the server object in Active Directory to the appropriate OU (ie Windows2012R2\SQL)
- Install all the vendor applications related to your SAN (thank the SAN admin)
- Re-enable the fiber channel
- Assuming your only local disk is Disk 0 (with C:\ on it), take the SAN disks offline using PowerShell. You can create a loop around it, which I eventually did because I had over 100 disks to take offline:
- Set-Disk -Number <n> -IsOffline $True
- After doing this you will find that the Disk Management GUI will work again.
- In the Failover Cluster Manager on the Windows Server 2012 R2 machine, in this case the evicted node, create a new cluster using the new DNS entry that you created earlier.
- The previous step will have created an AD object based off of the DNS name; make sure it ends up in the appropriate OU.
- Now you can validate the newly rebuilt node in its new failover cluster, and you are ready for some SQL Server migration.
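The disk-offline loop and the cluster creation can both be scripted. A minimal sketch, assuming Disk 0 is the only local disk; the node name, cluster name, and IP are made-up values:

```powershell
# Take every disk except the boot disk (Disk 0) offline, so Disk Management
# stops struggling with 100+ newly visible LUNs
Get-Disk | Where-Object { $_.Number -ne 0 } |
    Set-Disk -IsOffline $true

# Validate the rebuilt node, then create the new one-node cluster
# using the DNS entry prepared earlier
Test-Cluster -Node 'SQLNODE3'
New-Cluster -Name 'NEWSQLCLUS' -Node 'SQLNODE3' -StaticAddress '10.0.0.50'
```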
Migrating a SQL Server Failover Cluster Instance
This is the part of the process where you will experience an outage. Depending on your comfort level with the steps and the speed of your systems, you can migrate an instance in under an hour. It greatly helps if you already have a downloaded copy of the SQL Server build you want with the latest service pack rolled into it; applying a CU after the install will not require a reboot.
- Take the entire failover cluster resource group of the SQL instance you want to move offline. This includes services, disks, IP address, and name.
- Move the SQL Active Directory object to the OU of where the new Windows 2012 R2 objects are.
- In the Failover Cluster Manager on the new node in the new cluster, go to the Disks section and select Add Disk
- This list should consist of the disks that you just took offline for that instance, which makes it easier to identify what you need. I can also immediately tell which disk will be my root one because my root drives are less than 1GB; otherwise I'd have to match up the signature IDs.
- Now, either through PowerShell or Disk Management, find the root disk and assign it its drive letter. Since I used mount points for the rest of the disks, they will appear along with the root.
- Back in the Failover Manager, now would be a good time to rename the cluster disks so you can easily identify them later.
- At this point you should be able to access the disk and the mount points underneath it. You will want to traverse to the location where the system databases are and make a copy of them elsewhere.
- If this was an upgraded system (your system databases are in an old path) then just leave the databases there. If this came from an original install (never inline upgraded before) then you will have to delete the system database files, including tempdb.
- Start the SQL Server installation. Change the instance path to match what it was on the old system. Do not change the paths of the system databases or anything else. So in this case I will be using the MSSQL11 path for the installation even though the system databases are in the MSSQL10 path.
- You are doing a New SQL Server failover cluster installation (the instance does not yet exist in the new cluster, so this is not an add-node installation)
- Use the same names you have in the other failover cluster
- The root drive should be available during the install if not then something has gone wrong with the cluster disk itself.
- Use the same IP address you use for the failover cluster
- Make sure the collation matches the old install. If you forgot to get this information you can go to the old node, open the registry (regedit), and look at HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>Setup>Collation
- Add yourself as a local admin, and make sure you match the old install's Windows-only or Mixed Mode authentication.
- In the Data Directories section, point it to the same installation path. Again, you can get an idea from the registry under HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>Setup>SQLDataPath
- You may need to set up FileStream here as well, if you had it previously set up
- Install all the SQL Server patches to bring it to the same level as it was before. If you had the service pack already rolled in it will save you a lot of time and a reboot.
- In the Failover Cluster Manager, make all the other drives dependent on the parent (root) drive
- Change the TCP ports to match what they were on the old system.
- This is only relevant if you are not on the default port and you have firewall rules and/or SPNs for delegation set up
- Take the SQL Server service offline.
- Depending on the location of the system databases you have 2 scenarios
- If the instance was ever previously involved in an inline upgrade, then you can simply go to the registry and change the location of the master database files and errorlog to where the old ones are. ***Below shows an example for major build version 11 vs 10
- HKLM>Software>Microsoft>Microsoft SQL Server>MSSQL1x.Instancename>MSSQLServer>Parameters
- If the instance was never previously involved in an inline upgrade, then take the copy of the system databases you previously backed up and overwrite the newly installed ones.
- Bring the SQL Server Service back online.
- Done. Enjoy a beer.
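Several of the disk steps above can also be done from PowerShell instead of the GUI. A sketch; the disk, instance, and resource names here are hypothetical:

```powershell
# Bring the instance's disks into the new cluster as available storage
Get-ClusterAvailableDisk | Add-ClusterDisk

# Rename a cluster disk so it is identifiable later
(Get-ClusterResource -Name 'Cluster Disk 1').Name = 'INST1 Root'

# Make a mount-point disk depend on its root disk
Set-ClusterResourceDependency -Resource 'INST1 Data' -Dependency '[INST1 Root]'

# Check the old instance's collation from the registry (build 11 shown)
Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.INST1\Setup' |
    Select-Object Collation
```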
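For the inline-upgrade scenario, repointing the master database and errorlog means editing the startup arguments under the Parameters key shown above. A sketch assuming a hypothetical instance named INST1 on an E: root drive, with the build 11 install pointing back at the MSSQL10 files:

```powershell
$key = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.INST1\MSSQLServer\Parameters'

# SQLArg0 = master data file (-d), SQLArg1 = errorlog (-e), SQLArg2 = master log file (-l)
Set-ItemProperty $key -Name SQLArg0 -Value '-dE:\MSSQL10.INST1\MSSQL\DATA\master.mdf'
Set-ItemProperty $key -Name SQLArg1 -Value '-eE:\MSSQL10.INST1\MSSQL\Log\ERRORLOG'
Set-ItemProperty $key -Name SQLArg2 -Value '-lE:\MSSQL10.INST1\MSSQL\DATA\mastlog.ldf'
```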
Rolling Back
If you are seasoned you probably already know that nothing always works perfectly, and you should have a rollback plan. In this scenario, even at the very end, I still have my old cluster with everything needed to run. Whether I shut this instance down, uninstall it, or the install never completed in the first place, I can turn the old instance back on as long as I do one thing: move the AD object back into the OU it was in previously. That may work without a hitch, or you may have to re-create the name in the old failover cluster, which just re-reads the DNS entry. Get past that and it should come back online, no worries.
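That one mandatory step, moving the computer object back, can be done with the ActiveDirectory module. A sketch; the object name and OU path are made-up examples:

```powershell
Import-Module ActiveDirectory

# Move the SQL network-name computer object back to its original OU
Get-ADComputer 'SQLINST1' |
    Move-ADObject -TargetPath 'OU=SQL,OU=Windows2008R2,DC=contoso,DC=com'
```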
Finishing it off
Basically I repeat the process until I have moved enough over that I can evict my next node, rebuild it, join it to the new cluster, and redo all the installations by adding it to the cluster; then I finish moving any leftover instances and rebuild the last node. At the end of it all you can clean up the leftover AD object and DNS entry for the old cluster management object, and you are now on a fully functioning new system.
Final Thoughts
With this process I was able to reuse the system databases, the existing LUN mappings, and mount points to make the transition easier. In my next iteration of this topic I will look at transferring to new hardware using a similar technique, reusing system databases and names but causing my SAN admin grief by remapping my 100s of LUNs, so it will cost me in beer. I would like to stress that you need to be at the same SQL Server build level to be able to reuse the system database technique; this is not a SQL Server upgrade but simply an OS upgrade.