Jim Hannan (@HoBHannan), Principal Architect
We are very fond of this picture at House of Brick, as it speaks to the common misconception that Disaster Recovery (DR) is just the button you hit, and then magically your business is up and running again at a DR site. Until you gain an understanding of the complexities of setting up a reliable DR environment, it is easy to have a false sense of security.
Over the last few years, we have seen an increase in customers implementing DR. I think a lot of this has to do with supportive technologies like VMware Site Recovery Manager (SRM), and replication products like EMC RecoverPoint and Veeam. These technologies have simplified DR, making it more approachable to implement and support. In this blog we would like to discuss what makes a good DR strategy. At HoB, a good DR strategy is very important to us, as too often we watch the industry struggle to do DR properly. Without a good strategy, you put your company at risk of long outages or data loss.
Gathering Business Requirements
Often administrators avoid asking what the DR SLAs need to be in order to protect the business. The first question the business should ask itself is which applications are business-critical and need to be protected, and which ones could be offline for a week while traditional backups are restored (i.e., Cold DR). The second question is how quickly the applications protected by a DR solution need to be operational once a DR event is declared.
Determining business requirements is the right way to build your DR strategy. Sit down with the correct business decision makers to determine:
- What should be protected in DR?
- What doesn’t need to be protected?
- How long can the applications be down before impacting the business? (This can be different for each application.)
- What amount of data loss is acceptable? (The answer may be no data loss.)
Gathering Technical Requirements
After gathering business requirements, it is time to gather your technical requirements. Have you noticed that we haven’t discussed any software or replication tools yet? At this stage, you shouldn’t. Don’t be tempted to evaluate DR software until after the necessary requirements are gathered.
- What are the infrastructure dependencies for getting the business-critical applications up and running? For example, Active Directory, DNS, and networking components.
- Determine the data rate of change of the applications. There are different techniques to do this. At HoB we like to analyze at least seven days' worth of data. Keep in mind that this should be customized to your business cycle. For example, if you have large month-end processes, include month end in the analysis. Knowing the data rate of change will allow you to determine the bandwidth needed between the Primary and Secondary/DR sites.
- Build a document for what hardware will be needed at the DR site.
- Create a dependencies list: what applications need to come up in what order to ensure a clean DR failover.
- Include needed application, hardware, and support licenses and costs.
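As a rough illustration of how the data rate of change translates into bandwidth between the Primary and DR sites, here is a minimal Python sketch. The function names, the 24-hour replication window, and the 30% headroom factor are our own assumptions for illustration, not values from any particular tool. It also sizes the link on the busiest day's total, so short bursts within a day could need more.

```python
from typing import Sequence


def required_mbps(changed_gb: float, window_hours: float) -> float:
    """Average megabits per second needed to ship changed_gb within window_hours."""
    megabits = changed_gb * 8 * 1000  # GB -> megabits, using decimal units
    return megabits / (window_hours * 3600)


def size_link(daily_changes_gb: Sequence[float],
              window_hours: float = 24.0,   # assumption: replication may use the whole day
              headroom: float = 1.3) -> float:  # assumption: 30% safety margin
    """Size the link for the busiest day observed (for example, month end)."""
    peak_gb = max(daily_changes_gb)
    return required_mbps(peak_gb, window_hours) * headroom


if __name__ == "__main__":
    # Hypothetical seven days of measured change, in GB per day.
    week = [40, 55, 48, 60, 52, 20, 108]
    print(f"Suggested link size: {size_link(week):.1f} Mbps")
```

For example, a busiest day of 108 GB replicated over 24 hours works out to an average of 10 Mbps before any headroom is applied.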
Selecting DR Software and Replication
I mentioned at the beginning of this blog that DR has become more approachable for businesses. This is due to the features that we will discuss in more detail below. I often remind customers that DR is typically, at best, a part-time job, meaning that you build it well so you can get back to your other daily tasks. What has made DR difficult in the past are things like complex Run Books (the order of tasks to bring your DR site online). Often Run Books involve different IT skills, because you may be using something like Oracle Data Guard for the database and OS replication for the web servers, while also changing servers’ IP addresses and redesigning the network at the DR site. Today’s tooling takes a much more holistic approach, typically using only one approach to protect all of your servers and eliminating a majority of the manual processes involved in bringing DR online.
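To make the idea of a Run Book order concrete, the dependencies list from the technical requirements can be turned into a startup order with a topological sort. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later). The application names and their dependencies are hypothetical examples, and this is not how any particular DR product computes its recovery plan.

```python
from graphlib import TopologicalSorter


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every application starts after its dependencies.

    depends_on maps each application to the set of applications that must
    already be running. graphlib raises CycleError if the list is circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical dependencies list for a small DR site.
    dependencies = {
        "DNS": set(),
        "Active Directory": {"DNS"},
        "Database": {"Active Directory"},
        "Web servers": {"Database"},
    }
    for step, app in enumerate(startup_order(dependencies), start=1):
        print(f"{step}. Bring up {app}")
```

Keeping the dependencies as data rather than as prose in a document makes it easier to spot a circular dependency before a declared DR event.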
Both Veeam and VMware’s SRM support bubble testing, which makes DR testing efficient and effective. What is the bubble? The two software products mentioned above wrap a network fence around the VMs to be tested. Think of the network fence like your router (firewall) at home: the local IP addresses are not exposed past the router. Additionally, you can be very restrictive about what can come in and go out, or block all traffic from leaving the router, or in this case the bubble (or network fence). A few years ago I was fortunate enough to work on a project where we needed to clone the applications, but unfortunately the LDAP server didn’t allow for an IP change or server name change. We creatively built our own network bubble by using Linux as the firewall to restrict traffic.
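The fence rule itself is simple to state in code. The following is a toy model only, written to show the logic of a network fence. It is not how Veeam, SRM, or a Linux firewall implement it, and the function name, addresses, and allow list are all assumptions made for illustration.

```python
from ipaddress import ip_address, ip_network
from typing import Iterable


def fence_allows(src: str, dst: str, bubble_cidr: str,
                 allowed_outside: Iterable[str] = ()) -> bool:
    """Decide whether traffic involving a bubble VM may pass the fence.

    Traffic between two addresses inside the bubble is allowed. Traffic that
    crosses the fence is allowed only if the outside address is explicitly
    listed. Traffic that never touches the bubble is out of scope for this
    model and is reported as not allowed.
    """
    bubble = ip_network(bubble_cidr)
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    src_in, dst_in = src_ip in bubble, dst_ip in bubble

    if src_in and dst_in:
        return True
    if not (src_in or dst_in):
        return False

    # Exactly one endpoint is inside: check the outside one against the allow list.
    outside = dst_ip if src_in else src_ip
    return outside in {ip_address(a) for a in allowed_outside}


if __name__ == "__main__":
    bubble = "10.99.0.0/24"  # hypothetical isolated test network
    print(fence_allows("10.99.0.5", "10.99.0.6", bubble))
    print(fence_allows("10.99.0.5", "192.168.1.10", bubble))
```

With an empty allow list, this behaves like blocking all traffic from leaving the bubble, which is the safest default for a DR test.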
Every good DR strategy has a testing requirement – plan to conduct DR tests at least annually.
I hope this blog helps you get started with a DR strategy. If you’re looking for guidance, House of Brick offers a variety of services to help you create a DR strategy, implement that strategy, and effectively test your DR.
The Future of DR…
Stay tuned for more information about DR 2.0 in our next blog post.