Approachable Disaster Recovery – How to Do it Right
by Chris Vacanti, Senior Consultant
Recently I had the privilege of co-presenting a webinar titled Virtualization: The Key to Approachable Disaster Recovery, a part of the BrightTalk Storage and Virtualization Summit. In this webinar we discussed how Disaster Recovery (DR) can be an approachable and achievable goal for all businesses regardless of their operational size. I want to reiterate that – DR can be achieved and is approachable using a methodical approach that we have mastered here at House of Brick.
I am positive that Disaster Recovery is a common concern for anyone who uses computers for any part of their business, from storing customer information in a database and taking orders online to responding to customer needs in email and processing work orders.
It is not uncommon for DR to find its way onto the agenda of a random meeting, where someone asks IT leadership whether the company has DR and whether it works. IT leadership will typically answer in the affirmative. But how did they get to that point? Who leads internal conversations around DR? Who owns it? How did DR get on the agenda and why is it treated so casually? Effective DR cannot be relegated to a simple bullet item on an operations meeting agenda and glossed over so quickly.
DR cannot be successful without stepping through a series of exercises. If those exercises are not completed correctly, the result is a false sense of security founded on an untested and non-validated DR platform.
Since DR protects our infrastructure by keeping a second copy somewhere else that we can go to in an emergency, it should be easy to be successful, right? No, it’s not that simple. A DR plan must include ample time for discovery, planning, implementation, testing, and review in order to be successful.
In this blog, I will walk through each step of the process and provide guidance towards establishing a strategy for moving through all cycles of the DR process, so you can be successful. In short, we’re going to walk through the Who, What, and How of DR in this post.
Establish the Who
I want you to gather your DR team for a meeting. This team has to be a diverse mixture of operations and IT.
You must have everyone in the room engaged, focused, and ready to participate. If they don’t participate, replace them. Don’t get me wrong – we want respect, listening, and honesty. But everyone MUST participate and speak up. Everyone needs to act like his or her part of the puzzle is the most important part of the puzzle. This helps ensure nothing is missed in the discovery process. We’re talking about the revenue of the company in this meeting, and how to identify and protect that revenue, so no yes-men allowed.
Define the What
Once the proper team is identified, it’s time to identify the What that you need to protect with DR. This is where the meeting gets fun. This is where each person in the team presents their respective opinion of what needs to be protected in order for the company to run effectively. Each person should be given time to speak and each person should speak. I suggest using Post-it notes and writing down business components that represent each person’s opinion and posting them on a board.
Once everything is on the board, it is time to trim down to the leanest operating model of the business, which will vary from business to business. Some might need to be able to print in an emergency. Some may need to access their website, while others may not need such access. Each component that is critical to bringing in revenue should be considered. Keep in mind that this part of the conversation is heavily driven by the business units. Therefore IT should not try to influence this part of the conversation, unless clarifying operational processes that impact revenue. This information is the What that should be protected. At NO time in this meeting should a solution be discussed, and no one should talk about cost or budget.
Determine the How
After the What is identified, it’s time for IT to do what they do best – solve the How. How will all this information need to be protected? Does it go into the cloud, or a second datacenter? How much equipment will this take? How much will it cost? How fast should everything be online? I encourage IT to come up with three options here – 1) The lowest risk of data loss, fastest recovery time, highest cost; 2) Medium risk of data loss, slower recovery time, medium cost; and 3) Highest risk of data loss, slowest recovery time, lowest cost.
As I stated during our webinar, there are many options for the How that can bring the business back online after a disaster. The Recovery Time Objective (RTO), how long it takes to come back online, will be determined by the business. While oftentimes a statement is made to the effect that everything should come back online instantly, you will need to consider the fact that the faster the business needs/wants the systems back online, the higher the cost will be. Some businesses can survive having their systems offline for a couple of days, whereas other businesses need instant recovery. The Recovery Point Objective (RPO), how much data loss the business can tolerate, will also affect the cost. Like the RTO, the RPO is driven by the needs of the business units.
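To make the RTO/RPO trade-off concrete, here is a minimal Python sketch of how the three options could be laid out side by side for the business. Everything in it is an assumption for illustration: the DrOption class, the options_meeting function, and all of the hour and cost figures are hypothetical placeholders, not real quotes or recommendations. It only filters the options against the tolerances the business states. The decision on what to spend stays with the business.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DrOption:
    """One DR option that IT presents to the business (all figures hypothetical)."""
    name: str
    rto_hours: float   # Recovery Time Objective: how long until systems are back online
    rpo_hours: float   # Recovery Point Objective: how much data loss is tolerated
    annual_cost: int   # placeholder cost figure, not a real quote


# Three illustrative options mirroring the low/medium/high risk tiers.
OPTIONS = [
    DrOption("Option 1", rto_hours=1, rpo_hours=0.25, annual_cost=300_000),
    DrOption("Option 2", rto_hours=8, rpo_hours=4, annual_cost=120_000),
    DrOption("Option 3", rto_hours=48, rpo_hours=24, annual_cost=40_000),
]


def options_meeting(max_rto_hours: float, max_rpo_hours: float,
                    options: List[DrOption] = OPTIONS) -> List[DrOption]:
    """Return every option within the business's stated tolerances, cheapest first."""
    fits = [o for o in options
            if o.rto_hours <= max_rto_hours and o.rpo_hours <= max_rpo_hours]
    return sorted(fits, key=lambda o: o.annual_cost)


if __name__ == "__main__":
    # Example: the business says it can tolerate 12 hours offline and 4 hours of data loss.
    for option in options_meeting(max_rto_hours=12, max_rpo_hours=4):
        print(option.name, option.annual_cost)
```

With the placeholder numbers above, a tolerance of 12 hours offline and 4 hours of data loss leaves Options 2 and 1 on the table, in that order of cost. An empty result is also useful information: it tells the business that its expectations and the available options do not yet line up.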
I want to admonish IT teams never to pre-determine what the business will or will not approve. After all, it is our responsibility in IT to present the business with all of the available options. It is then up to the business to determine what to spend, with the spend being determined by the risk tolerance of the business.
Plan and Test
After these three components are determined (who, what, and how), it is time to implement the plan and then test it. A DR plan is never complete until it has been tested. DR testing should include testing all business processes protected in the DR plan, and testing should always begin in an isolated space – a test bubble. As we shared in our webinar, approachable DR can be accomplished by leveraging the power of VMware’s vSphere. If taking customer orders is part of the scope, then testing in the bubble should include the ability to place a customer order. It may not be enough to verify that a server powers on. Instead, it may be necessary to bring the server and applications online, and generate an invoice or place an order in the system.
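As a rough illustration of testing business processes rather than just servers, the Python sketch below treats a DR test as a list of named checks that must all pass. The function names (run_dr_test, dr_test_passed) and the two stub checks are hypothetical. In a real test bubble, each stub would be replaced by whatever actually exercises that process in your environment.

```python
from typing import Callable, Dict


def run_dr_test(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named check and record True/False; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def dr_test_passed(results: Dict[str, bool]) -> bool:
    """The test passes only if at least one check ran and every check succeeded."""
    return bool(results) and all(results.values())


# Hypothetical stubs standing in for real checks inside the test bubble.
def server_powers_on() -> bool:
    return True   # placeholder: the recovered server booted


def can_place_test_order() -> bool:
    return False  # placeholder: the order application did not come up


if __name__ == "__main__":
    results = run_dr_test({
        "server powers on": server_powers_on,
        "place a customer order": can_place_test_order,
    })
    for name, ok in results.items():
        print(name, "PASS" if ok else "FAIL")
    print("DR test passed:", dr_test_passed(results))
```

In this example the server powers on but the order cannot be placed, so the overall test fails. That is the point: a green light on infrastructure alone does not show that the business can operate.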
The final part of the plan is to determine the fail back process. How does it happen? When does it happen? How soon? Will there be reverse-protection?
The success of DR planning depends on the relationship between the business units and IT. Do you have an adversarial relationship, or do your teams work together well? In some cases, it can be better to bring in someone who can help broker those conversations, a consultant such as House of Brick. So give us a call. We’d love to help guide your business through the DR planning and implementation cycles.