In this two-part series, we’ll cover the processes essential to putting together a rock-solid Disaster Recovery system: the types of disasters your business may be at risk for, how to identify your critical data, and the recovery systems themselves.
For a company that is considering a Disaster Recovery plan it is important to note that, in the event of a disaster, the very existence of the business may be in jeopardy. For analyzing the possible impact to your business and conducting a Business Impact Analysis (BIA), see this great resource.
Going through the BIA will give your organization an idea of how urgently it needs to deploy a Business Continuity plan and Disaster Recovery (DR) solution. You can then use this metric to evaluate the cost of putting a DR plan in place and to make implementation trade-offs.
There is a wide range of disasters that can adversely affect your business. One that often isn’t thought of, but can be as disastrous as the others, is employee error. Depending on how well your current IT systems are built, one small change can cascade into a system-wide failure.
Of course, a good IT team will try to predict all the ways that an employee mistake can affect your business’s systems, but even in a semi-complex system, it is humanly impossible to cover every possibility. This is just one reason why a good IT department will have a Disaster Recovery plan in place.
Next up are natural disasters. With natural disasters, it’s usually not hard to convince someone of the possible damage; the problem lies in how rarely such a disaster seems likely to strike. Many of the Northeast businesses affected by Superstorm Sandy that didn’t have disaster recovery plans in place (or had poor ones) learned the hard way why preparing for an occurrence that may seem unlikely is good business practice.
In addition to hurricanes, depending on where your business operates, you may also have to worry about tornadoes, earthquakes, and extreme temperature events such as prolonged heat waves.
Another threat to consider is a bit more sinister: malicious intent. This can come in the form of a vindictive employee, a nasty virus or hackers breaking in to cause havoc.
These occurrences happen much more often than people realize; in the interest of limiting the damage to the affected company’s brand, such events are often kept out of the news.
Employee Health (widespread or prolonged)
An oft-overlooked threat to Business Continuity can be as simple as a cold or flu outbreak, or some other contagious illness. In the strictest sense, this is not a “disaster”, but if a flu epidemic leaves your IT staff unable to work and service the IT equipment, the business may be seriously impacted when a software failure or bug requires expert system administration.
Finally on the list of possible disasters, there’s system failure. Even the most prepared systems in the world (the ones that absolutely cannot have any downtime) go down. Amazon Web Services (AWS) is a great example of a system that will lose significant business if it goes down.
Yet, even with the threat of significant damage to both its business and its brand, AWS still suffered 5 major outages in 2012 with each one adversely affecting a variety of businesses.
Now that we’ve considered some of the most common causes of disasters, think about what would happen if your business weren’t prepared. Would your business be able to function without email for a week or more? What about vital systems such as credit card processing for your ecommerce site not being available? What is the impact to the business if there is a total loss of data?
While the above possible causes of major business disruptions paint a convincing picture of the need for a well-thought-out and well-managed DR plan, many businesses in highly regulated industries such as health care, finance, or government are actually required to have a DR plan.
For example, companies covered by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) are required to have a Disaster Recovery plan under the HIPAA Contingency Plan standard in the Administrative Safeguards section of the HIPAA Security Rule. Any company regulated by HIPAA will find this to be a complex undertaking, as each system is also required to comply with the privacy regulations included in HIPAA.
Another example is the Expedited Funds Availability (EFA) Act of 1987, which affects the financial industry. It requires federally chartered financial institutions to have a demonstrable Disaster Recovery plan to ensure prompt availability of funds.
While the rules and best practices for Disaster Recovery can vary greatly from a HIPAA-regulated business to a PCI-compliant business, the one thing they all have in common is that these rules make Disaster Recovery systems vastly more complex than they are in non-regulated industries. An experienced partner can go a long way toward making the process of building and maintaining such a system much less painful.
Disasters come in all shapes and sizes and can occur in many ways you often don’t expect, but the key takeaway is that no matter what system your business requires, it’s always cheaper, in the longer term, to be prepared, than to have to rebuild.
Your Disaster Recovery planning won’t be worth much if you haven’t ensured that the right systems are being covered. Also, covering “everything” is rarely a viable option due to the massive costs and time that can be involved in covering every system your company operates.
Generally, a standard business has several if not tens of different systems that it depends on, in some capacity or another, to operate. Within each system there are often multiple components that the system depends upon, in varying degrees, to run smoothly. Very quickly you’ll find yourself thinking about hundreds of components, trying to determine their value to your business.
This is a lot to think about at once, so we suggest starting out by thinking about the kind of business you are in and attributes that describe the business.
Think through the products your business offers. Is it an e-commerce website where downtime means significant loss of revenue? Do any of these products have to be running 24/7? Often, any kind of product that requires internet connectivity tends to be one that is constantly running and requires a highly available deployment architecture.
Is your business one that is heavily dependent on historical data being readily accessible such as a healthcare company? Is it the kind that requires critical systems to be up 24/7 such as one of our customers who is in the education industry where students need access to the student portal at all hours? Do you operate a distributed workforce who, to continue doing their jobs in any capacity, need remote access to email and other company systems? Is your business in a regulated industry, which is governed by certain industry regulations such as HIPAA, PCI, etc.?
It is worth noting that the requirements put on regulated industries by federal and state governments have only become stricter since the terms Disaster Recovery and Business Continuity started appearing in federal legislation in the 1990s. These requirements often go far beyond the actual day-to-day needs of the business and cover extreme cases where significant amounts of data could be lost.
Data storage and handling requirements can vary greatly between regulated industries. New legislation has to be watched carefully for any changes that may affect your business. This especially concerns you if your business is in one of these four industries: Healthcare, Government, Finance or Utilities.
A good place to start is this report from Gartner (while it was released in 2005, it’s still full of some great information). It is vital that your company stays up to date with current laws and regulations that apply to your industry and the Gartner document will help you do that.
Some of our favorite tips from the Gartner document are:
Coming back to the topic of determining which systems are critical: once you have a general sense of which systems your business cannot operate without, you need to rank each system by level of importance. This ranking should take into consideration how business-critical the system is, the number of other systems affected when it has a problem, the number of end users affected, and how much the problem hurts end-user productivity.
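To make the ranking concrete, here is a minimal sketch of a weighted scoring sheet built on the four factors above. The weights, the 1-to-5 rating scale, and the system names are all our own assumptions for illustration; adjust them to whatever your Business Impact Analysis tells you.

```python
from dataclasses import dataclass

# Hypothetical weights for the four ranking factors; tune them to your business.
WEIGHTS = {
    "business_critical": 0.4,
    "systems_affected": 0.2,
    "users_affected": 0.2,
    "productivity_loss": 0.2,
}


@dataclass
class SystemRating:
    """One system, rated from 1 (low) to 5 (high) on each factor."""
    name: str
    business_critical: int
    systems_affected: int
    users_affected: int
    productivity_loss: int

    def score(self) -> float:
        # Weighted sum of the four ratings.
        return sum(getattr(self, factor) * weight
                   for factor, weight in WEIGHTS.items())


def rank_systems(ratings):
    """Return the ratings sorted from most to least important."""
    return sorted(ratings, key=lambda r: r.score(), reverse=True)


# Illustrative ratings only; these are not measurements from a real business.
ratings = [
    SystemRating("internal-email", 3, 2, 4, 3),
    SystemRating("intranet-wiki", 1, 1, 2, 1),
    SystemRating("ecommerce-website", 5, 4, 5, 4),
]

ranked = rank_systems(ratings)
for rating in ranked:
    print(f"{rating.name}: {rating.score():.1f}")
```

A simple weighted sum like this will not capture everything, but it forces the team to write down its assumptions and makes disagreements about priorities visible early.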
Think through each system in terms of how much tolerance your company has for it going down. Doing this exercise may reveal some surprises. It’s not uncommon during critical system surveys for Disaster Recovery plans to discover that some systems, commonly thought to be absolutely vital to the day-to-day running of a business, are actually much more tolerant of downtime.
For example, if you have a company that is dependent on your website for your commerce, the internal employee email might not be as critical as the website being up and running. In this scenario, a frequent email backup is sufficient, but a geo-load-balanced web server system is likely more important and critical.
Once you have identified the systems most critical to your business, it’s time to work out what the dependencies of each system are. This is important because while you may have a system that is absolutely critical to your company, having it down for hours or even days may not be that detrimental to your business.
Some systems that aren’t at the top of the list in importance to your company may still cause serious issues if they go down.
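One way to surface these hidden dependencies is to write the relationships down as a simple map and walk it. The sketch below is only an illustration: the system names and the `DEPENDS_ON` map are hypothetical, and a real inventory would come from your own architecture documentation.

```python
# Hypothetical dependency map: each system lists the systems it needs in order to run.
DEPENDS_ON = {
    "ecommerce-website": ["payment-gateway", "product-database", "dns"],
    "payment-gateway": ["dns"],
    "product-database": ["storage-array"],
    "internal-email": ["dns", "directory-service"],
    "storage-array": [],
    "directory-service": [],
    "dns": [],
}


def all_dependencies(system, depends_on):
    """Return every system that `system` needs, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(system, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the `seen` set also guards against circular dependencies
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def impacted_by(failed, depends_on):
    """Return every system that is affected if `failed` goes down."""
    return {system for system in depends_on
            if failed in all_dependencies(system, depends_on)}


print(sorted(all_dependencies("ecommerce-website", DEPENDS_ON)))
print(sorted(impacted_by("dns", DEPENDS_ON)))
```

In this made-up example, DNS would probably not top anyone’s list of critical systems, yet its failure takes the website, the payment gateway, and email down with it.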
Now that you have determined the systems most critical to your business and ranked them by value, you should have an idea of which systems your Disaster Recovery plan needs to address first. The next step is reviewing what backup strategies your business already has in place.
Review this Seven Tiers of Disaster Recovery guide on Wikipedia to get a good idea of the gap between where your business needs to be and where its backup systems currently stand. The first tier covers businesses that have absolutely no system in place to recover data or get systems running again after a disaster of any scale. For many businesses, this lack of any backup system means the business is constantly in danger of suffering a critical blow from even a minor event.
The highest tier, Tier 7, assumes an entirely automated system that never goes down and provides the highest level of data consistency. Critical government systems such as defense systems, as well as civilian banking systems, require this level of Disaster Recovery. Most typical businesses will fall somewhere between the bottom and top tiers.
In summary, once you have determined which tier you fall in, performed the Business Impact Analysis against various failure modes and scenarios, and put a cost or value against each service going down, you are ready to proceed to the next step: determining your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements. Evaluate candidate RPO and RTO targets with a good Disaster Recovery solution provider, and put together a solution that meets those targets while balancing the Business Impact against the cost of the solution.
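As a rough illustration of that balancing act, the sketch below filters candidate solutions by whether they meet an RPO (the maximum window of data you can afford to lose) and an RTO (the maximum time you can afford to be down), then compares annual solution cost plus expected downtime loss. Every number and solution name here is a hypothetical placeholder, and the loss model (restore time multiplied by hourly downtime cost and incident frequency) is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    rpo_hours: float  # Recovery Point Objective: maximum tolerable window of lost data
    rto_hours: float  # Recovery Time Objective: maximum tolerable time to restore service


@dataclass
class Solution:
    name: str
    backup_interval_hours: float  # worst-case data loss is the gap between backups
    restore_hours: float          # time needed to bring the service back
    annual_cost: float


def meets(req, sol):
    """True if the solution satisfies both the RPO and the RTO."""
    return (sol.backup_interval_hours <= req.rpo_hours
            and sol.restore_hours <= req.rto_hours)


def expected_annual_loss(sol, downtime_cost_per_hour, incidents_per_year):
    """Simplified loss model: restore time x hourly cost x incident frequency."""
    return sol.restore_hours * downtime_cost_per_hour * incidents_per_year


def cheapest_adequate(req, solutions, downtime_cost_per_hour, incidents_per_year):
    """Among solutions meeting RPO/RTO, pick the lowest total annual cost."""
    candidates = [s for s in solutions if meets(req, s)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: s.annual_cost
        + expected_annual_loss(s, downtime_cost_per_hour, incidents_per_year),
    )


# Hypothetical figures for illustration only.
requirement = Requirement(rpo_hours=4, rto_hours=8)
solutions = [
    Solution("nightly-offsite-backup", 24, 48, 5_000),
    Solution("hourly-snapshots", 1, 6, 20_000),
    Solution("hot-standby-site", 0.1, 0.5, 90_000),
]

choice = cheapest_adequate(requirement, solutions,
                           downtime_cost_per_hour=2_000, incidents_per_year=0.5)
print(choice.name if choice else "no solution meets the RPO/RTO")
```

With these made-up inputs, the nightly backup is ruled out because it misses both targets, and the hot standby site meets them but costs far more than the downtime it saves. Change the hourly downtime cost and the answer can flip, which is why the Business Impact Analysis has to come first.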
Stay tuned for the next blog post in our series, which will cover failure modes and all possible ways a system can fail.