No-Nonsense Disaster Recovery Planning
Just a few years ago, disaster recovery was a technology problem; now it's a business problem. Disaster recovery is changing as rapidly as business itself. Technology remains at its core, but "recovery" is quickly becoming a synonym for resiliency.
Larry Marler, disaster recovery coordinator at Southern Farm Bureau Casualty Insurance Co., thinks a lot about hurricanes. Five of the seven states the $2.5 billion multi-line property and casualty insurer serves are on the weather-riddled Atlantic or Gulf coasts. Southern Farm, based in Jackson, Miss., can't afford to be offline, for the sake of its business and customers. "Our resiliency is our ability to fail over," Marler says. Put simply, when one of the company's critical applications fails, another system takes over almost instantaneously.
Over the past several years, Southern Farm has developed its disaster recovery plan by leveraging technology initiatives such as server virtualization, replication and fault tolerance. These initiatives were originally sponsored by one or more of the company's eight IT teams, each responsible for different information technologies, to improve data center efficiency and reduce costs and management overhead. It soon became obvious to Marler's bosses, however, that disaster recovery projects should piggyback on them.
For instance, Southern Farm embraced server virtualization earlier this decade. Initially promoted by the local area network team, the effort reduced the company's server count by nearly 60 percent, to 170 open system servers. The servers, spread across the Southern Farm offices, contain duplicate systems that can be accessed in one area if disaster strikes another. "Virtualization not only dramatically reduced the number of servers utilized in the day-to-day business operations, but it also reduces the number of servers we have to recover, reduces recovery costs and significantly speeds up recovery," Marler says.
Disaster recovery testing at Southern Farm revealed that recovering a virtual environment cut recovery times by 90 percent compared with conventional tape recovery.
Similarly, a data replication initiative, implemented more than five years ago to satisfy government regulators, now replicates mission-critical data from Southern Farm offices (payroll, claims, premiums, flood insurance applications and property casualty customer information) to the company's central data center as well as to remote sites.
According to a July research report on disaster recovery self-assessment from IT consultancy Gartner, so-called Class 1 business processes and services, such as critical customer-facing applications or systems that significantly affect revenue or brand image, typically require recovery times of zero to four hours.
But there's a cost. Attaining Class 1 status can double a company's initial capital expenditures for equipment and facilities, according to Gartner. Related expenses include additional networking, replication and management capabilities.
Still, companies that leverage production technology for business continuity reduce incremental spending and ultimately drive down costs, says John Morency, Gartner research director for business continuity/disaster recovery. Marler concurs: "In my world, if I'm not able to apply technology to business functions, it's hard to justify the costs."
How much time and money a company dedicates to disaster recovery depends on how dependent the business is on specific systems and how it prioritizes those systems for restoration. A company might decide it can live without its back-office systems for a few days, for instance, and opt for a slower but less expensive tape-based recovery system to tide it over.
To determine their tolerance for data loss and the level of protection required, Gartner says, companies should conduct a business impact analysis that includes two metrics: recovery time objective (RTO) and recovery point objective (RPO). RTO defines the length of time within which a process must be restored after a disruption to avoid unacceptable loss of business functions. RPO is the maximum acceptable span between the last available backup and the moment a disruption occurs; in other words, how much recent data the business can afford to lose. Combined, RTO and RPO help companies identify their data recovery strategies.
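To make the two metrics concrete, here is a minimal sketch of how a business impact analysis might check one system against its targets. It is an illustration, not Gartner's method. The names `RecoveryObjectives`, `worst_case_data_loss_hours` and `meets_objectives` are hypothetical, and the sketch assumes the worst case for data loss is a disruption that strikes just before the next scheduled backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets for one business process (hypothetical structure)."""
    rto_hours: float  # longest tolerable time to restore the process
    rpo_hours: float  # longest tolerable span of lost data


def worst_case_data_loss_hours(backup_interval_hours: float) -> float:
    # Assumption: a disruption just before the next backup loses
    # everything written since the previous one.
    return backup_interval_hours


def meets_objectives(objectives: RecoveryObjectives,
                     backup_interval_hours: float,
                     measured_recovery_hours: float) -> bool:
    """True when both the RPO and the RTO are satisfied."""
    rpo_ok = worst_case_data_loss_hours(backup_interval_hours) <= objectives.rpo_hours
    rto_ok = measured_recovery_hours <= objectives.rto_hours
    return rpo_ok and rto_ok


if __name__ == "__main__":
    # Illustrative numbers only: a four-hour RTO and RPO.
    targets = RecoveryObjectives(rto_hours=4, rpo_hours=4)
    # Nightly backups miss a four-hour RPO even if recovery is fast.
    print(meets_objectives(targets, backup_interval_hours=24, measured_recovery_hours=2))
    # Hourly copies with a two-hour recovery satisfy both targets.
    print(meets_objectives(targets, backup_interval_hours=1, measured_recovery_hours=2))
```

The point of the sketch is that the two metrics are independent: a system can recover quickly yet still lose too much data, or the reverse.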
In Gartner's self-assessment methodology, RTO and RPO in Class 1 are identical: zero to four hours. But in lower classes there's a wider spread between RTO and RPO. In Class 3, for instance, there's a three-day RTO and a one-day RPO, so the company would rely on tape-based systems, often outsourced, as its principal recovery method.
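The class figures quoted above can be turned into a rough lookup. The sketch below is hypothetical and uses only the two data points given here: Class 1 (RTO and RPO of zero to four hours) and Class 3 (three-day RTO, one-day RPO). The function name, the constants and the "intermediate" label are assumptions for illustration, since the thresholds for other classes are not quoted in this article.

```python
# Figures taken from the article; everything else is illustrative.
CLASS_1_MAX_HOURS = 4        # Class 1: RTO and RPO both zero to four hours
CLASS_3_RTO_HOURS = 3 * 24   # Class 3: three-day RTO
CLASS_3_RPO_HOURS = 1 * 24   # Class 3: one-day RPO


def suggest_recovery_class(rto_hours: float, rpo_hours: float) -> str:
    """Map a system's tolerances to the classes described in the article."""
    if rto_hours <= CLASS_1_MAX_HOURS and rpo_hours <= CLASS_1_MAX_HOURS:
        return "Class 1"
    if rto_hours >= CLASS_3_RTO_HOURS and rpo_hours >= CLASS_3_RPO_HOURS:
        # The business tolerates days of downtime, so tape-based
        # recovery (often outsourced) can be the principal method.
        return "Class 3"
    # Thresholds for other classes are not given in the article.
    return "intermediate"


if __name__ == "__main__":
    print(suggest_recovery_class(rto_hours=2, rpo_hours=1))    # customer-facing app
    print(suggest_recovery_class(rto_hours=72, rpo_hours=24))  # back-office system
    print(suggest_recovery_class(rto_hours=24, rpo_hours=8))   # falls in between
```

A real assessment would weigh revenue and brand impact as well; this only shows how the time tolerances alone sort systems into tiers.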
Ask your IT team:
What new technologies can help meet the company's recovery time objectives?
Ask your CFO:
Are we dovetailing IT and disaster recovery project spending where feasible?