Take a Realistic Approach to Disaster Recovery Testing

You have made significant investments in availability and disaster recovery – but your ability to recover hasn’t been tested in years. Testing will:

Improve your DR capabilities.
Identify required changes to planning documentation and procedures.
Validate DR capabilities for interested customers and auditors.

Our Advice

Critical Insight

If you treat testing as a pass/fail exercise, you aren’t meeting the end goal of improving organizational resilience.
Focus on identifying gaps and risks, and addressing them, before a real disaster hits.
Take a realistic, iterative approach to resilience testing that starts with small, low-risk tests and builds on lessons learned.

Impact and Result

Identify testing scenarios and scope that can deliver value to your organization.
Create practical test plans with Info-Tech’s template.
Demonstrate value from testing to gain buy-in for additional tests.

Take a Realistic Approach to Disaster Recovery Testing Research & Tools

Beyond this brief introduction, subscribers and consulting clients within this management domain have access to:

1. Take a Realistic Approach to Disaster Recovery Testing Storyboard – A guide to establishing a right-sized approach to DR testing that delivers durable value to your organization.

Use this research to understand the different types of tests, prioritize and plan tests for your organization, review the results, and establish a cadence for testing.

2. Disaster Recovery Test Plan Template – A template to document your organization's DR test plan.

Use this template to document scope and goals, participants, key pre-test milestones, the test-day schedule, and your findings from the testing exercise.

3. Disaster Recovery Testing Program Summary – A template to outline your organization's DR testing program.

Identify the tests you will run over the next year and the expertise, governance, process, and funding required to support testing.

Take a Realistic Approach to Disaster Recovery Testing

Reduce costly downtime with a right-sized testing program that improves IT resilience.

Analyst Perspective

Most businesses make significant investments in disaster recovery and technology resilience. Redundant sites and systems, monitoring, intrusion prevention, backups, training, documentation: it all costs time and money.

But does this investment deliver expected value? Specifically, can you deliver service continuity in a way that meets business requirements?

You can’t know the answer without regularly testing recovery processes and systems. And more than just validation, testing helps you deliver service continuity by finding and addressing gaps in your plans and training your staff on recovery procedures.

Use the insights, tools, and templates in this research to create a streamlined and effective resilience testing program that helps validate recovery capabilities and enhance service reliability, availability, and continuity.

Andrew Sharp

Research Director, Infrastructure & Operations
Info-Tech Research Group

Executive Summary

Your Challenge

You have made significant investments in availability and disaster recovery (DR) – but your ability to recover hasn’t been tested in years. Testing will:

Improve your DR capabilities.
Identify required changes to planning documentation and procedures.
Validate DR capabilities for interested customers and auditors.

Common Obstacles

Despite the value testing can offer, executing DR tests is difficult because:

Testing is often an IT-driven initiative, and it can be difficult to secure business buy-in to redirect resources away from other urgent projects or accept risks that come with testing.
Previous tests were overly complex and hard to coordinate, leaving a hangover so bad that no one wants to repeat them.

Info-Tech's Approach

Take a realistic approach to resilience testing by starting with small, low-risk tests, then iterating with the lessons you’ve learned:

Identify testing scenarios and scope that can deliver value to your organization.
Create practical test plans with Info-Tech’s template.
Get buy-in for regular DR testing from key stakeholders with a testing program summary.

Info-Tech Insight

If you treat testing as a pass/fail exercise, you aren’t meeting the end goal of improving organizational resilience. Focus on identifying gaps and risks so you can address them before a real disaster hits.

Process and Outputs

This research is accompanied by templates to help you achieve your goals faster.

1 - Establish the business rationale for DR testing.
2 - Review a range of options for testing.
3 - Prioritize tests that are most valuable to your business.
4 - Create a disaster recovery test plan.
5 - Establish a test program to support a regular testing cycle.

Outputs:

DR Test Plan
DR Testing Program Summary

Orange activity slides provide directions to help you make key decisions.

Key Deliverable:

Disaster Recovery Test Plan Template

Build a plan for your first disaster recovery test.

This document provides a complete example you can use to quickly build your own plan, including goals, milestones, participants, the test-day schedule, and findings from the after-action review.

Why test?

Testing helps you avoid costly downtime

In a disaster scenario, speed matters. Immediately after an outage, the impact on the organization is small, but impact increases rapidly the longer the outage continues.
A quick and reliable response and recovery can protect the organization from significant losses.
A DRP testing and maintenance program helps ensure you’re ready to recover when you need to, rather than figuring it out as you go.

“Routine testing is vital to survive a disaster… that’s when muscle memory sets in. If you don’t test your DR plan it falls [in importance], and you never see how routine changes impact it.”

– Jennifer Goshorn
Chief Administrative Officer
Gunderson Dettmer LLP

Info-Tech members estimated that even one day of system downtime could lead to significant revenue losses.

[Chart: Average estimated potential loss* in thousands of USD due to a 24-hour outage (N=41). Core infrastructure has the highest potential for lost revenue.]

*Data aggregated from 41 business impact analyses (BIAs) conducted with Info-Tech advisory assistance. BIAs evaluate potential revenue loss due to a full day of system downtime, at the worst possible time.
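
As a rough illustration of how per-system BIA estimates might be rolled up into category averages like those described above, here is a minimal Python sketch. The category names and dollar figures are hypothetical placeholders, not Info-Tech data, and the function name `average_loss_by_category` is an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical BIA entries: (system category, estimated 24-hour loss in thousands of USD).
# These values are illustrative only and do not come from the research.
bia_estimates = [
    ("core infrastructure", 900),
    ("core infrastructure", 700),
    ("email", 100),
    ("erp", 400),
    ("erp", 200),
]


def average_loss_by_category(estimates):
    """Group loss estimates by category and return the mean for each."""
    grouped = defaultdict(list)
    for category, loss in estimates:
        grouped[category].append(loss)
    return {category: mean(losses) for category, losses in grouped.items()}


averages = average_loss_by_category(bia_estimates)

# List categories from highest to lowest average estimated loss.
for category, loss in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{category}: {loss}K USD")
```

Ranking categories this way may help show stakeholders where downtime is most expensive, which in turn can inform which systems to test first.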

Run tests to enhance disaster recovery plans

Testing improves organizational resilience

Identify and address gaps in your plans before a real disaster strikes.
Cross-train staff on systems recovery.
Go beyond testing technology to test recovery processes.
Establish a culture that centers resilience in everyday decision-making.

Testing keeps DR documentation ready for action

Update documentation ahead of tests to prepare for the testing exercise.
Update documentation after testing to incorporate any lessons learned.

Testing validates that investments in resilience deliver value

Confirm your organization can meet defined recovery time objectives (RTOs) and recovery point objectives (RPOs).
Provide proof of testing for auditors, prospective customers, and insurance applications.
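
One way to confirm whether a test met its recovery objectives is to compare measured results against the defined RTO and RPO for each system and list the gaps, rather than issuing a single pass or fail. The sketch below assumes a hypothetical `RecoveryResult` record and illustrative timings; none of the names or values come from the research itself.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryResult:
    system: str                  # hypothetical system name
    rto: timedelta               # recovery time objective
    rpo: timedelta               # recovery point objective
    actual_recovery: timedelta   # time taken to restore service during the test
    actual_data_loss: timedelta  # age of the most recent recoverable data


def find_gaps(results):
    """Return (system, objective, overrun) for every objective that was missed."""
    gaps = []
    for r in results:
        if r.actual_recovery > r.rto:
            gaps.append((r.system, "RTO", r.actual_recovery - r.rto))
        if r.actual_data_loss > r.rpo:
            gaps.append((r.system, "RPO", r.actual_data_loss - r.rpo))
    return gaps


# Illustrative values only, not real test data.
results = [
    RecoveryResult("email", timedelta(hours=4), timedelta(hours=1),
                   timedelta(hours=3), timedelta(minutes=30)),
    RecoveryResult("erp", timedelta(hours=8), timedelta(hours=4),
                   timedelta(hours=11), timedelta(hours=2)),
]

for system, objective, overrun in find_gaps(results):
    print(f"{system}: missed {objective} by {overrun}")
```

Reporting the size of each overrun, instead of a simple pass or fail, keeps the focus on gaps to address before the next test.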

Overcome testing challenges

Despite the value of effective recovery testing, most IT organizations struggle to test recovery plans.

Common challenges

Key resources don’t have time for testing exercises.
You don’t have the technology to support live recovery testing.
Tests are done ad hoc and lessons learned are lost.
The business doesn’t support test exercises because the value isn’t understood.
Tests are always artificially simple because RTOs and RPOs must be met to satisfy customer or auditor inquiries.

Overcome challenges with a realistic approach:

Start small with tabletop and recovery tests for specific systems.
Include recovery tests in operational tasks (e.g. restore systems when you have a maintenance window).
Create testing plans for larger testing exercises.
Build on successful tests to streamline testing exercises in the future.
Don’t make testing a pass/fail exercise. Focus on identifying gaps and risks so you can address them before a real disaster hits.

Go beyond traditional testing

Different test techniques help validate recovery against different threats

There are many threats to service continuity, including ransomware, severe weather events, geopolitical conflict, legacy systems, staff turnover, and day-to-day outages caused by human error, software updates, hardware failures, or network outages.
At its core, disaster recovery planning is about recovery. A plan for service recovery will help you mitigate many threats at once. Different testing approaches will help you validate different aspects of that recovery process.
This research will provide an overview of these approaches and help you prioritize tests that are most valuable to your organization.

Test techniques for disaster recovery include system failover tests, tabletop exercises, and ransomware recovery tests, among others.

00 Identify a working group

30 minutes

Identify a group of participants who can fill the following roles and inform the discussions around testing in this research. A single person could fill multiple roles and some roles could be filled by multiple people. Many participants will be drawn from the larger DRP team.

Roles and expectations for disaster recovery planning: DRP sponsor, testing coordinator, system testers, business liaisons, and executive team.

Input

Organizational context

Output

A list of key participants for test planning and execution

Participants

Typically, start by identifying the sponsor and coordinator and have them identify the other members of the working group.

Start by updating your disaster recovery plan (DRP)

Use Info-Tech’s Create a Right-Sized Disaster Recovery Plan research to identify recovery objectives based on business impact and outline recovery processes. Both are tremendously valuable inputs to your test plans.

Overall Business Continuity Plan

IT Disaster Recovery Plan

A plan to restore IT services (e.g. applications and infrastructure) following a disruption. A DRP:

Identifies critical applications and dependencies.
Defines appropriate recovery objectives based on a business impact analysis (BIA).
Creates a step-by-step incident response plan.

BCP for Each Business Unit

A set of plans to resume business processes for each business unit. A business continuity plan (BCP) is also sometimes called a continuity of operations plan (COOP).

BCPs are created and owned by each business unit, and creating a BCP requires deep involvement from the leadership of each business unit.

Info-Tech’s Develop a Business Continuity Plan blueprint provides a methodology for creating business unit BCPs as part of an overall BCP for the organization.

Crisis Management Plan

A plan to manage a wide range of crises, from health and safety incidents to business disruptions to reputational damage.

Info-Tech’s Implement Crisis Management Best Practices blueprint provides a framework for planning a response to any crisis, from health and safety incidents to reputational damage.

01 Confirm: why test at all?

15-30 minutes

Identify the value of recovery testing for your organization. Use language appropriate for a nontechnical audience. Start with the list below and add, modify, or delete bullet points to reflect your own organization.

Drivers for testing – Examples:

Improve service continuity.
Identify and address gaps in recovery plans before a real disaster strikes.
Cross-train staff on systems recovery to minimize single points of failure.
Identify how we coordinate across teams during a major systems outage.
Exercise both recovery processes and technology.
Support a culture that centers system resilience in everyday decision-making.
Keep recovery documentation up-to-date and ready for action.
Confirm that our stated recovery objectives can be met.
Provide proof of testing for auditors, prospective customers, and insurance applications.
We require proof of testing to pass audits and renew cybersecurity insurance.