Disaster recovery testing before an incident occurs is a must. Whether you test to check or test to improve, waiting until a catastrophe happens before unveiling your plan is a sure way to fail. However, a test environment is usually not a production environment. After all, who's going to let you anywhere near their production machines with your disaster scenarios, metrics, and measures? That reluctance is understandable, but if you're not dealing with the real world and its quirks, your DR testing may miss failure modes that only show up in production.

Testers trying to find out whether their disaster recovery plan is viable in the real world face challenges that include:

  • Insufficient resources. Many organisations cannot afford to maintain a large team of DR specialists. IT departments are already stretched just in accomplishing everyday tasks.
  • Incomplete testing. If you cannot access production servers for testing, a large part of your DR tests will be missing. Replication of production servers may be possible but requires extra time, effort, and materials.
  • Unrealistic scenarios. Suppose for a moment that you can test on production servers. It's unlikely that production teams will simply let you pull the plug and crash the servers. An orderly shutdown may be mandatory, but it is probably a poor stand-in for a true disaster, in which systems go down without warning.
  • Untested dependencies. Systems may be highly interlinked or dependent on timing of process interactions. For example, a server replicated on one remote site may not function properly with applications or data running on a second remote site, owing to increased network latency.
  • Configuration changes. You test. Then the configurations change, and you must start all over again.
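
One way to make the configuration-change problem visible is to record a fingerprint of the configuration each time a DR test passes, then compare it with the current configuration before trusting the old result. The following is a minimal sketch, assuming the configuration can be exported as a JSON-serialisable dictionary; the function names `fingerprint` and `dr_test_is_stale` and the sample keys are hypothetical, not part of any particular tool.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable SHA-256 digest of a configuration dictionary."""
    # sort_keys makes the digest independent of the order of the keys.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dr_test_is_stale(tested_fingerprint: str, current_config: dict) -> bool:
    """True if the configuration has changed since the last passing DR test."""
    return fingerprint(current_config) != tested_fingerprint


# Hypothetical example: the replica count changed after the last test.
tested = fingerprint({"host": "db1", "replicas": 2})
if dr_test_is_stale(tested, {"host": "db1", "replicas": 3}):
    print("Configuration changed since the last DR test; retest needed.")
```

A check like this does not remove the need to retest, but it does tell you when a previous test result no longer describes the system you are actually running.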

There is no magic solution for these issues, although certain approaches can help. Automating disaster recovery testing procedures can ease the resource problem and, to some extent, improve test coverage.
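
As an illustration of what such automation might look like, here is a small sketch of a test runner that executes recovery steps in order, times them, and compares the total against a time limit. It is an assumption-laden example rather than a recommended tool: the name `run_dr_test`, the idea of a step as a function returning True on success, and the `rto_seconds` limit (a recovery time objective, which the plan would have to define) are all introduced here for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_dr_test(steps: List[Step], rto_seconds: float,
                clock: Callable[[], float] = time.monotonic) -> Dict:
    """Run recovery steps in order and report whether the run met the limit.

    Stops at the first step that returns a falsy value or raises.
    """
    start = clock()
    results = []
    for name, step in steps:
        step_start = clock()
        try:
            ok = bool(step())
        except Exception:
            # A step that raises counts as a failed step.
            ok = False
        results.append((name, ok, clock() - step_start))
        if not ok:
            break
    elapsed = clock() - start
    all_ok = len(results) == len(steps) and all(ok for _, ok, _ in results)
    return {
        "passed": all_ok and elapsed <= rto_seconds,
        "elapsed": elapsed,
        "results": results,
    }


# Hypothetical steps; real ones would restore backups, start services, etc.
report = run_dr_test(
    [("restore backup", lambda: True), ("verify service", lambda: True)],
    rto_seconds=3600,
)
print(report["passed"])
```

Because the steps are plain functions, the same run can be scheduled regularly without tying up a specialist each time, which is where the saving in resources comes from.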

DevOps methods make it easier to keep pace with configuration changes, while cloud computing allows for cost-efficient replication of complete systems, including production servers. At the same time, it is crucial to remember that DR testing, like any other form of testing, will always have its limitations.