Why Today’s Disaster Recovery Test May Be Invalid Tomorrow

Test, test, test. Any disaster recovery plan worth its salt must be tested, and any shortcomings revealed by the test must be rectified. That’s how we make DR plans that are robust, relevant, and effective – as long as the configurations of IT systems remain reasonably stable. However, configuration drift is a fact of IT life.

Production systems morph away from previous versions that have been replicated in backup systems, as IT operations teams react to changing environments and business requirements.

After all, enterprises and organisations are supposed to be agile and dynamic, constantly adapting to their fast-moving markets. But where does that leave disaster recovery?

Unfortunately, it doesn’t take much in the way of configuration disparities to derail or at least delay a disaster recovery. Those carefully prepared recovery time and recovery point objectives (RTO and RPO), defined according to business objectives and strategy, can end up going out of the window.

Ideally, enterprises would retest their disaster recovery plans every time a configuration change was made, but this would quickly consume an inordinate amount of time and energy. Yet the same quick-fire, often highly automated procedures that are creating this problem may also point the way to an answer.

In a DevOps environment, for example, where system configurations may change as part of a constant feedback loop between development, operations, and end-users, these changes can also be automatically logged. Configuration management databases (or CMDBs, for short) are designed to hold this information and the updates.

Depending on the type of CMDB and the way it is used, rules can be applied to check changes, spot inconsistencies, and send alerts to IT staff. The aim is a continuously up-to-date record of production system configurations that can be replicated directly in backup systems at any time.
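
As a rough illustration of such a rule, the sketch below compares a production configuration record with the one held for a backup system and turns any differences into alert messages. It assumes each record can be exported as a flat dictionary of settings; the function names (`find_drift`, `format_alerts`) and the sample settings are hypothetical and do not come from any particular CMDB product.

```python
from typing import Any

# Internal marker for "setting not present", so that a real value of None
# is not confused with a missing key during comparison.
_ABSENT = object()


def find_drift(production: dict[str, Any], replica: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (production_value, replica_value)} for every mismatch.

    A setting that exists on only one side is reported with None on the other.
    """
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(production.keys() | replica.keys()):
        prod_value = production.get(key, _ABSENT)
        repl_value = replica.get(key, _ABSENT)
        if prod_value is _ABSENT or repl_value is _ABSENT or prod_value != repl_value:
            drift[key] = (
                None if prod_value is _ABSENT else prod_value,
                None if repl_value is _ABSENT else repl_value,
            )
    return drift


def format_alerts(drift: dict[str, tuple[Any, Any]]) -> list[str]:
    """Turn each drifted setting into a one-line message for IT staff."""
    return [
        f"Configuration drift in '{key}': production={prod!r}, backup={repl!r}"
        for key, (prod, repl) in drift.items()
    ]


if __name__ == "__main__":
    # Hypothetical records, as they might be exported from a CMDB.
    production = {"os_version": "12.4", "db_pool_size": 50, "tls": "1.3"}
    replica = {"os_version": "12.4", "db_pool_size": 20}
    for alert in format_alerts(find_drift(production, replica)):
        print(alert)
```

In practice a check like this would run whenever a change is logged, so that disparities surface when they are introduced rather than during a recovery.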

Smarter systems go further, assessing vulnerabilities and impacts and suggesting how to fix them. That way, even if tests cannot be run every minute of the day, you should never be too far from the configuration truth, or from a successful disaster recovery.
