The best laid plans of mice and men may not cover every eventuality in IT disaster recovery. The most you can expect, to paraphrase a saying about models, is that “all DR plans are bad, but some plans are useful” – “bad” simply because they can never precisely match the conditions of a real IT disaster.

Of course, both plans and models remain vital components of business activities, because despite their imprecision, they still help us understand and prepare for incidents, accidents, and the future in general (they are “useful”). The following example shows how real life can play havoc with DR planning.

In a white paper on critical flaws in disaster recovery scenarios, Symantec points to a situation that can arise with larger data storage environments using multiple RDF (Remote Data Facility) groups in EMC installations. Without diving too deep into the detail, the root of the potential problem in disaster recovery comes from these factors:

  • Storage volumes in different RDF groups can be used by the same host and the same application (we’ll use the example of a database system), with fault-free operation depending on the groups remaining synchronised
  • Each RDF group can potentially be linked to different network infrastructures
  • System management tools do not currently alert a user who happens to configure storage in this way
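
The combination described in these three points is the kind of thing a simple inventory audit could flag. What follows is a minimal sketch only, assuming a hypothetical inventory in which each volume record names its application, its RDF group, and the network infrastructure that group replicates over. The `Volume` record, its field names, and `find_split_applications` are illustrative assumptions, not part of any EMC or Symantec tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical inventory record; all field names are assumptions."""
    name: str
    application: str
    rdf_group: str
    network_path: str


def find_split_applications(volumes):
    """Map each at-risk application to its (rdf_group, network_path) pairs.

    An application is flagged when its volumes span more than one RDF group
    and those groups replicate over more than one network path.
    """
    pairs_by_app = defaultdict(set)
    for v in volumes:
        pairs_by_app[v.application].add((v.rdf_group, v.network_path))

    flagged = {}
    for app, pairs in pairs_by_app.items():
        rdf_groups = {group for group, _ in pairs}
        paths = {path for _, path in pairs}
        if len(rdf_groups) > 1 and len(paths) > 1:
            flagged[app] = sorted(pairs)
    return flagged


# Made-up inventory for illustration only.
inventory = [
    Volume("vol1", "orders_db", "rdf_group_1", "network_a"),
    Volume("vol2", "orders_db", "rdf_group_2", "network_b"),
    Volume("vol3", "reporting", "rdf_group_3", "network_a"),
]
split = find_split_applications(inventory)
print(split)
```

In this made-up inventory only `orders_db` is flagged, because its two volumes sit in different RDF groups that replicate over different networks. A real audit would need to pull this information from the storage array and the network configuration, which this sketch does not attempt.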

A disaster recovery plan may well test such a configuration properly for a catastrophic, all-at-once interruption of IT operations. It is less likely to test gradual failure, a so-called rolling disaster, which is typical of fires, floods, malware, hacker attacks, and so on.

In the rolling disaster case, network components may fail at different times. The RDF groups that depend on them then lose their synchronisation, so the replicated volumes no longer reflect the same point in time and the database system becomes corrupted. Although recent backups should be available to restore the whole environment, recovery time and recovery point objectives (RTO and RPO) can then suffer. In short, real life gets in the way of effective disaster recovery planning, because while one orderly disaster scenario can be planned for, an infinity of disorderly ones cannot.
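
The difference between an orderly and a rolling failure can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how EMC replication actually works. It assumes a database whose log and data volumes sit in two different RDF groups (the names `log_group` and `data_group` are hypothetical), and that a write reaches the remote site only if its group's link is still up when the write occurs.

```python
def replicate(writes, link_down_at):
    """Return the writes that reach the remote site.

    writes: list of (time, rdf_group, description), in application order.
    link_down_at: {rdf_group: time at which that group's link fails}.
    """
    return [w for w in writes if w[0] < link_down_at[w[1]]]


def is_consistent_prefix(remote, writes):
    """True if the remote copy holds an unbroken prefix of the write stream."""
    return remote == writes[:len(remote)]


# Made-up write stream for illustration only.
writes = [
    (1, "log_group", "log: begin txn 42"),
    (2, "data_group", "data: update row for txn 42"),
    (3, "log_group", "log: commit txn 42"),
    (4, "data_group", "data: update row for txn 43"),
]

# Orderly disaster: both links fail at the same instant.
orderly = replicate(writes, {"log_group": 3, "data_group": 3})

# Rolling disaster: the log group's link fails first, the data group's later.
rolling = replicate(writes, {"log_group": 2, "data_group": 5})

print(is_consistent_prefix(orderly, writes))
print(is_consistent_prefix(rolling, writes))
```

In the orderly case the remote site holds the first two writes and nothing after them, which is stale but consistent. In the rolling case it holds the data update for transaction 43 without the log commit for transaction 42, so the remote copy corresponds to no state the database was ever in.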