Being properly prepared when disaster strikes can make all the difference - and save lives, says Jay Patel.
One of the main hurdles many organisations face when it comes to implementing a disaster recovery plan (DRP) or business continuity plan (BCP) is quite simply how to start. Common sense tells us to break things down into less daunting steps that simplify the process and get it completed more efficiently.
The most important, and quite often the most expensive, resource for any organisation to consider is its people. Assets such as property, materials and IT hardware or software can be replaced; people cannot. It is therefore advisable to ensure the safety of all employees before starting anything else. When putting a disaster recovery plan in place, it is important to realise that when a real disaster strikes, some of the key employees in charge of your business-critical areas may be away from work, on holiday or off sick. So you’ll have to take this into consideration and have a plan B.
Consider your people first and foremost and always expect the unexpected.
Getting started with a disaster recovery plan
Impact and risk analysis – Identify mission-critical systems first
Recognise which systems are critical to running the business and concentrate on protecting those as a matter of priority. Understand the level of risk and impact, as not all systems require equal levels of cover. To understand where a higher level of protection is needed, perform a risk assessment, then identify and prioritise the systems that are mission-critical. Ask questions such as: ‘What keeps your organisation going? Is it email? Accounts? Databases?’
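One simple way to turn a risk assessment into a priority list is to score each system on likelihood of failure and business impact, then rank by the product of the two. The sketch below is illustrative only: the 1 to 5 scales, the system names and the scores are all assumptions, not figures from any real assessment.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (business-stopping)

    @property
    def risk_score(self) -> int:
        # Common likelihood x impact scoring; other weightings are possible.
        return self.likelihood * self.impact


def prioritise(systems):
    """Return systems ordered from highest to lowest risk score."""
    return sorted(systems, key=lambda s: s.risk_score, reverse=True)


# Hypothetical inventory for illustration.
systems = [
    System("intranet wiki", likelihood=3, impact=1),
    System("accounts", likelihood=2, impact=5),
    System("email", likelihood=3, impact=4),
]

for s in prioritise(systems):
    print(f"{s.name}: {s.risk_score}")
```

Systems at the top of the list are the candidates for the highest level of protection; the scores themselves matter less than the ordering they produce.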
Get all critical data off-site
In the event of a server failure or an unplanned outage, you will be able to recover data if you have a copy stored off-site. Even if your data has to be restored to another location, at least it will be available. Remember that your data is your second most valuable asset.
Understand the cost implication of downtime
This will enable you to understand which areas of your business require what levels of protection. Remember that while the unavailability of some systems may not create a large commercial impact, there may be legal or reputational impact if they are not available or recoverable.
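A rough estimate of direct downtime cost can be made from lost revenue plus the cost of staff who cannot work. The function below is a minimal sketch under that assumption; the parameter names and the example figures are hypothetical, and it deliberately leaves out the legal and reputational impact mentioned above, which is harder to quantify.

```python
def downtime_cost(hours_down, revenue_per_hour, staff_affected, staff_cost_per_hour):
    """Estimate the direct cost of an outage (revenue lost plus idle staff)."""
    lost_revenue = hours_down * revenue_per_hour
    idle_staff = hours_down * staff_affected * staff_cost_per_hour
    return lost_revenue + idle_staff


# Hypothetical example: an 8-hour outage of an order-processing system.
cost = downtime_cost(
    hours_down=8,
    revenue_per_hour=1500,
    staff_affected=20,
    staff_cost_per_hour=30,
)
print(cost)
```

Running the same calculation for each system gives a like-for-like figure that can be set against the cost of protecting it.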
Tape backup is not always adequate
Tape backup is by far the most common way of protecting and recovering data, but it may well not be adequate for all of your applications. It is acceptable for long-term archival and recovery, but it can take a long time to rebuild a system entirely from tape.
Options other than tape
When tape backup is not sufficient, consider alternatives such as real-time replication to an off-site facility, either managed by a third-party provider or implemented in-house if a secondary site is available. Off-site replication services do not remove the need to maintain your routine tape backup procedures. Real-time replication solutions can provide near-zero data loss, which allows immediate system fail-over and data availability. There are various flavours of replication software on the market. Hosted replication is often more cost-effective and more flexible, as it can work with many types of storage technologies, integrate with your IT infrastructure and provide excellent disaster recovery solutions.
Make your disaster recovery plan a part of your normal working routine
Plan for the different types of outage, including a simple failed hard disk, system hardware failure, a software malfunction, human error, a virus or spam attack, a building outage, a regional power failure, environmental disasters, and natural disasters, eg hurricanes and river flooding. Ensure procedures are well documented and made available to everyone. All staff members have an important responsibility in the event of a disaster, so it’s important that they understand their roles during a crisis.
Disaster recovery plan or business continuity plan – what’s the difference?
These two plans should not be confused. A disaster recovery plan (DRP) is specifically for IT systems within a specific location or a few locations. A business continuity plan (BCP) is normally thought of as the comprehensive corporate plan. It is a holistic management process that identifies potential threats to an organisation and the impacts to business operations that those threats, if realised, might cause. It also provides a framework for building organisational resilience with the capability for an effective response that safeguards the interests of its key stakeholders, reputation, brand and value-creating activities.
Creating a realistic business continuity plan can take anything up to nine months, so during this time a disaster recovery solution should be implemented to minimise any business disruption, ie off-site data replication, off-site server mirroring, off-site backup etc.
Disaster recovery planning
Each organisation will differ, so your disaster recovery plan should be tailored to suit your requirements and the importance placed on your data. Performing a business impact analysis and risk assessment will identify the requirements of the organisation and steer it towards the creation of a disaster recovery plan. Try to visualise what is required at various levels of disaster or incident, as different situations will invoke completely different procedures. For example, a server or hard disk failure would invoke a different recovery procedure from a fire or explosion that could destroy an entire building.
Don’t be afraid to start implementing
Do not wait until you have a complete DRP or BCP in place to start protecting your organisation. Get your data off-site to a different physical location. Storing a copy of your critical data at a remote facility will allow you to recover and get back to business quickly. Your daily and weekly backup tapes can easily be stored off-site and remain accessible when recovery is needed.
Two key factors to understand when determining priorities are recovery point objective (RPO) and recovery time objective (RTO). RPO is the point in time to which data must be recoverable after an incident; in other words, the maximum amount of recent data the organisation can afford to lose. For certain applications, recovering data from yesterday or even last week might be sufficient, so the RPO would be days or weeks. For applications and data where any loss is unacceptable, an RPO of minutes or less applies.
While RPO defines how much data loss is tolerable, RTO defines how quickly that data, and the systems that depend on it, must be recovered. RTO is the maximum amount of time the application can be down and not available to users or customers.
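To make the two objectives concrete, the sketch below checks a protection scheme against them. It assumes that the worst-case data loss equals the gap between two successive backups, and that a restore time has been estimated or measured in a test; the function names and the example figures are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # Worst case: the failure happens just before the next backup runs,
    # so everything since the previous backup is lost.
    return backup_interval <= rpo


def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    # The system must be back in service within the RTO.
    return estimated_restore <= rto


# Hypothetical example: nightly tape backup for an order database
# with a one-hour RPO and an eight-hour RTO.
nightly = timedelta(hours=24)
print(meets_rpo(nightly, rpo=timedelta(hours=1)))
print(meets_rto(timedelta(hours=6), rto=timedelta(hours=8)))
```

In this example the nightly backup fails the one-hour RPO even though the restore fits within the RTO, which is the kind of gap that replication is intended to close.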
Testing saves lives
It is imperative to test your plan: a plan is only as good as its performance when it is actually invoked. Once the plan is completed, it is crucial that it is tested adequately to ensure that it actually works in a real disaster. Testing the DR and BC plans also provides excellent training for everyone in your organisation.
I recall the tragic events of 11 September 2001, when organisations’ disaster recovery plans were initiated. Suddenly staff had to recall the evacuation plan. For organisations that regularly practised the evacuation drill, evacuating the building was seamless and some even made their way to their disaster workplace recovery offices.
For the handful of companies that didn’t have a disaster recovery plan or didn’t practise their fire evacuation drill, panic took over. Some staff made their way upward toward the roof of the Twin Towers in hope of helicopter rescue, but the roof access doors were locked. No plan existed for helicopter rescues, and on 11 September the thick smoke and intense heat would have prevented helicopters from conducting them.
Regular testing is recommended so that people at all levels in the company know what to do in an emergency and are aware of the role they play in an invocation scenario.
Tests should be scheduled quarterly or at least every six months in order to take account of any new staff or system changes. For Morgan Stanley top executive Robert Scott, who helped his company survive the heavy toll from 11 September, one leadership lesson is particularly clear. “If you wait for a crisis to begin to lead, it’s too late,” said Scott.
Scott said that 32 years on Wall Street did little to prepare him for the terrorist attacks. But he found that a range of factors, from disaster contingency plans to the actions of well-trained managers, enabled Morgan Stanley – the largest tenant in the World Trade Centre – to come through the disaster with relatively little loss of life. Six of Morgan Stanley’s 3,700 employees died in the attacks.
In the 20 minutes between the first and second plane crashes, Morgan Stanley had implemented an evacuation plan put into place after the 1993 terrorist attack on the World Trade Centre. “It turned out we had most of our people off the high floors before the second plane hit,” Scott said.
Meanwhile, employees in charge of operations, having been drilled in what to do in the event of disaster, walked 22 blocks to Morgan Stanley’s backup site and turned on the computers. “By 9.20am, the backup site was activated,” said Scott. “By 9.30am senior management had relocated to another site that became our command facility.” Lessons from disaster: Preparedness counts.
The narrative above provides basic information for building a disaster recovery plan. It is important to realise that while a complete DRP or BCP may take considerable time, there are immediate measures that can be put in place, ie off-site data replication, off-site server replication, off-site data backup etc, which are extremely cost-effective these days and, if implemented correctly, can enhance the IT support function as well. In essence your DR plan is a portion of the larger BC plan. It is recommended not to leave your systems, and thus your organisation, at risk during the BC planning process; take immediate action to safeguard them and seek assistance from experts in this area.