How to avoid unexpected IT downtime
Every business, no matter how big or small, is likely to face unexpected downtime that has the potential to cause significant disruption and ultimately stop core business activities, such as delivering goods or services to customers.
Thankfully, events that can cause major disruption to business operations are rare, but when they do happen, consequences can be disastrous. What sets some businesses apart from others is how prepared they are to deal with unplanned downtime and how quickly they can recover from major disruption.
From power outages and human error to IT failures, fires, floods or even global pandemics, many events have the potential to stop your business operations for an extended period of time, so planning for the unexpected is essential to understand and mitigate these risks. Businesses that aren’t prepared to deal with unexpected IT downtime face severe consequences, from revenue loss and cancelled contracts to reputational damage or even regulatory fines.
In this article we explore how to safeguard your IT and operational technology, a vital part of your Business Continuity Management.
Plan, Prepare, Practise
To effectively deal with disasters and maintain business operations, you need a disaster recovery plan and control measures in place to guide how your business prepares for and responds to unexpected events.
Your plans need to map out all the activities that will be performed from the moment a disaster starts up to the point that you can restore affected systems and processes back to their full operational capacity. Control measures need to be part of your company-wide technology strategies, ready to support your business operations should things go wrong.
Whilst having plans and control measures in place to deal with different types of disasters is a great step in the right direction, you need to schedule regular reviews and tests to ensure they work as intended. You also need to consider how your business continuity and disaster recovery plans might be affected whenever you make changes to your processes and systems.
Control measures are mechanisms that enable plans to be performed effectively and can often minimise the impact of an event. These could include backup generators and uninterruptible power supplies to protect against power failures, fire suppression systems to protect against fires, servers replicated to a secondary location to protect against IT downtime, and remote access solutions to enable employees to keep working if their normal location is not accessible. In some cases, control measures can even prevent a disaster from having any effect at all.
It won’t happen to us
It can be tempting to avoid preparing for unplanned downtime as the probability of it occurring is very low. Why invest time and money in something that’s unlikely to occur? Think of these preparations as an insurance policy that you hope will never be needed but can protect your business from significant revenue loss and damage to brand reputation should the worst happen.
Disasters can come in many shapes and forms, such as:
- Human error
- Employee sabotage
- Technology failures
- Service provider failures and outages
- Natural events such as fires and floods
- Global events, including pandemics
An Unforeseen Event
In March 2020, many businesses were confronted with an event they never thought would happen – a global pandemic. Millions of people were told to stay at home and businesses were forced to shut their doors. Suddenly offices were empty, people had to work from home, and face-to-face meetings were replaced with online meetings. Whilst very few businesses planned for a pandemic, those that planned for the possibility of their offices being inaccessible for an extended period of time had systems in place for staff to work remotely, so were able to cope far better than those that were unprepared.
Practical Steps to Take To Avoid IT Downtime
You know the importance of being prepared for a disaster. Follow our step-by-step guide to ensure your business is ready to face the unexpected.
Step One: Business Impact Assessment
The first step in business continuity management planning is to perform an impact assessment which looks at the critical activities, processes and systems used within your business and considers how the business would be impacted if they were affected by a disaster.
- Identify all the systems used within your business and assess how your business would be impacted if these were unavailable.
- Consider different time frames: what would happen if your IT and operational technology systems were unavailable for a few hours, days or even weeks?
- What would happen if some of your data were lost forever?
Answering these questions will allow you to decide the Maximum Tolerable Period of Downtime and Maximum Tolerable Data Loss, which are the points where disruption and loss of data become detrimental to business operations.
Maximum Tolerable Period of Downtime: This is your target timeframe for returning your systems to a working state from the point of a disaster occurring and is usually expressed as a Recovery Time Objective (RTO). For example, an RTO of 4 hours means that if a disaster occurred at 11am, the affected systems must be available for use by 3pm.
Maximum Tolerable Data Loss: This is the point to which data must be restored, otherwise known as the Recovery Point Objective (RPO). An RPO of 1 hour means that if a disaster occurred at 11am, the recovered system must contain all the data that existed at 10am.
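The two worked examples above can be checked with a few lines of date arithmetic. This is a minimal sketch: the function name recovery_targets and the calendar date are assumptions made for illustration, and only the 11am disaster time, the 4 hour RTO and the 1 hour RPO come from the examples.

```python
from datetime import datetime, timedelta


def recovery_targets(disaster_time, rto, rpo):
    """Return (restore deadline, point to which data must be restored)."""
    restore_deadline = disaster_time + rto  # systems must be usable by this time
    data_point = disaster_time - rpo        # recovered data must include everything up to here
    return restore_deadline, data_point


# Hypothetical date; the 11am time matches the examples above.
disaster = datetime(2024, 1, 15, 11, 0)
deadline, data_point = recovery_targets(
    disaster, rto=timedelta(hours=4), rpo=timedelta(hours=1)
)

print("Systems available by:", deadline.strftime("%H:%M"))      # 15:00
print("Data restored up to:", data_point.strftime("%H:%M"))     # 10:00
```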
Step Two: Risk Assessment and Gap Analysis
The second step in business continuity management planning is to perform a risk assessment and gap analysis. This will determine the types of events that could occur and whether your existing capabilities allow the Recovery Time Objective and Recovery Point Objective to be met.
- Identify all the possible risks that could interrupt your business. How would you proactively respond to each of the risks identified?
- Do your current capabilities allow you to either mitigate the effects of the disaster or to restore your business operations within the RTO and RPO?
- If you’ve found a gap between the recovery objectives and what your existing capabilities can achieve then consider what additional control measures you need to help meet your targets. For example, to meet the RPO you might need to replicate all your data to a secondary location every hour.
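A gap analysis of this kind can be sketched as a simple comparison between your objectives and what you can achieve today. Everything named here (SystemProfile, find_gaps, the "Order system" and its figures) is hypothetical, and the sketch assumes the worst-case data loss equals one full replication interval, which is a simplification.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemProfile:
    """Recovery objectives and current capabilities for one system (hypothetical fields)."""
    name: str
    rto: timedelta                   # Recovery Time Objective from the impact assessment
    rpo: timedelta                   # Recovery Point Objective from the impact assessment
    estimated_restore: timedelta     # how long a restore is expected to take today
    replication_interval: timedelta  # how often data is copied to the secondary location


def find_gaps(system: SystemProfile) -> list[str]:
    """Describe each objective that current capabilities cannot meet."""
    gaps = []
    if system.estimated_restore > system.rto:
        gaps.append(f"{system.name}: restore time exceeds the RTO of {system.rto}")
    # Simplification: worst-case data loss is treated as one full replication interval.
    if system.replication_interval > system.rpo:
        gaps.append(f"{system.name}: replication interval exceeds the RPO of {system.rpo}")
    return gaps


# Illustrative figures only: the same system with nightly and then hourly replication.
nightly = SystemProfile(
    name="Order system",
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    estimated_restore=timedelta(hours=3),
    replication_interval=timedelta(hours=24),
)
hourly = SystemProfile(
    name="Order system",
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    estimated_restore=timedelta(hours=3),
    replication_interval=timedelta(hours=1),
)

print(find_gaps(nightly))  # one gap: nightly replication cannot meet a 1 hour RPO
print(find_gaps(hourly))   # []
```

In this sketch, moving from nightly to hourly replication is the additional control measure that closes the gap.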
Step Three: Create a Plan to Avoid IT Downtime
Now that the risks and business impact are fully understood, your third step is to create a disaster recovery plan that outlines the activities that will be performed in the event of a disaster. Depending upon the size and complexity of your organisation and the types of disasters you want to be protected against, it may be sufficient to have one plan that covers everything, or you might need multiple plans each covering a specific system or a phase of the recovery activities.
- Start with the biggest risks. Some types of disasters are more likely to occur than others so prioritise these events first.
- Design the plans around the consequences of the disaster rather than the disaster itself. Many different disasters could stop your employees from working in the office, but they all have the same impact, so a single response can cover them.
- Create a phased response. Your priorities immediately after a disaster will be different to those in subsequent days, so structure your response accordingly.
- Identify who is responsible for carrying out each activity and make sure everyone has a backup in case they aren’t available.
- Have clear lines of communication with your employees, suppliers and customers. Nothing is worse than misinformation and rumours.
- Make sure your plan will be accessible when you need it. You don’t want the only copy of your plan to be lost due to the disaster or to be held by someone who’s on holiday.
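One way to decide which risks to start with is to give each a rough score and sort by it. The sketch below assumes a likelihood multiplied by impact score on a 1 to 5 scale, which is a common convention rather than something this guide prescribes. The scores themselves are placeholders, not real data.

```python
# Illustrative placeholder scores on a 1 (low) to 5 (high) scale; replace with your own.
risks = [
    {"event": "Human error", "likelihood": 4, "impact": 3},
    {"event": "Technology failures", "likelihood": 3, "impact": 5},
    {"event": "Service provider failures and outages", "likelihood": 3, "impact": 3},
    {"event": "Natural events such as fires and floods", "likelihood": 1, "impact": 5},
]


def risk_score(risk):
    """Assumed scoring rule: likelihood multiplied by impact."""
    return risk["likelihood"] * risk["impact"]


def prioritise(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=risk_score, reverse=True)


for risk in prioritise(risks):
    print(f"{risk_score(risk):>2}  {risk['event']}")
```

With these placeholder scores, technology failures would be planned for first and natural events last. Your own assessment may rank them very differently.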
Step Four: Test and Review
The fourth and final step in business continuity management is to check that your plans work as intended and will meet your long-term recovery objectives. After all, how do you know if the plan will work if it’s never been tested?
There are a few different ways to gain confidence in the plan:
- Conduct a desk-based exercise where key staff meet and discuss how they would execute the activities in the plan when faced with a hypothetical situation. This will ensure that everyone is familiar with the activities they will be expected to perform and will help identify elements of the plan that may not work as expected in the scenario or are missing altogether, all without interrupting normal operations.
- Test that your control measures work as expected. Your plan relies on these measures, so check that backup generators work, backups can be restored, and data is being replicated to a secondary location.
- Finally, conduct a live exercise where you execute the activities in the plan in response to a hypothetical scenario, without relying on any resources that would have been lost in that scenario.
These three options will give you valuable insights into how prepared your business is to deal with IT downtime and allow you to fix any weaknesses. You might even identify problems that you weren’t aware of. This test and review process shouldn’t be a one-off event: it’s important to review the arrangements at least annually or whenever there have been any significant changes.
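Some control measure checks can be automated between formal tests. As a minimal sketch, the check below assumes replicated data arrives as files in a directory at the secondary location and treats the newest file's modification time as the time of the last replication. The function names and the orders.bak file are hypothetical.

```python
import tempfile
import time
from pathlib import Path


def newest_file_age_seconds(directory, now=None):
    """Age in seconds of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(directory).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def replication_within_rpo(directory, rpo_seconds, now=None):
    """True if the newest replicated file is no older than the RPO."""
    age = newest_file_age_seconds(directory, now)
    return age is not None and age <= rpo_seconds


# Demonstration using a temporary directory to stand in for the secondary location.
with tempfile.TemporaryDirectory() as replica_dir:
    print(replication_within_rpo(replica_dir, rpo_seconds=3600))  # False: nothing replicated yet
    (Path(replica_dir) / "orders.bak").write_text("example data")
    print(replication_within_rpo(replica_dir, rpo_seconds=3600))  # True: file written moments ago
```

A check like this only shows that data is arriving on schedule. It does not prove that the data can be restored, so it complements restore tests rather than replacing them.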
How Ripley Solutions can help prevent IT downtime
We hope these tips help you see how business continuity management can prepare your business to deal with unexpected events.
If you need extra help, we can support you at all stages of your business continuity and disaster recovery plan, from assessing the landscape and planning to implementing control measures and testing your plans.
We’ll help you future-proof your IT, along with sharing information and resources to support your IT and operational technology infrastructure.
Sign up here to learn more about avoiding IT downtime