A Warning for Companies Without a Disaster Recovery Plan
What are RTO and RPO, how to build a backup strategy, and why DR drills matter. A practical guide to being prepared for system outages.
Most companies find out their backup strategy is broken during an actual incident. Until that moment, “we have backups” feels like a complete answer. Where are they stored? How often are they taken? Have they ever been tested? Who runs the recovery, and how long will it take? These questions rarely get asked until the system is down and the pressure is on.
This isn’t meant to alarm — it’s a practical framework. Let’s clarify what disaster recovery planning actually involves and what a strategy that genuinely works looks like.
RTO and RPO: Two Concepts You Need to Understand
Disaster recovery planning rests on two fundamental metrics. Without understanding them, you can’t make the right architectural decisions or know whether your current setup is adequate.
RTO — Recovery Time Objective
The maximum acceptable amount of time your system can be offline after an incident before the business impact becomes unacceptable. An RTO of four hours means you’re committing to restoring service within four hours of a failure.
In practice, RTO answers the question: “How long can we afford to be down?” For an e-commerce platform, the tolerance is very low. For an internal reporting tool, it might be much higher.
RPO — Recovery Point Objective
The maximum acceptable amount of data loss, measured in time. An RPO of one hour means that in the worst case, you’re prepared to lose up to one hour’s worth of data.
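In practice, RPO determines how frequently you need to back up. If your RPO is one hour, a usable backup must complete at least once an hour, so the schedule has to account for how long each backup takes to run, not just how often it starts.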
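To make the arithmetic concrete, here is a minimal Python sketch. It assumes a backup captures the state at the moment it starts and only becomes usable once it finishes, so the worst-case loss is the backup interval plus the backup's run time. The function names and figures are illustrative, not taken from any particular tool.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case data loss window for periodic backups.

    Assumption: a backup reflects the state at its start time and is usable
    only after it finishes. Data written just after backup N starts is first
    protected when backup N+1 completes, one interval plus one run time later.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    """True if the schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)
    # Hourly backups that take 10 minutes can lose up to 70 minutes of data.
    print(meets_rpo(timedelta(hours=1), timedelta(minutes=10), rpo))
    # Starting a backup every 45 minutes brings the worst case down to 55 minutes.
    print(meets_rpo(timedelta(minutes=45), timedelta(minutes=10), rpo))
```

Under these assumptions, an hourly schedule with a ten-minute backup misses a one-hour RPO, while a 45-minute schedule meets it.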
You can’t write a meaningful disaster recovery plan without first defining these two numbers — because they directly determine what technology and architecture you need.
The Most Common Disaster Recovery Mistakes
Backups Exist But Are Never Tested
This is the most dangerous assumption in infrastructure management. The existence of a backup does not mean that backup can be successfully restored. The backup process might have been silently failing. The restore procedure might be undocumented. A partially corrupted backup might only reveal the problem when you try to use it under pressure.
Having untested backups is like having a fire extinguisher you’ve never inspected. It might work when you need it. Or it might not.
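One inexpensive first check is to confirm that a backup file still matches the checksum recorded when it was written. The Python sketch below assumes you store a SHA-256 digest alongside each backup; the function names are hypothetical. It catches missing, empty, or corrupted files, but it does not prove the backup restores cleanly. Only a real restore does that.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """False if the file is missing, empty, or no longer matches its recorded digest."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    return sha256_of(path) == expected_sha256
```

Recording the digest at backup time and re-checking it on a schedule turns "the file exists" into "the file is still the one we wrote".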
Backups Stored in the Same Location as the Data
If your primary database and its backups sit in the same AWS region, a regional outage takes out both. This happens more often than it should. Backups need to be physically and logically separate from the systems they protect.
A different cloud region is the minimum. A different cloud provider entirely provides stronger isolation. The goal is to ensure that whatever failure scenario knocks out your primary system cannot also take out your backups.
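This rule is simple enough to check automatically. The Python sketch below is illustrative only: the `Location` type and the level names are assumptions, and in a real setup the provider and region values would come from your own inventory or infrastructure configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    provider: str  # e.g. "aws"
    region: str    # e.g. "eu-west-1"


def isolation_level(primary: Location, backup: Location) -> str:
    """Classify how far the backup sits from the primary system."""
    if primary.provider != backup.provider:
        return "different-provider"  # strongest isolation
    if primary.region != backup.region:
        return "different-region"    # the minimum worth accepting
    return "same-region"             # one regional outage takes out both


if __name__ == "__main__":
    primary = Location("aws", "eu-west-1")
    print(isolation_level(primary, Location("aws", "eu-west-1")))
    print(isolation_level(primary, Location("aws", "eu-central-1")))
```

A check like this could run in a deployment pipeline and fail the build whenever a backup target resolves to "same-region".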
No Runbook — “We Know What to Do”
A crisis is the worst time to figure out who does what. If the recovery procedure exists only in one or two people’s heads, your system is only as resilient as those people’s availability and composure under pressure.
A runbook is a simple document: step-by-step instructions for who gets notified, who makes the call to initiate recovery, what the exact steps are, which tools and credentials are needed, and where to find them. Having this open during an incident saves hours and prevents costly mistakes.
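If you keep the runbook in a structured form, a few lines of Python can flag sections nobody has filled in yet. This is a sketch, not a standard format: the section names mirror the list above, and the example content is hypothetical.

```python
REQUIRED_SECTIONS = ("notify", "decision_maker", "steps", "tools_and_credentials")

# Hypothetical example content; replace with your own people and systems.
RUNBOOK = {
    "notify": ["on-call engineer", "engineering lead"],
    "decision_maker": "engineering lead",
    "steps": [
        "Confirm the outage and its scope",
        "Restore the latest verified backup to the standby environment",
        "Run smoke tests, then switch traffic",
    ],
    # Record where credentials live, never the secrets themselves.
    "tools_and_credentials": {},
}


def missing_sections(runbook: dict) -> list[str]:
    """Return required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not runbook.get(name)]


if __name__ == "__main__":
    print(missing_sections(RUNBOOK))
```

Here the check reports `tools_and_credentials` as missing, because that section was left empty.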
Infrequent Full Backups as the Only Strategy
A daily full backup sounds adequate until you encounter a situation where it isn’t. For large datasets, full backups take significant time and can impact system performance. More importantly, if your RPO is measured in hours rather than days, a once-daily backup is insufficient.
Incremental backup strategies — backing up only what has changed — allow for much more frequent snapshots without the overhead of full backups every time.
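For file-based data, the core idea fits in a few lines of Python. The sketch below selects files modified since the previous backup, using modification time as the change signal. That is an assumption made for illustration: it does not detect deletions, and databases normally rely on their own mechanisms, such as transaction log archiving, rather than file timestamps.

```python
from pathlib import Path


def changed_since(root: Path, last_backup_ts: float) -> list[Path]:
    """Files under root modified after the previous backup's timestamp.

    last_backup_ts is seconds since the epoch, as returned by time.time().
    """
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_backup_ts
    )
```

Only the returned files need to be copied, which is why incremental runs can be scheduled far more often than full ones.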
How to Build a Practical Disaster Recovery Plan
Step 1: Define Your RTO and RPO
This is a business decision, not a technical one. Sit down with the business stakeholders and answer the questions: How long can each system be offline? How much data can we afford to lose? Different systems will have different answers. Document them.
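One way to document the answers is as a small, version-controlled catalogue that other tooling can read. The Python sketch below is only an illustration: the system names and figures are hypothetical, and the real numbers should come from that conversation with the business.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    system: str
    rto: timedelta  # how long the system may be offline
    rpo: timedelta  # how much data, measured in time, may be lost


# Hypothetical systems and targets, for illustration only.
OBJECTIVES = [
    RecoveryObjective("internal-reporting", rto=timedelta(hours=24), rpo=timedelta(hours=24)),
    RecoveryObjective("checkout-api", rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
]


def strictest_first(objectives):
    """Order systems by tightest RTO, then tightest RPO."""
    return sorted(objectives, key=lambda o: (o.rto, o.rpo))


if __name__ == "__main__":
    for obj in strictest_first(OBJECTIVES):
        print(obj.system, obj.rto, obj.rpo)
```

Sorting by strictness gives a natural priority order for where to spend engineering effort first.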
Step 2: Automate Your Backups
Manual backups are unreliable. Humans forget, get busy, or skip steps under pressure. Automated, scheduled backups remove the human variable. Whether you’re using AWS RDS automated backups, S3 versioning, or cron-based database dumps, the process should run without anyone needing to initiate it.
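Automation removes the need for someone to start the backup, but a scheduled job can still fail quietly. A small freshness check, sketched below in Python, can raise the alarm when the last successful backup is older than expected. How you obtain the `last_success` timestamp depends on your tooling and is left as an assumption here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_is_stale(
    last_success: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """True if the most recent successful backup is older than max_age.

    Both datetimes are expected to be timezone-aware. `now` defaults to the
    current UTC time and can be passed explicitly for testing.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_success > max_age
```

Setting `max_age` to the backup interval plus some slack, and alerting whenever this returns true, means a silent failure is noticed within one cycle rather than during an incident.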
Step 3: Store Backups in a Different Location
Automate the replication of your backups to at least one different region or provider. AWS S3 Cross-Region Replication, Google Cloud Storage multi-region buckets, or exporting to a separate provider are all viable approaches. The requirement is simple: whatever takes out your primary system should not also take out your backups.
Step 4: Write the Runbook
Don’t wait for an incident to document your recovery procedure. Write it now, while you’re calm and have time to think clearly. Cover: who gets alerted and how, who has the authority to initiate recovery, the step-by-step recovery process, and where to find every tool and credential you’ll need. Keep it somewhere the whole technical team can access.
Step 5: Test Quarterly
Aim for a DR drill every quarter; if that isn't feasible, run at least two per year. These don't need to be full production failures. A simulated restore in an isolated environment is sufficient. The gaps you find during a drill, when nothing is on fire, are far less costly to address than the gaps you find during an actual incident.
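A drill is also the only way to learn whether your RTO is realistic. The Python sketch below times a restore procedure and compares it with the target. The `restore` callable is a placeholder for whatever your runbook actually does, and the result format is an assumption.

```python
import time
from datetime import timedelta
from typing import Callable


def run_drill(restore: Callable[[], None], rto: timedelta) -> dict:
    """Run a restore procedure and report how long it took against the RTO."""
    started = time.monotonic()
    restore()  # e.g. restore the latest backup into an isolated environment
    elapsed = timedelta(seconds=time.monotonic() - started)
    return {"elapsed": elapsed, "within_rto": elapsed <= rto}
```

Keeping the result of each drill gives you a record of actual recovery times, which is more persuasive than an estimate when the RTO is next reviewed.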
A Realistic Assessment
You don’t need to be a large enterprise to need a disaster recovery plan. In fact, smaller companies are often more exposed — they lack the dedicated infrastructure teams and redundant systems that large organisations take for granted.
Every company will experience a system incident at some point. The question isn’t whether it will happen, but whether you’ll be ready when it does. “It won’t happen to us” is the most expensive assumption in infrastructure management.
If you’d like to review your current disaster recovery strategy or build one from scratch, a free discovery call is a good place to start. We can help you work through RTO and RPO definitions, assess your current backup posture, and put a practical plan together.