The dust has settled on the latest major Internet outage — this one due to problems with Amazon Web Services S3 cloud storage — and we’ve all moved on with our daily business. These things happen, right?
Yes, unfortunately, they do. But we still ought to learn a couple of important lessons from it and get smarter about how to protect our businesses. Let’s briefly look at what happened with this AWS outage, and what safeguards you should have in place for your business.
Human Error (As Usual)
In case you missed the news, on February 28, Amazon Web Services (AWS) suffered an outage with its Simple Storage Service (commonly known as S3), which took down significant portions of the Internet for more than four hours. From 12:37 p.m. Eastern until about 4:50 p.m., the issue impacted a wide range of websites and apps, including Expedia, Slack, Medium, and even the U.S. Securities and Exchange Commission. Many cloud-based services are themselves built on AWS, so an app that depends on any one of them can fail as well, and the outage ripples outward.
The source of the problem? According to Amazon:
“…an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”
In short, employee error. (GeekWire’s rundown offers more detail.)
Of course, AWS will institute changes to its procedures. But neither human operators nor automated software can ever guarantee that some new problem won’t take out crucial servers.
How to Protect Your Business from Cloud Outages
Despite the general stability of AWS and many other cloud services, this latest outage should remind us that trusting them carries risk. Whether you need those services to provide access to your data, operate critical components of your business, or connect with customers, they can represent a significant single point of failure.
You need to have your own processes in place to preserve your business. Two essential steps you can take:
- Back up data outside the cloud where you host: If your cloud data goes offline, what do you do? Backing up data to a third-party environment means you still have access to your most critical data, no matter how long the primary service takes to get back up and running. Protect your cloud data just as you do (or should) the data you host on-premises: always keep at least one copy of your data separate from the primary site or service.
- Plan and practice Disaster Recovery: Storing backup data is one thing; actually restoring it is something else. Knowing how to recover if cloud services go down preserves your company’s autonomy and lets you get back to business. Document who does what, and the specific steps to take with each vendor.
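For readers who want to see what those two steps might look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended tool: local folders stand in for both the primary cloud service and the third-party backup environment, and the function names (`sha256_of`, `back_up`, `restore_drill`) are hypothetical names made up for this example. The idea is simply to keep a separate copy, confirm it matches the original, and rehearse a restore before you need one.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(primary: Path, secondary_dir: Path) -> Path:
    """Copy a file to a separate location and verify the copy matches.

    Assumption: secondary_dir stands in for storage outside the
    primary service (a different provider or site).
    """
    secondary_dir.mkdir(parents=True, exist_ok=True)
    copy = secondary_dir / primary.name
    shutil.copy2(primary, copy)
    if sha256_of(copy) != sha256_of(primary):
        raise OSError(f"backup of {primary} does not match the original")
    return copy


def restore_drill(copy: Path, expected_digest: str, restore_dir: Path) -> bool:
    """Practice a restore: copy the backup back and check its digest."""
    restore_dir.mkdir(parents=True, exist_ok=True)
    restored = restore_dir / copy.name
    shutil.copy2(copy, restored)
    return sha256_of(restored) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        primary = root / "primary" / "customers.csv"
        primary.parent.mkdir()
        primary.write_bytes(b"id,name\n1,Ada\n")

        known_digest = sha256_of(primary)
        copy = back_up(primary, root / "offsite")

        primary.unlink()  # simulate the primary service going offline
        ok = restore_drill(copy, known_digest, root / "restored")
        print("restore drill passed" if ok else "restore drill FAILED")
```

In a real setup the copy would go to a different provider or site, and the drill would run on a schedule, so that the documented recovery steps are exercised regularly rather than for the first time during an outage.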
We hope your business suffered no serious impact from the AWS outage. If you would like to discuss how your business uses cloud services and assess the risks it faces, get in touch.