AWS Outage — When there is Thunder in the Cloud!

On Tuesday, February 28th, 2017, as you already know, Amazon Web Services (better known as AWS) had a large-scale outage. According to AWS, the S3 service has 99.99% reliability. Yesterday, the 0.01% happened.

Silke English

The almost five-hour breakdown of Amazon’s S3 service in the US-EAST-1 region affected Yodel as well as other major internet services around the globe.

Amazon Web Services is among the largest providers of internet-based computing services, allowing companies to grow their computing power in a cloud-based setup. According to AWS’s service health dashboard, the outage was due to high error rates with S3 in US-EAST-1. Further details about the issue were not given. The result was a widespread performance problem for websites and services big and small. Even AWS was unable to update its own service health dashboard correctly, since it is hosted on AWS S3, and had to switch over to Twitter to inform its users.

While some services shut down completely, thousands of online services had performance issues. Functions like uploading files in Slack were not possible, as Slack said in a tweet. Many other companies were not able to retrieve the scripts and files they needed to operate properly, including one of our better-known partners, Twilio, according to Twilio’s status page. This is what caused the disruption of our service. At this point: kudos to Twilio, who resolved their performance issues within a short amount of time and helped us get the affected features back up without further delay!

As mentioned on venturebeat.com, other major companies and services that were affected include Adobe’s services, Atlassian’s Bitbucket and HipChat, Buffer, Business Insider, Citrix, Giphy, GitHub, GitLab, Heroku, IFTTT, Kickstarter, Lonely Planet, Mailchimp, Medium, Quora, Signal, Slack, Trello, Xero, and Zendesk, amongst others. Ironically, even the website “It Is Down Right Now” was affected by this extensive outage.

Not only websites and services had performance issues; IoT hardware was impacted as well. As a result of the outage, many people were unable to control their devices. Even Amazon’s own Alexa did not function properly. Users also reported various difficulties in comments on social media platforms, including smart bulbs in their homes not responding.

Around 10 AM PST, we realized that our partner Twilio had performance issues, which caused parts of our Yodel infrastructure to break down as well. We informed our customers as soon as possible via Twitter and email and kept monitoring the situation during the outage. Only our voicemail feature was down for the full length of the outage. Voicemails were still recorded, but they were not delivered right away.

Around 1 PM PST, the Amazon services began to return, and about an hour later Amazon posted an update saying that AWS had fully recovered from the error rates it was seeing and that S3 was “operating normally” again.

A wave of relief went through the online community once the outage was resolved. And of course, Twitter was swamped with comments and reactions. The hashtag #awsoutage was trending.

What is the aftermath of the AWS S3 outage?

Yodel, and I do believe a lot of other companies too, approaches this issue by living by the famous line: What doesn’t kill you makes you stronger!

As already mentioned above, Amazon Web Services is 99.99% reliable, and the last time something similar happened was back in 2012, which underlines the truthfulness of AWS’s promise. Nevertheless, we will keep working towards a solution to avoid any future disruptions of our service. This includes spreading our server platforms across more locations and mirroring JavaScript files to minimize the risk of being struck by events like this.
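
For the technically curious, the mirroring idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not a description of Yodel’s actual setup: the function names `http_get` and `fetch_with_fallback` and the example.com URLs are made up for this post. The sketch tries a list of mirrors in order and returns the first copy of a file that can be downloaded, so that a single storage region going down does not take the file with it.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.request import urlopen


def http_get(url: str, timeout: float = 5.0) -> bytes:
    """Download one URL with the standard library; raises OSError on failure."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
) -> Tuple[str, bytes]:
    """Try each mirror in order; return (url, content) from the first that works."""
    errors: List[str] = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError, timeouts and connection errors
            errors.append(f"{url}: {exc}")
    # Every mirror failed (or the list was empty): report what went wrong.
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

A call could look like `fetch_with_fallback(["https://primary.example.com/app.js", "https://mirror.example.com/app.js"])`, with both URLs being placeholders. The order of the list is the order of preference, and the mirrors only help if they live with different providers or in different regions.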


© 2021 YodelTalk – All rights reserved.