The nearly five-hour outage of Amazon's AWS S3 service in the US-EAST-1 region affected Yodel, as well as other major internet services around the globe.
Amazon Web Services is among the largest providers of internet-based computing services, allowing companies to grow their computing power in a cloud-based setup. According to the AWS service health dashboard, the outage involved high error rates with S3 in US-EAST-1; no further details were given at the time. The result was widespread performance problems for websites and services big and small. AWS could not even update its own service health dashboard correctly, since the dashboard itself depends on S3, and had to switch to Twitter to inform its users.
Will update Twitter on this handle as we have new information.— Amazon Web Services (@awscloud) February 28, 2017
The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.— Amazon Web Services (@awscloud) February 28, 2017
S3 is experiencing high error rates. We are working hard on recovering.— Amazon Web Services (@awscloud) February 28, 2017
While some services shut down completely, thousands of online services had performance issues. Uploading files in Slack, for example, was not possible, as Slack said in a tweet. Many other companies were unable to retrieve the scripts and files they needed to operate properly, including one of our better-known partners, Twilio, according to Twilio's status page. This is what caused our service disruption. Kudos to Twilio, who resolved their performance issues quickly and helped us bring the affected features back without further delay!
According to venturebeat.com, other major companies affected included Adobe's services, Atlassian's Bitbucket and HipChat, Buffer, Business Insider, Citrix, Giphy, GitHub, GitLab, Heroku, IFTTT, Kickstarter, Lonely Planet, Mailchimp, Medium, Quora, Signal, Slack, Trello, Xero, and Zendesk, among others. Ironically, even the website "Is It Down Right Now" was affected by this extensive outage.
IoT hardware was impacted as well: many users were unable to control their devices during the outage. Even Amazon's own Alexa did not function properly. Users reported various difficulties on social media, including smart bulbs in their homes that stopped responding.
Around 10 AM PST, we realized that our partner Twilio was having performance issues, which caused parts of our Yodel infrastructure to break down as well. We informed our customers as soon as possible via Twitter and email and kept monitoring the situation throughout the outage. Only our voicemail feature was down for the full length of the outage: voicemails were still recorded, but they were not delivered in real time.
We are experiencing outages from our backend providers and are working on a fix. Other major internet services seem to be affected, too.— Yodel.io (@yodeltalk) February 28, 2017
Around 1 PM PST, Amazon's services began to return, and about an hour later Amazon issued an update saying that S3 had fully recovered from the error rates it was seeing and was “operating normally”.
The online community was greatly relieved once the outage was resolved. Unsurprisingly, Twitter was swamped with comments and reactions, and #awsoutage was trending.
Yodel approaches this issue by living by the famous line "What doesn't kill you makes you stronger!", and I believe many other companies will do the same.
AWS S3 is designed for 99.99% availability, and the last time something similar happened was back in 2012, which supports AWS's reliability claims. Nevertheless, we will keep working toward ways to avoid future disruptions of our service. This includes spreading our server platforms further and mirroring JavaScript files to minimize the risk of being hit by events like this.
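Mirroring files only helps if clients actually fall back to a mirror when the primary host fails. Below is a minimal Python sketch of that idea, not Yodel's actual implementation: `fetch_with_fallback`, `fetch_url`, and the example URLs are hypothetical names chosen for illustration. It tries each mirror in order and returns the first successful response, collecting the errors if every mirror fails.

```python
from __future__ import annotations

import urllib.request
from typing import Callable, Sequence


def fetch_url(url: str, timeout: float = 5.0) -> bytes:
    """Download a resource over HTTP(S) using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> tuple[str, bytes]:
    """Try each URL in order; return (url, body) from the first that succeeds.

    In urllib, connection failures, HTTP error statuses (HTTPError) and
    socket timeouts are all OSError subclasses, so any of them moves on
    to the next mirror.
    """
    errors: list[str] = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical primary host and a mirror hosted elsewhere.
    mirrors = [
        "https://primary.example.com/js/widget.js",
        "https://mirror.example.net/js/widget.js",
    ]

    def simulated_outage(url: str) -> bytes:
        # Stand-in for fetch_url: pretend the primary host is down.
        if "primary" in url:
            raise ConnectionError("503 Service Unavailable")
        return b"console.log('widget loaded');"

    used, body = fetch_with_fallback(mirrors, fetch=simulated_outage)
    print(f"served {len(body)} bytes from {used}")
```

Listing the primary host first keeps normal traffic on it, while mirrors on other providers or in other regions only take over during an outage. The fetch function is passed in as a parameter so the fallback logic can be exercised without a network connection, as the simulated outage above does.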