Just a short update about the outage we had yesterday. At about 5:30pm (GMT), our automated systems alerted us that we had a problem and that systems were down.
We immediately mobilised our team to investigate. Unfortunately, we had been hit by the major Amazon AWS outage that affected many sites on the internet.
We looked for a quick workaround or solution, but a combination of factors meant this wasn’t simple to do. We received criticism on Twitter for not thinking about redundancy. Redundancy is something we take very seriously, and we continually work on improving it throughout our system. Some systems, however, are easier to build redundancy into than others, and this extensive AWS outage took out the redundant systems as well.
We worked until late into the night here, and as Amazon was able to restore services to components of their platform, we were able to shift some of the affected components onto new working servers.
Today we have been doing maintenance work to improve the system after these issues and will continue to do so. We will also continue to look at ways of improving our robustness and reliability, and try to prevent such a severe issue from happening again.
Whilst we are in no way placing the blame on Amazon (and feel they have done their utmost in restoring services), this outage was something unusual from such a large provider. We’ll take more steps to mitigate this type of issue in the future.
This couldn’t have happened at a worse time: it was the Easter weekend in Europe, just after the team had left the office, and during the peak of business in the US. Please be assured the whole team was doing their best to resolve the issue.
If any customers have questions or further concerns, please contact us via the support system or by phone on Tuesday. Our CEO (Stephen) and I (CTO) will be available to speak with any customers personally and discuss anything you may wish to cover.
Again, our utmost apologies to all our customers. We pride ourselves on our reliability and will be working harder to prevent issues like this from happening again.
