BA, Microsoft and TSB suffer IT meltdowns: what can be learnt from these disasters?
In insight / By Mark Flynn / 11 September 2018
BA, Microsoft and TSB have all recently experienced spectacular IT failures and all of them seem to have exhibited the classic failing of poor planning, so let's reflect on each of these incidents and see what can be learnt.
Microsoft Azure is a cloud computing service that is used in a lot of Microsoft’s products such as Office 365. Unfortunately, Azure suffered a massive failure that lasted for over 24 hours, due to bad weather.
A lightning strike took down the cooling systems of one of Microsoft's South Central US data centres, forcing it to shut down. The loss of this data centre affected not just North America but also Europe and South America. Even after the data centre was brought mostly back online, services continued to experience issues for several days.
The outage was compounded by the release of a faulty patch, which smacked of a rushed response to an issue that was unforeseen but is, in reality, always a possibility.
This issue shows the underlying problem of outsourcing your IT to a single vendor: organisations should plan in additional redundancy. Don't forget that outsourcing your IT doesn't mean you'll never suffer IT-related issues again, however big and impressive the firm you are outsourcing to.
The next piece of news was the continuation of the TSB saga: Paul Pester has stepped down from his role as CEO due to the problems with the roll-out of the bank's new IT systems.
In 2015, TSB was taken over from the Lloyds Banking Group by the Sabadell Banking Group. After several years of continuing to use Lloyds' banking software, TSB decided to switch to Sabadell's platform.
In April, TSB switched customer records over to the new system, but this resulted in its online systems going into meltdown. Most customers were completely unable to do any kind of online banking, and some customers were shown other customers' information.
The big problem that seems to have surrounded this switchover is complacency. Because TSB was using a copy of Sabadell’s software, it was assumed the migration would go smoothly. If you’re making a big switch, you can never assume it will all go to plan. Failing fast is useful in some areas, but for a migration like this, taking your time to get it right the first time matters most.
British Airways suffered the most recent failure when its website was hacked, affecting 380,000 transactions. No passport information was stolen, but information used to pay for holidays online, such as credit card numbers and, critically, CVV codes, was taken. BA swiftly contacted those affected but faces large fines due to the scale of the breach.
If there’s one big takeaway from these three incidents it’s the importance of having a plan when your IT fails.
BA is facing a potentially huge fine of £500 million but, perversely, had a rather impressive plan to deal with its failure. The hack was identified within 15 days (hacks like this often go unnoticed for months), the ICO was notified of the security breach within 72 hours (the deadline set by GDPR), and a full official apology, fronted by CEO Alex Cruz, was made across national newspapers, radio, and television on the Thursday and Friday. This is not to say that BA is off the hook: it failed to protect its customers from hackers, and such a spectacular failure cannot go unpunished.
TSB and Microsoft, however, clearly failed to plan for failure in their systems. TSB did not carry out the proper tests, which would have revealed the problems ahead of time, and then, when the failure took place, it demonstrated a level of arrogance in its communication with customers, which ultimately cost Paul Pester his job.
At Microsoft, it was assumed that the Azure services would be fine in the event of one of its data centres failing, but that plan obviously didn’t work out, and in scrabbling to fix the issue the company made it worse.
So, in summary, IT systems will always be vulnerable to issues, so always have a plan for their failure. How you communicate that failure to your customers is key.
For more information on anything in this article, please contact me.