

British Airways IT Outage: Lessons Learned? Maybe.

May 29, 2017 | Sabo Taylor Diab

 

Thousands of British Airways passengers worldwide have been stranded and many planes grounded due to a mysterious technical glitch. The airline is about to be thrown into even more turmoil as systems go offline, passengers are rerouted, luggage is redirected, and fares have to be refunded. And that is before taking on board (sorry) the compensation claims that will surely pour in over the next few months, with the latest estimates coming in at close to $128m.

 

The one question on everyone’s lips is, quite simply:

 

 

Was That Necessary?

 

What exactly caused this tech mess? Many explanations are flying around the net right now, yet the official position is that a power outage was the primary cause of the chaos. Air traffic experts are claiming that the problem, whatever it may be, is rooted in ‘bad tech’ and hit many parts of BA operations, making even simple tasks like completing load sheets (which are essential for fuel calculations) impossible. So an inevitable question needs asking: if ‘bad tech’ was the root of the problem, can ‘good tech’ be its savior?

 

Short answer? Maybe. There is a way to prevent this kind of collateral damage: solutions that use machine learning to predict these glitches in real time can save companies like BA millions of dollars, thousands of hours of unnecessary work, countless stressful minutes of downtime, and, above all, the embarrassment!

 

 

The Offense - AI for Monitoring

 

Small, medium, and large enterprises everywhere are quickly catching on to AI and machine learning. They’re paying close attention to the benefits of developing AI-based solutions to help them get a grip on their tech.

 

Artificial intelligence has the ability to transform many fields, IT monitoring included. AI mimics the workflow of humans, but with full visibility and entirely at scale. In an IT environment, automated AI-powered tools would be able to look at all the relevant fields in a log, across every stack, and cut through the noise to predict issues and get to their root cause, fast.

 

Surely it’s a no-brainer, right? Implementing a smart monitoring tool in an airline’s IT framework would not only flag issues before they escalate, but also tell the user what could be done about them. These kinds of solutions not only achieve results that usually take thousands of man-hours, they also help keep the business safe.
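
To make "flagging issues before they escalate" a little more concrete, here is a minimal sketch of the underlying idea: compare each new reading of a metric against its recent history and raise a flag when it strays too far from the norm. This is a simple rolling z-score baseline, not machine learning proper, and it is not a description of what BA or any particular vendor runs. The function name `flag_anomalies`, the window size, the threshold, and the sample error counts are all assumptions made up for illustration.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(values, window=5, threshold=3.0):
    """Return the indices of readings that deviate from the trailing
    window's mean by more than `threshold` standard deviations.

    Illustrative baseline only: `window` and `threshold` are assumed
    defaults, not tuned values.
    """
    history = deque(maxlen=window)  # the most recent `window` readings
    flagged = []
    for i, value in enumerate(values):
        # Only judge a reading once a full window of history exists.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat history (sigma == 0) is skipped here
            # rather than flagging every tiny change.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(i)
        history.append(value)
    return flagged


# Hypothetical error counts per minute, parsed from application logs.
errors_per_minute = [4, 5, 3, 4, 5, 4, 6, 5, 48, 5]
print(flag_anomalies(errors_per_minute))  # [8]
```

In this made-up series, only the spike to 48 errors in a minute is flagged. A production tool would need to do far more (seasonality, correlated signals across stacks, suggested fixes), but the principle of learning what "normal" looks like and alerting on deviations is the same.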

 

 

So, what went wrong with BA?

 

We have yet to hear BA’s post-mortem report, possibly because BA has a very large IT infrastructure, with over 500 data cabinets spread across six halls at two different sites near its Heathrow Waterside HQ. Monitoring such an environment requires a range of tools and a great deal of effort from IT professionals, DevOps engineers, and SREs. God knows when they'll be able to get anywhere near the root cause.

 

But the frustrating question about this whole situation is:

 

If airlines depend on tech to ensure the smooth running of their enormous operations, why are we not hearing superhero stories about how machine learning tech saved the day?

 

Whatever excuses are thrown our way over the coming days, weeks, or months, maybe it’s time for airlines like BA to follow the lead of other industries and start implementing AI-based machine learning tech in their daily operations to make sure situations like this never happen again.

 

What do you think went wrong with the BA systems? Do you think that this tech disaster could have been prevented? Were you caught up in this mess? We’d love to hear your thoughts. Feel free to comment and share below.

 

 

Loom Systems delivers an AI-powered log analysis solution to predict and prevent problems in the digital business. Loom collects logs and metrics from the entire IT stack, continually monitors them, and gives a heads-up when something is likely to deviate from the norm. When it does, Loom sends out an alert and recommended resolution so DevOps and IT managers can proactively attend to the issue before anything goes down. Schedule Your Live Demo here!

Tags: DevOps AI post-mortem British-Airways outage Airlines
