

You think your team is focused on MTTR, but they're really just after TTI

 


Recently, I had an interesting conversation with a big shot from a large financial organization. We spoke about the various issues facing fintech today, and more specifically about the complexities in their systems. The conversation then shifted to how we could potentially tackle such issues. That was the point where I was first introduced to the concept of TTI.

 

Let’s try and simplify:

 

In every transaction, there are two sides of the coin: on one side stands the customer, and on the other, the organization. The end-game is mutual: keep the customer as satisfied and as active as possible. Organizations invest a lot of effort in defining SLAs, orchestrating service and help centers, and conducting surveys, and are constantly aiming to improve customer satisfaction and loyalty.

 

The customer really does not care about what happens inside the organization. He interacts through various channels and wants the transaction to go through smoothly, so he can remain a happy, loyal and, most importantly, undisturbed customer. On a daily basis, ordinary scenarios are actions performed without even a second thought:

  • Swipe a credit card – purchase made.
  • Withdraw cash through a branch or ATM – $$ in hand.
  • Internet transfers, checking balances, online trading... and the list goes on.

 

The problem begins when the customer runs into issues, and from there things can quickly go sour:

 

A credit card transaction is unexpectedly declined even though there is enough credit available on the credit line, so the customer is (very) upset; the banking website is down – balances can't be accessed, payments can't be made, and all that appears is a 404 or a 'Sorry for the inconvenience, please come back later' error.

 

At this point, the customer calls the help center and reports the issue. By now, service tickets are being raised, priorities are escalating, and IT and Ops teams are scrambling.

 

You get it… I don’t need to explain this.

 

Every time a situation like this arises, someone, somewhere, is up at night (or scrambling during the day) trying to put out the fire and resolve the issue as soon as possible. Various teams are working simultaneously: database, infrastructure, network, connectivity, hardware, firewall, the banking app... So many moving parts, and these things keep changing, because that's the world we live in today.

 

When things go well, all teams can get on with their day-to-day tasks. But when things stop functioning as they should, fingers start pointing. Who takes responsibility? Is it a problem in one part of this entire complex stack, or is it a combination of factors?

 

It all boils down to TTI, Time To Innocence: how fast you (and your team) can prove that this is not your team's problem.

 

Or, in other words, as we previously wrote: let's stop playing the blame game.

 

And from there it goes on:

 

  • I did not write bad code, and it went through QA. All integrations were thoroughly tested.
  • Infrastructure problems? It can handle anything, and it’s meant to scale with high volumes of transactions.
  • The security firewalls are absolutely fine. No issues with the communication ports.
  • The application is working absolutely fine.

And then, it's always the fault of you-know-who.


 

Let’s face it: as things get simpler for the customer, ironically, the applications supporting that simplification get more and more complex, and anything in that chain can cause an entire application to stop functioning in the intended way. So rather than trying to prove it’s not “my” problem, let’s try to solve the problem, irrespective of where it comes from.

 

There are various tools that can help you search through the logs and look for answers if you already know where the problem is (logs are the most granular form of information, telling you what’s really happening across your digital enterprise).
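When the culprit is already known, that search can be as simple as scanning for a known error signature. Below is a minimal sketch of that idea in Python; the log directory, file pattern, and "DECLINED" keyword are hypothetical placeholders, not references to any specific tool.

```python
# A minimal sketch of "search the logs when you already know what to look for":
# scan plain-text log files for a known error signature.
# All names here (log directory, *.log pattern, "DECLINED" keyword) are hypothetical.
from pathlib import Path

def find_matches(log_dir: str, needle: str = "DECLINED"):
    """Yield 'file:line: text' for every log line containing the signature."""
    for log_file in sorted(Path(log_dir).glob("*.log")):
        for line_no, line in enumerate(log_file.read_text(errors="ignore").splitlines(), 1):
            if needle in line:
                yield f"{log_file.name}:{line_no}: {line.strip()}"

# Hypothetical usage:
# for hit in find_matches("/var/log/payments"):
#     print(hit)
```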

Machine Learning and AI based tools, on the other hand, learn about these problems as they come along and highlight anomalies and abnormalities, which helps detect the root cause and dramatically reduce your mean time to resolution (MTTR). The advantage is that you don't need to set up specific thresholds or decide in advance which parameters to monitor: the machine learning starts learning your system through unsupervised learning, then leverages your feedback to turn that unsupervised learning into a supervised, feedback-driven mechanism that sharpens its skills and acts as your intelligent agent.
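To make the anomaly-detection idea concrete, here is a minimal sketch in Python. It illustrates the general technique only, not the algorithm of any particular product: it assumes the logs have already been parsed into a per-minute error count (a hypothetical errors_per_minute series) and flags minutes whose count deviates sharply from a rolling baseline, with no hand-set threshold on the raw metric itself.

```python
# A minimal sketch of unsupervised anomaly detection on a log-derived metric.
# Assumption: logs are already parsed into a per-minute error count (a pandas Series
# indexed by timestamp). We flag points that deviate strongly from a rolling baseline.
import pandas as pd

def flag_anomalies(error_counts: pd.Series,
                   window: int = 60,
                   z_threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series marking minutes whose error count looks anomalous."""
    baseline = error_counts.rolling(window, min_periods=10).mean()
    spread = error_counts.rolling(window, min_periods=10).std()
    z_scores = (error_counts - baseline) / spread.replace(0, 1)  # avoid division by zero
    return z_scores.abs() > z_threshold

# Hypothetical usage: 'errors_per_minute' built from parsed logs.
# anomalies = flag_anomalies(errors_per_minute)
# print(errors_per_minute[anomalies])  # the spikes worth investigating first
```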

 

All of this ultimately translates into business implications: loss of revenue, increased costs and overheads, not to mention the loss of customer loyalty. So the next time a system glitch occurs and the automatic blame game commences, stop for one second and think: what are we really after? Let's start focusing on reducing the MTTR, and not only on getting down to the source of the blame.

After all, it's not always the network! And let us not forget, as Stephen King wrote: "The trust of the innocent is the liar's most useful tool."

 

Want to read more about reducing MTTR using AI? Download our whitepaper now:


 

 

Loom Systems delivers an AIOps-powered log analytics solution, Sophie, to predict and prevent problems in the digital business. Loom collects logs and metrics from the entire IT stack, continually monitors them, and gives a heads-up when something is likely to deviate from the norm. When it does, Loom sends out an alert and recommended resolution so DevOps and IT managers can proactively attend to the issue before anything goes down.
Get Started with AIOps Today!

 
