So I’m a bit of a football fanatic. Not soccer. American, gridiron, tackle football.
Now that we’ve got that out of the way, you might ask what this has to do with machine learning, artificial intelligence, and logs. Hold on, we’ll get to that in a second.
First I’d like to throw it back to January 24th of this year, to the pivotal, winner-takes-all game between the Denver Broncos and the New England Patriots. These two juggernauts competed in the AFC Championship Game, which the Broncos won 20-18 before going on to win Super Bowl 50. It was possibly the game of the season between two amazing quarterbacks, drawing over 50 million viewers. It was the second-most-watched AFC title game on any network in 39 years and the most-watched and highest-rated program on TV since Super Bowl XLIX.
The Patriots likely lost due to a technical malfunction! Yup, you read that right.
The NFL and Microsoft had an arrangement under which all 32 NFL teams would use the Microsoft Surface tablet as their playbook and as a way of reviewing, in real time, the plays that had just occurred. A really cool concept if you think about it.
The problem is that they didn't work so well.
Many teams were having issues with the tablets throughout the season, even though Microsoft spent time and resources training the players and coaching staff on how to use them. Microsoft even paid the NFL $400 million for the teams to use the Surface tablet as a publicity campaign.
Except that in that AFC Championship Game, those tablets left the New England Patriots with no working system while the Denver Broncos scored what would turn out to be the points that separated the two teams. The Surface tablets stopped working for about 20 minutes during the Denver Broncos' possession.
Microsoft was able to fix the issue and was even quoted as saying, "Our team on the field has confirmed the issue was not related to the tablets themselves but rather an issue with the network. We worked with our partners who manage the network to ensure the issue was resolved quickly."
Except that won’t help the 2015 Patriots.
Let’s think about where the root of the problem was.
We know that all 32 teams were using these Surface tablets.
We know that there were multiple issues with these tablets, and that they tended to stop working momentarily and needed a reboot.
Maybe the root cause was that the NFL moved forward with tech that wasn't ready for production?
Maybe the NFL teams needed a backup plan in the form of a hard-copy playbook?
Think about the monetary implications there! We are talking about millions of dollars lost in advertising, tickets, merchandise, and so on, for both the Patriots as a team and the individual players. I’m not even sure that you can put a price tag on that. In fact, Bill Belichick, head coach of the Patriots, was quoted this past week, after more tablet issues in their game against the Buffalo Bills, as saying, “I'm done with the tablets. I've given them as much time as I can give them.” You can hear the frustration in his voice. At least he gives a shout-out to his IT team, who did a great job trying to find a workaround.
Microsoft was even able to confirm that the problem was not related to the tablets specifically but rather to the network they were running on.
Any NFL team works like a company, from the website that sells merchandise and tickets to the hundreds of employees who handle the field, uniforms, payroll, or IT. Yeah, the IT team is one of the most important teams on the field. They maintain scores of systems and need to be able to diagnose each issue quickly, in real time, and resolve it ASAP.
Now this is just speculation, but if the NFL had had a root cause analysis system to identify the problem and offer a solution, maybe the issue would have been caught earlier and resolved in less time. One of the biggest challenges in the IT world is stopping the root problem before it starts.
It’s no secret. Prevention is the best treatment. If I can prevent hundreds or thousands of incidents, I will save my organization money and avoid devastating losses. My overall goal in DevOps is not just putting out fires when my production environment starts to have issues; it’s to make sure that those issues never turn into a raging fire.
Any fire can be put out with a glass of water. It just depends on what phase you catch the fire at.
Loom Systems delivers an AIOps-powered log analytics solution, Sophie, to predict and prevent problems in the digital business. Loom collects logs and metrics from the entire IT stack, continually monitors them, and gives a heads-up when something is likely to deviate from the norm. When it does, Loom sends out an alert and a recommended resolution so DevOps and IT managers can proactively attend to the issue before anything goes down.
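To make "deviating from the norm" a little more concrete, here is a minimal Python sketch of the general idea: compare each new reading against a rolling baseline and flag anything that lands far outside it. This is not how Loom or Sophie works under the hood. The function name `find_deviations`, the window size, the threshold, and the sample error counts are all made up for illustration.

```python
from collections import deque
from statistics import mean, stdev


def find_deviations(values, window=5, threshold=3.0):
    """Return the indices of readings that sit more than `threshold`
    standard deviations away from the mean of the previous `window` readings.

    Hypothetical helper for illustration only, not a vendor API.
    """
    history = deque(maxlen=window)  # rolling baseline of recent readings
    flagged = []
    for i, value in enumerate(values):
        # Only judge a reading once a full baseline window is available.
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # A perfectly flat baseline (spread == 0) is skipped in this sketch.
            if spread > 0 and abs(value - baseline) > threshold * spread:
                flagged.append(i)
        history.append(value)
    return flagged


# Made-up sample: network errors logged per minute, with one obvious spike.
errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
print(find_deviations(errors_per_minute))  # [7]
```

In this toy run the spike at minute 7 gets flagged as soon as it shows up, which is the "glass of water" moment: the earlier the deviation is caught, the smaller the fire. A real system would deal with far noisier data and many more signals than a single counter.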
Get Started with AIOps Today!