

The Marketo Outage: 4 Key Takeaways Before it Happens to You

July 30, 2017 | Sabo Taylor Diab

Marketing leader Marketo had a challenging few days last week after failing to renew its domain name. A lot has been said about the recent Marketo outage (#marketodown #marketopocalypse), from the hilarious to the critical to the supportive. I have nothing extraordinary to add, but I want to tie it to the series of outages we covered lately in our blog (BA and GitLab) and offer you some takeaways!

So, in a nutshell, what happened?

On July 25, Marketo’s website was down from 7:41 am ET (4:41 am PT) through at least 9 pm ET (6 pm PT). Over the next 48 hours, some users still couldn’t access the website.

 

Fortunately (or sadly, depending on whether you’re Marketo’s CEO), one of Marketo’s customers, Travis Prebble, a domain name specialist, quickly figured out the problem: the company had somehow failed to renew its domain, and the registration had expired. Travis acted quickly and paid the registration and reinstatement fees, resolving the cause of the issue. In our business, having an end user diagnose and fix the root cause of an issue is a NO-NO!

 

Gradually Marketo’s services came back online as the change propagated through the domain name system.

 

Now for the takeaways:

 

1) Transparent and Timely Communication

 

To be fair, Marketo’s CEO, Steve Lucas, bravely led the company’s communication at this sensitive time. Not at first, but over the course of the day, he provided timely updates that reassured customers the problem was the company’s highest priority. In one of his tweets, he wrote: “Resolving DNS issues re: our site and I profusely apologize to everyone. No excuses, just fixing”. Customers appreciated this direct and transparent approach.

 

 

2) MTTR: the Only Option is a QUICK One

Marketo is used by more than 9,000 domains, which means that thousands of marketers were struggling (and failing) to do their jobs. As with all software platforms, minimal or zero downtime is critical to maintaining customer loyalty and stickiness. With Marketo and other marketing automation platforms, this need is magnified: their systems are so embedded in marketers’ day-to-day work processes that users can’t do their jobs during downtime (proof: the hilarious Twitter exchanges that resulted from marketers having too much time on their hands). It took a minimum of 13 hours, and up to 48 hours, for users to get their operations back while they waited for local DNS servers to update. That amount of downtime means large amounts of lost revenue for thousands of businesses.
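For the curious, the 13-hour minimum follows directly from the reported outage window. Here is a minimal sketch of that arithmetic; the function name `downtime_hm` is my own, and the end time is only the earliest reported recovery (Eastern time), so treat the result as a lower bound rather than a measured MTTR.

```python
from datetime import datetime


def downtime_hm(start: datetime, end: datetime) -> tuple:
    """Return the elapsed time between start and end as (hours, minutes)."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


# Reported window on July 25, 2017 (Eastern time): 7:41 am through at least 9 pm.
outage_start = datetime(2017, 7, 25, 7, 41)
earliest_recovery = datetime(2017, 7, 25, 21, 0)

hours, minutes = downtime_hm(outage_start, earliest_recovery)
print(f"Minimum downtime: {hours} h {minutes} min")
```

Users who had to wait for cached DNS records to refresh were affected for longer than this lower bound.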

 

 

3) Automation is Your Friend - Data Silo is Your Enemy

 

The tech world has become so complex that manual system management just doesn’t cut it anymore. When an organization has thousands of assets that need to be monitored and a limited number of staff members to do so, mistakes will happen. Automation and the ability to correlate issues between different IT assets, such as Applications, Infrastructure, Middleware and Virtualization Layers, have an enormous impact on the continuity of any business.
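As a small illustration of the kind of check that is easy to automate, here is a minimal sketch of a domain expiry alert. Everything in it is an assumption for illustration: the function names, the 60- and 14-day thresholds, and the example domains and dates. In practice the expiry dates would come from your registrar or asset inventory, which this sketch does not query.

```python
from datetime import date
from typing import Optional

# Hypothetical thresholds (in days); tune these to your own renewal process.
WARN_DAYS = 60
CRITICAL_DAYS = 14


def days_until_expiry(expiry: date, today: date) -> int:
    """Return whole days from today until the expiry date (negative if past)."""
    return (expiry - today).days


def renewal_alert(domain: str, expiry: date, today: date) -> Optional[str]:
    """Return an alert message if the domain needs attention, else None."""
    remaining = days_until_expiry(expiry, today)
    if remaining < 0:
        return f"EXPIRED: {domain} lapsed {-remaining} day(s) ago"
    if remaining <= CRITICAL_DAYS:
        return f"CRITICAL: {domain} expires in {remaining} day(s)"
    if remaining <= WARN_DAYS:
        return f"WARNING: {domain} expires in {remaining} day(s)"
    return None


if __name__ == "__main__":
    # Illustrative inventory with made-up dates, not real registration data.
    inventory = {
        "example.com": date(2030, 1, 10),
        "example.org": date(2031, 6, 1),
    }
    today = date(2030, 1, 1)
    for domain, expiry in inventory.items():
        message = renewal_alert(domain, expiry, today)
        if message:
            print(message)
```

Run on a daily schedule and routed to the same alerting channel as the rest of your monitoring, a check like this turns a forgotten renewal into an ordinary ticket instead of an outage.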

 

 

4) If You’re Big and Complex Enough, it Will Happen to You

 

Large-scale systems with a massive number of moving parts are going to suffer from “black swan” events: something terrible that happens unexpectedly, often triggered by something relatively minor, but with disastrous ripple effects. Companies such as Google, Amazon, and Netflix have also suffered outages and other disasters, showing that it can happen even to the best of breed. You can (and should) put processes and technologies in place to contain the impact of these events at scale.

In conclusion, it looks like Marketo has weathered the storm and might come out of it even stronger. Not many companies have brand equity strong enough to carry them through such disasters. Even though the downtime had a large negative impact on customer operations, the Marketo community showed the brand overwhelming support and rooted for the company to get back up and running. Ultimately, it’s how you handle the crisis once it has happened that will have the lasting effect.

References

 

[1] Kieren McCarthy, Marketing giant Marketo forgets to renew domain name. Hilarity ensues, July 26, 2017

[2] Scott Brinker, 3 thoughts on Marketo’s domain outage this week, July 28, 2017

[3] Dayna Rothman, The Marketo Meltdown And The Holy Grail Of SaaS Stickiness, July 28, 2017

 

Loom Systems delivers an AI-powered log analysis solution to predict and prevent problems in the digital business. Loom collects logs and metrics from the entire IT stack, continually monitors them, and gives a heads-up when something is likely to deviate from the norm. When it does, Loom sends out an alert and recommended resolution so DevOps and IT managers can proactively attend to the issue before anything goes down.

Tags: IT Operations Monitoring root cause analysis
