

Production Monitoring: Admitting There's a Problem, Then Crushing It

In 1894, the Times of London predicted that by 1950 the streets of London would be buried under nine feet of manure. We can only imagine how high the manure would be today if those fine folks had their way.

Happily, this particular threat to civilization never came to pass.

Only 14 years after this dire prediction was made, the Ford Model T automobile was unleashed upon the streets of London, and horse-drawn carriages and their smelly by-product gave way to a (mostly) manure-free society.

The takeaway?
The problems we had aren’t the problems we have.
And neither are the solutions.


117 years later, in 2011, the #monitoringsucks hashtag was exploding onto the production monitoring and DevOps scenes. Drawn to this newly formed community were those relieved to learn they were not alone in feeling that their monitoring systems had become the bane of their existence (or, at least, a bane).

The hashtag soon became both an emotional outlet, and a nexus for forward-thinkers to brainstorm and come up with next-gen solutions for the betterment of all mankind.

It ended up inspiring its own hackathon and BoF session, and can reasonably take the credit for sparking the first honest public debate on the topic of production monitoring (and its suckage).

Tellingly, the hashtag remains highly active to this day, 4 years after its inception.

Even though the last few years saw a huge surge in intelligent and well-designed tools and improved monitoring tactics, we are still far from seeing a truly disruptive solution to the problems plaguing the field.

So what’s going wrong? Let’s break it down:


Problem #1:
The amount of manual configuration is too damn high.

Why, [insert principal object/s of your particular faith here], why?!

The amount of manual slavery we subject ourselves to is quite staggering, and everybody’s acting like this is how things are supposed to be.

Absolutely nothing is truly out-of-the-box.

What happened to Plug-and-Play?

All the tools in our monitoring stack require varying levels of both one-time and continuous configuration.

Teams need to write plugins, code and scripts, set up and maintain configuration files, request new information and tweak the way the old is exposed, and generally invest a LOT of time in keeping the system operational, useful and precise.

All this enslavement is taking place before, during and after (in case of terminal interdependencies) our work with them.

The solution?
Automate all the things!

Problem #2:
The tools we use are too numerous and out of sync with industry needs.

We hungry. For actionable, contextual insight into our monitoring stack.

We love our stack, but let’s face it: it’s kind of an abusive relationship. We put up with so much, and get back not-quite-enough.

After pouring in hours, weeks and months of manual tweaking, you’d expect not to face a stack that’s inflexible, brittle and poorly integrated, serving up information that is overly granular, inconsistent, non-contextual and un-actionable, and providing a sensory experience ranging from complete obtuseness to an overbearing informational extravaganza.

And all we asked for was a sandwich.

The solution?
One good tool, that gives you exactly what you need.

Problem #3:
They’re super expensive.

‘Nuff said.

Actually, there’s a little bit more to say: in the current state of the industry there is no reason for a monitoring stack to be super expensive.

Sure, go ahead, we weren't using that.

Let’s repeat that for the guys in the back: there is no reason for a monitoring system to be super expensive.

Recent years’ technical advances in storage and big data mean that what once cost a boatload of money can now be bought for a fraction of what it used to.

That is, unless some of the companies providing these services are purposefully creating technical gaps in their products that mandate users pay for costly features, professional services and support, thus leaving prices at an unnecessarily high bracket. But of course, we wouldn’t dream to suggest that actually happens. That’s pretty awful. No way anyone is doing that. Nope.

The solution?
Switching to services that respect our business.


The problems we listed are the obvious, in-plain-view issues.
Those things are easy. We’re not here for easy.

If we want the operation to succeed, we need to cut deep.
We need to start looking at the less-than-obvious issues.

We call these “emergent issues”: issues that are the result of our underlying, deep-rooted, axiomatic modes of operation, birthed from a mishmash of different, sometimes innocuous, factors.

As such, they are harder to detect, harder to resolve, and therefore far more dangerous. The emergent issues are the ones that must be identified and addressed before the field of production monitoring can take its next giant leap forward.


Emergent Issue #1:  Bias Bloat

Our monitoring stack is built in such a way as to feed our existing biases, rendering us helpless in the face of pitfalls and dangers.

Some of these are common human biases which our monitoring tools tap into readily, such as the Anchoring Effect and Change Blindness.

Others are even more subtle still, such as semantic and structural decisions in our toolsets which cause us to behave in ways that are counterproductive. For example, even the mere use of the word “monitoring” subtly insinuates that this toolset was built to be reactive and observational, rather than a proactive, take-charge solution for HEROES.

Our tools also rely entirely on numbers to tell our machines’ stories, when the reality has become so subtle and complex as to require a huge evolutionary leap in narrative capabilities.

The good news? We’re already there.

Emergent Issue #2: Hysterical Homosapiens

Try as we might to become a borg-like, hive-mind society, we're not quite there yet.

Until that day comes, we need to come to terms with a basic fact about production monitoring - decisions are still being made by humans, and humans are different from one another. Disparate levels of competence, along with different ideas about what a monitoring stack actually needs to do, have turned our stack into a mess.

Add to that the adverse psychological effects of many of our tools - Alarm Fatigue, Analysis Paralysis, Information Pollution, Information Overload and Ego Depletion, just to name a few. These lead humans to become easily distracted, and to obsess, in their hazy professional stupors, over what they deem important and noteworthy, while blatantly ignoring painful and/or boring truths.

Emergent Issue #3: Crushing Conatus

Conatus is the name given to any system’s innate inclination to continue to exist and enhance itself, rather than change course or die out. It’s a fun term to incorporate into day to day life, as it’s effective in helping identify things that are moving ahead by inertia, rather than merit. 

So how are our monitoring stacks conatusing?

Easy. We’ve become too reliant on the integration of countless micro-services, so much so that we’ve entangled our systems with each service’s axioms, necessities and specific modus operandi.

These have also caused us to lose sight of what may be the most important thing to keep in mind: monitoring is not about you - it’s about your customers.

When you pour so much time, money and effort into solving your own issues, it’s hard to keep track of the fact that it all boils down to giving your customers the service they need, want and deserve.

Lo and behold, that sometimes even means not solving everything. Sometimes it means putting more effort into issues that seem innocuous to you but have a dreadful effect on their experience, and the other way around.


When we use our monitoring stack as a hammer, everything looks like a nail.
We need solutions that are so versatile and contextual as to never become hammers.
Services that learn, grow, change, know when we need them and how to help.

Services that are, in a very real sense, more like people.
Like really, really awesome people.





Loom Systems delivers an AIOps-powered log analytics solution, Sophie, to predict and prevent problems in the digital business. Loom collects logs and metrics from the entire IT stack, continually monitors them, and gives a heads-up when something is likely to deviate from the norm. When it does, Loom sends out an alert and recommended resolution so DevOps and IT managers can proactively attend to the issue before anything goes down.