As a baseline, let’s first agree on what “Enterprise Ready” means. As a storage consultant and IT generalist with a specialty in cloud architecture, I would define an enterprise-ready environment as one with the following characteristics:
Predictable: no surprises here. We know and understand the environment’s behavior at any stress point.
Highly available: uptime, measured in nines, tells us how available the environment is and which practices need to be in place to guarantee that availability.
Fast: the performance of the environment should be dependable, so that we can set clear expectations with our clients and know which workloads to avoid.
Well supported: there should be a help line staffed by somebody reliable who can back you up with knowledge and expertise.
Easily expandable: we should know where we can grow and by how much.
Low maintenance: the environment should be so low-maintenance as to be a “set it and forget it” type of experience.
How to Get There: Artificial Intelligence
Now that we know the characteristics and what they mean, the question is: how do we make our open source environment enterprise ready? Let’s take them one at a time. Hint: artificial intelligence can help at every turn.
To make your OpenStack environment enterprise ready, you need to perform a wide range of tests to discover how it behaves during issues, failures, and high workloads. At KIO Networks, we test continuously and document the results internally, so our operations team knows exactly what was tested and how the environment behaved.
Artificial intelligence can help by documenting historical behavior and predicting potential issues, down to the minute at which our operations team will encounter an anomaly. It’s the fastest indication that something’s not running the way it’s supposed to.
To test high availability, we deliberately fail components and document the resulting behavior. It is important to fail every single component, including hardware, software, and the cloud environment’s supporting dependencies such as Internet lines, power supplies, load balancers, and other physical or logical components. In our tests, multiple elements always fail and are either recovered or replaced. You need to know your exposure time: how long does it take your team to both recover and replace an element?
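To make the link between nines and exposure time concrete, here is a minimal sketch in Python. The function names and the sample numbers are hypothetical, not KIO measurements, and the check assumes the worst case in which every failure leaves the service unavailable for the full exposure time. In a redundant design, many failures should cause no outage at all.

```python
# Illustrative only: helper names and sample figures are assumptions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability target of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines (99.9%) -> 0.001
    return MINUTES_PER_YEAR * unavailability


def fits_budget(nines: int, failures_per_year: int, exposure_minutes: float) -> bool:
    """True if the worst-case yearly outage time stays within the budget."""
    return failures_per_year * exposure_minutes <= downtime_budget_minutes(nines)


# Print the yearly downtime budget for a few common targets.
for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} minutes/year")

# Hypothetical scenario: two outages a year, 20 minutes of exposure each.
print(fits_budget(nines=4, failures_per_year=2, exposure_minutes=20.0))
```

Plugging in the exposure times you measured during failure testing shows quickly whether a target such as four nines is realistic or whether recovery needs to be faster first.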
AI-powered tools complement traditional monitoring mechanisms. Monitoring mechanisms need to know what your KPIs are. From time to time you may encounter a new problem and need to establish a new KPI for it, along with additional monitoring. With AI, you can see that something abnormal is happening, and that clarity helps your administrators home in on the issue, fix it, and create a new KPI to monitor. The biggest difference with an AI-powered tool is that you’re able to do that without the surprise outage.
Performance is really about understanding speed and either documenting limitations or opting for a better solution. Stress testing memory, CPU, and storage IO is a great start. Doing so at a larger scale is desirable in order to learn breaking points and establish KPIs for capacity planning and, just as important, day-to-day monitoring.
Do you know of a single person who would be able to manually correlate logs to understand if performance latency is improving based on what’s happening now compared to yesterday, 3 weeks ago, and 5 months ago? It’s impossible! Now, imagine your AI-powered platform receiving all your logs from your hardware and software. This platform would be able to identify normal running conditions and notify you of an issue as soon as it sees something unusual. This would happen before it hits your established KPIs, before it slows down your parallel storage, before your software-defined storage is impacted, and before the end user’s virtual machine times out.
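The idea of flagging unusual behavior before a KPI is hit can be illustrated with a very simple statistical stand-in. The sketch below is not how Loom Systems or any particular product works. It is a rolling z-score check in Python, and the function name, window size, threshold, and sample latencies are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=20, z_threshold=3.0):
    """Return indices of samples that deviate strongly from the recent baseline."""
    history = deque(maxlen=window)  # the most recent "normal" samples
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat baseline (sigma == 0) is never flagged here.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append(i)
                continue  # keep outliers out of the baseline
        history.append(value)
    return anomalies


# Hypothetical latency samples in milliseconds, with one unusual reading.
KPI_LIMIT_MS = 50.0
latencies_ms = [10.0, 11.0] * 15 + [18.0] + [10.0, 11.0] * 5

for i in detect_anomalies(latencies_ms):
    below_kpi = latencies_ms[i] < KPI_LIMIT_MS
    print(f"sample {i}: {latencies_ms[i]} ms, still below KPI limit: {below_kpi}")
```

In this made-up series, the 18 ms reading is flagged as unusual even though it sits well under the 50 ms KPI limit, which is the kind of early signal described above. A real platform would correlate many log sources at once rather than a single metric.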
We emphasize the importance of continuously building our expertise in-house but also rely on certain vendors as the originators of code that we use and/or as huge contributors to open source projects. It’s crucial for businesses to keep growing their knowledge base and to continue conducting lab tests for ongoing learning.
I don’t expect anyone to build their own AI-powered platform. Many have built log platforms with visualization front ends, but using them is still a manual process that relies heavily on someone to do the correlation and create new signatures for searching specific information as needed. However, if you are interested in a set of signatures that’s self-adjusting, never rests, and can predict what will go wrong, alongside an outside team that’s ready to assist you, I would recommend Loom Systems. I have not yet found anything on the market that comes close to what they do.
When testing growth, the question always is: what does theory tell you, and what can you prove? Having built some of the largest clouds in LATAM, KIO knows how to manage a large-volume cloud, but smaller companies can always reach out to peers or hardware partners to borrow hardware. Of course, there’s always the good, old-fashioned way: you buy it all, build it all, test it all, shrink it afterwards, and sell it. All of the non-utilized parts can be recycled into other projects. Loom Systems and its AI-powered platform can help you keep watch over your infrastructure as your human DevOps teams continue to streamline operations.
Every DevOps team wants a set-it-and-forget-it experience. Yes, this is achievable, but how do you get there? Unfortunately, there’s no shortcut. It takes learning, documenting, and applying lessons to all of your environments. After many man-hours of managing such an environment, our DevOps team has written scripts that self-heal and correct, built templates to monitor and detect conditions, and set up monitors that alert the team when KPIs are being hit. The process is intensive initially, but eventually dedicated DevOps teams get to a place where their environment is low maintenance.
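The detect, correct, and alert pattern described above can be sketched in a few lines of Python. This is not KIO’s tooling. The `Remediation` class, the `run_self_heal` function, and the fake `state` dictionary are hypothetical names used only to show the shape of such a script.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Remediation:
    name: str
    is_unhealthy: Callable[[], bool]  # detection check
    fix: Callable[[], None]  # corrective script


def run_self_heal(remediations: List[Remediation], alert: Callable[[str], None]) -> List[str]:
    """Try each fix once; alert a human if the condition persists."""
    healed = []
    for r in remediations:
        if not r.is_unhealthy():
            continue
        r.fix()
        if r.is_unhealthy():
            alert(f"{r.name}: automatic fix did not clear the condition")
        else:
            healed.append(r.name)
    return healed


# Fake environment state standing in for real health checks.
state = {"disk_full": True, "service_down": True}
alerts: List[str] = []

checks = [
    # This fix succeeds: it clears the simulated condition.
    Remediation("disk", lambda: state["disk_full"], lambda: state.update(disk_full=False)),
    # This fix does nothing, so the problem is escalated to a person.
    Remediation("service", lambda: state["service_down"], lambda: None),
]

healed = run_self_heal(checks, alerts.append)
print("healed:", healed)
print("alerts:", alerts)
```

Keeping detection, correction, and escalation separate means each lesson learned in one environment can be added as one more entry in the list and then reused in the others.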
The AI-powered platform from Loom Systems helps by alerting you to the unknown. Your team will be shown potential fixes and be prompted to add new ones. As time goes by, the entire team will have extensive documentation available that helps new or junior admins who are just joining. The result is a large knowledge base, a mature project, and a lower-maintenance environment for the team.
All serious businesses should enjoy the benefits of running a predictable, highly available, fast, well-supported, easily expandable, and low-maintenance environment. The AI-powered platform built by Loom Systems takes us there much faster and gives us benefits that are usually reserved for huge corporations. As an example, if you’re the first in the market to offer a new product or service, you can feel confident that Loom Systems will detect problems early and give you actionable intelligence so you can fix them with surgical precision.
It’s been a pleasure sharing my learnings with you and I look forward to hearing your feedback. Please share your comments and points of view - they’re all welcome!