APM and Application Stability: Where Two Monitoring Roads Merge and Diverge

In software development, application performance management (APM) is one of the grown-ups in the room. Not only has APM been around for a long time, but its solutions have evolved over several generations, making it one of the more mature product categories.

APM’s longevity makes perfect sense when you consider its fundamental purpose, which is to give organizations a way to understand the performance characteristics of their software. It delivers value by alerting infrastructure teams when applications are performing slowly or poorly, and after several generations of refinement it does this job well.

If APM is one of the old-timers in software development, then application stability is the new kid in town. With the rise of mobile apps and iterative development releases, application stability has answered the widespread need to monitor applications in a new way, shifting the focus from servers and networks to the customer experience.

The emergence of application stability has caused some consternation for die-hard APM fans. However, the two solutions have very different monitoring focuses, which leads me to believe there’s room for both tools, each serving a different team.

APM: The Engine for Infrastructure and DevOps Teams

Before the cloud, organizations supplied their own physical hardware and monitored resources such as RAM, disk space and CPU. If you ran out of any of these resources, you were screwed.

And that was the beauty of APM: It enabled the people running the applications to anticipate when they’d need more resources.

Of course, rather than monitor physical machines, today’s infrastructure, SRE and DevOps teams monitor cloud instances. Gone are the days of running out to Best Buy and lugging back new hardware. Instead, you submit a simple request for a new instance to a cloud provider and get access almost instantly.

Rather than make APM obsolete, as some feared, the cloud presents two good reasons for continued reliance on this tool.

Apps Can Be Resource Hogs: A cloud instance is simply a slice of someone else’s computer, which means you still need to know when you’re close to running out of resources on this virtual machine. In fact, extra care is needed these days because you’re likely to run out of space more quickly.
Money Still Doesn’t Grow on Trees: Easy access to seemingly infinite cloud resources means that software companies may fall into the habit of throwing money at problems rather than figuring out how to streamline usage and costs through better efficiency. Companies can end up paying a ton of money to run apps if they keep spinning up more cloud instances without thoroughly considering the cost and the need.

And that’s where APM steps in. Its main purpose is to help infrastructure teams with capacity planning. With the cloud, the questions shift to include:

How can I better manage my cloud instances and optimize usage?
How can I make sure I don’t give ludicrous amounts of money to cloud providers?
How can we tweak and optimize the cloud to reduce costs?

These are the types of things that APM does extremely well. Anytime you need to figure out how to optimize your current resources or when to buy a new server or cloud instance, APM tells you. This information is invaluable to the people running the applications, namely the infrastructure or DevOps team.
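To make the capacity-planning idea concrete, here is a minimal sketch of the kind of estimate involved: given daily usage samples for a resource, how many days remain before it runs out? The function name, the linear-growth assumption and the sample numbers are all illustrative assumptions, not how any particular APM product works.

```python
def days_until_full(daily_usage_gb, capacity_gb):
    """Estimate days until capacity is reached, assuming linear growth.

    daily_usage_gb: one usage sample per day, oldest first (hypothetical data).
    Returns None when usage is flat or shrinking.
    """
    if len(daily_usage_gb) < 2:
        raise ValueError("need at least two samples")
    # Average growth per day between the first and last sample.
    growth_per_day = (daily_usage_gb[-1] - daily_usage_gb[0]) / (len(daily_usage_gb) - 1)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - daily_usage_gb[-1]) / growth_per_day


# Disk usage grew from 60 GB to 66 GB over three days on a 100 GB volume.
print(days_until_full([60, 62, 64, 66], 100))  # 17.0
```

Real tools would likely use more robust forecasting than a straight line, but the question being answered is the same: when do I need more resources, and am I paying for more than I use?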

Application Stability: Where the Tires Meet the Road for Engineering Teams

Now, let’s be honest, APM doesn’t have its strongest showing when the person who runs the app is also the person who builds the app. And that’s exactly the scenario presented by mobile development and iterative coding.

Rather than going through long development cycles, software now gets pushed to the web on a daily basis, and mobile apps tend to have weekly or biweekly release cycles. This release speed is not only encouraged but expected in agile software development.

That means the chasm between building and running apps has shrunk, especially for mobile apps. More often than not, the people building the apps are the people releasing them. There is no gap. And, with mobile, companies no longer need to worry about expensive physical hardware but rather about the end user experience. The customer becomes the focus.

Unfortunately, APM is perhaps a little too set in its ways to help you do that. But it’s not APM’s fault. These requirements are not what APM is inherently built to do. It’s great at detecting problems and alerting infrastructure teams so they can toss the issue over the fence to the development team. However, its core strength isn’t providing information on how to fix the problems because it wasn’t built with the developer audience in mind.

Trying to refactor APM to help with stability and error issues is like tuning your engine when you have flat tires. They are completely separate components of the car, built for different purposes. You can tune the engine all you want, but it isn’t going to move the car unless you focus on why there isn’t air in the tires.

In today’s iterative world, development teams care a lot more about how apps are running, and they want actionable items they can fix. Developers want to know exactly what’s broken, what to fix right now and what can wait. In short, developers want errors aggregated and triaged automatically. They want to know, “Do we build or fix?”

This trade-off between building new features and fixing bugs is one of the key factors behind the adoption of application stability management tools. Developers need answers to several questions:

Where are the errors in the code?
How can we get to those bugs and fix them as soon as possible?
How can we tie bug fixing into our planning for the week?

Whether running sprints or building in an agile manner, development work is planned in advance. Developers want to figure out how much new feature work will be included in a sprint versus how much bug fixing is required. And they need a tool that helps them automate the answer to that question.

Benefits of Application Stability

The beauty of application stability is that it brings together the errors captured by APM and enables developers to see at a glance which ones are worth fixing. As a result, five major benefits arise.

Increased Efficiency: Companies eliminate the problem of infrastructure teams tossing issues over the fence to development teams. Valuable time is saved because application stability tools remove the game of telephone between the two teams and deliver bugs directly to the team that will fix them.
Stronger CSAT: The time to fix bugs goes down dramatically when the person who wrote the code fixes the code. With diagnostic information in hand from the application stability tool, the engineers who wrote the code already understand what it does, what the bug means and how to fix it. Faster resolution of bugs that impact the end user experience means that customer satisfaction levels (CSAT) are less likely to drop.
Error Prioritization: Application stability tools group bugs by root cause, making it easy for developers to get a sense of severity at a glance. It’s much easier to determine what to fix first when developers can see which errors are most costly, which affect the most customers, and which bug is impacting a key customer.
Tool Synchronization: Taking it one step further, application stability tools are tied into project management suites. Bugs map directly to tickets created in Jira (or whatever tool is used), and tickets update automatically as priority changes.
Stability Scores by Release: Application stability enables product and development teams to see stability scores by release. Since it’s common to have multiple app versions live at the same time, especially with mobile apps (where DevOps isn’t really involved), companies can’t rely on a single stability score. Teams need to see stability by release so that it’s clear exactly where the errors are and what impact they’re having on users.
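Two of the benefits above, error prioritization and stability scores by release, can be sketched in a few lines. This is only an illustration under stated assumptions: the `ErrorEvent` fields, the ranking by distinct users affected, and the definition of a stability score as the percentage of error-free sessions are all hypothetical choices, not the method of any specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical fields; real tools capture far richer diagnostics.
    root_cause: str  # e.g. exception class plus top stack frame
    user_id: str
    release: str


def prioritize(events):
    """Group events by root cause and rank groups by distinct users affected."""
    users_by_cause = defaultdict(set)
    for event in events:
        users_by_cause[event.root_cause].add(event.user_id)
    # Most users first; ties broken alphabetically for a stable order.
    ranked = sorted(users_by_cause.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(cause, len(users)) for cause, users in ranked]


def stability_by_release(total_sessions, failed_sessions):
    """Return the percentage of error-free sessions for each release."""
    scores = {}
    for release, total in total_sessions.items():
        if total == 0:
            continue  # no traffic, so no meaningful score
        failed = failed_sessions.get(release, 0)
        scores[release] = round(100.0 * (total - failed) / total, 2)
    return scores


events = [
    ErrorEvent("NullPointer@CheckoutView", "u1", "2.4.0"),
    ErrorEvent("NullPointer@CheckoutView", "u2", "2.4.0"),
    ErrorEvent("NullPointer@CheckoutView", "u2", "2.4.0"),
    ErrorEvent("Timeout@ImageLoader", "u3", "2.3.1"),
]
print(prioritize(events))
print(stability_by_release({"2.3.1": 1000, "2.4.0": 400}, {"2.3.1": 5, "2.4.0": 20}))
```

In this made-up data, the checkout error ranks first because it touches two users, and release 2.4.0 scores 95.0 against 99.5 for 2.3.1, which is exactly the per-release difference a single blended score would hide.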

What Percentage of Your Development Team Has a Login to Your APM?

I’m often asked whether I think application stability will replace APM, and my answer is simple: no, I don’t. APM remains an essential part of developing software, and organizations still need to understand when they’re about to run out of resources and when there’s poor performance.

Instead, I see these two solutions co-existing as adjacent categories but helping different teams. Application stability delivers prioritized errors to developers for fixing, while APM works well for enabling ops teams to raise red flags on high error rates and reduce cloud spend.

Some of you may be thinking to yourself, “Well, my APM product does what you’re describing for application stability, so I’m sure my developers are fine using it.” To which I pose the following challenge: What percentage of your dev team has a login to your APM? What percentage logs in on a daily basis? And, if they do use it, do your developers like it?

The answers to these questions may surprise you. After all, APM wasn’t really built for developers or for keeping end users happy. In contrast, application stability was born at the customer layer and is designed specifically to monitor the front end and ensure strong customer experiences with web and mobile apps.

Once you’ve had a chance to hear from your dev team, it wouldn’t surprise me if you discover that they’re pretty excited about the new kid in town.

— James Smith