Continuously monitoring application performance is necessary because businesses demand proactivity, transparency and timely action.
In a short period, the complexity of technology has increased exponentially. The number of frameworks that appear “overnight”, together with new architectural patterns and distributed teams, can pose several challenges. Keeping such a complex landscape in check requires constant monitoring.
Choosing the right monitoring solution, not only for the project but also for the team dynamics, will help identify possible areas of improvement. Furthermore, embedding a monitoring solution within a product’s development cycle will help reduce the number of problems that can appear in each step, and might also shorten the time generally needed to identify where a service disruption originates.
Organizations today operate in highly dynamic and complex environments. Businesses need to adapt faster than ever, and customer experience is central. In the world of Twitter, Facebook, Google and 24/7 real-time information streams, responding efficiently and effectively to disruptions has become highly important, and application stability is key to a profitable business. Application disruptions, however, can happen easily: for instance, a sudden increase in the number of website visitors can overload the servers, causing unexpected behavior and application failures. Understanding the behavior and performance of complex applications well enough to resolve such a disruption before it impacts any users may be the difference between becoming successful and being forgotten.
KPMG performed its twentieth annual CIO survey in 2018 [KPMG18]. According to its findings, delivery of stable IT solutions is one of the top 3 Operational Priorities and a top priority for Digital Leaders. Having an APM solution implemented in an IT landscape is a good starting point for developing stable applications.
Application Performance Monitoring, or APM, is the monitoring and management of the availability and performance of software applications. It serves as the dashboard for performance problems and helps answer the following questions:
- How can the organization gain real-time, holistic visibility of the application stack and infrastructure map?
- Is there deep code-level visibility into the business-critical web and mobile apps?
- Is the organization aware of upcoming service disruptions, and can it act before business transactions break down?
- Is it transparent whether users experience suboptimal application performance at certain times or in certain regions, modules or business flows?
- Is it clear for IT managers where to get started in order to tackle poor application performance?
- Does the organization understand where a system is inefficient, or is it uncertain where to start the improvement process? Which process is most critical, and which performance remediation provides the highest RoI (Return on Investment)?
The amount of data coming out of various business systems today is enormous, and sifting through the noise to find the actual problem can be a daunting task, even for the most experienced IT managers. Finding the cause of a service disruption within the mountains of logging output generated by all systems would be nearly impossible by hand. A precise and efficient tool makes it much easier to narrow the scope of a problem. An APM tool helps turn the noise into structured information, since the language it speaks is data. This “language” enables people, even those without a technical background, to understand what is happening when a disruption occurs. For example, when a web page takes too long to load, the APM reporting graph clearly shows how much longer than normal the page took to load.

For web applications, Page Load Time (PLT) and performance are all about speed. PLT is one of the most discussed performance metrics in the digital landscape, and according to [Camb09], about half of web users expect a site to load in two seconds or less. PLT is an important part of any user experience, and APM makes it easy to pinpoint which action or transaction is expensive and where to start looking. Figure 1 illustrates why PLT is such an important performance metric: companies like Google and Amazon have seen what might seem an insignificant increase in response time translate into a considerable decrease in revenue.
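The two-second figure from [Camb09] can be made concrete with a small calculation. The Python sketch below shows the kind of arithmetic behind an APM page-load report:

- the share of page loads that exceed the threshold;
- how much slower a single load was than normal.

The sample values are made up. Treating the median of past loads as “normal” is our own simplifying assumption, since APM tools build their baselines in their own ways.

```python
from statistics import median

TWO_SECOND_THRESHOLD_MS = 2000  # threshold reported in [Camb09]


def share_over_threshold(load_times_ms, threshold_ms=TWO_SECOND_THRESHOLD_MS):
    """Fraction of page loads that took longer than the threshold."""
    if not load_times_ms:
        return 0.0
    slow = sum(1 for t in load_times_ms if t > threshold_ms)
    return slow / len(load_times_ms)


def slowdown_vs_baseline(current_ms, baseline_samples_ms):
    """How much longer a load took than the usual (median) load: (extra ms, ratio)."""
    normal = median(baseline_samples_ms)
    return current_ms - normal, current_ms / normal


# Hypothetical measurements in milliseconds.
baseline = [1200, 1350, 1100, 1500, 1250]
print(share_over_threshold([1200, 2500, 1800, 3100]))  # 0.5
print(slowdown_vs_baseline(3600, baseline))            # (2350, 2.88)
```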
WHAT IS APM AND WHO IS APPLYING IT?
Both Application Performance Management and Application Performance Monitoring are often abbreviated to APM. Application Performance Management is the more proactive component, while monitoring is more reactive when it comes to application performance. Whichever way it is viewed, APM is essentially a tool that helps monitor and optimize the performance of an organization’s applications.
Monitoring an application requires data. Depending on the complexity of a solution, data needs to be collected from multiple areas and aggregated in order to evaluate the “health” of a system. The monitoring solutions on the market have wide coverage, and each suite offers plenty of monitoring options. Most APM vendors offer similar features, because the market problems are mostly the same and the way data is collected and analyzed hasn’t changed much in the past few years. Figure 2 depicts the common monitoring points covered by APM vendors, showing the entire end-to-end flow of transactions, including the underlying technical components and hardware.
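The Python sketch below illustrates that aggregation step in its simplest form. It rolls the status of several monitored components up into one overall health value. The component names and the “worst status wins” rule are illustrative assumptions, not how any specific vendor scores health.

```python
from __future__ import annotations

from enum import IntEnum


class Health(IntEnum):
    """Ordered so that a larger value means a worse state."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


def system_health(component_status: dict[str, Health]) -> tuple[Health, list[str]]:
    """Overall health is the worst status reported by any monitored component."""
    if not component_status:
        raise ValueError("no components reported any data")
    worst = max(component_status.values())
    # Name the components responsible, so the right team can be contacted.
    culprits = sorted(
        name for name, state in component_status.items()
        if state == worst and state != Health.OK
    )
    return worst, culprits


# Hypothetical components along the end-to-end flow of a transaction.
status = {
    "browser (EUM)": Health.OK,
    "web server": Health.OK,
    "order service": Health.DEGRADED,
    "database": Health.OK,
    "host CPU/RAM": Health.OK,
}
worst, culprits = system_health(status)
print(worst.name, culprits)  # DEGRADED ['order service']
```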
Companies in almost every sector are already using APM: from Samsung, Allianz, BMW, NTT and Siemens, which are among the Fortune 100 companies, to CNN and Ryanair. Leading companies from industries such as Media, Financial, Government, Healthcare, Insurance, Retail, Telecommunications and Technology use it extensively and are asking for more capabilities to measure and act upon. APM vendors are also improving their tools with features like Artificial Intelligence for IT Operations (AIOps), End User Monitoring (EUM), Infrastructure Visibility, Business Performance Monitoring, AI-powered user experience insights and Cloud Monitoring.
- eBay: “With more than a hundred million active users globally, our customers expect efficient delivery of services. Therefore, our infrastructure performance – which executes critical transactions like product listing, product selection, order booking, payments, and shipping – is key to the success of our business… APM provides unrivalled visibility into performance issues within our network and beyond, as well as automates the rest of the job. This gives us faster root cause analysis and allows us to resolve performance issues before they impact our customers.” – Rajshekhar Desai, Group Manager – Quality, eBay Managed Marketplaces [Desa17].
- CNN: “We partnered with [an] APM vendor to build a mobile experience that helps users digest the numbers in a compelling and entirely personal way.” – Matthew Drooker, VP of App Development and Technology, CNN [Droo16].
CNN partnered with an APM vendor in order to develop the CNN Politics application, which provides an immersive multimedia experience that tracks data from polls, voting and fundraising. With exclusive stories and visualizations, personalized alerts and notifications, users have the power of data analysis at their fingertips.
- Nasdaq: “It really stood very well within a DevOps model, in this day and age where there is a lot of complexity within a given application architecture. The flow map that came out of the box just really sold the product…” – Eric Poon, Director of Operations Analytics, Nasdaq [Poon18].
“It’s a tool that offers seamless traceability and a view that bridge[s] both the APM and the business product usage effectively.” – Heather Abbott, Senior Vice President of Corporate Solutions Technology, Nasdaq [Poon18].
- Ryanair: “APM (…) has made us all better and faster at doing our jobs. Without APM, it would be impossible to troubleshoot this environment.” – Declan Costello, Infrastructure and Operations Manager, Ryanair [Cost17].
WHAT PROBLEMS DOES APM SOLVE?
According to [Will18], by 2021 enterprises will monitor 20% of all business applications with APM suites, up from only 5% in 2017, because the number of digitized business processes will increase considerably. These numbers alone should make any organization evaluate where it stands now. Any custom application comes with complexity. Growing pressure to digitize, a globally distributed customer base and the demand for high availability make the platform, infrastructure and application more complex. Global trends indicate that this complexity will keep increasing in the coming years. So, let’s look at how APM can help with various problems.
According to [ISCO16], Digital Transformation (DT) is the process of transforming business and organizational activities, processes and models to take full advantage of the mix of digital technologies and their accelerating impact across society. As businesses become increasingly digitalized, organizations need more control over their IT landscape, and APM helps them visualize that landscape and all the data that flows through it. To do that, organizations first need to translate business goals into application requirements, and then make sure those requirements are properly implemented. This can be verified via metrics. For example, if a business goal is “an application must be 100% available”, monitoring the application’s uptime is a must. APM also helps organizations understand customer behavior and demand. The McKinsey analysis [Olan13] identifies seamless customer experience, digital fulfillment, automation of activities and customer insights as key areas where DT comes into play. APM supports this work by putting facts on the table.
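A goal such as “100% available” can be turned into a metric in a few lines of Python. The sketch assumes a hypothetical setup in which uptime is measured by periodic checks that either succeed or fail. It computes two things:

- the measured availability;
- the downtime a given target still allows over a 30-day period.

```python
from __future__ import annotations

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def availability(check_results: list[bool]) -> float:
    """Percentage of uptime checks that succeeded."""
    if not check_results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(check_results) / len(check_results)


def allowed_downtime_minutes(target_pct: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime budget that still meets the availability target over the period."""
    return period_minutes * (100.0 - target_pct) / 100.0


# Hypothetical results: 997 successful checks and 3 failed ones.
checks = [True] * 997 + [False] * 3
print(availability(checks))             # 99.7
print(allowed_downtime_minutes(99.9))   # roughly 43.2
print(allowed_downtime_minutes(100.0))  # 0.0
```

A 100% target leaves a downtime budget of zero minutes. Any single failed check therefore already breaks the goal, which is worth keeping in mind when business goals are translated into metrics.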
Most companies depend on operation-critical software to run their business, and because of this dependency, even a slight outage can lead to unhappy customers. When an unsatisfied customer contacts the Customer Services (CS) team to report a problem, even basic APM dashboards allow CS to investigate the problem themselves. After narrowing it down, they can contact the relevant team for a solution (e.g. a server node was unreachable in some region for some time, or the message queue was full or busy executing queued events). They no longer need to disturb the IT Department every time something is reported ([Wats17]). Enabling the right teams to act saves the organization time and money.
When it comes to an application and its performance, organizations usually think about the software, but the hardware on which the application runs also matters. Applications consume important hardware resources such as CPU, RAM, I/O and storage. Inefficiencies inherited from the language they are written in and from their architecture can make that consumption excessive. For this reason, hardware monitoring is usually the first thing that should be set up. This way, the usage of the various hardware resources can be observed and, if needed, scaling operations can be performed based on user demand.
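The Python sketch below shows how hardware measurements can feed a scaling decision. Several elements are hypothetical choices for illustration:

- the `HostSample` type;
- the 80% and 30% thresholds;
- averaging over a window of samples.

In practice the samples would come from an APM or infrastructure agent, and the scaling itself would be carried out by the hosting platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class HostSample:
    """One hypothetical measurement of a host's resource usage, in percent."""
    cpu_pct: float
    ram_pct: float


# Hypothetical thresholds; they would need tuning for a real workload.
SCALE_OUT_ABOVE = 80.0
SCALE_IN_BELOW = 30.0


def scaling_decision(samples: list[HostSample]) -> str:
    """Suggest 'scale_out', 'scale_in' or 'hold' from average CPU/RAM usage."""
    if not samples:
        return "hold"
    cpu = mean(s.cpu_pct for s in samples)
    ram = mean(s.ram_pct for s in samples)
    busiest = max(cpu, ram)  # the most constrained resource drives the decision
    if busiest > SCALE_OUT_ABOVE:
        return "scale_out"
    if busiest < SCALE_IN_BELOW:
        return "scale_in"
    return "hold"


print(scaling_decision([HostSample(85, 60), HostSample(90, 65)]))  # scale_out
print(scaling_decision([HostSample(20, 25), HostSample(15, 20)]))  # scale_in
print(scaling_decision([HostSample(50, 40)]))                      # hold
```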
Having an APM solution in place makes an organization proactive. The solution does not need to be used only when users contact CS with a problem, nor does anyone need to check the graphs from time to time to make sure everything is all right. One of the most valuable features of APM is alerting (or notification). People from the IT Department can create rules in the APM tool that raise alerts each time the application’s performance dips in specific areas. These alerts can be given different degrees of importance, and the responsible team can receive them in different ways. For example, if an important part of the application goes down, a critical alert is raised and the development team receives it as a message on their phones. Less important alerts can simply be sent via email and taken care of the next day. Several options are available, and organizations can choose the one that best fits their product and needs. With this alert system, the response becomes proactive instead of the typical reactive one.
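The alerting idea can be sketched in Python as a set of rules, each with a severity, plus a routing table that decides how the responsible team is notified. All of the following are hypothetical:

- the rule names and metric names;
- the thresholds;
- the “sms”/“email” channels.

Real APM tools define rules and notification channels through their own configuration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"


@dataclass
class AlertRule:
    name: str
    metric: str
    threshold: float
    severity: Severity


# Hypothetical routing: critical alerts go to the on-call phone, warnings to email.
CHANNELS = {Severity.CRITICAL: "sms", Severity.WARNING: "email"}


def evaluate(rules: list[AlertRule], metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (rule name, channel) for every rule whose metric exceeds its threshold."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append((rule.name, CHANNELS[rule.severity]))
    return fired


rules = [
    AlertRule("checkout error rate", "checkout.error_pct", 5.0, Severity.CRITICAL),
    AlertRule("search latency", "search.p95_ms", 800.0, Severity.WARNING),
]
print(evaluate(rules, {"checkout.error_pct": 12.0, "search.p95_ms": 650.0}))
# [('checkout error rate', 'sms')]
```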
SLOW IS THE NEW DOWN
Why is an application slow? Applications have evolved from stand-alone to client-server to three-tier to distributed architectures, and a lot has changed in the IT landscape. With microservices, APIs (Application Programming Interfaces), Enterprise Service Buses, Service-Oriented Architecture, message brokers and even cloud-based elastic applications, a slowdown is not easy to identify. Performance management is a challenge given the complexity of today’s hybrid development platforms, cloud-native infrastructure, virtualized and containerized servers, and dynamic and ephemeral application architectures. With the heterogeneous nature of the IT landscape and the numerous interdependencies between its elements, it is difficult to identify why an application is performing poorly. One of the toughest questions that application owners, developers and IT managers face is: “Why is this application slow?”. APM helps to locate the bottlenecks in such a distributed application environment.
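One way APM tools locate a bottleneck is by breaking a request into a trace of timed spans, one per service or call. The Python sketch below uses a hypothetical checkout trace. It finds the span with the largest “self time”, meaning its own duration minus that of its children. It assumes child calls run one after another; for calls made in parallel, subtracting child durations would understate the parent’s own time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list[Span] = field(default_factory=list)


def self_time(span: Span) -> float:
    """Time spent in the span itself, assuming its children ran sequentially."""
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def bottleneck(root: Span) -> tuple[str, float]:
    """Walk the trace tree and return the span with the largest self time."""
    best = (root.name, self_time(root))
    for child in root.children:
        candidate = bottleneck(child)
        if candidate[1] > best[1]:
            best = candidate
    return best


# Hypothetical trace of one slow request.
trace = Span("GET /checkout", 1900, [
    Span("auth-service", 120),
    Span("cart-service", 300, [Span("redis GET", 20)]),
    Span("payment-service", 1400, [Span("SELECT orders", 1250)]),
])
print(bottleneck(trace))  # ('SELECT orders', 1250)
```

Here the payment service looks slow at first sight, but most of its time is spent waiting for one database query, which is where the investigation should start.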
BENEFITS OF APM
The benefits of APM can be summarized in four categories:

- improving business continuity;
- improving user experience and increasing customer satisfaction;
- enhancing insight and improving productivity for development and IT Ops;
- decreasing reliance on experts through better monitoring.

Grouping concerns such as reducing operational costs, finding and addressing performance bottlenecks, deep code visibility and transaction profiling into concern-bounded categories like these makes it easier to understand the problematic areas and how APM can add value to each of them. The Business Continuity and Better User Experience and Greater Customer Satisfaction categories go well together and address the business as well as the human side. The Enhanced Visibility and High Productivity and Decreased Reliance on Experts with Better Monitoring categories tackle the technical side, giving a 50/50 balance.
According to [DZON17], 43% of application performance issues are caused by problems in the application codebase. Transaction profiling generally lets developers dive deep into the code flow and get a method-level breakdown of processing time. Many application issues also occur due to slow network connectivity, virtualization bottlenecks, memory leaks, distributed environments, etc. APM helps in understanding a complex microservice environment, identifying slow requests moving through the system and diagnosing the root cause of the latency behind a slow customer experience.
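A method-level breakdown can be illustrated with a small timing decorator in Python. This is a simplified stand-in for what an APM agent instruments automatically; Python’s built-in `cProfile` module gives a similar breakdown without changing code. The function names and the 50 ms sleep simulating a slow query are hypothetical. Totals are cumulative, so a caller’s time includes the time of the methods it calls.

```python
import time
from collections import defaultdict
from functools import wraps

# function name -> [call count, cumulative seconds]
_timings = defaultdict(lambda: [0, 0.0])


def profiled(func):
    """Record the call count and cumulative wall-clock time of the decorated function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = _timings[func.__qualname__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


def breakdown():
    """(name, calls, total seconds) per function, slowest first."""
    rows = [(name, calls, total) for name, (calls, total) in _timings.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


@profiled
def load_order(order_id):
    time.sleep(0.05)  # stand-in for a slow database query
    return {"id": order_id}


@profiled
def render_page(order):
    return f"Order {order['id']}"


@profiled
def handle_request(order_id):
    return render_page(load_order(order_id))


handle_request(42)
for name, calls, total in breakdown():
    print(f"{name:15} calls={calls} total={total * 1000:.1f} ms")
```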
APM is more than a way of tracing problems. Organizations now have the power to make certain predictions about their applications, which can put them one step ahead of the competition and, who knows, maybe even make them game changers at some point.
CURRENT STATE OF APM
In the past two years, APM solutions have come a long way. The 2016 Gartner Magic Quadrant (Figure 4) showed only three leaders and one visionary. By the 2018 Magic Quadrant, some niche players had moved up to challengers, a challenger had moved into the leaders’ quadrant, and the number of niche players had increased considerably. These changes give an idea of how much the APM solutions market has matured in such a short time. The evolution of the big players can be seen in Figures 4 and 5.
[Capp18] categorized these APM suites in their Magic Quadrant of 2018 as follows:
The Magic Quadrants recognize very few market leaders in the APM segment. Let us see who they are and why they lead in APM solutions.
As can be seen in Figure 6, the 2018 APM leaders offer similar functionalities. Their subtle differences, however, make each of them stand out within the Leaders group and from the other categories in the Magic Quadrant.
For a business that relies on software applications, keeping those applications running optimally must be a top priority. APM monitors and manages the performance of the code, application dependencies, transaction times and overall user experience. Capabilities such as AIOps and distributed tracing are what the new architectures (e.g. services and microservices) need. This suggests that the leaders are following the technology’s evolution and trying to keep up with it by anticipating what the market needs.
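Distributed tracing depends on passing a trace context from service to service. The Python sketch below follows the `traceparent` header format of the W3C Trace Context specification (version 00 only):

- the first service starts a trace;
- each downstream hop keeps the trace-id and flags but issues its own span id.

It is a minimal illustration. It does not replace an APM agent or an OpenTelemetry SDK, which handle this propagation automatically.

```python
import re
import secrets

# W3C Trace Context "traceparent" header, version 00:
#   00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace where a request first enters the system."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_traceparent(incoming: str) -> str:
    """Continue the caller's trace: same trace-id and flags, new span id for this hop."""
    match = TRACEPARENT.fullmatch(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero ids are invalid")
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Example header from the W3C specification, as received by a downstream service.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(outgoing)  # same trace-id and flags, new span id
```

Because every hop shares the same trace-id, an APM backend can stitch the spans reported by different services into one end-to-end view of the request.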
Figure 7 [GART19] shows that although Dynatrace initially appears to deliver more to its customers than New Relic or AppDynamics, the numbers show it coming in second, with New Relic the winner. A closer look at the Product Capabilities areas shows that the differences between vendors are not that great, but the number of users who reviewed each product can easily reveal the favorite vendor.
Looking closely at which industries the reviewers come from can bring an organization closer to choosing the right vendor for its business.
Monitoring solutions have improved drastically over the last ten years. They have gone from the emergence of the APM concept as we know it today ([Drag14]) to the age of the cloud, with tools that are lightweight, deploy quickly and need zero configuration to get started. It is easy to be impressed by these technological advancements.
The APM tools of tomorrow will do far more than monitor production applications; they will be an integral part of the software delivery lifecycle. Identifying problems early and enabling continuous optimization across the various development and testing phases and environments is a must in such a fast-paced world.
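Moving monitoring into the delivery lifecycle can be as simple as a performance gate in the build pipeline. In the Python sketch below, a new build passes only if its 95th-percentile latency in a load test stays within 10% of the current release. The tolerance, the nearest-rank percentile method and the sample data are assumptions for illustration.

```python
from __future__ import annotations


def p95(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def performance_gate(baseline_ms: list[float], candidate_ms: list[float],
                     tolerance: float = 0.10) -> bool:
    """Pass only if the candidate's p95 latency is at most `tolerance` worse than the baseline's."""
    return p95(candidate_ms) <= p95(baseline_ms) * (1 + tolerance)


# Hypothetical load-test latencies (ms) for the current release and two new builds.
baseline = list(range(100, 200))              # p95 = 194
slightly_slower = [t + 15 for t in baseline]  # p95 = 209, within 10%
regressed = [t + 40 for t in baseline]        # p95 = 234, more than 10% worse
print(performance_gate(baseline, slightly_slower))  # True
print(performance_gate(baseline, regressed))        # False
```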
AI is becoming the new norm in monitoring, which is driving the development of advanced AI-powered features in APM tools. One example is causal analysis, a pattern detection and machine learning capability. Another is differential analysis, which uses analytics and machine learning to detect anomalies and problems automatically, ahead of time. Assisted triage ([Sidd18]) is a further hot topic in the area of APM. It is an intelligent engine that uses the graphical topology model, analytics, machine learning and expert heuristics to help users determine and verify the exact root cause of an issue by constructing a personalized path for each individual. These are no longer science fiction stories. The future is on its way, and it looks stunning.
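The AI-driven features above rely on machine learning. As a much simpler statistical stand-in for the idea of automatic anomaly detection, the Python sketch below uses a rolling z-score. It flags any measurement that lies more than three standard deviations from the mean of the preceding window. The window size, threshold and response-time data are hypothetical, and this is not how any specific vendor implements differential analysis.

```python
from statistics import mean, stdev


def anomalies(series, window=10, threshold=3.0):
    """Indices of points more than `threshold` standard deviations from the preceding window's mean."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical response times (ms) with one sudden spike.
response_ms = [200, 210, 205, 198, 202, 207, 199, 203, 201, 206, 204, 950, 205]
print(anomalies(response_ms))  # [11]
```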
[Camb09] M.A. Cambridge et al., Akamai Reveals 2 Seconds As The New Threshold Of Acceptability For ECommerce Web Page Response Times, Akamai.com, https://www.akamai.com/us/en/about/news/press/2009-press/akamai-reveals-2-seconds-as-the-new-threshold-of-acceptability-for-ecommerce-web-page-response-times.jsp, 14/09/2009.
[Capp18] W. Cappelli, Magic Quadrant for Application Performance Monitoring Suites, Dynatrace.com, https://www.dynatrace.com/gartner-magic-quadrant-application-performance-monitoring-suites, 03/2018.
[Cost17] D. Costello, Ryanair’s Ops Team Relies on New Relic to Deliver Passengers and Performance, NewRelic.com, https://newrelic.com/case-studies/ryanair, 08/2017.
[Desa17] R. Desai, eBay Leverages Compuware APM To Optimise Applications, FirstPost.com, https://www.firstpost.com/biztech/ebay-leverages-compuware-apm-to-optimise-applications-1891241.html, 02/2017.
[Drag14] L. Dragich, Monitoring Magic and the Future of APM, APM digest, https://www.apmdigest.com/monitoring-magic-and-the-future-of-apm, 14/08/2014.
[Droo16] M. Drooker, CNN Politics: Democracy Powered by Data, CA Technologies, https://www.ca.com/us/collateral/case-studies/cnn-politics-democracy-powered-by-data.html, 05/2016.
[DZON17] DZone, Performance & Monitoring, LinkedIn SlideShare, https://www.slideshare.net/tuoitrecomvn/dzone-performancemonitoring2016mastercodevn, 04/2017.
[Eato12] K. Eaton, How One Second Could Cost Amazon $1.6 Billion In Sales, Fast Company, https://www.fastcompany.com/1825005/how-one-second-could-cost-amazon-16-billion-sales, 15/03/2012.
[GART19] Gartner Inc., Application Performance Monitoring Suites > Compare Vendors, Gartner.com, https://www.gartner.com/reviews/market/apm/compare/dynatrace-vs-newrelic-vs-appdynamics-vs-ca-technologies, 2019.
[GOOG19] Google, The core foundations of a delightful web experience, Google Developers, https://developers.google.com/web/fundamentals/, 09/05/2019.
[ISCO16] i-SCOOP, Digital transformation: online guide to digital business transformation, i-SCOOP.eu, https://www.i-scoop.eu/digital-transformation/, 2016.
[KPMG18] Harvey Nash & KPMG, CIO Survey 2018: The Transformational CIO, https://assets.kpmg/content/dam/kpmg/nl/pdf/2018/advisory/cio-survey-2018.pdf, 2018.
[Maln13] R. Malnati, For Google, 400ms of increased page load time, results in 0,44% lost search sessions, Citrix.com, https://www.citrix.com/products/citrix-intelligent-traffic-management/, 02/2013.
[Olan13] T. Olanrewaju et al., Finding your digital sweet spot, McKinsey Digital, https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/finding-your-digital-sweet-spot, 11/2013.
[Pate11] N. Patel, How Loading Time Affects Your Bottom Line, NeilPatel.com, https://neilpatel.com/blog/loading-time/, 04/2011.
[Poon18] E. Poon et al., Challenges: Slow, Cumbersome Log Analytics, Home-Built Tools, AppDynamics.com, https://www.appdynamics.com/case-study/nasdaq/, 04/2018.
[Sidd18] A. Siddiqui, A New Approach to Application Performance Management – Delivering a Future-Proofed Modern Solution, CA Technologies, https://medium.com/@CATechnologies/a-new-approach-to-application-performance-management-delivering-a-future-proofed-modern-solution-960c25a6dddd, 22/05/2018.
[Wats17] M. Watson, Why APM is Valuable to Every Part of Your Business, APM digest, https://www.apmdigest.com/why-apm-is-valuable-to-every-part-of-your-business, 28/02/2017.