Azure Kubernetes Service (AKS) is a managed service for running a Kubernetes environment in Azure. AKS makes it simple to develop, deploy, and maintain your containerized workloads. AKS automatically handles the scalability and availability of the Kubernetes control plane, while you manage the worker nodes and pay only for the VMs those nodes run on. AKS gives you the flexibility of open-source Kubernetes without the operational overhead of running and managing the cluster's control plane yourself.

A Kubernetes cluster has many moving parts, so if your cloud-native applications run in AKS, a robust observability strategy is critical. This article explores how Epsagon’s observability platform monitors your Kubernetes cluster and how it helps you visualize and troubleshoot issues.

Kubernetes Dashboards

The Epsagon platform provides a simple way to monitor your Kubernetes cluster deployed in AKS, with a screen that displays a comprehensive overview of cluster, node, pod, container, and deployment metrics.

Figure 1: View of AKS cluster metrics

Kubernetes schedules pods onto nodes in the AKS cluster based on each node's available resources, such as CPU and memory.

Figure 2: View of node-level metrics in the AKS cluster
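To make the role of node-level resources concrete, here is a minimal Python sketch of a "does this pod fit on this node" check. It is a deliberate simplification and not the Kubernetes scheduler's actual algorithm: the node names, field names, and numbers are hypothetical, and it looks only at CPU and memory requests.

```python
# Hypothetical node data: allocatable capacity and what running pods already request.
NODES = [
    {"name": "node-1", "alloc_cpu_m": 4000, "req_cpu_m": 3600, "alloc_mem_mi": 16000, "req_mem_mi": 9000},
    {"name": "node-2", "alloc_cpu_m": 4000, "req_cpu_m": 1500, "alloc_mem_mi": 16000, "req_mem_mi": 15500},
    {"name": "node-3", "alloc_cpu_m": 4000, "req_cpu_m": 2000, "alloc_mem_mi": 16000, "req_mem_mi": 8000},
]


def fits(node, cpu_m, mem_mi):
    """Return True if the node has enough unrequested CPU and memory for the pod."""
    free_cpu = node["alloc_cpu_m"] - node["req_cpu_m"]
    free_mem = node["alloc_mem_mi"] - node["req_mem_mi"]
    return cpu_m <= free_cpu and mem_mi <= free_mem


# A pod requesting 500 millicores and 1024 MiB (illustrative values).
candidates = [n["name"] for n in NODES if fits(n, cpu_m=500, mem_mi=1024)]
print(candidates)
```

In this sample, node-1 is short on CPU and node-2 is short on memory, which is the kind of imbalance a node-level metrics view helps you spot.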

A pod is the smallest deployable unit that Kubernetes can schedule. In the Epsagon console, you can view pod metrics like CPU, memory, disk, and network, along with pod state, an important indicator that appears as Running, Pending, Succeeded, Failed, or CrashLoopBackOff.


Figure 3: View of pod-level metrics in the AKS cluster
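As an illustration of why pod state is worth watching, the following Python sketch counts pods per state and lists the ones that usually need attention. The pod names and the choice of "unhealthy" states are assumptions made for this example; the code does not call Epsagon or the Kubernetes API.

```python
from collections import Counter

# Hypothetical sample data: (pod name, state) pairs as a monitoring view might list them.
PODS = [
    ("checkout-7d9f", "Running"),
    ("cart-5c2a", "Running"),
    ("payments-9b1e", "CrashLoopBackOff"),
    ("batch-report-x1", "Succeeded"),
    ("search-3f7c", "Pending"),
]

# States that usually deserve a closer look (an assumption for this sketch).
UNHEALTHY_STATES = {"Pending", "Failed", "CrashLoopBackOff"}


def summarize_pod_states(pods):
    """Count pods per state and list the pods in an unhealthy state."""
    counts = Counter(state for _, state in pods)
    needs_attention = sorted(name for name, state in pods if state in UNHEALTHY_STATES)
    return counts, needs_attention


counts, needs_attention = summarize_pod_states(PODS)
print(dict(counts))
print(needs_attention)
```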

Custom Dashboards

Epsagon provides both out-of-the-box and customizable dashboards that you can leverage to visualize metrics and improve the observability of your services deployed in the Kubernetes cluster.


Figure 4: Epsagon Dashboards


Under the “Epsagon Dashboards” tab, you will find ready-to-use dashboards created by the Epsagon team to help monitor your services and Kubernetes infrastructure. 


Figure 5: Epsagon dashboard for Kubernetes


You can also create a custom dashboard from scratch, or duplicate an existing dashboard and modify it. You can then add any of the metrics Epsagon tracks to your custom dashboard, and even create your own widgets or panels inside it.


Figure 6: Epsagon custom dashboard


To add new panels to your custom dashboard, just drag and drop. A variety of template variables also let you filter the dashboard by an application value, helping you create highly interactive and reusable dashboards.

Troubleshooting AKS with Epsagon

Epsagon provides a consolidated platform to help you troubleshoot issues and detect bottlenecks in your Kubernetes cluster deployed to AKS, giving you in-depth visibility into application performance issues and allowing you to make informed decisions to optimize your infrastructure. 

Troubleshooting issues is faster with Epsagon since you have all the required tooling in a single platform at your fingertips. You can easily correlate your application logs, metrics, and traces under the same set of pre-configured dashboards, a beneficial capability for development teams since they don’t have to navigate different observability tools to monitor their workloads.

Trace Search

The Trace Search screen is customizable and lets you search across all requests in your workloads. The screen offers a wide variety of filters, including time frame, and lists the matching traces in the results section so you can analyze them further via RED metrics: Rate (requests), Errors, and Duration (latency).


Figure 7: Analyze traces using the Trace Search page
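To show what RED metrics summarize, here is a minimal Python sketch that computes them from a handful of trace records. The `TraceRecord` shape, the endpoint names, and the 60-second window are hypothetical and chosen only for illustration; this is not Epsagon's data model or API.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical shape of one search result; not Epsagon's schema."""
    endpoint: str
    duration_ms: float
    is_error: bool


def red_metrics(records, window_seconds):
    """Return request rate (req/s), error ratio, and average duration (ms)."""
    total = len(records)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "avg_duration_ms": 0.0}
    errors = sum(1 for r in records if r.is_error)
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total,
        "avg_duration_ms": sum(r.duration_ms for r in records) / total,
    }


records = [
    TraceRecord("/checkout", 120.0, False),
    TraceRecord("/checkout", 480.0, True),
    TraceRecord("/cart", 60.0, False),
    TraceRecord("/cart", 140.0, False),
]
metrics = red_metrics(records, window_seconds=60)
print(metrics)
```

In practice, duration is often reported as percentiles (for example p95) rather than an average, because averages hide slow outliers.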


Trace View

Epsagon’s Trace View provides additional metadata that can help you dig deep into a particular transaction flow and better understand the overall system. Using this view, you can quickly identify issues in your application stack.


Figure 8: Epsagon trace view


A distributed trace is a collection of spans that can help you break down the total duration of a given request. A trace gives you in-depth visibility into how a request gets processed across multiple services and is beneficial in troubleshooting performance issues and identifying slow components in your architecture. 


Figure 9: Epsagon view of trace details
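The idea of a trace as a collection of spans can be sketched in a few lines of Python. The `Span` fields, service names, and timings below are hypothetical, and the self-time calculation assumes that a span's direct children run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation within a trace (hypothetical, simplified shape)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_times(spans):
    """Time each span spent on its own work, excluding its direct children."""
    result = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id in result:
            result[s.parent_id] -= s.duration_ms
    return result


trace = [
    Span("a", None, "api-gateway", 0, 300),
    Span("b", "a", "orders", 20, 280),
    Span("c", "b", "postgres", 40, 240),
    Span("d", "b", "inventory", 245, 270),
]
times = self_times(trace)
slowest = max(trace, key=lambda s: times[s.span_id])
print(slowest.service, times[slowest.span_id])
```

Here the root span takes 300 ms end to end, but most of that time is spent in the database call, which is the kind of slow component a trace view helps you pinpoint.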


Epsagon’s out-of-the-box dashboards are great resources for building an automated monitoring and troubleshooting strategy for your services deployed in an AKS environment. The “Application Overview” dashboard displays the top five applications with the highest number of errors. It also shows other useful metrics like endpoint throughput, latency, error codes, and exceptions.


Figure 10: Epsagon’s Application Overview dashboard


Service Maps

A service map gives you real-time visual insight into your service architecture, helping you better understand your systems. You can use it to monitor and troubleshoot performance-related issues. Behind the scenes, Epsagon’s service map uses distributed tracing to display the dependencies between resources.


Figure 11: Epsagon service map


If you hover over a resource in the service map, you can view the upstream and downstream dependencies of that resource. You can also visualize the RED metrics associated with that particular component. From an observability perspective, the ability to view standardized health-check metrics across all your services is critical.


Figure 12: Service Map hover view
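To illustrate how dependencies can be derived from distributed traces, the Python sketch below builds upstream and downstream sets from parent/child span links. The span tuples and service names are hypothetical, and this is only one plausible way to derive such a map; it does not describe Epsagon's internal implementation.

```python
from collections import defaultdict

# Hypothetical spans as (span_id, parent_id, service) tuples.
SPANS = [
    ("1", None, "api-gateway"),
    ("2", "1", "orders"),
    ("3", "2", "postgres"),
    ("4", "2", "inventory"),
    ("5", "4", "postgres"),
]


def build_service_map(spans):
    """Derive caller -> callee edges from parent/child span links."""
    service_of = {span_id: service for span_id, _, service in spans}
    downstream = defaultdict(set)  # service -> services it calls
    upstream = defaultdict(set)    # service -> services that call it
    for _, parent_id, service in spans:
        caller = service_of.get(parent_id)
        # Skip root spans and calls that stay inside the same service.
        if caller is not None and caller != service:
            downstream[caller].add(service)
            upstream[service].add(caller)
    return upstream, downstream


upstream, downstream = build_service_map(SPANS)
print(sorted(upstream["postgres"]))
print(sorted(downstream["orders"]))
```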


If you click on any resource in the service map, you can drill down into its RED metrics over time and view the detailed traces associated with that resource in the Epsagon UI.


Figure 13: Service Map focus view


Trace-Based Alerts for AKS

With Epsagon, you can also create a comprehensive monitoring and alerting strategy for your AKS cluster. This ensures that you receive timely notifications about issues and can take preventive measures to minimize application downtime. Epsagon has out-of-the-box integrations with industry-standard alerting tools such as OpsGenie, ServiceNow, Slack, Microsoft Teams, and PagerDuty.


Figure 14: All alerts set up to monitor the AKS cluster


You can also create new custom alerts to verify your AKS cluster’s health based on metrics like CPU usage, memory utilization, disk read, disk write, network transmit, and network receive.


Figure 15: New alert to verify health of the AKS cluster
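The logic behind a threshold-based alert can be sketched as follows. The `AlertRule` fields, the metric name, and the "three consecutive samples" condition are assumptions made for this example; they do not reflect Epsagon's actual alert configuration options.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: fire when a metric stays above a threshold."""
    metric: str
    threshold: float
    min_consecutive: int  # samples in a row that must breach (assumed >= 1)


def should_fire(rule, samples):
    """Return True if the latest `min_consecutive` samples all exceed the threshold."""
    if len(samples) < rule.min_consecutive:
        return False
    recent = samples[-rule.min_consecutive:]
    return all(value > rule.threshold for value in recent)


cpu_rule = AlertRule(metric="cpu_usage_percent", threshold=80.0, min_consecutive=3)
spiky = should_fire(cpu_rule, [55.0, 91.0, 62.0, 85.0])      # brief spikes only
sustained = should_fire(cpu_rule, [70.0, 84.0, 88.0, 93.0])  # sustained breach
print(spiky, sustained)
```

Requiring several consecutive breaches is a common way to avoid paging on momentary spikes, at the cost of a slightly later notification.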



Epsagon provides a comprehensive solution for monitoring your services deployed to Azure Kubernetes Service. You can view metrics, logs, and traces in the same dashboard with just a few clicks. You can perform a high-level analysis of your requests by looking at the RED metrics, or dig deeper into a particular request and view its details. You can also check the health of your Kubernetes cluster and configure alerts that fire when metrics cross a specified threshold.

The ability to view traces, logs, and metrics all in one place without any manual configuration is a powerful feature that makes Epsagon stand out as one of the best tools available in the observability space for monitoring and troubleshooting AKS and other Kubernetes workloads.

