As organizations modernize their applications by adopting a microservices architecture and deploying services in containers, achieving complete visibility into the environment becomes critical. In traditional monolithic applications, you can troubleshoot effectively with logs and metrics. In distributed systems, however, you also need to understand how a request flows through multiple services. Each service handles the request by fulfilling one responsibility, such as making an API call, running a database query, or publishing a message.

As enterprises move toward cloud-native technologies and deploy their applications in containers, they have widely adopted Kubernetes to orchestrate and manage those deployments. However, monitoring Kubernetes clusters and understanding how a request flows through the application stack pose a number of challenges.

This article will look at how Epsagon provides an automated, distributed tracing solution for your systems deployed in Kubernetes. Check out part 1 here.

Monitoring Challenges with Kubernetes

Kubernetes simplifies the deployment and management of your services. However, compared to traditional infrastructure, it has many more moving components that need to be monitored, which adds complexity to system observability and troubleshooting. You therefore need a comprehensive monitoring solution that helps you visualize the services running in your Kubernetes cluster, sends you timely alerts, and assists you in identifying and resolving operational issues in real time.

Challenges with Distributed Tracing

As the number of services in your microservices architecture grows, the complexity of tracking and diagnosing issues increases. Distributed tracing comes to the rescue here by giving you in-depth visibility into the transactions across your services, helping you better understand your distributed system and quickly identify the source of latency and performance bottlenecks.

At a high level, each trace is assigned a unique identifier at the source. This ID shows how a single request traverses multiple components in your environment. When a significant event occurs, you need to attach this same trace ID to the context of that event so that you can track the transaction across the entire system. If your applications are polyglot, you need to add instrumentation code to each service endpoint in each language, which might not be a straightforward effort. You can get more details on the challenges surrounding distributed tracing here.
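
The idea of carrying one ID through every hop can be sketched in a few lines of Python. This is a minimal illustration rather than how Epsagon or any particular tracing library implements it: the header name `X-Trace-Id`, the two service functions, and the in-memory event list are all assumptions made for the example, and a plain function call stands in for a network request.

```python
import uuid

# Hypothetical header name, chosen only for this illustration.
TRACE_HEADER = "X-Trace-Id"


def ensure_trace(headers):
    """Reuse the incoming trace ID if present; otherwise create one at the source."""
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def record_event(log, service, event, headers):
    """Attach the trace ID to a significant event so it can be correlated later."""
    log.append({"trace_id": headers[TRACE_HEADER], "service": service, "event": event})


def payment_service(log, incoming_headers):
    headers = ensure_trace(incoming_headers)
    record_event(log, "payment", "card charged", headers)


def checkout_service(log, incoming_headers):
    headers = ensure_trace(incoming_headers)
    record_event(log, "checkout", "order received", headers)
    # The downstream call carries the same headers, and therefore the same trace ID.
    payment_service(log, headers)
    return headers[TRACE_HEADER]


events = []
trace_id = checkout_service(events, {})
```

Because both services record the same `trace_id`, the two events can later be stitched into a single request flow. In a polyglot system, the equivalent of `ensure_trace` has to exist in every language involved, which is where the instrumentation effort comes from.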

Monitoring Kubernetes with Epsagon

Epsagon provides a consolidated platform to help you troubleshoot issues and detect bottlenecks in your Kubernetes cluster. You will be able to gain in-depth visibility into application performance issues and make informed decisions to optimize your infrastructure. This in turn will increase your team’s productivity, enabling you to spend more time on feature development, instead of on the maintenance of existing applications.

Troubleshooting issues is also faster since you have all the required tooling in a single platform at your disposal. Using Epsagon, you can correlate your application logs, metrics, and traces under the same set of pre-configured dashboards. This is very helpful for development teams since they then don’t have to navigate different observability tools to monitor their workloads. To understand the key metrics and components you should be monitoring in a Kubernetes environment, check out our post on this topic.

Epsagon helps you address the critical challenges of monitoring your Kubernetes environment and of distributed tracing. Its trace-based metrics and trace-based alerts let you monitor issues and send notifications when error or latency thresholds are exceeded.

Trace-Based Metrics

The Trace Search screen lets you search across all requests in a workload. The screen is highly customizable: you can search traces by timeframe and filter on criteria such as application, duration, and HTTP status code, as well as Kubernetes resources like cluster, node, pod, container, and namespace. Once you run a search, you’ll see the events that match your conditions:

Figure 1: Analyze traces using the Trace Search page
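
Conceptually, this kind of search is a set of filters applied to trace records. The sketch below shows the idea in Python over hand-written sample data. The field names and the `search_traces` helper are hypothetical and do not represent Epsagon's query API.

```python
def search_traces(traces, min_duration_ms=0, **exact):
    """Return traces at least min_duration_ms long whose fields match every exact filter."""
    return [
        t for t in traces
        if t["duration_ms"] >= min_duration_ms
        and all(t.get(field) == value for field, value in exact.items())
    ]


# Sample trace records with made-up field names and values.
traces = [
    {"id": "t1", "application": "shop", "duration_ms": 950, "http_status": 500, "namespace": "prod"},
    {"id": "t2", "application": "shop", "duration_ms": 120, "http_status": 200, "namespace": "prod"},
    {"id": "t3", "application": "shop", "duration_ms": 1300, "http_status": 500, "namespace": "staging"},
]

# Slow, failing requests in the prod namespace.
matches = search_traces(traces, min_duration_ms=500, http_status=500, namespace="prod")
```

Here `matches` contains only the trace with ID `t1`: it is the one record that is slow, returned a 500, and ran in the `prod` namespace.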

When you click on one of the traces in the “Trace Search” screen, it displays a pictorial representation of the entire request flow. Using this trace view, you can quickly identify issues in your microservices stack:


Figure 2: Epsagon Trace View


Better still, when you click on any of these events, the Epsagon UI opens a service map that shows the details of the payload and the interactions between the various components:


Figure 3: Service Map showing the trace flow


One easy way to discover an errored resource is to look for red arrows in the service map, which highlight any problematic calls and help you to quickly detect performance issues. You can then click on the component and look at the corresponding trace information to determine the root cause of the latency or error.

When you hover over a specific component, you can also view the RED (Rate, Errors, and Duration) metrics associated with it, along with the resources that interact with it. This comprehensive view of the service interactions inside your architecture makes it easier to monitor and troubleshoot issues:


Figure 4: Visualize RED metrics associated with a component
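
To make the three RED numbers concrete, here is a minimal Python sketch that derives them from a handful of spans for one service. The `Span` fields, the sample values, and the 60-second window are assumptions for illustration. How Epsagon computes these metrics internally is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    duration_ms: float
    is_error: bool


def red_metrics(spans, service, window_seconds):
    """Compute Rate, Errors, and Duration for one service over a time window."""
    own = [s for s in spans if s.service == service]
    if not own:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "avg_duration_ms": 0.0}
    errors = sum(1 for s in own if s.is_error)
    return {
        "rate_per_s": len(own) / window_seconds,   # Rate: requests per second
        "error_ratio": errors / len(own),          # Errors: share of failed requests
        "avg_duration_ms": sum(s.duration_ms for s in own) / len(own),  # Duration
    }


# Made-up spans observed during a 60-second window.
spans = [
    Span("payment", 120.0, False),
    Span("payment", 80.0, False),
    Span("payment", 400.0, True),
    Span("checkout", 30.0, False),
]
metrics = red_metrics(spans, "payment", window_seconds=60)
```

For the `payment` service, this gives three requests in 60 seconds, one failure out of three, and an average duration of 200 ms. Production systems usually report duration as percentiles rather than a mean, since a mean hides the tail latency.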


Trace-Based Alerts

Epsagon also provides robust real-time alerting to notify teams about issues with their workloads, which removes the need to manually monitor the Kubernetes cluster 24/7. A common alerting scenario in a Kubernetes cluster is CPU or memory utilization on a node or pod exceeding a particular threshold.
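
The threshold scenario boils down to comparing each resource's utilization against a limit. The following Python sketch shows that check. The pod names, CPU percentages, and 85% threshold are invented for the example, and the function is not part of any Epsagon API.

```python
def breached(usage_by_resource, threshold_percent):
    """Return the names of resources whose utilization exceeds the threshold."""
    return sorted(
        name for name, usage in usage_by_resource.items()
        if usage > threshold_percent
    )


# Made-up CPU utilization samples, in percent, keyed by hypothetical pod names.
cpu_percent_by_pod = {
    "checkout-7d9f": 42.0,
    "payment-5c1a": 93.5,
    "cart-8b2e": 88.0,
}
alerts = breached(cpu_percent_by_pod, threshold_percent=85.0)
```

With these samples, `alerts` lists `cart-8b2e` and `payment-5c1a`. A real alerting rule would typically also require the breach to persist for some period, so that a brief spike does not page anyone.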

You can leverage Epsagon’s integration with a wide range of industry-standard alerting tools like Opsgenie, PagerDuty, ServiceNow, Slack, and Microsoft Teams. Once your team receives an alert, they can leverage the Epsagon platform to review the logs, metrics, and traces all in one dashboard to quickly troubleshoot and fix the issue. 

With Epsagon, you can additionally view all the alerts that have been set up for Lambda events, Kubernetes, and tracing data in one place:


Figure 5: View all the alerts set up to monitor the system


Plus, you can use the “Create new alert” feature to configure alerts in Epsagon based on metrics like rate, errors, timeouts, cost, and memory:


Figure 6: Create a new custom alert


Trace-Centric Observability

Unfortunately, you can’t effectively troubleshoot today’s distributed systems using only logs and metrics. You also have to find out how a request flows through the entire system. Epsagon lets you do this, giving you complete visibility into your environment and how a request traverses multiple services and components. Meanwhile, the availability of trace-based metrics and alerts gives you the confidence you need in your monitoring solution, allowing you to easily diagnose performance issues and identify their root cause.

Using Epsagon’s unified platform, you can automatically collect all the application and infrastructure metrics of your services running inside a Kubernetes cluster. Plus, it provides out-of-the-box dashboards to monitor your Kubernetes cluster and review the real-time state of your nodes, pods, deployments, and container metrics, like CPU and memory requests. You can also visualize the distributed traces of your containerized applications and resolve issues faster.

Open-source solutions for distributed tracing are also available, but they come with their own set of challenges and shortcomings:

  • No alerts 
  • No correlation between logs and traces
  • No source of truth for observability purposes
  • No dedicated support team available for scaling deployment and maintenance activities

The ability to view traces, logs, and metrics all in one place without any manual configuration is a powerful feature that makes Epsagon stand out as one of the best tools available in the observability space for monitoring and troubleshooting your Kubernetes workload.
