Modern applications are often built from microservices for reasons such as scalability, faster time to market, and ease of deployment. Whenever a user interacts with the application, the request flows through several microservices, and distributed tracing lets you examine that flow. When a request errors out, you need to find the root cause as quickly as possible to avoid a bad user experience. In this post, we will explore how to use tracing to detect correctness issues, and how to use Kubernetes dashboards to check pod and deployment versions.

Detect Errors and Correctness Issues in Applications: Finding Errored Traces

With Epsagon, you can find out which traces errored out in multiple ways.

1. Alerts

The most common way to know about such traces is by setting trace-specific alerts.

Figure 1: Trace-specific Alerts

In Figure 1, an alert has been set for all traces that have errors. When you receive an alert, click it to go to the traces screen.

2. Trace Search

If you want to view specific errors related to traces, go to the Trace Search screen and set the filter to “error is true”. You can also narrow the results to the application you are interested in.

Figure 2: Trace Search Screen to Find Errored Traces
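To make the filter logic concrete, here is a minimal sketch of what an “error is true” search with an optional application filter amounts to. It does not use the Epsagon API; the Trace class, its fields, and the sample trace IDs and application names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    # Hypothetical, simplified trace record (not Epsagon's data model).
    trace_id: str
    application: str
    error: bool


def find_errored_traces(traces: List[Trace], application: Optional[str] = None) -> List[Trace]:
    """Return traces whose error flag is set, optionally limited to one application."""
    return [
        t for t in traces
        if t.error and (application is None or t.application == application)
    ]


# Made-up sample data.
traces = [
    Trace("t1", "checkout", True),
    Trace("t2", "checkout", False),
    Trace("t3", "billing", True),
]

errored_checkout = find_errored_traces(traces, application="checkout")
print([t.trace_id for t in errored_checkout])
```

Leaving out the application argument returns errored traces across every application, which mirrors running the search without selecting one.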

3. Service Maps

Service Maps give a holistic view of your application. With Service Maps, you can quickly identify which microservices have a problem and then jump to the relevant trace.

Figure 3: Finding Errored Traces using Service Maps

Comparing Errored Traces with a Good Trace

Let’s see how you can troubleshoot quickly by comparing an errored trace with a good one. We will use Trace Search to find an errored trace and then find a comparable good trace. To find an errored trace, set the filter to “error = true”.

Figure 4: Finding Errored Traces using Trace Search

For the purpose of the blog, let’s use the following trace as an example.

To find a good trace that is relevant to this trace, set the appropriate filters.

Figure 5: Find a Relevant Good Trace

When you open up both traces, you can see the difference between a good trace and an errored trace.

Figure 6: Errored Trace (Trace A)

Figure 7: Good Trace (Trace B)

For this example, you can see that there was a “division by zero” error in Trace A, which is why the request never reached the DynamoDB and Twilio resources. You can also compare metadata such as operation stats and CloudWatch metadata.
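The same comparison can be sketched in a few lines of code. This is only an illustration of the idea, not how Epsagon compares traces: each trace is assumed to be an ordered list of span dictionaries with a "resource" key and an optional "error" key, and the "api-gateway" and "lambda" resource names are hypothetical placeholders for whatever sits upstream of DynamoDB and Twilio.

```python
def compare_traces(errored_spans, good_spans):
    """Summarize where an errored trace diverges from a good one.

    Both arguments are ordered lists of dicts with a "resource" key and an
    optional "error" key (an assumed, simplified span format).
    """
    errored_resources = {s["resource"] for s in errored_spans}
    # Resources the good trace reached that the errored trace never did.
    not_reached = [s["resource"] for s in good_spans
                   if s["resource"] not in errored_resources]
    # First span in the errored trace that carries an error message.
    first_error = next((s for s in errored_spans if s.get("error")), None)
    return {
        "failed_at": first_error["resource"] if first_error else None,
        "error": first_error["error"] if first_error else None,
        "not_reached": not_reached,
    }


# Trace A (errored) and Trace B (good), with made-up upstream resources.
trace_a = [
    {"resource": "api-gateway"},
    {"resource": "lambda", "error": "division by zero"},
]
trace_b = [
    {"resource": "api-gateway"},
    {"resource": "lambda"},
    {"resource": "dynamodb"},
    {"resource": "twilio"},
]

summary = compare_traces(trace_a, trace_b)
print(summary)
```

The "not_reached" list is the useful part: it shows which downstream resources the errored request never touched, which points back to the span where the failure occurred.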

Detecting Errors and Correctness Issues in Kubernetes Deployments

Using Kubernetes Explorer, you can check whether your K8s environment is running correctly.

Figure 8: Kubernetes Explorer

For example, a common check before troubleshooting a Kubernetes environment is to confirm that each container is running the right image. You can do this by opening the Containers tab and viewing the image field, as shown below.

Figure 9: Check Image Correctness in Containers
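If you want to script the same image check outside the dashboard, one possible approach is to compare the images in a pod list against the images you expect. The sketch below assumes the input is a dictionary shaped like the output of `kubectl get pods -o json`; the pod names, container name, image tags, and the expected_images mapping are made up for illustration.

```python
def find_image_mismatches(pod_list, expected_images):
    """Return (pod, container, actual_image, expected_image) tuples for mismatches.

    pod_list is a dict shaped like `kubectl get pods -o json` output.
    expected_images maps container name -> expected image (hypothetical input).
    """
    mismatches = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            expected = expected_images.get(container["name"])
            # Containers with no expectation recorded are skipped, not flagged.
            if expected is not None and container["image"] != expected:
                mismatches.append(
                    (pod_name, container["name"], container["image"], expected)
                )
    return mismatches


# Made-up sample data in the shape of a Kubernetes PodList.
pods = {
    "items": [
        {
            "metadata": {"name": "web-1"},
            "spec": {"containers": [{"name": "web", "image": "example/web:1.4.0"}]},
        },
        {
            "metadata": {"name": "web-2"},
            "spec": {"containers": [{"name": "web", "image": "example/web:1.3.9"}]},
        },
    ]
}
expected_images = {"web": "example/web:1.4.0"}

image_mismatches = find_image_mismatches(pods, expected_images)
print(image_mismatches)
```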

In addition to checking the image, you can also check Kubernetes environment correctness issues such as:

– In the Pods tab, sort or filter by number of restarts and check the pod events for the reason.

– In the Nodes tab, filter by status to find nodes that are not ready, and check their events tab to understand why.

– In the Clusters tab, check the total number of nodes or pods.

Figure 10: Detecting Correctness Issues in Kubernetes Environment
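The restart and node-readiness checks listed above can be approximated in a short script. As before, this is a sketch under assumptions: the inputs are dictionaries shaped like `kubectl get pods -o json` and `kubectl get nodes -o json` output, and the pod and node names are invented.

```python
def pods_by_restarts(pod_list, min_restarts=1):
    """Return (pod_name, total_restarts) pairs, highest restart count first."""
    rows = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts >= min_restarts:
            rows.append((pod["metadata"]["name"], restarts))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def not_ready_nodes(node_list):
    """Return names of nodes that have no Ready condition with status "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


# Made-up sample data.
pods = {
    "items": [
        {"metadata": {"name": "api-1"},
         "status": {"containerStatuses": [{"restartCount": 0}]}},
        {"metadata": {"name": "worker-1"},
         "status": {"containerStatuses": [{"restartCount": 3}, {"restartCount": 2}]}},
        {"metadata": {"name": "cache-1"},
         "status": {"containerStatuses": [{"restartCount": 1}]}},
    ]
}
nodes = {
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
}

restarting = pods_by_restarts(pods)
unready = not_ready_nodes(nodes)
print(restarting, unready)
```

A script like this only tells you which pods and nodes to look at; the reason behind a restart or a not-ready node still comes from the events, as described in the list above.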

Conclusion

Detecting correctness issues at both the application level and the infrastructure level is the first step before deep troubleshooting. With Epsagon, you can detect correctness issues in an application’s request flow by comparing traces, and you can detect infrastructure correctness issues using the Kubernetes Explorer. This saves debugging time, because you have context on an issue before you dig into it.

To learn about how you can use Epsagon to monitor your applications and infrastructure, start a free trial.
