Developing for the cloud is nothing new. After all, the oldest players in the league are now more than a decade old. But what we often describe as cloud development usually means development for a particular provider.
Vendor lock-in is a big problem in this field, with companies unable to change the platform even when a better alternative comes up. The Cloud Native Computing Foundation (CNCF) is trying to change this trend. Cloud native computing uses readily available resources (whether cloud or on-premise) and abstracts them away. And it does this while sticking to the core principles of cloud computing: high availability, scalability, and flexibility.
This series of articles presents the most important tools from the CNCF landscape. Our last piece discussed the cloud native proxy, Envoy, and this installment focuses on Jaeger, a distributed tracing solution.
What Is Jaeger?
Tracing is not a new problem, so why does it need a new solution? To put it simply, cloud native applications are much more complex than the monoliths of yesteryear. Your microservices scale automatically, routing requests between each other. You never know where a given transaction ends up and how it will be processed. Yet, you want to know what’s happening in your system. You want observability in this dynamic environment. You want to locate the performance bottlenecks, minimize the latency, and provide the best level of service that you can afford.
That’s where Jaeger comes in. Jaeger was built from the ground up with cloud native solutions in mind. It aims to address the problems of monitoring distributed transactions and propagating the distributed context. Besides optimizing performance and latency, it lets you perform root cause analysis and analyze inter-service dependencies.
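To make "propagating the distributed context" a bit more concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not the Jaeger client. The names `SpanContext`, `start_span`, `inject`, and `extract`, as well as the `x-trace-context` header, are assumptions made up for illustration; real tracers use their own wire formats.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

HEADER = "x-trace-context"  # hypothetical header name, for illustration only


@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # shared by every span in one transaction
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # the span that caused this one, if any


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that stays in the parent's trace."""
    if parent is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Caller side: write the context into the outgoing request headers."""
    headers[HEADER] = f"{ctx.trace_id}:{ctx.span_id}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Callee side: rebuild the caller's context from the incoming headers."""
    value = headers.get(HEADER)
    if value is None:
        return None
    trace_id, span_id = value.split(":")
    return SpanContext(trace_id, span_id, None)


# Service A handles a request and calls service B.
root = start_span()
outgoing: Dict[str, str] = {}
inject(root, outgoing)

# Service B receives the headers and continues the same trace.
child = start_span(parent=extract(outgoing))
```

Because the child span carries the same trace ID and points back at its parent, a tracing backend can later stitch spans reported by different services into a single trace.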
As it’s compatible with the OpenTracing API (an open standard for tracing and instrumentation), Jaeger has access to many client libraries, and there are numerous examples and tutorials on how to use it for tracing. It can be run as a single binary using its own backend, or it can use an external backend such as Elasticsearch, Cassandra, or Kafka. Jaeger also comes with a UI for displaying call graphs.
To try Jaeger locally, you can start it with Docker:
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.8
One notable detail of this command is that it also configures Jaeger to act as a Zipkin-compatible collector. Zipkin is another distributed tracer, which we’ll compare to Jaeger a bit later.
The command uses the all-in-one image, which bundles all of Jaeger’s components together. Note that this is not a scalable solution. For that, you may want to run each service that comprises Jaeger as a separate container.
To test how Jaeger works, you also need an application that will send the traces to the backend. Fortunately, there is such an example application written in Go, which you can run from Docker as well:
docker run --rm -it \
  --link jaeger \
  -p8080-8083:8080-8083 \
  -e JAEGER_AGENT_HOST="jaeger" \
  jaegertracing/example-hotrod:1.8 \
  all
Open your browser, navigate to http://localhost:8080, and interact with HotROD (as it is called). After that, you can access the Jaeger UI to see what traces have been collected. The UI is available at http://localhost:16686, and we’ll return to this in one of the sections below.
As a CNCF project, Jaeger is currently in the incubation stage. This means that it’s mostly mature but not as mature as Kubernetes or Envoy. Besides the different Docker images, there are other ways to deploy Jaeger: a Kubernetes operator, Kubernetes templates, or a Helm chart. All Jaeger backend components also expose Prometheus metrics for interoperation with this CNCF graduated project.
The Power of OpenTracing
We’ve already mentioned that Jaeger uses the OpenTracing API to collect trace data. If you want to avoid vendor lock-in, this is more good news, since OpenTracing is a vendor-independent API that passes request-scoped information between segments of your application.
If at some point you decide to switch tracers from Jaeger to, say, Zipkin or Datadog, you are free to do so without having to rewrite your entire codebase. Even better, your instrumented code can stay untouched: you only swap the tracer implementation where it is initialized and point its output at a different endpoint.
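Here is a rough sketch of why a vendor-neutral API keeps application code stable. It is plain Python and deliberately not the real OpenTracing API, which is richer. `Tracer`, `RecordingTracer`, `NoopTracer`, and `handle_checkout` are hypothetical names invented for this example.

```python
from contextlib import contextmanager
from typing import ContextManager, Iterator, List, Protocol


class Tracer(Protocol):
    """Hypothetical vendor-neutral interface the application codes against."""

    def span(self, operation: str) -> ContextManager[None]:
        ...


class RecordingTracer:
    """Stand-in for one vendor: remembers every finished operation."""

    def __init__(self) -> None:
        self.finished: List[str] = []

    @contextmanager
    def span(self, operation: str) -> Iterator[None]:
        try:
            yield
        finally:
            self.finished.append(operation)


class NoopTracer:
    """Stand-in for another vendor: discards everything."""

    @contextmanager
    def span(self, operation: str) -> Iterator[None]:
        yield


def handle_checkout(tracer: Tracer, prices: List[int]) -> int:
    """Application code: it only knows the Tracer interface."""
    with tracer.span("checkout"):
        with tracer.span("sum-prices"):
            return sum(prices)


# Swapping the tracer happens in one place; handle_checkout is unchanged.
recording = RecordingTracer()
total_a = handle_checkout(recording, [5, 7])
total_b = handle_checkout(NoopTracer(), [5, 7])
```

The business logic returns the same result with either tracer, and only the line that constructs the tracer differs between the two runs.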
Besides compatibility with OpenTracing, Jaeger also accepts spans in the Zipkin format. This means that if you already use Zipkin, you don’t have to rewrite your instrumentation from one format to another. Bear in mind that OpenTracing is still recommended for all newly written applications.
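As an illustration, a span in Zipkin's JSON format could be built and posted to the Zipkin-compatible port (9411) roughly as follows. This is a sketch under assumptions: the `/api/v2/spans` path and support for Zipkin JSON v2 depend on your Jaeger version, and the service and operation names are made up.

```python
import json
import secrets
import time
import urllib.request

# Assumption: Jaeger's Zipkin-compatible collector listens on this port and
# accepts Zipkin JSON v2 at this path; check the docs for your version.
ZIPKIN_URL = "http://localhost:9411/api/v2/spans"


def make_zipkin_span(service: str, name: str, duration_us: int) -> dict:
    """Build one span using Zipkin v2 JSON field names."""
    return {
        "traceId": secrets.token_hex(16),            # 32 hex characters
        "id": secrets.token_hex(8),                  # 16 hex characters
        "name": name,
        "timestamp": int(time.time() * 1_000_000),   # start, microseconds since epoch
        "duration": duration_us,                     # microseconds
        "localEndpoint": {"serviceName": service},
    }


def send_spans(spans: list, url: str = ZIPKIN_URL) -> int:
    """POST a list of spans as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# "billing" and "charge-card" are hypothetical names.
span = make_zipkin_span("billing", "charge-card", duration_us=1500)
payload = json.dumps([span])
# send_spans([span])  # uncomment while the Jaeger container is running
```

The network call is left commented out so the snippet can be run without a collector; with Jaeger up, the span should appear under the service name you chose.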
Hunting for Bottlenecks
Previously, we started Jaeger and sent a few traces to its backend from the HotROD app. We can now see how the Jaeger UI looks. Unless you turned off the Jaeger container, it should still be available at http://localhost:16686. The UI presents three main sections.
The Search section allows us to see all the traces that satisfy the search criteria. Searching for everything that touches the frontend service, for example, will show us all the requests that originated from client interaction with the frontend. Clicking on a particular trace will show us the entire chain of events related to the request that triggered the trace.
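The chain of events inside a single trace is usually where a bottleneck shows up. One simple way to reason about it is "self time": how long an operation took minus the time spent in its direct children. The sketch below is our own illustration, not something Jaeger exposes under this name; the `Span` type and the sample operations are hypothetical, and it assumes child spans run one after another rather than in parallel.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    operation: str
    duration_ms: float


def rank_by_self_time(spans: List[Span]) -> List[Tuple[str, float]]:
    """Return (operation, self time) pairs, slowest first.

    Self time is the span's duration minus its direct children's durations.
    Assumes children run sequentially; overlapping children would need
    interval arithmetic instead of a plain sum.
    """
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = (
                child_total.get(span.parent_id, 0.0) + span.duration_ms
            )
    ranked = [
        (s.operation, s.duration_ms - child_total.get(s.span_id, 0.0))
        for s in spans
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up trace: a request that fans out to two lookups.
trace = [
    Span("a", None, "handle-request", 700.0),
    Span("b", "a", "find-customer", 300.0),
    Span("c", "a", "find-drivers", 250.0),
    Span("d", "c", "cache-get", 200.0),
]
ranked = rank_by_self_time(trace)
slowest = ranked[0]
```

In this made-up trace, the root span has the longest total duration, but most of that time is spent waiting on its children, so the ranking points at the customer lookup instead.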
If we know the IDs of two traces that we want to compare, we can go directly to the Compare section. Alternatively, we can select the two traces in the Search section and then click the Compare Traces button. The comparison shows us the events that happened in one trace but not in the other. The difference is indicated by the color of the nodes in the graph: grey means an event happened in both traces, red means it happened only in the first, and green means it happened only in the second.
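The color scheme boils down to set membership. The following sketch mirrors the description above and is not Jaeger's actual implementation; the `Node` type and the sample services are hypothetical.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (service, operation)


def compare_traces(first: Set[Node], second: Set[Node]) -> Dict[Node, str]:
    """Color each node by which of the two traces it appears in."""
    colors: Dict[Node, str] = {}
    for node in first | second:
        if node in first and node in second:
            colors[node] = "grey"    # happened in both traces
        elif node in first:
            colors[node] = "red"     # happened only in the first trace
        else:
            colors[node] = "green"   # happened only in the second trace
    return colors


# Made-up traces: the second one skips the cache and hits the database.
trace_a = {("frontend", "dispatch"), ("driver", "find"), ("cache", "get")}
trace_b = {("frontend", "dispatch"), ("driver", "find"), ("db", "select")}
colors = compare_traces(trace_a, trace_b)
```

Here the cache lookup would be drawn in red and the database query in green, which is the kind of difference worth investigating when one of two similar requests is much slower.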
The Dependencies section visualizes how our services are connected and which of them play a role in any given request.
To learn more about how to use the Jaeger UI, here’s a great article with further details. It shows how to uncover an actual bottleneck in the code and how to perform a root cause analysis once it’s found.
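As a rough illustration of how a service dependency view can be derived from trace data, the sketch below counts calls that cross a service boundary. It is our own simplification, not Jaeger's code, and the `Span` type and service names are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    service: str


def service_dependencies(spans: List[Span]) -> Counter:
    """Count caller -> callee edges between different services."""
    service_of = {span.span_id: span.service for span in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: nobody called it
        caller = service_of.get(span.parent_id)
        if caller is not None and caller != span.service:
            edges[(caller, span.service)] += 1
    return edges


# Made-up spans from one request.
spans = [
    Span("1", None, "frontend"),
    Span("2", "1", "customer"),
    Span("3", "1", "driver"),
    Span("4", "3", "cache"),
    Span("5", "3", "cache"),
    Span("6", "3", "driver"),  # internal call, not a cross-service edge
]
edges = service_dependencies(spans)
```

Each key in `edges` is one arrow in the graph, and the count tells you how often that call was observed.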
Jaeger vs. Rest of the World
Jaeger’s main competitor in the distributed tracing field is Zipkin. Zipkin is an older project, which also means it’s more mature. Despite this, Jaeger has seen an increase in interest mainly due to its low footprint and scalable design.
There’s a difference in the level of support for OpenTracing between the two. Jaeger was designed with OpenTracing in mind right from the beginning. For Zipkin, the support was added later, on top of the native API it originally used.
Besides Jaeger and Zipkin, a third possible alternative is AppDash, another OpenTracing-compatible solution. It was inspired by Google Dapper as well as Zipkin, but it hasn’t seen much adoption so far. Its documentation is also not very good, so we don’t recommend running it in production.
Zipkin is hosted under the Apache Software Foundation, which is known for high-quality products. Jaeger, as you already know, is part of the Cloud Native Computing Foundation (CNCF). This means that if you’re building cloud native applications, you can expect better support and integration from Jaeger than you would from Zipkin. If, on the other hand, you want a general solution, Zipkin may be a good choice for you.
If you’re building cloud native applications, you will need a distributed tracing system to help you analyze possible performance issues. The old-school ways ceased to be viable when we left the monolith era and entered an era of more loosely coupled services. Today, out of all the possible solutions for tracing, Jaeger looks to be the most promising. It’s built to scale and helps you avoid vendor lock-in, which can save you money down the road.