Since the early days of software development, developers and operations teams have had to find and solve problems. They have had to attend to defects (in development environments) and incidents (in production), investigate their root causes, and prepare a solution based on their findings. Application log files containing errors, events, and warnings have thus become a substantial part of any software system, and they provide much of the data that developers and ops teams need to resolve software issues.
During the information explosion decade, however, when almost every application began handling huge amounts of data and traffic, application logs became almost useless for proper investigation, as the sheer number of events obscured the relevant log entry. Various logging tools were developed to restore the ability to search for a relevant log line and analyze it. But while these tools provide an overview of an event-triggered log, they cannot present a longer-term record of application behavior. Tracing, on the other hand, can do just this, as it offers a more extensive and continuous view of the software.
The goal of tracing is different from that of logging. Tracing was introduced to make following a program’s flow, and the data moving through it, simpler and swifter. Different organizations may use tracing for different needs, but they all need a practical tool to help them implement it. OpenTracing is one such tool: it standardizes tracing and makes the process of instrumenting an application for distributed tracing generic. Since the implementation and integration are quite simple, teams should be able to set up OpenTracing and start generating value from it on day one. But to do this, organizations also need distributed tracers that are OpenTracing-compatible.
In this article, we will discuss two open-source tracing tools, Zipkin and Jaeger. We will compare their abilities as well as the benefits that developers and DevOps teams can gain from using them.
Zipkin vs. Jaeger
In an ideal world, every API that the application exposes would have tracing enabled. But the amount of resulting data can be too much to sort and manage. Understanding how long a specific operation took in a web server, load balancer, database, application code, or an entirely different system can become a tough task when hundreds of thousands, and sometimes millions, of requests are being processed, some of them in parallel. This is why different tracing tools have been introduced over time, allowing for straightforward deployment and enablement of tracing in almost every application.
The two most popular and practical tracing tools are Zipkin and Jaeger. Zipkin was developed by Twitter and is now maintained as an open-source project by its dedicated community. Jaeger was developed by Uber and was shared as an open-source project once it was production-ready. The overall architecture of the two tools is quite similar. Both implement a trace collector, and the instrumented services send their events/traces to this collector. The collector then records the data and the relations between the different spans of a trace. Both tools also provide a user interface for inspecting and following traces.
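To make "the relation between the different traces" concrete, here is a minimal sketch of how recorded parent/child links can be turned back into a call tree. The span fields (id, parentId, name, start_ms, duration_ms) and the sample data are illustrative assumptions, not the storage schema of either tool.

```python
from collections import defaultdict

def build_trace_tree(spans):
    """Group spans by parent ID and return (roots, children_by_parent)."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent_id = span.get("parentId")
        if parent_id is None:
            roots.append(span)          # no parent: this span started the trace
        else:
            children[parent_id].append(span)
    return roots, children

def render(span, children, depth=0):
    """Return indented lines describing a span and all of its descendants."""
    lines = ["  " * depth + f'{span["name"]} ({span["duration_ms"]} ms)']
    for child in sorted(children[span["id"]], key=lambda s: s["start_ms"]):
        lines.extend(render(child, children, depth + 1))
    return lines

# Hypothetical spans, in the arbitrary order a collector might receive them.
SPANS = [
    {"id": "a1", "parentId": None, "name": "GET /checkout", "start_ms": 0, "duration_ms": 120},
    {"id": "c3", "parentId": "a1", "name": "charge-card", "start_ms": 40, "duration_ms": 70},
    {"id": "b2", "parentId": "a1", "name": "load-cart", "start_ms": 5, "duration_ms": 30},
    {"id": "d4", "parentId": "c3", "name": "INSERT payments", "start_ms": 60, "duration_ms": 20},
]

roots, children = build_trace_tree(SPANS)
for root in roots:
    print("\n".join(render(root, children)))
```

The UI of either tool presents essentially this kind of view: one root operation with its nested child operations and their durations.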
Zipkin was one of the first tracing systems modeled on Google’s Dapper, and its architecture is quite simple. An operation in the traced application usually originates on the client side and starts with an outgoing HTTP request. Several headers are added to this request to carry unique IDs that can be followed across the application. The Reporter is the component that runs inside each host of the tracked application and is in charge of sending the data to Zipkin. Reporters send all traces to the Zipkin collectors, which persist the data to a database.
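The headers in question follow Zipkin’s B3 propagation format (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled). The sketch below shows roughly how the IDs relate between a first request and a downstream call; the helper names are hypothetical, and a real instrumentation library does this work for you.

```python
import secrets

def new_id():
    """Return a random 64-bit ID as 16 lowercase hex characters."""
    return secrets.token_hex(8)

def start_trace(sampled=True):
    """Create B3 headers for the first outgoing request of a new trace."""
    return {
        "X-B3-TraceId": new_id(),
        "X-B3-SpanId": new_id(),
        "X-B3-Sampled": "1" if sampled else "0",
    }

def child_headers(incoming):
    """Derive headers for a downstream call made while handling `incoming`."""
    return {
        "X-B3-TraceId": incoming["X-B3-TraceId"],        # same trace end to end
        "X-B3-ParentSpanId": incoming["X-B3-SpanId"],    # caller becomes the parent
        "X-B3-SpanId": new_id(),                         # fresh ID for this hop
        "X-B3-Sampled": incoming["X-B3-Sampled"],        # sampling decision is inherited
    }

root = start_trace()
downstream = child_headers(root)
print(root)
print(downstream)
```

The trace ID stays constant across every hop, while each hop gets its own span ID and points back at its caller. That is what lets the collector stitch the spans together later.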
You can use either Cassandra or Elasticsearch for a scalable storage backend, as Zipkin supports them both. This storage is queried by the API used by the Zipkin UI.
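The same query API can be called directly. As a small sketch, the function below builds a request URL for Zipkin’s v2 trace search endpoint (/api/v2/traces); the base URL, the service name, and the default values are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

def traces_query_url(service_name, limit=10, lookback_ms=3_600_000,
                     base_url="http://localhost:9411"):
    """Build a URL asking Zipkin for recent traces of one service.

    lookback_ms is how far back to search, in milliseconds.
    base_url is an assumption: a Zipkin server listening on its default port.
    """
    params = {"serviceName": service_name, "limit": limit, "lookback": lookback_ms}
    return f"{base_url}/api/v2/traces?{urlencode(params)}"

print(traces_query_url("demo-service"))
```

Fetching that URL returns the matching traces as JSON, which is the data the Zipkin UI renders.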
In addition, Zipkin has an open-source community, OpenZipkin, that continuously publishes new APIs, data formats, and libraries. This lets users swap their existing Zipkin backend for another one that processes the same type of data sent to a regular Zipkin server.
Zipkin is easy to deploy and start. It is published as a Docker image and can be spun up with a single Docker command:
docker run -d -p 9411:9411 openzipkin/zipkin
Or you can run the source project at https://github.com/openzipkin/zipkin using Java.
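Once the server is up, you can check it by posting a span by hand. The following is a minimal sketch that assumes the container started with the Docker command above is reachable on localhost:9411 and uses Zipkin’s v2 JSON span format; the IDs, span name, and service name are made up for the example.

```python
import json
import urllib.request

ZIPKIN_URL = "http://localhost:9411/api/v2/spans"  # assumption: local container

def make_span(trace_id, span_id, name, service, start_s, duration_s):
    """Build one span in Zipkin's v2 JSON format (times are in microseconds)."""
    return {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "timestamp": round(start_s * 1_000_000),
        "duration": round(duration_s * 1_000_000),
        "localEndpoint": {"serviceName": service},
    }

def report(spans, url=ZIPKIN_URL):
    """POST a list of spans to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

To try it, build a span with something like make_span("463ac35c9f6413ad", "a2fb4a1d1a96d312", "get /hello", "demo-service", time.time(), 0.025) (after importing time), pass it to report([span]), and then look for demo-service in the Zipkin UI.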
Jaeger is similar to Zipkin but has a different implementation. Hosted by the Cloud Native Computing Foundation (CNCF), where it has since graduated from incubation, Jaeger implements the full OpenTracing specification, and its preferred deployment method is Kubernetes. Jaeger is built from components that resemble those of other tracing systems: collectors, a query service, and a UI. Jaeger also deploys an agent on every host that locally aggregates the data and sends it to the collector. From there, the data passes through the same flow: it is stored in Elasticsearch, Cassandra, or even ScyllaDB, and is then queried and delivered to the UI.
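The idea of a per-host agent that aggregates locally before forwarding can be sketched as a simple batching buffer. This is a toy illustration only: the class name, the batch size, and the flush rule are assumptions, and it is not how the Jaeger agent is actually implemented.

```python
class BatchingAgent:
    """Toy stand-in for a per-host agent: buffer spans, forward them in batches."""

    def __init__(self, send_batch, max_batch_size=3):
        self._send_batch = send_batch        # callable that delivers a list of spans
        self._max_batch_size = max_batch_size
        self._buffer = []

    def submit(self, span):
        """Accept one span from a local client; flush when the buffer is full."""
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self):
        """Forward whatever is buffered, if anything, as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)

received = []  # stands in for the collector
agent = BatchingAgent(received.append, max_batch_size=3)
for i in range(7):
    agent.submit({"id": i})
agent.flush()  # push the final partial batch
print([len(batch) for batch in received])
```

The point of the design is that application code hands spans to something local and cheap, while the network traffic to the collector happens in fewer, larger requests.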
Jaeger can also buffer data by placing Kafka between the collector and the database. This does, however, require a couple of additional tools for indexing and streaming (components that read the data from the Kafka topics and write it to the storage).
Compared with Zipkin, Jaeger has a shorter list of supported languages: C#, Java, Node.js, Python, and Go. But it does come with multiple repositories that offer off-the-shelf instrumentation for many development frameworks, such as JAX-RS and Dropwizard (Java), Flask and Django (Python), and the Go standard library.
Jaeger deployment is more complicated than running a single Docker command, as it is mainly built to run inside Kubernetes. The storage component needs to be set up on its own (it can be a simple Docker container running Cassandra), and then Jaeger needs to be set up inside the Kubernetes cluster as a DaemonSet.
Here’s a list of kubectl commands to deploy Jaeger:
Creating a dedicated namespace for Jaeger:
kubectl create namespace observability
Creating the Custom Resource Definition for Jaeger, the service account, the role, and the role binding:
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing_v1_jaeger_crd.yaml
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
And finally, creating the Jaeger operator:
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
Alternatively, Jaeger can also run inside OpenShift.
At this point, the jaeger-operator deployment should be available in the Kubernetes cluster.
And once the backend is set up, the UI should be deployed. Jaeger’s UI can also be completely customized according to the user’s needs. After all configurations and customizations are completed, the Jaeger image can be deployed.
Running the following kubectl command will deploy Jaeger’s official “all-in-one” image (containing the agent, collector, query, ingester, and Jaeger UI) in one pod, which uses in-memory storage:
kubectl apply -f jaeger.yaml
One interesting aspect of Jaeger is that, by default, it samples only 0.1% of the traces that pass through each client (1 in 1,000). This percentage is enforced through probabilistic sampling, in which the decision to keep a trace is made at random. Jaeger can also use adaptive sampling, which adds context about the traffic to help decide which traces should be collected.
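The probabilistic part can be sketched in a few lines. This is a simplified illustration of random sampling at a fixed rate, not Jaeger’s actual sampler code; the class name and the seeded random generator are assumptions made so the example is repeatable.

```python
import random

class ProbabilisticSampler:
    """Keep each new trace with a fixed probability, independently of the others."""

    def __init__(self, rate, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self._rng = rng or random.Random()

    def is_sampled(self):
        """Return True if the next trace should be recorded."""
        return self._rng.random() < self.rate

# 0.001 corresponds to the 0.1% rate mentioned above.
sampler = ProbabilisticSampler(0.001, rng=random.Random(42))
total = 100_000
kept = sum(sampler.is_sampled() for _ in range(total))
print(f"kept {kept} of {total} traces")
```

Over 100,000 requests, roughly 100 traces would be kept on average. That is usually enough to spot latency patterns while keeping overhead and storage low, but it also means a rare failing request may well not be captured.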
Conclusion – Zipkin vs. Jaeger
Software today is increasingly complicated, and containers only add to that complexity. Modern IT environments are usually based on Docker, where new code is continuously being pushed to production. And in such an environment, the entire infrastructure stack must be fully visible.
Zipkin and Jaeger are both powerful tools for tracing requests and creating such visibility. The choice between them depends on what makes the most sense for developers and DevOps teams given the supported programming languages, libraries, and frameworks. Zipkin has made a name for itself as a leader in providing such support, but Jaeger is probably the safer choice because it works with all the OpenTracing instrumentation libraries. At the end of the day, it really comes down to the application’s technology stack and how much of it is already instrumented. You may also choose to combine elements of both tools: since Jaeger is compatible with Zipkin’s API, you can use Zipkin’s instrumentation libraries with Jaeger’s collector.
Deployment, however, is another aspect that should be taken into consideration. If you are using Kubernetes, Jaeger should be a natural fit. But Zipkin is the less-complicated solution if you are not currently using a container infrastructure.
If the above parameters are less relevant to your organization, the best way to make a decision is to perform a Proof of Concept with both tools. Then, based on your experience with each, you can make an educated decision as to which tool is best to implement tracing in your application.