In the past few years, several organizations made the jump from monoliths to microservices architecture. This architectural pattern breaks down a large complex application into a collection of smaller loosely coupled services that are easy to maintain, scale, and deploy independently.
However, the downside of modern microservices architecture is the inherent complexity of service-to-service discovery and communications. This complexity brings additional challenges to quickly diagnosing errors and performance issues that may impact your system and increase MTTR—the mean time to resolve, repair, or recover.
When using MTTR as a metric to measure a system’s key performance indicators (KPIs), each R brings its own nuance and dimension to the discussion table. To have low MTTR values, reliability and scalability are critical factors that always need to be taken into account while designing a solution. However, without proper tooling, even a well-designed system might fall short and have prolonged failures with devastating business implications. With systems becoming more and more advanced, it is crucial to have the proper means to swiftly troubleshoot errors and bottlenecks in a complex distributed system, and thus, reduce the different dimensions of MTTR.
Distributed tracing helps you do this and should thus be included in your toolbox. It helps you understand the flow of requests through a microservices environment, allowing you to quickly diagnose where and why failures or performance issues are occurring. The great news is that if you are already using a combination of a service mesh with a network proxy solution—say, Istio and Envoy—to address challenges such as service-to-service discovery and communication, then you might already have all the right ingredients in place to achieve scalable distributed tracing capabilities.
Getting to Know Istio and Envoy
Istio, a Service Mesh Solution
Istio is an open-source project developed by teams from Google, Lyft, and IBM. Although it is often associated only with Kubernetes, Istio is a service mesh solution that supports multiple platforms and cloud integrations.
As a service mesh, Istio manages communications between microservices and applications, providing capabilities such as traffic management, security, and observability to distributed software systems. Packed with powerful tracing, logging, and monitoring features, Istio gives you deep insight into your system deployment as well as visibility into the performance of all your microservices—including how each of them affects other processes and resources within the system. These features will let you effectively set, monitor, and enforce SLOs (service-level objectives), enabling you to quickly and efficiently detect and fix issues.
Envoy, a Network Proxy Solution
Originally developed by Lyft to migrate from their single monolithic service to a microservices architecture, Envoy is a high-performance distributed proxy solution. Seeing that many other organizations were facing similar problems when making this switch, Lyft open-sourced Envoy, and the project was later incubated by the Cloud Native Computing Foundation. Since then, it has been a fast-growing project, adopted by many organizations, and fully ready for production environments.
Distributed Tracing with Istio and Envoy
Using an extended version of the Envoy proxy, Istio can deploy Envoy proxies as sidecars to all microservices and thus manage all traffic between them. Naturally, this enhances Istio’s capabilities, plus it lets you gain a whole range of new features without disrupting your existing application stack.
Thanks to the Envoy proxies, Istio provides out-of-the-box support for distributed tracing integration. The proxies are deployed side by side with your own microservices and intercept all incoming and outgoing requests of the service running in the same pod. From this traffic, they automatically generate trace spans that give you deep, visual insight into the interactions across your different services and resources.
However, you should keep in mind that Istio requires the application code to be changed if you wish to propagate the trace context (i.e., HTTP headers) between incoming and outgoing requests. This additional step would enable various trace spans to be correlated and provide a complete view of the traffic flow.
To get started quickly, you can have traces generated automatically by the Envoy proxies without modifying the application code. This gives you basic tracing information out of the box and visibility over your systems with little effort, although the spans of a single request may not be correlated into one complete trace. Changing the code to propagate the trace headers, or to send trace data manually, opens up far more customization and options. Just remember that the change has to be repeated in every service that forwards a request to a downstream service.
Istio supports various popular tracing backends, such as Zipkin, Jaeger, and Epsagon. Client libraries are available for these backends, which makes custom header propagation almost effortless.
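To make the header propagation step concrete, here is a minimal sketch that uses only the Python standard library instead of a tracing client library. It assumes the B3-style header names that Istio's tracing documentation lists for Envoy-based tracing. The function name extract_trace_headers and the sample values are hypothetical and serve only as illustration.

```python
# Headers Envoy relies on to correlate spans (B3/Zipkin style).
# Assumption: this list matches the tracing backend configured in your mesh.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def extract_trace_headers(incoming_headers):
    """Return only the trace headers found on the incoming request."""
    # HTTP header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    # Made-up incoming request headers, for illustration only.
    incoming = {
        "Host": "productpage:9080",
        "X-Request-Id": "hypothetical-request-id",
        "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
        "X-B3-SpanId": "e457b5a2e4d86bd1",
        "X-B3-Sampled": "1",
    }
    print(extract_trace_headers(incoming))
```

In a real service, you would pass the returned dictionary as the headers of every outgoing HTTP call made while handling that request, so that the downstream proxy can attach its spans to the same trace.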
Getting Started with Distributed Tracing
Step 1: Setting Up Your Environment
To install the latest version of Istio, you’ll need access to a Kubernetes cluster, version 1.16 or later. You can either install Istio on an existing Kubernetes platform or deploy a Kubernetes cluster on your local machine using Docker Desktop or Minikube. Just keep in mind that whatever platform you are using might require additional setup. Throughout the rest of this getting started guide, we’ll use a local Kubernetes cluster.
Step 2: Download Istio
First, download and extract the latest release of Istio on your local machine by running the following command:
curl -L https://istio.io/downloadIstio | sh -
Next, move to the Istio package directory, which contains the sample applications and the istioctl client binary. At the time of writing, the latest version is 1.7.3, so the command is cd istio-1.7.3. Once in the directory, add the istioctl client to your path via the command:
export PATH=$PWD/bin:$PATH
Now, run the command istioctl x precheck to see if everything is properly installed and configured before proceeding with the installation of Istio.
Step 3: Installing Istio in the Kubernetes Cluster
With all the requirements met, you can install Istio on your Kubernetes cluster by running the command:
istioctl install --set profile=demo
This will install Istio using the demo configuration profile, which comes with defaults suited to testing and evaluation rather than production.
Once complete, it’s time to instruct Istio to automatically inject Envoy sidecar proxies when deploying applications. You can do this using the command:
kubectl label namespace default istio-injection=enabled
Step 4: Deploying the Sample Application
Now that Istio is up and running, it’s time to deploy a sample application to demonstrate distributed tracing. In this case, you’ll be using the Bookinfo sample provided by Istio.
Deploy the application by running the command:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
Because sidecar injection is enabled for the namespace, an Envoy proxy will be deployed alongside the application container in each pod.
Next, run the command kubectl get pods and wait until all pods are running and ready, which can take a few minutes.
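If you would rather check readiness programmatically than read the table by eye, the following sketch inspects the JSON produced by kubectl get pods -o json. It assumes the injected sidecar container is named istio-proxy. The function name pods_not_ready and the sample pod data are hypothetical.

```python
def pods_not_ready(pod_list, sidecar_name="istio-proxy"):
    """Return names of pods that lack the sidecar or have unready containers."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        has_sidecar = any(s["name"] == sidecar_name for s in statuses)
        # A pod with no reported container statuses is still starting up.
        all_ready = bool(statuses) and all(s.get("ready") for s in statuses)
        if not (has_sidecar and all_ready):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Made-up excerpt of `kubectl get pods -o json`, for illustration only.
    sample = {
        "items": [
            {
                "metadata": {"name": "productpage-v1-hypothetical"},
                "status": {
                    "containerStatuses": [
                        {"name": "productpage", "ready": True},
                        {"name": "istio-proxy", "ready": False},
                    ]
                },
            }
        ]
    }
    waiting = pods_not_ready(sample)
    print("all pods ready" if not waiting else f"still waiting on: {waiting}")
```

To run this against a live cluster, you could save the output of kubectl get pods -o json to a file, load it with json.load, and pass the result to the function.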
Once the Bookinfo application is deployed, you need to make sure it’s accessible from the outside. In order to do this, run the command:
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
This creates an Istio Ingress Gateway to make your application accessible.
Step 5: Installing Jaeger
Istio provides a sample installation to quickly deploy Jaeger in your cluster. You can do this using the command:
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.7/samples/addons/jaeger.yaml
Just keep in mind that this Jaeger configuration is meant for demonstration use only; it isn’t tuned with the proper performance or security settings.
Step 6: Generating Traces
With Bookinfo up and running, you can now send requests to the service to generate trace information. Access http://localhost/productpage a few times; if your cluster exposes the ingress gateway on a different host or port, use that address instead.
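Instead of reloading the page by hand, you can script the requests. The sketch below uses only the Python standard library. The function names fetch_status and generate_traffic are hypothetical, and the URL you pass in is assumed to be wherever your ingress gateway serves /productpage.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Issue one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # An HTTP error status is still a completed request, so record it.
        return err.code


def generate_traffic(url, count=20, pause=0.2, fetch=fetch_status):
    """Send `count` requests to `url` and tally the status codes seen."""
    tally = {}
    for _ in range(count):
        status = fetch(url)
        tally[status] = tally.get(status, 0) + 1
        time.sleep(pause)
    return tally
```

Calling generate_traffic("http://localhost/productpage") sends 20 requests and returns a dictionary that maps each status code to the number of times it was seen. Connection failures are deliberately not caught, so an unreachable gateway surfaces immediately as an exception.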
To access Jaeger’s dashboard, run the command istioctl dashboard jaeger, which opens it automatically in your browser.
In Jaeger’s Dashboard:
- Select productpage.default from the Service drop-down menu.
- Click “Find Traces.”
- Click on the most recent trace for details on the latest request to /productpage.
Here, you will have a detailed view of the latest trace, where you can check when the root span started and how long each service took. In this case, the root span is istio-ingressgateway, and the following spans correspond to Bookinfo services that were invoked during the execution of the /productpage request.
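The same breakdown can be produced from a trace exported as JSON. The sketch below assumes the shape Jaeger uses for a single trace object: a spans list whose entries carry operationName, startTime, duration (both in microseconds), and processID, plus a processes map that holds each serviceName. The function name summarize_trace and all sample values are hypothetical.

```python
def summarize_trace(trace):
    """Return (service, operation, duration_ms) tuples ordered by start time."""
    processes = trace.get("processes", {})
    rows = []
    for span in sorted(trace.get("spans", []), key=lambda s: s["startTime"]):
        process = processes.get(span.get("processID"), {})
        service = process.get("serviceName", "unknown")
        # Assumption: durations are reported in microseconds.
        rows.append((service, span["operationName"], span["duration"] / 1000.0))
    return rows


if __name__ == "__main__":
    # Made-up trace with two spans, for illustration only.
    sample = {
        "spans": [
            {"operationName": "details:9080/*", "startTime": 1500,
             "duration": 4000, "processID": "p2"},
            {"operationName": "productpage:9080/productpage", "startTime": 1000,
             "duration": 25000, "processID": "p1"},
        ],
        "processes": {
            "p1": {"serviceName": "istio-ingressgateway"},
            "p2": {"serviceName": "details.default"},
        },
    }
    for service, operation, ms in summarize_trace(sample):
        print(f"{service:25s} {operation:35s} {ms:8.2f} ms")
```

Sorting by start time places the root span first, which mirrors the order of the timeline in the dashboard.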
Implementing distributed tracing capabilities is crucial for a modern cloud-native microservice architecture, as they allow you to pinpoint failures or performance issues. And with the right set of tools, the effort to get started is quite minimal.
The Istio and Envoy combination offers multiple options for distributed tracing integration; plus, it significantly reduces the time and cost to deploy, scale, and manage distributed tracing.
At the end of the day, you need to leverage the proper tools and services to give you full observability, including distributed tracing capabilities. Choosing the right managed tool gives you reliable technologies, products, and services without your engineering team having to divert their valuable time and effort away from core business interests.