OpenTelemetry is an observability framework hosted by the CNCF. Formed from the merger of the OpenCensus and OpenTracing projects, it decouples application instrumentation from data export, making it easier to avoid vendor lock-in. With its SDKs, agents, libraries, and standards, OpenTelemetry gives you full application-level observability.

For more information about OpenTelemetry, please refer to our posts “Introduction to OpenTelemetry” and “OpenTelemetry Best Practices.” In this post, we’ll guide you through setting it up, implementing traces with OpenTelemetry, and using Prometheus, Grafana, and Jaeger; we’ll then introduce you to AWS Distro for OpenTelemetry.

How to Set Up OpenTelemetry

To get started with OpenTelemetry, you need to meet the following prerequisites:

  • Have a Kubernetes cluster up and running, or use the following link to set up your Kubernetes cluster using kubeadm. A sample setup should have one master node (control plane) and two worker nodes. To list all the nodes in your Kubernetes cluster, run the command below:

kubectl get nodes

NAME                          STATUS   ROLES    AGE   VERSION
istio-cluster-control-plane   Ready    master   16m   v1.19.1
istio-cluster-worker          Ready    <none>   16m   v1.19.1
istio-cluster-worker2         Ready    <none>   16m   v1.19.1


  • Istio should also be up and running; if not, click here for instructions. Once the Istio cluster is set up, use the following command to check the status of all resources: 
kubectl get all -n istio-system

NAME                                        READY   STATUS    RESTARTS   AGE
pod/istio-egressgateway-c9c55457b-zzf55     1/1     Running   0          15m
pod/istio-ingressgateway-865d46c7f5-ddpnk   1/1     Running   0          15m
pod/istiod-7f785478df-2c6rx                 1/1     Running   0          16m

NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                                                                      AGE
service/istio-egressgateway    ClusterIP      10.96.178.69   <none>         80/TCP,443/TCP,15443/TCP                                                     15m
service/istio-ingressgateway   LoadBalancer   10.96.60.62    172.19.0.200   15021:32028/TCP,80:31341/TCP,443:31306/TCP,31400:30297/TCP,15443:32577/TCP   15m
service/istiod                 ClusterIP      10.96.9.127    <none>         15010/TCP,15012/TCP,443/TCP,15014/TCP                                        16m

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/istio-egressgateway    1/1     1            1           15m
deployment.apps/istio-ingressgateway   1/1     1            1           15m
deployment.apps/istiod                 1/1     1            1           16m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/istio-egressgateway-c9c55457b     1         1         1       15m
replicaset.apps/istio-ingressgateway-865d46c7f5   1         1         1       15m
replicaset.apps/istiod-7f785478df                 1         1         1       16m

 

  • The sample Bookinfo application should already be deployed. To verify the status of the application, please run the command below; it will list all the resources deployed for the Bookinfo application:
kubectl get all

NAME                                  READY   STATUS    RESTARTS   AGE
pod/details-v1-79f774bdb9-7zzwm       2/2     Running   0          14m
pod/productpage-v1-6b746f74dc-z7z8m   2/2     Running   0          14m
pod/ratings-v1-b6994bb9-bt9tb         2/2     Running   0          14m
pod/reviews-v1-545db77b95-kbjbg       2/2     Running   0          14m
pod/reviews-v2-7bf8c9648f-ddw5d       2/2     Running   0          14m
pod/reviews-v3-84779c7bbc-27vz6       2/2     Running   0          14m

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/details       ClusterIP   10.96.48.2      <none>        9080/TCP   14m
service/kubernetes    ClusterIP   10.96.0.1       <none>        443/TCP    23m
service/productpage   ClusterIP   10.96.62.75     <none>        9080/TCP   14m
service/ratings       ClusterIP   10.96.195.114   <none>        9080/TCP   14m
service/reviews       ClusterIP   10.96.4.60      <none>        9080/TCP   14m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/details-v1       1/1     1            1           14m
deployment.apps/productpage-v1   1/1     1            1           14m
deployment.apps/ratings-v1       1/1     1            1           14m
deployment.apps/reviews-v1       1/1     1            1           14m
deployment.apps/reviews-v2       1/1     1            1           14m
deployment.apps/reviews-v3       1/1     1            1           14m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/details-v1-79f774bdb9       1         1         1       14m
replicaset.apps/productpage-v1-6b746f74dc   1         1         1       14m
replicaset.apps/ratings-v1-b6994bb9         1         1         1       14m
replicaset.apps/reviews-v1-545db77b95       1         1         1       14m
replicaset.apps/reviews-v2-7bf8c9648f       1         1         1       14m
replicaset.apps/reviews-v3-84779c7bbc       1         1         1       14m

 

Now, verify the application from the browser by going to http://<ingress gateway external ip>/productpage. 

 

Figure 1:  Sample BookInfo application

 

Your setup is now ready. In the next section, we’ll explore how to send these traces to Grafana and Jaeger.

Sending Traces with OpenTelemetry

In the last section, you got Istio up and running, but Istio can also integrate with a number of other telemetry applications to provide additional functionality.

Prometheus, Grafana, and Jaeger are three such applications. Let’s explore each of these, one by one:

Prometheus

Prometheus is an open-source monitoring system that provides a time-series database for metrics. Using Prometheus, you can record metrics, track the health of your application within a service mesh, then use Grafana to visualize those metrics. 

Istio provides a sample add-on to deploy Prometheus, so proceed to the directory where you have Istio downloaded: cd istio-1.9.0

Deploy Prometheus by using the following command; the output follows:

kubectl apply -f samples/addons/prometheus.yaml

serviceaccount/prometheus created
configmap/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
deployment.apps/prometheus created

 

Now, verify your Prometheus setup:

kubectl get all -n istio-system -l app=prometheus

NAME                              READY   STATUS    RESTARTS   AGE
pod/prometheus-7bfddb8dbf-xqvdk   2/2     Running   0          2m28s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/prometheus   ClusterIP   10.96.176.67   <none>        9090/TCP   2m28s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus   1/1     1            1           2m28s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-7bfddb8dbf   1         1         1       2m28s

Next, access the Prometheus dashboard:

istioctl dashboard prometheus&

[1] 1034524
http://localhost:9090

Figure 2: Prometheus dashboard
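Beyond the dashboard UI, Prometheus can also be queried programmatically over its HTTP API. The sketch below, which assumes the port-forward from `istioctl dashboard prometheus` is still running on localhost:9090, builds an instant-query URL and sums Istio's standard `istio_requests_total` counter; `build_query_url` and `total_istio_requests` are illustrative helper names, not part of any SDK:

```python
import json
import urllib.parse
import urllib.request

# istioctl dashboard prometheus forwards the UI/API to this address.
PROM_URL = "http://localhost:9090"

def build_query_url(base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def total_istio_requests(base=PROM_URL):
    """Fetch the summed istio_requests_total counter across all series."""
    url = build_query_url(base, "sum(istio_requests_total)")
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    # Prometheus wraps answers as {"status": "success", "data": {"result": [...]}}.
    return body["data"]["result"]
```

Calling `total_istio_requests()` while the load script from later in this post is running should return a non-empty result list.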

Now that you have Prometheus installed, you can visualize the metrics it collects by installing Grafana. 

Grafana

Grafana is an open-source monitoring solution that, when integrated with a time-series database like Prometheus, lets you build custom dashboards and gain meaningful insights into your metrics. Using Grafana, you can monitor the health of your application within a service mesh.

Similar to Prometheus, Istio provides a sample add-on you can use to deploy Grafana. Simply go to the directory where you have Istio downloaded: cd istio-1.9.0.

Deploy Grafana via the following command; again, this is followed by the output code:

kubectl apply -f samples/addons/grafana.yaml

serviceaccount/grafana created
configmap/grafana created
service/grafana created
deployment.apps/grafana created
configmap/istio-grafana-dashboards created
configmap/istio-services-grafana-dashboards created

 

Go ahead and verify your Grafana setup:

kubectl get all -n istio-system -l app.kubernetes.io/instance=grafana

NAME                           READY   STATUS    RESTARTS   AGE
pod/grafana-784c89f4cf-mxssg   1/1     Running   0          2m36s

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/grafana   ClusterIP   10.96.206.141   <none>        3000/TCP   2m36s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana   1/1     1            1           2m36s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-784c89f4cf   1         1         1       2m36s

 

And access the Grafana dashboard:

istioctl dashboard grafana&

http://localhost:3000

 

Figure 3: Grafana dashboard

 

Once you have Grafana up and running, you will see that it comes bundled with some preconfigured Istio dashboards.

 

Figure 4: Preconfigured Grafana dashboard

 

If you click on “Istio Service Dashboard,” you will see a number of metrics. As there is no activity on this server, all the metrics show either a 0 or N/A.

 

Figure 5: Grafana service dashboard

 

Let’s generate some load on your cluster by running the one-liner below, which requests the sample application’s product page once per second in an infinite while loop:

while :; do curl -s -o /dev/null http://172.19.0.200/productpage; sleep 1; done

 

If you go back to your Grafana dashboard, you’ll start seeing the loads you’ve generated and different metrics.

 

Figure 6: Grafana service dashboard

 

With Grafana up and running, let’s move on to tracing using Jaeger.

Jaeger 

Jaeger is an open-source distributed tracing system based on the OpenTracing specification. It allows users to troubleshoot and monitor transactions in complex distributed systems.
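One practical note before deploying Jaeger: the Envoy sidecars generate spans automatically, but to stitch those spans into a single end-to-end trace, each service must forward the tracing headers it receives on any outbound calls it makes. The header list below matches what Istio's tracing documentation requires for B3 propagation; `extract_trace_headers` is an illustrative helper, not part of any SDK:

```python
# Headers Istio/Envoy use to correlate spans across services (B3 propagation).
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
]

def extract_trace_headers(incoming_headers):
    """Copy the tracing headers from an incoming request's headers so they
    can be attached to any outbound HTTP calls this service makes."""
    lowered = {k.lower(): v for k, v in incoming_headers.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
```

For example, `extract_trace_headers({"X-B3-TraceId": "abc", "Accept": "*/*"})` keeps only `{"x-b3-traceid": "abc"}`. The Bookinfo sample services already do this forwarding, which is why their traces appear joined up in Jaeger.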

Istio also provides a sample add-on to deploy Jaeger, just like with Prometheus and Grafana.

So, go to the directory where you have Istio downloaded: cd istio-1.9.0

And deploy Jaeger by using the following command; output follows:

kubectl apply -f samples/addons/jaeger.yaml

deployment.apps/jaeger created
service/tracing created
service/zipkin created
service/jaeger-collector created

 

Run the following command to verify your Jaeger setup:

 

kubectl get all -n istio-system -l app=jaeger

NAME                          READY   STATUS    RESTARTS   AGE
pod/jaeger-7f78b6fb65-4n6dd   1/1     Running   0          2m10s

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)               AGE
service/jaeger-collector   ClusterIP   10.96.255.86   <none>        14268/TCP,14250/TCP   2m9s
service/tracing            ClusterIP   10.96.30.136   <none>        80/TCP                2m10s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger   1/1     1            1           2m10s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-7f78b6fb65   1         1         1       2m10s

 

Now, access the Jaeger dashboard:

istioctl dashboard jaeger&

1096336
http://localhost:16686

 

 

Figure 7: Jaeger dashboard

 

Jaeger is now running in the background and collecting data, so select “productpage.default” and click on “Find Traces” at the bottom of the drop-down menu.

 

Figure 8: Jaeger dashboard with productpage

 

The top visualization shows you the average response time of an end-to-end response for different periods. 

 

Figure 9: Jaeger Dashboard visualization for average response time
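The same search can be scripted against Jaeger's HTTP query API: the `/api/traces` endpoint backs the UI's "Find Traces" button. A sketch, assuming the port-forward from `istioctl dashboard jaeger` is still running on localhost:16686 (`build_traces_url` and `find_traces` are illustrative helper names):

```python
import json
import urllib.parse
import urllib.request

# istioctl dashboard jaeger forwards the UI/API to this address.
JAEGER_URL = "http://localhost:16686"

def build_traces_url(base, service, limit=20):
    """Build a trace-search URL for Jaeger's HTTP query API."""
    params = urllib.parse.urlencode({"service": service, "limit": limit})
    return f"{base}/api/traces?{params}"

def find_traces(service="productpage.default", base=JAEGER_URL):
    """Return the raw trace documents Jaeger has stored for a service."""
    with urllib.request.urlopen(build_traces_url(base, service)) as resp:
        return json.load(resp)["data"]
```

With the load script from the Grafana section running, `find_traces()` should return a list of trace documents for the product page.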

 

Now that you understand how to send metrics to Grafana and Jaeger, let’s shift gears and look at AWS Distro for OpenTelemetry (ADOT).

AWS and OpenTelemetry

AWS now offers AWS Distro for OpenTelemetry (still in preview). AWS is one of the upstream contributors to the OpenTelemetry project and tests, secures, optimizes, and supports various components of the project, such as SDKs, agents, collectors, and auto-instrumentation. The initial release supports Python, Go, Java, and JavaScript; other languages will follow in upcoming releases. On top of that, you don’t pay to use AWS Distro for OpenTelemetry itself, only for the traces, logs, and metrics sent to AWS.

Using AWS Distro, you instrument your application only once and can then send correlated metrics and traces to multiple monitoring solutions, such as CloudWatch, X-Ray, Elasticsearch, and partner solutions. With the help of auto-instrumentation agents, you can collect traces without changing your code; the solution also gathers metadata from your AWS resources, which helps correlate application performance data with the underlying infrastructure data so you can resolve problems faster.

Currently, AWS Distro for OpenTelemetry supports instrumenting your application on-premises as well as on the following AWS services: Amazon Elastic Kubernetes Service (EKS) on EC2 and AWS Fargate, Amazon Elastic Container Service (ECS), and Elastic Compute Cloud (EC2).

Various Components 

AWS Distro for OpenTelemetry consists of the following components:

  • The OpenTelemetry SDK allows for the collection of metadata for AWS-specific resources, such as Task and Pod ID, Container ID, and Lambda function version. It can also correlate trace and metrics data from both CloudWatch and AWS X-Ray. 
  • The OpenTelemetry Collector is responsible for sending data to AWS services like AWS CloudWatch, Amazon Managed Service for Prometheus, and AWS X-Ray.

AWS also supports an OpenTelemetry Java auto-instrumentation agent for tracing data from AWS SDKs and X-Ray. For all these components, AWS also contributes back to the upstream project.
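To make the Collector's role concrete, here is an illustrative (not official) OpenTelemetry Collector pipeline configuration that receives OTLP data and fans traces out to X-Ray and metrics out to CloudWatch; the `awsxray` and `awsemf` exporters ship with the ADOT Collector, while the region value is a placeholder you'd replace with your own:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  awsxray:
    region: us-west-2   # placeholder region
  awsemf:
    region: us-west-2   # placeholder region

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [awsemf]
```

Adding another backend is then a configuration change (a new exporter in the pipeline) rather than a change to your application code.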

Serverless and OpenTelemetry 

For serverless, AWS Distro for OpenTelemetry currently supports only Python, built on Lambda Extensions. First, you need to build a Lambda layer containing the OpenTelemetry SDK and Collector, which you can then add to your Lambda function. Once this is done, AWS takes care of auto-instrumentation and initializes the instrumentation of dependencies, HTTP clients, and AWS SDKs. It also captures resource-specific information, such as the Lambda function name, Amazon Resource Name (ARN), version, and request ID.
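Because the layer and its wrapper script handle the instrumentation, the function code itself stays plain. A minimal sketch of a handler that could be auto-instrumented this way (the handler body is purely illustrative):

```python
import json

def lambda_handler(event, context):
    """A plain Lambda handler. With the ADOT layer attached, the wrapper
    script instruments it automatically, so no tracing code appears here."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Invoked locally, `lambda_handler({"name": "otel"}, None)` returns a 200 response whose body greets "otel"; in Lambda, the same call would additionally emit spans for the invocation and any instrumented dependencies.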

Requirements

There are a couple of installations required before building the Lambda layer:

  • AWS SAM CLI: Refer to the following doc to install per your given platform. 
  • AWS CLI: Refer to the following doc to install per your given platform; this is needed to configure AWS credentials and requires administrator access.

Note: Currently, the Lambda layer supports only the Python 3.8 Lambda runtime.

Building the Lambda Layer

Once you meet all the prerequisites, the next step is to build the Lambda layer. Here, you’ll have the AWS Distro for OpenTelemetry Collector (ADOT Collector), run as a Lambda extension; your Python function will also use this layer. 

For this example, you’ll use the aws-otel-lambda repository.

First, clone the repo:

git clone https://github.com/aws-observability/aws-otel-lambda.git

 

Then go to the sample app’s directory inside the cloned repo:

cd aws-otel-lambda/sample-apps/python-lambda

 

To publish the layer, run the command below:


./run.sh
running...
Invoked with: 
sam building...

  SAM CLI now collects telemetry to better understand customer needs.

  You can OPT OUT and disable telemetry collection by setting the
  environment variable SAM_CLI_TELEMETRY=0 in your shell.
  Thanks for your help!
--------------------------Output Cut -------------------------------
Successfully created/updated stack - adot-py38-sample in us-west-2

ADOT Python3.8 Lambda layer ARN:
arn:aws:lambda:us-west-2:XXXXXXX:layer:aws-distro-for-opentelemetry-python-38-preview:1

 

If you want to publish the layer in a different region, e.g., to us-east-2, run the run.sh command with the -r parameter:

./run.sh -r us-east-2

 

Auto-Instrumentation for Your Lambda Function

Once you publish the Lambda layer, you need to follow a series of steps to enable auto-instrumentation.

First, go to the Lambda console and select the function you want to instrument. Scroll down and click on “Add a layer.”

 

 

Figure 10: Lambda console for adding a layer

Select “Custom layers,” and from the drop-down, choose the layer you created earlier and Version 1. Click on “Add.”

 

Figure 11: Lambda console for adding a custom layer

 

Now, go back to your Lambda function and click on “Configuration,” then “Environment variables.” Select “Edit” and “Add environment variable.”

 

Figure 12: Lambda console for adding an environment variable

 

Add AWS_LAMBDA_EXEC_WRAPPER with value /opt/python/adot-instrument. This will enable auto-instrumentation. Click on “Save.”

 

Figure 13: Lambda console for adding environment variable AWS_LAMBDA_EXEC_WRAPPER

 

Also, make sure that “Active tracing” is enabled under “Monitoring and operations tools.”

 

Figure 14: Lambda console for enabling active tracing

 

By default, AWS Distro for OpenTelemetry exports telemetry data to AWS X-Ray and CloudWatch. For the latter, go to the CloudWatch console and click on “Traces.”

 

Figure 15: CloudWatch dashboard with traces

 

To retrieve information about a specific trace, click on any of the Lambda functions and then on a trace ID.

 

Figure 16: CloudWatch dashboard with specific trace

 

And to drill down even further, go to the X-Ray console and click on “Analytics.”

 

Figure 17: AWS X-Ray with specific trace

Wrapping Up

OpenTelemetry is still an evolving project, and with the launch of products like AWS Distro for OpenTelemetry, fully backed by AWS, it’s heading toward stability. Currently, AWS Distro for OpenTelemetry only supports Python for Lambda, but other languages (Node.js, Java, Go, .NET) will be coming soon. Also, in the current state you need to create your own Lambda layer manually, but in the future, AWS will automate and manage this process.

Epsagon is tightly integrated with AWS and provides full visibility into how your serverless application is performing. Onboarding your new or existing application is straightforward and doesn’t require any complex configuration. It also provides a visualization dashboard that helps detect bottlenecks and overall system health, predicts the overall cost, and offers other helpful insights based on collected data and metrics. Another advantage is that Epsagon correlates all the aggregated data, which is vital in distributed architectures using AWS Lambda and other serverless services. Plus, Epsagon includes auto-instrumentation for languages like Python, Go, Java, Ruby, Node.js, PHP, and .NET, reducing the time it takes to instrument tracing. 

Check out our demo environment, or try it for free for up to 10 million traces per month!

 
