OpenTelemetry is an observability framework hosted by the CNCF. It was formed from the merger of the OpenCensus and OpenTracing projects, which means you can more easily avoid vendor lock-in: it decouples your application instrumentation from data export. With OpenTelemetry, you can achieve full application-level observability through its SDKs, agents, libraries, and standards.
For more information about OpenTelemetry, please refer to our posts "Introduction to OpenTelemetry" and "OpenTelemetry Best Practices." In this post, we'll guide you through setting it up, implementing traces with OpenTelemetry, and using Grafana, Prometheus, and Jaeger; we'll then introduce you to AWS Distro for OpenTelemetry.
How to Set Up OpenTelemetry
To get started with OpenTelemetry, you need to meet the following prerequisites:
- Have a Kubernetes cluster up and running; if you don't have one, you can set it up using kubeadm. A sample setup has one master node (control plane) and two worker nodes. To list all the nodes in your Kubernetes cluster, run the command below:
kubectl get nodes
NAME                          STATUS   ROLES    AGE   VERSION
istio-cluster-control-plane   Ready    master   16m   v1.19.1
istio-cluster-worker          Ready    <none>   16m   v1.19.1
istio-cluster-worker2         Ready    <none>   16m   v1.19.1
- Istio should also be up and running; if not, follow the official Istio installation instructions. Once Istio is set up, use the following command to check the status of all its resources:
kubectl get all -n istio-system
NAME                                        READY   STATUS    RESTARTS   AGE
pod/istio-egressgateway-c9c55457b-zzf55     1/1     Running   0          15m
pod/istio-ingressgateway-865d46c7f5-ddpnk   1/1     Running   0          15m
pod/istiod-7f785478df-2c6rx                 1/1     Running   0          16m

NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                                                                      AGE
service/istio-egressgateway    ClusterIP      10.96.178.69   <none>         80/TCP,443/TCP,15443/TCP                                                     15m
service/istio-ingressgateway   LoadBalancer   10.96.60.62    172.19.0.200   15021:32028/TCP,80:31341/TCP,443:31306/TCP,31400:30297/TCP,15443:32577/TCP   15m
service/istiod                 ClusterIP      10.96.9.127    <none>         15010/TCP,15012/TCP,443/TCP,15014/TCP                                        16m

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/istio-egressgateway    1/1     1            1           15m
deployment.apps/istio-ingressgateway   1/1     1            1           15m
deployment.apps/istiod                 1/1     1            1           16m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/istio-egressgateway-c9c55457b     1         1         1       15m
replicaset.apps/istio-ingressgateway-865d46c7f5   1         1         1       15m
replicaset.apps/istiod-7f785478df                 1         1         1       16m
- The sample Bookinfo application should already be deployed. To verify the status of the application, please run the command below; it will list all the resources deployed for the Bookinfo application:
kubectl get all
NAME                                  READY   STATUS    RESTARTS   AGE
pod/details-v1-79f774bdb9-7zzwm       2/2     Running   0          14m
pod/productpage-v1-6b746f74dc-z7z8m   2/2     Running   0          14m
pod/ratings-v1-b6994bb9-bt9tb         2/2     Running   0          14m
pod/reviews-v1-545db77b95-kbjbg       2/2     Running   0          14m
pod/reviews-v2-7bf8c9648f-ddw5d       2/2     Running   0          14m
pod/reviews-v3-84779c7bbc-27vz6       2/2     Running   0          14m

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/details       ClusterIP   10.96.48.2      <none>        9080/TCP   14m
service/kubernetes    ClusterIP   10.96.0.1       <none>        443/TCP    23m
service/productpage   ClusterIP   10.96.62.75     <none>        9080/TCP   14m
service/ratings       ClusterIP   10.96.195.114   <none>        9080/TCP   14m
service/reviews       ClusterIP   10.96.4.60      <none>        9080/TCP   14m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/details-v1       1/1     1            1           14m
deployment.apps/productpage-v1   1/1     1            1           14m
deployment.apps/ratings-v1       1/1     1            1           14m
deployment.apps/reviews-v1       1/1     1            1           14m
deployment.apps/reviews-v2      1/1     1            1           14m
deployment.apps/reviews-v3      1/1     1            1           14m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/details-v1-79f774bdb9       1         1         1       14m
replicaset.apps/productpage-v1-6b746f74dc   1         1         1       14m
replicaset.apps/ratings-v1-b6994bb9         1         1         1       14m
replicaset.apps/reviews-v1-545db77b95       1         1         1       14m
replicaset.apps/reviews-v2-7bf8c9648f       1         1         1       14m
replicaset.apps/reviews-v3-84779c7bbc       1         1         1       14m
Now, verify the application from the browser by going to http://<ingress gateway external ip>/productpage.
Your setup is now ready. In the next section, we'll explore how to send metrics and traces from this cluster to Grafana and Jaeger.
Sending Traces with OpenTelemetry
In the last section, you got Istio up and running. Istio can also integrate with a number of other telemetry applications to provide additional functionality.
Prometheus, Grafana, and Jaeger are three such applications. Let’s explore each of these, one by one:
Prometheus is an open-source monitoring system that provides a time-series database for metrics. Using Prometheus, you can record metrics, track the health of your application within a service mesh, then use Grafana to visualize those metrics.
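To make this concrete, the metrics Prometheus scrapes arrive in its simple text exposition format: a metric name, optional labels, and a sample value per line. Here's a minimal sketch of parsing that format in Python; the sample metric lines are made up for illustration, not real Istio output:

```python
import re

# Illustrative sample in Prometheus' text exposition format (not real Istio output).
SAMPLE = """\
istio_requests_total{destination_service="productpage",response_code="200"} 42
istio_requests_total{destination_service="reviews",response_code="500"} 3
"""

# metric_name{label="value",...} sample_value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

def parse_metrics(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group("labels"):
            for pair in m.group("labels").split(","):
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

samples = parse_metrics(SAMPLE)
print(samples[0])
```

This is only a toy parser (it ignores escaping and timestamps), but it shows the shape of the data Prometheus stores in its time-series database.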
Istio provides a sample add-on to deploy Prometheus, so proceed to the directory where you have Istio downloaded:

cd istio-1.9.0
Deploy Prometheus by using the following command; the output follows:
kubectl apply -f samples/addons/prometheus.yaml
serviceaccount/prometheus created
configmap/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
deployment.apps/prometheus created
Now, verify your Prometheus setup:
kubectl get all -n istio-system -l app=prometheus
NAME                              READY   STATUS    RESTARTS   AGE
pod/prometheus-7bfddb8dbf-xqvdk   2/2     Running   0          2m28s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/prometheus   ClusterIP   10.96.176.67   <none>        9090/TCP   2m28s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus   1/1     1            1           2m28s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-7bfddb8dbf   1         1         1       2m28s

Next, access the Prometheus dashboard:

istioctl dashboard prometheus &
1034524
http://localhost:9090
Now that you have Prometheus installed, you can visualize the metrics it collects by installing Grafana.
Grafana is an open-source monitoring solution that, when integrated with a time-series database like Prometheus, creates a custom dashboard and gives meaningful insights into your metrics. Using Grafana, you can monitor the health of your application with a service mesh.
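Most panels on Istio's Grafana dashboards graph per-second rates derived from monotonically increasing counters, in the spirit of PromQL's rate() function. As a rough illustration (not Grafana's or Prometheus' actual implementation), here's how such a rate falls out of two counter samples:

```python
def per_second_rate(sample_a, sample_b):
    """Approximate a per-second rate from two (timestamp_seconds, counter_value)
    samples of a monotonically increasing counter, similar in spirit to
    PromQL's rate()."""
    (t0, v0), (t1, v1) = sample_a, sample_b
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)

# A counter that grew from 1000 to 1600 over a 60-second window: 10 req/s.
print(per_second_rate((0, 1000), (60, 1600)))  # 10.0
```

Real rate() also compensates for counter resets and extrapolates over the query window, but the core idea is this delta-over-time calculation.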
Similar to Prometheus, Istio provides a sample add-on you can use to deploy Grafana. Simply go to the directory where you have Istio downloaded:

cd istio-1.9.0
Deploy Grafana via the following command; again, this is followed by the output code:
kubectl apply -f samples/addons/grafana.yaml
serviceaccount/grafana created
configmap/grafana created
service/grafana created
deployment.apps/grafana created
configmap/istio-grafana-dashboards created
configmap/istio-services-grafana-dashboards created
Go ahead and verify your Grafana setup:
kubectl get all -n istio-system -l app.kubernetes.io/instance=grafana
NAME                           READY   STATUS    RESTARTS   AGE
pod/grafana-784c89f4cf-mxssg   1/1     Running   0          2m36s

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/grafana   ClusterIP   10.96.206.141   <none>        3000/TCP   2m36s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana   1/1     1            1           2m36s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-784c89f4cf   1         1         1       2m36s
And access the Grafana dashboard:
istioctl dashboard grafana &
http://localhost:3000
Once you have Grafana up and running, you will see that it comes bundled with several pre-configured Istio dashboards.
If you click on "Istio Service Dashboard," you will see a number of metrics. As there is no activity on this server yet, all the metrics show either 0 or N/A.
Let's generate some load on your cluster by running the following script, which accesses the sample application's product page every second in an infinite loop:

while :; do curl -s -o /dev/null 172.19.0.200/productpage; sleep 1; done
If you go back to your Grafana dashboard, you’ll start seeing the loads you’ve generated and different metrics.
With Grafana up and running, let’s move on to tracing using Jaeger.
Jaeger is a distributed tracing system that is open source and uses the OpenTracing specification. It allows users to troubleshoot and monitor transactions in complex distributed systems.
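Distributed tracing works by propagating a trace context between services on every request; Jaeger's native propagation format, for example, carries it in an uber-trace-id header of the form {trace-id}:{span-id}:{parent-span-id}:{flags}. A minimal sketch of unpacking such a header (the ID values below are made up):

```python
def parse_uber_trace_id(header_value):
    """Split a Jaeger-style 'uber-trace-id' header into its four fields:
    {trace-id}:{span-id}:{parent-span-id}:{flags}."""
    parts = header_value.split(":")
    if len(parts) != 4:
        raise ValueError("expected 4 colon-separated fields")
    trace_id, span_id, parent_span_id, flags = parts
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_span_id": parent_span_id,
        # The low bit of the flags field signals whether the trace is sampled.
        "sampled": int(flags, 16) & 1 == 1,
    }

ctx = parse_uber_trace_id("4bf92f3577b34da6a3ce929d0e0e4736:00f067aa0ba902b7:0:1")
print(ctx["sampled"])  # True
```

Every service that receives this header starts its spans under the same trace ID, which is what lets Jaeger stitch a request's path through the mesh into a single trace.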
Istio also provides a sample add-on to deploy Jaeger, just like with Prometheus and Grafana.
So, go to the directory where you have Istio downloaded:

cd istio-1.9.0
And deploy Jaeger by using the following command; output follows:
kubectl apply -f samples/addons/jaeger.yaml
deployment.apps/jaeger created
service/tracing created
service/zipkin created
service/jaeger-collector created
Run the following command to verify your Jaeger setup:
kubectl get all -n istio-system -l app=jaeger
NAME                          READY   STATUS    RESTARTS   AGE
pod/jaeger-7f78b6fb65-4n6dd   1/1     Running   0          2m10s

NAME                       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)               AGE
service/jaeger-collector   ClusterIP   10.96.255.86  <none>        14268/TCP,14250/TCP   2m9s
service/tracing            ClusterIP   10.96.30.136  <none>        80/TCP                2m10s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger   1/1     1            1           2m10s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-7f78b6fb65   1         1         1       2m10s
Now, access the Jaeger dashboard:
istioctl dashboard jaeger &
1096336
http://localhost:16686
Jaeger is now running in the background and collecting data. Select "productpage.default" from the service drop-down menu, then click "Find Traces" below it.
The top visualization shows you the average response time of an end-to-end response for different periods.
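That average is simply the arithmetic mean of the end-to-end trace durations within each time bucket. A back-of-the-envelope sketch, with made-up durations in milliseconds:

```python
def average_ms(durations_ms):
    """Mean end-to-end trace duration for one time bucket, roughly the
    quantity plotted per period in Jaeger's duration view."""
    if not durations_ms:
        raise ValueError("no traces in this period")
    return sum(durations_ms) / len(durations_ms)

# Three traces taking 12.5 ms, 30 ms, and 17.5 ms end to end.
print(average_ms([12.5, 30.0, 17.5]))  # 20.0
```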
Now that you understand how to send metrics to Grafana and Jaeger, let’s shift gears and look at AWS Distro for OpenTelemetry (ADOT).
AWS and OpenTelemetry
With AWS Distro for OpenTelemetry, you instrument your application only once and can then send correlated metrics and traces to multiple monitoring solutions, such as CloudWatch, X-Ray, Elasticsearch, and partner solutions. With the help of auto-instrumentation agents, you can collect traces without changing your code. The distro also gathers metadata from your AWS resources, which helps correlate application performance data with the underlying infrastructure data, so you can resolve problems faster.
Currently, AWS Distro for OpenTelemetry supports instrumenting your application on-premises as well as on the following AWS services: Amazon Elastic Kubernetes Service (EKS) on EC2, AWS Fargate, and Amazon Elastic Compute Cloud (EC2).
AWS Distro for OpenTelemetry consists of the following components:
- The OpenTelemetry SDK allows for the collection of metadata for AWS-specific resources, such as Task and Pod ID, Container ID, and Lambda function version. It can also correlate trace and metrics data from both CloudWatch and AWS X-Ray.
- The OpenTelemetry Collector is responsible for sending data to AWS services like AWS CloudWatch, Amazon Managed Service for Prometheus, and AWS X-Ray.
AWS also supports an OpenTelemetry Java auto-instrumentation agent for tracing data from AWS SDKs and X-Ray. For all these components, AWS also contributes back to the upstream project.
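The metadata these components attach travels as resource attributes on every span and metric. As a rough sketch, using OpenTelemetry semantic-convention-style keys (the helper function and all values below are made up for illustration, not ADOT's actual code), a Lambda function's resource might be assembled like this:

```python
def lambda_resource_attributes(function_name, function_version, region, account_id):
    """Build an OpenTelemetry-style resource-attribute dict describing a
    Lambda function. Keys follow OTel semantic-convention naming; the ARN
    is assembled here purely for illustration."""
    return {
        "cloud.provider": "aws",
        "cloud.region": region,
        "faas.name": function_name,
        "faas.version": function_version,
        # Hypothetical ARN built from the caller-supplied parts:
        "faas.id": f"arn:aws:lambda:{region}:{account_id}:function:{function_name}",
    }

attrs = lambda_resource_attributes("my-fn", "1", "us-west-2", "123456789012")
print(attrs["faas.name"])
```

Backends like X-Ray and CloudWatch can then group telemetry by these attributes, which is what makes the infrastructure correlation described above possible.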
Serverless and OpenTelemetry
AWS Distro for OpenTelemetry currently supports Lambda auto-instrumentation only for Python, built on Lambda Extensions. First, you build a Lambda layer containing the OpenTelemetry SDK and Collector, which you then add to your Lambda function. Once this is done, AWS takes care of auto-instrumentation, initializing the instrumentation of dependencies, HTTP clients, and AWS SDKs. It also captures resource-specific information, such as the Lambda function name, Amazon Resource Name (ARN), version, and request ID.
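Auto-instrumentation means the function code itself stays untouched. A handler like the following hypothetical one contains no tracing imports or spans; the layer's wrapper instruments it at startup:

```python
import json

# A plain Python Lambda handler with no tracing code of its own.
# With the ADOT layer attached, the wrapper instruments it automatically.
def lambda_handler(event, context):
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

print(lambda_handler({"name": "OpenTelemetry"}, None))
```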
There are a couple of installations required before building the Lambda layer:
- AWS SAM CLI: Install it per the official documentation for your platform.
- AWS CLI: Install it per the official documentation for your platform; this is needed to configure AWS credentials and requires administrator access.
Note: Currently, the Lambda layer only supports the Python 3.8 Lambda runtime.
Building the Lambda Layer
Once you meet all the prerequisites, the next step is to build the Lambda layer. This layer contains the AWS Distro for OpenTelemetry Collector (ADOT Collector), which runs as a Lambda extension; your Python function will also use this layer.
For this example, you’ll use the aws-otel-lambda repository.
First, clone the repo:
git clone https://github.com/aws-observability/aws-otel-lambda.git
Then go to the sample-apps directory and publish the layer by running the command below:
./run.sh
running...
Invoked with: sam
building...
SAM CLI now collects telemetry to better understand customer needs.
You can OPT OUT and disable telemetry collection by setting the environment variable SAM_CLI_TELEMETRY=0 in your shell.
Thanks for your help!
--------------------------Output Cut -------------------------------
Successfully created/updated stack - adot-py38-sample in us-west-2
ADOT Python3.8 Lambda layer ARN: arn:aws:lambda:us-west-2:XXXXXXX:layer:aws-distro-for-opentelemetry-python-38-preview:1
If you want to publish the layer in a different region, e.g., to us-east-2, run the run.sh command with the -r parameter:
./run.sh -r us-east-2
Auto-Instrumentation for Your Lambda Function
Once you publish the Lambda layer, you need to follow a series of steps to enable auto-instrumentation.
First, go to the Lambda console and select the function you want to instrument. Scroll down and click on “Add a layer.”
Select “Custom layers,” and from the drop-down, choose the layer you created earlier and Version 1. Click on “Add.”
Now, go back to your Lambda function and click on “Configuration,” then “Environment variables.” Select “Edit” and “Add environment variable.”
Add AWS_LAMBDA_EXEC_WRAPPER with value /opt/python/adot-instrument. This will enable auto-instrumentation. Click on “Save.”
Also, make sure that “Active tracing” is enabled under “Monitoring and operations tools.”
By default, AWS Distro for OpenTelemetry exports telemetry data to AWS X-Ray and CloudWatch. For the latter, go to the CloudWatch console and click on “Traces.”
To retrieve information about specific traces, click any of the Lambda functions and then trace ID.
And to drill down even further, go to the X-Ray console and click on “Analytics.”
OpenTelemetry is still an evolving project, and with the launch of products like AWS Distro for OpenTelemetry, fully backed by AWS, it's heading toward stability. Currently, AWS Distro for OpenTelemetry only supports Python for Lambda, but support for other languages (Node.js, Java, Go, .NET) is coming. Also, for now you need to create your Lambda layer manually, but in the future, AWS will automate and manage this process.
Epsagon is tightly integrated with AWS and provides full visibility into how your serverless application is performing. Onboarding your new or existing application is straightforward and doesn’t require any complex configuration. It also provides a visualization dashboard that helps detect bottlenecks and overall system health, predicts the overall cost, and offers other helpful insights based on collected data and metrics. Another advantage is that Epsagon correlates all the aggregated data, which is vital in distributed architectures using AWS Lambda and other serverless services. Plus, Epsagon includes auto-instrumentation for languages like Python, Go, Java, Ruby, Node.js, PHP, and .NET, reducing the time it takes to instrument tracing.