Apache Kafka is an open-source distributed event-streaming platform that uses the publish-subscribe model to process large volumes of events with high reliability and fault tolerance. Kafka can also store and replay events as a given use case requires, and it can run as a single node or scale out to a cluster of nodes. It is widely used for real-time analysis of streaming data and in big data ecosystems such as Spark and Hadoop; it is also commonly used to provide durability for the in-memory state of microservices.

Kafka exposes all of its metrics via Java Management Extensions (JMX). Although various open-source plugins are available for fetching Kafka’s metrics, it’s important to understand how metric collection works in its simplest form. In this blog, the first of two parts, we’ll show you how to collect Kafka metrics using Jolokia and also cover metric collection and visualization with Prometheus, Telegraf, and Grafana.

An Introduction to Apache Kafka Architecture

Apache Kafka has three main components: Producer, Consumer, and Broker. We’ll define these and a few related terms below (a short command-line example follows the list) before turning to Apache ZooKeeper, a required component of every Kafka deployment:

  • Producer: Application that produces the event
  • Consumer: Application that consumes and processes the event
  • Broker: An instance of a Kafka cluster (just another name given to a Kafka server)
  • Kafka Cluster: A collection of more than one Kafka server
  • Topic: A unique user-defined name for a data stream where events are published
  • Partition: A division of a topic; a topic can be split into multiple partitions in a multi-Broker environment, with each partition residing on a single machine
  • Offset: A unique, sequential identifier assigned to each record within a partition
  • Consumer Group: A group of consumers used for faster processing of Kafka events, with each consumer in the group attached to a different partition
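
To make these terms concrete, here is a minimal sketch using the command-line scripts that ship with Kafka (the paths and the topic name test are illustrative and match the setup used later in this post):

# Create a topic named "test" with 3 partitions (Kafka 1.1.0-era syntax)
/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic test

# A Producer publishes an event to the topic
echo "hello kafka" | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# A Consumer in a Consumer Group reads events back; offsets are tracked per partition
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group demo-group --from-beginning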

Apache ZooKeeper 

Apache ZooKeeper is an open-source tool for the coordination and centralized configuration management of many distributed systems. The role of ZooKeeper in Kafka clusters is to:

  • Manage Broker instances
  • Elect a Controller to manage partitions
  • Manage Kafka clusters
  • Maintain topic configurations

Monitoring a Kafka Cluster and Its Components

Note: All commands need to be run as a sudo user.

Collecting Kafka Metrics Using the Jolokia Agent

Jolokia is a JMX-HTTP bridge that works via an agent-based approach and can be used to perform JMX-based operations. Jolokia is used internally by popular Kafka monitoring tools such as Metricbeat from Elastic, but we’ll be using the Jolokia JVM agent directly, attaching it to the process ID of the Kafka service. Here are the steps:

Step 1. Download the Jolokia JVM agent:

wget -O /opt/jolokia-jvm-1.6.2-agent.jar "https://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.6.2/jolokia-jvm-1.6.2-agent.jar"

Step 2. Get the Process ID of the Kafka Broker:

PID=$(ps aux | grep ${USER} | grep "kafka.Kafka" | grep -v grep | awk '{ print $2 }')
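
Since the next step attaches to this PID, a quick sanity check is worthwhile (the echo text is just illustrative):

# Verify that a Kafka Broker process was actually found
[ -n "$PID" ] && echo "Kafka Broker PID: $PID" || echo "No Kafka Broker process found"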

Step 3. Attach the Jolokia JVM agent to the Kafka Broker process to capture Broker metrics. This will start the Jolokia server on port 8778:

java -jar /opt/jolokia-jvm-1.6.2-agent.jar --host localhost start $PID
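
To confirm the agent attached successfully, you can query Jolokia’s standard version endpoint:

curl http://localhost:8778/jolokia/version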

Step 4. Run another Jolokia agent for Consumer metrics. This will start a second Jolokia server on port 8774:

KAFKA_OPTS="-javaagent:/opt/jolokia-jvm-1.6.2-agent.jar=port=8774,host=localhost" /opt/kafka/bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092

Step 5. Fetch a Kafka Broker metric using the Jolokia server:

curl http://localhost:8778/jolokia/read/kafka.controller:type=KafkaController,name=ActiveControllerCount

The response will look like the following:

{"request":{"mbean":"kafka.controller:name=ActiveControllerCount,type=KafkaController","type":"read"},"value":{"Value":1},"timestamp":1609125581,"status":200}

Step 6. Fetch a Kafka Consumer metric using the Jolokia server started for the Consumer (the URL is quoted so the shell doesn’t expand the asterisks):

curl 'http://localhost:8774/jolokia/read/kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=test,partition=*/records-lag'

The response will look like the following:

{"request":{"mbean":"kafka.consumer:client-id=*,partition=*,topic=test,type=consumer-fetch-manager-metrics","attribute":"records-lag","type":"read"},"value":{"kafka.consumer:client-id=consumer-1,partition=0,topic=test,type=consumer-fetch-manager-metrics":{"records-lag":0.0}},"timestamp":1609125503,"status":200}

Collecting Kafka Metrics Using Prometheus, Telegraf, and Grafana

Prometheus is a monitoring system with a time-series database, while Telegraf is a server agent for collecting metrics from systems. Grafana, meanwhile, connects with data sources to provide visualization for collected data. We’ll show you how to implement these tools in the steps below. 

Note: We assume that you have Prometheus, Telegraf, and Grafana already installed and running on your system.

Step 1: Download and extract Kafka: 

wget https://archive.apache.org/dist/kafka/1.1.0/kafka_2.11-1.1.0.tgz -P /tmp/ && tar -zxvf /tmp/kafka_2.11-1.1.0.tgz -C /opt/

Step 2: Make a symlink to the Kafka directory:

ln -s /opt/kafka_2.11-1.1.0 /opt/kafka

Step 3: Create a directory to configure the Prometheus JMX exporter:

mkdir /opt/kafka/prometheus/

Step 4: Install and configure the JMX exporter:

wget -P /opt/kafka/prometheus/ https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar

wget -P /opt/kafka/prometheus/ https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/kafka-0-8-2.yml

Step 5: Edit /opt/kafka/prometheus/kafka-0-8-2.yml and append the following snippet:

- pattern: kafka.producer<type=producer-metrics, client-id=(.+)><>(.+):\w*
  name: kafka_producer_$2
- pattern: kafka.consumer<type=consumer-metrics, client-id=(.+)><>(.+):\w*
  name: kafka_consumer_$2
- pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+)><>(.+):\w*
  name: kafka_consumer_$2

Step 6: Start ZooKeeper:

/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
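
ZooKeeper answers simple four-letter-word commands over its client port, which makes for a quick health check (requires netcat; newer ZooKeeper versions may require whitelisting these commands):

echo ruok | nc localhost 2181
# A healthy ZooKeeper replies: imok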

Step 7: Start the Broker:

KAFKA_HEAP_OPTS="-Xmx1000M -Xms1000M" KAFKA_OPTS="-javaagent:/opt/kafka/prometheus/jmx_prometheus_javaagent-0.3.0.jar=8787:/opt/kafka/prometheus/kafka-0-8-2.yml" /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
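
With the Broker running, the JMX exporter should now be serving Prometheus-formatted metrics on port 8787; a quick spot check:

curl -s http://localhost:8787/metrics | grep '^kafka' | head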

Step 8: Start the Consumer:

KAFKA_OPTS="-javaagent:/opt/kafka/prometheus/jmx_prometheus_javaagent-0.3.0.jar=8788:/opt/kafka/prometheus/kafka-0-8-2.yml" /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 0.0.0.0:9092 --topic test --from-beginning

Step 9: Start the Producer:

KAFKA_OPTS="-javaagent:/opt/kafka/prometheus/jmx_prometheus_javaagent-0.3.0.jar=8789:/opt/kafka/prometheus/kafka-0-8-2.yml" /opt/kafka/bin/kafka-console-producer.sh --Broker-list 0.0.0.0:9092 --topic test

Step 10: Edit /etc/prometheus/prometheus.yml and append the following jobs under the scrape_configs section:

  - job_name: 'kafka-server'
    static_configs:
      - targets: ['127.0.0.1:8787']

  - job_name: 'kafka-consumer'
    static_configs:
      - targets: ['127.0.0.1:8788']

  - job_name: 'kafka-producer'
    static_configs:
      - targets: ['127.0.0.1:8789']

  - job_name: 'telegraf'
    static_configs:
      - targets: ['127.0.0.1:9200']
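
After editing, it’s a good idea to validate the file with promtool, which ships with Prometheus, and then restart the service (the service name below assumes a systemd-based install):

promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus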

Step 11: Edit /etc/telegraf/telegraf.conf:

Inside the Telegraf config file, find the [[outputs.influxdb]] section and comment it out so that Telegraf does not try to send its output to InfluxDB. Then enable the Prometheus client output by uncommenting [[outputs.prometheus_client]] and adding the following lines beneath it:

listen = ":9200"

collectors_exclude = ["gocollector", "process"]

Then, restart the Telegraf service for the changes to take effect. Before restarting, please ensure that port 9200 is not already in use by another process.
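
A minimal sketch of that check and restart, assuming a systemd-based host:

# Confirm nothing else is listening on port 9200, then restart Telegraf
sudo ss -lntp | grep ':9200' || echo "port 9200 is free"
sudo systemctl restart telegraf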

Accessing the Prometheus Web Interface

Prometheus’ web interface is available by default on port 9090 (http://localhost:9090/graph) and can be used to explore the data stored inside Prometheus, as seen below:

 

Image 1: Kafka metrics on Prometheus

 

Once the data is available, you can either use Prometheus to explore the data using graphs or connect Grafana to Prometheus as a data source for creating visualizations. We’ll be covering metric details and Grafana visualization examples in the next part of this series.
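
As a quick sketch, you can also pull data through the Prometheus HTTP API with curl. The metric names below are assumptions based on the JMX-exporter rewrite rules configured earlier (the exporter converts dashes in JMX attribute names to underscores), so adjust them to whatever names appear in your /metrics output:

# Per-consumer bytes-consumed rate (metric name is an assumption)
curl -s 'http://localhost:9090/api/v1/query?query=kafka_consumer_bytes_consumed_rate'

# Consumer fetch rate (metric name is an assumption)
curl -s 'http://localhost:9090/api/v1/query?query=kafka_consumer_fetch_rate'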

In the meantime, here are some examples of graphs generated on the Prometheus dashboard:

 

Image 2: Kafka rate of consumer bytes consumed

 

Image 3: Kafka consumer fetch rate

 

Summary

In this post, after a basic overview of the Kafka ecosystem, we went through ways to export Kafka metrics. Although various tools and stacks are available for capturing these metrics, the basic methodology remains the same: fetching metrics via JVM agents that capture what JMX exposes. We first covered how to use the Jolokia JVM agent to fetch Kafka metrics and then looked at how to configure Prometheus, Telegraf, and Grafana for Kafka monitoring.

In Part 2 of this blog, we will go through various important Kafka metrics one by one.

 

Read More:

Monitoring Managed Cloud Services with Distributed Tracing

Slack Outage: Third-Party Dependencies and Accountability