Distributed tracing with Kafka message headers

Implement distributed tracing with OpenTracing and view trace IDs in Apache Kafka message headers.

By Guillaume Aymé

Dec 11, 2020

Apache Kafka 0.11 introduced message headers.

Pretty handy. Since Kafka is at the heart of business services, and those services are becoming more distributed and complex, headers make managing these environments much easier.

Kafka Headers act in much the same way as headers for HTTP. They add metadata to your message on top of the payload and key that you get with a Kafka message. They’re useful for annotating, auditing, monitoring and routing of events as they flow through Kafka.
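
To make that anatomy concrete, here is a minimal Java sketch of a message carrying a key, a payload and a list of headers. `SimpleMessage` and `SimpleHeader` are hypothetical stand-ins written for illustration, not classes from the Kafka client; in the real Java client the equivalent step is `ProducerRecord#headers().add(String key, byte[] value)`. As in Kafka, header values are raw bytes and the same key may appear more than once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HeaderSketch {

    /** Hypothetical stand-in for a Kafka header: a string key and a raw byte value. */
    static final class SimpleHeader {
        final String key;
        final byte[] value;

        SimpleHeader(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a Kafka message: key, payload and metadata headers. */
    static final class SimpleMessage {
        final String key;
        final String payload;
        final List<SimpleHeader> headers = new ArrayList<>();

        SimpleMessage(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }

        /** Appends a header; duplicates are allowed, so nothing is overwritten. */
        SimpleMessage addHeader(String headerKey, String headerValue) {
            headers.add(new SimpleHeader(headerKey, headerValue.getBytes(StandardCharsets.UTF_8)));
            return this;
        }

        /** Returns the most recently added value for the key, or null if absent. */
        String lastHeader(String headerKey) {
            for (int i = headers.size() - 1; i >= 0; i--) {
                SimpleHeader header = headers.get(i);
                if (header.key.equals(headerKey)) {
                    return new String(header.value, StandardCharsets.UTF_8);
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // The header names and values here are made up for the example.
        SimpleMessage message = new SimpleMessage("order-42", "{\"amount\": 19.99}")
                .addHeader("source-system", "checkout")
                .addHeader("schema-version", "3");
        System.out.println(message.lastHeader("source-system"));
    }
}
```

The payload stays untouched: a consumer that only cares about routing or auditing can read the headers without deserializing the value.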

This helps support a number of different use cases including:

Tracing data lineage
Adding business metadata for governance
Monitoring & observability

In Lenses 4.1, we’ve introduced querying headers alongside your Kafka message and key with SQL. Querying header and payload looks like this:

Having access to the metadata in your Kafka headers can drastically reduce the time it takes to investigate issues, such as an IT incident or a compliance audit. An operator will have more business and technical context as they explore their Kafka events.

Kafka Headers for Observability

One of the biggest use cases is observability.

It’s important to be able to view transactions and data flows as they traverse different applications and APIs connected to Kafka.

APM solutions such as NewRelic and Dynatrace have taken advantage of Kafka headers to include IDs in Kafka messages, thus enabling distributed tracing.

Jaeger too. The OpenTracing Kafka Client allows you to trace spans across Kafka clients.

Querying headers in Lenses can then be particularly handy during an investigation.

For example, you might identify a corrupted message (in the business sense as much as the technical one) in Kafka and need to view its trace in Jaeger. Or vice versa: you spot an error trace in Jaeger and need to find the business event in Kafka.
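
One way to bridge the two tools is to read the trace ID out of the message header. With a Jaeger tracer, the span context is by default propagated under a key named `uber-trace-id`, with a value of the form `{trace-id}:{span-id}:{parent-span-id}:{flags}` in hex. The sketch below parses such a value in plain Java. The class name `UberTraceId` and the sample value are made up for illustration, and the code assumes the default Jaeger propagation format rather than B3 or W3C Trace Context.

```java
import java.nio.charset.StandardCharsets;

public class UberTraceId {
    final String traceId;
    final String spanId;
    final String parentSpanId;
    final int flags;

    UberTraceId(String traceId, String spanId, String parentSpanId, int flags) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.flags = flags;
    }

    /** Parses the raw header bytes, assuming the Jaeger "uber-trace-id" layout. */
    static UberTraceId parse(byte[] headerValue) {
        String text = new String(headerValue, StandardCharsets.UTF_8);
        String[] parts = text.split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 colon-separated fields but got: " + text);
        }
        // The flags field is a hex bitmask.
        return new UberTraceId(parts[0], parts[1], parts[2], Integer.parseInt(parts[3], 16));
    }

    /** The lowest flag bit marks the trace as sampled. */
    boolean isSampled() {
        return (flags & 0x01) != 0;
    }

    public static void main(String[] args) {
        // Made-up sample value; a real one would come from the Kafka message header.
        byte[] raw = "4bf92f3577b34da6:a3ce929d0e0e4736:0:1".getBytes(StandardCharsets.UTF_8);
        UberTraceId context = parse(raw);
        System.out.println(context.traceId); // prints 4bf92f3577b34da6
    }
}
```

The extracted trace ID is the value you would paste into the Jaeger UI search to jump from a Kafka event to its trace.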

The same might apply to governance, where you need to identify the lineage and provenance of an event.

Here’s a quick example of how to use the OpenTracing Kafka Client to instrument your Kafka messages by injecting a trace ID into the message headers. The walkthrough will involve:

Launching a Lenses Box “All-in-one” Kafka Docker environment
Running the Jaeger “all-in-one” Docker image
Building a basic Kafka producer
Identifying a trace in Lenses

Deploy Lenses

Lenses Box contains a single-broker Kafka cluster with Lenses.io, Kafka Connect, Schema Registry and other services.

1. Get your Docker command and license key from lenses.io/box. Alternatively, request a trial to use Lenses against your own Kafka.

2. Run the Docker command

Ensure you have at least 4GB RAM.

We’ll be connecting from an external application, so adjust ADV_HOST accordingly if you’re not running it on localhost. Also ensure ports 3030 (HTTP for the Lenses UI) and 9092 (for the Kafka client) are available.
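
A quick way to confirm the two ports are free before starting the container is to try binding to them. This is a minimal sketch in plain Java; `PortCheck` is a hypothetical helper, and it only inspects the machine it runs on, so run it on the Docker host.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {

    /** Returns true if a listening socket can be opened on the port right now. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically "address already in use" when another process holds the port.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {3030, 9092}; // Lenses UI and Kafka client ports
        for (int port : ports) {
            System.out.println(port + (isFree(port) ? " is free" : " is already in use"));
        }
    }
}
```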

3. Access the Lenses UI from http://<host>:3030. The environment can take a few minutes to be fully available.

Deploy Jaeger

Jaeger will allow us to collect and visualize traces.

1. Execute the following docker run command:

2. Access the Jaeger UI from http://<host>:16686

Build a basic Kafka producer

We’re going to develop a very basic Kafka client producing a single string message event to a “traces” topic.

If you’re using Maven, you’ll find the project on GitHub.

Run the application with two arguments: the first is the broker hostname:port and the second is the message to be published to the traces topic.
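
As a rough sketch of how those two arguments might be handled, the snippet below turns them into the configuration a Java producer needs. It uses only the standard library so that it runs on its own; `ProducerArgs` is a hypothetical helper and is not taken from the GitHub project. The property names (`bootstrap.servers`, `key.serializer`, `value.serializer`) and the `StringSerializer` class name are the standard Kafka client ones. In a real client these properties would be passed to a `KafkaProducer`, which the OpenTracing Kafka Client can wrap in its `TracingKafkaProducer` so that the trace context is injected into the headers of each record.

```java
import java.util.Properties;

public class ProducerArgs {
    static final String TOPIC = "traces";
    static final String STRING_SERIALIZER =
            "org.apache.kafka.common.serialization.StringSerializer";

    /** Validates the two command-line arguments and builds the producer properties. */
    static Properties toProducerConfig(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: <broker host:port> <message>");
        }
        String broker = args[0];
        int colon = broker.lastIndexOf(':');
        if (colon <= 0 || colon == broker.length() - 1) {
            throw new IllegalArgumentException("Broker must look like host:port, got: " + broker);
        }
        // Fails fast on a non-numeric port (NumberFormatException is an IllegalArgumentException).
        Integer.parseInt(broker.substring(colon + 1));

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", broker);
        // Key and value are both plain strings in this walkthrough.
        props.setProperty("key.serializer", STRING_SERIALIZER);
        props.setProperty("value.serializer", STRING_SERIALIZER);
        return props;
    }

    public static void main(String[] args) {
        // Example arguments; in the walkthrough these come from the command line.
        String[] exampleArgs = {"localhost:9092", "hello tracing"};
        Properties props = toProducerConfig(exampleArgs);
        System.out.println("Would send \"" + exampleArgs[1] + "\" to topic " + TOPIC
                + " via " + props.getProperty("bootstrap.servers"));
    }
}
```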

Identify the trace

1. From Lenses, access the Explore page and find the traces topic from the data catalog. The topic was automatically created by the producer.

2. If you’ve generated very few events, Lenses won’t have enough data to detect the serialization, so you may need to manually set the serialization type for the Key and Value to String:String.