Mihalis Tsoukalos
This tutorial will walk you through visualizing live data in a Kafka topic using the D3.js JavaScript library.
You can use this technique to make data in Kafka available to more people across your organization, including engineering and operations teams. It also simplifies integrating data from Kafka into your business applications.
Rather than querying Kafka directly, we will query the data through Lenses.io, which provides a governance portal for Apache Kafka.
Querying data in Kafka through Lenses has a few benefits:
Use of SQL: Lenses.io allows us to use SQL to query, filter and aggregate the data before visualizing it, which reduces the workload on the client side.
Deserialized payloads: Lenses deserializes the payload data in Kafka whatever its format (AVRO, Protobuf, XML and even proprietary formats via a Serde).
Secure data access via namespaces and security tokens: Data access is protected by role-based access controls built on namespaces, and clients authenticate with service account tokens. This saves you from having to configure and manage ACLs in Kafka.
Last, note that if you want to visualize JSON data stored in a file, you can have a look at the Using D3 to visualize data blog post.
To follow the steps of this tutorial, you will need the following:
Lenses up and running. You can use our free Lenses Box instance if necessary.
An Internet connection that will help you get D3.js and create the sample project.
We are going to read live data from a Kafka topic using the Lenses API and a service account, and visualize it using D3.js. The Kafka topic that will be used is called nyc_yellow_taxi_trip_data.
This section will present the steps needed for implementing the described scenario beginning from Lenses.
The nyc_yellow_taxi_trip_data Kafka topic contains records in the following format (represented as JSON records):
The query that is going to be executed in the JavaScript code is the following:
Each record returned by the previous query contains 4 fields, as defined in the SELECT statement.
You will need to perform two steps in order to create a service account in Lenses:
Create a new Group that matches the requirements of that service account.
Create the service account and keep its token.
Note that the name of the service account should be unique in a Lenses installation.
You can find more information about creating a Lenses Service Account in this blog post. The name of the service account used in this tutorial will be service.
In order to create the JavaScript project, you will need to execute some commands.
First, you should execute git clone https://github.com/wbkd/webpack-starter.git inside the root directory of your project.
Then, you will need to make changes to the following files:
package.json
./src/index.html
./src/scripts/index.js
After that you will need to execute one of the following two commands depending on the JavaScript package manager that you are using:
npm install if you are using the npm JavaScript package manager.
yarn if you are using the yarn JavaScript package manager.
If you decide to use our GitHub repository, you will need to execute the following command for creating the project:
git clone git@github.com:mactsouk/d3-service.git
You might need to make changes to ./src/scripts/index.js to match your Lenses installation or define your own SQL query.
The last thing you should do is execute npm run start to begin your application. If you are using yarn, execute yarn start instead.
The following JavaScript code is used for getting the data from Lenses.
To get data from Lenses, you will need to create a WebSocket connection. As we are using a service account, there is no need to log in to Lenses through the Lenses REST API to obtain an authentication token.
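To make the idea more concrete, here is a minimal sketch of opening such a connection with the plain browser WebSocket API, which may differ from the approach used in the project files. The endpoint URL, the "name:token" form of the token and the token and sql property names are assumptions for illustration; check the documentation of your Lenses version for the exact values.

```javascript
// Hypothetical values: replace them with the details of your own Lenses installation.
const LENSES_WS_URL = "ws://localhost:3030/api/ws/v2/sql/execute"; // assumed endpoint
const SERVICE_ACCOUNT_TOKEN = "service:<your-token>"; // assumed "name:token" form

// Build the first message sent over the socket. It carries the service
// account token and the SQL query (both property names are assumptions).
function buildFirstMessage(token, sql) {
  if (!token) throw new Error("A service account token is required");
  if (!sql) throw new Error("An SQL query is required");
  return { token, sql };
}

// Open the connection and send the first message as soon as the socket is ready.
// WebSocketImpl defaults to the browser's WebSocket; another implementation
// can be passed in, for example when testing.
function openLensesSocket(url, firstMessage, WebSocketImpl = globalThis.WebSocket) {
  const socket = new WebSocketImpl(url);
  socket.onopen = () => socket.send(JSON.stringify(firstMessage));
  return socket;
}
```

In a browser you would call openLensesSocket(LENSES_WS_URL, buildFirstMessage(SERVICE_ACCOUNT_TOKEN, yourQuery)) and then attach an onmessage handler to the returned socket.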
In this section you will learn more about the format of the JSON records read from Lenses using JavaScript. The records returned by the SQL query have the following format:
Once you know the format of the data, you can easily choose the fields that interest you and are going to be included in the visualization process.
The returned data will be processed in the JavaScript script using the following code:
This means that the sum field will contain the amount of money paid per trip.
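As an illustration of this kind of processing, the sketch below turns one returned record into a point that is ready to be plotted. The field names (fare_amount, tip_amount, trip_distance, passenger_count) are hypothetical and stand in for whatever your own SELECT statement returns; the way sum is calculated here is also an assumption.

```javascript
// Convert one record into a plottable point.
// All field names are hypothetical; use the ones selected by your SQL query.
function toPoint(record) {
  const fare = Number(record.fare_amount) || 0;
  const tip = Number(record.tip_amount) || 0;
  return {
    distance: Number(record.trip_distance) || 0,
    passengers: Number(record.passenger_count) || 0,
    sum: fare + tip, // money paid for this trip (assumed to be fare plus tip)
  };
}

// Made-up sample record, used only to show the shape of the result.
const samplePoints = [
  { fare_amount: 12.5, tip_amount: 2.5, trip_distance: 3.1, passenger_count: 2 },
].map(toPoint);
```

Missing or non-numeric values fall back to 0 so that a single malformed record does not break the whole chart.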
This section will show the core JavaScript code used for visualizing the data. As mentioned before, we are going to use the D3.js library.
The JavaScript code that draws the dots and generates the tooltips is the following:
The radius of each circle depends on the number of passengers – the more passengers the taxi had, the bigger the circle.
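A minimal sketch of that idea follows. The radiusFor() helper, its default values and the distance, sum and passengers property names are assumptions made for illustration; drawDots() expects a D3 selection of an svg element and two D3 scales created elsewhere, and it leaves out the tooltips.

```javascript
// Map a passenger count to a circle radius in pixels.
// minRadius and step are arbitrary illustrative values.
function radiusFor(passengers, minRadius = 3, step = 2) {
  const count = Math.max(0, Number(passengers) || 0);
  return minRadius + step * count;
}

// Draw one circle per point inside an existing D3 selection of an <svg> element.
// xScale and yScale are scales created by the caller, for example with d3.scaleLinear().
function drawDots(svg, points, xScale, yScale) {
  return svg
    .selectAll("circle")
    .data(points)
    .enter()
    .append("circle")
    .attr("cx", (d) => xScale(d.distance))
    .attr("cy", (d) => yScale(d.sum))
    .attr("r", (d) => radiusFor(d.passengers));
}
```

A linear mapping keeps the example short; if large circles start to dominate the chart, a square-root mapping is a common alternative because it makes the circle area, rather than the radius, proportional to the value.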
In this section we are going to see the generated visualization. Note that each time you load the HTML file, you will get a slightly different output as we are working with dynamic data.
This section will present the main files of the project.
package.json file
The contents of package.json are the following:
The dependencies block is where you define the JavaScript packages that need to be downloaded for the project to execute.
./src/index.html file
The contents of ./src/index.html are the following:
The embedded CSS code is required for the correct rendering of the HTML output.
./src/scripts/index.js file
The ./src/scripts/index.js file is the most important file of the project:
This file contains all the JavaScript code. Its flow goes like this:
Create the WebSocket connection as defined in the firstMessage object.
The message is sent using the webSocketRequest.send(JSON.stringify(firstMessage)) call.
The onmessage() method is executed each time we receive new data from the WebSocket connection. It runs more than once because we are expecting to get multiple JSON records back.
This streamEvent is an object that has a data attribute. That data attribute has a type property, which can be RECORD, END or ERROR.
When we are dealing with a new RECORD, we add it to the existing records.
When there is an ERROR, we print it in the console.
When we are dealing with END, we terminate the connection.
After the connection is terminated and all data has been read, websocketSubject.pipe(finalize(...)) will return the data that is stored in the websocketData variable.
The websocketData variable is used by D3.js to visualize the desired data.
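The message handling part of this flow can be sketched as follows. This is a simplified illustration rather than the exact code of the project: it uses a plain WebSocket-style object instead of websocketSubject, and it assumes that streamEvent.data is a JSON string whose parsed form has a type property and keeps its payload in a data property. The handleStreamEvent() and collectRecords() names are hypothetical.

```javascript
// Fold one WebSocket message into the running state.
// Assumes streamEvent.data is a JSON string with "type" and "data" properties.
function handleStreamEvent(state, streamEvent) {
  const message = JSON.parse(streamEvent.data);
  switch (message.type) {
    case "RECORD": // a new record: add it to the existing records
      return { ...state, records: [...state.records, message.data] };
    case "ERROR": // print the error in the console and carry on
      console.error("Lenses returned an error:", message.data);
      return state;
    case "END": // no more data: mark the stream as finished
      return { ...state, done: true };
    default: // ignore any other message type
      return state;
  }
}

// Attach the handler to a socket. onDone receives all the records once END arrives.
function collectRecords(socket, onDone) {
  let state = { records: [], done: false };
  socket.onmessage = (streamEvent) => {
    state = handleStreamEvent(state, streamEvent);
    if (state.done) {
      socket.close();
      onDone(state.records);
    }
  };
}
```

Keeping handleStreamEvent() free of side effects on the socket makes it easy to test with hand-written messages, while collectRecords() is the only part that touches the connection.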
.eslintrc file
The contents of .eslintrc are the following:
Now that you know how to visualize live data from Kafka topics through Lenses and D3.js, you should begin visualizing data from your own Kafka topics.