Splunk your Kafka with SQL

Explore & visualise real-time data in Kafka with SQL through a custom SPL command in Splunk

By Guillaume Aymé

May 20, 2020

Here at Lenses.io, we’re focused on making data technologies such as Apache Kafka and Kubernetes as accessible as possible to every organization. It’s part of our DataOps vision and company DNA.

Lenses is built by developers, for developers. We understand the headaches they live with and the challenge of seemingly having to learn a new data technology every few months. We believe that’s just not the right model. Developers want to build great applications, not tooling or management interfaces. That’s where we come in...

Lenses provides a secure workspace that serves as an access layer into Kafka and Kubernetes. Using Lenses, developers gain an amazing access-controlled UI workspace to explore, move and process real-time data using SQL. We protect Kafka with a killer unified security model backed by namespaces and data policies (data anonymization). This means granular access controls and auditing, without having to manage ACLs.

But we also provide Python libraries and Go clients. These allow dev and ops teams to build integrations into Kafka through Lenses.io in the way that suits them: they get safe access to the data in Kafka while using the tools they want. We believe that force-fitting developers into a limited set of custom tools isn’t cool at all. It’s no fun; we’ve been there.

And Splunk is a perfect example of such a tool. It has become a standard for logs and machine data, used by many IT and security shops to search and analyze this data. We have lots of friends who are Splunk developers, and we have a great new Splunk TA (Splunk Technology Add-on) to share.

The purpose? It provides a custom SPL (Splunk’s proprietary Search Processing Language) command, lensesiosearch, that queries live data in a Kafka topic via Lenses by passing it a SQL query. Lenses.io has its own SQL engine that does the hard work of extracting the data from a Kafka topic, whatever the serialisation (Avro, Protobuf, JSON, etc.). It can also query connected Elasticsearch indexes.

“So what?” you might say.

Figure: Querying data in Apache Kafka from Splunk with Lenses.io and SQL

Firstly, there are no Splunk indexing costs, because the data is queried in Kafka rather than indexed into Splunk. You can do all of this using the free version of Splunk. Secondly, because you pass SQL, it is very easy for people who might not understand the data well, or who aren’t Kafka experts, to query it. Thirdly, you can protect data access with a namespace-based security model and data masking.

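The command is a generating command (it produces events itself rather than processing the results of an earlier search, so it starts the search pipeline) and is called like this: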

Figure: Visualising data from an Apache Kafka topic in Splunk with Lenses.io
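To make the “generating” part concrete, here is a minimal, standard-library-only Python sketch of that shape: no input events go in, and one Splunk-style event comes out per row returned by a SQL query. It is not the AddOn’s code. `run_lenses_sql` and `generate_events` are hypothetical names, and the stub returns fixed sample rows instead of calling Lenses. With the Splunk SDK for Python, logic like this would typically live in the `generate()` method of a `splunklib.searchcommands.GeneratingCommand` subclass.

```python
import json
import time
from typing import Dict, Iterator, List


def run_lenses_sql(sql: str) -> List[Dict[str, str]]:
    """HYPOTHETICAL stand-in for the Lenses SQL engine.

    A real add-on would send `sql` to Lenses, which decodes the topic's
    records (Avro, Protobuf, JSON, ...). This stub ignores `sql` and
    returns fixed sample rows so the sketch runs offline.
    """
    return [
        {"customer_id": "38368", "product_id": "T7342834"},
        {"customer_id": "41210", "product_id": "T1000001"},
    ]


def generate_events(sql: str) -> Iterator[Dict[str, object]]:
    """Generating-command shape: no input events, one output event per row.

    Splunk events conventionally carry `_time` (epoch seconds) and `_raw`
    (the raw event text); the other keys become searchable fields.
    """
    for row in run_lenses_sql(sql):
        event: Dict[str, object] = dict(row)  # copy the row's fields
        event["_time"] = time.time()          # query time, not record time
        event["_raw"] = json.dumps(row, sort_keys=True)
        yield event


if __name__ == "__main__":
    # The SQL text is illustrative only; the stub above does not parse it.
    for event in generate_events("SELECT customer_id, product_id FROM orders LIMIT 2"):
        print(event["_raw"])
```

Because the function is a generator, events are yielded one at a time, which matches how a generating command streams results back to the search rather than building them all up front.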

So what are the use cases where this might be of value? There are a few principal ones.

1. Security analyst needs to investigate a security incident by looking at business events

Cybercrime sucks, and it’s on the rise, with more threats and more sophistication; COVID-19 is making it worse. Security teams need data to do their jobs accurately and effectively. As adversaries try to disrupt business processes and critical applications, they leave traces. Investigators use these traces, bits and bytes of data, to connect the dots, detect and remediate. Oftentimes, these applications are connected to Kafka. So when it comes to an incident investigation, what a SOC and incident response team really want is access to business events (as well as logs, of course). A large number of organizations use Splunk as their SIEM and investigation platform, and they don’t want to switch to different tools during incident investigations.

2. SRE & Ops want to win back years of their lives lost with access to the “right” data for troubleshooting

The life of an SRE or sysadmin can be pretty tough. Finding and fixing problems is really hard when the data is incomplete. These roles already use Splunk for observability and investigation through logs, metrics, alerts and traces. But sometimes troubleshooting a microservice requires understanding the business events generated by the application. For example, a microservice isn’t reacting correctly to an event it should have received, or an application is crashing while consuming an event off Kafka. Maybe the data is corrupted? Maybe it hasn’t arrived in the topic? Maybe it’s a security incident? Maybe you want to go home at a reasonable hour for once? You could index this data in Splunk, but it’s often only needed for a single investigation. That’s overkill and expensive given the size of the data, and it doesn’t help that your Splunk license has been violated a couple of times this month. So it’s best to query the data directly off Kafka and save yourself the hassle. Right?

3. That business report that knocks your manager’s socks off

Log and machine data are a great source of information for business analytics. A business event might represent a single transaction: Joe bought a leather belt for 30 bucks on Monday. Machine data tells a bigger story: “38368 reacted to a marketing campaign last week and came back to the site from an iPad on Monday to browse pairs of Crocs, but they were out of stock, so they ended up selecting a recommended alternative, product item T7342834.”

See the problem? Apart from the fact that they want Crocs, T7342834 and 38368 are IDs, not designed to be read by humans. Splunk is great at visualising this data and delivering it to a manager as a dashboard, but the data needs to be enriched to map to things humans can understand. Chances are, those customer and product mappings are sitting in a KTable in Kafka. Query the machine data, then use Splunk’s lookup, join or append commands to enrich it on the fly with a query to a Kafka topic. Voilà. And guess what? Again, there’s no impact on your Splunk license, your boss is happy and you get a huge raise. Well, maybe.
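The enrichment step is essentially a join between the machine-data events and the latest state of the ID-to-name mapping. The standard-library-only Python sketch below illustrates those semantics; it is not how Splunk or Lenses implement them. `materialise_table` and `enrich` are hypothetical names, and the product names in the sample data are invented. The table-building step follows KTable semantics: the latest value for a key wins, and a null value (a tombstone) deletes the key.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Dict[str, str]


def materialise_table(changelog: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Fold a keyed changelog into its latest state, as a KTable does.

    Later records for a key overwrite earlier ones, and a None value
    (a tombstone) removes the key.
    """
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def enrich(events: Iterable[Event], table: Dict[str, str],
           key_field: str, out_field: str) -> Iterator[Event]:
    """Left-join style enrichment: every event is kept, and events whose
    key_field matches a table key gain out_field with the looked-up value."""
    for event in events:
        enriched = dict(event)
        key = event.get(key_field)
        if key is not None and key in table:
            enriched[out_field] = table[key]
        yield enriched


if __name__ == "__main__":
    # Hypothetical sample data: the product names are invented for illustration.
    products = materialise_table([
        ("T7342834", "Canvas sandal"),
        ("T0000001", "Discontinued item"),
        ("T0000001", None),  # tombstone: product removed
    ])
    clicks = [
        {"customer_id": "38368", "product_id": "T7342834"},
        {"customer_id": "38368", "product_id": "T0000001"},
    ]
    for row in enrich(clicks, products, "product_id", "product_name"):
        print(row)
```

Keeping unmatched events rather than dropping them mirrors a left join (and Splunk’s `lookup`, which does not discard events without a match); when enriching, an event with an unknown ID is usually still worth seeing.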

OK, so how do I get started?

To start exploring your data in Kafka with Lenses.io and SQL in a few minutes, all you need to do is install the Lenses.io AddOn on your Splunk Search Head and configure it to connect to Lenses with the Lenses URL and Security Token.

Lenses.io connects to your own Kafka. If you want to test it out without your own Kafka, use our free Lenses.io Cloud Workspace, which includes an instance of Lenses.io and Kafka for development purposes.

Figure: Splunk AddOn for Lenses.io configuration settings

To get the Splunk app, check out and download the AddOn from GitHub, and see the full README for installation instructions.

It’s an early release with room to improve, so please share feedback by pinging me (Guillaume Aymé) on our community Slack channel: https://launchpass.com/lensesio