SQL for data exploration in a multi-Kafka world

SQL is an easy, powerful way to process and analyze data streams, but how do you do this with hundreds of Kafka clusters?

By Guillaume Aymé
Oct 20, 2024

Every enterprise is modernizing its business systems and applications to respond to real-time data. Within the next few years, we predict that most of an enterprise's data products will be built on a streaming fabric – a rich tapestry of real-time data, abstracted from the infrastructure it runs on.

This streaming fabric spans not just one Apache Kafka cluster, but dozens, hundreds, maybe even thousands of them. For various reasons (compliance, workload isolation, acquisitions), there may be different Kafka deployments and vendors, running in a pick-and-mix of on-premises, cloud, and edge environments.

How do engineering teams maintain a democratic, data mesh approach that makes it easy to find and explore streaming data, and to create innovative products the business can count on, when clusters of streaming data live in different silos?

Enter Lenses 6 Panoptes and its enhanced SQL for Kafka: our answer, allowing engineers to work autonomously with all data streams at once, across a heterogeneous Apache Kafka estate.

The evolution of Lenses SQL for Kafka

Since 2018, Lenses has been laying the groundwork for more developers to do data streaming. We recognized early on that using a language every developer knows, like SQL, to interact with streaming data could make this happen. Now, six years later, as enterprises contend with increasingly distributed Kafka environments, there are not tens of engineers working with Kafka, but often thousands within the same organisation. These SQL capabilities are more relevant than ever.

Lenses 6 introduces a power-up on this feature: Global SQL Studio. But before we dive into what this means, let's revisit the foundational SQL engines that power Lenses.

Two SQL engines, one syntax

Lenses offers two SQL engines, both sharing the same SQL syntax so you can move easily from one to the other:

  • SQL Snapshot Engine: Designed for ad-hoc, point-in-time queries against Kafka topics. It's lightweight and super responsive. It supports all types of serialization - from Avro and Protobuf to XML and CSV.

  • SQL Processor Engine: A Lenses-native technology for continuous data transformation in streams. It builds and deploys Kafka Streams applications on your Kubernetes cluster from your SQL, integrating with your existing microservices architecture.

Real-world applications of SQL Snapshot

Explode an array of products for a customer order to quickly diagnose a corrupted product_id in an event:
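
The following is a minimal sketch of such a query: the customer_orders topic and its products array field are hypothetical, and the exact array-flattening syntax (shown here as a LATERAL clause) may vary between Lenses versions.

    -- Hypothetical topic: customer_orders, where each event carries a 'products' array.
    -- LATERAL flattens the array so that each product is returned as its own row,
    -- making a corrupted product_id easy to spot.
    SELECT order_id,
           product.product_id,
           product.quantity
    FROM   customer_orders
    LATERAL products AS product
    WHERE  product.product_id IS NULL
    LIMIT  100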

Data profiling and quality checks

Understand the shape and profile of data in a Kafka topic: for example, the cardinality of a field, the distinct counts needed to design a partitioning strategy, or simply how clean the data is. This comes with almost full ANSI-SQL-like capabilities and functions.
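
For instance, snapshot queries along these lines (topic and field names are hypothetical) surface cardinality and basic data-quality signals:

    -- Distinct customers versus total events, to inform a partitioning strategy
    SELECT COUNT(DISTINCT customer_id) AS distinct_customers,
           COUNT(*)                    AS total_events
    FROM   customer_orders

    -- How clean is the data? Count events missing an amount
    SELECT COUNT(*) AS missing_amount
    FROM   customer_orders
    WHERE  total_amount IS NULL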

Prototyping

Parity in the SQL Syntax

Although the two engines are different and serve different use cases, they have been designed with almost exact SQL-syntax parity – allowing an engineer to prototype and analyze data with the SQL Snapshot Engine, then continuously process and transform the same data with SQL Processors.
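
As an illustrative sketch (topic names are hypothetical, and the processor configuration is simplified), the same projection and filter can move from an ad-hoc snapshot query to a continuously running SQL Processor with minimal change:

    -- 1. Prototype in SQL Studio with the Snapshot Engine: a point-in-time result
    SELECT customer_id, total_amount
    FROM   customer_orders
    WHERE  total_amount > 1000

    -- 2. Promote to a SQL Processor: the same query, now running continuously
    --    and writing its output to another topic
    INSERT INTO high_value_orders
    SELECT STREAM customer_id, total_amount
    FROM   customer_orders
    WHERE  total_amount > 1000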

Unified security and auditing 

Lenses has always been appreciated for its powerful RBAC model and data masking features, which control what data engineers can and cannot see.

We’ve revamped the permission model in Lenses 6 for even greater power and granularity, and applied an IAM-as-code mechanism that allows you to manage permissions in Git. 

Every SQL or data operation generates an audit log, which you can integrate natively into Splunk or another SIEM. This includes the offsets of the data viewed, ensuring compliance and traceability across your entire Kafka ecosystem.

Why Lenses SQL for Kafka?

You might wonder, "aren't there enough SQL engines for Kafka already in the market?" 

While it's true that there are many options out there, Lenses SQL stands out:

  • It’s the only technology that works equally well for exploration (Snapshot) and processing (Processor)

  • It never moves the data out of Kafka

  • No need for an external or embedded lakehouse, database, or cluster

  • Offers an almost full ANSI-SQL-like experience

  • It’s super responsive with near-instant query execution

  • It's fully integrated with a robust RBAC model

  • It's delivered in a beautiful UX.

A new multi-Kafka world

Times have moved on, and large companies' data infrastructure is now made up of hundreds of scattered Kafka clusters, spanning a mixture of cloud and on-premises deployments and different vendors.

This has traditionally made the democratization of streaming data difficult. 

We have designed Lenses 6 Panoptes to offer developers a unified experience for working with data across whichever streaming infrastructure makes sense for the business, by totally abstracting the developer experience from the number and type of Kafka clusters. As a result, developers have the freedom to discover, process, and integrate data across these environments.

Introducing Global SQL for Kafka – search and exploration

Lenses 6 brings major new capabilities for exploring data across a distributed Kafka landscape.

We’ve brought the Data Catalog to the global level, allowing engineers to safely search the metadata of their Kafka streams across the business. Imagine an engineering team in the Customer Service department needing to build an application that processes data held in a Kafka cluster run by the Ground Operations IT team – how would they even find this data, or know it exists? Now they can.

Once you’ve found the data, it’s time to explore it. Lenses 6 Panoptes rises to this challenge with the Global SQL Studio. It’s an extension of the much-loved single-Kafka version, but this time with the ability to query globally, protected by a global IAM model, all from a single experience.
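
As a purely hypothetical sketch of what this enables (the cluster-qualified addressing below is illustrative, not the final Global SQL Studio syntax), a Customer Service engineer could interrogate a topic owned by Ground Operations without leaving their own workspace:

    -- Hypothetical: query a topic on another team's cluster from Global SQL Studio,
    -- governed by the global IAM model. The addressing scheme is illustrative only.
    SELECT flight_id, gate, status
    FROM   ground_ops.flight_turnaround_events
    WHERE  status = 'DELAYED'
    LIMIT  50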

Take Global SQL Studio for a spin

If you’re working with distributed Kafka deployments, Lenses 6 Panoptes and its Global SQL Studio offer a single, secure way to explore and interrogate your streaming data. By breaking down walls between Kafka clusters and providing a single pane of glass for your entire streaming ecosystem, we hope to unlock a whole new level of data streaming possibilities for teams.

Lenses 6.0 is open for private preview. If you’d like to be among the first to experience this release, reach out to us.
