New Lenses Multi-Kafka Developer Experience

Luggage lost in a world of streaming data

By Drew Oetzel

Dec 03, 2024

Democratizing and sharing data inside and outside your organization, as a real-time data stream, has never been more in demand. Treating data as a product and adopting Data Mesh practices are leading the way. Here, we explain the concept through a real-life example of an airline building applications that process data across different domains.

What if I told you that the data you need to pilot your real-time AI-powered future was actually within your company’s existing systems? Specifically, it flows through your streaming data pipelines. Across your IT stack, data is streaming through Apache Kafka-based systems on its way to and from your various applications. This is your company’s data in its freshest, most raw form, and this is what developers need to transform separate real-time environments into a hyperconnected real-time enterprise.  

In the era of Data Mesh, each of these data streams represents a potential 'data product' – a valuable, shareable asset that can drive innovation across the enterprise.

Flying blind with streaming data pipelines

The problem is that the “pipes” this super-valuable data flows through are opaque. Producer and consumer applications at either end of the “pipes” know what is in them, but other teams and divisions can’t easily peer inside. It gets even harder when you consider where these “pipes” actually live – they could be in any cloud, or locked into any streaming vendor’s solution.

Over the past decades, engineering teams have integrated data from these operational systems into a lakehouse or data lake.

And while these lakes are certainly useful, they are – by design – backward-looking, not real-time. What we still lack is the ability to share data from operational systems and microservices: live data that helps us understand how applications are behaving in real time.

This challenge underscores the need to treat streaming data as a product: data that is shareable and valuable across different teams and domains.

The baggage at the bottom of your data lake

An airline can sift through its data lake to determine which airports are most likely to misroute a passenger’s bag. However, it can’t use the lake to build a software application that alerts a passenger that their bag is misrouted while said passenger is still in the air.

If an airline wanted to build a system that alerts passengers still in flight that, unfortunately, their bag will not be on the carousel when they land, it would need to work with its baggage location data in near real time. And if it wanted an AI agent to help arrange baggage delivery later that day – again, while the passenger is still in flight – that AI agent would also need to know where the bag actually is in the moment.

The data discovery challenge

Now, imagine you’re a developer charged with creating this preemptive lost-baggage system for the airline’s top frequent fliers. You are likely part of a customer service team, not the baggage handling or flight status team. Your first order of business: how do I get real-time baggage information, plus flight status and boarding information, so I can detect when a frequent flier is on a plane but their baggage is not?

Tracking down these disparate data streams is a daunting task. They will be locked up in different departments, in unknown formats, flowing through edge data pipes at the actual airports. The project seems doomed from the start – not because the data doesn’t exist, but because it’s too hard to find, much less parse and understand.

Sharing data between domains

How many times is this happening right now? New top-down projects shelved because your developers can’t find the data or get access to it. New bottom-up ideas never even investigated because your developers don’t know whether the required data flows exist.

This challenge highlights a critical industry trend: the need for a Data Mesh approach and Data Catalog solutions that can bridge the gap between operational systems.

Treating data as a product means making it accessible, understandable, and usable by different teams, breaking down the silos that currently restrict innovation.

What if I told you there was a product designed to let you jump right over this data obscurity problem?

[Diagram: lost luggage]

Lenses 6 Panoptes

Build predictive real-time applications for passengers to find luggage faster

Lenses 6 Panoptes is designed to allow the airline developer to search across the entire organization safely. It is the only tooling that lets you explore Kafka-API-based streaming infrastructure that spans multiple clouds, vendors, and deployments. Named after the all-seeing (friendly) giant of Greek mythology, it helps developers deliver on the promise of a data mesh by sharing data that reflects real-world changes.

Our developer can start with baggage-related streams from the baggage department’s Apache Kafka cluster and see their format. She can then find passenger boarding streams and flight status streams in a completely different department’s Kafka cluster, and investigate the feasibility of bringing these dataflows together in real time. All from one place.

Lenses 6 Panoptes would let her do even more. She could use Lenses SQL to peer into those data “pipes” and see the format of the data flows – without ever having to move the data. She could evaluate the quality of the data flows, and plan what her app will need to do with the data to make it work for her use case.
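As a rough sketch of what that peering might look like – topic and field names here are hypothetical, invented for this example – a Lenses SQL snapshot query over a baggage topic could be as simple as:

```sql
-- Inspect a few recent events from a hypothetical baggage topic
-- to learn its shape and quality, without moving the data anywhere.
SELECT *
FROM baggage_events
LIMIT 10;

-- Then narrow in on the records the app would care about
-- (all field names are assumptions for illustration).
SELECT bagId, passengerId, flightNumber, lastScannedAirport, status
FROM baggage_events
WHERE status = 'MISROUTED'
LIMIT 10;
```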

She could do all of this without having to beg a platform team in a different department for sample data, or wonder whether the normalized version of the data she sees in the data lake is how it’s going to look live – all without filing a single ticket, from a single unified interface.

Now the project won’t be shelved. She and her team can build the predictive lost baggage application and let me know, while I’m still flying across the Atlantic, not to bother waiting sadly at the baggage carousel but to head directly to my hotel. Then the AI agent could ask me which hotel I’m staying at and schedule delivery of my bag for the next day.
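To make the idea concrete, here is a minimal sketch of what the detection logic might look like as a continuous Lenses SQL processor. Topic names, field names, and the join window are all assumptions invented for this example, and the exact streaming syntax may vary by version:

```sql
-- Hypothetical continuous processor: flag passengers who have boarded
-- a flight while their bag was last scanned as misrouted.
INSERT INTO baggage_alerts
SELECT STREAM
    b.passengerId,
    b.flightNumber,
    g.bagId,
    g.lastScannedAirport
FROM boarding_events AS b
    INNER JOIN baggage_events AS g
        ON b.passengerId = g.passengerId
    WITHIN 2h  -- only pair boarding and bag-scan events close in time
WHERE g.status = 'MISROUTED';
```

In practice she would tune the window and predicates to the real data, but even a sketch like this shows how little glue the idea needs once the streams are discoverable.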

How many stuck data streaming projects?

Leaving our example airline behind, let’s get back to you and your company’s data. What real-time data projects are currently sitting shelved? What cool innovation that would transform your customers’ experience remains a developer’s pipe dream, simply because they don’t know about the data you already have?

Empowering developers to search and discover data flows throughout your enterprise is the central ethos of Lenses 6. We believe that your developers and product managers can dream up useful new real-time and AI-powered applications, but only if they can find the data flows to enable them. You already have the data your teams need to succeed; your teams just can’t find it.

Most of all, we want to help your developers innovate and create the applications that will improve all our lives! Here’s to someday never again having to stand dejected at the baggage carousel, waiting for a bag that will never arrive.

___

Download Lenses 6 for free and run it with Docker.