Kafka infrastructure, monitoring, data - Which is your priority?
Data shouldn’t be the third wheel when it comes to Kafka.
At the heart of Kafka is real-time data. Since data sits at the center of any Kafka environment, it should be the area that gets the most attention, yet typically it gets the least. Most organizations split their Kafka efforts into three areas: infrastructure, monitoring, and data operations.
As Kafka is complex to operate and security needs to be built in, data operations are typically delegated to a few Kafka experts, leaving software engineers and data scientists in the dark. Let’s break these three components down and look at how to solve the data problem.
Infrastructure
It might sound obvious, but seriously consider whether on-prem Kafka can work for your business - and how it will scale. Even if you think Kafka has only a small place in your data architecture, plan for it to become the center of your strategic business universe. Because it will be, at least for a while.
Given the cost and complexity of managing Kafka, and the dependence on a hard-to-source Kafka skillset, removing the heavy lifting of self-managed infrastructure is invaluable. Services like AWS MSK*, Confluent Cloud, Aiven, and Azure HDInsight are taking the market by storm.
Managed services can sometimes be restrictive in the toolsets they provide, but if your use case isn’t particularly bespoke, there is little reason not to go with a managed service provider for Kafka.
*By the way, if you evaluated MSK before June of 2020 and decided it wasn’t for you, give it another go. In typical AWS fashion, they have made significant improvements to the service and it is now being used widely.
Monitoring
If you choose a managed service provider, most provide tools to monitor the infrastructure.
If you are managing the infrastructure yourself, it’s time to roll up your sleeves and dig into open-source tools or build-it-yourself applications. The amount of work required to build this out is yet another argument for moving to a managed Kafka service.
Make sure you factor out-of-the-box tooling into your Kafka ROI calculation.
Neither open-source tooling nor DIY applications quite cover everything you’ll need, however.
While they can give you insight into your infrastructure and high-level Kafka health, they don’t provide insight into the data itself.
In the world of Kafka, the infrastructure is not your only problem. The problems often reside in the data itself. You or your team will eventually ask:
“Is my data in Kafka?!?”
Traditional monitoring cannot answer this question.
The data problem
Kafka is extremely powerful, but it takes experience to look under the hood and understand the data moving through it. Even for a veteran, troubleshooting and finding specific events can be extremely time-consuming, and can feel next to impossible.
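To make the pain concrete, here is a minimal sketch of the kind of brute-force search you end up writing without data-level tooling. It uses a hypothetical in-memory list standing in for a topic’s payloads; against a real cluster you would consume every message from the earliest offset with a client such as kafka-python and apply the same filter per message, which is exactly why ad-hoc searches over large topics are so slow.

```python
import json

def find_events(messages, field, value):
    """Linearly scan deserialized topic payloads for records matching a field.

    Each message must be deserialized and checked one by one; there is no
    index, so the cost grows with the size of the topic.
    """
    hits = []
    for offset, raw in enumerate(messages):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than abort the scan
        if record.get(field) == value:
            hits.append((offset, record))
    return hits

# Hypothetical sample payloads standing in for a topic's contents.
topic = [
    '{"order_id": "A1", "status": "paid"}',
    'not-json',
    '{"order_id": "A2", "status": "failed"}',
]
print(find_events(topic, "status", "failed"))
# → [(2, {'order_id': 'A2', 'status': 'failed'})]
```

Multiply this by millions of events, binary serialization formats, and dozens of topics, and the need for purpose-built data tooling becomes obvious.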
In an ideal world, you would enable any user to access, explore, and transform the data. In practice, most approaches come with serious downsides. Here are the four most common:
1. Training your teams on Kafka
Problem: No organization is trying to be the best at Kafka; they just want to gain value from the data.
2. Build in-house tools
Problem: If you have a huge team with plenty of time to take on a project like this, it could work. That just isn’t the reality for most companies.
3. Hire many Kafka experts
Problem: This just isn’t scalable. Kafka growth will likely outpace your hiring, and Kafka experts are some of the most expensive developers out there.
4. Dump your data somewhere else (e.g. Elasticsearch)
Problem: This might help developers debug and understand how their data sits in Kafka, but you’ve just lost your real-time functionality and made your entire system more complex and dependent on additional hardware.
And even when these users do determine that there is an issue with the data pipeline, it’s right back to the Kafka experts for help.
The takeaway? Master data, not Kafka.
Being the master of your own DataOps destiny
But there’s also the big question of your current and future data and technology roadmap. Some cloud providers may offer an all-in-one managed service and data experience for business users. When sourcing a Kafka partner and tools for seeing into and governing your data, the broader questions to ask yourself are:
- Does the Kafka partner’s roadmap fit my future plans for Kafka (e.g. a multi-cloud strategy), or am I headed towards vendor lock-in?
- Will they integrate with and let me observe data and flows beyond Kafka, e.g. Elasticsearch or apps running on Kubernetes?
Essentially, how dependent does your business want to be on one particular flavor of technology, forever?
DataOps: Open up your data, but not too much
When opening up your Kafka data to any developer, data scientist, or other user across your organization, you need to consider compliance and governance, and make sure that no one can break anything.
Prioritize a rich set of tools to enable governance and put up appropriate guardrails to give you the confidence to open up Kafka to users. This goes well beyond the infrastructure-level authentication that you might get from your managed services provider.
The time has come for a data-centric approach to security. As the value of data increases, organizations are processing ever larger amounts of sensitive data to deliver a great service to their customers. To do that safely:
- Adopt Kafka monitoring that goes beyond the infrastructure and helps you find data, understand its provenance and lineage, and mitigate or resolve incidents.
- Remove worries about breaching compliance by introducing granular access controls and obfuscation of certain data, which can be adjusted as regulatory requirements evolve.
By balancing protection and access in this way, you can create a healthy and safe multi-tenant environment for teams to build apps and flows.
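As a sketch of what obfuscation as a guardrail can look like, the snippet below masks sensitive fields in a record before it is exposed to a user. The field names and masking rule are illustrative assumptions, not from any particular schema or product; in practice such a policy would sit between the topic and the consumer and be driven by your governance configuration.

```python
def obfuscate(record, sensitive_fields, mask="****"):
    """Return a copy of a record with sensitive fields replaced by a mask.

    A policy layer like this lets analysts explore payloads without ever
    seeing raw PII; widening or narrowing `sensitive_fields` adjusts the
    policy as regulatory requirements change.
    """
    return {k: (mask if k in sensitive_fields else v) for k, v in record.items()}

# Hypothetical event with a field we never want end users to see.
event = {"user": "jane", "card_number": "4111111111111111", "amount": 42.0}
print(obfuscate(event, {"card_number"}))
# → {'user': 'jane', 'card_number': '****', 'amount': 42.0}
```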
Where to begin?
Try out Lenses for DataOps here.