The human cost of Kafka monitoring

It’s a Friday evening work cocktail night. Terry, a developer from the eCommerce team (who wasn’t invited, for reasons we shall not divulge), messages you with a routine request. All his messages are “urgent”, but this one doesn’t even have a “hi”: Can you check if this specific message has been published into Kafka?

You would think (incorrectly) that Terry could do this himself. Yet, since the eCommerce authentication for the website was re-architected around Kafka, you alone are the data observability solution, supporting more than 50 developers.

Priya, the SRE, has just received a PagerDuty alert, escalated from the support desk and the Head of eCommerce. She’s standing next to you.

It’s going to be a long night, and not on the cocktails.

An extra hour or two of troubleshooting isn’t the end of the world when Kafka is a pilot project engineered by a couple of Kafka connoisseurs. But when you have a new, critical application, like Terry’s authentication microservice, the data you need to monitor your platform, and who has access to it, have to change.
