What can the pandemic teach us about building data products?

Today’s Kafka edge case could be tomorrow’s revenue driver. COVID-19 says you shouldn’t wait to find out.

Dave Harper

Sep 28, 2020

Most people I’m talking to in the data world have to achieve more with less - and faster.

Normally we don’t really feel the weight of this day-to-day. But COVID-19 has forced many of us to adapt our approach to new problems and even rethink existing business models. Often, those changes leave us better off.

What better example than what’s currently happening in healthcare and life sciences? For these industries, it is essential that scientific research and results reach researchers, government institutions and healthcare companies on the spot. This information is delivered through well-engineered event streaming. It becomes clear very quickly that building data products like this isn't something you can sleep on; faster time-to-market saves lives and helps solve the biggest global crisis of our time.

Let me explain with an anecdote.

An organization we’ve worked with in recent months is developing such data products to help identify a COVID-19 vaccine.

Apache Kafka is the natural choice of data technology for such a task, owing to its proven power in distributing real-time data. But Kafka in itself is incredibly complex to manage, build on, govern and scale. So where is the precious time of the architects and engineers of these event-driven systems best spent?

Is it better spent on building the platform features or focusing on achieving the right data outcomes instead?

I’d pick the outcomes every time. Our customers would too; especially those whose work is mission-critical to the pandemic.

More and more we hear from engineering teams “we don’t want to just be ‘the Kafka experts’.”

Our customer working to discover a vaccine had their developers working around the clock to manage intense release processes. They didn’t have the time to research and build their own tools, and quickly learned they couldn’t afford to waste time on manually debugging new data flows.

Then, as soon as the data is in production, the Operations team has to manage Kafka. Ops don’t know Kafka, nor do they have the time to learn it; in fact, they have enough systems to worry about already.

A ready-to-go environment that enabled the Ops team to monitor, operate, govern and deploy on Kafka was essential.

Of course, we’re dealing with patient test data here, so the question of data security came up.

Again, this team doesn't have time to build granular access controls, audit trails or data masking.

Instead they opted for ready-made tools for scaling real-time data and apps (we've put together a checklist for you here to help scope what you'll need for a successful streaming data platform: https://lenses.io/resources/build-a-kafka-data-platform/).

An edge case today, a revenue driver tomorrow

Most engineers and data practitioners I talk to at the moment don’t have such obvious time pressures, which makes them less conscious of how valuable their time is. Yet resources in these organizations are reduced all the same, and there is a near constant risk of disruption from new entrants or global giants - regardless of the industry you work in.

DataOps - the bringing together of people, data and apps to improve business outcomes - can help accelerate your real-time data product delivery.

3 amigos DataOps principles for Apache Kafka

For medical and health projects like those mentioned above, this means providing self-service from the ground up directly to the data practitioners who can test, learn and act on the insights that are relevant to them - and do so in real-time.

Before using Lenses to drive DataOps, the team was experiencing difficulties in harnessing Kafka’s power: most connectors were failing and mapping to Elastic was tough, yet data availability downstream was critical.

By providing access points and a control panel for Kafka, the organization was able to replay raw events framed in their respective contexts to generate value. This meant describing not just the events themselves, but also the events that led to them, through a Topology view and data catalog.

When it comes to governance, the team can now publish different events to different data consumers, from software engineers to a government task force, without waiting months to build features in-house. They’re able to offer a secure environment complete with full auditing capabilities so they can confidently open up Kafka, stay compliant and allow the right experts to weigh in on what’s happening in front of them.

You can see how another healthcare company, Babylon Health, are building data products to make healthcare globally available with Apache Kafka.

They have 200 microservices relying on Kafka to manage healthcare data, and they use Lenses to help them find the balance between innovation and governance.

Lenses has made it easy for 90 engineers to manage their own data securely, saving time when debugging data ingestion and integrations.

See how they did it in this presentation:

Big Data LDN: Freeing up engineering and infrastructure resources to scale with DataOps