Andrew Stevenson
The state of Kafka replication
Reasons and challenges for Kafka replication between clusters, including data sharing, disaster recovery, and workload migration.

Real-time data increasingly needs to be distributed across multiple domains, clouds, and locations. Disaster recovery is one reason for sharing real-time data across clusters. Alongside it, companies need to be able to pivot between providers almost instantly to spread risk and maintain performance (just look at the January DeepSeek-R1 LLM news).
This creates a challenging paradox: while teams need autonomy to choose data streaming technologies and mirror data themselves, they also need secure, compliant Kafka replication that’s not tied to a single vendor.
While there are now numerous ways to replicate data at rest, the replication of streaming data just hasn't evolved at the same pace.
Here are a few good reasons for replicating real-time data across streaming technologies.
Kafka topics need to be synchronized and available for consumption across multiple applications and domains. This portability allows for data movement and collaboration across the organization, and beyond.
Data movement has accelerated the adoption of data marketplaces and data mesh architectures, allowing organizations to treat real-time data as a product (as Netflix and Intuit do). Data sharing – within and outside of a business – can unlock new revenue streams.
Cross-Kafka replication helps here in several ways:
Minimizes performance impacts from multiple external consumers
Reduces latency for distant partners
Isolates sensitive data from accidental exposure
Enables purpose-built clusters optimized specifically for data sharing.
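To make this concrete, here's a minimal sketch of feeding such a purpose-built sharing cluster with Apache Kafka's bundled MirrorMaker 2 source connector, registered through the Kafka Connect REST API. The Connect address, cluster aliases, bootstrap servers, topic list, and connector name are all placeholders, and the exact settings will vary with your Kafka and Connect versions.

```python
import requests  # assumes a Kafka Connect worker reachable at connect:8083

# Mirror only the topics that partners should see into a dedicated sharing cluster.
connector = {
    "name": "share-orders",  # hypothetical connector name
    "config": {
        "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
        "source.cluster.alias": "internal",
        "target.cluster.alias": "sharing",
        "source.cluster.bootstrap.servers": "internal-kafka:9092",
        "target.cluster.bootstrap.servers": "sharing-kafka:9092",
        "topics": "orders,shipments",   # selective: internal topics stay internal
        "replication.factor": "3",      # for the mirrored topics on the target
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
```

External consumers then read from the sharing cluster, so their load and access controls never touch the brokers serving internal workloads.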
Vendor lock-in is no longer acceptable. Organizations need to move between vendors (like Confluent to Redpanda, and back again) or migrate to solutions like MSK Express Brokers to reduce costs.
This is because, over the past two years, streaming infrastructure has diversified and specialized to cater for specific workloads, use cases, and industries. Now, teams don’t have to make trade-offs between cost, performance, openness, and ease of use in their Kafka strategy. They can have a multi-Kafka vendor approach.
Teams also need to move applications efficiently from one environment to another – for example, on-prem to cloud, cloud to edge, or for workload isolation.
Companies are moving apps closer to data sources (factories, stores) at the edge while using multiple clouds. This strategy recognizes data gravity – the tendency for applications to be pulled toward where data resides – while enabling workload mobility across locations to boost performance, meet compliance, and cut latency.
Critical applications need continuous data replication with consumer offsets. When disasters hit, teams can quickly switch to backup systems without service interruptions. Regulations often require fast recovery, and downtime hurts both profits and reputation in today's connected business world.
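As an illustration of why offsets matter here, the sketch below copies a consumer group's committed positions from the primary to a standby cluster using the confluent-kafka Python client. It assumes the standby preserves source offsets one-to-one and that the group is idle while the copy runs; in reality offsets rarely line up across clusters, which is exactly why replication tools translate them (MirrorMaker 2 does this with checkpoints). The addresses, group ID, and topic are placeholders.

```python
from confluent_kafka import Consumer, TopicPartition

GROUP, TOPIC = "orders-service", "orders"   # placeholder group and topic

def committed_offsets(bootstrap):
    """Fetch the group's committed offset for every partition of TOPIC."""
    c = Consumer({"bootstrap.servers": bootstrap, "group.id": GROUP})
    partitions = [TopicPartition(TOPIC, p)
                  for p in c.list_topics(TOPIC, timeout=10).topics[TOPIC].partitions]
    offsets = c.committed(partitions, timeout=10)
    c.close()
    # Keep only partitions that actually have a committed offset.
    return [tp for tp in offsets if tp.offset >= 0]

# Read the group's positions on the primary...
offsets = committed_offsets("primary-kafka:9092")

# ...and seed the same positions on the standby so consumers resume, not restart.
# This only succeeds while the group has no active members on the standby cluster.
dr = Consumer({"bootstrap.servers": "dr-kafka:9092", "group.id": GROUP})
dr.commit(offsets=offsets, asynchronous=False)
dr.close()
```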
When you need specific production data in test environments to reproduce issues, you can mirror selected topics from production to staging with necessary obfuscation. This approach supports complex AI and ML testing with high-quality, real data, while maintaining security and compliance.
A two-stage Kafka architecture uses a landing cluster to handle raw data before anonymizing it for the main cluster. This ensures sensitive data is masked before reaching downstream consumers or third-party systems. As data privacy regulations tighten and cross-organizational data sharing increases, robust masking and governance models become essential.
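Here's a minimal sketch of that anonymization hop using the confluent-kafka Python client: consume raw events from the landing cluster, mask the sensitive fields, and forward the result to the main cluster. Topic names, field names, and endpoints are placeholders, and a real pipeline would batch, handle errors, and commit far more carefully.

```python
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "landing-kafka:9092",   # raw, sensitive events land here
    "group.id": "anonymizer",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
producer = Producer({"bootstrap.servers": "main-kafka:9092"})

consumer.subscribe(["customers-raw"])

def mask(event):
    # Replace sensitive fields before the event leaves the landing cluster.
    event["email"] = "***"
    event["card_number"] = "***"
    return event

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    clean = mask(json.loads(msg.value()))
    producer.produce("customers", key=msg.key(),
                     value=json.dumps(clean).encode(), headers=msg.headers())
    producer.flush()        # simplistic: one flush per message for clarity
    consumer.commit(msg)    # commit only after the masked copy is delivered
```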
While Kafka replication might sound straightforward, the devil is in the details. You need to orchestrate multiple elements:
Data migration: Moving messages between clusters while preserving ordering, timestamps, and headers
Schema migration: Syncing schema registries to maintain data compatibility and evolution across clusters
Consumer offset migration: Transferring consumer group positions for failover and migration
Topic configuration migration: Replicating topic-level settings like partitions, retention policies, and cleanup policies
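The last item on that list is easy to overlook, so here's a sketch of copying one topic's partition count and non-default settings between clusters with the confluent-kafka AdminClient. Cluster addresses, the topic name, and the replication factor are placeholders.

```python
from confluent_kafka.admin import AdminClient, ConfigResource, NewTopic

source = AdminClient({"bootstrap.servers": "source-kafka:9092"})
target = AdminClient({"bootstrap.servers": "target-kafka:9092"})
topic = "orders"

# Partition count from the source cluster's metadata.
num_partitions = len(source.list_topics(topic, timeout=10).topics[topic].partitions)

# Non-default topic-level settings (retention, cleanup policy, ...).
(config_future,) = source.describe_configs(
    [ConfigResource(ConfigResource.Type.TOPIC, topic)]).values()
overrides = {name: entry.value
             for name, entry in config_future.result().items()
             if not entry.is_default}

# Recreate the topic on the target with the same shape and settings.
futures = target.create_topics([NewTopic(topic, num_partitions=num_partitions,
                                         replication_factor=3, config=overrides)])
futures[topic].result()   # raises if creation failed
```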
Enterprise-grade Kafka replication requires:
Cost-effective scaling: Optimizing resource usage while handling increasing data volumes and throughput
Robust data governance: Enforcing data access controls, audit trails, and compliance policies across clusters
Comprehensive Kafka monitoring: Real-time visibility into replication lag, throughput, and error rates
Data transformation capabilities: Supporting in-flight data modifications for masking, filtering, or enrichment
Simple, engineer-friendly Kafka configuration: Providing intuitive interfaces and declarative configuration options
Automation support (replication-as-code): Enabling infrastructure-as-code practices for replication management.
With all this complexity, we need a fundamental shift in how engineers interact with distributed data across streaming technologies. While current solutions allow basic data movement between Kafka clusters, they lack a unified, user-friendly operational experience across providers and deployments.
Engineers shouldn’t have to juggle multiple tools and interfaces to oversee data flows, handle schema evolution, or maintain governance policies. Teams need a platform-neutral way to move and manage data across infrastructure while maintaining security and compliance. This approach would let teams focus on extracting value from data rather than managing the complexities of cross-Kafka replication.
Organizations will continue adopting hybrid and multi-cloud strategies. This won’t change. But the ability to easily work between them will become a big differentiating factor – and this is where current Kafka replication solutions fall short.
The world of data streaming is rising to meet these challenges. That's why we're releasing Lenses K2K, which addresses these needs with a vendor-agnostic approach to Kafka replication.
Want to try it out? Register for the Lenses K2K preview.