Taming MirrorMaker2

By

Drew Oetzel

Oct 09, 2025

Let me tell you about the time I may have used some words I would not use in front of my grandmother when staring at Kafka logs. I was testing replication with MirrorMaker2 for a client project back during the dark days of Covid. The MM2 logs showed everything was running. The connectors were healthy. The networks were fine. And yet, nothing was happening. Turns out, I had a single character wrong in my regex pattern. One. Character. And yes, I know, REGEX has sucked since the 1950s, but something about the whole complex setup of MM2 made even finding the REGEX problem that much harder. 

If you've worked with Apache Kafka's MirrorMaker 2, you probably have your own version of this story. MM2 is genuinely powerful: it handles consumer offsets automatically, replicates topic configurations, and supports complex multi-datacenter topologies. It's a massive improvement over the original MirrorMaker. But that power comes with complexity, and the configuration format is, speaking as kindly as I can, not the most intuitive thing you'll ever encounter.

This post walks through the four most common configuration pitfalls I've seen when setting up MM2. These aren't obscure edge cases. These are the gotchas that will bite you on day one if you're not careful.

Gotcha #1: Topic Renaming

Here's the scenario: you're setting up disaster recovery, so that if your primary cluster goes down, consumers can seamlessly switch to your secondary cluster. You fire up MM2, check the secondary cluster, and find... primary-cluster.orders, primary-cluster.payments, primary-cluster.inventory. Your topics are there, but they've all been renamed with a cluster prefix.

This is MM2's default behavior, and it exists for good reasons. When you're doing bidirectional replication or aggregating data from multiple source clusters, those prefixes prevent naming collisions and stop the same data from being replicated back and forth endlessly. If you're pulling topics from three different clusters into a central one, you definitely want to know which cluster each topic came from.

But for disaster recovery or active-passive failover scenarios, this creates a nightmare. Your applications are hardcoded to consume from orders, not primary-cluster.orders. When you fail over, nothing works without reconfiguring every single consumer and producer.

Let's look at what a naive configuration looks like:

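Something along these lines, with made-up cluster aliases and broker addresses standing in for your own:

# Hypothetical aliases and addresses for illustration
clusters = primary-cluster, secondary-cluster
primary-cluster.bootstrap.servers = primary-broker:9092
secondary-cluster.bootstrap.servers = secondary-broker:9092

# Replicate everything from primary to secondary
primary-cluster->secondary-cluster.enabled = true
primary-cluster->secondary-cluster.topics = .*
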
This configuration will absolutely replicate your topics. It just won't replicate them the way you want. Every topic from primary-cluster will show up on secondary-cluster with that prefix attached.

The fix is the replication.policy.class property. MM2 uses a replication policy to determine how to name topics on the destination cluster. The default is DefaultReplicationPolicy, which adds the source cluster name as a prefix. To keep topic names identical, you need IdentityReplicationPolicy:

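Roughly like this, scoped to the flow (it can also be set globally as replication.policy.class):

# Keep topic names identical on the destination cluster
primary-cluster->secondary-cluster.replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy
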
With IdentityReplicationPolicy, topics keep their original names. orders stays orders. payments stays payments. Your applications can fail over without any reconfiguration.

One important note: the replication policy also affects how MM2 handles internal topics for offset translation and checkpoint creation. When you use IdentityReplicationPolicy, you lose some of MM2's automated offset translation capabilities because those internal topics rely on the naming convention created by DefaultReplicationPolicy. For many disaster recovery setups, this tradeoff is worth it, but be aware of what you're giving up. Offset replication versus offset translation is such a big topic that I've moved it into its own spin-off blog here. Check it out for a deep dive into the difference between the two and a rundown of multiple methods for handling it. (link to spin-off blog)

If you need something in between—maybe you want to keep names the same for most topics but add prefixes for some—you can implement a custom replication policy. It's more work, but MM2 gives you that flexibility.

Gotcha #2: The Offset Dance

You've got your topics replicating cleanly. Names are preserved. Data is flowing. You're feeling good. Then your primary cluster actually fails, you switch your consumers to the secondary, and they start processing messages from three days ago. Or worse, they skip ahead and miss thousands of messages entirely.

Consumer offsets are the invisible infrastructure that makes Kafka work. They track where each consumer group is in each topic partition. Without them, consumers don't know where to pick up. MM2 can replicate these offsets, but only if you tell it to.

Here's a configuration that looks reasonable but is missing the critical piece:

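Something like this, building on the earlier sketch with the same placeholder aliases:

clusters = primary-cluster, secondary-cluster
primary-cluster.bootstrap.servers = primary-broker:9092
secondary-cluster.bootstrap.servers = secondary-broker:9092

primary-cluster->secondary-cluster.enabled = true
primary-cluster->secondary-cluster.topics = .*
# Topics and data replicate, but consumer group offsets do not
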
This will replicate your topics and data just fine. But when a consumer group fails over to the secondary cluster, it won't find its offsets there. Depending on your consumer configuration, it will either start from the beginning (reprocessing everything) or start from the latest offset (skipping everything that was in flight).

The fix requires two properties:

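Scoped to our example flow, that looks roughly like this:

# Replicate consumer group offsets to the secondary cluster
primary-cluster->secondary-cluster.sync.group.offsets.enabled = true
# How often, in seconds, to sync them
primary-cluster->secondary-cluster.sync.group.offsets.interval.seconds = 30
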
The sync.group.offsets.enabled property does exactly what it sounds like—it tells MM2 to replicate consumer group offsets. Without this set to true, offset replication doesn't happen at all.

The sync.group.offsets.interval.seconds property controls how frequently those offsets are synced. This creates a tradeoff. Sync too infrequently, and you risk losing more data during a failover because the offsets on your secondary cluster will be outdated. Sync too frequently, and you create additional load on your clusters and network.

Thirty seconds is a reasonable default for many workloads. If you're dealing with high-throughput topics where losing even a few seconds of progress is unacceptable, you might go lower—maybe 10 or 15 seconds. If you're replicating lower-priority data where some reprocessing during failover is acceptable, you could go higher to reduce overhead.

One subtle point: MM2 doesn't just copy offsets directly. It translates them. See my Offset Replication vs. Translation blog for details.

Also worth noting: sync.group.offsets.interval.seconds only controls how often the synced offsets are written. The translation itself depends on MM2's checkpoint emission, governed by the companion properties emit.checkpoints.enabled and emit.checkpoints.interval.seconds. In practice, the defaults are usually sufficient, but the knobs exist if you need them.

Gotcha #3: Filtering Follies

MM2 uses regular expressions to determine which topics and consumer groups to replicate. This is powerful—you can set up complex filtering rules with just a few lines of configuration. It's also dangerous, because regex is, as I pointed out in my opening anecdote, notoriously easy to get slightly wrong.

Consider this configuration:

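Something like this, with a made-up topic naming scheme:

# Looks reasonable, but the trailing -* doesn't mean what you might think
primary-cluster->secondary-cluster.topics = user-events-*
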
The intention here is clear: replicate all topics that start with user-events-. But this won't work. In regex, the asterisk means "zero or more of the preceding character." So this pattern matches user-events, user-events-, user-events--, and so on. It doesn't match user-events-production or user-events-staging.

The correct wildcard in regex is .*, which means "zero or more of any character":

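Applied to the same hypothetical topics:

# .* matches zero or more of any character
primary-cluster->secondary-cluster.topics = user-events-.*
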
Now you'll actually replicate user-events-production, user-events-staging, and any other topic that starts with user-events-.

If you want to replicate everything, use .* by itself:

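Continuing the same sketch:

primary-cluster->secondary-cluster.topics = .*
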
If you want to replicate topics matching several different patterns, you can use alternation:

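For example, with the prefixes described just below:

primary-cluster->secondary-cluster.topics = orders-.*|payments-.*|inventory-.*
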
This replicates any topic starting with orders-, payments-, or inventory-.

The same logic applies to consumer groups:

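For instance, with made-up consumer group names:

primary-cluster->secondary-cluster.groups = orders-service.*|payments-service.*
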
Another common mistake: forgetting that certain characters have special meaning in regex. If you have topics with dots in their names (like com.example.orders), you need to escape those dots if you want them to be treated literally:

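Something like this (note that Java properties files also treat the backslash as an escape character, so depending on how your config is loaded you may need to double it):

# Escaped dots match literal dots
primary-cluster->secondary-cluster.topics = com\.example\.orders
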
Without the backslashes, the dots match any character, which might work accidentally but isn't technically correct.

My advice: before you deploy your MM2 configuration to production, test your regex patterns. You can use an online regex tester, and most LLMs are genuinely helpful with regex these days. But you will still need to test.

One more thing about filtering: MM2 also has exclude properties (topics.exclude and groups.exclude, known as topics.blacklist and groups.blacklist on older Kafka releases) that exclude topics or groups even if they match your include patterns. These use regex too, so the same rules apply. Using regex for both include and exclude patterns is a very dangerous matching game! A topic that matches both the include and exclude patterns will be excluded.
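
For example, something along these lines (the staging and test suffixes are made up, and as noted above the property name depends on your Kafka version):

primary-cluster->secondary-cluster.topics = .*
primary-cluster->secondary-cluster.topics.exclude = .*-staging|.*-test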

Gotcha #4: The Remote Cluster Shuffle

MM2's configuration syntax for defining replication flows is not particularly intuitive. You have to explicitly state which clusters exist and then explicitly define which direction data should flow. This isn't obvious from the documentation, and it's easy to end up with a configuration that looks right but doesn't do what you expect.

Here's a minimal configuration that seems like it should work:

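Something like this, again with placeholder aliases and addresses:

clusters = primary-cluster, secondary-cluster
primary-cluster.bootstrap.servers = primary-broker:9092
secondary-cluster.bootstrap.servers = secondary-broker:9092
# And that's it: no flows are defined
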
You've defined two clusters. You've specified their bootstrap servers. But nothing is replicating. Why? Because you haven't told MM2 to actually replicate anything.

MM2 uses the arrow syntax to define replication flows. The format is source->target.enabled. You need to explicitly enable each direction you want:

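For our two placeholder clusters:

primary-cluster->secondary-cluster.enabled = true
primary-cluster->secondary-cluster.topics = .*
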
Now MM2 knows to replicate from primary-cluster to secondary-cluster.

If you want bidirectional replication, you need to enable both directions:

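Again with the placeholder aliases:

primary-cluster->secondary-cluster.enabled = true
secondary-cluster->primary-cluster.enabled = true
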
This is powerful because it means you can set up complex topologies. Maybe you have three clusters in different regions, and you want data to flow from each edge cluster to a central cluster, but not between the edge clusters:

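A sketch with invented cluster aliases:

clusters = edge-east, edge-west, central

edge-east->central.enabled = true
edge-west->central.enabled = true

edge-east->edge-west.enabled = false
edge-west->edge-east.enabled = false
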
Technically, you don't need to explicitly set flows to false—disabled is the default. But I like to include them anyway for clarity. When someone (including future me) looks at this configuration six months from now, there's no ambiguity about what flows are active.

Each replication flow can have its own configuration overrides too. Maybe you want different topics or consumer groups replicated in each direction:

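For example, with two clusters simply aliased A and B and a made-up topic prefix:

clusters = A, B

A->B.enabled = true
A->B.topics = high-priority-.*

B->A.enabled = true
B->A.topics = .*
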
This replicates only high-priority topics from A to B, but replicates everything from B to A.

One more thing to watch for: MM2 configurations can be deployed in two ways. You can run MM2 as a dedicated Connect cluster with a configuration file, or you can deploy MM2 connectors to an existing Connect cluster using the REST API. The syntax is slightly different depending on which approach you use. The arrow notation (source->target.enabled) works in the standalone configuration file format.

If you're deploying via the REST API, the property names change. The arrow shortcuts don't exist in that format; instead, you configure each MM2 connector (MirrorSourceConnector, MirrorCheckpointConnector, and MirrorHeartbeatConnector) individually, with explicit source and target cluster properties.

Take this setup, for example, using standalone MM2:

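A sketch of a standalone connect-mirror-maker.properties file, with the same placeholder names as before:

clusters = primary-cluster, secondary-cluster
primary-cluster.bootstrap.servers = primary-broker:9092
secondary-cluster.bootstrap.servers = secondary-broker:9092

primary-cluster->secondary-cluster.enabled = true
primary-cluster->secondary-cluster.topics = orders-.*
primary-cluster->secondary-cluster.sync.group.offsets.enabled = true
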
Here's how the same setup looks on a shared Kafka Connect cluster, configured via the REST API:

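Roughly like this for the source connector alone (the connector name here is made up; offset syncing lives in a separately deployed MirrorCheckpointConnector, and heartbeats in a MirrorHeartbeatConnector, each with its own config):

{
  "name": "mm2-source-primary-to-secondary",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "primary-cluster",
    "target.cluster.alias": "secondary-cluster",
    "source.cluster.bootstrap.servers": "primary-broker:9092",
    "target.cluster.bootstrap.servers": "secondary-broker:9092",
    "topics": "orders-.*"
  }
}
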
Exact same configuration - very different formats!

Replicating with Confidence

MirrorMaker 2 is not a tool you can just point at your clusters and expect to work perfectly. It requires careful configuration and a solid understanding of what each property does. But once you get past the initial learning curve, it's remarkably capable.

The four gotchas covered here—topic renaming, offset synchronization, regex filtering, and replication flow definition—account for the vast majority of MM2 issues I've seen in the wild. Get these right, and you're most of the way there. Please see the spin-off blog to this post for an even deeper dive into offsets and replication. 

My practical advice: start simple. Set up MM2 in a development environment with a single unidirectional replication flow first. Get that working. Verify that topics replicate with the names you expect. Check that consumer offsets are syncing. Test your regex patterns against real topic names. Only after you've confirmed the basics should you move on to more complex setups like bidirectional replication or multi-cluster topologies.

Document your configuration choices. Six months from now, when someone asks why you used IdentityReplicationPolicy or why offset sync is set to 30 seconds, you'll be glad you left comments explaining your reasoning.

Test your failover procedures before you need them. Replicate data to your secondary cluster, then actually try failing over a consumer group. Does it pick up where it left off? Are the offsets translated correctly? Better to find out during a drill than during an actual outage.

And finally, keep an eye on MM2's internal topics and metrics. MM2 creates several internal topics (with names like mm2-offset-syncs and heartbeats) to coordinate its work. Monitor these topics. If they stop getting updates, something is wrong. MM2 also exposes metrics through JMX that can help you understand replication lag and throughput. Ingest these metrics, and the MM2 logs, somewhere you can search and monitor them. You will need all the help you can get when things go a bit sideways.

MirrorMaker 2 isn't perfect, and it definitely has sharp edges. But with the right configuration and some attention to detail, it's a solid tool for keeping your Kafka clusters in sync. Just watch out for those gotchas.