@BewareMyPower
Owner

Motivation

Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail.

Here is a log observed in a production environment:

WARN org.apache.pulsar.broker.service.BrokerService - Replication or
dedup check failed. Removing topic from topics list
persistent://public/__kafka/__consumer_offsets-partition-40,
java.util.concurrent.CompletionException: java.lang.RuntimeException:
org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException:
kop

When this happens, neither unloading the topic nor restarting the broker helps. The cursor has to be removed manually.

Modifications

When initializing a PersistentTopic, if a replicator cursor exists but its corresponding cluster does not, ignore the exception from addReplicationCluster and then remove this "zombie" cursor.
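
The control flow described above can be illustrated with a minimal, self-contained sketch. This is not the code in this PR and does not use Pulsar classes: `ZombieCursorSketch`, `ClusterNotFoundException`, and the fields below are hypothetical stand-ins, and the `pulsar.repl.` cursor-name prefix is assumed for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a topic; it only mimics the shape of the fix.
final class ZombieCursorSketch {
    // Hypothetical stand-in for MetadataStoreException.NotFoundException.
    static final class ClusterNotFoundException extends RuntimeException {
        ClusterNotFoundException(String cluster) {
            super(cluster);
        }
    }

    // Assumed naming scheme: replicator cursors are "<prefix><cluster>".
    static final String REPL_PREFIX = "pulsar.repl.";

    final Set<String> knownClusters;
    final Set<String> cursors = ConcurrentHashMap.newKeySet();
    final Set<String> replicators = ConcurrentHashMap.newKeySet();

    ZombieCursorSketch(Set<String> knownClusters, Set<String> existingCursors) {
        this.knownClusters = knownClusters;
        this.cursors.addAll(existingCursors);
    }

    // Fails when the remote cluster no longer exists in the metadata.
    CompletableFuture<Void> addReplicationCluster(String cluster) {
        if (!knownClusters.contains(cluster)) {
            return CompletableFuture.failedFuture(new ClusterNotFoundException(cluster));
        }
        replicators.add(cluster);
        return CompletableFuture.completedFuture(null);
    }

    // Tolerates "cluster not found" for replicator cursors and deletes them.
    CompletableFuture<Void> initialize() {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String cursor : List.copyOf(cursors)) {
            if (!cursor.startsWith(REPL_PREFIX)) {
                continue; // an ordinary subscription cursor
            }
            String cluster = cursor.substring(REPL_PREFIX.length());
            futures.add(addReplicationCluster(cluster).exceptionally(e -> {
                Throwable cause = (e instanceof CompletionException && e.getCause() != null)
                        ? e.getCause() : e;
                if (cause instanceof ClusterNotFoundException) {
                    cursors.remove(cursor); // drop the zombie cursor
                    return null;            // and let initialization continue
                }
                throw new CompletionException(cause); // other failures still fail
            }));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

In this sketch only the not-found case is swallowed. Any other failure from `addReplicationCluster` still fails `initialize`, so unrelated errors are not hidden.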

Verifications

PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor is added to verify PersistentTopic#initialize will succeed and the zombie replicator cursor will be removed.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/replicator-zombie-cursors branch 2 times, most recently from c57d1ac to 1469b85 Compare March 30, 2023 17:11
@BewareMyPower BewareMyPower changed the title [fix][broker] Skip creating the replicator when the remote cluster is absent [fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent Mar 30, 2023
… cluster is absent

### Motivation

Sometimes when a remote cluster is deleted, the replication cursor might
still exist for some topics. In this case, creating producers or
consumers on these topics will fail.

Here is a log observed in a production environment:

> WARN  org.apache.pulsar.broker.service.BrokerService - Replication or
> dedup check failed. Removing topic from topics list
> persistent://public/__kafka/__consumer_offsets-partition-40,
> java.util.concurrent.CompletionException: java.lang.RuntimeException:
> org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException:
> kop

When this happens, neither unloading the topic nor restarting the broker
helps. The cursor has to be removed manually.

### Modifications

In `addReplicationCluster`, before getting the replication client, check
the namespace policy and topic policy first. If the remote cluster does
not exist, skip adding the replication client and remove the cursor.

### Verifications

`PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor` is added
to verify `PersistentTopic#initialize` will succeed and the zombie
replicator cursor will be removed.
@BewareMyPower BewareMyPower force-pushed the bewaremypower/replicator-zombie-cursors branch from 1469b85 to 8aeb37d Compare March 31, 2023 09:30
@github-actions

github-actions bot commented May 3, 2023

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 3, 2023