Q&A: How to Troubleshoot Duplicate Data in Elasticsearch #24226
-
Question

We deployed multiple Vector instances in the aggregator pattern on Kubernetes, using a single consumer group to consume from multiple Kafka topics before writing the data to Elasticsearch. However, we occasionally observe duplicate entries in Elasticsearch (sometimes two copies, sometimes three), all sharing identical Kafka offsets. The logs only show Elasticsearch request timeouts. I'm unsure whether this stems from duplicate Kafka consumption or from Elasticsearch request retries.

Vector Config

```yaml
image:
replicas: 3
logLevel: "info"
customConfig:
```

Vector Logs

```
2025-11-13T06:17:49.557448Z WARN sink{component_kind="sink" component_id=es component_type=elasticsearch}:request{request_id=80870}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing …
```
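For context, the general shape of such an aggregator pipeline inside `customConfig` is sketched below. This is not the original configuration (which is truncated above): the broker address, topic names, consumer group id, and Elasticsearch endpoint are all placeholders; only the sink id `es` is taken from the log line.

```yaml
sources:
  kafka_in:
    type: kafka
    bootstrap_servers: "kafka-broker:9092"  # placeholder broker address
    group_id: "vector-aggregator"           # the single shared consumer group
    topics:
      - "topic-a"                           # placeholder topic names
      - "topic-b"

sinks:
  es:
    type: elasticsearch
    inputs:
      - kafka_in
    endpoints:
      - "http://elasticsearch:9200"         # placeholder endpoint
```

In this topology, Kafka assigns each partition to exactly one member of the consumer group at a time, so duplicates with identical offsets alongside timeout warnings point more toward the sink retrying a bulk request that actually succeeded, though a consumer-group rebalance that replays uncommitted offsets could produce the same symptom.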
-
Hey, I guess you can use the dedupe transform for this: identify the field that holds the unique value and, based on that field, eliminate the duplicate logs.
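A minimal sketch of that suggestion, assuming the unique value lives in a field named `event_id` and events arrive from a Kafka source with the id `kafka_in` (both names are hypothetical):

```yaml
transforms:
  dedupe_logs:
    type: dedupe
    inputs:
      - kafka_in            # hypothetical upstream component id
    cache:
      num_events: 5000      # per-instance in-memory LRU cache; 5000 is Vector's default
    fields:
      match:
        - "event_id"        # hypothetical field holding the unique value

sinks:
  es:
    type: elasticsearch
    inputs:
      - dedupe_logs         # route the sink through the transform
    endpoints:
      - "http://elasticsearch:9200"   # placeholder endpoint
```

One caveat: the dedupe transform keeps its cache in memory, per Vector instance, so with `replicas: 3` a duplicate that lands on a different replica than the original will pass through. If the duplicates come from sink retries, setting `id_key` on the elasticsearch sink to that same unique field is a complementary option, since retried writes then target the same document `_id` instead of creating new documents.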