Commit d307708
Optimize Kafka sink with parallel record sending
Changed from sequential to parallel record sending using parSequence.
This allows the Kafka producer to batch records more efficiently while
maintaining the same reliability guarantees.
Previous approach (traverse_):
- Sent records one at a time sequentially
- Each send blocked until completion
- Kafka producer couldn't batch efficiently
- Performance bottleneck with high-latency Kafka systems (e.g., WarpStream)
New approach (parSequence):
- Fires all sends immediately without blocking between them
- Waits for all to complete before checkpointing
- Kafka producer can apply internal batching logic
- Same at-least-once delivery semantics
- Significant performance improvement with WarpStream (2-5x)
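The dispatch change described above can be sketched as follows. The real sink is Scala code built on cats-effect's `traverse_` and `parSequence`; this self-contained Java `CompletableFuture` analogue (class and method names are hypothetical, not from the commit) shows the same difference: chaining each send after the previous one completes versus firing all sends up front and joining on the combined future.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical sketch of the two dispatch strategies. The actual sink uses
// cats-effect traverse_/parSequence in Scala, not CompletableFuture.
class SendStrategies {

    // Old approach (like traverse_): each send must complete
    // before the next one is started.
    static CompletableFuture<Void> sendSequential(
            List<String> records,
            Function<String, CompletableFuture<Void>> send) {
        CompletableFuture<Void> acc = CompletableFuture.completedFuture(null);
        for (String r : records) {
            acc = acc.thenCompose(ignored -> send.apply(r));
        }
        return acc;
    }

    // New approach (like parSequence): fire every send immediately,
    // then wait for all of them to complete. The producer sees all
    // records at once and can batch them internally.
    static CompletableFuture<Void> sendParallel(
            List<String> records,
            Function<String, CompletableFuture<Void>> send) {
        CompletableFuture<?>[] inFlight =
            records.stream().map(send).toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(inFlight);
    }
}
```

In both variants the caller only proceeds once every send has finished; only the dispatch order differs, which is why the latency win appears without changing delivery semantics.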
All reliability guarantees preserved:
- All sends must succeed before checkpoint
- Failures propagate and stop the stream
- No data loss risk
- Kafka producer retry logic still applies

1 parent: 21f494e
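The checkpoint guarantee can be illustrated with a hypothetical `CompletableFuture` analogue (names invented for illustration; the real code relies on cats-effect error propagation): the checkpoint action is sequenced after the combined future of all sends, so a single failed send fails the whole batch and the checkpoint never runs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: checkpoint runs only if every in-flight send
// succeeds; one failure fails the combined future and skips it.
class CheckpointGuard {
    static CompletableFuture<Void> sendAllThenCheckpoint(
            CompletableFuture<?>[] sends, Runnable checkpoint) {
        return CompletableFuture.allOf(sends).thenRun(checkpoint);
    }
}
```

Because the failure surfaces before the checkpoint, the unacknowledged records are redelivered on restart, preserving at-least-once semantics.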
1 file changed (+4, −5): modules/kafka/src/main/scala/com.snowplowanalytics.snowplow.enrich.kafka
(Diff body not captured in this export: 5 deletions at old lines 45-49 and 4 additions at new lines 45-47 and 49, within a hunk spanning lines 42-53.)