[FLINK-38199] Add SinkUpsertMaterializerBenchmark#106
[FLINK-38199] Add SinkUpsertMaterializerBenchmark#106rkhachatryan wants to merge 5 commits intoapache:masterfrom
Conversation
src/main/java/org/apache/flink/benchmark/SinkUpsertMaterializerBenchmark.java
Outdated
Show resolved
Hide resolved
| @Param("100") | ||
| public int payloadSize; |
There was a problem hiding this comment.
For micro benchmark purposes, like optimising code, guarding against regressions, the lower the payload, the better, as you get more contrasting effects of any performance related changes - optimisations/changes to the related code will be more clearly visible instead of being watered down by bytes copy.
So I would shrink it down, maybe to 10 or 20.
There was a problem hiding this comment.
I think both cases can be problematic; I'll change to a smaller value and think about keeping 100 (or increasing it).
There was a problem hiding this comment.
Changed to
@Param({"10", "250"})
There was a problem hiding this comment.
Hmm.. it now takes 2+ hours for the whole benchmark to run locally.
I'm going to change to a single value of 20.
There was a problem hiding this comment.
Can we reduce the number of records per invocation? Or pick different number of records depending on the benchmark case? Slower benchmarks probably don't need as much records per invocation as faster cases.
There was a problem hiding this comment.
It takes long due to the number of combinations + benchmark setup, not due to the number of records:
- Warmup: 10 iterations, 10 s each
- Measurement: 10 iterations, 10 s each
- forks: 3
So it's 3 * 2 * 10 * 10 = 10m for one combination.
And here are benchmark parameters:
- upsert key: true / false
- state backend: rocksdb / heap
- retract delay: 1 / 1000
- payload size, retract percentage - fixed, so don't increase
meaning there 2^3=8 combinations in total.
So the total time is 8*10=80 minutes.
I don't see what combinations can we exclude, all seem important.
OTH, I think benchmark setup is probably redundant for this particular benchmark. I haven't notice significant difference when running with lower duration/forkCount/measurementCount. So I'd rather reduce those
For example to
- Warmup: 1 iterations, 5 s each
- Measurement: 5 iterations, 5 s each
- forks: 2
So it would be 2 * (5 + 5*5) * 8 = 8 minutes.
WDYT?
There was a problem hiding this comment.
I wouldn't touch the fork counts.
The recommended target for a single invocation is < 1s. Within a single iteration JMH works by invoking the benchmark method repeatedly for ~1s. If single invocation takes more than 1s, than iteration ends after the single invocation, which is usually not desired needed.
Given that here we are not spawning whole Flink job, on a mini cluster, I'm pretty sure we could reduce the invocation duration.
I guess maybe part of the problem is that your single invocation is also responsible for building up the state size, which arguably should/could be moved out to a setup method? You could:
- build up the desired state size/length of the values list in the setup method
- in invocation just run single process record call, or maybe single insertion and single retraction, after which the state size/list length remains unchanged?
That should be not only fine (as there would be no setup code in the benchmarked method), but that would be even by the book the intended way how to use JMH.
Each iteration would then take exactly 1s, so 1 minute per parameters combination, 8 minutes in total.
src/main/java/org/apache/flink/benchmark/SinkUpsertMaterializerBenchmark.java
Outdated
Show resolved
Hide resolved
pnowojski
left a comment
There was a problem hiding this comment.
mostly LGTM % opened comments
src/main/java/org/apache/flink/benchmark/SinkUpsertMaterializerBenchmark.java
Outdated
Show resolved
Hide resolved
|
@flinkbot run azure |
Requires apache/flink#26876