Also compare batch measurements in nvbench_compare.py #263
base: main
    if has_batch_data:
        if (
            abs(frac_diff_batch) <= 0.01
        ):  # TODO(bgruber): what value to use here?
I have no idea, let's get some input internally on that.
Maybe just pick a sensible default and let the user override with command-line opts?
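A minimal sketch of what such an override could look like, assuming an `argparse`-based command line. The flag name `--batch-threshold` and the helper `parse_args` are hypothetical and are not existing nvbench_compare.py options; the default simply mirrors the 1% placeholder from the diff above.

```python
import argparse

# Placeholder default taken from the 0.01 in the diff under review.
DEFAULT_BATCH_THRESHOLD = 0.01


def parse_args(argv=None):
    # Hypothetical flag; not an existing nvbench_compare.py option.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--batch-threshold",
        type=float,
        default=DEFAULT_BATCH_THRESHOLD,
        help="max |fractional diff| of batch times still treated as unchanged",
    )
    return parser.parse_args(argv)
```

With this shape, users who never pass the flag get the default, and anyone who disagrees with it can tune the value per run.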
We should compare against the estimated min_noise, i.e. abs(frac_diff_batch) <= min_noise, if it is available, as we do for cold measurements, and use a yellow tint if min_noise is not available.
Perhaps when has_batch_data is True, min_noise should always be available.
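Roughly, that suggestion could be sketched as follows. The function name `classify_batch` and the status labels are hypothetical, and the sketch assumes `frac_diff_batch` is computed as (compared - reference) / reference, so a negative value means the compared run is faster.

```python
def classify_batch(frac_diff_batch, min_noise=None):
    """Return a status label for one batch-time comparison."""
    if min_noise is None:
        # No noise estimate: the row would get the yellow tint.
        return "UNKNOWN"
    if abs(frac_diff_batch) <= min_noise:
        # Difference is within the measured noise.
        return "SAME"
    # Assumed sign convention: negative means the compared run is faster.
    return "FAST" if frac_diff_batch < 0 else "SLOW"
```

This replaces the hard-coded 0.01 with a per-benchmark bound, which avoids having to pick a single global threshold.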
I like the idea of splitting them to a new line, I think it'd be cleaner. Or making them into separate tables? That way you could still quickly scan a column to check for outliers. That'd be harder if the timings were alternating cold/batch.
d140027 to f77d001
@oleksandr-pavlyk I would still like to consider this:
But I also don't have the bandwidth to work on it for now. We can leave the PR open until I need to compare batch measurements again and get annoyed that they don't show up :)
Fixes: #247
Cold and batch measurements can sometimes differ substantially, so we want to show both. An example is kernels using PDL (Programmatic Dependent Launch).
Here is a comparison of DeviceTransform with and without PDL (see also NVIDIA/cccl#5249):
The table becomes a bit unwieldy. We could consider dropping the Diff and B Diff columns to improve the situation. Alternatively, we could emit two rows per benchmark.
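For the two-rows alternative, a rough sketch of the row expansion is below. The function name and the dictionary keys (`name`, `cold_ref`, `cold_cmp`, `batch_ref`, `batch_cmp`) are hypothetical and do not reflect the actual data structures in nvbench_compare.py.

```python
def two_rows_per_benchmark(results):
    """Expand each result into a cold row and, if present, a batch row."""
    rows = []
    for r in results:
        rows.append((r["name"], "cold", r["cold_ref"], r["cold_cmp"]))
        # Only emit a batch row when both runs recorded batch timings.
        if r.get("batch_ref") is not None and r.get("batch_cmp") is not None:
            # Blank name so the batch row reads as a continuation.
            rows.append(("", "batch", r["batch_ref"], r["batch_cmp"]))
    return rows
```

This keeps the column count the same as today's cold-only table, at the cost of alternating cold and batch rows when scanning a column.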