add bailout for estimation and benchmarking in autotuning #8816
Add `bailout_in_estimation/benchmarking` as optional params to `prune_configs_by`.

A common pattern for me is to autotune over a large search space before dialing in on a subset of that search space for subsequent runs, usually along the lines of the sketch below.
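A rough illustration of that pattern; the kernel and block-size names are hypothetical and not taken from this PR:

```python
import triton
import triton.language as tl

# Sweep a large search space first; later runs would hard-code the surviving configs.
configs = [
    triton.Config(
        {"BLOCK_M": bm, "BLOCK_N": bn, "BLOCK_K": bk},
        num_stages=stages,
        num_warps=warps,
    )
    for bm in (32, 64, 128, 256)
    for bn in (32, 64, 128, 256)
    for bk in (32, 64, 128)
    for stages in (2, 3, 4, 5, 6)
    for warps in (2, 4, 8)
]  # 720 configs, roughly the size of the ~750-config space mentioned below


@triton.autotune(configs=configs, key=["M", "N", "K"])
@triton.jit
def matmul_kernel(
    a_ptr, b_ptr, c_ptr, M, N, K,
    stride_am, stride_ak, stride_bk, stride_bn, stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    pass  # body elided; same shape as the tutorial matmul
```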
This large search space usually has a couple of bad and very bad choices, sometimes 10x, 100x, or 1000x slower than the best choice. I think it would be useful to be able to tell the autotuner "if the first run of choice X is Y times slower than the fastest choice so far, then stop benchmarking it".
I think this is most valuable for folks who reserve hardware from cloud providers, as it reduces the time spent using that hardware to benchmark choices that are really bad, so the overall cost is lower.
I followed the format of the existing `prune_configs_by`, so the usage would be something like the sketch below. This way the user can define the criteria under which they would like to bail out, depending on the best timing seen so far across all configs as well as where in `do_bench` the bailout happens (I thought this made sense, since we may want to be more forgiving early on and less forgiving as we progress through the benchmarking).
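A hypothetical usage sketch; the exact parameter name and callback signature added by this PR may differ. The idea is that the callback sees the candidate's timing, the best timing observed so far across all configs, and how far into `do_bench` we are, and returns `True` to stop benchmarking that config early:

```python
import triton
import triton.language as tl

configs = [triton.Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=4)]  # or the full sweep above


def bailout(best_so_far, current, step, n_steps):
    # Be forgiving early in do_bench, stricter as benchmarking progresses.
    factor = 10.0 if step < n_steps // 4 else 3.0
    return best_so_far is not None and current > factor * best_so_far


@triton.autotune(
    configs=configs,
    key=["M", "N", "K"],
    prune_configs_by={
        # existing keys: "perf_model", "top_k", "early_config_prune"
        "bailout_in_benchmarking": bailout,  # proposed by this PR; name and signature assumed
    },
)
@triton.jit
def matmul_kernel(
    a_ptr, b_ptr, c_ptr, M, N, K,
    stride_am, stride_ak, stride_bk, stride_bn, stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    pass  # same kernel stub as above
```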
This gave me a nice overall reduction in autotuning cost: for the tutorial matmul on two shapes, ~50% less time was spent benchmarking a search space of ~750 configs. The resulting performance seemed similar enough given noise.
There is some additional time saved, since if we bail out we don't synchronize on the remaining events, so those remaining events can overlap with compiling the next config, but I didn't know a good way to measure this directly.
I also think it would be valuable to allow users to skip configs that would definitely fail with out-of-resources errors. For example, if the user knows that increasing block sizes strictly increases shared memory usage for their kernel, then once a certain config fails with out of resources, any config whose block sizes are all at least as large as the failing one can be skipped (see the sketch below). If this pull request is valuable, I can follow up with this option as a next change.
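A hypothetical illustration of that follow-up idea (not part of this PR); the block-size keys and helper names are mine, and where such a filter would hook into the autotuner is left open:

```python
import triton

BLOCK_KEYS = ("BLOCK_M", "BLOCK_N", "BLOCK_K")


def dominates(cfg: triton.Config, failed_sizes: dict) -> bool:
    # True if `cfg` uses at least as large a block in every dimension as a config that
    # already failed, and so (assuming monotonic resource usage) cannot use less memory.
    return all(cfg.kwargs[k] >= failed_sizes[k] for k in BLOCK_KEYS)


def drop_known_failures(configs, failed_block_sizes):
    # Filter out configs guaranteed to fail, given block sizes observed to hit
    # out-of-resources earlier in the autotuning run.
    return [c for c in configs if not any(dominates(c, f) for f in failed_block_sizes)]
```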
Thank you for your consideration!
New contributor declaration
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. (`mypy` fails with an unrelated error in `python/triton/runtime/build.py:34`.)
- Select one of the following.
  - I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- Select one of the following.
  - I have not added any `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)