[GuideLLM Refactor] Updates and Fixes for benchmark outputs, schemas, and stats calculations #442
base: main
Conversation
Pull Request Overview
This PR performs a significant refactoring to reorganize the codebase's statistics, schemas, and benchmarking components. The main changes consolidate utilities into schemas, remove deprecated presentation modules, and restructure the test suite to align with the new organization.
Key changes:
- Moved statistics utilities from utils/statistics.py to schemas/statistics.py and updated to use probability density functions (PDFs) instead of cumulative distribution functions (CDFs)
- Relocated pydantic utilities from utils/pydantic_utils.py to schemas/base.py
- Removed deprecated presentation modules (injector.py, data_models.py, builder.py)
- Complete rewrite of statistics tests to use parametrized fixtures and broader distribution coverage
- Added new console utilities for formatted table printing
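To make the PDF-vs-CDF distinction concrete, here is a minimal, hypothetical sketch of computing a percentile from a discrete PDF (value/probability pairs) by accumulating it into a CDF. The helper name and shapes are invented for illustration; this is not the guidellm implementation.

```python
import numpy as np

def percentile_from_pdf(values: np.ndarray, probs: np.ndarray, q: float) -> float:
    """Illustrative only: q-th percentile from a discrete PDF.

    Sorts the support, accumulates probabilities into a CDF, and finds
    the first value whose cumulative mass reaches q percent.
    """
    order = np.argsort(values)
    values, probs = values[order], probs[order]
    cdf = np.cumsum(probs) / probs.sum()  # normalize so the CDF ends at 1.0
    idx = int(np.searchsorted(cdf, q / 100.0))
    return float(values[min(idx, len(values) - 1)])

# Uniform mass over 1..5 as a toy distribution
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pr = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
percentile_from_pdf(vals, pr, 50)  # -> 3.0
```

Storing the PDF keeps the per-bucket mass available for weighted metrics, while any CDF-style query can still be recovered with one cumulative sum, as above.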
Reviewed Changes
Copilot reviewed 59 out of 61 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/unit/utils/test_statistics.py | Complete rewrite with parametrized fixtures testing multiple probability distributions |
| tests/unit/utils/test_pydantic_utils.py | Updated import path from utils to schemas |
| tests/unit/presentation/* | Removed deprecated presentation test files |
| tests/unit/mock_benchmark.py | Updated class name from BenchmarkSchedulerStats to BenchmarkSchedulerMetrics |
| src/guidellm/utils/statistics.py | Deleted - moved to schemas/statistics.py |
| src/guidellm/schemas/statistics.py | New file with refactored statistics using PDF-based approach |
| src/guidellm/schemas/base.py | New file consolidating pydantic utilities |
| src/guidellm/utils/console.py | Enhanced with table printing capabilities and improved documentation |
| src/guidellm/utils/functions.py | Added safe_format_number utility |
| Multiple schema files | Updated imports and restructured benchmark schemas |
Comments suppressed due to low confidence (3)
src/guidellm/data/preprocessors/formatters.py:44
- This class does not call RequestFormatter.__init__ during initialization. (GenerativeTextCompletionsRequestFormatter.__init__ may be missing a call to a base class __init__)
class GenerativeTextCompletionsRequestFormatter(RequestFormatter):
src/guidellm/data/preprocessors/formatters.py:118
- This class does not call RequestFormatter.__init__ during initialization. (GenerativeChatCompletionsRequestFormatter.__init__ may be missing a call to a base class __init__)
class GenerativeChatCompletionsRequestFormatter(RequestFormatter):
src/guidellm/data/preprocessors/formatters.py:307
- This class does not call RequestFormatter.__init__ during initialization. (GenerativeAudioTranscriptionRequestFormatter.__init__ may be missing a call to a base class __init__)
class GenerativeAudioTranscriptionRequestFormatter(RequestFormatter):
Ran this benchmark:

guidellm benchmark \
  --target http://vllm-standalone-granite-3-2b.llmd.svc.cluster.local \
  --data "prompt_tokens=4096,prompt_tokens_stdev=512,prompt_tokens_min=2048,prompt_tokens_max=6144,output_tokens=512,output_tokens_stdev=128,output_tokens_min=1,output_tokens_max=1024" \
  --max-seconds 10 \
  --profile concurrent \
  --rate 10

And got the following error: ...
File "/root/guidellm/src/guidellm/benchmark/benchmarker.py", line 161, in run
benchmark = benchmark_class.compile(
accumulator=accumulator,
scheduler_state=scheduler_state,
)
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/benchmark.py", line 134, in compile
metrics=GenerativeMetrics.compile(accumulator),
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/metrics.py", line 797, in compile
incomplete = accumulator.incomplete.get_within_range(start_time, end_time)
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/accumulator.py", line 623, in get_within_range
if (stats.request_end_time >= start_time)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/guidellm/src/guidellm/schemas/request_stats.py", line 81, in request_end_time
raise ValueError("resolve_end timings should be set but is None.")
ValueError: resolve_end timings should be set but is None.

Edit: Seems to be due to max-seconds constraint.
Aside from the max-seconds bug, here are some high-level notes:
- Needs signoff
- Needs cleanup. To retroactively run pre-commit on only the changes:
  pre-commit run --from $(git merge-base main@{u} HEAD) --to HEAD
- Only glanced over the accumulator and statistics code. Will do a more in-depth review time permitting, but don't block on it.
- Is warmup/cooldown only seconds now and not sometimes a percent? Setting --warmup .1 results in a table entry of Warm Sec: 0.1.
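To illustrate the warmup question, here is one hypothetical interpretation in which values below 1 are treated as a fraction of the run duration rather than absolute seconds. This is a sketch of one possible behavior, not what guidellm actually does; the function name and threshold are invented.

```python
# Hypothetical: interpret --warmup as a fraction of max_seconds when it
# is in (0, 1), and as absolute seconds otherwise. Under this rule,
# --warmup .1 on a 10 s run would mean 1.0 s of warmup, not 0.1 s.
def resolve_warmup(warmup: float, max_seconds: float) -> float:
    if 0 < warmup < 1:
        return warmup * max_seconds  # fractional interpretation
    return warmup  # absolute seconds

resolve_warmup(0.1, 10.0)  # -> 1.0
resolve_warmup(5.0, 10.0)  # -> 5.0
```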
bfc4b2f to 4a951c4
…or the latest state of refactor Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
…ons, metrics, and outputs Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Co-authored-by: Samuel Monson <[email protected]> Signed-off-by: Mark Kurtz <[email protected]> Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
73e7539 to 6c7f133
Signed-off-by: Mark Kurtz <[email protected]>
I can confirm that the max-seconds error is fixed. I'll do more testing tomorrow, but it works for me.
The new CLI table outputs are a little busy to look at, but work. Adding more horizontal padding may help, but isn't necessary.
self._add_field(
    headers,
    values,
    "Run Info",
    "Requests",
    json.dumps(benchmark.config.requests),
)
self._add_field(
    headers, values, "Run Info", "Backend", json.dumps(benchmark.config.backend)
)
self._add_field(
    headers,
    values,
    "Run Info",
    "Environment",
    json.dumps(benchmark.config.environment),
)
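For context on what lands in the CSV cells, here is a small sketch of the serialization the snippet performs. The config shape is invented for illustration; only the json.dumps pattern mirrors the reviewed code.

```python
import json

# A nested config object (shape is hypothetical) collapses into a single
# CSV cell as a compact JSON string: machine-recoverable via json.loads,
# but not pretty-printed for human readers.
backend = {"type": "openai_http", "target": "http://localhost:8000"}
cell = json.dumps(backend)
# cell == '{"type": "openai_http", "target": "http://localhost:8000"}'
```

The round-trip property (json.loads(cell) returns the original dict) is what keeps the CSV lossless even though the rendering is dense.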
The requests and environment values are not the prettiest, but they work and don't exclude any information.
The format doesn't preserve the formatting of the original input the way main does right now, but if that's acceptable then I'm okay with proceeding.
Yeah, I was of the opinion that for the CSV side, this is more for one-off, manual debugging in spreadsheets if someone needs it. If they want actual access to the data to query/manipulate/act on it, then they should be interacting with either the JSON file or loading it into a Python object. I could be moved the other way to try and format it better, but I think I would consider that a follow-up to this PR.
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
This doesn't have anything obvious that I am going to block on. We just need to fix that test dependency.
LGTM. Either land #440 first for the test fixes, or ignore.
Summary
Details
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##