[llm_bench] Add batch considering in throughput and 2nd token latency for LLMPipeline #3108

sbalandi · 2025-12-11T12:51:16Z

Description

It was found that llm_bench reports very low throughput for speculative decoding. This happens because llm_bench calculates the latency of every token after the first from the PerfMetrics raw_metrics.m_new_token_times, which does not take the batch into account. raw_metrics.m_durations is also derived from raw_metrics.m_new_token_times, but there each step's time is divided by the batch size. This difference significantly affects the results for speculative decoding, where a single step can produce several tokens.
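To make the effect concrete, here is a minimal Python sketch (not the llm_bench implementation) of the two ways of turning per-step timestamps into 2nd-token latency and throughput. It assumes a list of step timestamps in milliseconds, in the spirit of raw_metrics.m_new_token_times, plus a list of how many tokens each step produced. The function names and sample numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def naive_token_latencies(new_token_times_ms: Sequence[float]) -> List[float]:
    # Gap between consecutive steps, treating every step as a single token.
    # The first timestamp (first token) is only used as the starting point.
    return [t1 - t0 for t0, t1 in zip(new_token_times_ms, new_token_times_ms[1:])]


def batch_aware_token_latencies(
    new_token_times_ms: Sequence[float], tokens_per_step: Sequence[int]
) -> List[float]:
    # Same gaps, but each one is divided by the number of tokens that step produced.
    if len(new_token_times_ms) != len(tokens_per_step):
        raise ValueError("expected one token count per timestamp")
    return [
        (t1 - t0) / n
        for t0, t1, n in zip(new_token_times_ms, new_token_times_ms[1:], tokens_per_step[1:])
    ]


def throughput_tokens_per_s(
    new_token_times_ms: Sequence[float], tokens_per_step: Sequence[int]
) -> float:
    # Tokens produced after the first step, divided by the time those steps took.
    elapsed_ms = new_token_times_ms[-1] - new_token_times_ms[0]
    return sum(tokens_per_step[1:]) * 1000.0 / elapsed_ms


if __name__ == "__main__":
    # Hypothetical speculative-decoding run: 4 steps, several tokens accepted per step.
    times_ms = [100.0, 120.0, 140.0, 160.0]
    tokens = [1, 4, 3, 5]
    print(naive_token_latencies(times_ms))  # [20.0, 20.0, 20.0]
    print(batch_aware_token_latencies(times_ms, tokens))
    print(throughput_tokens_per_s(times_ms, [1] * len(times_ms)))  # 50.0 (one token per step)
    print(throughput_tokens_per_s(times_ms, tokens))  # 200.0
```

In this made-up run each step yields four tokens on average, so counting one token per step understates throughput by the same factor (50 vs 200 tokens/s) and overstates the per-token latency (20 ms instead of 4 to about 6.7 ms).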

CVS-174513

Fixes #(issue)

Checklist:

Tests have been updated or added to cover the new code.
This patch fully addresses the ticket.
I have made corresponding changes to the documentation.


AsyaPronina

Thanks a lot!!

[llm_bench] Add batch considering in throughput and 2nd token latency for LLMPipeline (cbcd06b)

sbalandi requested review from AsyaPronina and as-suvorov December 11, 2025 12:51

github-actions bot added the category: llm_bench label Dec 11, 2025

as-suvorov approved these changes Dec 11, 2025


Merge branch 'master' into sd_llm_bench (a509d30)

AsyaPronina approved these changes Dec 12, 2025


Merge branch 'master' into sd_llm_bench (dc855e7)

sbalandi added this pull request to the merge queue Dec 17, 2025

Merged via the queue into openvinotoolkit:master with commit 2c35439 Dec 17, 2025
97 checks passed

3 participants