Conversation

@peterchen-intel (Collaborator)

Description

CVS-###

Fixes #(issue)

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings November 30, 2025 11:36
@github-actions bot added labels on Nov 30, 2025: category: continuous batching, category: LLM, category: sampling, category: speculative decoding, category: GHA, category: LLM samples, category: CPP API, no-match-files, category: GGUF, category: VLM samples
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces support for EAGLE3 speculative decoding, a variant of speculative decoding in which the draft model consumes hidden states from the target model to improve inference performance. The key changes include the following (a usage sketch follows the list):

  • Implementation of EAGLE3 speculative decoding algorithm with hidden state handling
  • Python test infrastructure updates to support EAGLE3 models
  • Model transformation passes for extracting hidden states from specific layers
  • CI/CD workflow updates to test EAGLE3 functionality
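
To make the user-facing flow concrete, here is a minimal sketch of wiring a draft model into the openvino_genai Python API. The `ov_genai.draft_model(...)` call mirrors the one added in this PR's benchmark_vlm.py; the model paths, device, and `num_assistant_tokens` value are illustrative assumptions rather than code from this PR:

```python
# Hedged sketch: enabling speculative decoding with a separately converted
# draft model through the openvino_genai Python API. Paths, device, and
# num_assistant_tokens are illustrative assumptions; the EAGLE3-specific
# hidden-state plumbing added in this PR is internal to the pipeline.
import openvino_genai as ov_genai

main_model_path = "Qwen3-1.7B"          # converted target model (assumption)
draft_model_path = "qwen3_1.7b_eagle3"  # converted EAGLE3 draft model (assumption)

draft = ov_genai.draft_model(draft_model_path, "CPU")
pipe = ov_genai.LLMPipeline(main_model_path, "CPU", draft_model=draft)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 64
config.num_assistant_tokens = 5  # tokens proposed by the draft model per step

print(pipe.generate("Why is the sky blue?", config))
```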

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/python_tests/utils/hugging_face.py | Added EAGLE3 model detection and conditional tokenizer handling |
| tests/python_tests/test_continuous_batching.py | Added EAGLE3 speculative decoding test cases |
| tests/python_tests/samples/test_speculative_decoding_lm.py | Refactored test structure and added EAGLE3 test class |
| tests/python_tests/samples/conftest.py | Added EAGLE3 model conversion configuration |
| src/cpp/src/utils.hpp | Removed ModelDesc struct and helper functions (moved to a separate file) |
| src/cpp/src/utils.cpp | Changed the default prefix caching setting and moved ModelDesc functions |
| src/cpp/src/speculative_decoding/update_request_structs.hpp | Added hidden_states field to the GeneratedSequence struct |
| src/cpp/src/speculative_decoding/speculative_decoding_stateful.hpp | Added model_desc.hpp include |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp | Introduced the generate_common template function and GenerateStrategy struct |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp | Refactored generate() to use generate_common with a strategy pattern |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.hpp | New file implementing EAGLE3 decoding with model transformations |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp | New file containing EAGLE3 implementation details |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.hpp | Added an EAGLE3-specific pipeline implementation class |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp | Added hidden state handling and EAGLE3 pipeline support |
| src/cpp/src/sequence_group.hpp | Added hidden state storage and manipulation methods to Sequence |
| src/cpp/src/sampling/sampler.hpp | Added draft-to-target token mapping support (see the sketch after this table) |
| src/cpp/src/sampling/sampler.cpp | Implemented token offset computation for the EAGLE3 draft model |
| src/cpp/src/model_desc.hpp | New file defining the ModelDesc struct |
| src/cpp/src/model_desc.cpp | New file implementing ModelDesc utility functions |
| src/cpp/src/llm/pipeline.cpp | Added EAGLE3 runtime info handling and model initialization |
| src/cpp/src/continuous_batching/pipeline_impl.hpp | Added a step counter for embeddings handling |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Updated embedding append logic for EAGLE3 |
| src/cpp/src/continuous_batching/pipeline.cpp | Added EAGLE3 pipeline instantiation logic |
| src/cpp/src/continuous_batching/model_runner.hpp | Added hidden state management infrastructure |
| src/cpp/src/continuous_batching/model_runner.cpp | Implemented hidden state import/export during inference |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Added EAGLE3 forward declarations and friend classes |
| samples/python/visual_language_chat/benchmark_vlm.py | Added draft model support (appears to be test/debug code) |
| .github/workflows/windows.yml | Added EAGLE3 test workflow step |
| .github/workflows/manylinux_2_28.yml | Added EAGLE3 test workflow step |
| .github/workflows/linux.yml | Added EAGLE3 test workflow step |
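
To illustrate the draft-to-target token mapping noted for sampler.hpp and sampler.cpp above: in EAGLE3-style setups the draft model typically samples over a reduced vocabulary, and an offset table maps each draft token id back into the target model's vocabulary. This is a hedged sketch of that idea; the table name `d2t` and its contents are illustrative assumptions, not code from this PR:

```python
# Hedged sketch of draft-to-target token mapping. The draft model's reduced
# vocabulary is mapped back to the target vocabulary via a per-token offset
# table (called d2t here, an assumed name). Table contents are invented.
import numpy as np

d2t = np.array([0, 3, 7, 2], dtype=np.int64)  # offsets for a 4-token draft vocab (assumption)

def draft_to_target(draft_token_id: int) -> int:
    """Map a draft-vocabulary token id to a target-vocabulary id."""
    return draft_token_id + int(d2t[draft_token_id])

assert draft_to_target(1) == 4  # draft id 1 + offset 3 -> target id 4
```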

Comment on lines +241 to +243
hf_tokenizer.save_pretrained(models_path)
# convert tokenizers as well
convert_and_save_tokenizer(hf_tokenizer, models_path)

Copilot AI Nov 30, 2025

The change from str(models_path) on line 241 to models_path is inconsistent with line 243, which still passes str(models_path) to convert_and_save_tokenizer. For consistency, either both calls should use str(models_path) or both should use models_path directly.

Suggested change
hf_tokenizer.save_pretrained(models_path)
# convert tokenizers as well
convert_and_save_tokenizer(hf_tokenizer, models_path)
hf_tokenizer.save_pretrained(str(models_path))
# convert tokenizers as well
convert_and_save_tokenizer(hf_tokenizer, str(models_path))
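
A small, hedged side note on the suggestion above: Hugging Face tokenizers' save_pretrained accepts both str and os.PathLike, so either spelling should work and the concern is purely stylistic. The models_path value below is an illustrative assumption:

```python
# Hedged sketch: save_pretrained accepts both str and os.PathLike, so a
# pathlib.Path can be passed directly. models_path is an assumption.
from pathlib import Path
from transformers import AutoTokenizer

models_path = Path("./models/Qwen3-1.7B")  # illustrative assumption
hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")
hf_tokenizer.save_pretrained(models_path)       # Path works
hf_tokenizer.save_pretrained(str(models_path))  # str works too
```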

@pytest.mark.parametrize(
"convert_model, convert_draft_model, sample_args",
[
pytest.param("Qwen3-1.7B", "qwen3_1.7b_eagle3", test_prompt, marks=pytest.mark.skip(reason='CVS-171947, CVS-171943, CVS-174959')),

Copilot AI Nov 30, 2025

The test is marked as skipped with multiple CVS ticket references. Consider adding a more descriptive reason that explains what these tickets cover and what must be fixed before the test can be enabled.

Suggested change
pytest.param("Qwen3-1.7B", "qwen3_1.7b_eagle3", test_prompt, marks=pytest.mark.skip(reason='CVS-171947, CVS-171943, CVS-174959')),
pytest.param(
"Qwen3-1.7B",
"qwen3_1.7b_eagle3",
test_prompt,
marks=pytest.mark.skip(
reason=(
"Skipped due to known issues: "
"CVS-171947 (model conversion failure), "
"CVS-171943 (output mismatch), "
"CVS-174959 (runtime error). "
"Enable this test after resolving these issues."
)
),
),

SchedulerConfig default_config;
default_config.max_num_batched_tokens = std::numeric_limits<size_t>::max(); // don't limit total batch size
default_config.enable_prefix_caching = true; // for better TTFT in chat scenarios
default_config.enable_prefix_caching = false; // for better TTFT in chat scenarios

Copilot AI Nov 30, 2025

The comment states 'for better TTFT in chat scenarios' but the code is disabling prefix caching, which typically would hurt TTFT performance. The comment appears to contradict the implementation. Please clarify or correct the comment to explain why disabling prefix caching improves TTFT.

Suggested change
default_config.enable_prefix_caching = false; // for better TTFT in chat scenarios
default_config.enable_prefix_caching = false; // disables prefix caching; in some chat scenarios, this may reduce memory usage or avoid stale cache issues, but may hurt TTFT performance
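
For reference, a minimal sketch of how a scheduler configuration with prefix caching is constructed and passed through the openvino_genai Python bindings (the `scheduler_config=` keyword appears elsewhere in this PR; the model path and device here are illustrative assumptions):

```python
# Hedged sketch: constructing a SchedulerConfig with prefix caching explicitly
# enabled and passing it to a pipeline, to make the setting under discussion
# concrete. Model path and device are illustrative assumptions.
import openvino_genai as ov_genai

scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.enable_prefix_caching = True  # reuse KV-cache blocks across requests sharing a prompt prefix

pipe = ov_genai.LLMPipeline("Qwen3-1.7B", "CPU", scheduler_config=scheduler_config)
```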

result.removed_tokens_cnt = remove_tokens_from_sequence(running_sequence, min_generated_tokens, logit_processor);

//if (!m_is_validation_mode_enabled)
//std::cout << "removed " << result.removed_tokens_cnt << " tokens from sequence " << std::endl;

Copilot AI Nov 30, 2025

Commented-out debug code should be removed before merging to production.

Suggested change
//std::cout << "removed " << result.removed_tokens_cnt << " tokens from sequence " << std::endl;

} else if (_is_hs_internal()) {
// fill hidden_state_data with m_hidden_states
if (hidden_state_data) {
OPENVINO_ASSERT(num_scheduled_tokens == 1, "unexpected num_scheduled_tokens in speculative drafting stage in eagle3 mode");

Copilot AI Nov 30, 2025

Replace 'drafting' with 'decoding' in the error message.

Suggested change
OPENVINO_ASSERT(num_scheduled_tokens == 1, "unexpected num_scheduled_tokens in speculative drafting stage in eagle3 mode");
OPENVINO_ASSERT(num_scheduled_tokens == 1, "unexpected num_scheduled_tokens in speculative decoding stage in eagle3 mode");

Comment on lines +82 to +85
print("draft_model_path=", draft_model_path)
print("device=", device)
draft_model = ov_genai.draft_model(str(draft_model_path), device)
#pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)

Copilot AI Nov 30, 2025

Debug print statements and commented-out code should be removed before merging. These appear to be temporary testing code that shouldn't be in production.

Suggested change
print("draft_model_path=", draft_model_path)
print("device=", device)
draft_model = ov_genai.draft_model(str(draft_model_path), device)
#pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)
draft_model = ov_genai.draft_model(str(draft_model_path), device)

run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
timeout: 90
- name: 'EAGLE3 speculative decoding tests'
cmd: |

Copilot AI Nov 30, 2025

Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

Suggested change
cmd: |
cmd: |
# TODO: Using a personal fork of optimum-intel for EAGLE3 speculative decoding tests.
# Reason: [Add explanation here, e.g., "Required for feature X not yet merged upstream. See PR #123."]
# Remove this and use official optimum-intel release when upstream PR is merged.

run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
timeout: 90
- name: 'EAGLE3 speculative decoding tests'
cmd: |

Copilot AI Nov 30, 2025

Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

Suggested change
cmd: |
cmd: |
# FIXME: Using a personal fork of optimum-intel for EAGLE3 speculative decoding tests.
# Reason: Required changes are not yet merged upstream. See https://github.com/huggingface/optimum-intel/pull/<PR_NUMBER> (replace with actual PR/issue link).
# Remove this and use official optimum-intel release once changes are merged.

run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
timeout: 90
- name: 'EAGLE3 speculative decoding tests'
cmd: |

Copilot AI Nov 30, 2025

Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

Suggested change
cmd: |
cmd: |
# FIXME: Installing from a personal fork is fragile. This is required because the official optimum-intel does not yet support EAGLE3 speculative decoding.
# Remove this and use the official optimum-intel release once https://github.com/huggingface/optimum-intel/pull/XXX is merged and released.


Labels

category: continuous batching, category: CPP API, category: GGUF, category: GHA, category: LLM samples, category: LLM, category: sampling, category: speculative decoding, category: VLM samples, do_not_merge, do_not_review, no-match-files
