[JS] VLMPipeline implementation #3112
Conversation
Signed-off-by: Kirill Suvorov <[email protected]>
Pull request overview
This PR introduces comprehensive JavaScript bindings support for the VLMPipeline (Visual Language Model Pipeline) with several improvements to error handling, code structure, and API design.
Key Changes:
- Implemented VLMPipeline similar to LLMPipeline with enhanced error handling that uses callbacks to reject promises instead of fatal errors
- Separated streamer and callback logic for clearer semantics and different use cases
- Added wrapper properties `is_initializing` and `is_generating` to prevent incorrect concurrent operations
- Refactored performance metrics to use inheritance through a base template class to avoid code duplication (see the sketch below)
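As a rough illustration of the last point, the shared metrics getters can live in a CRTP base while each derived wrapper supplies its own metrics object. This is a minimal sketch with illustrative names (`PerfMetricsWrapperBase`, `VLMPerfMetricsSketch`), not the PR's actual code; it assumes the `get_load_time`/`get_generate_duration` getters from `ov::genai::PerfMetrics`:

```cpp
#include "openvino/genai/visual_language/perf_metrics.hpp"

// Sketch only: the base template holds getters shared by the LLM and VLM
// metrics wrappers; each derived class adds its own metrics object and extras.
template <typename Derived>
class PerfMetricsWrapperBase {
public:
    float load_time() { return self().metrics().get_load_time(); }
    float generate_duration_mean() { return self().metrics().get_generate_duration().mean; }

private:
    Derived& self() { return static_cast<Derived&>(*this); }
};

class VLMPerfMetricsSketch : public PerfMetricsWrapperBase<VLMPerfMetricsSketch> {
public:
    ov::genai::VLMPerfMetrics& metrics() { return m_metrics; }
    // VLM-only extras (e.g. prepare-embeddings duration) would be added here.

private:
    ov::genai::VLMPerfMetrics m_metrics;
};
```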
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/js/tests/vlmPipeline.test.js | Comprehensive test suite for VLMPipeline functionality including text generation, streaming, image/video handling, and chat operations |
| src/js/tests/utils.js | Helper functions to create test image and video tensors for VLM testing |
| src/js/tests/models.js | Added VLM model reference for testing |
| src/js/src/vlm_pipeline/vlm_pipeline_wrapper.cpp | Core VLMPipeline implementation with improved error handling and async generation |
| src/js/src/vlm_pipeline/start_chat_worker.cpp | AsyncWorker implementation for starting chat sessions |
| src/js/src/vlm_pipeline/perf_metrics.cpp | VLM-specific performance metrics wrapper extending base metrics |
| src/js/src/vlm_pipeline/init_worker.cpp | AsyncWorker for pipeline initialization with concurrent init prevention |
| src/js/src/vlm_pipeline/finish_chat_worker.cpp | AsyncWorker for finishing chat sessions |
| src/js/src/perf_metrics.cpp | Refactored to use base template class for code reuse |
| src/js/src/llm_pipeline/llm_pipeline_wrapper.cpp | Updated to use shared helper function for decoded results |
| src/js/src/helper.cpp | Added tensor conversion helpers and result conversion utilities |
| src/js/src/addon.cpp | Registered VLMPipeline and VLMPerfMetrics classes |
| src/js/lib/utils.ts | Added VLMPipelineProperties type definition |
| src/js/lib/pipelines/vlmPipeline.ts | TypeScript VLMPipeline class with generate, stream, and chat methods |
| src/js/lib/pipelines/llmPipeline.ts | Moved DecodedResults to separate module for reuse |
| src/js/lib/perfMetrics.ts | Type definitions for performance metrics including VLM-specific metrics |
| src/js/lib/index.ts | Exported VLMPipeline factory and related types |
| src/js/lib/decodedResults.ts | Shared DecodedResults classes for LLM and VLM pipelines |
| src/js/lib/addon.ts | TypeScript interface definitions for VLMPipeline addon |
| src/js/include/vlm_pipeline/vlm_pipeline_wrapper.hpp | Header for VLMPipelineWrapper class |
| src/js/include/vlm_pipeline/start_chat_worker.hpp | Header for VLMStartChatWorker AsyncWorker |
| src/js/include/vlm_pipeline/perf_metrics.hpp | Header for VLMPerfMetricsWrapper |
| src/js/include/vlm_pipeline/init_worker.hpp | Header for VLMInitWorker AsyncWorker |
| src/js/include/vlm_pipeline/finish_chat_worker.hpp | Header for VLMFinishChatWorker AsyncWorker |
| src/js/include/perf_metrics.hpp | Updated to use base template class |
| src/js/include/helper.hpp | Added function declarations for tensor and result conversion |
| src/js/include/base/perf_metrics.hpp | Base template class for performance metrics wrappers using CRTP pattern |
| src/js/include/addon.hpp | Added VLM pipeline and metrics references to AddonData |
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.
Force-pushed from 400a8c4 to 5ca69c7
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
```cpp
std::shared_ptr<ov::genai::VLMPipeline>& pipe,
std::shared_ptr<bool> is_initializing,
const std::string model_path,
std::string device,
```
Copilot AI commented on Dec 15, 2025:
The device parameter should be passed by const reference to avoid unnecessary string copying in the constructor.
Suggested change:

```diff
-std::string device,
+const std::string& device,
```
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.
Co-authored-by: Yaroslav Tarkan <[email protected]>
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.
```ts
    return this.texts[0];
  }
  const lines = this.scores.map((score, i) => `${score.toFixed(6)}: ${this.texts[i]}`);
  return lines.join("\n");
```
Copilot AI commented on Dec 18, 2025:
The toString() implementation differs from the original in llmPipeline.ts. The original handled the last element separately to avoid a trailing newline, but this version always adds newlines between all elements. While functionally similar, this creates an inconsistency in output format compared to the original implementation.
Suggested change:

```diff
-  return lines.join("\n");
+  let result = "";
+  for (let i = 0; i < lines.length; i++) {
+    result += lines[i];
+    if (i !== lines.length - 1) {
+      result += "\n";
+    }
+  }
+  return result;
```
```cpp
    return env.Undefined();
}

Napi::Value VLMPipelineWrapper::start_chat(const Napi::CallbackInfo& info) {
```
Suggested change:

```diff
-Napi::Value VLMPipelineWrapper::start_chat(const Napi::CallbackInfo& info) {
+void VLMPipelineWrapper::start_chat(const Napi::CallbackInfo& info) {
```
```cpp
    return env.Undefined();
}

Napi::Value VLMPipelineWrapper::set_chat_template(const Napi::CallbackInfo& info) {
```
Please remove the return value from void functions.
Suggested change:

```diff
-Napi::Value VLMPipelineWrapper::set_chat_template(const Napi::CallbackInfo& info) {
+void VLMPipelineWrapper::set_chat_template(const Napi::CallbackInfo& info) {
```
```cpp
obj.Set("texts", cpp_to_js<std::vector<std::string>, Napi::Value>(env, results.texts));
obj.Set("scores", cpp_to_js<std::vector<float>, Napi::Value>(env, results.scores));
obj.Set("perfMetrics", PerfMetricsWrapper::wrap(env, results.perf_metrics));
obj.Set("subword", Napi::String::New(env, results));
```
Please check whether subword is still used in the context of DecodedResults.
I think it may be a leftover kept for backward compatibility.
We will remove it when LLMPipeline is refactored
almilosz left a comment:
LGTM, a few comments
```cpp
};

void VLMInitWorker::OnOK() {
    *this->is_initializing = false;
```
Is this variable also needed for LLMPipeline? If yes, please add it in a separate PR.
```cpp
    InstanceMethod("setGenerationConfig", &VLMPipelineWrapper::set_generation_config)});
}

Napi::Value VLMPipelineWrapper::init(const Napi::CallbackInfo& info) {
```
Suggested change:

```diff
-Napi::Value VLMPipelineWrapper::init(const Napi::CallbackInfo& info) {
+void VLMPipelineWrapper::init(const Napi::CallbackInfo& info) {
```
VALIDATE_ARGS_COUNT currently returns Env().Undefined(), which doesn't allow making these functions void right now. :(
Let's fix it later.
```cpp
[context, this](Napi::Env) {  // Finalizer used to clean threads up
    context->native_thread.join();
    delete context;
});
```
This finalizer is repeated for each pipeline. We could extract it and reuse it, like here: https://github.com/openvinotoolkit/openvino/blob/805d41a1c7ae209619651e57863d961ee32c65c2/src/bindings/js/node/src/infer_request.cpp#L197
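For reference, the extraction could be as small as a shared helper passed as the TSFN finalizer. A sketch, assuming the `Napi::ThreadSafeFunction::New` overload that hands the context pointer to the finalizer; `GenerateContext` is a placeholder name, not the PR's actual type:

```cpp
#include <napi.h>
#include <thread>

// Reusable finalizer sketch: joins the worker thread and frees the context.
// Works for any context struct that owns a std::thread named native_thread.
template <typename Context>
void join_and_delete_context(Napi::Env /*env*/, Context* context) {
    if (context->native_thread.joinable()) {
        context->native_thread.join();
    }
    delete context;
}

// Hypothetical usage when creating the thread-safe function:
// auto tsfn = Napi::ThreadSafeFunction::New(
//     env, js_callback, "generate", 0, 1, context,
//     join_and_delete_context<GenerateContext>);
```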
```cpp
} else {
    // If no exceptions from streamer, call the final callback with the result
    napi_status status =
        context->callback.BlockingCall([result, &report_error](Napi::Env env, Napi::Function jsCallback) {
```
Here result is captured by value; such a copy may affect performance. Maybe use a shared_ptr or capture by reference?
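One way the shared_ptr variant could look, as a drop-in for the quoted lines (a sketch; `to_js_decoded_results` is a hypothetical conversion helper, and `context`, `result`, and `report_error` come from the surrounding code):

```cpp
// Move the results into a shared_ptr once, so the lambda copies a pointer
// rather than the whole VLMDecodedResults object.
auto shared_result = std::make_shared<ov::genai::VLMDecodedResults>(std::move(result));
napi_status status = context->callback.BlockingCall(
    [shared_result, &report_error](Napi::Env env, Napi::Function jsCallback) {
        // Hypothetical helper converting the native results to a JS object.
        jsCallback.Call({to_js_decoded_results(env, *shared_result)});
    });
```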
Description
This PR introduces comprehensive JavaScript bindings support for the VLMPipeline.
VLMPipeline is implemented similarly to LLMPipeline, but with some improvements:
- `vLmPerformInferenceThread` was modified for a much better user experience. It no longer throws fatal errors; instead, the thread uses a callback to reject the promise and surface a catchable exception to the user. If that is not possible, an error message is logged.
- A wrapper property, `is_initializing`, was added to prevent the pipeline from incorrectly being loaded twice.
- A wrapper property, `is_generating`, was added to prevent incorrect attempts to call generate in parallel, which would otherwise lead to an InferRequest exception that may confuse the user (see the sketch below).

CVS-172789
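Conceptually, the `is_generating` guard behaves like the following sketch (illustrative, not the PR's exact code):

```cpp
// Reject a second generate() up front instead of letting it race into the
// pipeline and surface a confusing InferRequest exception.
Napi::Value VLMPipelineWrapper::generate(const Napi::CallbackInfo& info) {
    Napi::Env env = info.Env();
    auto deferred = Napi::Promise::Deferred::New(env);
    if (*is_generating) {
        deferred.Reject(Napi::Error::New(env, "generate() is already running").Value());
        return deferred.Promise();
    }
    *is_generating = true;
    // ... launch the native inference thread; it resets the flag and
    // resolves/rejects the promise through a thread-safe callback.
    return deferred.Promise();
}
```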
Checklist: