
@inabao (Collaborator) commented Nov 17, 2025

#1252

Summary by Sourcery

Enable HGraph to support a PQR reordering stage by adding a PqrReorder implementation, wiring it into the HGraph lifecycle, exposing configurable reorder parameters, and covering it with new tests.

New Features:

  • Introduce PQR-based reorder strategy (PqrReorder) for HGraph to refine search using product quantization residuals
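For readers new to the idea, here is a minimal sketch of generic product quantization (PQ) applied to a residual vector: each subvector is replaced by the index of its nearest centroid, and at query time the inner product <q, r̂> is assembled from a per-subspace lookup table. The toy codebook, the dimensions, and the names `EncodeResidual` and `LutInnerProduct` are hypothetical and are not taken from PqrReorder; this only illustrates the technique under those assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical toy setup: dim = 4, split into M = 2 subspaces of 2 dims,
// each with K = 2 centroids. Real PQ learns its codebooks (e.g. with k-means).
constexpr std::size_t kDim = 4;
constexpr std::size_t kM = 2;
constexpr std::size_t kSub = kDim / kM;
constexpr std::size_t kK = 2;
using Codebook = std::array<std::array<std::array<float, kSub>, kK>, kM>;

// Encode a residual: for each subspace, store the index of the nearest centroid (L2).
std::array<uint8_t, kM> EncodeResidual(const Codebook& cb, const std::array<float, kDim>& r) {
    std::array<uint8_t, kM> codes{};
    for (std::size_t m = 0; m < kM; ++m) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t k = 0; k < kK; ++k) {
            float d = 0.0f;
            for (std::size_t j = 0; j < kSub; ++j) {
                float diff = r[m * kSub + j] - cb[m][k][j];
                d += diff * diff;
            }
            if (d < best) {
                best = d;
                codes[m] = static_cast<uint8_t>(k);
            }
        }
    }
    return codes;
}

// Approximate <q, r> as the sum over subspaces of <q_m, centroid[m][code_m]>,
// using a lookup table that is built once per query.
float LutInnerProduct(const Codebook& cb, const std::array<float, kDim>& q,
                      const std::array<uint8_t, kM>& codes) {
    std::array<std::array<float, kK>, kM> lut{};
    for (std::size_t m = 0; m < kM; ++m) {
        for (std::size_t k = 0; k < kK; ++k) {
            for (std::size_t j = 0; j < kSub; ++j) {
                lut[m][k] += q[m * kSub + j] * cb[m][k][j];
            }
        }
    }
    float ip = 0.0f;
    for (std::size_t m = 0; m < kM; ++m) {
        ip += lut[m][codes[m]];
    }
    return ip;
}
```

The lookup table is what makes PQ cheap at search time: each candidate costs M table reads instead of a full dim-length dot product.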

Enhancements:

  • Extend ReorderInterface with Train/InsertVector/Resize/(De)Serialize methods and add CalResidual to FlattenInterface/FlattenDataCell
  • Enhance HGraph to select and use the configured reorder strategy (flatten or PQR) through constructor, training, search, insertion, resizing, and serialization
  • Extend HGraphParameter and JSON templates with reorder configuration options

Build:

  • Include new reorder sources (pqr_reorder, reorder_parameter, flatten_reorder_parameter) in impl/reorder CMakeLists

Tests:

  • Add PQR reorder functional tests in test_hgraph.cpp for both PR and daily pipelines

@inabao inabao self-assigned this Nov 17, 2025
@inabao inabao added the kind/feature New feature or request label Nov 17, 2025
@inabao inabao requested a review from wxyucs as a code owner November 17, 2025 08:32
sourcery-ai bot commented Nov 17, 2025

Reviewer's Guide

This PR extends HGraph to support a new PQR-based reordering strategy alongside the existing flatten reorder. It extends ReorderInterface with lifecycle hooks and adds a reorder parameter framework; implements PqrReorder (residual computation, bias management, training, query reordering, and serialization); wires configurable reorder instantiation and the lifecycle hooks into HGraph; adds residual calculation to FlattenDataCell; updates the JSON schema and parameter mapping; and extends the tests to cover PQR scenarios.

Sequence diagram for HGraph instantiation with configurable reorder strategy

sequenceDiagram
    participant HGraph
    participant HGraphParameter
    participant ReorderParameter
    participant FlattenReorder
    participant PqrReorder
    HGraph->>HGraphParameter: access reorder_param
    HGraphParameter->>ReorderParameter: create from JSON
    alt reorder_param.name_ == PQR_REORDER
        HGraph->>PqrReorder: instantiate with basic_flatten_codes, common_param, reorder_param
    else
        HGraph->>FlattenReorder: instantiate with high_precise_codes, allocator
    end
    HGraph->>ReorderInterface: assign to reorder_
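The selection step in this diagram can be sketched as a small factory. This is a hypothetical, self-contained illustration: `kPqrReorder` and `kFlattenReorder` are stand-ins for `PQR_REORDER` and `FLATTEN_REORDER` (their real string values are assumed), and the stub classes omit the codes, common parameters, and allocator the real constructors take.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for PQR_REORDER / FLATTEN_REORDER.
const std::string kPqrReorder = "pqr";
const std::string kFlattenReorder = "flatten";

struct ReorderParameter {
    std::string name_;
};

struct ReorderInterface {
    virtual ~ReorderInterface() = default;
    virtual std::string Kind() const = 0;  // illustration only
};

struct FlattenReorder : ReorderInterface {
    std::string Kind() const override { return kFlattenReorder; }
};

struct PqrReorder : ReorderInterface {
    std::string Kind() const override { return kPqrReorder; }
};

// Mirrors the diagram: PQR when requested, otherwise fall back to flatten.
std::unique_ptr<ReorderInterface> MakeReorder(const ReorderParameter& param) {
    if (param.name_ == kPqrReorder) {
        return std::make_unique<PqrReorder>();
    }
    return std::make_unique<FlattenReorder>();
}
```

Returning the interface type keeps the rest of HGraph unaware of which strategy was chosen; it only ever calls through `reorder_`.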

Sequence diagram for HGraph lifecycle hooks with Reorder integration

sequenceDiagram
    participant HGraph
    participant ReorderInterface
    participant Dataset
    participant StreamWriter
    participant StreamReader
    participant DistHeap
    Note over HGraph: During Train
    HGraph->>ReorderInterface: Train(base_data, num_elements)
    Note over HGraph: During add_one_point
    HGraph->>ReorderInterface: InsertVector(data, inner_id)
    Note over HGraph: During resize
    HGraph->>ReorderInterface: Resize(new_size)
    Note over HGraph: During Serialize
    HGraph->>ReorderInterface: Serialize(writer)
    Note over HGraph: During Deserialize
    HGraph->>ReorderInterface: Deserialize(reader)
    Note over HGraph: During KnnSearch/RangeSearch/SearchWithRequest
    HGraph->>ReorderInterface: Reorder(query, candidate_heap, k)

Class diagram for new and updated Reorder classes

classDiagram
    class ReorderInterface {
        <<interface>>
        +Reorder(input, query, topk, allocator) const
        +InsertVector(vector, id)
        +Train(vector, count)
        +Resize(new_size)
        +Serialize(writer) const
        +Deserialize(reader)
        -size_
    }
    class FlattenReorder {
        +Reorder(input, query, topk, allocator) const
        -flatten_
        -allocator_
    }
    class PqrReorder {
        +Reorder(input, query, topk, allocator) const
        +InsertVector(vector, id)
        +Train(vector, count)
        +Resize(new_size)
        +Serialize(writer) const
        +Deserialize(reader)
        -flatten_
        -reorder_code_
        -allocator_
        -bias_
        -dim_
        -metric_
    }
    ReorderInterface <|.. FlattenReorder
    ReorderInterface <|.. PqrReorder

    class ReorderParameter {
        +name_
    }
    class FlattenReorderParameter {
        +FromJson(json)
        +ToJson() const
    }
    class PqrReorderParameter {
        +residual_param_
        +FromJson(json)
        +ToJson() const
        +CheckCompatibility(other) const
    }
    ReorderParameter <|-- FlattenReorderParameter
    ReorderParameter <|-- PqrReorderParameter

    class HGraph {
        -reorder_
        +reorder(query, candidate_heap, k) const
    }
    HGraph --> ReorderInterface : uses
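Because the PR gives ReorderInterface no-op default implementations of its lifecycle hooks, FlattenReorder can stay unchanged while PqrReorder overrides only what it needs. A minimal sketch of that pattern follows. The signatures are simplified assumptions (no allocator, dataset, or stream types; Serialize/Deserialize are omitted), and the `*Sketch` classes are hypothetical.

```cpp
#include <cstdint>
#include <vector>

class ReorderInterface {
public:
    virtual ~ReorderInterface() = default;
    // Default no-ops: a strategy with no state of its own ignores these hooks.
    virtual void Train(const void* /*vectors*/, uint64_t /*count*/) {}
    virtual void InsertVector(const void* /*vector*/, uint64_t /*id*/) {}
    virtual void Resize(uint64_t /*new_size*/) {}
};

// Like FlattenReorder: relies entirely on the inherited no-ops.
class FlattenReorderSketch : public ReorderInterface {};

// Like PqrReorder: keeps per-vector state, so it overrides the hooks it needs.
class PqrReorderSketch : public ReorderInterface {
public:
    void Resize(uint64_t new_size) override {
        bias_.resize(new_size, 0.0f);
    }
    void InsertVector(const void* /*vector*/, uint64_t id) override {
        // A real implementation would encode the residual and compute a bias here.
        if (id >= bias_.size()) {
            Resize(id + 1);
        }
        ++inserted_;
    }
    std::vector<float> bias_;
    uint64_t inserted_ = 0;
};
```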

Class diagram for FlattenDataCell residual calculation extension

classDiagram
    class FlattenInterface {
        +CalResidual(vector, residual, count)
    }
    class FlattenDataCell {
        +CalResidual(vector, residual, count)
        +quantizer_
        +io_
    }
    FlattenInterface <|.. FlattenDataCell
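A batch residual is the original vector minus the reconstruction of its stored code. The sketch below assumes a toy 8-bit uniform scalar quantizer over a fixed range [lo, hi]; `ScalarQuantizer` and this `CalResidual` signature (which takes the codes explicitly) are hypothetical stand-ins for FlattenDataCell's `quantizer_` and `io_`, and the real method may differ, for example by using SIMD subtraction.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit uniform scalar quantizer over [lo, hi].
struct ScalarQuantizer {
    float lo;
    float hi;
    float Step() const { return (hi - lo) / 255.0f; }
    uint8_t Encode(float x) const {
        float t = std::round((x - lo) / Step());
        if (t < 0.0f) t = 0.0f;
        if (t > 255.0f) t = 255.0f;
        return static_cast<uint8_t>(t);
    }
    float Decode(uint8_t c) const { return lo + static_cast<float>(c) * Step(); }
};

// residual[i] = vectors[i] - Decode(codes[i]) for all count * dim scalars.
void CalResidual(const ScalarQuantizer& q, const float* vectors, const uint8_t* codes,
                 float* residual, uint64_t count, uint64_t dim) {
    for (uint64_t i = 0; i < count * dim; ++i) {
        residual[i] = vectors[i] - q.Decode(codes[i]);
    }
}
```

With round-to-nearest encoding, each residual component is at most half a quantization step, which is what makes a second, finer code on the residual worthwhile.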

File-Level Changes

Change Details Files
Introduce a generic Reorder interface and parameter framework
  • Add no-op methods (Train, InsertVector, Resize, Serialize, Deserialize) to ReorderInterface
  • Define ReorderParameter base class and Flatten/PQR parameter subclasses
  • Implement CreateReorderParam factory and parameter compatibility checks
  • Update CMakeLists to include new reorder parameter sources
src/impl/reorder/reorder.h
src/impl/reorder/reorder_parameter.h
src/impl/reorder/reorder_parameter.cpp
src/impl/reorder/flatten_reorder_parameter.h
src/impl/reorder/pqr_reorder_parameter.h
src/impl/reorder/CMakeLists.txt
Implement PqrReorder strategy
  • Add PqrReorder class with Reorder, InsertVector, Train, Resize, Serialize, Deserialize methods
  • Compute residuals via FlattenInterface, manage bias for distance adjustment
  • Normalize vectors for cosine metric and apply product quantization on residuals
  • Serialize/deserialize codebook and bias values
src/impl/reorder/pqr_reorder.cpp
src/impl/reorder/pqr_reorder.h
Enable residual computation in flatten codes
  • Add CalResidual method to FlattenDataCell for batch residual computation
  • Expose CalResidual in FlattenInterface, with a default implementation that throws
  • Use SIMD subtraction for the residual calculation
src/datacell/flatten_datacell.h
src/datacell/flatten_interface.h
Integrate configurable reorder into HGraph lifecycle
  • Select FlattenReorder or PqrReorder in HGraph constructor based on reorder_param
  • Hook reorder_->Train/InsertVector/Resize/Serialize/Deserialize in Train, add_one_point, resize, Serialize/Deserialize
  • Simplify HGraph::reorder signature and update calls in KnnSearch and RangeSearch
  • Extend HGraphParameter to parse/emit reorder settings and update JSON template and parameter mapping
src/algorithm/hgraph.cpp
src/algorithm/hgraph.h
src/algorithm/hgraph_parameter.cpp
src/algorithm/hgraph_parameter.h
src/constants.cpp
src/inner_string_params.h
Add end-to-end tests for PQR reorder
  • Extend test_hgraph to include reorder_type and reorder_codes_quantization_type in build parameters
  • Implement TestHGraphPQR helper covering PR and daily scenarios
  • Add TEST_CASE entries for HGraph PQR in test suite
tests/test_hgraph.cpp


@gemini-code-assist

Summary of Changes

Hello @inabao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the HGraph index by introducing support for Product Quantization Reorder (PQR). This new reordering strategy aims to improve the accuracy of vector similarity search results. The changes involve a modular redesign of the reordering component, allowing for different reorder implementations, and integrating PQR into the core HGraph lifecycle, from parameter configuration and training to search operations and persistence.

Highlights

  • New PQR Reorder Strategy: Introduced a new reordering strategy called PQR (Product Quantization Reorder) for HGraph, designed to enhance search accuracy by refining candidate lists.
  • Modular Reorder Implementation: Refactored the reordering mechanism with a new ReorderInterface, allowing for dynamic selection between PqrReorder and FlattenReorder based on configuration.
  • HGraph Integration: Integrated the PQR reorder into the HGraph index's core lifecycle, including parameter configuration, training, vector insertion, search operations, and persistence (serialization/deserialization).
  • Residual Calculation Support: Added a CalResidual method to the FlattenInterface and its FlattenDataCell implementation, which is crucial for the residual quantization process used by PQR.
  • Enhanced Parameterization and Testing: Extended HGraph parameters to support PQR-specific configurations and added new test cases to thoroughly validate the functionality and performance of the PQR reorder.


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • Guard calls to reorder_->Serialize and reorder_->Deserialize behind use_reorder_ (or a null check) to avoid dereferencing a null reorder_ when reordering is disabled.
  • Consider centralizing the mapping of reorder-related JSON keys and default values (in DEFAULT_MAP, CheckAndMappingExternalParam, and parameter templates) to reduce duplication and make future changes easier.
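The first suggestion amounts to a guard like the one below. It is a hypothetical, simplified sketch: `use_reorder_` and `reorder_` follow the names in the comment, but `IndexSketch` and the `std::ostream`-based `Serialize` are stand-ins for HGraph and its StreamWriter.

```cpp
#include <memory>
#include <ostream>
#include <sstream>

struct ReorderInterface {
    virtual ~ReorderInterface() = default;
    virtual void Serialize(std::ostream& out) const { out << "reorder"; }
};

struct IndexSketch {
    bool use_reorder_ = false;
    std::shared_ptr<ReorderInterface> reorder_;  // stays null when reordering is disabled

    void Serialize(std::ostream& out) const {
        out << "graph";
        // Guard: only touch reorder_ when reordering is enabled and the object exists.
        if (use_reorder_ && reorder_ != nullptr) {
            reorder_->Serialize(out);
        }
    }
};
```

Deserialize would need the same condition, so that the bytes written and the bytes read stay symmetric for both configurations.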
## Individual Comments

### Comment 1
<location> `tests/test_hgraph.cpp:584-585` </location>
<code_context>
+                auto index = TestIndex::TestFactory(test_index->name, param, true);
+                auto dataset = HGraphTestIndex::pool.GetDatasetAndCreate(
+                    dim, resource->base_count, metric_type);
+                TestIndex::TestContinueAdd(index, dataset, true);
+                HGraphTestIndex::TestGeneral(index, dataset, search_param, recall);
+                vsag::Options::Instance().set_block_size_limit(origin_size);
+            }
</code_context>

<issue_to_address>
**suggestion (testing):** No direct assertion of PQR reorder-specific outputs or behaviors.

Consider adding assertions that specifically test PQR reorder outputs, such as verifying reordered results and correct computation/storage of bias and residuals.

Suggested implementation:

```cpp
                auto index = TestIndex::TestFactory(test_index->name, param, true);
                auto dataset = HGraphTestIndex::pool.GetDatasetAndCreate(
                    dim, resource->base_count, metric_type);
                TestIndex::TestContinueAdd(index, dataset, true);
                HGraphTestIndex::TestGeneral(index, dataset, search_param, recall);

                // PQR reorder-specific assertions
                // 1. Check that reordered results exist and are valid
                auto reordered_results = index->GetReorderedResults();
                ASSERT_FALSE(reordered_results.empty());
                // Optionally, check ordering or values
                for (size_t i = 1; i < reordered_results.size(); ++i) {
                    ASSERT_LE(reordered_results[i-1].score, reordered_results[i].score);
                }

                // 2. Check that bias and residuals are computed and stored
                auto bias = index->GetBias();
                auto residuals = index->GetResiduals();
                ASSERT_FALSE(bias.empty());
                ASSERT_FALSE(residuals.empty());
                ASSERT_EQ(bias.size(), resource->base_count);
                ASSERT_EQ(residuals.size(), resource->base_count);

                vsag::Options::Instance().set_block_size_limit(origin_size);
            }

```

- The code assumes the existence of `GetReorderedResults()`, `GetBias()`, and `GetResiduals()` methods on the index object. If these do not exist, you will need to implement them in the index class.
- You may want to adjust the assertions to match the actual data structures and expected values in your test context.
- If you use a different assertion framework, replace `ASSERT_FALSE` and `ASSERT_EQ` with the appropriate macros.
</issue_to_address>
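Since `GetReorderedResults()`, `GetBias()` and `GetResiduals()` do not exist today, one property from the suggestion can be checked without new accessors: the distances returned by a search should be in non-decreasing order after reordering. A minimal, hypothetical helper over a raw distance array might look like this; how it is fed from the search result in test_hgraph.cpp is left as an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Returns true if dists[0..count) is sorted in non-decreasing order,
// i.e. the closest candidate comes first.
bool DistancesAreSorted(const float* dists, int64_t count) {
    if (dists == nullptr || count <= 1) {
        return true;
    }
    return std::is_sorted(dists, dists + count);
}
```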



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for PQR (Product Quantization Residuals) reordering in HGraph, which is a significant feature enhancement. The changes are extensive, touching the HGraph lifecycle from construction and training to search and serialization to properly integrate the new reordering strategy. My review identified a critical bug in the PQR distance calculation logic and a potential null pointer dereference. I have also included several suggestions to improve code safety, clarity, and adherence to modern C++ practices. Addressing these points will improve the robustness and maintainability of the new feature.

Comment on lines 40 to 46
auto pqr_reorder_param = std::dynamic_pointer_cast<PqrReorderParameter>(reorder_param);
reorder_code_ =
FlattenInterface::MakeInstance(pqr_reorder_param->residual_param_, inner_common);


critical

The result of std::dynamic_pointer_cast is not checked. If reorder_param is not of type PqrReorderParameter, pqr_reorder_param will be nullptr, and dereferencing it on the next line will cause a crash. It's crucial to add a null check here to handle potential invalid parameter types gracefully.

        auto pqr_reorder_param = std::dynamic_pointer_cast<PqrReorderParameter>(reorder_param);
        if (pqr_reorder_param == nullptr) {
            throw VsagException(ErrorType::INVALID_ARGUMENT, "Invalid reorder parameter type for PqrReorder");
        }
        reorder_code_ =
            FlattenInterface::MakeInstance(pqr_reorder_param->residual_param_, inner_common);
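The failure mode is easy to reproduce in isolation: `std::dynamic_pointer_cast` returns an empty `shared_ptr` when the dynamic type does not match. The sketch uses hypothetical stand-in parameter classes and `std::invalid_argument` in place of VsagException.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct ReorderParameter {
    virtual ~ReorderParameter() = default;  // polymorphic, so dynamic casts work
};
struct FlattenReorderParameter : ReorderParameter {};
struct PqrReorderParameter : ReorderParameter {
    std::string residual_param_ = "hypothetical-residual-config";
};

// Returns the residual config, or throws if the parameter is not a PQR parameter.
std::string ResidualParamOf(const std::shared_ptr<ReorderParameter>& param) {
    auto pqr = std::dynamic_pointer_cast<PqrReorderParameter>(param);
    if (pqr == nullptr) {
        throw std::invalid_argument("Invalid reorder parameter type for PqrReorder");
    }
    return pqr->residual_param_;
}
```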

Collaborator Author

done

Comment on lines +46 to +52
float final_dist = candidate_result[i].first;
dists[i] = 1 - dists[i];
if (metric_ == MetricType::METRIC_TYPE_L2SQR) {
final_dist += bias_[ids[i]];
dists[i] *= 2;
}
reorder_heap->Push(final_dist - dists[i], ids[i]);


critical

The logic for calculating the reordered distance is incorrect and difficult to understand. The line dists[i] = 1 - dists[i]; is applied unconditionally, which leads to incorrect distance calculations for all metric types. For L2, it introduces an error, and for Cosine/IP, it also produces the wrong value. I suggest refactoring this loop for correctness and clarity.

        float final_dist = candidate_result[i].first;
        if (metric_ == MetricType::METRIC_TYPE_L2SQR) {
            final_dist += bias_[ids[i]] - 2 * dists[i];
        } else {
            final_dist -= dists[i];
        }
        reorder_heap->Push(final_dist, ids[i]);

Collaborator Author

The reorder_code_ always uses IP as its distance metric, so there should be no issue with this logic.
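The author's reply can be checked with a little algebra. Write a base vector as x = c + r, where c is the reconstruction held in the basic flatten codes and r is the residual encoded by reorder_code_. Two assumptions are made here and are not confirmed by the thread: reorder_code_'s IP distance is returned as 1 - <q, r> (so `1 - dists[i]` recovers <q, r>), and for L2 `bias_` stores 2<c, r> + ||r||^2. Then ||q - x||^2 = ||q - c||^2 + (2<c, r> + ||r||^2) - 2<q, r>, and for IP 1 - <q, x> = (1 - <q, c>) - <q, r>; cosine reduces to the IP case because the PR normalizes vectors for that metric. The hypothetical `ReorderedDistance` below mirrors the loop body and can be compared with the exact distances.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

float Dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

float L2Sqr(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Mirrors the PR's loop: candidate_dist is the distance from q to c,
// reorder_ip_dist is reorder_code_'s IP distance, assumed to be 1 - <q, r>.
float ReorderedDistance(bool is_l2, float candidate_dist, float reorder_ip_dist, float bias) {
    float final_dist = candidate_dist;
    float ip = 1.0f - reorder_ip_dist;  // recovers <q, r>
    if (is_l2) {
        final_dist += bias;
        ip *= 2.0f;
    }
    return final_dist - ip;
}
```

Under these assumptions the unconditional `1 - dists[i]` is not an error: it converts an IP distance back into an inner product before the metric-specific correction is applied.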

#include "storage/stream_writer.h"
namespace vsag {

static const int64_t TRAIN_MAX_SIZE = 65535;


medium

For clarity and type safety, this constant could be a static constexpr uint64_t and named according to a style guide (e.g., kTrainMaxSize). This makes it a compile-time constant and ensures type consistency with count in the Train method.

static constexpr uint64_t kTrainMaxSize = 65535;

FP32ComputeIP(residual_vector.data(), residual_vector.data(), dim_);
}
if (need_release) {
allocator_->Deallocate((void*)codes);


medium

Using a C-style cast (void*) to cast away const is unsafe and hides the intent. It's better to use const_cast to make it explicit that const is being cast away, which improves code safety and readability.

Suggested change:
- allocator_->Deallocate((void*)codes);
+ allocator_->Deallocate(const_cast<uint8_t*>(codes));
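A self-contained illustration of the pattern, with a hypothetical `Deallocate(void*)` standing in for the allocator: memory that is only read through a `const uint8_t*` but was allocated as mutable can be released with an explicit `const_cast`, which is well-defined because the underlying object is not const.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for allocator_->Deallocate(void*).
void Deallocate(void* p) { std::free(p); }

// Release a buffer the caller only holds as const; the memory itself came
// from malloc, so removing const for the free call is well-defined.
void ReleaseCodes(const uint8_t* codes) {
    Deallocate(const_cast<uint8_t*>(codes));
}
```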


void
PqrReorder::Train(const void* vector, uint64_t count) {
int64_t train_size = count > TRAIN_MAX_SIZE ? TRAIN_MAX_SIZE : count;


medium

Using std::min is clearer than a ternary operator for capping a value. Additionally, since count is uint64_t, train_size should also be an unsigned type to avoid potential signed/unsigned comparison issues.

Suggested change:
- int64_t train_size = count > TRAIN_MAX_SIZE ? TRAIN_MAX_SIZE : count;
+ const uint64_t train_size = std::min(count, static_cast<uint64_t>(TRAIN_MAX_SIZE));
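Combined with the earlier constant suggestion, the capping logic could read as follows. `TrainSampleCount` is a hypothetical helper name used only for this sketch; the cap of 65535 is from the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Compile-time cap on the number of vectors used for training (value from the PR).
static constexpr uint64_t kTrainMaxSize = 65535;

// Number of vectors actually used for training: all of them, up to the cap.
uint64_t TrainSampleCount(uint64_t count) {
    return std::min(count, kTrainMaxSize);
}
```

Keeping both operands `uint64_t` lets `std::min` deduce a single type without a cast and avoids the signed/unsigned mismatch the comment points out.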

DEFINE_POINTER(FlattenReorderParameter);
class FlattenReorderParameter : public ReorderParameter {
public:
explicit FlattenReorderParameter() : ReorderParameter(std::move(FLATTEN_REORDER)) {


medium

Using std::move on a const char* literal is unnecessary as it has no effect and can be misleading. The std::string constructor will be called regardless. Removing it makes the code cleaner.

Suggested change:
- explicit FlattenReorderParameter() : ReorderParameter(std::move(FLATTEN_REORDER)) {
+ explicit FlattenReorderParameter() : ReorderParameter(FLATTEN_REORDER) {

@codecov

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 78.72340% with 10 lines in your changes missing coverage. Please review.

❌ Your patch check has failed because the patch coverage (78.72%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
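The reported figures are consistent with each other: 10 uncovered lines at 78.72340% patch coverage implies 47 coverable lines in the patch, 37 of them hit, and the project totals in the coverage diff that follows match the displayed percentages to within their two-decimal precision. A quick check (the helper name is arbitrary):

```cpp
#include <cmath>
#include <cstdint>

// Coverage percentage = hit lines / coverable lines * 100.
double CoveragePercent(int64_t hits, int64_t lines) {
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lines);
}
```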

@@            Coverage Diff             @@
##             main    #1340      +/-   ##
==========================================
+ Coverage   91.83%   91.91%   +0.07%     
==========================================
  Files         320      322       +2     
  Lines       17857    17901      +44     
==========================================
+ Hits        16399    16453      +54     
+ Misses       1458     1448      -10     

| Flag | Coverage Δ |
|------|------------|
| cpp | 91.91% <78.72%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|------------|------------|
| common | 89.53% <ø> (ø) |
| datacell | 93.21% <ø> (+0.36%) ⬆️ |
| index | 90.78% <100.00%> (-0.06%) ⬇️ |
| simd | 100.00% <ø> (ø) |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49bcede...76c4906.


Signed-off-by: jinjiabao.jjb <[email protected]>