-
Notifications
You must be signed in to change notification settings - Fork 510
fix: check metric compatibility before using vector index #5609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: check metric compatibility before using vector index #5609
Conversation
Previously, vector search would use an ANN index regardless of whether the index's metric type matched the query's requested metric. This produced incorrect distances when, for example, an index built with metric="dot" was used for a query with metric="l2". Now the scanner checks if the index's metric matches the user's requested metric. If they don't match, it silently falls back to flat search. If the user doesn't specify a metric, the index's metric is used. Changes: - Query.metric_type is now Option<DistanceType> (None = use index default) - Scanner checks metric compatibility before using an index - Explain plan now shows the metric being used - Java bindings updated to make distanceType optional Fixes lance-format#5608 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code ReviewSummary: This PR fixes a critical bug (#5608) where vector search incorrectly used the ANN index regardless of the query's metric type, returning wrong distances. The fix changes No Major Issues FoundThe implementation is correct and well-tested. The logic properly:
Minor Observations (Non-blocking)1. Double index open in scanner.rs (lines ~3037-3066) The index is opened to check metric compatibility, but then 2. Java API backward compatibility The change from required new Query.Builder().setKey(...).build() // Used L2Now uses index default or data-type default. This is the intended behavior per the fix, but downstream Java users relying on implicit L2 may see different results if their index uses a different metric. This seems acceptable for a bug fix, but worth noting in release notes. TestsGood test coverage with LGTM |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
When a user specifies a metric that is incompatible with the data type (e.g., L2 on binary vectors), use the index with its own metric rather than falling back to flat search which would fail. The logic now is: - If metrics match: use the index - If user metric is incompatible with data type: use the index - If user metric is compatible but different from index: flat search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
This reverts commit 0a8d618.
Summary
Option<DistanceType>(None = use index default)Test plan
test_knn_metric_mismatch_falls_back_to_flat_searchtest_knn_no_metric_uses_index_metricFixes #5608
🤖 Generated with Claude Code