Lazy scorers #2726

fulmicoton · 2025-10-30T19:00:46Z

Follow up of #2681

- Allow lazy evaluation of the score: as soon as we have identified that a doc won't reach the top-K threshold, we can stop the evaluation.
- Add a new type of ordering. The options are now Asc, Desc, and ascending with None at the end. The latter is more natural for most search applications, but is not what is common in SQL (which I assume is the @stuhood / ParadeDB use case).
- Allow for a different segment-level score and its conversion.
- Rationalize part of the code / API.

This PR breaks the public API, but fixing affected code should be straightforward.
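To illustrate the lazy evaluation described above, here is a minimal sketch, not the code from this PR. It assumes a hypothetical two-part sort key where the first component is cheap and the second is expensive; the function name `top_k_lazy` and both closures are made up for the example. The expensive part is skipped whenever the cheap part alone shows that the doc cannot reach the current top-K threshold.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: keeps the K docs with the highest (primary, secondary) key.
/// `cheap` and `expensive` are illustrative closures, not part of any real API.
fn top_k_lazy(
    docs: &[u32],
    k: usize,
    cheap: impl Fn(u32) -> u64,
    expensive: impl Fn(u32) -> u64,
) -> Vec<((u64, u64), u32)> {
    if k == 0 {
        return Vec::new();
    }
    // Min-heap (via `Reverse`): the root is the weakest retained hit,
    // i.e. the current top-K threshold.
    let mut heap: BinaryHeap<Reverse<((u64, u64), u32)>> = BinaryHeap::new();
    for &doc in docs {
        let primary = cheap(doc);
        if heap.len() == k {
            if let Some(Reverse(((threshold_primary, _), _))) = heap.peek() {
                // The primary component alone already loses against the
                // threshold: skip the expensive part entirely.
                if primary < *threshold_primary {
                    continue;
                }
            }
        }
        let key = (primary, expensive(doc));
        heap.push(Reverse((key, doc)));
        if heap.len() > k {
            heap.pop(); // drop the weakest hit
        }
    }
    let mut hits: Vec<((u64, u64), u32)> =
        heap.into_iter().map(|Reverse(hit)| hit).collect();
    hits.sort_by(|a, b| b.cmp(a)); // best hit first
    hits
}
```

Only docs whose primary component ties with or beats the threshold pay for the second component, which is where the saving comes from when the later components are costly.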


stuhood

I think that this interface works! Thanks a lot for exploring it.

My only suggestion would be to break interfaces a little bit harder (naming wise), if possible: the capabilities exposed by this API go way beyond "tweaking a score", and into sorting on arbitrary features.

stuhood · 2025-11-03T05:52:59Z

I made a sketch of how this design could support implementing Collector::collect_block over here: #2728 ... seems promising, thanks!

fulmicoton · 2025-11-10T09:36:31Z

@stuhood Can you do a final review now? I did not integrate your work on batching yet. I think this can be done in a separate PR.

The PR ended up larger, because I added the flexibility to deal with ordering equivalent to ORDER BY col DESC NULLS LAST as well as ORDER BY col DESC, to be compatible with both Quickwit and ParadeDB.
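A minimal sketch of the three orderings named in the description (Asc, Desc, ascending with None at the end), applied to optional values. The enum `SortOrder` and the function `compare` are hypothetical names for illustration, not the PR's API, and the sketch assumes a missing value is treated as the lowest value, which is how Rust orders `Option`.

```rust
use std::cmp::Ordering;

/// Hypothetical names for the three orderings described in this PR.
#[derive(Clone, Copy, Debug)]
enum SortOrder {
    Asc,
    Desc,
    AscNoneLast,
}

/// Returns `Ordering::Less` when `a` should be listed before `b`.
fn compare(order: SortOrder, a: Option<u64>, b: Option<u64>) -> Ordering {
    match order {
        // Rust orders `None` before any `Some`, so missing values come first.
        SortOrder::Asc => a.cmp(&b),
        // Reversing the natural order puts missing values last.
        SortOrder::Desc => b.cmp(&a),
        // Ascending on present values, but missing values still come last.
        SortOrder::AscNoneLast => match (a, b) {
            (Some(x), Some(y)) => x.cmp(&y),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => Ordering::Equal,
        },
    }
}
```

Under these assumptions, sorting `[Some(2), None, Some(1)]` gives `[None, Some(1), Some(2)]` for `Asc`, `[Some(2), Some(1), None]` for `Desc`, and `[Some(1), Some(2), None]` for `AscNoneLast`.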

stuhood

Thanks so much for tackling this! The reduction in API surface area is really wonderful, and it looks like this enables the boxed/erased features in a cleaner way than #2681 did too.

Also, based on another experiment I was doing: I like removing impl Collector for TopDocs (despite the API change), because it cleans up the builder interface to not have to carry the generic type around.

Thanks again!

stuhood · 2025-11-10T22:17:11Z

src/collector/sort_key/sort_key_computer.rs

+        false
+    }
+
+    /// Sorting by score is special in that it allows for the Block-Wand optimization.


This comment is specific to the overridden method for scores.

stuhood · 2025-11-10T22:19:19Z

src/collector/sort_key/sort_by_static_fast_value.rs

+/// Sorts by a fast value (u64, i64, f64, bool).
+///
+/// The field must appear explicitly in the schema, with the right type, and declared as
+/// a fast field..


Suggested change

/// a fast field..

/// a fast field.

stuhood · 2025-11-10T22:19:31Z

src/collector/sort_key/sort_by_static_fast_value.rs

+///
+/// If the field is multivalued, only the first value is considered.
+///
+/// Document that do not have this value are still considered.


Suggested change

/// Document that do not have this value are still considered.

/// Documents that do not have this value are still considered.

stuhood · 2025-11-10T22:20:28Z

src/collector/sort_key/sort_by_string.rs

+///
+/// If the field is multivalued, only the first value is considered.
+///
+/// Document that do not have this value are still considered.


Suggested change

/// Document that do not have this value are still considered.

/// Documents that do not have this value are still considered.

stuhood · 2025-11-10T22:24:17Z

src/collector/sort_key/sort_by_score.rs

+    // Sorting by score is special in that it allows for the Block-Wand optimization.
+    fn collect_segment_top_k(
+        &self,
+        k: usize,
+        weight: &dyn crate::query::Weight,
+        reader: &crate::SegmentReader,
+        segment_ord: u32,
+    ) -> crate::Result<Vec<(Self::SortKey, DocAddress)>> {


I completely agree with not looping the batching into this change.

One thing that I wonder is whether batched, lazy collection could be used to implement block-wand, which might make scores less of a special case! Something to experiment with in the future.
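For readers unfamiliar with the idea, here is a heavily simplified sketch of threshold-based block skipping, the principle behind block-max pruning. It is an illustration under stated assumptions, not tantivy's Block-Wand implementation: it handles a single list of blocks, the `Block` struct and `top_k_with_block_pruning` are hypothetical, each block carries a precomputed upper bound on its scores, and scores are assumed never to be NaN.

```rust
/// Hypothetical block of docs with a precomputed upper bound on their scores.
struct Block {
    max_score: f32,
    docs: Vec<(u32, f32)>, // (doc id, exact score)
}

/// Returns the top K (doc, score) pairs, best first, and the number of docs
/// that were actually scored.
fn top_k_with_block_pruning(blocks: &[Block], k: usize) -> (Vec<(u32, f32)>, usize) {
    let mut top: Vec<(u32, f32)> = Vec::with_capacity(k + 1);
    let mut scored = 0;
    if k == 0 {
        return (top, scored);
    }
    for block in blocks {
        // Once K hits are held, the lowest retained score is the threshold.
        let threshold = if top.len() == k {
            top[k - 1].1
        } else {
            f32::NEG_INFINITY
        };
        if block.max_score <= threshold {
            continue; // no doc in this block can enter the top K
        }
        for &(doc, score) in &block.docs {
            scored += 1;
            if top.len() == k && score <= top[k - 1].1 {
                continue;
            }
            // `top` is kept sorted by descending score.
            let pos = top.partition_point(|&(_, s)| s >= score);
            top.insert(pos, (doc, score));
            top.truncate(k);
        }
    }
    (top, scored)
}
```

The skip test only needs the block's upper bound and the current threshold, so a lazily evaluated sort key with a cheap per-batch bound could in principle use the same loop shape.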

stuhood · 2025-11-10T22:30:38Z

src/collector/sort_key/sort_by_score.rs

+
+/// Sort by similarity score.
+#[derive(Clone, Debug, Copy)]
+pub struct SortBySimilarityScore;


Naming nit: here we spell out "similarity score", whereas with TopDocs::order_by_score and the other method removals/changes, I think that we recognize rightly that a "score" should probably only mean a "similarity score". So I think that you could probably drop Similarity from the name here, which would make this more symmetrical with order_by_score.

fulmicoton-dd added 2 commits October 30, 2025 17:33

Small preliminary refactoring.

bef2bc1

stuhood reviewed Oct 30, 2025

src/collector/top_score_collector.rs (outdated)

src/collector/tweak_score_top_collector.rs (outdated)

src/collector/tweak_score_top_collector.rs (outdated)

fulmicoton-dd added 8 commits October 31, 2025 10:08

CR comment

176e984

Cargo fmt and renaming

88d4826

refactoring

485ee06

refactoring

1a677e6

Refactoring

b2cb883

unification

78ef833

renaming bizarro

01bc7f2

fixing unit test

2fba123

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch from 83308bf to 2fba123 on November 1, 2025 15:15

stuhood reviewed Nov 1, 2025

src/collector/top_score_collector.rs (outdated)

bugfix

b383510

stuhood reviewed Nov 2, 2025

src/collector/sort_key/sort_key_computer.rs (outdated)

stuhood mentioned this pull request Nov 3, 2025

Implement collect_block for lazy scorers. #2728

Draft

fulmicoton-dd added 6 commits November 4, 2025 18:46

unit test

61fc533

pleasing clippy

6cbad9e

reorganizing code

6b3845a

unit test

d56d8e3

trying to rationalize order

2632904

unit tests passing

c3d926b

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch 2 times, most recently from 291f254 to 259007c, on November 7, 2025 21:13

Doc test passing

ef32cbb

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch from 259007c to ef32cbb on November 7, 2025 21:34

blop

3b6cdaf

fulmicoton changed the title from "Paul.masurel/lazy scorers" to "Lazy scorers" on Nov 10, 2025

fulmicoton marked this pull request as ready for review November 10, 2025 09:34

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch 6 times, most recently from 6db70fb to bfcf508, on November 10, 2025 10:12

blop

e092e93

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch 2 times, most recently from cd4e41e to a55995a, on November 10, 2025 12:58

Moving obsolete unit tests

71d9a5d

fulmicoton-dd force-pushed the paul.masurel/lazy-scorers branch from a55995a to 71d9a5d on November 10, 2025 13:04

stuhood approved these changes Nov 10, 2025

stuhood mentioned this pull request Nov 10, 2025

feat: Add support for ordering by multiple fields. #2681

Closed

4 participants
