Skip to content

Conversation

@chance-wnb
Copy link
Contributor

@chance-wnb chance-wnb commented Nov 20, 2025

Summary

This PR introduces the database schema for a queue-based call annotation system in Weave Trace. The system enables teams to create annotation queues, manually curate calls for review, and collaborate on systematic call annotation with multi-annotator support.

https://wandb.atlassian.net/browse/WB-29392

Database Schema Overview

The migration creates three core tables that work together to support annotation workflows:

1. annotation_queues - Queue Definitions

Stores queue metadata and configuration:

  • Queue identity: UUID-based unique identifier
  • Scorers: Array of weave refs (e.g., weave:///entity/project/scorer/name:version) for multi-dimensional annotation
  • Soft delete support: deleted_at for non-destructive queue archival

Key Features:

  • Native Array(String) for scorer refs (efficient access without JSON parsing)
  • Lightweight UPDATE support for modifying queue properties

2. annotation_queue_items - Queue Membership

Tracks which calls belong to each queue using a proxy pattern:

  • Call membership: Links queues to calls without modifying the immutable calls_merged table
  • Cached call fields: Snapshots of started_at, ended_at, op_name, trace_id at add-time for fast sorting/filtering without joins
  • Display fields: Per-batch configuration of which JSON paths to show annotators (e.g., ['input.prompt', 'output.text'])
  • Multi-queue support: Same call can appear in multiple queues with independent state

Why the Proxy Pattern?

  • ✅ Keeps annotation workflow metadata separate from core call data
  • ✅ Enables multi-queue support (same call in different queues)
  • ✅ Fast queries without expensive joins to the massive calls_merged table
  • ✅ Snapshot semantics: queue items capture call state at add-time

3. annotator_queue_items_progress - Workflow State Tracking

Manages per-annotator progress through annotation workflows:

  • Workflow states: pendingcompleted or skipped
  • Denormalized fields: queue_id and call_id for efficient querying without joins
  • Multi-annotator ready: Schema supports both shared pool (MVP) and future N-annotations-per-call workflows

Index Strategy (ORDER BY Clause Rationale)

ClickHouse's ORDER BY clause determines both physical sort order and the primary key index. Our indexing strategy prioritizes direct lookups by UUID and efficient JOINs over uniqueness enforcement (which is handled at the application layer).

annotation_queues: ORDER BY (project_id, id)

Optimizes:

  • ✅ Direct queue lookups: SELECT * FROM annotation_queues WHERE id = ?
  • ✅ List queues in a project: SELECT * FROM annotation_queues WHERE project_id = ?

Rationale: Queues are accessed primarily by their UUID identifier. Project-level scoping ensures tenant isolation.

annotation_queue_items: ORDER BY (project_id, queue_id, id)

Optimizes:

  • ✅ Fast JOIN lookups: JOIN annotation_queue_items qi ON qi.id = p.queue_item_id (primary use case)
  • ✅ Direct item access: SELECT * FROM annotation_queue_items WHERE id = ?
  • ✅ List all items in queue: SELECT * FROM annotation_queue_items WHERE queue_id = ?

Why NOT ORDER BY (project_id, queue_id, call_id)?

I considered including call_id for duplicate checking optimization:

-- Duplicate check query (runs during batch add)
SELECT call_id FROM annotation_queue_items
WHERE project_id = ? AND queue_id = ? AND call_id IN (...)

However, we chose to prioritize id because:

  1. call_id doesn't provide uniqueness: The same call can appear in multiple queues (multi-queue support is a core feature)
  2. JOIN frequency >> duplicate check frequency:
    • JOINs happen on every annotation action (read display fields, check state)
    • Duplicate checks only happen during batch add operations (less frequent)
  3. Small dataset per queue: With typical queue sizes (100s-1000s of items), scanning queue granules to filter by call_id is fast enough (microseconds)
  4. Direct item API access: Many endpoints use GET /annotation_queue_items/{item_id}, which requires fast UUID lookup

Trade-off Accepted: Duplicate checking requires scanning all granules in a queue and filtering by call_id, but this is negligible for typical queue sizes.

annotator_queue_items_progress: ORDER BY (project_id, queue_id, id)

Optimizes:

  • ✅ Direct progress record access: PATCH /annotation_queue_progress/{progress_id}
  • ✅ List all progress in queue: SELECT * FROM annotator_queue_items_progress WHERE queue_id = ?

Why NOT include annotation_state in ORDER BY?

I considered ORDER BY (project_id, queue_id, annotation_state, id) to optimize:

-- Query for available work
SELECT * FROM annotator_queue_items_progress
WHERE queue_id = ? AND annotation_state = 'pending'

However, I chose NOT to include annotation_state because:

  1. Causes data reorganization on UPDATE: Including annotation_state in the sort key means every state transition (pendingclaimedcompleted) physically moves the row on disk
  2. High update frequency: Items transition through states constantly during annotation workflows
  3. Defeats lightweight UPDATE optimization: The migration specifically enables ClickHouse's lightweight UPDATE feature (enable_block_number_column), which works best when updated columns are NOT in the sort key
  4. Minimal query performance impact: With typical queue sizes (100s-1000s), scanning all queue granules and filtering by state is fast enough

ClickHouse Granule Scanning:

  • Without annotation_state in ORDER BY: Scan ~2 granules (8192 rows each) for a 10K-item queue
  • With annotation_state in ORDER BY: Scan ~0.25 granules (only 'pending' items)
  • At this scale, the 4x difference is negligible (microseconds) compared to data reorganization overhead (milliseconds + merge overhead)

Trade-off Accepted: State filtering requires scanning all queue granules, but this optimizes for the more critical update path.

Technical Details

Lightweight UPDATE Support

All three tables enable ClickHouse's lightweight UPDATE feature:

SETTINGS
    enable_block_number_column = 1,
    enable_block_offset_column = 1;

This allows efficient in-place updates for:

  • display_fields in annotation_queue_items (if needed)
  • annotation_state, claimed_at, completed_at in annotator_queue_items_progress

Note: This adds minimal overhead to SELECT queries (patch application) but dramatically improves UPDATE performance.

Soft Delete Pattern

All tables include deleted_at Nullable(DateTime64(3)) for non-destructive deletion:

  • NULL = active record
  • Non-NULL = soft deleted (timestamp of deletion)

Related Documentation

Full design document: Queue-Based Call Annotation System.md


This PR is database-only and introduces no breaking changes. The tables are new and do not modify existing schema.

Copy link
Contributor Author

chance-wnb commented Nov 20, 2025

@codecov
Copy link

codecov bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@wandbot-3000
Copy link

wandbot-3000 bot commented Nov 20, 2025

@chance-wnb chance-wnb changed the title feat(weave): DB schema for annotation queue feat(weave): Add Database Migration for Queue-Based Call Annotation System Nov 20, 2025
@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch from cae9903 to 41a900b Compare November 21, 2025 03:51
@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch from 41a900b to 4de8dfe Compare November 22, 2025 02:26
@github-actions
Copy link
Contributor

❌ Documentation Reference Check Failed

No documentation reference found in the PR description. Please add either:

This check is required for all PRs except those that start with "chore(weave)" or explicitly state "docs are not required". Please update your PR description and this check will run again automatically.

@chance-wnb chance-wnb marked this pull request as ready for review November 25, 2025 17:33
@chance-wnb chance-wnb changed the title feat(weave): Add Database Migration for Queue-Based Call Annotation System chore(weave): Add Database Migration for Queue-Based Call Annotation System Nov 25, 2025
Copy link
Collaborator

@tssweeney tssweeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall feedback:

  1. I think this is overly complicated for MVP. the annotator_queue_items_progress table seems unnecessary under the condition that you only want a single annotator. Under the condition that you want multiple annotators, it needs revision to be logically consistent.
  2. There is a key piece of data here which are the feedback records themselves. I am not seeing how these feedback records relate back to the requisite queue. It seems to me that we are doing a lot of de-normalization, but the most important bit of data (the feedback itself) is not easily found
  3. I am concerned about the mutability of some of these fields (specifically the scores on the queue) and how that would effect the overall bookeeping. If you don't want mutability, I would really consider modifying this to be more like Monitors that can benefit from refs and immutability.

Example: ['weave:///entity/project/scorer/error_severity:abc123']
Native Array(String) type enables efficient access without JSON parsing.
*/
scorer_refs Array(String),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you plan on having this field be mutable? What happens to all the downstream data if indeed it is mutated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be immutable at least for MVP. If a queue manager ever wants to score with something else. they should create a new queue.

And they shouldn't remove an existing scorer, because the annotators are or have already worked on some of them. If necessary we can address these conflicts post MVP if there is such a need.

`call_started_at`: Cached from call for sorting/filtering without joins.
Snapshot at add time.
*/
call_started_at DateTime64(3),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you decide on started_at, ended_at, op_name, and trace_id as the fields to denormalize? did you profile the cost of joining before doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't profiled yet. I don't have real production data to test yet. I could mock some data, but I fear that would take too much of time. Let me know if you feel profiling is necessary.

I decide these field based on this critera:

  1. They will never change. So we don't have to worry about keeping them in sync.
  2. We could offer users to sort from the annotation_queue_items table directly without joining with call table. Even joining with call table isn't slow, directly querying must be faster (even theoretically) and less resource intensive.

`display_fields`: JSON paths to show annotator, e.g., ['input.prompt', 'output.text'].
Focuses annotator attention on relevant fields. Specified per batch when adding calls.
*/
display_fields Array(String),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed in design review, this is going to be heavily duplicative

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think duplicating is fine, especially for MVP.

  1. Most of the case the display fields are not long.
  2. If we don't do this, it makes the other part of the system more complicated. For simplicity purpose, I suggest we try this first.

) ENGINE = MergeTree()
ORDER BY (project_id, queue_id, id)
SETTINGS
enable_block_number_column = 1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on the pros/cons of these settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are must-have if we intend to use the light weight update feature.

There are 3 conditions to enable light weight update (UPDATE statement)

  1. The table must be MergeTree
  2. enable_block_number_column = 1
  3. enable_block_offset_column = 1

- completed (2): Annotation finished
- skipped (3): Annotator chose to skip
*/
annotation_state Enum8(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this field seems overloaded. pending and claimed seem like properties on the item itself. where as completed or skipped seem like user-specific properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, these can be all treated as UI specific state. Why do you feel they are wrong? what else do you suggest?

Copy link
Contributor Author

@chance-wnb chance-wnb Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have claimed now. But I guess the other points (user-specific properties) are still not addressed. Please share your thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this dynamic, rows in this table have two purposes (ie. overloaded). a) you need to create a new (empty) annotator_queue_items_progress row for every single annotation_queue_items. These will have state=pending and user=null. Then, once a user picks up the item, the annotator_queue_items_progress transitions to either completed/skipped with an attached user-id.

I think this dual mode overloads the table. you have two goals:

  1. Determine if an annotation_queue_items is eligible for labelling (available)
  2. Hold the label state for a given user <> annotation_queue_items tuple.

I think a cleaner approach is to separate concerns: ONLY add a row to annotator_queue_items_progress when a user has completed or skipped the annotation. This means that the annotation_state will never be pending here. In order to determine if something is available, it must have < X number of completed annotations (no more costly than your current approach)

If you want to reduce conflicts, having a claimed_at field on the annotation_queue_items could allow for time-based collision avoidance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with this approach. a record in annotator_queue_items_progress is created lazily when a user actually operate on the queue item.

-- ============================================================================
-- annotation_queues: Queue definitions and metadata
-- ============================================================================
CREATE TABLE annotation_queues (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is strikingly similar to monitors (a name + scorers) - i am curious if you explored using the same structure as monitors.

Copy link
Contributor Author

@chance-wnb chance-wnb Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just a coincidence. They are semantically very different. I suggest not eagerly combining them and allow room for them to develop independently.

The main reason, is that in the annotate queue we are using scorers to present an annotation template. the human fill in the value. The input intake process is very vague. It requires the human reading whatever available the judge.

When in monitor cases, the scorers are supposed to run automatically by machine, with preset, determined input.

So maybe the scorer concept is overloaded. There could be a separate business object to represent annotation template.

-- ============================================================================
-- annotator_queue_items_progress: Per-annotator workflow state tracking
-- ============================================================================
CREATE TABLE annotator_queue_items_progress (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotator_queue_items_progress

  • queue_item
  • scorer_ref
  • user_id
  • feedback_id

Copy link
Contributor Author

@chance-wnb chance-wnb Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of quickly loading the data for the queue list view and know the progress of each queue, having the (queue_item, scorer_ref, user_id, feedback_id) tuple probably doesn't help much. We can already infer that from the call <-> feedback relationship using the existing tables.

The complex part is to calculate

const completed = 0;
for (queue_item in queue) {
    const allScorersExist = queue.score_refs.all(
       score_ref => feadback.has(score_ref, queue_item.call_id) 
    );
    if (allScorersExist) {
        completed ++;
    }
}

this calculation is too expensive. I have a hard time imagining how it can be done in SQL level without loading all the data into memory.

However doing this:

const completed = COUNT ( [ for queue_item => queue_item.state in [COMPLETED, SKIP] in queue.queue_items]);

is easy.

This means: the need to check if each individual scorer in the group of queue.score_refs has a feedback is avoided. We are going to roll up a cascaded result of queue.score_refs.all( score_ref => feadback.has(score_ref, queue_item.call_id) ) onto the queue_item level (in reality a row in annotator_queue_items_progress table). The roll up can be calculated conveniently on the frontend when user completes an annotation for each specific queue item.

Separately, due to the introduction of skipped state for the annotation workflow. Checking the existence of a feedback to determine whether an annotation work item has completed no longer work. However, the rolled up status on a annotator_queue_items_progress entry can work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are doing a bunch of other denormalization, idk why you would avoid adding a feedback_id here. That would make the lookup so much faster

Copy link
Contributor Author

@chance-wnb chance-wnb Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a memo: I dont think the feedback_id(a dict) is a reliable source for getting feedbacks. Tracing the feedbacks using call_id is more reliable.

This avoids dealing with the field initially being empty, and external changes to the call's feedback. (eg. removal)

For reverse reference and book-marking purposes, the feedbacks created by the workflow of the queue will link to the queue by queue_id on the feedback table.

@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch from 4de8dfe to ef5278c Compare November 26, 2025 00:55
@chance-wnb chance-wnb requested a review from tssweeney November 26, 2025 01:57
/*
`annotator_id`: W&B user ID of the annotator. NULL when state is 'pending'.
*/
annotator_id Nullable(String),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend that this is non-nullable, which implies we do not insert a "empty" version of this for each row. This also implies that the annotation_state should have in_progress, not pending and mark as in_progress when it starts to avoid collision

Copy link
Contributor Author

@chance-wnb chance-wnb Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now made it non-nullable.

In terms of in_progress , if we don't want the claim feature, then we probably don't need this state:

  1. I can check whether an existing annotator_queue_items_progress to guard against conflicts, in case an existing annotator_queue_items_progress item is discovered, then it means there was a simultaneous submission. The current submission will be dropped, UI will prompt the user to operate on the next available queue item.

  2. I don't want to make it in a way that as long as a user sees a queue_item , we mark it as in_progress , that might be too eager, while user might just be interested in browsing the queue.

  3. So the ideal solution will be explicitly implement the claim feature. Our prior conversation seem to suggest we prefer simplicity for the MVP, that means if a user intend to complete an annotation item, he can just quickly finish it as best as he can. Please let me know if you feel claim feature is necessary, then we simply need to include the in_progress state. otherwise I am keeping only "complete" and "skipped".

/*
`completed_at`: Timestamp when annotation was completed. NULL if not completed.
*/
completed_at Nullable(DateTime64(3)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state_updated_at

Copy link
Contributor Author

@chance-wnb chance-wnb Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in favor of updated_at . I think we can get away with not having this column and still make state tracking coherent. (I'd like to start with minimum setup unless there is a strong reason not to)

@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch 2 times, most recently from f131bda to 98ab3fe Compare December 16, 2025 21:10
@chance-wnb
Copy link
Contributor Author

Hi @tssweeney

In the new iteration, I addressed these items:

  1. Add a queue_id reference field in the feedback table
  2. Remove the "pending" state in annotator_queue_items_progress:annotation_state , since a record is only created when user actually annotated a queue item.
  3. Removed the redundant fields completed_by complete_at

The change is not much, but the usage of the tables are different now. which are reflected in the dependent PRs in the stack.

@chance-wnb chance-wnb requested a review from tssweeney December 16, 2025 22:28
annotation_state Enum8(
'completed' = 0,
'skipped' = 1
) DEFAULT 'completed',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small followup: didn't you want an "in progress" version here which is created when the task is started in order to help avoid multiple labelers being assigned the same task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tim,

I explained this above, but I am sure it is buried among the lines, let me paste it again:

In terms of in_progress , if we don't want the claim feature, then we probably don't need this state:

I can check whether an existing annotator_queue_items_progress to guard against conflicts, in case an existing annotator_queue_items_progress item is discovered, then it means there was a simultaneous submission. The current submission will be dropped, UI will prompt the user to operate on the next available queue item.

I don't want to make it in a way that as long as a user sees a queue_item , we mark it as in_progress , that might be too eager, while user might just be interested in browsing the queue.

So the ideal solution will be explicitly implement the claim feature. Our prior conversation seem to suggest we prefer simplicity for the MVP, that means if a user intend to complete an annotation item, he can just quickly finish it as best as he can. Please let me know if you feel claim feature is necessary, then we simply need to include the in_progress state. otherwise I am keeping only "complete" and "skipped".

On a second thought today, I decide to add it for now, so it occupies a "slot", whether/how I can use it, I will figure it out later.

@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch from c22eb96 to 7a95c07 Compare December 30, 2025 03:29
@chance-wnb chance-wnb force-pushed the chance/annotation_queue_db branch from 7a95c07 to 58e407b Compare December 31, 2025 02:12
/*
`description`: Optional description of the queue's purpose.
*/
description Nullable(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be better if this was just an empty string and String type (Nullable has some overhead, probably worth it for deleted_at to make that explicit but for string type empty string is pretty obvious)

*/
deleted_at Nullable(DateTime64(3))
) ENGINE = MergeTree()
ORDER BY (project_id, id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this i'm assuming most queries use the id? Like we have queries "get annotation queus by id"? Otherwise we will want to consider secondary indices on whatever we are conditionining on, or if we are sorting by "created_at" for example we probably want a minmax index on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if most queries are "get me all annotation queues in the project" then we are good

`queue_id`: The queue ID this feedback was created from.
References annotation_queues.id. NULL when feedback is created outside of queues.
*/
ADD COLUMN queue_id Nullable(String) DEFAULT NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could consider using String instead of Nullable(String) not necessary but an option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants