-
Notifications
You must be signed in to change notification settings - Fork 140
chore(weave): Add Database Migration for Queue-Based Call Annotation System #5772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=e082febddcc427d66c059c433a2484dbdedeb543 |
cae9903 to
41a900b
Compare
41a900b to
4de8dfe
Compare
|
❌ Documentation Reference Check Failed No documentation reference found in the PR description. Please add either:
This check is required for all PRs except those that start with "chore(weave)" or explicitly state "docs are not required". Please update your PR description and this check will run again automatically. |
tssweeney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall feedback:
- I think this is overly complicated for MVP. the
annotator_queue_items_progresstable seems unnecessary under the condition that you only want a single annotator. Under the condition that you want multiple annotators, it needs revision to be logically consistent. - There is a key piece of data here which are the feedback records themselves. I am not seeing how these feedback records relate back to the requisite queue. It seems to me that we are doing a lot of de-normalization, but the most important bit of data (the feedback itself) is not easily found
- I am concerned about the mutability of some of these fields (specifically the scores on the queue) and how that would effect the overall bookeeping. If you don't want mutability, I would really consider modifying this to be more like Monitors that can benefit from refs and immutability.
| Example: ['weave:///entity/project/scorer/error_severity:abc123'] | ||
| Native Array(String) type enables efficient access without JSON parsing. | ||
| */ | ||
| scorer_refs Array(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you plan on having this field be mutable? What happens to all the downstream data if indeed it is mutated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be immutable at least for MVP. If a queue manager ever wants to score with something else. they should create a new queue.
And they shouldn't remove an existing scorer, because the annotators are or have already worked on some of them. If necessary we can address these conflicts post MVP if there is such a need.
| `call_started_at`: Cached from call for sorting/filtering without joins. | ||
| Snapshot at add time. | ||
| */ | ||
| call_started_at DateTime64(3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you decide on started_at, ended_at, op_name, and trace_id as the fields to denormalize? did you profile the cost of joining before doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't profiled yet. I don't have real production data to test yet. I could mock some data, but I fear that would take too much of time. Let me know if you feel profiling is necessary.
I decide these field based on this critera:
- They will never change. So we don't have to worry about keeping them in sync.
- We could offer users to sort from the annotation_queue_items table directly without joining with call table. Even joining with call table isn't slow, directly querying must be faster (even theoretically) and less resource intensive.
| `display_fields`: JSON paths to show annotator, e.g., ['input.prompt', 'output.text']. | ||
| Focuses annotator attention on relevant fields. Specified per batch when adding calls. | ||
| */ | ||
| display_fields Array(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed in design review, this is going to be heavily duplicative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think duplicating is fine, especially for MVP.
- Most of the case the display fields are not long.
- If we don't do this, it makes the other part of the system more complicated. For simplicity purpose, I suggest we try this first.
| ) ENGINE = MergeTree() | ||
| ORDER BY (project_id, queue_id, id) | ||
| SETTINGS | ||
| enable_block_number_column = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate on the pros/cons of these settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are must-have if we intend to use the light weight update feature.
There are 3 conditions to enable light weight update (UPDATE statement)
- The table must be MergeTree
- enable_block_number_column = 1
- enable_block_offset_column = 1
| - completed (2): Annotation finished | ||
| - skipped (3): Annotator chose to skip | ||
| */ | ||
| annotation_state Enum8( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this field seems overloaded. pending and claimed seem like properties on the item itself. where as completed or skipped seem like user-specific properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, these can be all treated as UI specific state. Why do you feel they are wrong? what else do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have claimed now. But I guess the other points (user-specific properties) are still not addressed. Please share your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this dynamic, rows in this table have two purposes (ie. overloaded). a) you need to create a new (empty) annotator_queue_items_progress row for every single annotation_queue_items. These will have state=pending and user=null. Then, once a user picks up the item, the annotator_queue_items_progress transitions to either completed/skipped with an attached user-id.
I think this dual mode overloads the table. you have two goals:
- Determine if an
annotation_queue_itemsis eligible for labelling (available) - Hold the label state for a given user <>
annotation_queue_itemstuple.
I think a cleaner approach is to separate concerns: ONLY add a row to annotator_queue_items_progress when a user has completed or skipped the annotation. This means that the annotation_state will never be pending here. In order to determine if something is available, it must have < X number of completed annotations (no more costly than your current approach)
If you want to reduce conflicts, having a claimed_at field on the annotation_queue_items could allow for time-based collision avoidance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with this approach. a record in annotator_queue_items_progress is created lazily when a user actually operate on the queue item.
| -- ============================================================================ | ||
| -- annotation_queues: Queue definitions and metadata | ||
| -- ============================================================================ | ||
| CREATE TABLE annotation_queues ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is strikingly similar to monitors (a name + scorers) - i am curious if you explored using the same structure as monitors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's just a coincidence. They are semantically very different. I suggest not eagerly combining them and allow room for them to develop independently.
The main reason, is that in the annotate queue we are using scorers to present an annotation template. the human fill in the value. The input intake process is very vague. It requires the human reading whatever available the judge.
When in monitor cases, the scorers are supposed to run automatically by machine, with preset, determined input.
So maybe the scorer concept is overloaded. There could be a separate business object to represent annotation template.
| -- ============================================================================ | ||
| -- annotator_queue_items_progress: Per-annotator workflow state tracking | ||
| -- ============================================================================ | ||
| CREATE TABLE annotator_queue_items_progress ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotator_queue_items_progress
- queue_item
- scorer_ref
- user_id
- feedback_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of quickly loading the data for the queue list view and know the progress of each queue, having the (queue_item, scorer_ref, user_id, feedback_id) tuple probably doesn't help much. We can already infer that from the call <-> feedback relationship using the existing tables.
The complex part is to calculate
const completed = 0;
for (queue_item in queue) {
const allScorersExist = queue.score_refs.all(
score_ref => feadback.has(score_ref, queue_item.call_id)
);
if (allScorersExist) {
completed ++;
}
}
this calculation is too expensive. I have a hard time imagining how it can be done in SQL level without loading all the data into memory.
However doing this:
const completed = COUNT ( [ for queue_item => queue_item.state in [COMPLETED, SKIP] in queue.queue_items]);
is easy.
This means: the need to check if each individual scorer in the group of queue.score_refs has a feedback is avoided. We are going to roll up a cascaded result of queue.score_refs.all( score_ref => feadback.has(score_ref, queue_item.call_id) ) onto the queue_item level (in reality a row in annotator_queue_items_progress table). The roll up can be calculated conveniently on the frontend when user completes an annotation for each specific queue item.
Separately, due to the introduction of skipped state for the annotation workflow. Checking the existence of a feedback to determine whether an annotation work item has completed no longer work. However, the rolled up status on a annotator_queue_items_progress entry can work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are doing a bunch of other denormalization, idk why you would avoid adding a feedback_id here. That would make the lookup so much faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a memo: I dont think the feedback_id(a dict) is a reliable source for getting feedbacks. Tracing the feedbacks using call_id is more reliable.
This avoids dealing with the field initially being empty, and external changes to the call's feedback. (eg. removal)
For reverse reference and book-marking purposes, the feedbacks created by the workflow of the queue will link to the queue by queue_id on the feedback table.
4de8dfe to
ef5278c
Compare
| /* | ||
| `annotator_id`: W&B user ID of the annotator. NULL when state is 'pending'. | ||
| */ | ||
| annotator_id Nullable(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend that this is non-nullable, which implies we do not insert a "empty" version of this for each row. This also implies that the annotation_state should have in_progress, not pending and mark as in_progress when it starts to avoid collision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now made it non-nullable.
In terms of in_progress , if we don't want the claim feature, then we probably don't need this state:
-
I can check whether an existing
annotator_queue_items_progressto guard against conflicts, in case an existingannotator_queue_items_progressitem is discovered, then it means there was a simultaneous submission. The current submission will be dropped, UI will prompt the user to operate on the next available queue item. -
I don't want to make it in a way that as long as a user sees a queue_item , we mark it as
in_progress, that might be too eager, while user might just be interested in browsing the queue. -
So the ideal solution will be explicitly implement the
claimfeature. Our prior conversation seem to suggest we prefer simplicity for the MVP, that means if a user intend to complete an annotation item, he can just quickly finish it as best as he can. Please let me know if you feelclaimfeature is necessary, then we simply need to include thein_progressstate. otherwise I am keeping only "complete" and "skipped".
| /* | ||
| `completed_at`: Timestamp when annotation was completed. NULL if not completed. | ||
| */ | ||
| completed_at Nullable(DateTime64(3)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state_updated_at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in favor of updated_at . I think we can get away with not having this column and still make state tracking coherent. (I'd like to start with minimum setup unless there is a strong reason not to)
f131bda to
98ab3fe
Compare
|
Hi @tssweeney In the new iteration, I addressed these items:
The change is not much, but the usage of the tables are different now. which are reflected in the dependent PRs in the stack. |
| annotation_state Enum8( | ||
| 'completed' = 0, | ||
| 'skipped' = 1 | ||
| ) DEFAULT 'completed', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small followup: didn't you want an "in progress" version here which is created when the task is started in order to help avoid multiple labelers being assigned the same task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tim,
I explained this above, but I am sure it is buried among the lines, let me paste it again:
In terms of in_progress , if we don't want the claim feature, then we probably don't need this state:
I can check whether an existing annotator_queue_items_progress to guard against conflicts, in case an existing annotator_queue_items_progress item is discovered, then it means there was a simultaneous submission. The current submission will be dropped, UI will prompt the user to operate on the next available queue item.
I don't want to make it in a way that as long as a user sees a queue_item , we mark it as in_progress , that might be too eager, while user might just be interested in browsing the queue.
So the ideal solution will be explicitly implement the claim feature. Our prior conversation seem to suggest we prefer simplicity for the MVP, that means if a user intend to complete an annotation item, he can just quickly finish it as best as he can. Please let me know if you feel claim feature is necessary, then we simply need to include the in_progress state. otherwise I am keeping only "complete" and "skipped".
On a second thought today, I decide to add it for now, so it occupies a "slot", whether/how I can use it, I will figure it out later.
c22eb96 to
7a95c07
Compare
7a95c07 to
58e407b
Compare
| /* | ||
| `description`: Optional description of the queue's purpose. | ||
| */ | ||
| description Nullable(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be better if this was just an empty string and String type (Nullable has some overhead, probably worth it for deleted_at to make that explicit but for string type empty string is pretty obvious)
| */ | ||
| deleted_at Nullable(DateTime64(3)) | ||
| ) ENGINE = MergeTree() | ||
| ORDER BY (project_id, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this i'm assuming most queries use the id? Like we have queries "get annotation queus by id"? Otherwise we will want to consider secondary indices on whatever we are conditionining on, or if we are sorting by "created_at" for example we probably want a minmax index on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if most queries are "get me all annotation queues in the project" then we are good
| `queue_id`: The queue ID this feedback was created from. | ||
| References annotation_queues.id. NULL when feedback is created outside of queues. | ||
| */ | ||
| ADD COLUMN queue_id Nullable(String) DEFAULT NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could consider using String instead of Nullable(String) not necessary but an option.

Summary
This PR introduces the database schema for a queue-based call annotation system in Weave Trace. The system enables teams to create annotation queues, manually curate calls for review, and collaborate on systematic call annotation with multi-annotator support.
https://wandb.atlassian.net/browse/WB-29392
Database Schema Overview
The migration creates three core tables that work together to support annotation workflows:
1.
annotation_queues- Queue DefinitionsStores queue metadata and configuration:
weave:///entity/project/scorer/name:version) for multi-dimensional annotationdeleted_atfor non-destructive queue archivalKey Features:
Array(String)for scorer refs (efficient access without JSON parsing)2.
annotation_queue_items- Queue MembershipTracks which calls belong to each queue using a proxy pattern:
calls_mergedtablestarted_at,ended_at,op_name,trace_idat add-time for fast sorting/filtering without joins['input.prompt', 'output.text'])Why the Proxy Pattern?
calls_mergedtable3.
annotator_queue_items_progress- Workflow State TrackingManages per-annotator progress through annotation workflows:
pending→completedorskippedqueue_idandcall_idfor efficient querying without joinsIndex Strategy (ORDER BY Clause Rationale)
ClickHouse's
ORDER BYclause determines both physical sort order and the primary key index. Our indexing strategy prioritizes direct lookups by UUID and efficient JOINs over uniqueness enforcement (which is handled at the application layer).annotation_queues:ORDER BY (project_id, id)Optimizes:
SELECT * FROM annotation_queues WHERE id = ?SELECT * FROM annotation_queues WHERE project_id = ?Rationale: Queues are accessed primarily by their UUID identifier. Project-level scoping ensures tenant isolation.
annotation_queue_items:ORDER BY (project_id, queue_id, id)Optimizes:
JOIN annotation_queue_items qi ON qi.id = p.queue_item_id(primary use case)SELECT * FROM annotation_queue_items WHERE id = ?SELECT * FROM annotation_queue_items WHERE queue_id = ?Why NOT
ORDER BY (project_id, queue_id, call_id)?I considered including
call_idfor duplicate checking optimization:However, we chose to prioritize
idbecause:call_iddoesn't provide uniqueness: The same call can appear in multiple queues (multi-queue support is a core feature)call_idis fast enough (microseconds)GET /annotation_queue_items/{item_id}, which requires fast UUID lookupTrade-off Accepted: Duplicate checking requires scanning all granules in a queue and filtering by
call_id, but this is negligible for typical queue sizes.annotator_queue_items_progress:ORDER BY (project_id, queue_id, id)Optimizes:
PATCH /annotation_queue_progress/{progress_id}SELECT * FROM annotator_queue_items_progress WHERE queue_id = ?Why NOT include
annotation_statein ORDER BY?I considered
ORDER BY (project_id, queue_id, annotation_state, id)to optimize:However, I chose NOT to include
annotation_statebecause:annotation_statein the sort key means every state transition (pending→claimed→completed) physically moves the row on diskenable_block_number_column), which works best when updated columns are NOT in the sort keyClickHouse Granule Scanning:
annotation_statein ORDER BY: Scan ~2 granules (8192 rows each) for a 10K-item queueannotation_statein ORDER BY: Scan ~0.25 granules (only 'pending' items)Trade-off Accepted: State filtering requires scanning all queue granules, but this optimizes for the more critical update path.
Technical Details
Lightweight UPDATE Support
All three tables enable ClickHouse's lightweight UPDATE feature:
SETTINGS enable_block_number_column = 1, enable_block_offset_column = 1;This allows efficient in-place updates for:
display_fieldsinannotation_queue_items(if needed)annotation_state,claimed_at,completed_atinannotator_queue_items_progressNote: This adds minimal overhead to SELECT queries (patch application) but dramatically improves UPDATE performance.
Soft Delete Pattern
All tables include
deleted_at Nullable(DateTime64(3))for non-destructive deletion:Related Documentation
Full design document: Queue-Based Call Annotation System.md
This PR is database-only and introduces no breaking changes. The tables are new and do not modify existing schema.