-
Notifications
You must be signed in to change notification settings - Fork 201
concurrency-safe vertex iterators for levelled forest #8202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
concurrency-safe vertex iterators for levelled forest #8202
Conversation
…led forest (incl. detailed documentation and test coverage).
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
peterargue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks for improving these docs.
One general comment, I think the docs as a whole make it clear, but it would be easy to confuse the use case of VertexIteratorConcurrencySafe.
The intention is to allow concurrent use of a single iterator. I could see people making the assumption it was for concurrent access to the forest.
| // This vertex iterator does NOT COPY the provided list of vertices for | ||
| // efficiency reasons. For APPEND_ONLY `VertexList`s, the `VertexIterator` | ||
| // can be wrapped into a VertexIteratorConcurrencySafe to make it concurrency | ||
| // safe. By design, the ResultForest guarantees this. Hence, construction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // safe. By design, the ResultForest guarantees this. Hence, construction | |
| // safe. By design, the LevelledForest guarantees this. Hence, construction |
| assert.Equal(t, len(vertexList), len(iterator.data)+1) | ||
| }) | ||
|
|
||
| t.Run(fmt.Sprintf("fully filled non-empty slice (len = 10, cap = 10)"), func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| t.Run(fmt.Sprintf("fully filled non-empty slice (len = 10, cap = 10)"), func(t *testing.T) { | |
| t.Run("fully filled non-empty slice (len = 10, cap = 10)", func(t *testing.T) { |
| // Even if the Levelled Forest makes additions to the input slice, we maintain our own notion of | ||
| // length and backing slice. | ||
| // CAUTION: | ||
| // - we NOT COPY the list's containers for efficiency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // - we NOT COPY the list's containers for efficiency. | |
| // - we do NOT COPY the list's containers for efficiency. |
Status quo: concurrency limitations of the leveled forest
The
levelled forestis an in-memory data structure that stores vertices of a forest-style graph. Internally it uses:flow-go/module/forest/leveled_forest.go
Lines 24 to 26 in de694d9
utilizing the lower-level structs
flow-go/module/forest/leveled_forest.go
Lines 31 to 32 in de694d9
flow-go/module/forest/leveled_forest.go
Lines 41 to 48 in de694d9
The forest itself has no internal synchronization and all operations mutate shared state. Hence, concurrent reads and mutations (methods
LevelledForest.AddVertexPruneUpToLevel) introduce the usual race conditionsIterator-specific issue
The core subtlety is that the iterator API reads directly into the internal slice structures of the forest (intentionally avoiding copying for performance). Even with locking of the forest, an iterator may be accessed from a different goroutine than the one holding the mutex. Example:
Specifically, the
VertexIteratorinternally references aVertexListslice, which is owned by the forest. If the forest mutates that slice concurrently, we have to worry that an iterator may:In other words: the concurrency hazard is not the iterator’s logic but the fact that the iterator reads from a slice that can be concurrently mutated by other goroutines updating the forest.
Why the forest itself does not need stronger concurrency guarantees
For performance reasons, the forest is not concurrency safe, since it is very easy for the higher-level logic to provide synchronization if desired (wrapping all forest operations in a mutex, see code snippets below). Because all structural mutations funnel through a narrow API surface, this is marginal extra work.
Exemplary code snippets
Focus of this PR: supportive primitives are required for concurrency-safe iterators
At the hear of the challenge lies the fact that iterator references forest-owned slices. Recalling elements from the iterator is trivially safe if the forest were to never mutate that slice underneath the iterator. However, this would defeat the purpose of a concurrent environment. The iterator therefore requires support from the forest to guarantee that the portion of the data is is reading is not modified concurrently:
PR Summary
This PR provides a small wrapper (
VertexIteratorConcurrencySafe) hat can be applied to the non-concurrency safeVertexIterator. Algorithmically, the wrapper is very light and efficient, because it does not copy the data. However, the challenge is proving it's concurrency safety, which is the main contribution of this PR.Extensive documentation and tests validate the concurrency model and ensure the forest's continuing operations do not interfering with iterator.
The forest itself remains intentionally non–thread-safe, since callers can trivially guard forest mutations using a mutex.