-
Notifications
You must be signed in to change notification settings - Fork 4.1k
add pending signaled wrs checking #3137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add pending signaled wrs checking #3137
Conversation
3fb155a to
bb827cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements tracking of pending signaled RDMA send work requests (WRs) to optimize window management. The changes add a counter for signaled WRs and conditionally wake up the writing thread based on this counter to prevent premature thread switching.
- Adds
_pending_signaled_wrscounter to track signaled send WRs - Introduces WR ID generation for RDMA work requests
- Updates window wake-up logic to consider pending signaled WRs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/brpc/rdma/rdma_endpoint.h | Adds atomic counter for pending signaled WRs |
| src/brpc/rdma/rdma_endpoint.cpp | Implements WR ID generation, counter management, and conditional window wake-up logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| switch (wc.opcode) { | ||
| case IBV_WC_SEND: { // send completion | ||
| // Do nothing | ||
| if (wc.wr_id == 0) { |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition checks if wc.wr_id == 0, but GenerateWrId() explicitly generates non-zero IDs (comment at line 930 states '0 is an invalid Id'). The signaled WRs in CutFromIOBufList at line 880 don't set wr.wr_id, leaving it as 0 after memset at line 808. This logic appears inverted - it should decrement when wc.wr_id != 0 (for SendImm) or the counter should be incremented for signaled sends in CutFromIOBufList that have wr_id == 0.
| if (wc.wr_id == 0) { | |
| if (wc.wr_id != 0) { |
chenBright
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR causes the RdmaTest.send_rpcs_in_one_qp of brpc_rdma_unittest.cpp to fail.
src/brpc/rdma/rdma_endpoint.cpp
Outdated
| uint32_t wnd_thresh = _local_window_capacity / 8; | ||
| if (_window_size.fetch_add(acks, butil::memory_order_relaxed) >= wnd_thresh | ||
| || acks >= wnd_thresh) { | ||
| if (_pending_signaled_wrs.load(butil::memory_order_relaxed) <= 1 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When _pending_signaled_wrs is greater than 1, _window_size is not updated, and acks will be lost.
src/brpc/rdma/rdma_endpoint.cpp
Outdated
|
|
||
| ibv_send_wr wr; | ||
| memset(&wr, 0, sizeof(wr)); | ||
| wr.wr_id = GenerateWrId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be sufficient to set wr_id to a fixed value for Imm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, just use a fixed value
d31e8c3 to
f7f03e6
Compare
| return -1; | ||
| } | ||
|
|
||
| if (signaled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line, I'm not sure if send cqe can generate energy after this atomic operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // The number of pending signaled send WRs | ||
| butil::atomic<uint16_t> _pending_signaled_wrs; | ||
| // The number of pending acks | ||
| uint16_t _pending_acks; |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _pending_acks member is not thread-safe (plain uint16_t instead of atomic). In polling mode, multiple poller threads can call HandleCompletion concurrently for the same endpoint, leading to race conditions on lines 997, 1016, and 1018 where this variable is read and written without synchronization.
| uint16_t _pending_acks; | |
| butil::atomic<uint16_t> _pending_acks; |
src/brpc/rdma/rdma_endpoint.cpp
Outdated
|
|
||
| ssize_t RdmaEndpoint::HandleCompletion(ibv_wc& wc) { | ||
| bool zerocopy = FLAGS_rdma_recv_zerocopy; | ||
| uint16_t pending_signaled_wrs = 2; |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable pending_signaled_wrs is initialized to 2, which appears to be a magic number. This initialization value doesn't reflect the actual number of pending signaled WRs and relies on being overwritten in the IBV_WC_SEND case. Consider initializing it to the actual value from _pending_signaled_wrs or using a constant with a descriptive name to clarify its purpose as a sentinel value.
| uint16_t pending_signaled_wrs = 2; | |
| uint16_t pending_signaled_wrs = _pending_signaled_wrs.load(butil::memory_order_relaxed); |
583a2af to
078e868
Compare
078e868 to
ab7debe
Compare
chenBright
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use two windows: one representing the local sending window size and the other representing the remote receiving window size. Sending should only occur if both window values are greater than 0.
See #3145 . We encountered the same problem, and the above method solved it.
| << wc.opcode; | ||
| return -1; | ||
| } | ||
| if (pending_signaled_wrs <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that _window_size may still be larger than the actual sq size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give an example to describe it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the client sends _local_window_capacity / 4 + 1 requests, it receives an ACK and a send CQE from the server. At this point, _window_size increases by _local_window_capacity / 4 + 1, and sq size increases by _local_window_capacity / 4.
The issue of inconsistency between _window_size and sq size still exists.
Yes, I think your PR is better. |
What problem does this PR solve?
Issue Number: #3132
Problem Summary:
What is changed and the side effects?
Changed:
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: