-
Notifications
You must be signed in to change notification settings - Fork 24
Add interface to add client data rpc #846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add interface to add client data rpc #846
Conversation
bf220c9 to
b069b44
Compare
| // becoming a leader. | ||
|
|
||
| RD_LOGD(NO_TRACE_ID, "become_leader_cb: setting traffic_ready_lsn from {} to {}", current_gate, new_gate); | ||
| m_listener->on_become_leader(m_group_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about the purpose of on_become_xxx. Should we move it to the beginning of become_xxx_cb to call the upper callback first, similar to how handle_commit does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on_become_leader and on_become_follower will be used for homeobject scrubbing. when scrubbing is on-going, if leader switch happens, the old leader will become follower and thus on_become_follower will be called on this node, where we stop the scrubbing thread that will request scrub results from the other two members. one of the follower will become leader and thus on_become_leader will be called on this node, where we will read the scrub superblk and start the scrubbing thread to request scrub result from the other two memebers.
Should we move it to the beginning
no need to do this now, what we want is to make sure that we can be notified with the leader change event, it does not matter it is called in the beginning or end.
|
Can we do a POC regarding the required communication primitives We can to the POC of HO in HS UT (which is easier to implement RepldevListener), that gives team better understanding the requirements |
|
I am now working on the POC code with this change to see if this is workable |
b069b44 to
2ee228f
Compare
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #846 +/- ##
==========================================
- Coverage 56.51% 48.30% -8.22%
==========================================
Files 108 110 +2
Lines 10300 12809 +2509
Branches 1402 6149 +4747
==========================================
+ Hits 5821 6187 +366
+ Misses 3894 2537 -1357
- Partials 585 4085 +3500 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d8fa192 to
0915f3e
Compare
0915f3e to
5ec0d54
Compare
|
this PR add three apis into repl_dev(exposes nuraft_mesg data service to upper layer) : I have written a POC to test this in homeobject layer, where the deserialized VSM are transfered through the data channel and then compared with leader . please see you can build homestore using this PR to create a local homestore conan repo and then use the following branch to build homeobject , and then run homestore_scrubber_test to see how it works. @xiaoxichen @Besroy any suggestion for adding this three api? |
| data_service_request_handler_t const& request_handler) = 0; | ||
|
|
||
| // send a unidirectional data rpc to dest with request_name and cli_buf | ||
| virtual nuraft_mesg::NullAsyncResult data_request_unidirectional(nuraft_mesg::destination_t const& dest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a generic interface we probably cannot use nuraft_mesg::XXXX,
we can use either
replica_member_info
or
replica_id_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using destination_t = std::variant< peer_id_t, role_regex, svr_id_t >
destination_t support several type of input params, this is more adaptive,
maybe other upper layer(not homeobject) wants to use svr_id_t or role_regex .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can re-encapsulate or redefine a type that is fine... but not exposing a nuraft type. it breaks the boundary.
HO doesnt need to care if HS is using nuraft_mesg for data_request_unidirectional, or use its own grpc, or use http, tcp , ftp, whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree and actually I have already tried to expose a clean api, but it is a little complicated.
coming to (http, tcp , ftp, whatever.), when failing to transfer data, we need to tell the upper layer the detailed failure info(why fails). so IMO, no need to do such a high level re-encapsulatation. since we actually use grpc to transfer data in the bottom layer, when grpc error happens, we should return this error to upper layer so that it can have a clear idea why grpc fails.
however, in nuraft_mesg, data_service_request_bidirectional never return a grpc error, and always change a grpc error to nuraft error with grpc_status_to_nuraft_code, and I am confused about the necessity here. As a result, at homestore level , when error happens, we can only get the error of a nuraft type for data_service_request_bidirectional . so if we do encapsulation here, we have three choices:
-
do not re-encapsulate, return directly as it is.
-
add a new function to convert nuraft error to grpc error, but this will miss somed detailed information. for example.
case ::grpc::StatusCode::INVALID_ARGUMENT:
case ::grpc::StatusCode::UNIMPLEMENTED:
case ::grpc::StatusCode::UNAUTHENTICATED:
case ::grpc::StatusCode::PERMISSION_DENIED:
case ::grpc::StatusCode::RESOURCE_EXHAUSTED:
case ::grpc::StatusCode::OUT_OF_RANGE:
return nuraft::cmd_result_code::BAD_REQUEST;
if we want to convert nuraft::cmd_result_code::BAD_REQUEST to grpc error , which grpc should we chose.
3 add a rude convert, which will lump all the nuraft error types under a single umbrella error type with no detailed info. then the upper lay can only know this transfer failure, but have no idea why it fails.
for now, I chose 1, although it looks not that ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well each layer is expected to lost some information , for example when you calling an HTTP service got a 500, how you expect to know the detail error inside the 500, is that GW cannot talk to Nudata? or GW cannot reach DM?
Another example you got EIO when your disk IO fail, the specific device error code is hidden. .
When an underlying error is important , then it is time to create a specific error mapping, for that error. We can add a log line in the error converter if needed.
It is even fine we use same type and directly return what upstream returns, but just , dont directly expose the upstream type .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let met try to re-encapsulate
|
|
||
| bool RaftReplDev::add_data_rpc_service(std::string const& request_name, | ||
| data_service_request_handler_t const& request_handler) { | ||
| return m_msg_mgr.bind_data_service_request(request_name, m_group_id, request_handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure, kind of concerning it will be a problem during shutdown, the HO will be deconstructed first then the handler is invalid.
It is not major atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good point. but actually , we call homestore::shutdown in HSHomeObject::shutdown(), this means Homeobject is not destructed until homestore shutdown return.
|
@Besroy @xiaoxichen I add some encapsulation so that no nuraft_mesg type will be exposed to upper layer |
|
|
||
| /// @brief when restart, after all the logs are replayed and before joining raft group, notify the upper layer | ||
| virtual void on_log_replay_done(const group_id_t& group_id) {}; | ||
| virtual void on_log_replay_done(const group_id_t& group_id) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was designed to be pure virtual originally。when we added this interface, in order to not block the homeobject build, we make it not pure virtual and have a default implementation. now that we have already implement this virtual function in homeobject side, we can change it to pure virtual now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it will break homeblks..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, solo_repl_dev never use ReplDevListener, so any change to ReplDevListener will not break solo_repl_dev, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, at least in the SoloReplDevUT there is a ReplDevListener, so I am expecting same pattern, the ReplDevListener is not solely for RaftReplDev.
Personally I wont touch it , but you can ask in the channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let me double confirm this in homestore channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remove the changes on this part, let`s keep it for now
| using trace_id_t = u_int64_t; | ||
|
|
||
| using data_service_request_handler_t = std::function< void(boost::intrusive_ptr< sisl::GenericRpcData >& rpc_data) >; | ||
| ENUM(role_regex, uint8_t, LEADER, FOLLOWER, ALL, ANY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repl_role_regex
|
|
||
| using data_service_request_handler_t = std::function< void(boost::intrusive_ptr< sisl::GenericRpcData >& rpc_data) >; | ||
| ENUM(role_regex, uint8_t, LEADER, FOLLOWER, ALL, ANY); | ||
| using peer_id_t = boost::uuids::uuid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use replica_id_t directly.
| ENUM(role_regex, uint8_t, LEADER, FOLLOWER, ALL, ANY); | ||
| using peer_id_t = boost::uuids::uuid; | ||
| using svr_id_t = int32_t; | ||
| using destination_t = std::variant< peer_id_t, role_regex, svr_id_t >; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using repl_dest_t = std::variant< replica_id_t, repl_dest_role, svr_id_t>;
| using svr_id_t = int32_t; | ||
| using destination_t = std::variant< peer_id_t, role_regex, svr_id_t >; | ||
|
|
||
| ENUM(data_rpc_error_code, uint8_t, SUCCESS, TIMEOUT, SERVER_NOT_FOUND, CANCELLED, SERVER_ALREADY_EXISTS, TERM_MISMATCH, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repl_deta_rpc_error_code
| void handle_fetch_data_response(sisl::GenericClientResponse response, std::vector< repl_req_ptr_t > rreqs); | ||
| bool is_resync_mode(); | ||
| data_rpc_error_code nuraft_to_data_rpc_error_code(nuraft::cmd_result_code const& nuraft_err); | ||
| nuraft_mesg::destination_t change_to_nuraft_mesg_destination(destination_t dest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets have a consistent naming for the converter.
nuraft_to_hs_error
hs_to_nuraft_dest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. I have changed the code according to all the above comments. PTAL
ad004bf to
9242419
Compare
expose nuraft_message to upper layer, so that they can register their own rpc call by calling
bind_data_service_request