Return disk agent to online #4903
base: main
|
Hi! Thank you for contributing! |
|
| Add a description to the PR and to the issue. |
| new TEvDiskRegistryPrivate::TEvDiskRegistryAgentListExpiredParamsCleanup()); | ||
| } | ||
|
|
||
| void TDiskRegistryActor::ScheduleRestoreDisksToOnline( |
| void TDiskRegistryActor::ScheduleRestoreDisksToOnline( | |
| void TDiskRegistryActor::ScheduleRestoreDisksToOnlineIfNeeded( |
and add a check inside the function:
if (!Config->GetCheckAgentsToRestoreToOnlineInterval()) {
return;
}
cloud/blockstore/libs/storage/disk_registry/disk_registry_state.cpp
Outdated
| TEvDiskRegistry::TEvGetClusterCapacityRequest, | ||
| HandleGetClusterCapacity); | ||
|
|
||
| HFunc( |
Maybe ignore this message while DR is in read-only mode, and restart the message scheduling once DR is switched out of ReadOnly?
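The read-only suggestion above could look roughly like the following. This is a minimal standalone sketch, not the actual TDiskRegistryActor code: the struct TRestoreSchedulerSketch and all of its members are hypothetical names that only model the control flow (drop the event while read-only, re-arm the timer when leaving read-only).

```cpp
// Hypothetical, simplified model of the scheduling logic.
// It does not use the real actor system or the real TDiskRegistryActor.
struct TRestoreSchedulerSketch {
    bool ReadOnly = false;
    bool Scheduled = false;       // a restore-check event is pending
    int TransactionsStarted = 0;  // how many restore transactions were started

    // Arms the periodic check unless the registry is read-only.
    void Schedule() {
        if (!ReadOnly) {
            Scheduled = true;
        }
    }

    // Handler for the periodic restore-check event.
    void HandleRestoreAgentsToOnline() {
        Scheduled = false;
        if (ReadOnly) {
            // Ignore the event and do not re-arm the timer while read-only.
            return;
        }
        ++TransactionsStarted;
        Schedule();
    }

    // Called when the read-only flag changes.
    void SetReadOnly(bool readOnly) {
        const bool leavingReadOnly = ReadOnly && !readOnly;
        ReadOnly = readOnly;
        if (leavingReadOnly && !Scheduled) {
            // Restart the periodic check after leaving read-only mode.
            Schedule();
        }
    }
};
```

In this model the timer is never left dangling: an event that fires during read-only mode is consumed, and the next one is armed only when the mode is lifted.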
cloud/blockstore/libs/storage/disk_registry/disk_registry_actor_restore_agents_to_online.cpp
|
| Tests? |
| ctx, | ||
| TBlockStoreComponents::DISK_REGISTRY, | ||
| "Restoring agents with status \"back from unavailable\" and last state change more than " | ||
| "RestoreAgentsToOnlineInterval ago"); |
I would print the actual RestoreAgentsToOnlineInterval value here.
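One way to put the concrete interval into the log text is sketched below. The helper MakeRestoreLogMessage is a hypothetical name and uses std::chrono instead of the project's duration type; in the real code the value would presumably be passed to the existing log call instead.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper: builds the log text with the configured interval
// instead of the literal parameter name.
std::string MakeRestoreLogMessage(std::chrono::seconds restoreInterval) {
    return "Restoring agents with status \"back from unavailable\" "
           "and last state change more than "
        + std::to_string(restoreInterval.count()) + "s ago";
}
```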
| Y_UNUSED(args); | ||
|
|
||
| for (const auto& agent : args.affectedAgents) { | ||
| LOG_INFO( |
Maybe print all the agents at once in a single message?
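A single-message variant could first join the agent ids and then log once. The sketch below uses only the standard library; JoinAgentIds is a hypothetical helper, and the project may already have its own string-joining utility that would be a better fit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins agent ids into one comma-separated string
// so that a single log line can list every restored agent.
std::string JoinAgentIds(const std::vector<std::string>& agentIds) {
    std::string joined;
    for (std::size_t i = 0; i < agentIds.size(); ++i) {
        if (i != 0) {
            joined += ", ";
        }
        joined += agentIds[i];
    }
    return joined;
}
```

The result could then be passed to one LOG_INFO call with a format such as "Restored agents to online state: %s".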
| "Restored agent to online state: %s", | ||
| agent.c_str()); | ||
| } | ||
| bool immediatly = args.affectedAgents.size() >= Config->GetRestoreAgentsCountPerTransaction(); |
Maybe it would be better for RestoreAgentsFromWarning to return a bool telling whether such agents still remain, and to schedule the transaction depending on it? That would probably be more elegant than inspecting the args.affectedAgents list.
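The idea of reporting "more agents remain" explicitly can be sketched as follows. Everything here is a hypothetical model: TAgentSketch, TRestoreResult and RestoreAgentsSketch are illustration-only names, and the real RestoreAgentsFromWarning signature is not shown in this conversation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical agent model: only the fields needed for the sketch.
struct TAgentSketch {
    std::string Id;
    bool BackFromUnavailable;       // models the "back from unavailable" status
    std::chrono::seconds StateAge;  // time since the last state change
};

struct TRestoreResult {
    std::vector<std::string> AffectedAgents;
    bool HasMore = false;  // eligible agents remain beyond this batch
};

// Restores at most maxPerTransaction eligible agents and reports
// whether at least one more eligible agent was left for the next run.
TRestoreResult RestoreAgentsSketch(
    std::vector<TAgentSketch>& agents,
    std::chrono::seconds restoreInterval,
    std::size_t maxPerTransaction)
{
    assert(maxPerTransaction > 0);  // assumption: the batch size is positive

    TRestoreResult result;
    for (auto& agent : agents) {
        if (!agent.BackFromUnavailable || agent.StateAge <= restoreInterval) {
            continue;  // not eligible yet
        }
        if (result.AffectedAgents.size() == maxPerTransaction) {
            result.HasMore = true;  // batch is full and one more is waiting
            break;
        }
        agent.BackFromUnavailable = false;  // models the switch to online
        result.AffectedAgents.push_back(agent.Id);
    }
    return result;
}
```

In this sketch, when exactly maxPerTransaction agents are eligible, HasMore stays false, so no extra immediate run is requested; a size-based comparison would request one.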
| runtime->AdvanceCurrentTime(24h); | ||
| runtime->DispatchEvents({}, 100ms); | ||
| agent = diskRegistry.GetAgentNodeId("agent-1"); | ||
| UNIT_ASSERT_EQUAL(agent->Record.GetAgentState(), NProto::EAgentState::AGENT_STATE_ONLINE); |
It would also be good to check that the agent is not switched to online if RestoreAgentsToOnlineInterval has not yet elapsed.
| if (!Config->GetCheckAgentsToRestoreToOnlineInterval()) { | ||
| return; | ||
| } | ||
| if (immediatly) { |
ctx.Schedule(
immediatly ? TDuration::Zero() : Config->GetCheckAgentsToRestoreToOnlineInterval(),
new TEvDiskRegistryPrivate::TEvDiskRegistryRestoreAgentsToOnline());
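The scheduling decision from the suggestion above, combined with the zero-interval guard, can be modelled in isolation. This is a sketch under assumptions: TScheduleDecision and DecideNextCheck are hypothetical names, and std::chrono stands in for the project's duration type.

```cpp
#include <chrono>

// Hypothetical result type: whether to schedule the next check and with what delay.
struct TScheduleDecision {
    bool ShouldSchedule;
    std::chrono::milliseconds Delay;
};

// A zero check interval disables the periodic check entirely.
// Otherwise the next check runs immediately when requested,
// or after the regular interval.
TScheduleDecision DecideNextCheck(
    std::chrono::milliseconds checkInterval,
    bool immediately)
{
    const auto zero = std::chrono::milliseconds::zero();
    if (checkInterval == zero) {
        return {false, zero};
    }
    return {true, immediately ? zero : checkInterval};
}
```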
Issue: https://st.yandex-team.ru/NBS-5971
Restores to online those agents that have the status "back from unavailable" and whose last state change happened more than RestoreAgentsToOnlineInterval ago. The check for such agents runs with the period set in the CheckAgentsToRestoreToOnlineInterval parameter. If the number of agents restored by one check is greater than or equal to the RestoreAgentsCountPerTransaction parameter, an additional check is performed immediately.