@PeshkovMikhail (Contributor) commented Dec 24, 2025

Issue: https://st.yandex-team.ru/NBS-5971

Restores agents that have the status "back from unavailable" and whose last state change happened more than RestoreAgentsToOnlineInterval ago. The check for such agents runs periodically, with the period set by the CheckAgentsToRestoreToOnlineInterval parameter. If the number of agents restored in one pass is greater than or equal to the value of the RestoreAgentsCountPerTransaction parameter, an additional check is performed immediately.
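
A minimal sketch of the restore pass described above, written as plain C++ rather than the actual actor code. `TAgent`, `EAgentState::BackFromUnavailable`, `TRestoreResult`, and `RestoreAgents` are hypothetical names introduced for illustration; only the three config parameters come from the description. The sketch assumes that one pass restores at most RestoreAgentsCountPerTransaction agents.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

using TSeconds = std::chrono::seconds;

// Hypothetical stand-ins; the real disk registry types are not shown in this PR.
enum class EAgentState { Online, BackFromUnavailable };

struct TAgent {
    std::string Id;
    EAgentState State;
    TSeconds LastStateChange;  // timestamp of the last state change
};

struct TRestoreResult {
    std::vector<std::string> AffectedAgents;
    bool CheckAgainImmediately = false;
};

// Restores at most `countPerTransaction` agents whose state changed more than
// `restoreInterval` ago (assumed meaning of RestoreAgentsToOnlineInterval).
TRestoreResult RestoreAgents(
    std::vector<TAgent>& agents,
    TSeconds now,
    TSeconds restoreInterval,
    std::size_t countPerTransaction)
{
    assert(countPerTransaction > 0);

    TRestoreResult result;
    for (auto& agent : agents) {
        if (result.AffectedAgents.size() == countPerTransaction) {
            break;  // batch is full; the rest waits for the next pass
        }
        if (agent.State != EAgentState::BackFromUnavailable) {
            continue;
        }
        if (now - agent.LastStateChange <= restoreInterval) {
            continue;  // not enough time has elapsed yet
        }
        agent.State = EAgentState::Online;
        agent.LastStateChange = now;
        result.AffectedAgents.push_back(agent.Id);
    }

    // A full batch suggests more agents may be waiting, so re-check at once
    // instead of waiting for CheckAgentsToRestoreToOnlineInterval.
    result.CheckAgainImmediately =
        result.AffectedAgents.size() >= countPerTransaction;
    return result;
}
```

With this rule, a pass that restores exactly RestoreAgentsCountPerTransaction agents triggers one extra pass even if no eligible agents remain; that extra pass simply restores nothing.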

@github-actions (Contributor)

Hi! Thank you for contributing!
The tests on this PR will run after a maintainer adds an ok-to-test label to this PR manually. Thank you for your patience!

@vladstepanyuk (Collaborator)

Add a description to the PR and to the issue.

new TEvDiskRegistryPrivate::TEvDiskRegistryAgentListExpiredParamsCleanup());
}

void TDiskRegistryActor::ScheduleRestoreDisksToOnline(
Collaborator

Suggested change
- void TDiskRegistryActor::ScheduleRestoreDisksToOnline(
+ void TDiskRegistryActor::ScheduleRestoreDisksToOnlineIfNeeded(

and add a check inside the function:

if (!Config->GetCheckAgentsToRestoreToOnlineInterval()) {
    return;
}

TEvDiskRegistry::TEvGetClusterCapacityRequest,
HandleGetClusterCapacity);

HFunc(
@vladstepanyuk (Collaborator), Dec 26, 2025

Maybe ignore the message while the DR is in read-only mode, and restart scheduling the messages once it has been switched out of ReadOnly?
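
One possible shape of this idea, as a sketch only. `TRestoreScheduler` and its methods are hypothetical and model just the scheduling flag, not the real TDiskRegistryActor or its message handling.

```cpp
#include <cassert>

// Hypothetical model of the scheduling state; not the real TDiskRegistryActor.
class TRestoreScheduler {
public:
    // Called when the periodic "restore agents" message arrives.
    // Returns true if the message was processed.
    bool OnRestoreMessage() {
        assert(Scheduled);  // a message can only arrive if one was scheduled
        Scheduled = false;
        if (ReadOnly) {
            return false;  // ignored; nothing is rescheduled while read-only
        }
        Schedule();
        return true;
    }

    void SetReadOnly(bool readOnly) {
        const bool leavingReadOnly = ReadOnly && !readOnly;
        ReadOnly = readOnly;
        if (leavingReadOnly && !Scheduled) {
            Schedule();  // restart the periodic check
        }
    }

    void Schedule() { Scheduled = true; }
    bool IsScheduled() const { return Scheduled; }

private:
    bool ReadOnly = false;
    bool Scheduled = false;
};
```

The `!Scheduled` guard keeps the sketch from scheduling a second message if read-only mode is toggled off before the pending message has arrived.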

@vladstepanyuk (Collaborator)

Tests?

ctx,
TBlockStoreComponents::DISK_REGISTRY,
"Restoring agents with status \"back from unavailable\" and last state change more than "
"RestoreAgentsToOnlineInterval ago");
Collaborator

I would print the actual RestoreAgentsToOnlineInterval value.

Y_UNUSED(args);

for (const auto& agent : args.affectedAgents) {
LOG_INFO(
Collaborator

Maybe print all the agents at once in a single message?
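
A small sketch of what that could look like: build one comma-separated string and log it once. `JoinAgentIds` is a hypothetical helper written with the standard library only; the code base may already provide a join utility that fits better.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins agent ids into "agent-1, agent-2, ...".
std::string JoinAgentIds(const std::vector<std::string>& agentIds) {
    std::string joined;
    for (std::size_t i = 0; i < agentIds.size(); ++i) {
        if (i != 0) {
            joined += ", ";
        }
        joined += agentIds[i];
    }
    return joined;
}
```

The result could then be passed to a single LOG_INFO call with a format such as "Restored agents to online state: %s".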

"Restored agent to online state: %s",
agent.c_str());
}
bool immediatly = args.affectedAgents.size() >= Config->GetRestoreAgentsCountPerTransaction();
Collaborator

Maybe it would be better to return a bool from RestoreAgentsFromWarning that says whether such agents still remain, and schedule the transaction depending on it? That would probably be more elegant than looking at the args.affectedAgents list.
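
A sketch of that alternative, under stated assumptions. `TRestoreOutcome` and `RestoreBatch` are hypothetical, and eligibility is reduced to a precomputed flag per agent so that only the "are there more?" logic is shown; the real signature of RestoreAgentsFromWarning is not visible in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: which agents were restored and whether
// at least one more eligible agent was left for the next transaction.
struct TRestoreOutcome {
    std::vector<std::size_t> Restored;  // indices of restored agents
    bool HasMore = false;
};

// `eligible[i]` is true if agent i may be restored in this pass.
TRestoreOutcome RestoreBatch(const std::vector<bool>& eligible, std::size_t limit) {
    assert(limit > 0);

    TRestoreOutcome outcome;
    for (std::size_t i = 0; i < eligible.size(); ++i) {
        if (!eligible[i]) {
            continue;
        }
        if (outcome.Restored.size() == limit) {
            outcome.HasMore = true;  // batch is full and another agent is waiting
            break;
        }
        outcome.Restored.push_back(i);
    }
    return outcome;
}
```

Compared with checking `affectedAgents.size() >= limit`, the flag is set only when an eligible agent was actually left behind, so a batch that restores exactly `limit` agents does not trigger an empty follow-up transaction.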

runtime->AdvanceCurrentTime(24h);
runtime->DispatchEvents({}, 100ms);
agent = diskRegistry.GetAgentNodeId("agent-1");
UNIT_ASSERT_EQUAL(agent->Record.GetAgentState(), NProto::EAgentState::AGENT_STATE_ONLINE);
Collaborator

It would also be worth checking that the agent is not switched to online if RestoreAgentsToOnlineInterval has not yet elapsed.

if (!Config->GetCheckAgentsToRestoreToOnlineInterval()) {
return;
}
if (immediatly) {
Collaborator

    ctx.Schedule(
        immediatly ? TDuration::Zero() : Config->GetCheckAgentsToRestoreToOnlineInterval(),
        new TEvDiskRegistryPrivate::TEvDiskRegistryRestoreAgentsToOnline());
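
The delay selection from this thread can be sketched as a pure function, which keeps it easy to unit-test. `NextRestoreCheckDelay` and `TDelay` are hypothetical names; the sketch uses `std::chrono` in place of TDuration and assumes, as the check quoted in the diff suggests, that a zero CheckAgentsToRestoreToOnlineInterval disables the periodic check.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using TDelay = std::chrono::milliseconds;

// Returns the delay before the next restore check, or std::nullopt
// if nothing should be scheduled at all.
std::optional<TDelay> NextRestoreCheckDelay(TDelay checkInterval, bool immediately) {
    assert(checkInterval >= TDelay::zero());

    if (checkInterval == TDelay::zero()) {
        return std::nullopt;  // periodic check is disabled
    }
    return immediately ? TDelay::zero() : checkInterval;
}
```

The caller would schedule the message only when a value is returned, which folds the early return and the ternary into one place.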
