Collaborator

@vladstepanyuk vladstepanyuk commented Dec 25, 2025

#4293
To avoid bugs where the Disk Registry thinks that the Disk Agent has a path attached but, because of a race condition, the path is actually detached, we should tag every attach and detach request with a generation number. We can keep a counter in RAM for all Disk Agents (call it the Disk Agent Generation) and increment it on each CMS action such as AddHost/AddDevice or RemoveDevice/RemoveHost. When attaching or detaching a path via TEvAttachPathRequest/TEvDetachPathRequest, or as a result of registration, we should pass both the Disk Agent Generation and the Disk Registry tablet generation. The Disk Agent Generation is needed to order attach and detach requests sent within a single DR generation. This way we can order all attach and detach requests and reject outdated ones.
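
A minimal sketch of the ordering described above, not the code from this PR. `TGenerationGuard` and `CheckAndAdvance` are hypothetical names, and the sketch assumes that a request repeating the current generation pair is accepted and that only strictly older pairs are rejected:

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the generation state kept on the Disk Agent side.
struct TGenerationGuard
{
    std::uint64_t DiskRegistryGeneration = 0;
    std::uint64_t DiskAgentGeneration = 0;

    // Returns an empty string when the request is accepted,
    // otherwise the reason for rejecting it.
    std::string CheckAndAdvance(std::uint64_t drGen, std::uint64_t daGen)
    {
        // A request from an older Disk Registry tablet generation is stale.
        if (drGen < DiskRegistryGeneration) {
            return "outdated disk registry generation";
        }
        // Within one DR generation, the DA generation orders the requests.
        if (drGen == DiskRegistryGeneration && daGen < DiskAgentGeneration) {
            return "outdated disk agent generation";
        }
        // Accept the request and remember the newest pair seen so far.
        DiskRegistryGeneration = drGen;
        DiskAgentGeneration = daGen;
        return {};
    }
};
```

Because the counter lives in RAM, it may restart from a small value after the Disk Registry tablet restarts; comparing the DR generation first keeps requests from the newer tablet ahead of any request from the older one.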

@vladstepanyuk changed the title from "issue-4293: order all attach detach requests with dr and da generations" to "issue-4293: order all attach detach requests with DR and DA generations" on Dec 25, 2025
@vladstepanyuk added the large-tests (Launch large tests for PR) and blockstore (Add this label to run only cloud/blockstore build and tests on PR) labels on Dec 25, 2025
ui64 diskAgentGeneration)
{
if (diskRegistryGeneration < DiskRegistryGeneration) {
return MakeError(E_ARGUMENT, "outdated disk registry generation");
Collaborator

This is not a retriable error. What should the client do when it receives it?

Collaborator Author

@vladstepanyuk Dec 26, 2025

The client is the DiskRegistry tablet, so if a DiskRegistry with a higher generation has appeared, we can kill this tablet, or at least stop sending requests that will be ignored anyway.

}

// Filter from unknown paths.
auto unknownPaths = std::ranges::partition(
Collaborator

Maybe this means new paths?

Collaborator Author

It is not only about new paths; it is more of a protection against lost devices. The Disk Registry will still know about them and keep sending requests that include them, while the DA no longer knows anything about them. Since we require that the devices after an attach match the devices present when the Disk Agent started, we would not be able to bring in new devices unless we filter out such lost ones.
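
A rough sketch of the filtering idea, not the code from this PR. `ExtractUnknownPaths` and `knownPaths` are hypothetical names, and `std::partition` is used here in place of the `std::ranges::partition` call from the diff:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Removes from `paths` every path the agent does not know about
// and returns the removed paths. The relative order is not preserved.
std::vector<std::string> ExtractUnknownPaths(
    std::vector<std::string>& paths,
    const std::unordered_set<std::string>& knownPaths)
{
    // Known paths move to the front; `firstUnknown` points at the first
    // element for which the predicate is false.
    auto firstUnknown = std::partition(
        paths.begin(),
        paths.end(),
        [&](const std::string& path) { return knownPaths.count(path) != 0; });

    std::vector<std::string> unknown(
        std::make_move_iterator(firstUnknown),
        std::make_move_iterator(paths.end()));
    paths.erase(firstUnknown, paths.end());
    return unknown;
}
```

After the call, `paths` holds only paths the agent knows about, so a lost device mentioned by the Disk Registry no longer blocks the rest of the request.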

pathsToPerformAttachDetachRange.begin(),
pathsToPerformAttachDetachRange.end());

if (auto error = State->CheckAttachDetachPathsRequestGeneration(
Collaborator

Doesn't it make sense to check this condition at the very beginning?

Collaborator Author

There are only two checks before it: that the feature is enabled and that no other request is already in progress.
CheckAttachDetachPathsRequestGeneration not only checks the generations but also advances them, so it seems wrong to advance a generation while another request is in progress or when the feature is disabled altogether.
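
The ordering argument can be illustrated with a simplified sketch. `TAttachDetachGate` and `Admit` are hypothetical names, not the actor from this PR, and the generation comparison is an assumed lexicographic one:

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-in for the actor and its state.
struct TAttachDetachGate
{
    bool FeatureEnabled = true;
    bool RequestInProgress = false;
    std::uint64_t DiskRegistryGeneration = 0;
    std::uint64_t DiskAgentGeneration = 0;

    // Returns an empty string if the request may proceed.
    std::string Admit(std::uint64_t drGen, std::uint64_t daGen)
    {
        // Checks without side effects come first.
        if (!FeatureEnabled) {
            return "feature is disabled";
        }
        if (RequestInProgress) {
            return "another request is in progress";
        }
        // The generation check mutates state, so it runs last:
        // a request rejected above must not advance the generations.
        if (std::tie(drGen, daGen) <
            std::tie(DiskRegistryGeneration, DiskAgentGeneration))
        {
            return "outdated generation";
        }
        DiskRegistryGeneration = drGen;
        DiskAgentGeneration = daGen;
        RequestInProgress = true;
        return {};
    }
};
```

If the generation check ran first, a request rejected because another one is still executing would already have advanced the stored generations.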

@github-actions
Contributor

Note: This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 87b8546.

| TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED? |
|---|---|---|---|---|---|---|
| 6082 | 6081 | 0 | 0 | 0 | 1 | 0 |

@github-actions
Contributor

Note: This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit a959b89.

| TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED? |
|---|---|---|---|---|---|---|
| 6082 | 6081 | 0 | 0 | 0 | 1 | 0 |

};

TResultOrError<TCheckAttachDetachPathRequestResult>
CheckAttachDetachPathsRequest(
Collaborator

Are all of these const methods?

@sharpeye
Collaborator

It'll be helpful to have a high-level description of what is going on in this PR, not just a reference to an issue.

@vladstepanyuk
Collaborator Author

> It'll be helpful to have a high-level description of what is going on in this PR, not just a reference to an issue.

Added a description.

TVector<TString> PathsToAttach;
TVector<TString> AlreadyAttachedPaths;

ui64 DiskRegistryGeneration;
Collaborator

Initialize it to 0?

TVector<TString> PathsToDetach;
TVector<TString> AlreadyDetachedPaths;

ui64 DiskRegistryGeneration;
Collaborator

= 0

TVector<TString> paths,
EAction action);

TResultOrError<TCheckAttachDetachPathRequestResult> CheckAttachPathsRequest(
Collaborator

Why three methods? Isn't one enough?

}

return TCheckAttachDetachPathRequestResult{
.AlreadyInWantedStatePaths = alreadyAttachedPaths,
Collaborator

std::move(alreadyAttachedPaths) ?

[actorSystem, daId, pathsToDetach = std::move(pathsToDetach)](
auto) mutable
[actorSystem,
daId,
Collaborator

selfId = ctx.SelfID;

pathsToDetach = std::move(pathsToDetach),
alreadyDetachedPaths = std::move(alreadyDetachedPaths),
diskRegistryGeneration = record.GetDiskRegistryGeneration(),
diskAgentGeneration = record.GetDiskAgentGeneration()](auto) mutable
Collaborator

Does the parameter not matter?

TRequest request;
request.DiskAgentGeneration = GenerateDiskAgentGeneration();

auto devicesInRequest =
Collaborator

Writing ui64 is just as hard as writing auto, but it is twice as clear to read.

}
}

Y_UNIT_TEST_F(ShouldRejectOldattachDetachRequests, TFixture)
Collaborator

Attach

Comment on lines +1864 to +1868
uint64 DiskAgentGeneration = 3;

// Disk Registry Tablet generation needed to order attach/detach path
// requests sent from two different DR generations.
uint32 DiskRegistryGeneration = 4;
Collaborator

THeader already has RequestId and RequestGeneration. Wouldn't those work? https://github.com/ydb-platform/nbs/blob/main/cloud/blockstore/public/api/protos/headers.proto#L46-L52

FindProcessesWithOpenFile(Devices[0]).size());
}

Y_UNIT_TEST_F(AttachDetachPathStressTest, TFixture)
Collaborator

StressTest refers to a different kind of test, not a UT.

Runtime->DispatchEvents(TDispatchOptions(), TDuration::Seconds(1));

TVector<TString> paths;
for (const auto& fPath: PartLabels) {
Collaborator

path

if (auto error = State->CheckAttachDetachPathsRequestGeneration(
diskRegistryGeneration,
diskAgentGeneration);
HasError(error) && pathsToPerformAttachDetach)
Collaborator

Why does pathsToPerformAttachDetach affect the error?

Comment on lines +70 to +71
ui64 DiskRegistryGeneration = 0;
ui64 DiskAgentGeneration = 0;
Collaborator

It would be better to move this into the actor.

Comment on lines +17 to +21
auto TDiskAgentActor::CheckAttachDetachPathsRequest(
ui64 diskRegistryGeneration,
ui64 diskAgentGeneration,
TVector<TString> paths,
EAction action) -> TResultOrError<TCheckAttachDetachPathRequestResult>
Collaborator

This can be split into parts: (1) checking that the feature is enabled, (2) updating the request generation, (3) splitting paths into two parts (attached, detached).
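
Step (3) of this suggestion could look roughly like the sketch below. `SplitPaths`, `TSplitResult` and `EActionSketch` are hypothetical names, and the sketch assumes the set of currently attached paths is available as a `std::set`:

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical counterpart of the EAction parameter from the diff.
enum class EActionSketch
{
    Attach,
    Detach
};

struct TSplitResult
{
    std::vector<std::string> AlreadyInWantedState;
    std::vector<std::string> ToPerform;
};

// Splits the requested paths into those already in the wanted state
// and those that still need the attach or detach action.
TSplitResult SplitPaths(
    const std::set<std::string>& attachedPaths,
    std::vector<std::string> paths,
    EActionSketch action)
{
    TSplitResult result;
    for (auto& path: paths) {
        const bool isAttached = attachedPaths.count(path) != 0;
        // Attach wants the path attached; Detach wants it not attached.
        const bool inWantedState =
            (action == EActionSketch::Attach) == isAttached;
        auto& target =
            inWantedState ? result.AlreadyInWantedState : result.ToPerform;
        target.push_back(std::move(path));
    }
    return result;
}
```

Keeping this step free of side effects would let the feature check and the generation update live in separate functions, as the comment proposes.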

2176242
