-
Notifications
You must be signed in to change notification settings - Fork 37
issue-4293: order all attach detach requests with DR and DA generations #4911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
issue-4293: order all attach detach requests with DR and DA generations #4911
Conversation
| ui64 diskAgentGeneration) | ||
| { | ||
| if (diskRegistryGeneration < DiskRegistryGeneration) { | ||
| return MakeError(E_ARGUMENT, "outdated disk registry generation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
это не ретрабельная ошибка. Что должен делать клиент когда ее получит?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Клиент - это таблетка DiskRegistry, так что, если появился DiskRegistry с большим поколением, то мы можем добить эту таблетку или хотя бы перестать слать запросы, которые в любом случае будут проигнорированы.
| } | ||
|
|
||
| // Filter from unknown paths. | ||
| auto unknownPaths = std::ranges::partition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
может имеется ввиду новые пути?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ну тут не только новые пути, скорее это защита от потерянных девайсов, диск реджестри будет о них знать и пытаться слать запросы с ними, а DA уже про них ничего не знает, и поскольку мы требуем, чтобы девайсы после аттача совпадали с девайсами на момент старта диск агента, если мы не отфильтруем от подобных потерянных девайсов, то не сможем ввести новые.
| pathsToPerformAttachDetachRange.begin(), | ||
| pathsToPerformAttachDetachRange.end()); | ||
|
|
||
| if (auto error = State->CheckAttachDetachPathsRequestGeneration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
вот это условие нет смысла проверить в самом начале?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ну до него только две проверки: на то, что фича включена, и на то, что нет уже исполняющегося запроса.
Функция CheckAttachDetachPathsRequestGeneration не только проверяет поколения, но и перещелкивает их, поэтому, кажется, будет нехорошо перещелкивать поколение, пока исполняется какой-то запрос или если фича вообще выключена.
cloud/blockstore/libs/storage/disk_agent/disk_agent_actor_attach_detach_path.cpp
Outdated
Show resolved
Hide resolved
| }; | ||
|
|
||
| TResultOrError<TCheckAttachDetachPathRequestResult> | ||
| CheckAttachDetachPathsRequest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
это все константные методы?
|
It'll be helpful to have a high-level description of what is going on in this PR, not just a reference to an issue. |
added description |
| TVector<TString> PathsToAttach; | ||
| TVector<TString> AlreadyAttachedPaths; | ||
|
|
||
| ui64 DiskRegistryGeneration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
инициализировать 0 ?
| TVector<TString> PathsToDetach; | ||
| TVector<TString> AlreadyDetachedPaths; | ||
|
|
||
| ui64 DiskRegistryGeneration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= 0
| TVector<TString> paths, | ||
| EAction action); | ||
|
|
||
| TResultOrError<TCheckAttachDetachPathRequestResult> CheckAttachPathsRequest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
а зачем три метода, одного не достаточно?
| } | ||
|
|
||
| return TCheckAttachDetachPathRequestResult{ | ||
| .AlreadyInWantedStatePaths = alreadyAttachedPaths, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(alreadyAttachedPaths) ?
| [actorSystem, daId, pathsToDetach = std::move(pathsToDetach)]( | ||
| auto) mutable | ||
| [actorSystem, | ||
| daId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selfId = ctx.SelfID;
| pathsToDetach = std::move(pathsToDetach), | ||
| alreadyDetachedPaths = std::move(alreadyDetachedPaths), | ||
| diskRegistryGeneration = record.GetDiskRegistryGeneration(), | ||
| diskAgentGeneration = record.GetDiskAgentGeneration()](auto) mutable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
параметр не важен?
| TRequest request; | ||
| request.DiskAgentGeneration = GenerateDiskAgentGeneration(); | ||
|
|
||
| auto devicesInRequest = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ui64 писать так же сложно как auto, но читать в два раза понятнее
| } | ||
| } | ||
|
|
||
| Y_UNIT_TEST_F(ShouldRejectOldattachDetachRequests, TFixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attach
| uint64 DiskAgentGeneration = 3; | ||
|
|
||
| // Disk Registry Tablet generation needed to order attach/detach path | ||
| // requests sent from two different DR generations. | ||
| uint32 DiskRegistryGeneration = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В THeader есть RequestId и RequestGeneration - не подойдут? https://github.com/ydb-platform/nbs/blob/main/cloud/blockstore/public/api/protos/headers.proto#L46-L52
| FindProcessesWithOpenFile(Devices[0]).size()); | ||
| } | ||
|
|
||
| Y_UNIT_TEST_F(AttachDetachPathStressTest, TFixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StressTest - это что-то про другой тип теста, не UT
| Runtime->DispatchEvents(TDispatchOptions(), TDuration::Seconds(1)); | ||
|
|
||
| TVector<TString> paths; | ||
| for (const auto& fPath: PartLabels) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path
| if (auto error = State->CheckAttachDetachPathsRequestGeneration( | ||
| diskRegistryGeneration, | ||
| diskAgentGeneration); | ||
| HasError(error) && pathsToPerformAttachDetach) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
почему pathsToPerformAttachDetach влияет на ошибку?
| ui64 DiskRegistryGeneration = 0; | ||
| ui64 DiskAgentGeneration = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Это лучше унести в актор
| auto TDiskAgentActor::CheckAttachDetachPathsRequest( | ||
| ui64 diskRegistryGeneration, | ||
| ui64 diskAgentGeneration, | ||
| TVector<TString> paths, | ||
| EAction action) -> TResultOrError<TCheckAttachDetachPathRequestResult> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Это можно распилить на части: (1) проверка, что фичка включена, (2) обновление поколения запроса (3) распиливание paths на две части ([attached, detached)
#4293
To avoid bugs, when the Disk Registry thinks that the Disk Agent has attached Path, but because of a race condition, the Path is detached, we should handle all attach or detach requests with some generation number. We can keep a counter in RAM for all Disk Agents lets call it Disk Agent Generation, and increment it with each CMS action like AddHost/AddDevice or RemoveDevice/RemoveHost. When attaching or closing Path via TEvAttachPathRequest/TEvDetachPathRequest, or as a result of registration, we should pass Disk Agent and Disk Registry tablet generations. A Disk Agent Generation needs to order attach and detach requests sent in one DR generation. In this way, we can order all attach and detach requests and reject outdated ones.