Skip to content

Deadlock when numerous identical actions run at the same time #2001

@chrisstaite-menlo

Description

@chrisstaite-menlo

On a rare occasion it appears that workers somehow end up processing all of the requests twice and therefore end up locking themselves up. This is the cause for the random worker lock up, but I'm not sure why it happens.

Here's an example log:

warn{"timestamp":"2025-10-22T09:47:21.604176Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"f6bf3a8df3a1b6bc5d7fe27ae5419ace6900fc3cc414226c29b9e0eb03bc645b-5264\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.604176Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"f6bf3a8df3a1b6bc5d7fe27ae5419ace6900fc3cc414226c29b9e0eb03bc645b-5264\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.614965Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"e928292e301bc3ac4d92adb91ed6c4c0f970dbd1f25e8770dd5bd169f640b75f-13180\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.614965Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"e928292e301bc3ac4d92adb91ed6c4c0f970dbd1f25e8770dd5bd169f640b75f-13180\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.636644Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"4111923225b3a30fda94fdc797ace9d6561bcfb4fe8afac60e8bc4b41eea9173-13249\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.636644Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"4111923225b3a30fda94fdc797ace9d6561bcfb4fe8afac60e8bc4b41eea9173-13249\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.639999Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"a70c0ab328b077af31640a2f880b9da5ee39730626eae7869bc791271a961943-5887\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.639999Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"a70c0ab328b077af31640a2f880b9da5ee39730626eae7869bc791271a961943-5887\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.677947Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"26d6ae6eb8c14431e2e931d35b1ab90f8c369452885c002f4c654770ff2b4742-13242\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.677947Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"26d6ae6eb8c14431e2e931d35b1ab90f8c369452885c002f4c654770ff2b4742-13242\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.678043Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"96f50fb4fba62a463cf13c9586e2c1b953af94263744a8094850ba8569b3752c-5754\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.678043Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"96f50fb4fba62a463cf13c9586e2c1b953af94263744a8094850ba8569b3752c-5754\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.678292Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"6a6b7c3a06506bcdc570529758eee869e5f9d5dc6ac0c77ff584019d88af9d7d-5789\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.678292Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"6a6b7c3a06506bcdc570529758eee869e5f9d5dc6ac0c77ff584019d88af9d7d-5789\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.895726Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"bc9c9905197ec6852f8fde799daaf413aab7fba337861924a23f9aa4aa2d70c8-6052\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

warn{"timestamp":"2025-10-22T09:47:21.895726Z","level":"WARN","fields":{"message":"Running prepare_action for DigestInfo(\"bc9c9905197ec6852f8fde799daaf413aab7fba337861924a23f9aa4aa2d70c8-6052\")"},"target":"nativelink_worker::running_actions_manager","span":{"name":"worker_start_action"},"spans":[{"name":"\"worker_0\"","name":"worker"},{"name":"worker_start_action"}]}

This log came from a worker with this added to the start of inner_prepare_action:

tracing::warn!("Running prepare_action for {:?}", self.action_info.command_digest);

Having looked at the command digests, there's nothing special about them. So it would appear that somehow the worker has managed to register twice with the scheduler or something?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions