Skip to content

Conversation

@kaimast
Copy link
Contributor

@kaimast kaimast commented Oct 7, 2025

These changes are part of #3874, but were moved to their own PR to reduce the size of the latter.

Motivation

Before, #4021, snarkOS nodes would call std::process::exit before shutting down. This was problematic "because this function never returns [and] terminates the process, no destructors on the current stack or any other thread’s stack will be run" (source).
The code worked around this by sleeping for multiple seconds during shutdown. Such a sleep may be too long in a test setting and too short in production. Additionally, the lack of a clean shutdown makes it more complicated to perform certain tasks at shutdown, such as caching the block tree on disk or flushing the log file.

#4021, as far as I understand, was intended to be a quick fix until this slightly cleaner version is merged. This PR improves on the former by encapsulating all signal handling logic in a dedicated module and avoids spawning the main node logic in a detached task, which makes catching panics easier.
For snarkos-display shutting down using Escape will now work the same as with Ctrl+C.

Proposed Changes

In the current design, the code passes around an AtomicBool as a shutdown flag. This design generally works, but does not allow waiting for the flag to be set. This PR introduces a SignalHandler struct and a Stoppable trait.

The Stoppable trait serves the same function as the AtomicBool did before, but it is a little clearer what its purpose is from its struct and function names alone.

The SignalHandler struct implements the Stoppable trait and also launches a background task that wait for Ctrl+C.

Testing

To test, I verified that shut down works correctly in multiple scenarios. "correctly" means that the process exists cleanly, and that the on-disk state is not corrupted. The latter was verified by starting the node again and ensuring it can load the stored state fine.

I went through the following scenarios

  • started a production node and connected to canary
  • ran the production node with and without terminal UI (display)
  • ran production node sync with and without CDN

Additionally, I ran a development network and verified that nodes all nodes shut down cleanly without getting stuck or returning nonzero exit codes.

@kaimast kaimast force-pushed the feat/signal-handling branch 3 times, most recently from fe9131c to 6b4fda4 Compare October 13, 2025 16:44
@kaimast kaimast force-pushed the feat/signal-handling branch from 6b4fda4 to aec7b2a Compare October 23, 2025 22:01
@kaimast kaimast force-pushed the feat/signal-handling branch 2 times, most recently from eee21b7 to d2a5661 Compare November 12, 2025 01:58
@kaimast kaimast force-pushed the feat/signal-handling branch 2 times, most recently from 3f2cbb7 to 369c075 Compare November 19, 2025 18:02
@kaimast kaimast marked this pull request as ready for review November 24, 2025 23:19
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small change left, and it's good to go 👍.

@kaimast kaimast force-pushed the feat/signal-handling branch from cbf4d86 to 1ff28fd Compare December 1, 2025 23:18
@kaimast kaimast force-pushed the feat/signal-handling branch from 1ff28fd to aa6ddfc Compare December 1, 2025 23:30
@kaimast kaimast requested a review from ljedrz December 1, 2025 23:33
ljedrz
ljedrz previously approved these changes Dec 2, 2025
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few final nits, but basically LGTM 👌.

@vicsn vicsn self-requested a review December 2, 2025 18:51
Copy link
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you note in the PR README which manual tests were run? Are they comprehensive?

@kaimast
Copy link
Contributor Author

kaimast commented Dec 3, 2025

Can you note in the PR README which manual tests were run? Are they comprehensive?

Done!

@kaimast kaimast requested a review from vicsn December 4, 2025 23:31
Copy link
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vicsn vicsn merged commit 226f3ac into staging Dec 5, 2025
5 checks passed
@vicsn vicsn deleted the feat/signal-handling branch December 5, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants