Skip to content

Conversation

@kolyshkin
Copy link
Contributor

Separated from #4928 as per @lifubang's suggestion.

See individual commits for details.

@kolyshkin kolyshkin requested a review from lifubang October 25, 2025 03:49
@kolyshkin kolyshkin force-pushed the better-nsenter-errors branch 2 times, most recently from aabdd90 to 600e983 Compare October 26, 2025 23:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors error handling in the nsenter component to improve error reporting clarity by distinguishing between errors that include errno information and those that don't. The changes introduce a new bailx macro for errno-free errors, add helper functions for safe I/O operations with proper error handling, and ensure consistent cleanup of child processes when errors occur.

Key Changes:

  • Introduced bailx macro for error reporting without errno, reorganizing the existing bail macro to use it
  • Added xread, xwrite, iobail, and improved sane_kill helper functions for consistent error handling and process cleanup
  • Replaced direct read/write calls with helper functions that handle partial I/O and automatically kill child processes on failure

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
libcontainer/nsenter/log.h Adds bailx macro for errno-free error reporting and refactors bail to use it
libcontainer/nsenter/nsexec.c Refactors error handling throughout using new bailx macro and I/O helper functions, improves sane_kill to preserve errno, and ensures proper child process cleanup on errors

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits, otherwise looks good!

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Left a few comments :)

I guess this PR is not yet improving the error message in the case of: #4916, right? In that case, please ping me when that PR is not draft, I'd very much like to see that improvement :)

Comment on lines 642 to 648
if (pid > 0) {
int ret, saved_errno;

saved_errno = errno;
ret = kill(pid, signum);
errno = saved_errno;
return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this returns the ret value of kill, but errno is set to something else, and we are missing what it was. It seems confusing.

Why not log the previous error (not sure if here or on the caller) and have this function return the ret val of kill and errno set to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to complicate things further than they are.

In fact, since no one is using the return value of sane_kill, and this is a last resort kill, and I think some of the calls will return ESRCH, we can just change its signature to void sane_kill.

We could also log a warning from the kill (unless it's ESRCH) but it's hard to see how it could be useful.

Copy link
Member

@cyphar cyphar Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, sane_kill is used in cleanup in the error path so we probably want to keep the errno untouched.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, making it void sound good. I had checked the same yesterday (no one uses the ret code). I would log the kill error, though.

Since sane_kill after a failed read or write, but before reporting the
error from that read or write, it may change the errno value in case
kill(2) fails.

Save and restore the errno around the call to kill.

While at it,
 - change the code to return early;
 - don't return kill return value as no one is using it, and the errno
   value no longer correlates.

Signed-off-by: Kir Kolyshkin <[email protected]>
We use bail to report fatal errors, and bail always append %m
(aka strerror(errno)). In case an error condition did not set
errno, the log message will end up with ": Success" or an error
from a stale errno value. Either case is confusing for users.

Introduce bailx which is the same as bail except it does not
append %m, and use it where appropriate.

The naming follows libc's err(3) and errx(3).

PS we still use bail in a few cases after read or write, even
if that read/write did not return an error, because the code
does not distinguish between short read/write and error (-1).
This will be addressed by the next commit.

Signed-off-by: Kir Kolyshkin <[email protected]>
Add a few missing sane_kill calls where they make sense.

Remove one useless sane_kill of stage2_pid, as during SYNC_USERMAP stage2
is not yet started. It is harmless yet it makes the code slightly harder
to read.

Set the child pid to -1 upon receiving SYNC_CHILD_FINISH
to minimize the chances of killing an unrelated process.
When a child sends SYNC_CHILD_FINISH it is about to exit
(although theoretically it could be stuck during debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
Introduce and use iobail, xread, and xwrite wrappers so that we can
properly check read/write return value and call either bail or bailx on
error, with proper diagnostics (distinguishing failed read/write from a
short read/write).

This prevents the "Success" prefix in errors like:

	failed to sync with stage-1: next state: Success

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the better-nsenter-errors branch from 600e983 to 6c18b25 Compare October 29, 2025 00:29
@kolyshkin
Copy link
Contributor Author

OK I think I've addressed all the comments, @rata can you take another look?

@kolyshkin
Copy link
Contributor Author

I guess this PR is not yet improving the error message in the case of: #4916, right? In that case, please ping me when that PR is not draft, I'd very much like to see that improvement :)

You can definitely review #4928 right now; it is just a combination of patches from this PR plus the runc init fatal messages collection (the last commit). It's currently a draft because we need to merge this PR first.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

nits, but feel free to ignore:

  • I'd log the kill error in sane_kill().
  • I think I'd just reimplement iobail() inside xread and xwrite, I think it's too much indirection and cognitive load to have iobail() inside our custom read/write. It's just a few lines to deal with that in xread/xwrite without iobail()

@kolyshkin
Copy link
Contributor Author

  • I'd log the kill error in sane_kill().

There are places when both parent and stage1_pid kill stage2_pid, thus it may result in kill errors and I don't want more confusion for the user.

  • I think I'd just reimplement iobail() inside xread and xwrite, I think it's too much indirection and cognitive load to have iobail() inside our custom read/write. It's just a few lines to deal with that in xread/xwrite without iobail()

See, xread and xwrite mostly end up with success, while iobail always result in an exit and is thus marked as noreturn. Also, initially (see #4928 (comment)) I had both read and write as two macros (with and without kill):

CHECK_IO(write, syncfd, &s, sizeof(s), "failed to sync with parent: write(SYNC_USERMAP_PLS)");
CHECK_IO_KILL(write, syncfd, &s, sizeof(s), "some error message", stage1_pid, stage2_pid);

but @lifubang was more in favor of functions rather than macros (and I tend to agree).

Let's merge this as is for now; feel free to further improve upon this in subsequent PRs.

@kolyshkin kolyshkin merged commit fb01482 into opencontainers:main Oct 29, 2025
36 checks passed
@rata
Copy link
Member

rata commented Oct 29, 2025

SGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants