Member

@pagmatt pagmatt commented May 25, 2022

This PR aims to identify simulation crashes caused by out-of-memory errors. In these cases the kernel terminates the simulation script with SIGKILL (on POSIX systems), and nothing is written to either stdout or stderr. With these changes, the user is instead informed of the likely cause of the crash.

P.S. I could not find a definition of the SIGKILL return code in any Python module, so I not-so-elegantly defined it myself. The syntax is taken from this PEP "guideline". Feel free to propose a different approach if a more elegant solution comes to mind :)
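For reference, the standard library does expose the signal number as `signal.SIGKILL` on POSIX platforms, and `subprocess` reports a child terminated by signal N with a return code of -N. The sketch below derives the constant from that instead of hard-coding it. It is only an illustration, not the code in this PR: `run_and_get_return_code` and the self-killing child are hypothetical stand-ins for the simulation runner.

```python
import signal
import subprocess
import sys

# signal.SIGKILL only exists on POSIX platforms, hence the getattr fallback.
_SIGKILL = getattr(signal, "SIGKILL", None)

# subprocess reports "terminated by signal N" as a return code of -N.
SIGKILL_CODE = -int(_SIGKILL) if _SIGKILL is not None else None


def run_and_get_return_code(argv):
    """Hypothetical helper: run argv to completion and return its return code."""
    return subprocess.run(argv).returncode


# Stand-in for a simulation killed by the kernel: a child that SIGKILLs itself.
self_kill = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]

if SIGKILL_CODE is not None:
    was_killed = run_and_get_return_code(self_kill) == SIGKILL_CODE
    print("child was killed by SIGKILL:", was_killed)
```

Note that SIGKILL can come from sources other than the kernel's out-of-memory handling (for example, a user running `kill -9`), so the return code alone does not prove an out-of-memory condition.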

@pagmatt pagmatt requested a review from DvdMgr May 25, 2022 11:31
Member

@DvdMgr DvdMgr left a comment

Left a few comments. Once we address these it should be good to go!

sem/runner.py Outdated
Comment on lines 347 to 356
print(error_message)
print(error_message)
Member

Remove trailing whitespace

Member Author

Done, sorry about this

sem/runner.py Outdated
stderr_file.read(),
stdout_file.read()))
if return_code == SIGKILL_CODE:
error_message = '\nSimulation likely killed due to an out of memory error.\n' + \
Member

Let's not sound too confident about why the process was killed! How about:

"Simulation was killed. Possible causes may include an out of memory error."

Member Author

That's definitely better, you are right

Comment on lines +10 to +12
from typing import Final

SIGKILL_CODE: Final = -9 # POSIX return code which usually corresponds to out of memory events.
Member

Is it necessary to import typing.Final here? We don't use typing anywhere else in the project.

Member Author

It is not necessary; it just seemed "good practice" to use it to mark the variable (almost) as a constant. At least, that was my understanding from the PEP guideline I linked above. If you prefer to limit the imports, though, I can remove it!
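To make the trade-off concrete: `typing.Final` (available since Python 3.8) is a hint for static type checkers and is not enforced by the interpreter, so at runtime an annotated constant behaves exactly like a plain upper-case module-level name. A minimal sketch, where both names are made up for illustration:

```python
from typing import Final  # requires Python 3.8 or later

# Annotated constant: a static type checker rejects later reassignment.
ANNOTATED_CODE: Final = -9

# Plain constant: relies only on the upper-case naming convention.
PLAIN_CODE = -9

# The interpreter accepts this reassignment without complaint;
# only a static type checker would report it as an error.
ANNOTATED_CODE = -15
```

Dropping the import therefore changes nothing at runtime; what is lost is the static check in projects that run a type checker.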

% (parameter,
stderr_file.read(),
stdout_file.read()))
if return_code == SIGKILL_CODE:
Member

May be worth it to always print return_code when it's not 0, even when it's not exactly -9. Can you add this information to the common_error_message?

Member Author

Added
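A rough sketch of how the two review suggestions could fit together: the return code is always part of the common message, and the cautiously worded SIGKILL hint is prepended only when the code matches. The `build_error_message` helper and the exact wording of the common message are assumptions for illustration, not the code that was merged.

```python
SIGKILL_CODE = -9  # value used in the diff under review


def build_error_message(parameter, return_code, stdout, stderr):
    """Hypothetical helper: format the failure report for one simulation."""
    # The return code is always included, whatever its value.
    common_error_message = (
        "Simulation exited with an error (return code %s).\n"
        "Params: %s\n"
        "Stderr: %s\n"
        "Stdout: %s\n" % (return_code, parameter, stderr, stdout)
    )
    if return_code == SIGKILL_CODE:
        # Hedged wording suggested in the review: SIGKILL has several causes.
        return (
            "Simulation was killed. Possible causes may include "
            "an out of memory error.\n" + common_error_message
        )
    return common_error_message


print(build_error_message({"nNodes": 10}, -9, "", ""))
```

Here `{"nNodes": 10}` is a made-up parameter combination used only to exercise the helper.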

@pagmatt pagmatt requested a review from DvdMgr May 30, 2022 08:31