Prevent JSON file corruption with atomic writes #3586
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #3586     +/-  ##
===========================================
- Coverage    93.17%   93.10%   -0.07%
===========================================
  Files           68       68
  Lines        15411    15432      +21
===========================================
+ Hits         14359    14368       +9
- Misses        1052     1064      +12
    if hasattr(os, 'fchmod'):
        os.fchmod(temp_fd, 0o600)
This shouldn't be necessary – mkstemp's documentation says "The file is readable and writable only by the creating user ID.".
Yes, you are right. This isn't needed. Thanks for pointing it out.
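The mkstemp behaviour cited above can be checked directly. A minimal sketch (POSIX semantics assumed; on Windows the mode bits carry less meaning):

```python
import os
import stat
import tempfile

# tempfile.mkstemp() already creates the file with mode 0o600
# (readable and writable only by the creating user), so an explicit
# os.fchmod(fd, 0o600) afterwards is redundant on POSIX systems.
fd, path = tempfile.mkstemp()
try:
    mode = stat.S_IMODE(os.fstat(fd).st_mode)
    print(oct(mode))
finally:
    os.close(fd)
    os.remove(path)
```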
    if hasattr(os, 'fchmod'):
        os.fchmod(temp_fd, 0o600)
    with os.fdopen(temp_fd, 'w') as f:
        temp_fd = None
os.fdopen is an alias of open. Exiting the context manager (for example when unwinding from an exception) should close the underlying fd, so I'm not sure the temp_fd = None + cleanup dance in except Exception below is necessary?
You're correct that os.fdopen() takes ownership of the file descriptor and the context manager closes it automatically. But the cleanup code handles exceptions that occur before os.fdopen() and prevents double-close issues after it. The exception handler also removes the orphaned temporary file from disk in both scenarios. Consider a case where an exception is raised before os.fdopen() (for example, os.fchmod() fails): temp_fd is still open and needs manual closing to prevent a file descriptor leak.
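To make the ownership hand-off concrete, here is an illustrative sketch of the pattern under discussion (function name and structure are mine, not the PR's exact code): temp_fd is reset to None the moment os.fdopen() takes ownership, so the except block only closes the raw descriptor when the failure happened before that point.

```python
import json
import os
import tempfile

def atomic_json_write(path, data):
    # Illustrative sketch, not the PR's exact implementation.
    temp_fd, temp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(temp_fd, 'w') as f:
            temp_fd = None  # f now owns the fd; the with-block closes it
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(temp_path, path)  # atomic rename over the target
    except BaseException:
        if temp_fd is not None:
            os.close(temp_fd)  # failure before fdopen(): avoid an fd leak
        if os.path.exists(temp_path):
            os.remove(temp_path)  # remove the orphaned temp file
        raise
```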
    f.flush()
    os.fsync(f.fileno())
Is there a particular reason to add an explicit flush + fsync?
Closing the file doesn’t guarantee the data has reached the disk. f.write() sends data to Python’s buffer, then to the OS buffer, and the OS writes it to disk later.
If you call os.replace() right after closing and the system crashes before the OS flushes its buffer, you can end up with a partial or corrupt file. flush() pushes Python’s buffer to the OS, and fsync() forces the OS to commit it to disk. Together they ensure the data is fully written before you proceed. More here: https://docs.python.org/3/library/os.html#os.fsync
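The write/flush/fsync/replace ordering described above can be sketched as follows (an illustrative helper, not the PR's code; the fixed .tmp suffix is a simplification and would collide under concurrent writers, which is why the PR uses mkstemp):

```python
import os

def write_then_replace(final_path, text):
    # Illustrative sketch of the durability ordering, not the PR's code.
    tmp_path = final_path + '.tmp'
    with open(tmp_path, 'w') as f:
        f.write(text)           # data lands in Python's buffer
        f.flush()               # Python's buffer -> OS page cache
        os.fsync(f.fileno())    # OS page cache -> disk
    os.replace(tmp_path, final_path)  # atomic swap on POSIX
```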
This PR will be closed shortly. Per our open source contribution guidelines, we'll update the original PR #3544 instead.

Implementation moved to: #3597
Description of changes:
Use os.replace() for atomic file operations to eliminate race conditions during concurrent access to the file cache. This ensures that writes are either fully completed or not applied at all, preventing partial writes that could leave the JSON file in an invalid state when multiple processes access the cache simultaneously.

Note: This change is based on the approach suggested in #3544 by @ranlz77. Thanks for identifying this issue and proposing the initial solution.
Tests:
I verified that all tests, including the newly added unit tests, pass.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.