Prevent JSON file corruption with atomic writes #3586

ubaskota · 2025-11-04T19:20:06Z

Description of changes:
Use os.replace() for atomic file operations to eliminate race conditions during concurrent access to the file cache. This ensures that writes are either fully completed or not applied at all, preventing partial writes that could leave the JSON file in an invalid state when multiple processes access the cache simultaneously.

Note: This change is based on the approach suggested in #3544 by @ranlz77. Thanks for identifying this issue and proposing the initial solution.

Tests:
I verified that all tests, including the newly added unit tests, pass.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-commenter · 2025-11-04T19:26:47Z

Codecov Report

❌ Patch coverage is 50.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.10%. Comparing base (8121342) to head (2b225c5).
⚠️ Report is 57 commits behind head on develop.

Files with missing lines	Patch %	Lines
botocore/utils.py	50.00%	12 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3586      +/-   ##
===========================================
- Coverage    93.17%   93.10%   -0.07%     
===========================================
  Files           68       68              
  Lines        15411    15432      +21     
===========================================
+ Hits         14359    14368       +9     
- Misses        1052     1064      +12

akx · 2025-11-12T07:57:21Z

botocore/utils.py

+            if hasattr(os, 'fchmod'):
+                os.fchmod(temp_fd, 0o600)


This shouldn't be necessary – mkstemp's documentation says "The file is readable and writable only by the creating user ID.".

ubaskota · 2025-11-21

Yes, you are right. This isn't needed. Thanks for pointing it out.
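For readers who want to check the mkstemp behaviour quoted above, here is a minimal sketch, assuming a POSIX system. The helper name mkstemp_mode is hypothetical and is not part of botocore.

```python
import os
import stat
import tempfile


def mkstemp_mode():
    """Create a file with mkstemp() and return its permission bits.

    Hypothetical helper for illustration only; not botocore code.
    """
    fd, path = tempfile.mkstemp()
    try:
        # S_IMODE strips the file-type bits and leaves only the permissions.
        return stat.S_IMODE(os.fstat(fd).st_mode)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    mode = mkstemp_mode()
    # On POSIX the group and other permission bits should all be clear.
    print(oct(mode), "group/other access:", bool(mode & 0o077))
```

On Windows the reported mode bits do not map onto POSIX permissions in the same way, so this check is only meaningful on POSIX platforms.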

akx · 2025-11-12T08:01:02Z

botocore/utils.py

+            if hasattr(os, 'fchmod'):
+                os.fchmod(temp_fd, 0o600)
+            with os.fdopen(temp_fd, 'w') as f:
+                temp_fd = None


os.fdopen is an alias of open. Exiting the context manager (for example when unwinding from an exception) should close the underlying fd, so I'm not sure the temp_fd = None + cleanup dance in except Exception below is necessary?

ubaskota · 2025-11-21

You're correct that os.fdopen() takes ownership of the file descriptor and that the context manager closes it automatically. The cleanup code covers two other cases, though: exceptions raised before os.fdopen(), and avoiding a double close after it. In both cases the exception handler also removes the orphaned temporary file from disk. For example, if os.fchmod() fails before os.fdopen() is reached, temp_fd is still open and must be closed manually to avoid leaking the file descriptor.
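The ownership hand-off discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: the function name write_via_temp and the fail_before_fdopen flag are hypothetical, and the flag exists only to simulate a failure before os.fdopen() is reached.

```python
import os
import tempfile


def write_via_temp(directory, text, fail_before_fdopen=False):
    """Write text to a temp file in `directory`, cleaning up on failure.

    Hypothetical helper for illustration only; not botocore code.
    """
    temp_fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        if fail_before_fdopen:
            # Stand-in for any call that can fail while we still own the fd.
            raise OSError("simulated failure before os.fdopen()")
        with os.fdopen(temp_fd, "w") as f:
            # The file object now owns the descriptor and will close it,
            # so forget our copy to avoid closing it a second time.
            temp_fd = None
            f.write(text)
        return temp_path
    except Exception:
        if temp_fd is not None:
            # Failure happened before the hand-off: close the fd ourselves.
            os.close(temp_fd)
        if os.path.exists(temp_path):
            # Remove the orphaned temporary file in either case.
            os.remove(temp_path)
        raise
```

With fail_before_fdopen=True the descriptor is closed manually and the temporary file is removed. If the failure happens inside the with block, the context manager closes the descriptor and the handler only removes the file.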

akx · 2025-11-12T08:01:32Z

botocore/utils.py

+                f.flush()
+                os.fsync(f.fileno())


Is there a particular reason to add an explicit flush + fsync?

ubaskota · 2025-11-21

Closing the file doesn't guarantee that the data has reached the disk. f.write() puts data in Python's buffer, which is then handed to the OS buffer, and the OS writes it to disk later.

If you call os.replace() right after closing and the system crashes before the OS flushes its buffer, you can end up with a partial or corrupt file. flush() pushes Python's buffer to the OS, and fsync() forces the OS to commit it to disk. Together they ensure the data is fully written before you proceed. More here: https://docs.python.org/3/library/os.html#os.fsync
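Putting the pieces from this thread together, the write, flush, fsync, replace sequence might look like the sketch below. It is a minimal illustration, not the implementation in botocore/utils.py: the name atomic_write_json is hypothetical, and descriptor handling is simplified on the assumption that os.fdopen() itself does not fail.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write `data` as JSON to `path` without exposing a partial file.

    Hypothetical helper for illustration only; not botocore code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so that os.replace() stays
    # on a single filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()              # Python's buffer -> OS buffer
            os.fsync(f.fileno())   # OS buffer -> disk
        # Swap the fully written file into place in one step.
        os.replace(temp_path, path)
    except BaseException:
        # On any failure, drop the temp file and leave the target untouched.
        try:
            os.remove(temp_path)
        except FileNotFoundError:
            pass
        raise
```

If serialization fails partway through, the existing file at path is left as it was, because only the temporary file has been written to.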

ubaskota · 2025-11-21T18:21:57Z

This PR will be closed shortly. Per our open source contribution guidelines, we'll update the original PR #3544 instead.

ubaskota · 2025-11-21T19:36:02Z

Implementation moved to: #3597

Prevent JSON file corruption with atomic writes

2b225c5

akx reviewed Nov 12, 2025

ubaskota closed this Nov 26, 2025
