Copy files from FTP to an ObjectStore
#318
Bump python version to 3.13 in pyproject.toml and .python-version. Fix Ruff errors: B911 (strict parameter for itertools.batched) and UP043 (unnecessary default type arguments).
But the test uses DWD's FTP server, and saves locally. And the obstore_worker has no retry logic yet. And there are still print statements!
I should remove some of these log statements later.
abcd259 to
e78842c
@aldenks would you be comfortable with me adding Without
aldenks
left a comment
Nice, and thanks for breaking this up into manageable PRs! You're welcome to add pytest-asyncio to dev deps.
My comments are all small; merge to main yourself when you've addressed whichever of them make sense.
As requested by Alden
As requested by Alden
As requested by Alden. Tests fail! This has revealed a subtle bug: The main function calls obstore_queue.shutdown() before waiting for all obstore_workers to finish! I'll fix that in the next commit...
…ndle retries correctly.
…he obstore_queue before shutting it down. Simplified the code. A follow-on will modify the tests to mock sleep.
And use pytest-asyncio to simplify async tests in test_ftp_to_obstore.py.
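The ordering issue described in the commits above can be illustrated with a small sketch. This is not the code from this PR: `retrying_worker`, `copy_with_retries`, `MAX_ATTEMPTS` and the `upload` callable are hypothetical names, and the retry-by-requeue scheme is an assumption made for illustration. It requires Python 3.13 for `asyncio.Queue.shutdown`. The point is that once `shutdown()` has been called, any further `put` raises `asyncio.QueueShutDown`, so a worker that wants to re-queue a failed item needs the queue to stay open until all work is finished.

```python
import asyncio
from collections.abc import Awaitable, Callable

MAX_ATTEMPTS = 3  # hypothetical retry limit


async def retrying_worker(
    queue: asyncio.Queue[tuple[str, int]],
    upload: Callable[[str], Awaitable[None]],
    done: list[str],
) -> None:
    """Hypothetical worker: process items until the queue is shut down and empty."""
    while True:
        try:
            name, attempt = await queue.get()
        except asyncio.QueueShutDown:
            return
        try:
            await upload(name)
        except OSError:
            if attempt < MAX_ATTEMPTS:
                # Re-queue for another attempt. This would raise QueueShutDown
                # if shutdown() had already been called on the queue.
                queue.put_nowait((name, attempt + 1))
            # Otherwise the item is dropped in this sketch.
        else:
            done.append(name)
        finally:
            # Mark this attempt finished only after any re-queue, so that
            # join() cannot return while a retry is still pending.
            queue.task_done()


async def copy_with_retries(
    names: list[str],
    upload: Callable[[str], Awaitable[None]],
    n_workers: int = 2,
) -> list[str]:
    queue: asyncio.Queue[tuple[str, int]] = asyncio.Queue()
    done: list[str] = []
    for name in names:
        queue.put_nowait((name, 1))
    async with asyncio.TaskGroup() as tg:
        for _ in range(n_workers):
            tg.create_task(retrying_worker(queue, upload, done))
        await queue.join()  # wait until every item, including retries, is done
        queue.shutdown()    # only now wake the idle workers so they can exit
    return done
```

Swapping the last two calls would close the queue while retries may still need to be re-queued, which is one way the ordering can matter.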
This PR implements a simple, concurrent strategy for copying many files from an FTP server to an `ObjectStore`.

The main use case for this code is to copy GRIB files from DWD's FTP server to dynamical's source co-op bucket (see issue #258). To keep this PR small(ish), the code in this PR doesn't know anything about DWD. This PR just implements a general-purpose utility for downloading many FTP files.
A follow-on PR will implement logic to find which files to download.
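To give a rough picture of what a concurrent FTP-to-object-store copy can look like, here is a minimal two-stage sketch. It is an illustration under stated assumptions, not the implementation in this PR: the `download` and `upload` callables, the `ftp_worker` function, the worker counts and the queue size are all hypothetical, and the body of `obstore_worker` here is invented for the example. It needs Python 3.13 for `asyncio.Queue.shutdown`.

```python
import asyncio
from collections.abc import Awaitable, Callable

Download = Callable[[str], Awaitable[bytes]]      # hypothetical: fetch one FTP path
Upload = Callable[[str, bytes], Awaitable[None]]  # hypothetical: write one object


async def ftp_worker(
    path_queue: asyncio.Queue[str],
    obstore_queue: asyncio.Queue[tuple[str, bytes]],
    download: Download,
) -> None:
    """Download each path and hand the bytes to the upload stage."""
    while True:
        try:
            path = await path_queue.get()
        except asyncio.QueueShutDown:
            return  # no paths left
        await obstore_queue.put((path, await download(path)))


async def obstore_worker(
    obstore_queue: asyncio.Queue[tuple[str, bytes]], upload: Upload
) -> None:
    """Upload downloaded files until the queue is shut down and empty."""
    while True:
        try:
            path, data = await obstore_queue.get()
        except asyncio.QueueShutDown:
            return
        await upload(path, data)


async def copy_files(
    paths: list[str], download: Download, upload: Upload,
    n_ftp: int = 4, n_obstore: int = 4,
) -> None:
    path_queue: asyncio.Queue[str] = asyncio.Queue()
    # A bounded queue applies backpressure if uploads are slower than downloads.
    obstore_queue: asyncio.Queue[tuple[str, bytes]] = asyncio.Queue(maxsize=16)
    for path in paths:
        path_queue.put_nowait(path)
    path_queue.shutdown()  # queued paths can still be read; then get() raises
    async with asyncio.TaskGroup() as uploaders:
        for _ in range(n_obstore):
            uploaders.create_task(obstore_worker(obstore_queue, upload))
        async with asyncio.TaskGroup() as downloaders:
            for _ in range(n_ftp):
                downloaders.create_task(ftp_worker(path_queue, obstore_queue, download))
        # All downloads are finished, so nothing more will be put on the queue.
        obstore_queue.shutdown()
```

Injecting `download` and `upload` as callables keeps the sketch independent of any particular FTP or object-store library, and makes it easy to exercise with in-memory fakes.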
Note that this PR relies on PR #319 to bump the Python version from 3.12 to 3.13 so we can use `asyncio.Queue.shutdown` (which was only introduced in 3.13). Once PR #319 is merged into `main`, this current PR will only show 3 files as changed.

Still TODO before this PR is ready for review:
- `obstore_worker`.
- `print` statements to `log` statements.
- `log` statements: remove some; change log levels on some.
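As a rough idea of what the `print`-to-`log` item could look like, here is a small sketch using the standard `logging` module. The logger name `ftp_to_obstore` and the function `report_upload` are hypothetical, and the choice of levels is only one possible split.

```python
import logging

log = logging.getLogger("ftp_to_obstore")  # hypothetical logger name


def report_upload(path: str, n_bytes: int, attempt: int) -> None:
    """Log one finished upload (hypothetical helper)."""
    # Per-file detail goes to DEBUG so it can be switched off in normal runs.
    log.debug("uploaded %s (%d bytes)", path, n_bytes)
    if attempt > 1:
        # Retries are more noteworthy, so they are logged at WARNING.
        log.warning("%s needed %d attempts", path, attempt)
```

Using `%s`-style arguments rather than f-strings means the message is only formatted if the level is enabled.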