Skip to content

Conversation

@JackKelly
Copy link
Contributor

@JackKelly JackKelly commented Nov 19, 2025

This PR implements a simple, concurrent strategy for copying many files from an FTP server to an ObjectStore.

The main use-case for this code is copy GRIB files from DWD's FTP server to dynamical's source co-op bucket (see issue #258). To keep this PR small(ish), the code in this PR doesn't know anything about DWD. This PR just implements a general-purpose utility for downloading many FTP files.

A follow-on PR will implement logic to find which files to download.

Note that this PR relies on PR #319 to bump the Python version from 3.12 to 3.13 so we can use asyncio.Queue.shutdown (which was only introduced in 3.13). Once PR #319 is merged into main, this current PR will only show 3 files as changed.

Still TODO before this PR is ready for review:

  • Implement retry logic in obstore_worker.
  • Implement mock FTP server for testing.
  • Save to a temporary directory for testing.
  • Fix Ruff failures in main code due to updating to Python 3.13 (see comment below) (implemented in PR chore: upgrade to Python 3.13 and fix linting errors #319)
  • Convert print statements to log statements.
  • Or, actually, maybe don't return any list of files, but just log failures. And rely on the code that figures out what files to download to find missing files.
  • Describe function params in docstrings.
  • Tidy up log statements: remove some; change log levels on some.
  • Test failed FTP connection.
  • Test failed obstore write.
  • Test missing FTP file.

Bump python version to 3.13 in pyproject.toml and .python-version. Fix Ruff errors: B911 (strict parameter for itertools.batched) and UP043 (unnecessary default type arguments).
@JackKelly JackKelly force-pushed the copy-files-from-ftp-to-obstore branch from abcd259 to e78842c Compare November 19, 2025 21:05
@JackKelly
Copy link
Contributor Author

@aldenks would you be comfortable with me adding pytest-asyncio to the dev dependencies in pyproject.toml? It's not a biggie. We can test async functions without pytest-asyncio, it's a just a little prettier with pytest-asyncio:

Without pytest-asyncio:

@pytest.fixture
def sleep_time() -> float:
    return 0.5

async def _foo(sleep_time: float):
    await asyncio.sleep(sleep_time)

def test_foo(sleep_time: float):
    # Fixtures aren't applied to helper functions, so we have to pass in `sleep_time`:
    asyncio.run(_foo(sleep_time))

With pytest-asyncio

And after setting asyncio_mode = "auto" in [tool.pytest] in pyproject.toml:

@pytest.fixture
def sleep_time() -> float:
    return 0.5

async def test_foo(sleep_time: float):
    await asyncio.sleep(sleep_time)

@JackKelly JackKelly marked this pull request as ready for review November 24, 2025 11:33
Copy link
Member

@aldenks aldenks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, and thanks for breaking up into manageable PRs! you're welcome to add pytest-asyncio to dev deps

My comments are all small, merge to main yourself when you've addressed whichever of them make sense

@JackKelly JackKelly merged commit 2825ae1 into dynamical-org:main Nov 25, 2025
2 checks passed
@JackKelly JackKelly deleted the copy-files-from-ftp-to-obstore branch November 25, 2025 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants