-
Notifications
You must be signed in to change notification settings - Fork 2.4k
adapt to improved type hints and fix edge case issues with Package-files
#10673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideAligns package file typing with the new PackageFile protocol, fixes mismatched file key usage, and hardens HTTP repository link-to-file conversion by skipping files without computable hashes, while updating tests, locker behavior, and the Poetry core dependency accordingly. Sequence diagram for HttpRepository link to PackageFile conversionsequenceDiagram
participant HR as HttpRepository
participant L as Link
participant PI as PackageInfo
HR->>HR: _links_to_data(links, data)
loop for each Link in links
HR->>L: read yanked, hashes, filename, url
alt link is yanked and data not yanked
HR-->>HR: continue (skip link)
else link accepted
HR-->>HR: determine hash_type and file_hash
alt file_hash is None
HR->>HR: _log(Failed to determine hash..., warning)
HR-->>HR: skip adding PackageFile
else valid hash
HR-->>HR: append PackageFile{file, hash} to files
end
end
end
HR-->>PI: assign files to data.files
HR-->>HR: return data as dict_str_Any
Class diagram for updated PackageFile usageclassDiagram
class HttpRepository {
-_log(message str, level str)
_links_to_data(links list_Link, data PackageInfo) dict_str_Any
}
class PackageInfo {
+name NormalizedName
+version str
+yanked bool
+files list_PackageFile
}
class PackageFile {
+file str
+hash str
}
class Locker {
_get_locked_package(name str, version str, metadata dict_str_Any) Package
}
class Package {
+name str
+version str
+files list_PackageFile
}
class Link {
+filename str
+url str
+hashes dict_str_str
+yanked bool
}
HttpRepository --> PackageInfo : updates
HttpRepository --> Link : consumes
PackageInfo --> PackageFile : files
Package --> PackageFile : files
Locker --> Package : creates
Locker --> PackageFile : assigns_files
class PackageInfoInit {
__init__(
name NormalizedName,
version str,
summary str,
requires_dist list_str,
requires_python str,
files Sequence_PackageFile,
yanked bool,
cache_version str
) None
}
PackageInfo <|-- PackageInfoInit : initialization
Class diagram for Locker package.files assignment changeclassDiagram
class Locker {
_get_locked_package(name str, version str, metadata dict_str_Any) Package
}
class Package {
+name str
+version str
+files list_PackageFile
}
class PackageFile {
+file str
+hash str
}
Locker --> Package : returns
Package --> PackageFile : files
class MetadataHashes {
+hashes dict_str_Any
}
Locker ..> MetadataHashes : reads
note for Locker "When metadata contains hashes only, Locker now constructs PackageFile items with key file instead of name"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 3 issues, and left some high level feedback:
- The temporary dependency on
poetry-core @ git+https://github.com/radoering/poetry-core.git@package-files-typinginpyproject.tomlshould be reverted to a proper released version constraint before merging to avoid shipping a dependency on a personal fork/branch. - In
_links_to_data, the inline comment# Is that even possible?is vague; either document under what conditionsfile_hashcan beNoneor remove the speculative comment to keep the code self-explanatory.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The temporary dependency on `poetry-core @ git+https://github.com/radoering/poetry-core.git@package-files-typing` in `pyproject.toml` should be reverted to a proper released version constraint before merging to avoid shipping a dependency on a personal fork/branch.
- In `_links_to_data`, the inline comment `# Is that even possible?` is vague; either document under what conditions `file_hash` can be `None` or remove the speculative comment to keep the code self-explanatory.
## Individual Comments
### Comment 1
<location> `src/poetry/repositories/http_repository.py:344-347` </location>
<code_context>
)
- files: list[dict[str, Any]] = []
+ files: list[PackageFile] = []
for link in links:
if link.yanked and not data.yanked:
</code_context>
<issue_to_address>
**issue:** Dropping files with missing hashes may have unintended downstream effects when all links lack hashes.
Previously, links with unknown hashes still produced entries with `null` hashes; now they’re skipped entirely. If a release only has links without hashes, this could silently produce an empty `data.files`, changing how downstream selection/installation behaves. Consider whether this case should instead trigger a hard failure or a clearer error signal rather than an empty list.
</issue_to_address>
### Comment 2
<location> `tests/installation/test_chooser.py:216` </location>
<code_context>
package = Package("isort", "4.3.4")
- files = [
+ files: list[PackageFile] = [
{
"file": filename,
</code_context>
<issue_to_address>
**suggestion (testing):** Add a regression test for links/files without a hash being skipped by the repository.
`HttpRepository._links_to_data` now warns and skips files when it cannot determine a hash (`file_hash is None`), but current chooser tests only cover files with valid hashes. Please add a regression test (ideally in an http_repository–focused module) that uses links whose `link.hashes` lack the expected algorithms, verifies those files are excluded from `data.files`, and, if practical, checks that the warning is emitted. This will lock in the new behaviour and prevent reintroducing `null` hashes into the lockfile.
Suggested implementation:
```python
from tests.types import DistributionHashGetter
```
` section so it can be integrated correctly into your codebase.
Here are the code changes:
```xml
<file_operations>
<file_operation operation="edit" file_path="tests/installation/test_chooser.py">
<<<<<<< SEARCH
from tests.types import DistributionHashGetter
=======
from tests.types import DistributionHashGetter
>>>>>>> REPLACE
</file_operation>
</file_operations>
```
<additional_changes>
Please add the following regression test to an http-repository–focused test module (preferably `tests/repositories/test_http_repository.py`; if that file does not exist, you can put it into `tests/installation/test_chooser.py` near the other repository/chooser tests and adjust imports accordingly):
```python
import logging
import pytest
from poetry.core.packages.utils.link import Link
from poetry.repositories.http_repository import HttpRepository
def test_http_repository_skips_links_without_supported_hashes(
caplog: pytest.LogCaptureFixture,
) -> None:
"""
Regression test for HttpRepository._links_to_data skipping links whose
hashes do not contain a supported algorithm (e.g., no sha256).
This ensures that such links are not included in data.files and that a
warning is emitted, preventing null hashes from reappearing in lockfiles.
"""
# Arrange: create a repository and three links:
# - one with a valid sha256 hash (should be kept),
# - one with only an unsupported hash algorithm (e.g. md5),
# - one with no hash at all.
repository = HttpRepository("dummy", "https://example.com/simple")
valid_link = Link(
"https://example.com/files/demo-1.0.0.tar.gz"
"#sha256=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
)
md5_only_link = Link(
"https://example.com/files/demo-1.0.0.whl"
"#md5=0123456789abcdef0123456789abcdef"
)
no_hash_link = Link("https://example.com/files/demo-1.0.0.zip")
with caplog.at_level(logging.WARNING):
# NOTE: _links_to_data is private but is what we are regressing.
data = repository._links_to_data( # type: ignore[attr-defined]
"demo",
"1.0.0",
[valid_link, md5_only_link, no_hash_link],
)
# Assert: only the link with a supported hash (sha256) is present in data.files.
filenames = [file["file"] for file in data.files]
assert "demo-1.0.0.tar.gz" in filenames
assert "demo-1.0.0.whl" not in filenames
assert "demo-1.0.0.zip" not in filenames
# Assert: a warning was emitted about skipping links without suitable hashes.
assert any(
"hash" in record.getMessage().lower()
and ("skip" in record.getMessage().lower() or "cannot determine" in record.getMessage().lower())
for record in caplog.records
)
```
Integration notes:
1. **Imports**
Make sure the following imports exist at the top of whichever test module you place this test in (remove duplicates if they are already present):
```python
import logging
import pytest
from poetry.core.packages.utils.link import Link
from poetry.repositories.http_repository import HttpRepository
```
2. **Repository API alignment**
If the `HttpRepository` constructor signature in your codebase differs from:
```python
HttpRepository(name: str, url: str, *args, **kwargs)
```
adjust the instantiation accordingly, e.g.:
```python
repository = HttpRepository("dummy", "https://example.com/simple", config=config)
```
or whatever your tests use elsewhere.
3. **_links_to_data signature**
If `_links_to_data` has a different signature (e.g., extra parameters), update the call accordingly while still passing in a sequence of `Link` objects that:
- one has a valid `sha256` hash,
- at least one has only unsupported hashes,
- at least one has no hash.
4. **Log message wording**
The `assert any(...)` block checks for generic substrings (`"hash"`, `"skip"`, `"cannot determine"`).
If your implementation uses different wording in the warning, tweak this assertion to match the actual log message while still verifying that such a warning is emitted.
With this test in place, the new behaviour of `HttpRepository._links_to_data`—skipping links/files without an acceptable hash and logging a warning—will be locked in and guarded against regression.
</issue_to_address>
### Comment 3
<location> `tests/installation/test_executor.py:1823` </location>
<code_context>
)
def test_executor_known_hashes(
- package_files: list[dict[str, str]],
+ package_files: list[PackageFile],
expected_url_reference: dict[str, Any],
tmp_venv: VirtualEnv,
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `test_executor_known_hashes` to cover `PackageFile` structures originating from legacy `hashes` metadata.
Since the locker now builds `package.files` from legacy `hashes` metadata as `{"file": h, "hash": h}`, please add/extend a parametrized case where `package_files` matches this structure. This will ensure the executor correctly handles `PackageFile` instances originating from both modern `files` and legacy `hashes` data, and will catch any mismatch in expected keys or structure.
Suggested implementation:
```python
def test_executor_known_hashes(
package_files: Sequence[PackageFile],
expected_url_reference: dict[str, Any],
tmp_venv: VirtualEnv,
pool: RepositoryPool,
```
To fully implement your suggestion about legacy `hashes` metadata, you should also:
1. **Extend the existing parametrization for `test_executor_known_hashes`** (the `@pytest.mark.parametrize(...)` block that immediately precedes this function) with a new case that builds `package_files` from legacy-style metadata. Follow the same structure as the existing cases.
Assuming the current parametrization looks roughly like:
```python
@pytest.mark.parametrize(
"package_files, expected_url_reference",
[
(
[
PackageFile(
name="example-1.0.0-py3-none-any.whl",
hash="sha256:...",
),
],
{
"url": "...",
"hashes": ["sha256:..."],
},
),
# other cases...
],
)
def test_executor_known_hashes(...):
...
```
Add a new tuple to that list that **mirrors the legacy `hashes` representation** transformed into `PackageFile` objects. For example (adapt the names/URLs/hashes to match your existing fixtures):
```python
@pytest.mark.parametrize(
"package_files, expected_url_reference",
[
# existing cases...
# legacy hashes-based PackageFile entries: {"file": h, "hash": h}
(
[
PackageFile(
name="example-1.0.0-py3-none-any.whl",
hash="sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
),
PackageFile(
name="example-1.0.0.tar.gz",
hash="sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210",
),
],
{
"url": "https://files.pythonhosted.org/packages/example/example-1.0.0-py3-none-any.whl",
"hashes": [
"sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
"sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210",
],
},
),
],
)
```
The important part is that each `PackageFile` corresponds to what the locker would build from legacy metadata:
```python
# legacy locker shape:
{"file": "<dist-filename>", "hash": "<hash>"}
# test representation:
PackageFile(name="<dist-filename>", hash="<hash>")
```
2. **Ensure the executor under test doesn’t assume only “modern” `files` data.** The new parametrized case should exercise the same path the executor uses for the existing cases, just with `PackageFile` instances that originated from legacy `hashes` entries. If the test currently constructs `PackageFile` instances via a helper (e.g., `make_package_file_from_metadata(...)`), use that helper in the new case to keep things consistent.
3. If your existing parametrization uses shared fixtures or factory functions (e.g., reusing a base `Package` or URL), reuse them in the new case as well so the only difference is the shape/origin of the `PackageFile` data.
These additional changes will ensure the executor correctly handles `PackageFile` instances built from both modern `files` and legacy `hashes` metadata.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
b6d48be to
b61f74f
Compare
b61f74f to
196c5c7
Compare
Pull Request Check List
Requires: python-poetry/poetry-core#904
Fix edge case issues (and corrupt tests) uncovered by python-poetry/poetry-core#904.
Summary by Sourcery
Align package file handling with updated typing and fix edge cases around missing or invalid file hashes.
Bug Fixes:
Enhancements:
Build:
Tests: