Skip to content

Conversation

@radoering
Copy link
Member

@radoering radoering commented Dec 28, 2025

Pull Request Check List

Requires: python-poetry/poetry-core#904

Fix edge case issues (and corrupt tests) uncovered by python-poetry/poetry-core#904.

Summary by Sourcery

Align package file handling with updated typing and fix edge cases around missing or invalid file hashes.

Bug Fixes:

  • Correct package file key usage in chooser, executor, and locker logic to prevent mismatches and corrupted tests.
  • Skip files without resolvable hashes in HTTP repository link conversion to avoid writing invalid lockfile entries.

Enhancements:

  • Adopt the shared PackageFile type in repositories, inspection, executor tests, and chooser tests for stricter type safety and consistency.

Build:

  • Update poetry-core dependency to a specific Git revision that introduces improved package file typing.

Tests:

  • Adjust installation chooser and executor tests to use the new PackageFile structure and align with updated hash handling.

@sourcery-ai
Copy link

sourcery-ai bot commented Dec 28, 2025

Reviewer's Guide

Aligns package file typing with the new PackageFile protocol, fixes mismatched file key usage, and hardens HTTP repository link-to-file conversion by skipping files without computable hashes, while updating tests, locker behavior, and the Poetry core dependency accordingly.

Sequence diagram for HttpRepository link to PackageFile conversion

sequenceDiagram
    participant HR as HttpRepository
    participant L as Link
    participant PI as PackageInfo

    HR->>HR: _links_to_data(links, data)
    loop for each Link in links
        HR->>L: read yanked, hashes, filename, url
        alt link is yanked and data not yanked
            HR-->>HR: continue (skip link)
        else link accepted
            HR-->>HR: determine hash_type and file_hash
            alt file_hash is None
                HR->>HR: _log(Failed to determine hash..., warning)
                HR-->>HR: skip adding PackageFile
            else valid hash
                HR-->>HR: append PackageFile{file, hash} to files
            end
        end
    end
    HR-->>PI: assign files to data.files
    HR-->>HR: return data as dict_str_Any
Loading

Class diagram for updated PackageFile usage

classDiagram
    class HttpRepository {
        -_log(message str, level str)
        _links_to_data(links list_Link, data PackageInfo) dict_str_Any
    }

    class PackageInfo {
        +name NormalizedName
        +version str
        +yanked bool
        +files list_PackageFile
    }

    class PackageFile {
        +file str
        +hash str
    }

    class Locker {
        _get_locked_package(name str, version str, metadata dict_str_Any) Package
    }

    class Package {
        +name str
        +version str
        +files list_PackageFile
    }

    class Link {
        +filename str
        +url str
        +hashes dict_str_str
        +yanked bool
    }

    HttpRepository --> PackageInfo : updates
    HttpRepository --> Link : consumes
    PackageInfo --> PackageFile : files
    Package --> PackageFile : files
    Locker --> Package : creates
    Locker --> PackageFile : assigns_files

    class PackageInfoInit {
        __init__(
            name NormalizedName,
            version str,
            summary str,
            requires_dist list_str,
            requires_python str,
            files Sequence_PackageFile,
            yanked bool,
            cache_version str
        ) None
    }

    PackageInfo <|-- PackageInfoInit : initialization
Loading

Class diagram for Locker package.files assignment change

classDiagram
    class Locker {
        _get_locked_package(name str, version str, metadata dict_str_Any) Package
    }

    class Package {
        +name str
        +version str
        +files list_PackageFile
    }

    class PackageFile {
        +file str
        +hash str
    }

    Locker --> Package : returns
    Package --> PackageFile : files

    class MetadataHashes {
        +hashes dict_str_Any
    }

    Locker ..> MetadataHashes : reads

    note for Locker "When metadata contains hashes only, Locker now constructs PackageFile items with key file instead of name"
Loading

File-Level Changes

Change Details Files
Adopt the new PackageFile type across code and tests and correct file-metadata key usage.
  • Import PackageFile for type checking in chooser and executor tests and in HTTP repository and inspection modules.
  • Annotate lists of package files as list[PackageFile] or Sequence[PackageFile] instead of generic dict/Mapping-based types.
  • Switch file-metadata dictionaries from using the 'filename' or 'name' key to the canonical 'file' key when building package.files data structures.
tests/installation/test_chooser.py
tests/installation/test_executor.py
src/poetry/repositories/http_repository.py
src/poetry/inspection/info.py
src/poetry/packages/locker.py
Handle missing file hashes robustly when converting HTTP links to lockfile file entries.
  • Change the HTTP repository _links_to_data implementation to build files as list[PackageFile].
  • Introduce a guard that logs a warning and skips a file when no hash can be determined instead of emitting a null-hash entry.
  • Preserve existing behavior of filtering yanked files unless the whole release is yanked.
src/poetry/repositories/http_repository.py
Update dependencies and lockfile to match the new core typing behavior.
  • Point the poetry-core dependency at a specific git ref that introduces the PackageFile typing changes.
  • Regenerate poetry.lock to reflect the updated poetry-core source and transitive dependency state.
pyproject.toml
poetry.lock

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The temporary dependency on poetry-core @ git+https://github.com/radoering/poetry-core.git@package-files-typing in pyproject.toml should be reverted to a proper released version constraint before merging to avoid shipping a dependency on a personal fork/branch.
  • In _links_to_data, the inline comment # Is that even possible? is vague; either document under what conditions file_hash can be None or remove the speculative comment to keep the code self-explanatory.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The temporary dependency on `poetry-core @ git+https://github.com/radoering/poetry-core.git@package-files-typing` in `pyproject.toml` should be reverted to a proper released version constraint before merging to avoid shipping a dependency on a personal fork/branch.
- In `_links_to_data`, the inline comment `# Is that even possible?` is vague; either document under what conditions `file_hash` can be `None` or remove the speculative comment to keep the code self-explanatory.

## Individual Comments

### Comment 1
<location> `src/poetry/repositories/http_repository.py:344-347` </location>
<code_context>
             )

-        files: list[dict[str, Any]] = []
+        files: list[PackageFile] = []
         for link in links:
             if link.yanked and not data.yanked:
</code_context>

<issue_to_address>
**issue:** Dropping files with missing hashes may have unintended downstream effects when all links lack hashes.

Previously, links with unknown hashes still produced entries with `null` hashes; now they’re skipped entirely. If a release only has links without hashes, this could silently produce an empty `data.files`, changing how downstream selection/installation behaves. Consider whether this case should instead trigger a hard failure or a clearer error signal rather than an empty list.
</issue_to_address>

### Comment 2
<location> `tests/installation/test_chooser.py:216` </location>
<code_context>

     package = Package("isort", "4.3.4")
-    files = [
+    files: list[PackageFile] = [
         {
             "file": filename,
</code_context>

<issue_to_address>
**suggestion (testing):** Add a regression test for links/files without a hash being skipped by the repository.

`HttpRepository._links_to_data` now warns and skips files when it cannot determine a hash (`file_hash is None`), but current chooser tests only cover files with valid hashes. Please add a regression test (ideally in an http_repository–focused module) that uses links whose `link.hashes` lack the expected algorithms, verifies those files are excluded from `data.files`, and, if practical, checks that the warning is emitted. This will lock in the new behaviour and prevent reintroducing `null` hashes into the lockfile.

Suggested implementation:

```python
from tests.types import DistributionHashGetter

```

` section so it can be integrated correctly into your codebase.

Here are the code changes:

```xml
<file_operations>
  <file_operation operation="edit" file_path="tests/installation/test_chooser.py">
<<<<<<< SEARCH
from tests.types import DistributionHashGetter
=======
from tests.types import DistributionHashGetter
>>>>>>> REPLACE
  </file_operation>
</file_operations>
```

<additional_changes>
Please add the following regression test to an http-repository–focused test module (preferably `tests/repositories/test_http_repository.py`; if that file does not exist, you can put it into `tests/installation/test_chooser.py` near the other repository/chooser tests and adjust imports accordingly):

```python
import logging

import pytest
from poetry.core.packages.utils.link import Link
from poetry.repositories.http_repository import HttpRepository


def test_http_repository_skips_links_without_supported_hashes(
    caplog: pytest.LogCaptureFixture,
) -> None:
    """
    Regression test for HttpRepository._links_to_data skipping links whose
    hashes do not contain a supported algorithm (e.g., no sha256).

    This ensures that such links are not included in data.files and that a
    warning is emitted, preventing null hashes from reappearing in lockfiles.
    """
    # Arrange: create a repository and three links:
    # - one with a valid sha256 hash (should be kept),
    # - one with only an unsupported hash algorithm (e.g. md5),
    # - one with no hash at all.
    repository = HttpRepository("dummy", "https://example.com/simple")

    valid_link = Link(
        "https://example.com/files/demo-1.0.0.tar.gz"
        "#sha256=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
    )
    md5_only_link = Link(
        "https://example.com/files/demo-1.0.0.whl"
        "#md5=0123456789abcdef0123456789abcdef"
    )
    no_hash_link = Link("https://example.com/files/demo-1.0.0.zip")

    with caplog.at_level(logging.WARNING):
        # NOTE: _links_to_data is private but is what we are regressing.
        data = repository._links_to_data(  # type: ignore[attr-defined]
            "demo",
            "1.0.0",
            [valid_link, md5_only_link, no_hash_link],
        )

    # Assert: only the link with a supported hash (sha256) is present in data.files.
    filenames = [file["file"] for file in data.files]

    assert "demo-1.0.0.tar.gz" in filenames
    assert "demo-1.0.0.whl" not in filenames
    assert "demo-1.0.0.zip" not in filenames

    # Assert: a warning was emitted about skipping links without suitable hashes.
    assert any(
        "hash" in record.getMessage().lower()
        and ("skip" in record.getMessage().lower() or "cannot determine" in record.getMessage().lower())
        for record in caplog.records
    )
```

Integration notes:

1. **Imports**  
   Make sure the following imports exist at the top of whichever test module you place this test in (remove duplicates if they are already present):

   ```python
   import logging

   import pytest
   from poetry.core.packages.utils.link import Link
   from poetry.repositories.http_repository import HttpRepository
   ```

2. **Repository API alignment**  
   If the `HttpRepository` constructor signature in your codebase differs from:

   ```python
   HttpRepository(name: str, url: str, *args, **kwargs)
   ```

   adjust the instantiation accordingly, e.g.:

   ```python
   repository = HttpRepository("dummy", "https://example.com/simple", config=config)
   ```

   or whatever your tests use elsewhere.

3. **_links_to_data signature**  
   If `_links_to_data` has a different signature (e.g., extra parameters), update the call accordingly while still passing in a sequence of `Link` objects that:
   - one has a valid `sha256` hash,
   - at least one has only unsupported hashes,
   - at least one has no hash.

4. **Log message wording**  
   The `assert any(...)` block checks for generic substrings (`"hash"`, `"skip"`, `"cannot determine"`).  
   If your implementation uses different wording in the warning, tweak this assertion to match the actual log message while still verifying that such a warning is emitted.

With this test in place, the new behaviour of `HttpRepository._links_to_data`—skipping links/files without an acceptable hash and logging a warning—will be locked in and guarded against regression.
</issue_to_address>

### Comment 3
<location> `tests/installation/test_executor.py:1823` </location>
<code_context>
 )
 def test_executor_known_hashes(
-    package_files: list[dict[str, str]],
+    package_files: list[PackageFile],
     expected_url_reference: dict[str, Any],
     tmp_venv: VirtualEnv,
</code_context>

<issue_to_address>
**suggestion (testing):** Extend `test_executor_known_hashes` to cover `PackageFile` structures originating from legacy `hashes` metadata.

Since the locker now builds `package.files` from legacy `hashes` metadata as `{"file": h, "hash": h}`, please add/extend a parametrized case where `package_files` matches this structure. This will ensure the executor correctly handles `PackageFile` instances originating from both modern `files` and legacy `hashes` data, and will catch any mismatch in expected keys or structure.

Suggested implementation:

```python
def test_executor_known_hashes(
    package_files: Sequence[PackageFile],
    expected_url_reference: dict[str, Any],
    tmp_venv: VirtualEnv,
    pool: RepositoryPool,

```

To fully implement your suggestion about legacy `hashes` metadata, you should also:

1. **Extend the existing parametrization for `test_executor_known_hashes`** (the `@pytest.mark.parametrize(...)` block that immediately precedes this function) with a new case that builds `package_files` from legacy-style metadata. Follow the same structure as the existing cases.

   Assuming the current parametrization looks roughly like:
   ```python
   @pytest.mark.parametrize(
       "package_files, expected_url_reference",
       [
           (
               [
                   PackageFile(
                       name="example-1.0.0-py3-none-any.whl",
                       hash="sha256:...",
                   ),
               ],
               {
                   "url": "...",
                   "hashes": ["sha256:..."],
               },
           ),
           # other cases...
       ],
   )
   def test_executor_known_hashes(...):
       ...
   ```

   Add a new tuple to that list that **mirrors the legacy `hashes` representation** transformed into `PackageFile` objects. For example (adapt the names/URLs/hashes to match your existing fixtures):

   ```python
   @pytest.mark.parametrize(
       "package_files, expected_url_reference",
       [
           # existing cases...

           # legacy hashes-based PackageFile entries: {"file": h, "hash": h}
           (
               [
                   PackageFile(
                       name="example-1.0.0-py3-none-any.whl",
                       hash="sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
                   ),
                   PackageFile(
                       name="example-1.0.0.tar.gz",
                       hash="sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210",
                   ),
               ],
               {
                   "url": "https://files.pythonhosted.org/packages/example/example-1.0.0-py3-none-any.whl",
                   "hashes": [
                       "sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
                       "sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210",
                   ],
               },
           ),
       ],
   )
   ```

   The important part is that each `PackageFile` corresponds to what the locker would build from legacy metadata:
   ```python
   # legacy locker shape:
   {"file": "<dist-filename>", "hash": "<hash>"}
   # test representation:
   PackageFile(name="<dist-filename>", hash="<hash>")
   ```

2. **Ensure the executor under test doesn’t assume only “modern” `files` data.** The new parametrized case should exercise the same path the executor uses for the existing cases, just with `PackageFile` instances that originated from legacy `hashes` entries. If the test currently constructs `PackageFile` instances via a helper (e.g., `make_package_file_from_metadata(...)`), use that helper in the new case to keep things consistent.

3. If your existing parametrization uses shared fixtures or factory functions (e.g., reusing a base `Package` or URL), reuse them in the new case as well so the only difference is the shape/origin of the `PackageFile` data.

These additional changes will ensure the executor correctly handles `PackageFile` instances built from both modern `files` and legacy `hashes` metadata.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@radoering radoering force-pushed the package-files-typing branch from b6d48be to b61f74f Compare December 28, 2025 12:15
@radoering radoering force-pushed the package-files-typing branch from b61f74f to 196c5c7 Compare December 31, 2025 06:34
@radoering radoering merged commit e7b52af into python-poetry:main Dec 31, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant