
Conversation

@radoering (Member) commented Dec 31, 2025

Pull Request Check List

Actually, this is just an internal change that makes it easier to get the URL, size, and upload time of a wheel/sdist. However, the change requires a bump of the cache version.

Requires: python-poetry/poetry-core#905
Related-to: python-poetry/poetry-plugin-export#336
Related-to: #10356
Related-to: #10646

  • Added tests for changed code.
  • Updated documentation for changed code.

Summary by Sourcery

Populate package file metadata with URLs, sizes, and upload times and ensure this additional information is not persisted into lock files, updating cache expectations accordingly.

New Features:

  • Include file URLs, sizes, and upload times in package file metadata when resolving from HTTP, PyPI JSON, and legacy repositories, as well as for direct file and URL origins.

Enhancements:

  • Adjust link and repository handling to propagate size and upload-time data into package links and file entries.
  • Ensure the locker strips non-essential file metadata (such as URLs) when writing lock files, preserving only file names and hashes.
  • Expand and update tests and fixtures across repositories, providers, and direct origins to validate the new file metadata behaviour and cache version bump.

Tests:

  • Add and update tests for legacy and PyPI repositories, provider search, direct origins, and locker behaviour to cover the enriched package file metadata and lockfile output.

Chores:

  • Bump the cached repository cache version to reflect the new package file shape and temporarily depend on a feature branch of poetry-core exposing file metadata.

…a package index

This makes it easier to get the required information to write a pylock.toml file. However, we have to make sure that it does not slip into poetry.lock (because we want to avoid unnecessary changes to the format).
* url, size and upload-time of an artifact are now cached
* the JSON API is preferred over the HTML API (size and upload-time are only available via the JSON API)
@sourcery-ai bot commented Dec 31, 2025

Reviewer's Guide

Extends Package.files metadata to include URL, size, and upload time from HTTP/JSON/legacy/PyPI sources and direct file origins while ensuring lock files still only store file and hash, with supporting fixtures/tests and a cache version bump.

Sequence diagram for extended file metadata from PyPI to lock file

sequenceDiagram
    actor User
    participant Installer
    participant Provider
    participant PyPiRepository
    participant JsonLinkSource
    participant HttpRepository
    participant Package
    participant Locker

    User->>Installer: install package
    Installer->>Provider: resolve dependency
    Provider->>PyPiRepository: search package

    PyPiRepository->>JsonLinkSource: request links
    JsonLinkSource->>JsonLinkSource: parse JSON files
    JsonLinkSource->>JsonLinkSource: create Link(size, upload_time)
    JsonLinkSource-->>PyPiRepository: Link objects

    PyPiRepository->>HttpRepository: get release info
    HttpRepository->>HttpRepository: _links_to_data(links, data)
    HttpRepository->>Package: set files[{file, hash, url, size, upload_time}]
    PyPiRepository-->>Provider: Package with rich files

    Provider-->>Installer: resolved Package
    Installer->>Locker: lock project with Package
    Locker->>Locker: _dump_package(package, target, env)
    Locker->>Locker: sort and strip files to {file, hash}
    Locker-->>Installer: lock data with minimal file metadata

Updated class diagram for package file metadata handling

classDiagram
    class Package {
        +list~dict~ files
    }

    class HttpRepository {
        +_links_to_data(links, data) dict
    }

    class PyPiRepository {
        +_get_release_info(name, version) PackageInfo
    }

    class JsonLinkSource {
        +_link_cache() LinkCache
    }

    class DirectOrigin {
        +get_package_from_file(file_path) Package
        +get_package_from_url(url) Package
    }

    class Provider {
        +_search_for_file(dependency) Package
    }

    class Locker {
        +_dump_package(package, target, env) dict
    }

    class CachedRepository {
        +CACHE_VERSION Constraint
    }

    class Link {
        +filename str
        +url_without_fragment str
        +size int
        +upload_time_isoformat str
    }

    class PackageInfo {
        +files list~dict~
    }

    HttpRepository --> PackageInfo : populates_files
    HttpRepository --> Link : consumes

    PyPiRepository --> PackageInfo : populates_files

    JsonLinkSource --> Link : creates_with_size_upload_time

    DirectOrigin --> Package : sets_files_with_size

    Provider --> Package : no_direct_files_assignment

    Locker --> Package : reads_files
    Locker ..> Package : stores_only_file_and_hash

    CachedRepository ..> PyPiRepository : used_for_caching

Flow diagram for package file metadata through repositories and locker

flowchart LR
    A[PyPI JSON API] --> B[JsonLinkSource]
    B -->|creates Link with size and upload_time| C[HttpRepository]
    C -->|_links_to_data adds url, size, upload_time| D[Package.files]

    A2[Legacy/PyPI file listing] --> C

    E[DirectOrigin.get_package_from_file] -->|computes hash and size| D

    D -->|read files| F[Locker._dump_package]
    F -->|keep only file and hash| G[poetry.lock]

    subgraph CRepo[CachedRepository]
        C
    end

File-Level Changes

Change: Populate Package.files with URL, size and upload-time metadata from repositories and JSON link sources (a short sketch follows the file list).
Details:
  • In HttpRepository._links_to_data, include link.url_without_fragment as url and propagate link.size and link.upload_time_isoformat into the files entries.
  • In PyPiRepository._get_release_info, add url plus optional size and upload_time_iso_8601 fields from the JSON file_info when building data.files.
  • In the JSON link source's _link_cache, read size and upload time from file entries and pass them into Link construction so downstream code can access them.
Files:
  • src/poetry/repositories/http_repository.py
  • src/poetry/repositories/pypi_repository.py
  • src/poetry/repositories/link_sources/json.py
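
As a rough illustration of the JSON-backed path, a minimal sketch of building such entries; the helper itself is hypothetical, while the record keys (filename, digests, size, upload_time_iso_8601) follow the response shape of PyPI's JSON API:

```python
from typing import Any


def files_from_pypi_json(urls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build Package.files-style entries from a PyPI JSON release's file records."""
    files = []
    for file_info in urls:
        entry: dict[str, Any] = {
            "file": file_info["filename"],
            "hash": f"sha256:{file_info['digests']['sha256']}",
            "url": file_info["url"],
        }
        # size and upload time may be absent on non-PyPI indexes, so keep them optional
        if (size := file_info.get("size")) is not None:
            entry["size"] = size
        if (upload_time := file_info.get("upload_time_iso_8601")) is not None:
            entry["upload_time"] = upload_time
        files.append(entry)
    return files
```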

Change: Capture file metadata for direct/file dependencies and stop computing it in the provider (a short sketch follows the file list).
Details:
  • In DirectOrigin.get_package_from_file, set package.files with the file name, sha256 hash, and filesystem size of the artifact.
  • Remove the Package.files assignment (and get_file_hash usage) from Provider._search_for_file, delegating file metadata population to DirectOrigin.
Files:
  • src/poetry/packages/direct_origin.py
  • src/poetry/puzzle/provider.py
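
A minimal sketch of the metadata a direct origin can derive for a local artifact, using only the standard library; the function name is illustrative, not Poetry's actual helper:

```python
import hashlib
from pathlib import Path
from typing import Any


def file_metadata(file_path: Path) -> dict[str, Any]:
    """Name, sha256 hash, and on-disk size for a local wheel or sdist."""
    sha256 = hashlib.sha256(file_path.read_bytes()).hexdigest()
    return {
        "file": file_path.name,
        "hash": f"sha256:{sha256}",
        "size": file_path.stat().st_size,  # size in bytes, matching the lock metadata
    }
```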

Change: Ensure the locker only persists file name and hash, not URL or size, and test this behavior (a short sketch follows the file list).
Details:
  • Change Locker._dump_package to normalize package.files to dicts containing only file and hash before sorting and dumping.
  • Rename and extend tests in tests/packages/test_locker.py to cover null descriptions and to assert that file URLs are omitted from the lock file.
Files:
  • src/poetry/packages/locker.py
  • tests/packages/test_locker.py
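
A minimal sketch of the stripping step, assuming entries shaped like the ones above; the function is illustrative rather than Locker's actual code:

```python
from typing import Any


def dump_files(files: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only file and hash so url/size/upload_time never reach poetry.lock."""
    return sorted(
        ({"file": f["file"], "hash": f["hash"]} for f in files),
        key=lambda f: f["file"],
    )
```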

Change: Update PyPI and legacy repository tests/fixtures to validate the new file metadata (url, size, upload time).
Details:
  • Introduce TestLegacyRepository and switch the legacy_repository fixtures to it, with a json flag to distinguish HTML from JSON fixtures.
  • Add helper fixtures get_legacy_dist_url and get_legacy_dist_size_and_upload_time to extract URL, size and upload time from legacy HTML/JSON fixtures, and use them in the legacy repository tests' expectations.
  • Add a get_pypi_file_info fixture for PyPI JSON fixtures and extend the pypi_repository tests to assert url, size and upload_time in package.files.
Files:
  • tests/repositories/fixtures/legacy.py
  • tests/repositories/test_legacy_repository.py
  • tests/repositories/fixtures/pypi.py
  • tests/repositories/test_pypi_repository.py

Change: Extend provider and direct-origin tests to assert populated Package.files metadata for file-based dependencies.
Details:
  • Update puzzle provider tests for file-based sdists/wheels to assert Package.files entries including size.
  • Update direct-origin tests to assert Package.files entries with file, hash and size for both local-file and URL-based origins.
Files:
  • tests/puzzle/test_provider.py
  • tests/packages/test_direct_origin.py

Change: Adjust chooser tests and cache configuration to align with the new behavior and dependencies (a short sketch of the cache check follows the file list).
Details:
  • Normalize hash formatting in installation chooser tests without changing semantics.
  • Bump CachedRepository.CACHE_VERSION to 2.1.0 to invalidate old cache data that lacks the new metadata.
  • Point the pyproject.toml poetry-core dependency to the feature branch that provides URL/size/upload-time support.
Files:
  • tests/installation/test_chooser.py
  • src/poetry/repositories/cached_repository.py
  • pyproject.toml
  • poetry.lock
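
To illustrate why the bump matters, a minimal sketch of a version-gated cache read; the constant value follows the change above, but the key names and function are hypothetical, not Poetry's actual cache code:

```python
from __future__ import annotations

CACHE_VERSION = "2.1.0"  # the bumped value; the gating logic below is illustrative


def load_cached_release_info(cache: dict[str, dict], key: str) -> dict | None:
    """Treat entries written under an older cache version as cache misses."""
    entry = cache.get(key)
    if entry is None or entry.get("_cache_version") != CACHE_VERSION:
        return None  # stale entries lack url/size/upload_time, so the caller re-resolves
    return entry
```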


@sourcery-ai bot left a comment

Hey - I've found 3 issues and left some high-level feedback:

  • The temporary dependency override in pyproject.toml pointing to the radoering/poetry-core git branch should be reverted back to the canonical python-poetry/poetry-core source (and version-pinned appropriately) before merging.
  • The new legacy fixtures get_legacy_dist_url and get_legacy_dist_size_and_upload_time have slightly misleading error messages (e.g. referencing .html in the JSON helper) and repeated path-discovery logic; consider aligning the messages with the actual file types and extracting the common package/version resolution into a shared helper to reduce duplication.
## Individual Comments

### Comment 1
<location> `src/poetry/repositories/http_repository.py:381-384` </location>
<code_context>
+                        "url": link.url_without_fragment,
+                    }
+                )
+            if link.size is not None:
+                files[-1]["size"] = link.size
+            if link.upload_time_isoformat is not None:
+                files[-1]["upload_time"] = link.upload_time_isoformat

         if not files:
</code_context>

<issue_to_address>
**issue:** Accessing `files[-1]` can fail when `file_hash` is `None` and nothing is appended.

When `file_hash is None`, we only log and don’t append to `files`, but the subsequent size/upload_time blocks still run and access `files[-1]`. If the first (or any) link has no hash, `files` may be empty and this will raise `IndexError`. Either `continue` when `file_hash is None`, or only set size/upload_time when `files` is non-empty (e.g., by moving those assignments into the branch where the append occurs).
</issue_to_address>
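
For reference, a minimal self-contained sketch of the guard this comment suggests, building each entry before appending it; get_file_hash is a hypothetical stand-in for the repository's hash lookup:

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


def links_to_files(
    links: list[Any], get_file_hash: Callable[[Any], str | None]
) -> list[dict[str, Any]]:
    """Build files entries, skipping links without a hash (sketch of the fix)."""
    files: list[dict[str, Any]] = []
    for link in links:
        file_hash = get_file_hash(link)
        if file_hash is None:
            continue  # nothing was appended, so the metadata code below cannot IndexError
        entry: dict[str, Any] = {
            "file": link.filename,
            "hash": file_hash,
            "url": link.url_without_fragment,
        }
        if link.size is not None:
            entry["size"] = link.size
        if link.upload_time_isoformat is not None:
            entry["upload_time"] = link.upload_time_isoformat
        files.append(entry)
    return files
```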

### Comment 2
<location> `tests/repositories/fixtures/pypi.py:166-175` </location>
<code_context>
+def get_pypi_file_info(
</code_context>

<issue_to_address>
**suggestion:** Make get_pypi_file_info filename parsing robust to project names containing dashes

This logic will break for wheel filenames where the project name contains dashes (e.g. `my-package-1.0.0-py3-none-any.whl` yields `package_name == "my"`, `version == "package"`). That could make tests subtly wrong once such fixtures are added.

To make this helper robust, please use a safer parsing approach (e.g. `packaging.utils.parse_wheel_filename` / `packaging.tags`, or a more conservative `rsplit`-based pattern) so it continues to work for project names with dashes.

Suggested implementation:

```python
from collections.abc import Callable
from pathlib import Path

from packaging.utils import parse_wheel_filename
from requests import PreparedRequest

```

```python
@pytest.fixture
def get_pypi_file_info(
    package_json_locations: list[Path],
) -> Callable[[str], dict[str, Any]]:
    def get_file_info(name: str) -> dict[str, Any]:
        if name.endswith(".whl"):
            distribution, version, _, _ = parse_wheel_filename(name)
            package_name = distribution
        else:
            package_name, version = name.removesuffix(".tar.gz").rsplit("-", 1)
        path = package_json_locations[0] / package_name
        if not path.exists():
            raise RuntimeError(
                # illustrative completion; the suggestion was truncated at this point
                f"Fixture for {package_name} not found in PyPI JSON fixtures"
            )
        ...  # remainder of the fixture body unchanged

```
</issue_to_address>

### Comment 3
<location> `tests/repositories/fixtures/legacy.py:267-270` </location>
<code_context>
+def get_legacy_dist_size_and_upload_time(
+    legacy_package_json_locations: list[Path],
+) -> Callable[[str], tuple[int | None, str | None]]:
+    def get_size_and_upload_time(name: str) -> tuple[int | None, str | None]:
+        package_name = name.split("-", 1)[0]
+        path = Path()
+        for location in legacy_package_json_locations:
+            path = location / f"{package_name}.json"
+            if path.exists():
+                break
+        if not path.exists():
+            raise RuntimeError(
+                f"Fixture for {package_name}.json not found in legacy fixtures"
+            )
</code_context>

<issue_to_address>
**nitpick (typo):** Fix misleading error message in legacy JSON fixture helper

The final `RuntimeError` still refers to an HTML URL, but this helper operates on `{package_name}.json` fixtures. Please update the message to mention the JSON fixture (`{package_name}.json`) and the missing file, rather than a URL, so test failures are clearer.

```suggestion
        if not path.exists():
            searched_paths = ", ".join(str(location) for location in legacy_package_json_locations)
            raise RuntimeError(
                f"Legacy JSON fixture file '{package_name}.json' not found in any of: {searched_paths}"
            )
```
</issue_to_address>

