fill url, size and upload-time in Package.files, bump cache version #10677
Conversation
…a package index

This makes it easier to get the required information to write a pylock.toml file. However, we have to make sure that it does not slip into poetry.lock (because we want to avoid unnecessary changes to the format).

* url, size and upload-time of an artifact are now cached
* the JSON API is preferred to the HTML API (size and upload-time are only available via the JSON API)
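As a rough illustration of the data flow described above (the simplified API record and all values below are made up, and the exact field mapping in the PR may differ), the extra metadata comes from per-file index records and ends up in `Package.files`, while `poetry.lock` keeps only `file` and `hash`:

```python
# Simplified, hypothetical per-file record as returned by an index JSON API
# (PyPI's JSON responses expose a URL, a size and an ISO-8601 upload time per file).
api_file = {
    "filename": "demo-1.0.0-py3-none-any.whl",
    "url": "https://files.example.org/demo-1.0.0-py3-none-any.whl",
    "size": 12345,
    "upload-time": "2025-01-01T00:00:00Z",
    "hashes": {"sha256": "abc123"},
}

# Shape of the enriched Package.files entry described in this PR (illustrative values):
file_entry = {
    "file": api_file["filename"],
    "hash": "sha256:" + api_file["hashes"]["sha256"],
    "url": api_file["url"],
    "size": api_file["size"],                # omitted when the index does not report it
    "upload_time": api_file["upload-time"],  # only available via the JSON API
}

# poetry.lock still stores only these two keys, so the lock format stays unchanged:
lock_entry = {k: file_entry[k] for k in ("file", "hash")}
print(lock_entry)
```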
Reviewer's Guide

Extends Package.files metadata to include URL, size, and upload time from HTTP/JSON/legacy/PyPI sources and direct file origins while ensuring lock files still only store file and hash, with supporting fixtures/tests and a cache version bump.

Sequence diagram for extended file metadata from PyPI to lock file:

sequenceDiagram
actor User
participant Installer
participant Provider
participant PyPiRepository
participant JsonLinkSource
participant HttpRepository
participant Package
participant Locker
User->>Installer: install package
Installer->>Provider: resolve dependency
Provider->>PyPiRepository: search package
PyPiRepository->>JsonLinkSource: request links
JsonLinkSource->>JsonLinkSource: parse JSON files
JsonLinkSource->>JsonLinkSource: create Link(size, upload_time)
JsonLinkSource-->>PyPiRepository: Link objects
PyPiRepository->>HttpRepository: get release info
HttpRepository->>HttpRepository: _links_to_data(links, data)
HttpRepository->>Package: set files[{file, hash, url, size, upload_time}]
PyPiRepository-->>Provider: Package with rich files
Provider-->>Installer: resolved Package
Installer->>Locker: lock project with Package
Locker->>Locker: _dump_package(package, target, env)
Locker->>Locker: sort and strip files to {file, hash}
Locker-->>Installer: lock data with minimal file metadata
Updated class diagram for package file metadata handling:

classDiagram
class Package {
+list~dict~ files
}
class HttpRepository {
+_links_to_data(links, data) dict
}
class PyPiRepository {
+_get_release_info(name, version) PackageInfo
}
class JsonLinkSource {
+_link_cache() LinkCache
}
class DirectOrigin {
+get_package_from_file(file_path) Package
+get_package_from_url(url) Package
}
class Provider {
+_search_for_file(dependency) Package
}
class Locker {
+_dump_package(package, target, env) dict
}
class CachedRepository {
+CACHE_VERSION Constraint
}
class Link {
+filename str
+url_without_fragment str
+size int
+upload_time_isoformat str
}
class PackageInfo {
+files list~dict~
}
HttpRepository --> PackageInfo : populates_files
HttpRepository --> Link : consumes
PyPiRepository --> PackageInfo : populates_files
JsonLinkSource --> Link : creates_with_size_upload_time
DirectOrigin --> Package : sets_files_with_size
Provider --> Package : no_direct_files_assignment
Locker --> Package : reads_files
Locker ..> Package : stores_only_file_and_hash
CachedRepository ..> PyPiRepository : used_for_caching
Flow diagram for package file metadata through repositories and locker:

flowchart LR
A[PyPI JSON API] --> B[JsonLinkSource]
B -->|creates Link with size and upload_time| C[HttpRepository]
C -->|_links_to_data adds url, size, upload_time| D[Package.files]
A2[Legacy/PyPI file listing] --> C
E[DirectOrigin.get_package_from_file] -->|computes hash and size| D
D -->|read files| F[Locker._dump_package]
F -->|keep only file and hash| G[poetry.lock]
subgraph CRepo[CachedRepository]
C
end
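The final "keep only file and hash" step can be pictured with a minimal sketch; this is not the actual `Locker._dump_package` code, and the sort key is an assumption:

```python
from typing import Any


def strip_files_for_lock(files: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Keep only the keys that poetry.lock has always stored; url, size and
    # upload_time stay in memory on Package.files but never reach the lock file.
    return [
        {"file": f["file"], "hash": f["hash"]}
        for f in sorted(files, key=lambda f: f["file"])
    ]
```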
File-Level Changes
Hey - I've found 3 issues, and left some high level feedback:
- The temporary dependency override in pyproject.toml pointing to the radoering/poetry-core git branch should be reverted back to the canonical python-poetry/poetry-core source (and version-pinned appropriately) before merging.
- The new legacy fixtures `get_legacy_dist_url` and `get_legacy_dist_size_and_upload_time` have slightly misleading error messages (e.g. referencing `.html` in the JSON helper) and repeated path-discovery logic; consider aligning the messages with the actual file types and extracting the common package/version resolution into a shared helper to reduce duplication (a possible shape is sketched below).
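A possible shape for such a shared helper; the name, signature, and message below are illustrative, not code from this PR:

```python
from pathlib import Path


def find_fixture_file(package_name: str, locations: list[Path], suffix: str) -> Path:
    """Return the first existing fixture file for the package, or raise."""
    for location in locations:
        candidate = location / f"{package_name}{suffix}"
        if candidate.exists():
            return candidate
    searched = ", ".join(str(location) for location in locations)
    raise RuntimeError(f"Fixture {package_name}{suffix} not found in: {searched}")
```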
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The temporary dependency override in pyproject.toml pointing to the radoering/poetry-core git branch should be reverted back to the canonical python-poetry/poetry-core source (and version-pinned appropriately) before merging.
- The new legacy fixtures `get_legacy_dist_url` and `get_legacy_dist_size_and_upload_time` have slightly misleading error messages (e.g. referencing `.html` in the JSON helper) and repeated path-discovery logic; consider aligning the messages with the actual file types and extracting the common package/version resolution into a shared helper to reduce duplication.
## Individual Comments
### Comment 1
<location> `src/poetry/repositories/http_repository.py:381-384` </location>
<code_context>
+ "url": link.url_without_fragment,
+ }
+ )
+ if link.size is not None:
+ files[-1]["size"] = link.size
+ if link.upload_time_isoformat is not None:
+ files[-1]["upload_time"] = link.upload_time_isoformat
if not files:
</code_context>
<issue_to_address>
**issue:** Accessing `files[-1]` can fail when `file_hash` is `None` and nothing is appended.
When `file_hash is None`, we only log and don’t append to `files`, but the subsequent size/upload_time blocks still run and access `files[-1]`. If the first (or any) link has no hash, `files` may be empty and this will raise `IndexError`. Either `continue` when `file_hash is None`, or only set size/upload_time when `files` is non-empty (e.g., by moving those assignments into the branch where the append occurs).
</issue_to_address>
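A minimal self-contained sketch of the `continue`-based fix; everything outside the quoted hunk (the loop scaffolding, the logging, and the hash lookup) is assumed here rather than taken from the PR:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def links_to_files(links: list[Any], hashes: dict[str, str | None]) -> list[dict[str, Any]]:
    """Build file entries, skipping links without a known hash."""
    files: list[dict[str, Any]] = []
    for link in links:
        file_hash = hashes.get(link.filename)
        if file_hash is None:
            logger.debug("No hash found for %s, skipping", link.filename)
            continue  # nothing was appended, so files[-1] below always exists
        files.append(
            {
                "file": link.filename,
                "hash": file_hash,
                "url": link.url_without_fragment,
            }
        )
        if link.size is not None:
            files[-1]["size"] = link.size
        if link.upload_time_isoformat is not None:
            files[-1]["upload_time"] = link.upload_time_isoformat
    return files
```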
### Comment 2
<location> `tests/repositories/fixtures/pypi.py:166-175` </location>
<code_context>
+def get_pypi_file_info(
</code_context>
<issue_to_address>
**suggestion:** Make get_pypi_file_info filename parsing robust to project names containing dashes
This logic will break for wheel filenames where the project name contains dashes (e.g. `my-package-1.0.0-py3-none-any.whl` yields `package_name == "my"`, `version == "package"`). That could make tests subtly wrong once such fixtures are added.
To make this helper robust, please use a safer parsing approach (e.g. `packaging.utils.parse_wheel_filename` / `packaging.tags`, or a more conservative `rsplit`-based pattern) so it continues to work for project names with dashes.
Suggested implementation:
```python
from collections.abc import Callable
from pathlib import Path
from packaging.utils import parse_wheel_filename
from requests import PreparedRequest
```
```python
@pytest.fixture
def get_pypi_file_info(
    package_json_locations: list[Path],
) -> Callable[[str], dict[str, Any]]:
    def get_file_info(name: str) -> dict[str, Any]:
        if name.endswith(".whl"):
            distribution, version, _, _ = parse_wheel_filename(name)
            package_name = distribution
        else:
            package_name, version = name.removesuffix(".tar.gz").rsplit("-", 1)
        path = package_json_locations[0] / package_name
        if not path.exists():
            raise RuntimeError(
```
</issue_to_address>
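For illustration, `packaging.utils` already provides strict parsers for both filename flavors; the filenames below are hypothetical:

```python
from packaging.utils import parse_sdist_filename, parse_wheel_filename

# Naive splitting breaks as soon as the project name itself contains a dash:
print("my-package-1.0.0.tar.gz".split("-", 1)[0])  # -> "my" (wrong project name)

# Wheel filenames escape dashes in the project name to underscores, and the strict
# parser returns the normalized name plus the parsed version:
name, version, _build, _tags = parse_wheel_filename("my_package-1.0.0-py3-none-any.whl")
print(name, version)  # -> my-package 1.0.0

# The sdist counterpart covers .tar.gz fixtures:
name, version = parse_sdist_filename("my_package-1.0.0.tar.gz")
print(name, version)  # -> my-package 1.0.0
```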
### Comment 3
<location> `tests/repositories/fixtures/legacy.py:267-270` </location>
<code_context>
+def get_legacy_dist_size_and_upload_time(
+    legacy_package_json_locations: list[Path],
+) -> Callable[[str], tuple[int | None, str | None]]:
+    def get_size_and_upload_time(name: str) -> tuple[int | None, str | None]:
        package_name = name.split("-", 1)[0]
        path = Path()
        for location in legacy_package_json_locations:
            path = location / f"{package_name}.json"
            if path.exists():
                break
        if not path.exists():
            raise RuntimeError(
                f"Fixture for {package_name}.json not found in legacy fixtures"
            )
</code_context>
<issue_to_address>
**nitpick (typo):** Fix misleading error message in legacy JSON fixture helper
The final `RuntimeError` still refers to an HTML URL, but this helper operates on `{package_name}.json` fixtures. Please update the message to mention the JSON fixture (`{package_name}.json`) and the missing file, rather than a URL, so test failures are clearer.
```suggestion
        if not path.exists():
            searched_paths = ", ".join(str(location) for location in legacy_package_json_locations)
            raise RuntimeError(
                f"Legacy JSON fixture file '{package_name}.json' not found in any of: {searched_paths}"
            )
```
</issue_to_address>
Pull Request Check List
Actually, this is just an internal change that makes it easier to get the url, size and upload-time of a wheel/sdist. However, the change requires a bump of the cache version.
Requires: python-poetry/poetry-core#905
Related-to: python-poetry/poetry-plugin-export#336
Related-to: #10356
Related-to: #10646
Summary by Sourcery
Populate package file metadata with URLs, sizes, and upload times and ensure this additional information is not persisted into lock files, updating cache expectations accordingly.
New Features:
Enhancements:
Tests:
Chores: