Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PEP-658 #139

Merged
merged 24 commits into from
Dec 15, 2024
Merged

Support PEP-658 #139

merged 24 commits into from
Dec 15, 2024

Conversation

ryanking13
Copy link
Member

This PR adds PEP 658: Serve Distribution Metadata in the Simple Repository API support to micropip.

When downloading package using Simple API (which is not a default), micropip will now check for distribution metadata and download the metadata file simultaneously which (hopefully) speeds up the dependency resolution.

Copy link
Contributor

@Carreau Carreau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't like adding gz files to repositories as this is the way XZ backdoor worked.

@@ -165,7 +196,7 @@ async def _fetch_bytes(self, fetch_kwargs: dict[str, Any]):
raise e
else:
raise ValueError(
f"Can't fetch wheel from '{self.url}'. "
f"Can't fetch wheel from '{url}'. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Can't fetch wheel from '{url}'. "
f"Can't fetch wheel from {url!r}. "

@ryanking13
Copy link
Member Author

Personally I don't like adding gz files to repositories as this is the way XZ backdoor worked.

Yeah, we've removed gz files in this repository so I think I should remove it in this PR too.

@ryanking13
Copy link
Member Author

@hoodmane @agriyakhetarpal Can one of you review this when you have time? I would like to include this in the next release.

# we just log the error and continue, as the metadata can be fetched extracted from the wheel.
logger.debug("Metadata not available for %s", wheel.name)

await wheel_download_task
await self.gather_requirements(wheel.requires(extras))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If download_pep658_metadata() succeeds is it possible to exchange the order here and call gather_requirements() before awaiting wheel_download_task?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. I originally implemented in that way in #90... probably I did something wrong while cherry-picking changes.

Copy link
Member

@hoodmane hoodmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me implementation wise. Though it's not clear whether by itself it has any benefits yet since we block for downloading the wheel before resolving out the dependencies. So before we wanted one request to succeed before continuing and now if metadata is present we want two. If we can resolve the dependencies before the wheel arrives hopefully that's a performance win.

@mscolnick was asking about behavior where we would resolve and make a lock file without actually installing any wheels which would be a use case for this.

@ryanking13
Copy link
Member Author

If we can resolve the dependencies before the wheel arrives hopefully that's a performance win.

Yes, that's what I hope to achieve. How much it actually helps performance will depend on the network condition though.

@agriyakhetarpal
Copy link
Member

@hoodmane @agriyakhetarpal Can one of you review this when you have time? I would like to include this in the next release.

Thanks for the ping! Yes, I can help review it too. I am not acquainted with this PEP, so I'll need to read a bit about it.

Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read the PEP now; these changes mostly look good to me. Thank you, @ryanking13! I hope this will give some beneficial resolution speedups before the full wheel downloads kick in:

However, I think this PEP should be accompanied by PEP 714 as well, which fixed a few bugs in PEP 658's implementation because PyPI and pip both had bugs that caused the metadata not to be parsed correctly. While I am confident that we haven't introduced that bug here, we should remain closer to PyPI regarding the implementation, so we should account for that change here. The key changes are as follows (please let me know if I missed something):

  • for the HTML API, rename to data-core-metadata, and for the JSON API, rename to core-metadata, and
  • fall back to the currently proposed data-dist-info-metadata if they don't exist

Details: what I understand from that PEP is that PyPI kept the current/broken keys but also added these new keys for backward compatibility, from this line in the PEP:

To support clients that used the previous key names, the HTML representation MAY also be emitted using the data-dist-info-metadata, and if it does so it MUST match the value of data-core-metadata.

And for the changes to add to micropip (client-side):

Clients consuming the JSON representation of the Simple API MUST read the PEP 658 metadata from the key core-metadata if it is present. They MAY optionally use the legacy dist-info-metadata key if it is present but core-metadata is not.

I don't intend to block the merge in case the release is urgently needed (apologies for the delay in the review); but I do feel that we should incorporate both PEPs in the same PR, since I don't think a lot of further changes would be required; just a few renames to the new APIs added, some fallbacks, and a few changes to the testing indices (which I am not familiar with the code for, though).

Here is pip's fix: pypa/pip#12078, and similarly for Warehouse: pypi/warehouse#13706


Also, one slightly related point for self.parsed_url.hostname: the Anaconda.org index does not seem to support this PEP on the server side, which is an aspect that we should put into some consideration when we switch to using https://anaconda.org/pyodide/simple in pyodide/pyodide-build#43 (while that is still okay because it's just for the build-time dependencies) and if we start using it for https://github.com/pyodide/pyodide-recipes later on to host most (if not all) wheels post pyodide/pyodide#4918:

pip from PyPI:

Tap to expand

$ pip install cython --dry-run --no-cache
Collecting cython
  Downloading Cython-3.0.11-py2.py3-none-any.whl.metadata (3.2 kB)
Downloading Cython-3.0.11-py2.py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 37.1 MB/s eta 0:00:00
Would install Cython-3.0.11

However, when downloading from Anaconda.org:

Tap to expand

$ pip install -i https://pypi.anaconda.org/pyodide/simple cython --dry-run --no-cache
Looking in indexes: https://pypi.anaconda.org/pyodide/simple
Collecting cython
  Downloading https://pypi.anaconda.org/pyodide/simple/cython/3.0.11/Cython-3.0.11-py2.py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 1.8 MB/s eta 0:00:00
Would install Cython-3.0.11 


Unrelated point, but similar to @Carreau's suggestion above: we should remove the existing wheel files from tests/test_data/wheel/, since they will only bloat the repository as we proceed to add more commits to the default branch. They can be stored elsewhere (maybe in a GitHub release in a new repository and downloaded into the directory using pooch before running the test suite).

micropip/wheelinfo.py Outdated Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member

Also, the current tests/test_data/pypi_response/pytest_simple.html does contain data-core-metadata and not data-dist-info-metadata, so we should indeed account for the change in micropip/externals/mousebender/simple.py.

@ryanking13
Copy link
Member Author

ryanking13 commented Dec 12, 2024

However, I think this PEP should be accompanied by PEP 714 as well, which fixed a few bugs in PEP 658's implementation because PyPI and pip both had bugs that caused the metadata not to be parsed correctly.

Oh, that's a good point indeed. I think updating the mousebender would work, so let me update the PR.

@ryanking13
Copy link
Member Author

Unrelated point, but similar to @Carreau's suggestion above: we should remove the existing wheel files from tests/test_data/wheel/, since they will only bloat the repository as we proceed to add more commits to the default branch. They can be stored elsewhere

Yes, maybe let's discuss it in a separate issue. I don't like to put bunch of big files in the repository for testing, but I am also not sure whether putting it in a separate place is a good way to go.

ryanking13 and others added 2 commits December 12, 2024 04:37
Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
@ryanking13
Copy link
Member Author

Also, the current tests/test_data/pypi_response/pytest_simple.html does contain data-core-metadata and not data-dist-info-metadata, so we should indeed account for the change in micropip/externals/mousebender/simple.py.

I just checked the test file and it seems like it contains both data-core-metadata and data-dist-info-metadata.

@agriyakhetarpal
Copy link
Member

Also, the current tests/test_data/pypi_response/pytest_simple.html does contain data-core-metadata and not data-dist-info-metadata, so we should indeed account for the change in micropip/externals/mousebender/simple.py.

I just checked the test file and it seems like it contains both data-core-metadata and data-dist-info-metadata.

Oops, sorry! Yes, PyPI contains both the new key and the fallback. I was searching for data-distinfo-metadata, without the hyphen in dist-info – I stand corrected.

Unrelated point, but similar to @Carreau's suggestion above: we should remove the existing wheel files from tests/test_data/wheel/, since they will only bloat the repository as we proceed to add more commits to the default branch. They can be stored elsewhere

Yes, maybe let's discuss it in a separate issue. I don't like to put bunch of big files in the repository for testing, but I am also not sure whether putting it in a separate place is a good way to go.

Yes, opened a separate issue: #165

Please feel free to re-request a review once you are done here, thanks!

@ryanking13
Copy link
Member Author

@agriyakhetarpal Thanks, I think it is ready now.

Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great, @ryanking13!

micropip/package_index.py Show resolved Hide resolved
@ryanking13 ryanking13 merged commit 902c495 into pyodide:main Dec 15, 2024
6 checks passed
@ryanking13 ryanking13 deleted the pep-658-take2 branch December 15, 2024 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants