Automatic PyPI package rating and removal #16923
Comments
See #16034 for some previous discussion around this topic. As a rough summary: "dead" packages are not something that PyPI currently has much of an opinion around -- others have proposed a more curatorial approach to the index, but that approach requires a significant amount of administrative/maintenance overhead. That overhead would draw time from other ongoing maintenance and development efforts. I think rankings of package quality are an interesting idea, but IMO PyPI shouldn't be in the business of issuing those kinds of value judgements. This is especially true when such metrics include things like package age, since a significant "draw" of the Python packaging ecosystem is that old packages typically still work and don't need to be updated with every CPython release. In other words: if we were to begin penalizing packages based on (perceived) inactivity, we'd likely end up penalizing some of the most important packages on the index, regardless of whether they're actually abandoned or not. (This isn't meant to imply that PyPI can't get rid of obviously dead packages. Only that an "objective" metric for identifying those packages may be difficult to obtain, outside of a handful of obvious cases!)
Hi @woodruffw, Thank you for your answer. I understand that such a metric is hard to design, which is why I stressed its open-source aspect. The age of a package that I have mentioned is an important aspect to consider alongside all of the others, not by itself: If a person has just published an empty package, it is not as much of a problem as a person who published an empty package 15 years ago and never looked back. None of the ranking rules I mentioned makes sense on its own, but taken all together I think they should have a reasonably low false positive rate. I don't advocate blindly deploying such a quality metric, but rather measuring its efficacy and then determining whether it is a good idea to deploy it. I see that in the link you provided, @di mentions that there is indeed already the PEP 541 process, but the year-long backlog (currently open issues go back to March 2023) is only increasing. Here is an example of a package name I was interested in, which seems rather dead, and yet my request has received no reply whatsoever, nor do I believe I will receive one, as there are hundreds of such requests before my own. Without some automation helping out, a year of backlog can start to grow out of control.
I agree, but it's my personal (non-maintainer!) opinion that such a metric, even if it's a low-FP one, doesn't belong on PyPI itself. It'd be good for such a metric to exist in a third-party context, but I don't think PyPI itself should be in the business of making value judgements around package quality. OTOH, I think there are two things that would reasonably be within PyPI's purview:
Just for context, the backlog is actually no longer increasing: it's been decreasing for a few weeks now that there's someone working on it full time: https://discuss.python.org/t/is-pep-541-still-the-correct-solution/27436/25. That trend should continue into the future. I understand that it's frustrating to wait a long time for a support request, but things are getting better on that front.
I am currently scraping (with an appropriate frequency) the PyPI metadata for the packages from the API. Hopefully, I should be able to create a public anonymised dataset and try out approaches, so as to determine the number of problematic packages. At the rate I am using, it will take at least 10 days or so to build a first version of the dataset. Hopefully the percentage of problematic packages is small, but nevertheless I believe such a study to be of some use, all open-sourced of course.
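For anyone who wants to try something similar, here is a rough sketch of what collecting such metadata might look like. It assumes the public PyPI JSON endpoint `https://pypi.org/pypi/<project>/json` and the `info`, `releases`, `description`, `project_urls` and `upload_time_iso_8601` fields it returns; the function names `fetch_metadata` and `summarize` and the choice of summary fields are hypothetical, not the scraper actually used for this dataset.

```python
import json
from urllib.request import urlopen


def fetch_metadata(project: str) -> dict:
    """Fetch one project's metadata from the PyPI JSON API (network call)."""
    with urlopen(f"https://pypi.org/pypi/{project}/json", timeout=10) as resp:
        return json.load(resp)


def summarize(payload: dict) -> dict:
    """Reduce a JSON API payload to a few traits, dropping author details.

    The selection of traits is an assumption made for illustration.
    """
    info = payload.get("info") or {}
    releases = payload.get("releases") or {}
    # Collect the upload timestamp of every file in every release.
    upload_times = [
        f["upload_time_iso_8601"]
        for files in releases.values()
        for f in files
        if "upload_time_iso_8601" in f
    ]
    return {
        "has_description": bool((info.get("description") or "").strip()),
        "has_project_urls": bool(info.get("project_urls")),
        "release_count": len(releases),
        # ISO 8601 UTC strings in the same format sort chronologically.
        "last_upload": max(upload_times, default=None),
    }
```

Calling `summarize(fetch_metadata("some-project"))` would give one row of the dataset; any real crawl should be rate-limited, as mentioned above.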
I recently searched for a package for a particular application and came across several packages that just contained a "skeleton", i.e. all required files and structures were there but no functionality was implemented at all. The "skeleton" had been uploaded with version 0.0.1, followed by no activity for several years. Could it be that some people register a package on PyPI in order to have control of that package name for possible future use? It does make PyPI less useful IMO if one has to browse through a bunch of empty packages before actually finding something that at least partly works. And if the most natural names for some package are taken, then someone actually implementing a working package would have to pick a name that users are less likely to find. A possibility to "downvote" or something similar in order to eventually purge packages is perhaps necessary?
Yes, this is a pretty common user behavior, with both benign and malicious intent: the benign case is one where people will put a stub package up because they intend to upload a "real" one soon, and the malicious case is namespace squatting. Accounts that engage in pervasive squatting are typically banned once detected or reported; examples of this can be seen on PyPI's support issue tracker.
I think this would be very difficult to do in a way that's simultaneously (1) sufficiently general, (2) not itself subject to abuse (e.g. up/downvote smurfing), and (3) doesn't induce negative incentives in the ecosystem (e.g. discouraging new packages for fear of being downvoted/overshadowed). This goes back to the point in #16923 (comment), which is that a metric here might be useful, but (IMO) shouldn't live on PyPI itself. There's also some larger discussion about reputation in #991, which elaborates on the challenges to building such a system without significant downsides.
So where do I report an empty package that's had no activity since creation 5 years ago?
You can report it on https://github.com/pypi/support. But keep in mind that PyPI doesn't always inherently consider empty packages a problem; the support process is primarily for dealing with large-scale squatting or if you want to perform a PEP 541 takeover of an abandoned project name.
What's the problem this feature will solve?
At this time, there are lots of dead packages hosted on PyPI.
These packages are characterized by no link to the source code, no README, and sometimes a single, almost empty release made long ago by a user who has never logged in since. This impacts name availability, for which there seems to be a rather large backlog of requests at the time of writing, and generally makes PyPI suffer from package rot (more and more of the results I get when I search for a package name are just dead things).
Describe the solution you'd like
This may be déformation professionnelle, but I believe it should be possible to create a ranking of sorts for package quality, based on an open-source algorithm that people may contribute to. This algorithm would receive a package directory as input alongside its structure metadata and output a score depending on how many desirable traits it has (or lacks). Packages that fall below a certain percentile of the score distribution would get flagged, and an email would be sent to their owner advising them that the package has entered a grace period after which, if they take no action, it will be deleted or its ownership transferred to another user who may take over its development.
Examples of such rules
Examples of rules, which I stress I believe should be defined as an open source library and be gradually improved upon by the community, could be:
I believe such simple rules could already help eliminate a large amount of dead packages.
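To make the idea concrete, here is a minimal sketch of how such a score and the percentile-based flagging could fit together. Everything in it is an assumption for illustration: the `PackageTraits` fields, the integer weights, the 5-year inactivity threshold and the `flag_lowest` helper are hypothetical, and the traits are simply the ones named in the problem description (README, source link, number of releases, time since the last release). Age only removes points here; it never flags a package by itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PackageTraits:
    """Hypothetical per-package metadata; field names are illustrative."""
    name: str
    has_readme: bool
    has_source_link: bool
    release_count: int
    last_release: datetime


def score(pkg: PackageTraits, now: datetime) -> int:
    """Return a score from 0 to 10; higher means more desirable traits.

    The weights are arbitrary placeholders meant to be tuned by contributors.
    """
    years_idle = (now - pkg.last_release).days / 365.25
    points = 0
    points += 3 if pkg.has_readme else 0
    points += 3 if pkg.has_source_link else 0
    points += 2 if pkg.release_count > 1 else 0
    points += 2 if years_idle < 5 else 0  # assumed inactivity threshold
    return points


def flag_lowest(packages: list[PackageTraits], now: datetime,
                percentile: float = 5.0) -> list[str]:
    """Return the names of the lowest-scoring `percentile` percent of packages."""
    ranked = sorted(packages, key=lambda p: score(p, now))
    k = int(len(ranked) * percentile / 100)
    return [p.name for p in ranked[:k]]
```

With these weights, an old package that has a README, a source link and several releases still scores 8 out of 10, while an old single-release stub with neither scores 0. Whether thresholds like these give an acceptable false positive rate is exactly what would need to be measured before deploying anything.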
I would be most definitely open to contributing to such a project.
User Interface
This score may be integrated into the PyPI website itself, allowing users who are comparing packages to see their ratings.
pip warning
A pip install operation may warn the user installing a given package that it has a very low score and may get deleted.