A new package index for Python
The Python Package Index (PyPI) is the principal repository of libraries for the Python programming language, serving more than 170 million downloads each week. Fifteen years after PyPI launched, a new edition is in beta at pypi.org, with features like better search, a refreshed layout, and Markdown README files (and with some old features removed, like viewing GPG package signatures). Starting April 16, users visiting the site or running pip install will be seamlessly redirected to the new site. Two weeks after that, the legacy site is expected to be shut down and the team will turn toward new features; in the meantime, it is worth a look at what the new PyPI brings to the table.
Growing needs and a restart
In the early 2000s, several Python developers wrote and ran their own tools cataloging and linking to available Python packages. In 2002, Richard Jones successfully proposed PEP 301 to create an official index meant to run on a single server and linking to Python packages hosted elsewhere. Jones, and Martin von Löwis who joined him as a core maintainer soon after, started, administered, and improved the site — before the advent of Django, Flask, Pyramid, and other Python web frameworks.
Jones, von Löwis, and (starting in the 2010s) Donald Stufft were volunteers — as with Wikipedia, "the Cheese Shop" (as it was named by Barry Warsaw) became popular before it got consistent upkeep from paid staffers, and demands on PyPI's infrastructure grew steadily. For packagers' convenience and to improve the experience of end users, the maintainers started allowing packagers to upload files onto a central server via ssh; the PyPI application assumed those files lived on its local filesystem.
In 2015, for security, performance, and user experience reasons, PyPI stopped indexing projects whose files were hosted externally. As PEP 470 ("Removing External Hosting Support on PyPI") stated, often "end users want to use PyPI as a repository but the author wants to use PyPI solely as an index". Meanwhile, PyPI's file-hosting needs grew to over a terabyte. Volunteer developers and system administrators battled outages, malicious packages, and spam attacks, while the age of the code base and its structure made it hard to maintain, and Sisyphean to develop new features. Generous infrastructure donations helped; for instance, Fastly's donation of Content Delivery Network (CDN) service in 2013 improved performance substantially.
A slow takeover of functionality
Stufft worked on packaging and distribution projects for several years. He did this mostly as a volunteer, though he now works for Amazon Web Services and spends two paid days per week on PyPI, pip, and related tools. He started a replacement PyPI effort, Crate, in 2011. A few years later, he changed tack and began work on what turned into Warehouse, which proved to be a solid foundation for PyPI 2.0. Warehouse is a web application using the Pyramid web framework, with 100% test coverage of its Python code, and a Docker-based development environment to make it easier to hack on locally.
Volunteer contributors, such as developer Dustin Ingram, joined the project. Designer and front-end developer Nicole Harris volunteered, assessing legacy PyPI, articulating design goals, and starting an overhaul of the user interface. Ernest W. Durbin III worked steadily in development and operations as a volunteer, improving the infrastructure behind Warehouse's pre-production installations, first at preview-pypi.python.org, then warehouse.python.org, then pypi.io, and, since late 2013, pypi.org.
Given its years of live testing, calling pypi.org a "beta" belies its longevity and durability. (Stufft's original migration plan predicted Warehouse would gradually come to "own" various database tables "as time goes on" but didn't predict it would take quite this long.) Warehouse always had read access to the canonical PyPI database; this was easier than creating a mirror database, and enforced discipline for Warehouse developers. Legacy PyPI allowed packagers to upload releases via command-line tools like setuptools or through an in-browser interface. However, its uploading routines increasingly failed to fully record new releases (causing HTTP 500 internal server errors), which led to a ~10% error rate by June 2016. At that point, Stufft advised Python packagers that it was a better experience to upload releases to the canonical PyPI database via Warehouse, using the command line tool Twine, than via pypi.python.org. Starting in July 2017, PyPI went so far as to disable uploading via the old site.
Throughout this period, Warehouse was labeled "pre-production" to acknowledge its missing features, layout changes, and occasional outages. Uploading (an API interaction) worked well, but the browser user interface still lacked significant features. Most notably, important features, such as email management, and significant project owner/maintainer administrative functionality, such as release deletion, were only available using the legacy site.
Fresh code and momentum
In early 2016, maintainers of Python packaging and distribution tools were eager to see Warehouse development speed up so that it could replace legacy PyPI. I started speaking with the Python Software Foundation (PSF) Packaging Working Group to discuss applying for Mozilla Open Source Support (MOSS) funding; an award proposal was submitted in 2017 that Mozilla accepted. MOSS-funded work started in December 2017; I serve as Warehouse's project manager. Harris, Durbin, Ingram, Laura Hampton, and I have improved PyPI's code base and infrastructure toward the goal of redirecting traffic to the new site and shutting down the old one.
The group has also nurtured new contributors. Jones and Stufft found that legacy PyPI could not attract a group of volunteer contributors to reduce the workload on the core maintainers, mainly because newcomers found it nearly impossible to understand, or even locally deploy, the code base. Warehouse's frameworks, docs, developer environment setup, and configuration are superior, making onboarding new developers and deploying their work far easier. Just between February 20 and March 20 this year, Warehouse merged 127 pull requests by 20 distinct authors; it continues to attract new contributors, some of whom are entirely new to open source.
Changes, new features, and deprecations
The most obvious improvement in Warehouse is the browser interface. The new site looks, as longtime Python users finding the site often notice, like a site from the current decade. The colors have changed, it's mobile-responsive, and the layout reflects what Harris has learned from user testing. The new front-end is more accessible to people with visual and motor-control disabilities (with more work to come). In the legacy code base, it was difficult to change the interface because content and presentation were mixed together. In contrast, Warehouse uses model/view/controller conventions, and uses front-end frameworks and tools: Jinja2 for templates, Sass with SCSS to handle CSS, Stimulus for JavaScript, and gulp to process and prepare front-end files for serving.
Beyond just the new interface is new functionality. Warehouse provides a chronological release history for each project (example), an easy-to-read project activity journal for project maintainers (see screen shot below), user-visible Gravatars and email addresses for maintainers on project pages, and support for multiple project URLs (e.g., for a homepage and a repo) on a project's PyPI page. Previously, to put a project description on PyPI, maintainers had to submit documents formatted in reStructuredText. Warehouse supports Markdown README files, thanks to improved metadata handling that required improvements to many parts of the Python packaging toolchain.
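For packagers, opting in to Markdown comes down to declaring the description's content type in the release metadata. The following is a minimal sketch, assuming a setuptools and Twine recent enough to understand the `long_description_content_type` keyword; the project name, version, and helper function are hypothetical.

```python
from pathlib import Path


def markdown_metadata(readme_path):
    """Return setup() keyword arguments that declare a Markdown description."""
    return {
        "name": "example-project",  # hypothetical project name
        "version": "0.1.0",         # hypothetical version
        "long_description": Path(readme_path).read_text(encoding="utf-8"),
        # Declares that the description is Markdown rather than
        # reStructuredText, so the index can render it accordingly.
        "long_description_content_type": "text/markdown",
    }


# In a real setup.py these arguments would be passed along as:
#     from setuptools import setup
#     setup(**markdown_metadata("README.md"))
```

Older toolchains that do not know the content-type field may ignore it, which is one reason the change touched many parts of the packaging toolchain.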
The original PyPI drew upon SourceForge and Freshmeat.net software categories to create a list of standard "Trove classifiers". Packagers label their releases with these classifiers to describe their target platforms, Python versions, intended audience, and frameworks, and to suggest the project's maturity status. Warehouse uses ElasticSearch for faceted search. This lets users perform intersection searches and filter the project list by multiple classifiers, making packagers' classifier choices more useful (see screen shot below). Project maintainers also no longer need to register a project with a separate command before initially uploading it to PyPI.
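An intersection search over classifiers can be illustrated in a few lines. This is only a sketch of the idea using an in-memory dictionary, not how Warehouse's ElasticSearch-backed search is implemented; the project names and the `filter_by_classifiers` helper are hypothetical, while the classifier strings are real Trove classifiers.

```python
def filter_by_classifiers(projects, wanted):
    """Return the names of projects that carry every classifier in `wanted`."""
    wanted = set(wanted)
    return sorted(
        name
        for name, classifiers in projects.items()
        if wanted <= set(classifiers)  # subset test: all wanted facets present
    )


# Hypothetical sample data.
PROJECTS = {
    "alpha": ["Programming Language :: Python :: 3", "Framework :: Pyramid"],
    "beta": ["Programming Language :: Python :: 3", "Framework :: Django"],
    "gamma": ["Programming Language :: Python :: 2", "Framework :: Pyramid"],
}

matches = filter_by_classifiers(
    PROJECTS,
    ["Programming Language :: Python :: 3", "Framework :: Pyramid"],
)
print(matches)
```

Each additional classifier narrows the result set, which is why accurate classifier choices by packagers make the filter more useful.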
Overall, Warehouse has newer back-end infrastructure than legacy PyPI did, supporting new features and a more scalable site. Instead of assuming that it lives on a single server, Warehouse assumes that its PostgreSQL database, file storage, search, queueing (Redis), and other parts may live in different containers or on different machines. Durbin added configuration management and instrumented Warehouse to gather statistics to view using Datadog. In the course of his infrastructure work, he built cabotage, a new deployment infrastructure tool that securely manages secrets with end-to-end TLS and lets PyPI maintainers automate managing software and configuration changes.
In the interests of more sustainable long-term policies and to fight spam, PyPI has removed or deprecated several features already. For instance, one of the steps taken to handle a spam attack earlier this year is to require that users verify an email address in order to upload releases. And uploading new releases via the web interface instead of the API is no longer allowed, which simplifies PyPI's job.
In general, very little about a PyPI project can now be altered via the browser. Project maintainers used to be able to update release descriptions in the browser, but to update release metadata, maintainers now need to upload a new release to respect release metadata immutability. PyPI no longer allows HTTP access to APIs; it's now HTTPS-only. Also, in advance of PyPI's CDN (Fastly) turning off support for TLS 1.0 and 1.1 on June 30, Warehouse supports only TLS versions 1.2 and above.
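Whether a given Python installation can talk to a TLS 1.2-only service depends on the TLS library it was built against. The sketch below uses only the standard library's `ssl` module to report that; the `tls_support_report` helper is a hypothetical name, and the check is a rough indicator rather than a guarantee that a particular connection will succeed.

```python
import ssl


def tls_support_report():
    """Summarise what the local Python's TLS library can do."""
    return {
        # Version string of the library the interpreter is linked against.
        "openssl_version": ssl.OPENSSL_VERSION,
        # The PROTOCOL_TLSv1_2 constant is only defined when the linked
        # library supports TLS 1.2.
        "supports_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    report = tls_support_report()
    print(report["openssl_version"])
    print("TLS 1.2 available:", report["supports_tls_1_2"])
```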
Download counts are no longer visible in PyPI's API; instead, PyPI advises curious statisticians to use the data set it uploads to Google BigQuery. As the open-source service Libraries.io improves its PyPI dependency analysis and metrics coverage, PyPI is increasingly directing users there, instead of providing such services itself. Similarly, getting out of the documentation-hosting game and deferring to Read the Docs, PyPI no longer allows package maintainers to upload docs to pythonhosted.org. In addition, as legacy PyPI shuts down, users will also lose the ability to log in with OpenID and Google auth.
Warehouse's signature handling demonstrates a shift in Python's thinking regarding key management and package signatures. Ideally, package users, software distributors, and package distribution tools would regularly use signatures to verify Python package integrity. For the most part, however, they don't, and there are major infrastructural barriers to them effectively doing so. Therefore, GPG/PGP signatures for packages are no longer visible in PyPI's web interface. Project maintainers can still attach signatures to their release uploads, and those signatures still appear in the Simple Project API as described in PEP 503. Stufft has made no secret of his opinion that "package signing is not the Holy Grail"; current discussion among packaging-tools developers leans toward removing signing features from another part of the Python packaging ecology (the wheel library) and working toward implementing The Update Framework instead. Relatedly, Warehouse, unlike legacy PyPI, does not provide an interface for users to manage GPG or SSH public keys.
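Since signatures are reachable only through the PEP 503 convention, a client has to work out the signature location itself. A minimal sketch, assuming the convention that a signature lives alongside its file under the same name with `.asc` appended; the `signature_url` helper and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def signature_url(file_url):
    """Derive the detached-signature URL for a distribution file URL.

    Assumes the PEP 503 convention: same location and name as the file,
    with ".asc" appended. Any "#sha256=..." style fragment is dropped.
    """
    parts = urlsplit(file_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path + ".asc", parts.query, "")
    )


# Hypothetical file link of the kind found on a simple index page.
example = "https://files.example.org/packages/example-1.0.tar.gz#sha256=abc123"
print(signature_url(example))
```

The derived URL says nothing about whether a signature was actually uploaded; requesting it is the only way to find out.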
Thanks to redirects, most sites, services, and tools will probably be able to seamlessly switch to the new site when it launches on April 16. Migration guides for Python users, project maintainers, and API users are available. Currently the main snags seem to be the TLS 1.0/1.1 deprecation affecting users with old versions of OpenSSL (including users on some versions of Mac OS X) and the redirects affecting companies whose private internal package indexes include packaging clients that cannot follow an HTTP 302 redirect from pypi.python.org to pypi.org.
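For clients that cannot follow the redirect, one workaround is to rewrite configured index URLs ahead of time. The sketch below is an illustration under assumptions: the `modernize_index_url` helper is hypothetical, and it assumes the URL path is the same on both hosts, which holds for the simple index path but should be checked for anything else.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "pypi.python.org"
NEW_HOST = "pypi.org"


def modernize_index_url(url):
    """Point a legacy index URL at the new host, forcing HTTPS.

    URLs for any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        return url
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


print(modernize_index_url("http://pypi.python.org/simple/requests/"))
```

Private index URLs pass through untouched, so the same function can be run over a mixed list of configured indexes.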
Future features
Shutting down legacy PyPI frees Warehouse to make major database schema changes that would have broken features in legacy PyPI and frees maintainers to concentrate on new improvements. As the MOSS award runs out, PSF's Packaging Working Group is seeking further funding to continue Warehouse work, particularly to audit and improve accessibility and application security. Volunteer Luke Sneeringer and others are discussing better authentication for release uploaders, including a bearer token authentication scheme involving Macaroons, and two-factor authentication. While Stufft is deferring to Ingram, Harris, and Durbin for day-to-day Warehouse leadership, he aims to eventually deprecate its XML-RPC API and architect new APIs, probably along RESTful lines. Warehouse developers will discuss and work on some of these tasks during sprints at PyCon in Cleveland, Ohio this May and at EuroPython in Edinburgh, Scotland in July.
Beyond security, accessibility, and APIs, Warehouse contributors are interested in performing further systematic user testing and adding user-friendly features like group/organization support for related projects and, potentially, language localization. Warehouse will also need to make it easier to change project ownership: with the acceptance of PEP 541, a long-awaited policy on "Package Index Name Retention," PyPI administrators have a policy framework to address requests to take over maintainership and ownership of abandoned project names. PyPI administrators are finalizing the implementation details now, which will enable administrators to start resolving hundreds of backlogged requests. Rather than treat user support requests like bug reports, Warehouse developers plan to create or integrate with a proper user support ticket system.
The pace of further improvements will depend on whether Python packaging and distribution tools receive further financial support and on how volunteers' enthusiasm and investment grows or shifts once the deadline urgency has passed. There is plenty to do even after the switch. The ongoing story of Python packaging will continue to evolve, and Warehouse — or something that eventually replaces it — will have to adapt.
[I would like to thank Ernest W. Durbin III, Nicole Harris, Dustin
Ingram, and Donald Stufft for reviewing this article.]
Index entries for this article | |
---|---|
GuestArticles | Harihareswara, Sumana |
Python | Packaging |
Posted Apr 12, 2018 13:07 UTC (Thu)
by anarcat (subscriber, #66354)
As someone who worked with GnuPG quite a bit, I understand how painful it is to interoperate, but it feels like we should keep at least some out of band authentication during the migration to TUF... I'm worried that the current rearchitecture just completely drops the PGP signatures altogether (not that they are really enforced significantly anyways of course... ;)
Posted Apr 12, 2018 15:56 UTC (Thu)
by ber (subscriber, #2142)
Posted Apr 12, 2018 16:54 UTC (Thu)
by sumanah (guest, #59891)
I may not have been as clear as I wanted to be. You said "I'm worried that the current rearchitecture just completely drops the PGP signatures altogether" but, in my view, Warehouse does not do that.
Posted Apr 12, 2018 17:25 UTC (Thu)
by mmerickel (guest, #117211)
Posted Apr 12, 2018 18:06 UTC (Thu)
by jwilk (subscriber, #63328)
As far as I could see, the only thing related to OpenPGP key management was a field where you could put a 32-bit(!) key ID. But this field wasn't used for anything.
Posted Apr 12, 2018 18:34 UTC (Thu)
by dstufft (guest, #93456)
There were two distinct features that dealt with PGP signatures on PyPI:
1. The ability to associate a PGP key identity with your user account.
These two features were, as far as the code itself was concerned, entirely unrelated. You could (and still can) submit a detached signature using any key that you possess regardless of what PyPI thinks your public key is via the mechanism in (1). The only purpose of the mechanism in (1) is to allow people to discover what your public key is, theoretically to allow them to validate the signatures if you used that public key.
So what's changed in the new PyPI?
Well (1) is gone, because it is almost entirely pointless. The only way to add or get the information stored in that field is using a HTTP request protected by TLS. If you rely on trusting that field, then the security of the system you've created devolves into trusting TLS, and if you're trusting TLS then you might as well trust TLS for everything and not just one part of it.
(2) still exists and works, you can still sign files with whatever key you possess and upload a detached signature as part of the file upload. The only thing that has changed in regards to (2) is that we will no longer indicate in the UI if a file has been PGP signed or not, nor provide a link in the UI to download it. The file is still available for download (using the URL specified in PEP 503) and it is still possible to upload signatures.
Posted Apr 12, 2018 18:52 UTC (Thu)
by anarcat (subscriber, #66354)
So I don't think (1) was completely pointless: more channels to verify OpenPGP keys is always a good thing... I understand you're taking the direction of TUF to fix that problem, but I figured that *removing* the feature wasn't exactly necessary in order to get there...
Of course, maybe it's the reverse and the feature was never implemented in Warehouse in which case I understand better the decision: no need to implement a feature in a "wrong" way that will be implemented in another "better" way later. :)
Thanks for the clarification!
Posted Apr 12, 2018 19:04 UTC (Thu)
by dstufft (guest, #93456)
Yea, that feature was never implemented in Warehouse (other than modeling the database tables needed to support it so our auto migration scripts wouldn't try to remove them).
Posted Apr 12, 2018 17:43 UTC (Thu)
by anarcat (subscriber, #66354)
For example, others have pointed out that the system is already sort of broken if you can't update the keys: for example, my certification key expires once a year and i need to update it in a few places like this to keep it working...
Right now, I feel the only integrity/authentication system that is effectively in place is really HTTPS and the X509 cartel, and the trust that we have in the integrity of the host(s) running PyPI. If any single component in there gets compromised, it's basically game over...
Posted Apr 12, 2018 18:44 UTC (Thu)
by dstufft (guest, #93456)
The system still works almost entirely the same as it always has, the only differences are we're not exposing the signature in the UI anymore, and you no longer have a little text field to publish what your public key identity is for a specific user.
We've removed the PGP signatures from the UI in an attempt to de-emphasize them. They are largely pointless in their current implementation because they lack a coherent trust model that applies to the packaging domain. You could build a secure package signing protocol on top of PGP, but you'd do so by effectively throwing out the WOT portions of PGP. Personally I'd rather remove them entirely, because I think they are 99% security theater in their current implementation, but folks argued against doing that until the replacement was in place, and the code was already written to support them, so I conceded.
As far as the removal of the little text field to publish your public key, that is gone in the new PyPI because it was 100% pointless. I don't think it was even exposed to end users anywhere, but if it was, you couldn't actually use it for anything. If you're trying to design a secure cryptosystem on top of the features that PyPI has, the only point in that field would be to make a HTTPS request to PyPI to ask what the author's public key is so you could verify that the signature was made by an authorized key. However at that point you're trusting HTTPS to tell you who to trust to sign a package, and if you assume HTTPS is not trustworthy, then a malicious attacker could just tell you to use their own key rather than the author's key. So at that point any system which used that was effectively as secure as relying only on HTTPS.
Posted Apr 12, 2018 21:43 UTC (Thu)
by anarcat (subscriber, #66354)
1. software gets packaged in Debian
In step 4, it is useful to have the key available from PyPI instead of fishing it outside. Inciting maintainers to publish their keys on PyPI also helps in making that model work.
But again, I understand where you're coming from and I am very thankful and happy for the new PyPI. It seems you have done an awesome job with a huge project, and I didn't mean to nitpick on this pet peeve of mine. ;) So: congrats, and I'm curious to see what TUF does for PyPI in the future!
Posted Apr 12, 2018 17:59 UTC (Thu)
by mikemol (guest, #83507)
Don't get me wrong; the hashes are useful and carry value, but they're not _signatures_, they're just tamper checks.
Posted Apr 12, 2018 21:36 UTC (Thu)
by sumanah (guest, #59891)
Posted Apr 13, 2018 22:33 UTC (Fri)
by sumanah (guest, #59891)
Posted Mar 10, 2024 22:23 UTC (Sun)
by LtWorf (subscriber, #124958)
Posted Apr 13, 2018 14:30 UTC (Fri)
by sumanah (guest, #59891)
1. What ought to be explained or listed on the PyPI help (FAQ) page that isn't there right now? I used to be able to see that with fresh eyes, but now know too much about Python packaging, PyPI's history and context, and so on.
2. Are there other trends and policies in Python packaging and distribution that you'd like explained, either here in LWN or in the official Python packaging guide or elsewhere? Some past LWN coverage:
For instance, pip 10 is probably coming out tomorrow and carries an upgrade strategy change.
Posted Apr 16, 2018 3:06 UTC (Mon)
by murukesh (subscriber, #97031)
Posted Apr 16, 2018 21:27 UTC (Mon)
by sumanah (guest, #59891)
Posted Apr 13, 2018 20:09 UTC (Fri)
by rdfm (subscriber, #50178)
At work we noticed the TLS brownouts but since there was nothing prominent, we thought there was some temporary problem. We didn’t find the post on the PSF blog and we didn’t look at the PyPI status page. When the TLS blackout started it took us a week to update all our old Python installs for legacy systems and we still need to figure out client advice.
I would suggest that such future infrastructure changes be prominently advertised on the python.org homepage (in advance)
Posted Apr 13, 2018 22:29 UTC (Fri)
by sumanah (guest, #59891)
For reference (for other folks): PyPI has just turned off support for TLS versions 1.0 and 1.1 (announcement on the general Python announcement email list: https://mail.python.org/pipermail/python-announce-list/20... ). Also, on June 30, 2018, all Python.org sites are going to entirely stop supporting TLS versions 1.0 and 1.1, because PyPI's CDN provider, Fastly, is deprecating support for those versions (blog post: https://pyfound.blogspot.com/2017/01/time-to-upgrade-your... ).
We're seeing that some users of older versions of OpenSSL are affected. Users of OS X versions 10.12 and below who use Python are particularly affected by this deprecation, as the Apple-supplied system Python (version 2.7) links to an older version of OpenSSL, so "pip install" now fails for them. A detailed explanation of that is in https://github.com/pypa/warehouse/issues/3293#issuecommen... . Upgrading pip to 9.0.3 will generally fix the issue. To upgrade affected clients, run:
curl https://bootstrap.pypa.io/get-pip.py | python
Whenever anyone has trouble `pip install`ing anything, I hope they turn up the verbosity with `-vv` to check the error message and check the PyPI/python.org status page http://status.python.org/ . And we've just started up a pretty low-traffic PyPI announcement email list https://mail.python.org/mm3/mailman3/lists/pypi-announce.... that would probably be good for folks to subscribe to if they are at companies that depend on PyPI.
Posted Apr 14, 2018 6:25 UTC (Sat)
by zdzichu (subscriber, #17118)
Would you please not spread such horrible antipatterns?
Posted Apr 14, 2018 15:14 UTC (Sat)
by sumanah (guest, #59891)
For audiences and contexts like this one, perhaps this suggestion is better:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
# Inspect get-pip.py for any malevolence. Then run the following:
python get-pip.py
Posted Apr 19, 2018 6:07 UTC (Thu)
by Pythonista (guest, #123787)
Posted Apr 19, 2018 19:19 UTC (Thu)
by sumanah (guest, #59891)
In the short term, you might want to use this libraries.io page, or subscribe to the Newest Packages RSS feed or the Latest Updates RSS feed.
For a longer-term solution, we're working on a few related issues: giving the user the option to see more search results on a (bookmarkable, I'd predict) search query, adding a "search by new release" sort order, and an RSS feed for new file uploads to existing packages.
Hope this helps.
Posted Apr 19, 2018 20:48 UTC (Thu)
by Pythonista (guest, #123787)
Removal of any signatures system
It is already in the stable GnuPG implementation and Gpg4win, so if you send an email from Outlook (with the GpgOL extension from Gpg4win installed) to me, you will get my pubkey directly from my company via TLS and it will get encrypted.
Hi! I'm the author of the article.
Folks other than me can discuss the merits of Python package signing, what ought to be done in the future, and so on. But I hope this clarifies the current situation.
2. The ability to upload a detached signature alongside the file during upload, and having that available for download, and displayed in the UI.
2. linting tools warn that PGP signatures could be checked
3. maintainer checks if upstream tarballs have a signature
4. if they do, the public key responsible for the signature is added to the Debian package
5. future updates to the package will verify the tarball with the signature, using a TOFU model
mikemol, thanks for the heads-up. That's not clear enough and we need to fix the docs. The signatures are available, but the API response does not specifically mention the signature filenames and the user has to concatenate a .asc suffix onto a filename to check whether that signature exists. So, we host the signatures but don't make it easy to retrieve them or check whether a particular release is associated with a signature or not. This is, I will admit, very inconvenient for people who want to check for signatures.
Fixed the docs. Thanks again.
Folks reading this might be able to help me with two questions:
Help page & future coverage
Thanks. Proposed a fix.
My only gripe...
I genuinely welcome a better suggestion for a one-liner command-line invocation (for use in things like tweets and announcement emails) that gets the user the latest pip (see the opening comment here on why and how the whole of pip is contained in that file), verifies the SSL certificate, and works on all supported versions of Mac OS X and approximately all Linux distros (including headless systems).
advising users on how to upgrade pip
What happened to the "list of new stuff"
Thanks for the question.