Pip download prefers newer package version even when local package exists #5500
That behaviour is by design. Pip will always prefer the latest available version; it takes no account of where a package comes from.
If a package already exists in a directory specified with --find-links, PackageFinder still prefers a newer version of the package found at the remote package index. Fix by preferring the local package file when found.
@pfmoore
So? That's the point of pip download. I don't know if I'm missing something here, but I can't see what the problem is. What exactly do you use the files downloaded via `pip download` for?
The point is that consistency is useful. Things that behave differently every time are less useful than things that do the same thing every time. Had pip supported handling multiple requirement files and dealt properly with the dependencies, this wouldn't be a problem. With two requirement files, as explained earlier, you never actually know exactly what package versions will be downloaded.
The second reason is speed. By looking locally and finding a package that satisfies the dependencies, there is no need to check remotely. Therefore, a call to `pip download` can return much faster.
I'm not sure I follow. Pip's current behaviour is perfectly consistent - I described it above: pip always prefers the latest available version, regardless of where it comes from.
In fact, if we preferred local files, we'd be harming consistency, because you'd get something different installed depending on what was present locally. I don't see anything actionable here. Pip's current behaviour is by design; if you want to propose a change, you'll need to provide details of what you propose, and you'll probably need more persuasive arguments than you've currently offered.
I agree that the current default behaviour shouldn't be changed, but an option to be able to prefer local packages over checking remotely would still be useful. What I propose is to have an option that makes pip check locally if a package that satisfies the given dependency already exists locally, and if so, do not check remotely.
OK, so what you're suggesting is an option to prefer local files over the remote index. I can see the logic in that. If you wanted to create a PR implementing it, I'm not going to object. I can't say that I find your justification for the behaviour compelling, but that's something that can be debated later, when there's a PR to review.
This would also be very useful for HPC clusters on which the staff may build python wheels that are optimized for their CPU architecture. The current behavior requires HPC staff to always be recompiling new versions as soon as they are out, or risk users using dramatically slower python packages in some situations. Being able to tell pip to favor a local wheelhouse over some minor version increase found online would be very useful to us.
@bendikro Any news/updates on this? This would be very useful for us.
An example of this causing an issue in practice: let's say I'm using Python 2.7. Matplotlib 3 supports only Python 3.5+. If I install a package that has matplotlib as a dependency, pip will reach for the newest matplotlib rather than a compatible local copy.
Add option --prefer-local-compatible to download and install commands With this option enabled, local directories specified with --find-links are searched first. If package requirements are satisfied by packages found locally, the local package will be preferred, and no remote URLs will be checked.
I've created a PR suggesting a new option, `--prefer-local-compatible`.
I remain unconvinced that this is a good idea, but as I said above, if someone else feels it's worth taking this forward, I won't object. For the record, though, my objection to this isn't so much that it's difficult to implement or explain the basics of the proposed behaviour, it's more about the maintenance burden:
Wait... this is for `pip download` only? What about `pip install`?
Absolutely not. See all of my previous comments about why this should not be added to `pip install`. However, I've just noticed that the PR adds the option to `pip install` as well as `pip download`.
I disagree. As manager of an HPC cluster and a very comprehensive wheel house, we strongly want that for install too. Wheels downloaded from online repositories break or under-perform way too often.
"Always prefer the latest available version" is the complete opposite of consistency. It means that any two successive installations will yield different results, even when performed on the exact same host.
For the sake of argument, let's define consistency. My definition of something consistent: something is consistent when it yields the same result when executed
1) at different points in time
2) on different hosts
Getting both 1) and 2) is very hard. It requires basically having the whole software stack/operating system managed by the same system; this is never going to be achieved by pip alone. Getting 2) cannot possibly be achieved without having 1), unless you are executing things at the exact same time, or unless you pin down the version of every package you install. What is left is 1). Current pip behaviour does not give 1) at all: if I install packages today, I will get widely different versions than what I installed 6 months ago. Item 1) can however be achieved assuming there is a local set of packages that are fixed/supported. This is the case on our HPC clusters. We also achieve 2) as long as users remain on our infrastructures (multiple clusters). However, both 1) and 2) are jeopardized by the current pip behaviour and the lack of ability to tell pip that packages available in our wheel house are preferred over the more recent versions that can be downloaded.
Sigh. I guess we're simply going to have to disagree. Pip has mechanisms (version pinning, hash checking) to satisfy your requirement (1). Just because you choose not to use them, or because they don't work easily in your particular situation, doesn't mean pip can't do that. Nor does it mean that pip needs another method of doing the same thing. I remain -1 on this whole proposal, and you've pretty much convinced me that accepting the option even for `pip download` would be a mistake.
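For reference, the pinning and hash-checking mechanisms referred to here combine in a requirements file along these lines (the digest shown is a placeholder, not a real hash):

```
# requirements.txt - every version pinned exactly. When hashes are present,
# pip enables hash-checking mode and rejects any archive whose digest
# does not match.
setuptools==39.0.1 \
    --hash=sha256:<digest-of-the-approved-archive>
```

Installing with `pip install -r requirements.txt` then fails closed: a newer upload, or a tampered archive, simply will not match the pinned digest.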
It's not that I choose not to use them; it's that nobody (i.e. package developers) ever does.
I guess that an option for this could still be useful, though.
I would like to note that at work we always use version pinning and/or constraint files; it's simply insane not to have that in place in production environments where consistency is a must. Also, I wonder if the "strategy" option for upgrades could be relevant here.
Yeah, advanced users of HPC will use version pinning. But this is not just any average user. When managing an HPC cluster, you are dealing with thousands of users that know very little about good practices. Any small step you can take to reduce the amount of rope with which they can hang themselves means tickets and problems avoided.
@pfmoore I must admit I did not understand you from the previous discussion to be so strongly against having this option for `pip install`. Due to the additional interest given to this ticket, I wanted to put together a PR with an implementation prototype of the new option. Including the option for `pip install` seemed natural given that interest.
OK, fair enough. I still don't see sufficient benefit in this change to justify the cost, though. Just as a question, why don't you use something like a local devpi instance that serves your "local" files, but if there are no local files for a package falls back to PyPI? I'm pretty sure devpi can do things like this (and if it's not the default behaviour, there is a plugin system that lets you customise the behaviour). Or simply write a small webapp that serves an index that behaves as you want it to?
I was unaware of devpi, but running a web server is not an option. We can't run a web server on a HPC cluster, and compute nodes on which jobs run and pip may be called don't necessarily have access to the web. We want the packages we serve to be available without needing web access. It currently works nicely by just having a directory containing the wheels which is accessible on our filesystems, and configuring `find-links` in the global pip configuration to point to it. The only caveat, as is being discussed, is that pip will not limit itself to whatever it finds in the directory pointed to by `find-links`.
The initial reason I wanted to change the behavior is to make `pip download` skip the remote check when it is not needed. There are two cases where "not needed" applies.
Point 1 conflicts with the current pip behavior, where the remote index is always consulted. Currently, even with pinned versions for all packages, where all packages exist locally, `pip download` still queries the remote index for each requirement.
I'll try to explain our use case. We have a multitude of projects that rely on different virtual environments for different tasks, e.g. system tests, unit tests, running various python scripts, etc. We used to have multiple requirement files containing only the strictly necessary requirement specifications for each virtual environment. However, due to the issue mentioned above, we ended up generating one requirement file with pinned package versions for each virtual environment instead. Whenever the requirement file changes, or the virtual environment is removed (make clean), the required package versions are first downloaded to a local cache directory with `pip download`. There is not a set of specific package versions we use for all the projects; rather, each project, and each virtual environment, has a set of requirements with pinned versions. Therefore, running a devpi instance is not very convenient.
@bendikro, it does not need to define a new "local" concept and check for
This thread is getting very confused. I suspect we're hitting a case of the XY Problem. The original statement of the problem here was that "pip download does not prefer package found locally even if it satisfies the requirements when there is a newer available at the remote package index". That's not a problem, because that's not how pip works: it always selects the latest available version, regardless of source.
It's possible that in attempting to solve an issue in their local environments, @bendikro and/or @mboisson have identified that if pip preferred "local" files over "remote" ones, then they could use that to solve their problem. That's fine, but as noted, it's not how pip works. Rather than proposing that pip gets changed to work the way you wish it would in order to implement the solution you'd thought of, can I suggest that we go back to the underlying problems? If you raise one or more new issues describing what your underlying problem is, maybe we can either find a solution using pip as it currently works, or we can identify a change to pip that doesn't have the difficulties that "prefer local files" does but still helps address the problem.
(Disclaimer: My personal feeling is that there's likely an acceptable solution using pip as it stands, maybe with some local environment config changes, or with a process change in how you're working. What I've understood of the underlying problems so far doesn't seem like it's something that needs a pip change. But I may be wrong.)
@pfmoore How would you solve, with current pip, the problem that users need to install the latest version from a specific wheelhouse even if there's a more recent version on PyPI? Ideally, the user only needs to `pip install matplotlib`. For example, in the local wheelhouse there's matplotlib v3.0.1, but on PyPI there's v3.0.2. The v3.0.1 is the preferred candidate to be installed. As mentioned by @mboisson, wheels downloaded from online repositories break or under-perform too often.
@ccoulombe Pin the version. "Ideally, the user only needs to `pip install matplotlib`" - why? What is wrong with asking users to specify the version they want?
@pfmoore ok, let me roll back to our problems. Problem 1)
Problem 2)
Problem 3)
Problem 4)
Please suggest a solution that solves all 4 problems and does not equate to "prefer locally built packages".
Can we somehow tell pip to never download binary (i.e. compiled) packages from online repositories? Pure python packages are usually alright.
@mboisson Thanks for the clarification. It'll take me a while to digest that but I appreciate the explanation.
Yes - `--no-binary :all:`.
I also realize that my question was not precise enough. Can we somehow tell pip not to ever download binary packages, nor their source equivalent (i.e. only ever download pure python packages)? Not downloading the binary version of a package only to compile it from source instead would not help us.
@mboisson If I follow that set of requirements, the only control you have over what options pip sees when run by your users is the global configuration file? Also, you stated earlier that running a local index wasn't an option, but I don't see anything in your problem statement that precludes it. And my immediate thought when seeing your requirements is that PyPI is not a good fit for them, and running a local index (that passes through to PyPI when appropriate) is exactly the solution that other environments I've heard of with similar constraints tend to use...
I think we're confusing each other here. What do you mean by "download"? From PyPI? If that, then only by using
Correct, the only control we have over what options pip sees is the global configuration file. Running an index which requires running a server is not an option. Having some sort of script that
I mean that unless it's a pure python package, act as if you were using `--no-index`.
Hmm, I'd like to say "why not?", but I'll accept that as a fact for now. In which case, you could (note, this is untested!) write a script that grabs https://pypi.org/simple and modifies it so that links to packages you have locally point to your local copies, and put that somewhere you can reference via a file: URL (which you can then treat as a repository index via `--index-url`). You'll need to refresh your local index regularly, but that's a cost of not being able to support a web server (which could do the refresh on the fly). In spite of agreeing to accept "not able to use a web server" as a constraint, I'd also like to point out that you could put an index on an external site like Heroku - after all, your users can access the internet, so it's not like they couldn't access that as an index...
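The rewriting step of the script described above could be sketched roughly like this. It is only a sketch: the function name, the wheelhouse path, and the use of a plain regex instead of a real HTML parser are illustrative choices, not part of pip or of the PR discussed in this thread.

```python
# Sketch: given the HTML of one "simple index" page, rewrite any
# distribution link whose filename also exists in a local wheelhouse so
# that it points at the local copy via a file: URL. Links with no local
# counterpart are left untouched, so pip still falls back to the remote
# files for everything else.
import os
import re

def localize_index_page(html, wheelhouse):
    def rewrite(match):
        url = match.group(1)
        # Distribution links end with the archive filename, possibly
        # followed by a "#sha256=..." fragment.
        filename = url.rsplit("/", 1)[-1].split("#", 1)[0]
        local_path = os.path.join(wheelhouse, filename)
        if os.path.isfile(local_path):
            return 'href="file://{}"'.format(os.path.abspath(local_path))
        return match.group(0)  # no local copy: keep the remote link

    return re.sub(r'href="([^"]+)"', rewrite, html)
```

A cron job could re-run this over the fetched index pages periodically, which is the "refresh regularly" cost mentioned above.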
How do you detect that it's pure Python? That's not possible without building the package (as some packages have optional C extensions).
That's an interesting idea. We'll think about it.
In some cases, our users do have Internet access, but in others they don't (i.e. when they are running on compute nodes on the cluster). So running a web server would be an option for some cases, but it would require to keep two distinct solutions, with the risk that they would eventually diverge in the list of packages they provide.
I'd say that anything for which the wheel is tagged `none-any` is pure python.
Well, if they don't, they can't access PyPI so problem solved 😄
But you said you didn't want to try to compile sdists that aren't "pure Python" either? There's no way of telling whether a sdist is "pure Python".
Well, yes, if they can still access our local index (and don't need packages that aren't in there), which they can't if it's a web server.
"Pure python" packages will usually end up as a `none-any` wheel.
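The `none-any` heuristic can be checked mechanically from a wheel's filename. A rough sketch (the function name is my own; it deliberately ignores edge cases like build tags, and, as noted in the thread, it says nothing about sdists):

```python
def is_platform_independent_wheel(filename):
    """Heuristic from the discussion above: a wheel whose ABI tag is
    'none' and platform tag is 'any' (e.g. foo-1.0-py2.py3-none-any.whl)
    is platform-independent and therefore almost certainly pure Python."""
    if not filename.endswith(".whl"):
        return False  # an sdist or other archive: undecidable from the name
    # Wheel filename format: name-version[-build]-pythontag-abitag-platformtag
    parts = filename[:-len(".whl")].split("-")
    return len(parts) >= 5 and parts[-2] == "none" and parts[-1] == "any"
```

This is only a filename-level test; it cannot classify a source distribution, which is exactly the limitation pfmoore points out.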
... and again we hit something that I don't follow. You say they don't have "internet access". What precisely do you mean by that? No access to any sort of IP connection other than the local machine? Or no access outside of the local network? How do they currently access your local index? As a shared filesystem? There's no reason that wouldn't still be possible (all of my suggestions have only been about hosting an index on a web server - the actual distribution files themselves would remain on the local filesystem). You could have the global config set up to use a file: URL on that shared filesystem as the index.
That's as far as I can reasonably go designing this for you - you'll need to do some work yourself to fill in the blanks, but hopefully it's enough to give you the idea.
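Pieced together, the global configuration being sketched here might look something like the following. The paths are placeholders for the cluster's shared filesystem, and the choice between a generated index page and a bare wheel directory depends on which of the approaches above is taken:

```ini
; /etc/pip.conf, or the file named by PIP_CONFIG_FILE
[global]
; Option A: a generated simple-index page on the shared filesystem
index-url = file:///shared/wheelhouse/index/
; Option B: point directly at the directory of wheels instead
; find-links = /shared/wheelhouse/
```

With option A and a periodically regenerated index, local copies are served first while everything else still resolves through the links that remain pointed at PyPI.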
End up as, yes. But at the point we're trying to make a decision, they are just sdists.
Yes
Yes, shared filesystem.
Yes, a local index file sounds like it could work.
Mmm, I believe that our issue seems to have been fixed and merged (a.k.a. the
Also, a colleague mentioned the option to have a constraint file, which can be set globally via PIP_CONFIG_FILE, and in which one can exclude any version of a package more recent than the locally available version.
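Such a constraint file might look like this (the package names and versions are illustrative; the idea is to cap each package at the version available in the local wheelhouse, without forcing anything to be installed):

```
# constraints.txt - referenced from the global pip configuration, e.g.
#   [install]
#   constraint = /shared/constraints.txt
# Cap packages at the versions built locally:
matplotlib<=3.0.1
numpy<=1.14.3
```

With this in place, `pip install matplotlib` resolves to 3.0.1 even when PyPI has 3.0.2, while packages not listed in the file are unaffected.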
Environment
Description
pip download does not prefer a package found locally, even if it satisfies the requirements, when there is a newer version available at the remote package index.
Expected behavior
Prefer the already existing package as long as it satisfies the dependency requirements.
How to Reproduce
1. Create an empty directory pkg_cache
2. Run: pip3 download --dest pkg_cache/ --find-links pkg_cache/ setuptools==39.0.1 && pip3 download --dest pkg_cache/ --find-links pkg_cache/ setuptools
Output