-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xml.etree.ElementTree.Element inconsistent warning for bool #83122
Comments
Steps to reproduce: $ python3.7
Python 3.7.2 (default, May 13 2019, 13:52:56)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))
False
$ python3.7
Python 3.7.2 (default, May 13 2019, 13:52:56)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['_elementtree'] = None
>>> import xml.etree.ElementTree as ET
>>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))
__main__:1: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead. This is (almost) the smallest test case, but in real code what I was trying to write was:
The unintuitive bool() behaviour was surprising, compared to other typical True / False determinations on objects, and I think what the FutureWarning is trying to avoid. Please implement the same warning for bool(Element) in _elementtree.c as exists in ElementTree.py. bpo-29204 was making similar alignments between the versions, but didn't consider this FutureWarning. |
Agreed that both should behave the same. And, we should finally decide whether this should really be changed or not. The current behaviour is counter-intuitive, but it's been like that forever and lots of code depends on it in one way or another, so … |
It's easier to justify a change in behaviour if the warning is emitted. With no legacy concerns, I would be happy for bool() to change, but I'm not the one who would receive the grumbly tickets. How about emitting the warning in the next release, then assessing the behaviour change for version n+2? |
You make a good case to just leave it be. +1 from me to leave as-is. |
Not so quick. :) You're probably aware of the details, but still, let me state clearly what this is about, to make sure that we're all on the same page here. The Element class in ElementTree represents an XML tag, but it also mimics a sequence container of child elements. Its behaviour is quite list-like, in that it supports indexing and slicing of its children "list", and also has an "append()" method to add one at the end. All of this is a nice and simple API, so what's the catch? The gotcha is the behaviour of "__bool__". Following the sequence protocol, the truth value of an Element depends on its children. No children – false. Has children, true. While this makes sense from the sequence protocol side, I don't think it is what users expect. In Python, it is very common to say "if something: …" in order to test if "something" is "a thing". When doing this with Elements, users usually don't think of the element's children but the Element itself, often because they got it back from from kind of tree search, as in the OP's example using ".find()". I would argue that this is brittle, because it works when testing with Elements that happen to have children, but it fails when the Element that was found has no children. The intention of the "FutureWarning" was to allow changing the behaviour at some point to always return true as the truth value, i.e. to remove the "__bool__" implementation all together. This
The main question now is: what is more common? Code that only accidentally works because it didn't get in touch with child-free (leaf-)elements yet? Or code that would start failing with this change because it really wants to find out if an Element has children by truth-testing it. I'm sure that both exist, because I have seen both, and my very limited *feeling* from what I've seen is that there's more brittle code out there than intentional code that truth-tests Elements, and that future users are also more likely to get it wrong until they get bitten by it and learn the actual meaning of the truth-test. But, we don't know, and this isn't something to easily find out via a code search. I'm personally more leaning towards changing the behaviour to help new users, but lacking enough evidence, not changing something long-standing is always the safer choice. I'm aware that I will have to take the "final" decision in the end, but would be happy to hear some more intuitions, ideas or even data points first. |
Stefan's arguments looked reasonable to me. I supported deprecating Element.__bool__. This was an old plan, although we failed with deprecation in the C implementation (this is not the first oversight in deprecations in ElementTree). But there is a problem: removing __bool__ will not change the value of bool(element). It will still return False for an Element without children, because bool() falls back to __len__ if __bool__ is not defined, and Element.__len__ is defined. So we will need either to define Element.__bool__ always returning True, and this will be the first case when __bool__ and __len__ return inconsistent values, or to deprecate __len__ (and perhaps __getitem__) that goes too far from the initial plan. So now I am not sure. Making Element.__bool__ always returning True or raising an exception could help to fix some bugs in user code. But this differs from common behavior of sequence-like types in Python. On other hand, we have a precedence: bool(NotImplemented) raises an exception now (this also can save from common errors). When I started to write this message I was -0 for changing Element.__bool__. Now I am +0, after considering all arguments that came in my mind, but still have doubts. Can we ask some of the language ideologues? |
I'm +1 on adding the deprecation and changing it to an error in the future. I think it's an anti-pattern that objects with complex behavior that includes some sequence behavior allow the fallback behavior for checking the empty sequence. I recall being bitten by this nearly 20 years ago (at Zope) when some database class used this pattern -- somehow the app ended up thinking there was no database at all when in fact there was one but it was empty, and it reinitialized the database with bad results. The fallback from truth testing to __len__() is built into the language and cannot be changed, but we *can* override __bool__() to raise an exception, and I think that's what we should do. (Clearly that's also what we were planning to do, hence the warning in the Python version.) |
I went digging through the archives, made more interesting as elementtree was imported into the standard library. AFAICT, the FutureWarning for __bool__ (or __nonzero__ in py2) appeared circa 2007-06 in version 1.3a2:
The docs are still there, warning about the pitfalls and suggesting "This behaviour is likely to change somewhat in ElementTree 1.3." 12 years on, it has not, but what was the intended change (+effbot for possible context)?? I assumed on first reading that the plan was to switch bool(Element) to return True, but others have pointed out the inconsistency with len(), or having __bool__() raise an exception. My 2c would be to steer devs away from using bool(Element) as a final step, i.e. keep the existing True/False definition, warts and all, but raise a SyntaxWarning (RuntimeWarning?) with a similar message to the current FutureWarning. |
I doubt effbot will respond. :-) I think the first order of business is to add the same FutureWarning to the C version of the code, and to reset the clock on the deprecation. In terms of the final stages, why wouldn't we want it to just always raise? Just because there's so much code that might break? It shouldn't be a SyntaxWarning, since this is most definitely not a syntax issue. I'd be okay with RuntimeError (or RuntimeWarning if we decide to keep the old outcome with a warning forever). (Huh, does FutureWarning count as a deprecation?) |
I was close to merging the linked PR but there's some renewed concerns about the proposed behavior here: Do we really want to change this behavior and potentially force a lot of people to change their working code? In addition, the current code emits FutureWarning. I don't really understand what that warning is supposed to be used for (https://docs.python.org/3/library/exceptions.html#FutureWarning), but at least it means we plan to change the behavior later. So we should have a plan for how we'll do that. And whatever we do, the Python and C versions of the code should behave consistently. Here are two ways forward: (1) We change both the Python and the C versions of ElementTree to emit a DeprecationWarning on using __bool__ in 3.11. In 3.13, we change the behavior to make __bool__ always return True. (Or perhaps, make it raise an exception.) (2) We give up on changing this behavior and remove the existing FutureWarning from the Python code. But in that case we should document the gotcha in the boolean behavior of nodes. I don't have enough experience with ElementTree to recommend either option, but I'm happy to help implement it once we come to a decision. |
The I agree that the purpose of FutureWarning even existing is... unclear. A behavior changing in the future should be called a Deprecation. Before deciding on (1) vs (2) though, an experiment with a warning surfaced loudly to see what potential scale of cleanup required across a pile of projects or in a large codebase like we have at work would be useful data. |
The docs for FutureWarning state that it's for users of applications, whereas DeprecationWarning is for other Python developers. Not sure that that's a very helpful distinction in practice though. Note that the PR is stalled: #31149 |
Yeah I don't find FutureWarning super useful, but as documented it effectively means DeprecationWarning is what should've been used. The audience for it is developers, not application end users who should be assumed to know nothing about code and APIs. |
I find FutureWarning useful, although there are less use cases for it then for DeprecationWarning. The difference between DeprecationWarning and FutureWarning is that the former is used for changes which will cause raising an exception in future, while the latter is used when the future behavior change will be more silent -- just return different result or have different side effects. You can ignore DeprecationWarning for a while, because it is difficult to miss an exception in future versions. You should be more careful with FutureWarning, because a silent change can lead to subtle bugs in future releases. This is why FutureWarning is more visible by default. Emitting FutureWarning in
There are variants of 1 and 3 which are less destructive for current code which checks that "it is a thing". If elements in your code always have children, you have not need to change it. If they can be childless, your code have a bug in any case (it will be automatically fixed by 1a, and 3a will force you to fix it).
|
I've updated #31149 to raise DeprecationWarning. I don't think raising an exception is unexpected; the docs have warned since 2.7 that this is an ambiguous pattern. I also think changing the behavior based on whether the element has children is very surprising. In that case, you might never see the DeprecationWarning while testing, and then get bitten in production when encountering childless elements "in the wild". |
…1149) When testing element truth values, emit a DeprecationWarning in all implementations. This had emitted a FutureWarning in the rarely used python-only implementation since ~2.7 and has always been documented as a behavior not to rely on. Matching an element in a tree search but having it test False can be unexpected. Raising the warning enables making the choice to finally raise an exception for this ambiguous behavior in the future.
The bool() of ET.Element is discouraged from being used, see: python/cpython#83122
This is a bug, and a bad one. There should be documentation in every single function that can be bitten by this bug. Currently, there's a caution here: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.remove.
This caution should be in bold in every single place, especially https://docs.python.org/3/library/xml.etree.elementtree.html#finding-interesting-elements and https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.find. As if XML isn't bad as it is. With regards to fixing this bug, I would nuke the whole module specifically to get rid of this bug without breaking backwards compat. Call it |
ElementTree
#31149Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: