PyIter_Next has ambiguous return value #105201

iritkatriel · 2023-06-01T18:10:18Z

As discussed in capi-workgroup/problems#1, we have some C API functions that have ambiguous return values, requiring the caller to query PyErr_Occurred() to find out whether there was an error.

We will try to move away from those APIs to alternative ones whose return values unambiguously indicate whether there has been an error, without requiring the user to call PyErr_Occurred().

In this issue we will discuss the iterator API. PyIter_Next returns NULL both on error and when the iterator is exhausted; only a call to PyErr_Occurred() distinguishes the two cases.
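To make the ambiguity concrete, here is a stand-alone C model rather than real CPython code: the hypothetical toy_next() plays the role of PyIter_Next() and toy_err_occurred() the role of PyErr_Occurred(), with a global flag standing in for the exception state. ToyIter and its fail_at field (used to simulate an error) are likewise illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array (illustrative, not a CPython type). */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
    size_t fail_at;   /* index at which to simulate an error; >= len means never */
} ToyIter;

/* Global flag standing in for the interpreter's "exception set" state. */
static int toy_error_flag = 0;

static int toy_err_occurred(void) { return toy_error_flag; }   /* ~ PyErr_Occurred() */
static void toy_err_clear(void) { toy_error_flag = 0; }        /* ~ PyErr_Clear() */

/* ~ PyIter_Next(): returns the next item, or NULL for BOTH exhaustion and error. */
static const int *toy_next(ToyIter *it)
{
    assert(it != NULL);
    if (it->pos >= it->len) {
        return NULL;              /* exhausted: no error flag set */
    }
    if (it->pos == it->fail_at) {
        toy_error_flag = 1;       /* "raise" an error */
        return NULL;
    }
    return &it->data[it->pos++];
}

/* Sums all items: returns 0 and stores the total on success, -1 on error.
   The NULL alone cannot say why the loop ended, hence the extra flag check. */
static int toy_sum(ToyIter *it, long *out)
{
    const int *item;
    long total = 0;
    while ((item = toy_next(it)) != NULL) {
        total += *item;
    }
    if (toy_err_occurred()) {
        return -1;
    }
    *out = total;
    return 0;
}
```

The loop itself needs only one test per item; the cost shows up after the loop, where the caller must consult the separate error flag to learn why iteration stopped.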


erlend-aasland · 2023-06-02T13:41:59Z

Before we rush to introduce new APIs, can we clarify the following:

We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error.

I interpret this as:

for functions returning int, 0 is a successful call, -1 is an error
for functions returning a PyObject pointer, NULL is an error, anything else is ok

Error means a raised exception.
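Under that reading, each shape has exactly one error value and every other value is a valid result, so a caller needs a single check. A minimal sketch with hypothetical helper names (toy_parse_int, toy_new_int), written in plain C so it compiles without Python.h:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* int-returning shape: 0 on success, -1 on error; the result goes to an out-param. */
static int toy_parse_int(const char *s, int *out)
{
    char *end;
    long v;
    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return -1;                /* not a valid int */
    }
    *out = (int)v;
    return 0;
}

/* pointer-returning shape: NULL on error, any other value is a valid result. */
static int *toy_new_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL) {
        return NULL;              /* allocation failed */
    }
    *p = value;
    return p;
}
```

An iterator has a third outcome, exhaustion, which neither shape can represent without overloading one of these values.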

iritkatriel · 2023-06-02T13:52:40Z

I don't think it's a good API for an iterator if you need to check 2 things just to know whether iteration completed, and then after the loop you need to check again why it was completed.

erlend-aasland · 2023-06-02T13:56:21Z

Perhaps iterator APIs should get their own issue, since they require ambiguous return values? It seems to me they don't fit in the premises established by the OP.

iritkatriel · 2023-06-02T16:45:36Z

Sure, let's make this issue about iterators.

iritkatriel · 2023-06-02T16:58:52Z

My proposal is to add a new PyObject *PyIter_NextItem(PyObject* iter, int *err) which returns the same as PyIter_Next, but also sets *err to 0 on success and -1 on error.

The reason to do this rather than int PyIter_NextItem(PyObject* iter, PyObject **item) is how this function is used while looping over an iterator: on each iteration we want to know whether we got another value or not. Only afterwards (once we get NULL) do we want to check the error value to see how the iteration exited.

So I am proposing:

PyObject *item;
int err;
while ((item = PyIter_NextItem(iterator, &err)) != NULL) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
    /* error */
}
else {
    /* no error */
}

rather than

PyObject *item;
int err;
while ((err = PyIter_NextItem(iterator, &item)) == 0 && item != NULL) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
    /* error */
}
else {
    /* no error */
}
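The pointer-returning variant can be modelled outside CPython as follows. This is a sketch under stated assumptions: toy_next_item() is a hypothetical stand-in for the proposed PyObject *PyIter_NextItem(PyObject *iter, int *err), and ToyIter is a toy iterator whose fail_at index simulates an error. As proposed, *err is 0 on success (and on plain exhaustion) and -1 on error, and it is only inspected once the loop ends.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array (illustrative, not a CPython type). */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
    size_t fail_at;   /* index at which to simulate an error; >= len means never */
} ToyIter;

/* ~ proposed PyObject *PyIter_NextItem(PyObject *iter, int *err):
   returns the next item or NULL; *err is 0 unless an error occurred, then -1. */
static const int *toy_next_item(ToyIter *it, int *err)
{
    assert(it != NULL && err != NULL);
    if (it->pos >= it->len) {
        *err = 0;                 /* exhausted, no error */
        return NULL;
    }
    if (it->pos == it->fail_at) {
        *err = -1;                /* error */
        return NULL;
    }
    *err = 0;
    return &it->data[it->pos++];
}

/* Sums all items: returns 0 and stores the total on success, -1 on error. */
static int toy_sum(ToyIter *it, long *out)
{
    const int *item;
    int err;
    long total = 0;
    while ((item = toy_next_item(it, &err)) != NULL) {
        total += *item;
    }
    /* err was written by the last call, so it tells us why the loop ended. */
    if (err < 0) {
        return -1;
    }
    *out = total;
    return 0;
}
```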

iritkatriel · 2023-06-02T16:59:26Z

CC @encukou

erlend-aasland · 2023-06-02T19:15:23Z

I still fail to see how the problem "API A has ambiguous return value" is solved by introducing API B which also has an ambiguous return value.

iritkatriel · 2023-06-04T09:33:41Z

I’m giving up on incrementally fixing this in the current C API and will turn my attention to the new C API work. Those who warned me were right, unfortunately.

iritkatriel · 2023-06-04T09:34:57Z

(See discussion on the attached PR for the full picture).

gvanrossum · 2023-06-05T17:39:38Z

I’m giving up on incrementally fixing this in the current C API and will turn my attention to the new C API work. Those who warned me were right, unfortunately.

Isn't this overreacting? Ignoring Raymond's knee-jerk reaction, Erlend's feedback points to a real issue with how we've been talking about this, but is not a reason to give up on incremental fixes. The way I read Erlend's feedback is that the problem isn't that the result or return value is ambiguous, but that it requires calling PyErr_Occurred() to get the full picture. There are a few ways we can design the replacement for PyIter_Next() to avoid PyErr_Occurred(), and we simply have to look at the ergonomics to choose one. That seems a rational choice we can make.

iritkatriel · 2023-06-05T18:16:56Z

Isn't this overreacting? [...] is not a reason to give up on incremental fixes.

I figured with two -1s and no other responses this would die due to "no consensus", and in general I think "is it worth it" discussions for incremental changes are always going to be opinionated/political, which is not really how I want to spend my time.

But I'll reopen the issue and PR so we can continue.

erlend-aasland · 2023-06-05T19:13:56Z

FWIW, here are my thoughts about this: I really liked an early variant of your PR where you had an int return value and a PyObject * item output param. IMO, that was an API that is less likely to be misused or misunderstood; the return value aligns with a well established C idiom¹. I don't see it as a problem that looping will involve a few more lines of code. As I see it, the resulting "explicit code" is easier to read and reason about. If I can choose between an API designed to save lines in a while loop and an API designed to be as unambiguous as possible, I'll go for the latter any day. But API design is subjective, even though there certainly are aspects which can be discussed objectively.

Also, I appreciate that there is a dedicated place to record API problems and discuss them; I really appreciate Irit's hard work with managing all of this. API design is hard, and I think it is important to discuss proposed APIs thoroughly before landing a design², rather than rushing that process and landing possibly premature solutions. I also understand that long discussions about new APIs can be discouraging.

¹ I believe the use of well established C idioms should be weighed in API design, as I believe it makes for APIs that are less likely to be misused. Maybe I'm wrong.
² Take a peek at the HPy repo and see how carefully they design every single API! Personally, I find it very inspiring to follow their discussions.

markshannon · 2023-06-06T10:05:25Z

We should decide in general whether functions that effectively return a (error, value) pair should have the signature
int foo(..., PyObject **value) or PyObject *foo(..., int *err).

I strongly prefer the former: I think it makes for nicer flowing code, makes it harder to miss the error, and is (IMO) more idiomatic C.

As to what the return values should be:

-1 for an error. As we use that convention everywhere
0 for a "lesser" result, which in this case is termination (for PyDict_GetItem, the case where the key is absent).
1 for the "greater" result, the presence of a value (for PyDict_GetItem, the case where the key is present).

What is the "lesser" or "greater" result is somewhat subjective, but I think it makes for the least surprising API.

There are three cases to deal with, and the return code should reflect that. Any code using the function will need to handle all three cases, with two tests, but should be able to test for the most common case with a single test. The trio of values -1, 0, 1 allows that.

The version returning an int code can be used just as efficiently as the version returning a PyObject * value.

E.g. with int PyIter_NextItem(PyObject *iter, PyObject **next)

    PyObject *next;
    int err;
    while ((err = PyIter_NextItem(iterator, &next)) > 0) {
        use(next);
        Py_DECREF(next);
    }
    if (err < 0) {
        /* Cleanup */
        return -1;
    }
    /* Done */

This sort of three value return is relatively common, so we should have a consistent, documented pattern for it.
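A stand-alone model of this convention, using a hypothetical toy_next_item() in place of int PyIter_NextItem(PyObject *iter, PyObject **next) and a toy int iterator in place of a Python object. Whether *next is written on the 0 and -1 paths is not specified in the comment above; this sketch assumes it is only written when 1 is returned.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array (illustrative, not a CPython type). */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
    size_t fail_at;   /* index at which to simulate an error; >= len means never */
} ToyIter;

/* -1 on error, 0 when exhausted ("lesser" result), 1 when *next holds an item. */
static int toy_next_item(ToyIter *it, const int **next)
{
    assert(it != NULL && next != NULL);
    if (it->pos >= it->len) {
        return 0;
    }
    if (it->pos == it->fail_at) {
        return -1;
    }
    *next = &it->data[it->pos++];   /* only touched on the success path */
    return 1;
}

/* Sums all items: returns 0 and stores the total on success, -1 on error. */
static int toy_sum(ToyIter *it, long *out)
{
    const int *next = NULL;
    int err;
    long total = 0;
    while ((err = toy_next_item(it, &next)) > 0) {   /* common case: one test */
        total += *next;
    }
    if (err < 0) {                                   /* second test, once */
        return -1;
    }
    *out = total;
    return 0;
}
```

The loop condition tests only for the common case (> 0), and a single err < 0 check afterwards separates errors from normal termination.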

erlend-aasland · 2023-06-06T10:16:59Z

I agree with Mark on all points. I also think we should agree on and document the new API guidelines before solving any existing issues; that way, there will hopefully be fewer long and exhausting discussions for each new API.

iritkatriel · 2023-06-06T10:23:26Z

@markshannon That's not a bad option, I tried it in an earlier iteration and I'm not sure I landed on anything better. How would you fix PyUnicode_Compare? It currently returns -1, 0, 1 and we need to add a fourth value for error.

@erlend-aasland I appreciate your remarks, but let's not refer to what happened on that PR as "discussion". Children of an impressionable age have access to GitHub and they might be reading this. We should teach them better than that.

markshannon · 2023-06-06T11:09:25Z

How would you fix PyUnicode_Compare?

PyUnicode_Compare can only fail because it has the wrong type signature.
Change it from int PyUnicode_Compare(PyObject *left, PyObject *right) to
int PyUnicode_Compare(PyUnicodeObject *left, PyUnicodeObject *right) and it becomes infallible.

markshannon · 2023-06-06T11:17:18Z

Assuming you want to keep the more general signature, then yes it does need more return values.
There are in fact five possible return values: error, less, equal, more, unordered. The last for sets, floats, etc.
In which case I would use the following enum:

enum {
     ERROR = -1,
     UNORDERED = 1,
     LESS_THAN = 2,
     GREATER_THAN = 4,
     EQUAL = 8
};

See https://github.com/python/cpython/blob/main/Include/internal/pycore_code.h#L469 for why this seemingly odd choice makes sense.
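One plausible reading of why powers of two make sense, assuming the flags are meant to be combined: every comparison operator can then be expressed as a mask of the outcomes that make it true, so evaluating <= or != is a single bitwise AND. The sketch below uses hypothetical CMP_ and OP_ prefixes to avoid clashing with other identifiers, and compares doubles, where NaN gives the unordered result.

```c
#include <assert.h>
#include <math.h>

/* Outcome flags, same values as the enum above (prefixes are illustrative). */
enum {
    CMP_ERROR        = -1,
    CMP_UNORDERED    = 1,
    CMP_LESS_THAN    = 2,
    CMP_GREATER_THAN = 4,
    CMP_EQUAL        = 8
};

/* Each comparison operator is the mask of outcomes for which it is true. */
enum {
    OP_LT = CMP_LESS_THAN,
    OP_LE = CMP_LESS_THAN | CMP_EQUAL,
    OP_EQ = CMP_EQUAL,
    OP_NE = CMP_UNORDERED | CMP_LESS_THAN | CMP_GREATER_THAN,
    OP_GT = CMP_GREATER_THAN,
    OP_GE = CMP_GREATER_THAN | CMP_EQUAL
};

/* Comparison of doubles; cannot fail, so CMP_ERROR is never returned here. */
static int cmp_double(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return CMP_UNORDERED;
    }
    if (a < b) {
        return CMP_LESS_THAN;
    }
    if (a > b) {
        return CMP_GREATER_THAN;
    }
    return CMP_EQUAL;
}

/* Returns 1 (true), 0 (false), or -1 if the comparison itself failed. */
static int apply_op(int cmp_result, int op_mask)
{
    if (cmp_result == CMP_ERROR) {
        return -1;   /* must be checked first: -1 has every bit set */
    }
    return (cmp_result & op_mask) != 0;
}
```

Because -1 has every bit set, an error result would match any mask, which is why apply_op() checks for CMP_ERROR before masking.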

encukou · 2023-06-21T09:19:37Z

As to what the return values should be:

-1 for an error. As we use that convention everywhere

0 for a "lesser" result, which in this case is termination (for PyDict_GetItem, the case where the key is absent).

1 for the "greater" result, the presence of a value (for PyDict_GetItem, the case where the key is present).

That sounds like a good direction.
A general guideline should also suggest when *result should be set. For the “lesser” case, do we leave it untouched, set it to NULL, or say that users shouldn't use it? What about the “error” case?

For PyDict_GetItem, it seems ergonomic enough to do:

-1 for an error, everywhere. *result is undefined (it may be garbage); don't touch it.
0 for success
- *result is NULL if the key is absent
- *result is non-NULL if the key is found
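A minimal model of that contract, with a hypothetical fixed-size string-to-int table (ToyDict, toy_dict_get) in place of a dict, and a NULL key standing in for a lookup error such as an unhashable key:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for a dict; not CPython types. */
typedef struct { const char *key; int value; } ToyEntry;
typedef struct { const ToyEntry *entries; size_t len; } ToyDict;

/* Returns -1 on error (*result left unspecified), otherwise 0 with
   *result == NULL if the key is absent, or pointing at the value if found. */
static int toy_dict_get(const ToyDict *d, const char *key, const int **result)
{
    assert(d != NULL && result != NULL);
    if (key == NULL) {
        return -1;                /* simulated lookup error */
    }
    for (size_t i = 0; i < d->len; i++) {
        if (strcmp(d->entries[i].key, key) == 0) {
            *result = &d->entries[i].value;
            return 0;
        }
    }
    *result = NULL;               /* absent: success, but no value */
    return 0;
}
```

The caller tests for a negative return once and then inspects *result; in this scheme the return code carries no found/not-found information.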

In a PyWeakref_GetRef-style function, determining whether to return 1 or 0 is an (admittedly tiny) bit of extra work that most callers could ignore. Is it worth the consistency? Or should __next__-style functions be the odd ones that return 1 for convenience in a while loop?

markshannon · 2023-06-22T09:23:18Z

For PyDict_GetItem, I'd prefer to return 0 for not found, and 1 for found.
This seems more consistent; not found is the "lesser" result.
It also reduces memory traffic a bit, as *result is only touched if the value is found.

For example, implementing a version of dict.get, where the key is expected to be missing:

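PyObject *
dict_get_probably_missing(PyDictObject *self, PyObject *key, PyObject *default_value)
{
    /* Uses the three-argument PyDict_GetItem proposed here
       (0 = not found, 1 = found, -1 = error); "default" is a C keyword. */
    PyObject *val;
    int res = PyDict_GetItem((PyObject *)self, key, &val);
    if (res == 0) {
        return Py_NewRef(default_value);
    }
    if (res > 0) {
        return val;
    }
    return NULL;  /* error: exception already set */
}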

markshannon · 2023-06-22T09:25:37Z

PyWeakref_GetRef does a lot of work to check that the weakref is not NULL and is a weak ref.
See capi-workgroup/problems#31

erlend-aasland · 2023-06-22T09:30:06Z

Can we please open a devguide issue for establishing and documenting these API guidelines, and then move the general discussion we're now having over there? When that discussion has landed, and guidelines are documented, we can ~~start~~ continue fixing existing problematic APIs by adding new APIs according to the established guidelines.

iritkatriel · 2023-08-23T11:12:00Z

I'm not going to pursue this.

erlend-aasland · 2024-07-24T13:10:27Z

The C API WG voted on and agreed on the following API:

int PyIter_NextItem(PyObject *iter, PyObject **item)

Return -1 on error; raise an exception and set *item to NULL
Return 0 and set *item to NULL on StopIteration
Return 1 and set *item to a strong reference to the iterator item on success

iter and item cannot be NULL.
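A stand-alone model of the agreed contract, with a toy int iterator and the hypothetical toy_next_item() in place of PyIter_NextItem(). The real function raises a Python exception on the -1 path and hands out a strong reference on the 1 path; this plain-C sketch has neither exception state nor reference counting, and only mirrors the return codes, the setting of *item to NULL, and the non-NULL preconditions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array (illustrative, not a CPython type). */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
    size_t fail_at;   /* index at which to simulate an error; >= len means never */
} ToyIter;

/* -1: error, *item = NULL
    0: exhausted, *item = NULL
    1: *item points at the next element */
static int toy_next_item(ToyIter *iter, const int **item)
{
    assert(iter != NULL && item != NULL);   /* "iter and item cannot be NULL" */
    if (iter->pos >= iter->len) {
        *item = NULL;
        return 0;
    }
    if (iter->pos == iter->fail_at) {
        *item = NULL;
        return -1;
    }
    *item = &iter->data[iter->pos++];
    return 1;
}

/* Counts the items; returns the count, or -1 if iteration failed. */
static long toy_count(ToyIter *iter)
{
    const int *item;
    long n = 0;
    int rc;
    while ((rc = toy_next_item(iter, &item)) == 1) {
        n++;
    }
    return rc < 0 ? -1 : n;
}
```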

Return -1 and set an exception on error; return 0 if the iterator is exhausted, and return 1 if the next item was fetched successfully. Prefer this API to PyIter_Next(), which requires the caller to use PyErr_Occurred() to differentiate between iterator exhaustion and errors. Co-authored-by: Irit Katriel <iritkatriel@yahoo.com>

iritkatriel added type-feature A feature request or enhancement topic-C-API labels Jun 1, 2023

iritkatriel mentioned this issue Jun 1, 2023

Ambiguous return values capi-workgroup/problems#1

Open

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 1, 2023

pythongh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous return value (5e91524)

bedevere-bot mentioned this issue Jun 1, 2023

gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous output #105202

Closed

iritkatriel changed the title ~~Replace C-API functions with ambiguous return values~~ PyIter_Next has ambiguous return value Jun 2, 2023

iritkatriel closed this as not planned Jun 4, 2023

iritkatriel reopened this Jun 5, 2023

encukou mentioned this issue Jun 20, 2023

C API: Add PyWeakref_GetRef() function #105927

Closed

This was referenced Jun 21, 2023

gh-105927: Add PyWeakref_GetRef() function #105932

Merged

Add a C API to assign a new version to a PyTypeObject #103091

Closed

markshannon mentioned this issue Jun 22, 2023

Document the preferred style for API functions with three, four or five-way returns python/devguide#1121

Closed

iritkatriel closed this as not planned Aug 23, 2023

erlend-aasland mentioned this issue Apr 10, 2024

Introduce replacement API for PyIter_Next with a non-ambiguous return value capi-workgroup/decisions#21

Closed

erlend-aasland reopened this Jul 24, 2024

erlend-aasland self-assigned this Jul 24, 2024

bedevere-app bot mentioned this issue Jul 26, 2024

gh-105201: Add PyIter_NextItem() #122331

Merged

erlend-aasland closed this as completed Aug 7, 2024
