Preserve order of group results with Redis result backend #6218

maybe-sybr · 2020-07-10T01:33:26Z

Description

This PR is an alternate to #4858 authored by @lpsinger and fixes #3781. It ensures
that the order of results from a canvas group task wrapper matches the order of the
individual encapsulated tasks when using Redis as a results backend.

maybe-sybr · 2020-07-10T01:36:32Z

flakes seem happier than the previous PR on my machine. I'll see how CI here goes before I dig into getting the full test suite running locally.

I've stripped a commit from the previous PR which changed the version of flake8 used since it wasn't relevant to this changeset and was probably super outdated anyway. Otherwise, it's pretty much the same as #4858 except for some resolving the conflict in celery/backends/redis.py.

maybe-sybr · 2020-07-10T01:39:59Z

@georgepsarakis - this is a refresh of #4858 where you raised a question about the changes from using a List to a SortedSet in the Redis backend wrt backward compatibility. Can you revisit this? ref: #4858 (comment)

lgtm-com · 2020-07-10T01:45:34Z

This pull request introduces 1 alert and fixes 2 when merging 88c7a78 into 9d54c8a - view on LGTM.com

new alerts:

1 for Non-exception in 'except' clause

fixed alerts:

1 for Non-exception in 'except' clause
1 for Module is imported with 'import' and 'import from'

maybe-sybr · 2020-07-10T01:55:39Z

1 failed
- t/unit/backends/test_redis.py:591 test_RedisBackend.test_on_chord_part_return

Looks like we didn't change the args the mock accepts for zadd() - I think the fixup commit solves that. I am getting some other unit test failures on my box in that file though which is a bit concerning. We'll see how CI goes this time I guess.

Edit: Ah, I see that CI stops after a single failure. I'll fix up the mocks here and get that suite running locally before I push more changes here.

lgtm-com · 2020-07-10T02:18:52Z

This pull request introduces 2 alerts and fixes 1 when merging 00f529d into 9d54c8a - view on LGTM.com

new alerts:

2 for Module is imported with 'import' and 'import from'

fixed alerts:

1 for Module is imported with 'import' and 'import from'

maybe-sybr · 2020-07-10T05:44:26Z

This changeset should be good now. I had to correct the behaviour of the sorted set mocks in the backend test suite since they didn't match what Redis would actually do an also returned the score as well as the desired value in a 2-tuple.

I've left a fixup commit on the end which should be squashed down onto the first commit in this branch prior to merge. The mock fixups are in their own commit to make it clear what has changed since the previous MR to get things working. Let me know if you would like any further changes prior to merge, @auvipy. I'll keep an eye open for this CI run but I'm hoping it'll go green now.

lgtm-com · 2020-07-10T05:56:45Z

This pull request fixes 1 alert when merging edab532 into 9d54c8a - view on LGTM.com

fixed alerts:

1 for Module is imported with 'import' and 'import from'

auvipy · 2020-07-10T08:05:00Z

E redis.exceptions.ResponseError: WRONGTYPE Operation against a key holding the wrong kind of value

.tox/3.8-integration-redis/lib/python3.8/site-packages/redis/connection.py:756: ResponseError

auvipy

E redis.exceptions.ResponseError: WRONGTYPE Operation against a key holding the wrong kind of value

.tox/3.8-integration-redis/lib/python3.8/site-packages/redis/connection.py:756: ResponseError

maybe-sybr · 2020-07-10T10:51:53Z

Interesting. IIUC that's blowing up because the integration test is trying to access an element of the result object (or maybe the whole object) as a list. Should be a fairly straightforward fix, I'll try to get to it tomorrow.

lgtm-com · 2020-07-12T06:23:09Z

This pull request fixes 2 alerts when merging 18b8cac into 9d54c8a - view on LGTM.com

fixed alerts:

1 for Non-exception in 'except' clause
1 for Module is imported with 'import' and 'import from'

Fixes celery#3781.

The test suite still uses `lrange()` and `rpush()` to implement its `redis-echo` task chain integration tests, but these are unrelated to the handling of group results and remain unchanged.

lgtm-com · 2020-07-12T07:07:43Z

This pull request fixes 2 alerts when merging 9504ff6 into 9d54c8a - view on LGTM.com

fixed alerts:

1 for Non-exception in 'except' clause
1 for Module is imported with 'import' and 'import from'

codecov · 2020-07-12T07:08:18Z

Codecov Report

Merging #6218 into master will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #6218      +/-   ##
==========================================
+ Coverage   83.57%   83.60%   +0.03%     
==========================================
  Files         145      145              
  Lines       17375    17416      +41     
  Branches     2160     2174      +14     
==========================================
+ Hits        14521    14561      +40     
- Misses       2631     2632       +1     
  Partials      223      223

Impacted Files	Coverage Δ
celery/app/amqp.py	`95.06% <ø> (ø)`
celery/app/base.py	`61.46% <ø> (ø)`
celery/utils/abstract.py	`19.04% <ø> (ø)`
celery/app/task.py	`96.19% <100.00%> (+0.01%)`	⬆️
celery/backends/redis.py	`92.09% <100.00%> (+0.04%)`	⬆️
celery/canvas.py	`93.07% <100.00%> (+0.05%)`	⬆️
celery/worker/request.py	`97.20% <100.00%> (+0.02%)`	⬆️
celery/bin/celery.py	`97.80% <0.00%> (-0.24%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d54c8a...b036164. Read the comment docs.

maybe-sybr · 2020-07-12T07:14:59Z

Looks like we need a test for group.freeze(group_index=...) and another which sanity tests that the group_index() property exists. That'll fix up the drops in coverage and make codecov happy.

auvipy · 2020-07-12T10:20:24Z

please do so when you have time.

georgepsarakis · 2020-07-12T13:21:48Z

celery/backends/redis.py


        client = self.client
        jkey = self.get_key_for_group(gid, '.j')
        tkey = self.get_key_for_group(gid, '.t')
        result = self.encode_result(result, state)
        with client.pipeline() as pipe:
            pipeline = pipe \
-                .rpush(jkey, self.encode([1, tid, state, result])) \
-                .llen(jkey) \
+                .zadd(jkey,


@maybe-sybr unfortunately at this point (and Line 449) we break backwards compatibility and exceptions will be raised. I think we need to handle both structures for the transition. What do you think?

For my purposes, I haven't run long term deployments where the Redis instance outlives the Celery one(s). It certainly seems feasible but would either:

require an extra type check against the backend before we actually build up the right pipeline for the observed type, or

construct a list or zset based pipeline depending on whether we think the group is zset based or not (perhaps if group_index is not None?)

Any preference on the approach?

Thinking about this a bit more, I don't really see how we can reasonably maintain backward compatibility here. Only workers interact with the results backend and use this code, so IIUC we're talking about situations where there's a progressive upgrade of celery workers in a cluster, with some running old code and others running the new code. Or am I misunderstanding and does result resolution on any part of the system also hit the result backend?

The problem with the mixed worker node situation is that old code will unconditionally use operations for lists while new code should optimistically use operations for sorted sets and fall back to lists if it has to interact with older workers. That'll be racy and break if a worker running old code happens to get beaten to the results key by one running new code. I suppose there could be a decision made on something like feature flags (or a naive version check) passed between workers when they mingle but I'm not familiar enough with the clustering protocol to know if that's a reasonable idea.

ping @georgepsarakis - any input on this? AFAICT running code which issues tasks on master and a worker on this branch works fine, and vice versa (with the results being unordered since the worker uses a list to store the group results). So I think what I said in the previous comment about mixed code in clusters of workers is where breakage would occur, and need some input on whether we need some sort of behavioural switch based on something sourced from the mingling process.

celery/backends/redis.py

lgtm-com · 2020-07-13T03:26:15Z

This pull request introduces 2 alerts and fixes 4 when merging b036164 into d537be4 - view on LGTM.com

new alerts:

1 for Non-exception in 'except' clause
1 for Wrong number of arguments in a call

fixed alerts:

3 for Module is imported with 'import' and 'import from'
1 for Non-exception in 'except' clause

maybe-sybr · 2020-07-13T05:35:18Z

I think the Appveyor build may have died/timed out. The red job ran for an hour and seems t have truncated output.

auvipy · 2020-07-13T06:12:37Z

need not to worry about AppVeyor now

auvipy · 2020-07-14T23:51:25Z

      res.children[0].children[0].result[0])

E TypeError: 'NoneType' object is not subscriptable

maybe-sybr · 2020-07-15T03:59:41Z

That's interesting @auvipy - I think that test passed on effectively the same changeset previously.

Build 8047 (red):

Build 8046 (green):

Seems like the test might be racey.

auvipy · 2020-07-15T04:24:17Z

I restarted the build

maybe-sybr · 2020-07-15T05:17:01Z

I managed to hit the error running the chord integration tests in t/integration/test_canvas.py with a redis backend. I'll submit a separate PR if I work out how to fix it since it doesn't seem like it's relevant to this changeset.

auvipy · 2020-07-15T06:52:26Z

all the tests passing now here! please submit another PR for the unrelated error

auvipy · 2020-07-15T07:58:40Z

I will wait for clarification or suggestion about keeping BC from George

maybe-sybr · 2020-07-20T00:09:25Z

@thedrow - I've made no changes to address the back-compat concerns. Awaiting a reply from @georgepsarakis since I'm not sure how we can go about solving the issue (per my comments in that thread) given my current understanding of when problems would manifest (multiple workers running mixed versions of celery IIUC).

Happy to make a follow up PR to fix any potential issues as you all see fit.

This should ensure that there is no breakage between workers with the code from celery#6218 and those without, unless the cluster owner specifically opts into the new behaviour.

This should ensure that there is no breakage between workers with the code from #6218 and those without, unless the cluster owner specifically opts into the new behaviour.

* Preserve order of return values from groups Fixes celery#3781. * Update for zadd arguments changed in redis-py 3 * Use more explicit loop variable name * Handle group_index not set * Use zrange instead of zrangebyscore * test: Fix Redis sorted set mocks in backend tests * test: Make canvas integration tests use `zrange()` The test suite still uses `lrange()` and `rpush()` to implement its `redis-echo` task chain integration tests, but these are unrelated to the handling of group results and remain unchanged. * test: Add unit tests for `group_index` handling * fix: Add `group_index` to `Context`, chord uplift * test: Sanity check `Request.group_index` property This adds a test to make sure the property exists and also changes the property to use the private `_request_dict` rather than the public property. Co-authored-by: Leo Singer <leo.singer@ligo.org>

This should ensure that there is no breakage between workers with the code from celery#6218 and those without, unless the cluster owner specifically opts into the new behaviour.

maybe-sybr mentioned this pull request Jul 10, 2020

Preserve order of return values from groups #4858

Closed

auvipy added this to the 4.4.x milestone Jul 10, 2020

auvipy self-requested a review July 10, 2020 04:32

maybe-sybr force-pushed the preserve-group-order branch from 00f529d to edab532 Compare July 10, 2020 05:41

auvipy requested changes Jul 10, 2020

View reviewed changes

lpsinger and others added 7 commits July 12, 2020 16:48

Preserve order of return values from groups

538c2b8

Fixes celery#3781.

Update for zadd arguments changed in redis-py 3

3661d6e

Use more explicit loop variable name

f5ade82

Handle group_index not set

d39ba3c

Use zrange instead of zrangebyscore

f5b173d

test: Fix Redis sorted set mocks in backend tests

dae28ba

test: Make canvas integration tests use zrange()

f7830e1

The test suite still uses `lrange()` and `rpush()` to implement its `redis-echo` task chain integration tests, but these are unrelated to the handling of group results and remain unchanged.

georgepsarakis reviewed Jul 12, 2020

View reviewed changes

test: Add unit tests for group_index handling

76473da

maybe-sybr commented Jul 13, 2020

View reviewed changes

celery/backends/redis.py Show resolved Hide resolved

fix: Add group_index to Context, chord uplift

9aa9080

maybe-sybr force-pushed the preserve-group-order branch from 46e327b to b036164 Compare July 13, 2020 03:14

auvipy closed this Jul 13, 2020

auvipy reopened this Jul 13, 2020

auvipy closed this Jul 15, 2020

auvipy reopened this Jul 15, 2020

maybe-sybr mentioned this pull request Jul 15, 2020

test: Fix rare failures in chord error test #6230

Merged

auvipy added Component: Redis Results Backend PR Type: Bugfix labels Jul 19, 2020

auvipy approved these changes Jul 19, 2020

View reviewed changes

auvipy merged commit 455e0a0 into celery:master Jul 19, 2020

maybe-sybr deleted the preserve-group-order branch July 27, 2020 23:07

maybe-sybr mentioned this pull request Jul 28, 2020

improv: Make use of ordered sets in Redis opt-in #6245

Merged

maybe-sybr mentioned this pull request Aug 5, 2020

fix flaky test_add_chord_to_chord #6263

Merged

maybe-sybr mentioned this pull request Sep 16, 2020

Deeply deserialize groups #6342

Merged

mkniewallner mentioned this pull request Aug 4, 2023

[5.3.0] Task results order is not preserved anymore #8421

Closed

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Preserve order of group results with Redis result backend #6218

Preserve order of group results with Redis result backend #6218

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Preserve order of group results with Redis result backend #6218

Preserve order of group results with Redis result backend #6218

Uh oh!

Conversation

Description

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!