[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Page MenuHomePhabricator

Possible race condition: Revision IDs returned by the server do not match
Closed, ResolvedPublic

Description

See slack discussion, summarized notes:

  • I've seen this in browser tests that exercise repeat editing in VisualEditor. Video, test

  • In the test, the error occurs after saving an edit and clicking "Edit" again (very quickly, since it's an automated test)

Event Timeline

The error message in the video is "Revision IDs returned by the server do not match (document: 15, metadata: 19)."

I had a look at apache-access.log from the same test run, hoping to see how the wrong revision (15 instead of 19) is being fetched.
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/111128/artifact/log/apache-access.log

I'm not sure what's wrong, but I see something pretty suspicious. The following lines correspond to the first editor load, and the save:

2022-04-03T21:05:0415725833833GEThttp://127.0.0.1/rest.php/127.0.0.1/v3/page/html/Classical_kemen\xc3\xa7e/15?redirect=false&stash=truetext/html-VisualEditor-MediaWiki/1.39.0-alphaen
2022-04-03T21:05:042435288366GEThttp://127.0.0.1:9413/api.php?action=visualeditor&format=json&paction=parse&page=Classical_kemen%C3%A7e&uselang=en&formatversion=2application/jsonhttp://127.0.0.1:9413/index.php?title=Classical_kemen%C3%A7e&gesuggestededit=1&veaction=editMozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36en-US,en;q=0.9

You can see the revision ID "15" in the rest.php requests, made internally by the VisualEditor API.

Below, these rows correspond to the edit being saved:

2022-04-03T21:05:061337226265POSThttp://127.0.0.1/rest.php/127.0.0.1/v3/transform/html/to/wikitext/Classical_kemen\xc3\xa7e/15text/plain-VisualEditor-MediaWiki/1.39.0-alphaen
2022-04-03T21:05:0524108199POSThttp://127.0.0.1:9413/api.phpapplication/jsonhttp://127.0.0.1:9413/index.php?title=Classical_kemen%C3%A7e&gesuggestededit=1&veaction=editMozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36en-US,en;q=0.9

You can see the revision ID "15" again.

I expected to see some more requests afterwards for the second editor loads, hopefully including "15" or "19" somewhere prominently – but there are none! The editor apparently doesn't even attempt to load, it just shows the error. There are some other requests, but none that would load the content of the page for VisualEditor. I don't know why that happens, but this is probably the source of the issue.

There are definitely some bugs in VisualEditor if you try to reopen it too fast.

I was using variants of the following snippets in browser console to immediately reopen VE after an edit is saved:

mw.hook( 'postEdit' ).add( function () { $('#ca-ve-edit a').click(); } );

mw.hook( 'postEdit' ).add( function () { setTimeout( function () { $('#ca-ve-edit a').click(); }, 500 ); } )

At first, I noticed that after running the first snippet and making an edit, the editor wouldn't reopen, and instead get stuck in a weird state where the "Edit" button doesn't work, but the categories box below the page is highlighted as if VE was open.

After ~10 edits with different timeouts (trying to find the point where the above bug stops happening), I accidentally reproduced your bug exactly:

image.png (1×3 px, 175 KB)

There is no network request when this happens – the stale data is loaded from session storage, which is normally used for the auto-save feature, and which normally would be deleted after an edit is saved.

Change 778357 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/VisualEditor@master] DesktopArticleTarget.init: Wait for deactivating to finish before activating

https://gerrit.wikimedia.org/r/778357

Change 778357 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] DesktopArticleTarget.init: Wait for deactivating to finish before activating

https://gerrit.wikimedia.org/r/778357

Change 787881 had a related patch set uploaded (by MarkAHershberger; author: MarkAHershberger):

[mediawiki/core@REL1_38] More deprecations that interfere with VE and REL1_38

https://gerrit.wikimedia.org/r/787881

Change 787892 had a related patch set uploaded (by MarkAHershberger; author: MarkAHershberger):

[mediawiki/core@master] More deprecations that interfere with VE and REL1_38

https://gerrit.wikimedia.org/r/787892

Change 787880 had a related patch set uploaded (by MarkAHershberger; author: MarkAHershberger):

[mediawiki/core@REL1_38] Address deprecations for php8.1 and REL1_38

https://gerrit.wikimedia.org/r/787880

Change 787881 abandoned by MarkAHershberger:

[mediawiki/core@REL1_38] More deprecations that interfere with VE and REL1_38

Reason:

https://gerrit.wikimedia.org/r/787881

@MarkAHershberger Can you explain how these patches are meant to fix this issue? Did you mean to tag a different task?

@MarkAHershberger Can you explain how these patches are meant to fix this issue? Did you mean to tag a different task?

Looks like a fat-finger problem. fixed.

@kostajh I'm sorry, I haven't had more time to look into this. Clearly it's a bug in our code, but tracking it down doesn't seem easy.

I'm trying to figure out if it's worth investigating, or if it's okay to leave it be. It doesn't seem to affect real usage, just the tests. Did it just happen once after the previous fix, or is it a common error? If it's common, would it be feasible to just add a hardcoded delay in the test code?

@kostajh I'm sorry, I haven't had more time to look into this. Clearly it's a bug in our code, but tracking it down doesn't seem easy.

I appreciate your help thus far!

I'm trying to figure out if it's worth investigating, or if it's okay to leave it be. It doesn't seem to affect real usage, just the tests. Did it just happen once after the previous fix, or is it a common error? If it's common, would it be feasible to just add a hardcoded delay in the test code?

It's probably fine to leave it as is. I haven't seen builds fail, but since we retry tests 3 times per build, it is possible it is still occurring. Adding a hardcoded delay is not ideal (T301233: Do not use browser.pause() in Selenium tests) but probably the way forward if it turns out to be a recurring issue.

Right. I will close this then, since we've mostly fixed the problem with the previous patch. If anything changes and you'd like us to investigate this more, please open a new task.

Change 787892 abandoned by MarkAHershberger:

[mediawiki/core@master] More deprecations that interfere with VE and REL1_38

Reason:

https://gerrit.wikimedia.org/r/787892