10000 src/box/vy_cache.c:487: void vy_cache_on_write(struct vy_cache *, struct vy_entry, struct vy_entry *): Assertion `(*next_node)->flags & VY_CACHE_LEFT_LINKED' failed. · Issue #10879 · tarantool/tarantool · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

src/box/vy_cache.c:487: void vy_cache_on_write(struct vy_cache *, struct vy_entry, struct vy_entry *): Assertion `(*next_node)->flags & VY_CACHE_LEFT_LINKED' failed. #10879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ligurio opened this issue Nov 29, 2024 · 3 comments · Fixed by #10900
Assignees
Labels
2.11 Target is 2.11 and all newer release/master branches 3.2 Target is 3.2 and all newer release/master branches bug Something isn't working crash fuzzing vinyl

Comments

@ligurio
Copy link
Member
ligurio commented Nov 29, 2024

Bug description

  • OS: Linux
  • OS Version: Ubuntu 24.04
  • Architecture: amd64

Tarantool 3.3.0-entrypoint-290-gd26f574f8d
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=TRUE
Compiler: Clang-18.1.3
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -fsanitize=address -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/ubuntu/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror -g -ggdb -O0
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -fsanitize=address -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/ubuntu/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror -g -ggdb -O0

Steps to reproduce

./build/src/tarantool ./test/fuzz/lua/test_engine.lua --workers 1000 --test_duration $((30*60)) --engine vinyl --verbose --seed 22672

Actual behavior

<snipped>

2024-11-29 14:29:44.269 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:888 I> TX_ROLLBACK []
2024-11-29 14:29:44.270 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:888 I> DELETE_OP [[283505752557,"29af96c3-fb46-4dc9-8937-56ead1b168c6",448915935291.61,"123803121172","2024-11-29T14:29:44Z",695614702995,false,"cvonezpdum",160926069080]]
2024-11-29 14:29:44.270 [1291393] main/388/WRK #271 wal.c:1339 E> Aborting transaction 1914 during cascading rollback
2024-11-29 14:29:44.270 [1291393] main/388/WRK #271 wal.c:1340 E> ER_CASCADE_ROLLBACK: WAL has a rollback in progress
2024-11-29 14:29:44.271 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:892 I> ERROR: opname "DELETE_OP", err "WAL has a rollback in progress", args [[283505752557,"29af96c3-fb46-4dc9-8937-56ead1b168c6",448915935291.61,"123803121172","2024-11-29T14:29:44Z",695614702995,false,"cvonezpdum",160926069080]]
2024-11-29 14:29:44.271 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:888 I> TX_BEGIN []
2024-11-29 14:29:44.272 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:888 I> UPDATE_OP [[923313470976,"4406fbad-55ff-4d6a-a118-11cd925d2632",295111540586.94,"19800491511","2024-11-29T14:29:44Z",389677528218,false,"xncpryeyid",523637837144],[["^",2,184094857412],["=",7,false],["=",3,"5252b9cc-e118-4bbd-a435-bc498525a82b"],["+",6,460744961210],["=",10,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]],["=",5,"plbuupajqz"],["+",8,"113000203032"]]]
2024-11-29 14:29:44.272 [1291393] main/388/WRK #271/..test.fuzz.lua.test_engine test_engine.lua:888 I> SNAPSHOT_OP []
2024-11-29 14:29:44.273 [1291393] main/388/WRK #271 vy_scheduler.c:753 I> vinyl checkpoint started
2024-11-29 14:29:44.273 [1291393] main/108/checkpoint_daemon gc.c:643 I> scheduled next checkpoint for Fri Nov 29 14:29:56 2024
tarantool: src/box/vy_cache.c:487: void vy_cache_on_write(struct vy_cache *, struct vy_entry, struct vy_entry *): Assertion `(*next_node)->flags & VY_CACHE_LEFT_LINKED' failed.

Full log and tarantool binary: gh-10879.zip

Expected behavior

no assertion violation

@ligurio ligurio added bug Something isn't working vinyl labels Nov 29, 2024
@locker
Copy link
Member
locker 8000 commented Dec 2, 2024

@ligurio Do you have a core dump?

@ligurio
Copy link
Member Author
ligurio commented Dec 2, 2024

@ligurio Do you have a core dump?

No.

@locker
Copy link
Member
locker commented Dec 2, 2024

Cache invalidation on transaction rollback caused by a WAL write error seems to be broken for DELETE statements. Note, if there's a cache chain that was created after the rolled back DELETE statement was prepared, it won't be broken.

tarantool/src/box/vy_lsm.c

Lines 1029 to 1037 in bca9265

void
vy_lsm_rollback_stmt(struct vy_lsm *lsm, struct vy_mem *mem,
struct vy_entry entry)
{
vy_mem_rollback_stmt(mem, entry, &lsm->stat.memory.count);
/* Invalidate cache element. */
vy_cache_on_write(&lsm->cache, entry, NULL);
}

if (vy_stmt_type(entry.stmt) == IPROTO_DELETE && !exact) {
/* there was nothing and there is nothing now */
return;
}

Below is a script that should trigger the same assertion failure:

tarantool: ./src/box/vy_cache.c:487: vy_cache_on_write: Assertion `(*next_node)->flags & VY_CACHE_LEFT_LINKED' failed.
Aborted

It isn't a 100% reproducer though - that's why test function is called 100 times in a loop.

reproducer.lua
os.execute('rm -rf [0-9]*')

local fiber = require('fiber')

box.cfg{log_level = 'warn'}

local function test()
    local s = box.schema.space.create('test', {engine = 'vinyl'})
    s:create_index('primary')

    s:insert({10})
    s:insert({20})
    s:insert({30})
    box.snapshot()

    box.error.injection.set('ERRINJ_WAL_DELAY', true)

    fiber.new(function()
        s:delete({20})
    end)
    fiber.new(function()
        box.begin()
        s:replace({2})
        s:get({20})
        box.rollback()
    end)
    fiber.new(function()
        box.begin()
        s:replace({1})
        s:select({}, {iterator = 'ge', fullscan = true})
        box.rollback()
    end)

    box.error.injection.set('ERRINJ_WAL_WRITE', true)
    box.error.injection.set('ERRINJ_WAL_DELAY', false)
    fiber.sleep(0.1)
    box.error.injection.set('ERRINJ_WAL_WRITE', false)

    s:select({10}, {iterator = 'ge', fullscan = true})
    s:get({20})
    s:replace({10})

    s:drop()
end

for i = 1, 100 do
    test()
end

os.exit(0)

@locker locker self-assigned this Dec 2, 2024
@locker locker added the crash label Dec 2, 2024
@ligurio ligurio added the fuzzing label Dec 4, 2024
locker added a commit to locker/tarantool that referenced this issue Dec 5, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes tarantool#10879

NO_DOC=bug fix
locker added a commit that referenced this issue Dec 10, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes #10879

NO_DOC=bug fix
@locker locker added 2.11 Target is 2.11 and all newer release/master branches 3.2 Target is 3.2 and all newer release/master branches labels Dec 10, 2024
locker added a commit that referenced this issue Dec 10, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes #10879

NO_DOC=bug fix

(cherry picked from commit d64e29d)
locker added a commit that referenced this issue Dec 10, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes #10879

NO_DOC=bug fix

(cherry picked from commit d64e29d)
locker added a commit that referenced this issue Dec 10, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes #10879

NO_DOC=bug fix

(cherry picked from commit d64e29d)
picodata-account pushed a commit to picodata/tarantool that referenced this issue Dec 11, 2024
Once a statement is prepared to be committed to WAL, it becomes visible
(in the 'read-committed' isolation level) so it can be added to the
tuple cache. That's why if the statement is rolled back due to a WAL
error, we have to invalidate the cache. The problem is that the function
invalidating the cache (`vy_cache_on_write`) ignores the statement if
it's a DELETE judging that "there was nothing and there is nothing now".
This is apparently wrong for rollback. Fix it.

Closes tarantool#10879

NO_DOC=bug fix

(cherry picked from commit d64e29d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.11 Target is 2.11 and all newer release/master branches 3.2 Target is 3.2 and all newer release/master branches bug Something isn't working crash fuzzing vinyl
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
0