test: flaky replication/election_qsync.test.lua test · Issue #5430 · tarantool/tarantool · GitHub

test: flaky replication/election_qsync.test.lua test #5430


Closed
avtikhon opened this issue Oct 16, 2020 · 5 comments

Comments

Contributor
avtikhon commented Oct 16, 2020

Tarantool version:
Tarantool 2.6.0-208-ga20a04cba
Target: FreeBSD-amd64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=OFF
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-gnu-alignof-expression -Werror
CXX_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Werror

OS version:
FreeBSD 12

Bug description:

  1. https://gitlab.com/tarantool/tarantool/-/jobs/795916093#L5385

artifacts.zip

results file checksum: afaa5d0f392c8de5420a05b268d04741

[013] --- replication/election_qsync.result	Fri Oct 16 19:25:53 2020
[013] +++ replication/election_qsync.reject	Fri May  8 08:21:42 2020
[013] @@ -127,14 +127,15 @@
[013]   | ...
[013]  _ = box.space.test:replace{2}
[013]   | ---
[013] + | - error: Found uncommitted sync transactions from other instance with id 2
[013]   | ...
[013]  box.space.test:select{}
[013]   | ---
[013]   | - - [1]
[013] - |   - [2]
[013]   | ...
[013]  box.space.test:drop()
[013]   | ---
[013] + | - error: Found uncommitted sync transactions from other instance with id 2
[013]   | ...
[013]  
[013]  test_run:cmd('delete server replica')
[013] 
  2. manually reproduced - check below (happens as 4th error)

results file checksum: 1c68d778b6e78dff37609b55d536cfd8

[001] replication/election_qsync.test.lua             vinyl           [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- replication/election_qsync.result     Fri May  8 08:56:08 2020
[001] +++ var/rejects/replication/election_qsync.reject Thu Jun 17 08:46:44 2021
[001] @@ -145,8 +145,7 @@
[001]   | ...
[001]  box.space.test:select{}
[001]   | ---
[001] - | - - [1]
[001] - |   - [2]
[001] + | - - [2]
[001]   | ...
[001]  box.space.test:drop()
[001]   | ---
[001]
  3. A manual run on a FreeBSD VMware image showed that the test also fails when run alone (happens very rarely):
[001] --- replication/election_qsync.result     Fri May  8 08:56:08 2020
[001] +++ var/rejects/replication/election_qsync.reject Sat Jun 19 12:00:47 2021
[001] @@ -142,11 +142,11 @@
[001]   | ...
[001]  _ = box.space.test:replace{2}
[001]   | ---
[001] + | - error: A rollback for a synchronous transaction is received
[001]   | ...
[001]  box.space.test:select{}
[001]   | ---
[001] - | - - [1]
[001] - |   - [2]
[001] + | - []
[001]   | ...
[001]  box.space.test:drop()
[001]   | ---
[001] 
  4. A manual run on a FreeBSD VMware image showed that the test, run alone, can also fail with an assertion (happens in *0%):
    [001] Assertion failed: (lsn > prev_lsn), function vclock_follow, file /home/vagrant/tarantool/src/lib/vclock/vclock.c, line 46.
[001] replication/election_qsync.test.lua             memtx           
[001] 
[001] [Instance "replica" killed by signal: 6 (SIGABRT)]
[001] 
[001] Last 15 lines of Tarantool Log file [Instance "replica"][/home/vagrant/tarantool/test/var/001_replication/replica.log]:
[001] 2021-06-19 09:52:59.876 [67205] main/112/applier/unix/:/home/vagrant/tarantool/test/var/001_replication/master.socket-iproto I> RAFT: message {term: 1, state: follower} from 1
[001] 2021-06-19 09:52:59.892 [67205] main/127/console/unix/: I> set 'replication_synchro_timeout' configuration option to 1000000
[001] 2021-06-19 09:52:59.892 [67205] main/127/console/unix/: I> set 'replication_synchro_quorum' configuration option to 3
[001] 2021-06-19 09:52:59.892 [67205] main/127/console/unix/: I> RAFT: start state machine
[001] 2021-06-19 09:52:59.892 [67205] main/127/console/unix/: I> set 'election_mode' configuration option to "candidate"
[001] 2021-06-19 09:52:59.892 [67205] main/127/console/unix/: I> set 'election_timeout' configuration option to 1000000
[001] 2021-06-19 09:53:00.295 [67205] main I> RAFT: begin new election round
[001] 2021-06-19 09:53:00.295 [67205] main I> RAFT: bump term to 2, follow
[001] 2021-06-19 09:53:00.295 [67205] main I> RAFT: vote for 2, follow
[001] 2021-06-19 09:53:00.296 [67205] main/128/raft_worker I> RAFT: persisted state {term: 2, vote: 2}
[001] 2021-06-19 09:53:00.296 [67205] main/128/raft_worker I> RAFT: enter candidate state with 1 self vote
[001] 2021-06-19 09:53:00.296 [67205] main/112/applier/unix/:/home/vagrant/tarantool/test/var/001_replication/master.socket-iproto I> RAFT: message {term: 2, vote: 2, state: follower} from 1
[001] 2021-06-19 09:53:00.296 [67205] main/112/applier/unix/:/home/vagrant/tarantool/test/var/001_replication/master.socket-iproto I> RAFT: enter leader state with quorum 2
[001] 2021-06-19 09:53:00.307 [67205] relay/unix/:(socket)/101/main I> recover from `/home/vagrant/tarantool/test/var/001_replication/replica/00000000000000000002.xlog'
[001] Assertion failed: (lsn > prev_lsn), function vclock_follow, file /home/vagrant/tarantool/src/lib/vclock/vclock.c, line 46.
[001] [ fail ]
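
The assertion text `(lsn > prev_lsn)` says that an LSN being followed must be strictly greater than the previous one. As a toy illustration only (the real check is the C function `vclock_follow` named in the message; the helper name `lsns_strictly_increase` below is hypothetical and not part of Tarantool), the same condition can be sketched over a plain list of numbers:

```shell
#!/bin/sh
# Hypothetical helper, not Tarantool code: succeed only if every LSN
# argument is strictly greater than the one before it, mirroring the
# (lsn > prev_lsn) condition from the assertion message.
lsns_strictly_increase() {
    prev=-1
    for lsn in "$@"; do
        if [ "$lsn" -le "$prev" ]; then
            # A repeated or smaller LSN is the case the assertion rejects.
            echo "violation: lsn $lsn after prev_lsn $prev" >&2
            return 1
        fi
        prev=$lsn
    done
    return 0
}
```

Under this sketch, a sequence that repeats an LSN (for example `1 2 2`) is rejected, which is the kind of input the assertion guards against.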

Main log:

[2021-06-19 12:05:36.277204] DEBUG: sending command: test_run:wait_cond(function() return box.info.lsn > lsn end)
[2021-06-19 12:05:40.958230] DEBUG: tarantool's response for [test_run:wait_cond(function() return box.info.lsn > lsn end)]
[2021-06-19 12:05:40.958230]  | 
[2021-06-19 12:05:40.959094] DEBUG: sending command: test_run:wait_lsn('default', 'replica')
[2021-06-19 12:05:40.959682] DEBUG: tarantool's response for [test_run:wait_lsn('default', 'replica')]
[2021-06-19 12:05:40.959682]  | [Lost current connection]
[2021-06-19 12:05:40.960030] DEBUG: sending command: test_run:switch('default')
[2021-06-19 12:05:40.960443] DEBUG: tarantool's response for [test_run:switch('default')]
[2021-06-19 12:05:40.960443]  | [Lost current connection]
[2021-06-19 12:05:40.960792] DEBUG: sending command: test_run:cmd('stop server replica')
[2021-06-19 12:05:40.961197] DEBUG: tarantool's response for [test_run:cmd('stop server replica')]
[2021-06-19 12:05:40.961197]  | [Lost current connection]
[2021-06-19 12:05:40.961532] DEBUG: sending command: box.space.test:replace{2}
[2021-06-19 12:05:41.038775] [Instance "replica" killed by signal: 6 (SIGABRT)]

Steps to reproduce:
To reproduce errors 2, 3, and 4, run the single test in a loop on the FreeBSD VMware image with this command:

c=0 ; while ./test-run.py replication/election_qsync.test.lua ; do c=$(($c+1)) ; echo "ALX ================================= $c" | tee c.count ; done ; echo "ALX ================================= $c"
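
The loop above keeps running until the first failure and has no upper bound. A minimal sketch of a bounded variant follows; the function name `run_until_fail` and the iteration cap are assumptions added for illustration and are not part of test-run. It prints how many runs succeeded before the loop stopped and sends the command's own output to stderr so that the count stays on stdout.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it fails or until
# MAX successful runs have been reached. Prints the number of successful
# runs on stdout; the command's own output is redirected to stderr.
run_until_fail() {
    max=$1
    shift
    c=0
    while [ "$c" -lt "$max" ] && "$@" 1>&2; do
        c=$((c + 1))
    done
    echo "$c"
}

# Example usage (the cap of 1000 is arbitrary):
#   run_until_fail 1000 ./test-run.py replication/election_qsync.test.lua | tee c.count
```

If the printed count equals the cap, no failure was observed within that many runs.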

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
avtikhon added the qa (Issues related to tests or testing subsystem), flaky test, qsync, and replication labels Oct 16, 2020
avtikhon changed the title from "test: flaky replication/election_qsync_stress.test.lua test" to "test: flaky replication/election_qsync.test.lua test" Oct 16, 2020
avtikhon reopened this Oct 16, 2020
avtikhon added a commit that referenced this issue Oct 16, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 17, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 18, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 20, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 21, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 21, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
avtikhon added a commit that referenced this issue Oct 21, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_basic.test.lua			gh-5368
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
kyukhin pushed a commit that referenced this issue Oct 22, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_basic.test.lua			gh-5368
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431
kyukhin pushed a commit that referenced this issue Oct 22, 2020
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_basic.test.lua			gh-5368
  replication/election_qsync.test.lua			gh-5430
  replication/election_qsync_stress.test.lua		gh-5395
  replication/gh-4402-info-errno.test.lua		gh-5366
  replication/gh-5426-election-on-off.test.lua		gh-5433
  wal_off/snapshot_stress.test.lua			gh-5431

(cherry picked from commit c596d31)
avtikhon added a commit that referenced this issue Oct 22, 2020
Added:

  box/access.test.lua				 	gh-5373 gh-5411
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  box/tx_man.test.lua					gh-5423
  replication/election_qsync.test.lua			gh-5430

Removed:

  replication/gh-5426-election-on-off.test.lua		gh-5433
avtikhon added a commit that referenced this issue Oct 31, 2020
Added flaky tests results files checksums:

  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/ddl.test.lua					gh-5337
  replication/election_qsync.test.lua				gh-5430
  replication/election_qsync_stress.test.lua			gh-5395
  replication/gh-5287-boot-anon.test.lua			gh-5412
  vinyl/deferred_delete.test.lua				gh-5089
  vinyl/iterator.test.lua					gh-5336
  vinyl/gh-4957-too-many-upserts.test.lua			gh-5378
  vinyl/gc.test.lua						gh-5474
avtikhon added a commit that referenced this issue Oct 31, 2020
Added flaky tests results files checksums:

  box/hash_gh-1467.test.lua					gh-5476
  box-tap/session.storage.test.lua				gh-5346
  replication/gh-3160-misc-heartbeats-on-master-changes.test.>	gh-4940
  replication/ddl.test.lua					gh-5337
  replication/election_qsync.test.lua				gh-5430
  replication/election_qsync_stress.test.lua			gh-5395
  replication/gh-5287-boot-anon.test.lua			gh-5412
  vinyl/deferred_delete.test.lua				gh-5089
  vinyl/iterator.test.lua					gh-5336
  vinyl/gh-4957-too-many-upserts.test.lua			gh-5378
  vinyl/gc.test.lua						gh-5474
avtikhon added a commit that referenced this issue Oct 31, 2020
Added flaky tests results files checksums:

  box/hash_gh-1467.test.lua					gh-5476
  box-tap/session.storage.test.lua				gh-5346
  replication/gh-3160-misc-heartbeats-on-master-changes.test.>	gh-4940
  replication/ddl.test.lua					gh-5337
  replication/election_qsync.test.lua				gh-5430
  replication/election_qsync_stress.test.lua			gh-5395
  replication/gh-5287-boot-anon.test.lua			gh-5412
  vinyl/deferred_delete.test.lua				gh-5089
  vinyl/gh-4957-too-many-upserts.test.lua			gh-5378
  vinyl/gc.test.lua						gh-5474
avtikhon added a commit that referenced this issue Nov 1, 2020
Added flaky tests results files checksums:

  box/hash_gh-1467.test.lua					gh-5476
  box-tap/session.storage.test.lua				gh-5346
  replication/gh-3160-misc-heartbeats-on-master-changes.test.>	gh-4940
  replication/ddl.test.lua					gh-5337
  replication/election_qsync.test.lua				gh-5430
  replication/election_qsync_stress.test.lua			gh-5395
  replication/gh-5287-boot-anon.test.lua			gh-5412
  vinyl/deferred_delete.test.lua				gh-5089
  vinyl/gh-4957-too-many-upserts.test.lua			gh-5378
  vinyl/gh-5141-invalid-vylog-file.test.lua			gh-5141
  vinyl/gc.test.lua						gh-5474
  vinyl/iterator.test.lua                       		gh-5141
avtikhon added a commit that referenced this issue Nov 1, 2020
Added:

  box/access.test.lua				 	gh-5373 gh-5411
  box/net.box_incorrect_iterator_gh-841.test.lua	gh-5434
  replication/election_qsync.test.lua			gh-5430

Removed:

  replication/gh-5426-election-on-off.test.lua		gh-5433
avtikhon added a commit that referenced this issue Nov 1, 2020
Added flaky tests results files checksums:

  box/access.test.lua				 	gh-5373 gh-5411
  box/hash_gh-1467.test.lua					gh-5476
  box/net.box_incorrect_iterator_gh-841.test.lua		gh-5434
  box-tap/session.storage.test.lua				gh-5346
  replication/gh-3160-misc-heartbeats-on-master-changes.test.>	gh-4940
  replication/ddl.test.lua					gh-5337
  replication/election_qsync.test.lua				gh-5430
  replication/election_qsync_stress.test.lua			gh-5395
  replication/gh-5287-boot-anon.test.lua			gh-5412
  vinyl/deferred_delete.test.lua				gh-5089
  vinyl/gh-4957-too-many-upserts.test.lua			gh-5378
  vinyl/gh-5141-invalid-vylog-file.test.lua			gh-5141
  vinyl/gc.test.lua						gh-5474
  vinyl/iterator.test.lua                       		gh-5141

Removed:

  replication/gh-5426-election-on-off.test.lua			gh-5433
grafin added a commit to grafin/tarantool that referenced this issue Mar 25, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while in wal delay
9. Interfering promote/demote while waiting for limbo to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Didn’t find a way to test failing box_wait_limbo_acked
in box_demote without adding dedicated ERRINJ.
IMO box_wait_limbo_acked should be covered by its own tests.

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
grafin added a commit to grafin/tarantool that referenced this issue May 31, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
grafin added a commit to grafin/tarantool that referenced this issue Jun 15, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
grafin added a commit to grafin/tarantool that referenced this issue Jun 17, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
grafin added a commit to grafin/tarantool that referenced this issue Jun 20, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
grafin added a commit to grafin/tarantool that referenced this issue Jun 30, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
kyukhin pushed a commit that referenced this issue Jun 30, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/gh-5430-qsync-promote-crash.test.lua)

Closes #6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
kyukhin pushed a commit that referenced this issue Jun 30, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/gh-5430-qsync-promote-crash.test.lua)

Closes #6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff

(cherry picked from commit 5a8dca7)
mkokryashkin pushed a commit to mkokryashkin/tarantool that referenced this issue Sep 9, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/tarantoolgh-5430-qsync-promote-crash.test.lua)

Closes tarantool#6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Jul 24, 2024
The test blocks on the `wait_log` for registering replica2 on the master
node, while the master is blocked by the WAL delay injection after replica1
is starting to get registered. The `wait_log` does not throw an error on
timeout, and its return value is ignored in the test, so the test simply
continued after the timeout and succeeded without the necessary
synchronization due to a mere coincidence.

To fix the test, let's move the `wait_log` synchronization after we unblock
the master from the WAL delay injection, and echo the `wait_log` result.

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
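
The pitfall described in this commit message, a wait helper that signals a timeout only through its return value, can be illustrated with a shell analogue. This is a sketch under stated assumptions: `wait_for_log`, the file name `master.log`, and the pattern in the usage line are hypothetical stand-ins, not the test-run `wait_log` helper itself. Fractional `sleep` is also assumed to be supported, as it is on GNU and FreeBSD.

```shell
#!/bin/sh
# Hypothetical stand-in for a log wait helper: poll FILE for PATTERN.
# Returns 0 as soon as the pattern is found, 1 after TRIES polls
# without a match. It never aborts the caller on its own.
wait_for_log() {
    file=$1
    pattern=$2
    tries=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        if grep -q -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 0.1
    done
    return 1
}

# Checked call: a timeout stops the script instead of being ignored.
#   wait_for_log master.log 'registered replica' 50 || { echo "timed out" >&2; exit 1; }
```

Calling the helper without checking its status lets the script continue after a timeout, which is the shell counterpart of ignoring the `wait_log` return value.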
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Jul 24, 2024
The test blocks on the `wait_log` for registering replica2 on the master
node, while the master is blocked by the WAL delay injection after replica1
is starting to get registered. The `wait_log` does not throw an error on
timeout, and its return value is ignored in the test, so the test simply
continued after the timeout and succeeded without the necessary
synchronization due to a mere coincidence.

To fix the test, let's move the `wait_log` synchronization after we unblock
the master from the WAL delay injection, and echo the `wait_log` result.

Closes tarantool#9617

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Jul 25, 2024
The test blocks on the `wait_log` for registering replica2 on the master
node, while the master is blocked by the WAL delay injection after replica1
is starting to get registered. The `wait_log` does not throw an error on
timeout, and its return value is ignored in the test, so the test simply
continued after the timeout and succeeded without the necessary
synchronization due to a mere coincidence.

To fix the test, let's move the `wait_log` synchronization after we unblock
the master from the WAL delay injection, and echo the `wait_log` result.

Closes tarantool#9617

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Jul 31, 2024
The test blocks on the `wait_log` for registering replica2 on the master
node, while the master is blocked by the WAL delay injection after replica1
is starting to get registered. The `wait_log` does not throw an error on
timeout, and its return value is ignored in the test, so the test simply
continued after the timeout and succeeded without the necessary
synchronization due to a mere coincidence.

To fix the test, let's move the `wait_log` synchronization after we unblock
the master from the WAL delay injection, and echo the `wait_log` result.

Closes tarantool#9617

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 6, 2024
The test blocked on the `wait_log` for registering replica2 on the master
node. The `wait_log` does not throw an error on timeout, and its return value is
ignored in the test, so the test simply continued after the timeout and
succeeded.

The reason the test blocked was that after 70a6883 we are unable to register an
anonymous replica while its relay has not stopped:
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/box/box.cc#L4422-L4427

At the same time, the anonymous replica's relay cannot finish, because it is blocked by `cbus_unpair` in `wal_clear_watcher`:
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/box/relay.cc#L1068-L1074
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/lib/core/cbus.h#L440-L442

To fix this let's move `ERRINJ_REPLICA_JOIN_DELAY` before
`box_register_replica`. This error injection is not used right now, so we can
reuse it here instead of `ERRINJ_WAL_DELAY`. This way the WAL won't get blocked,
and we will be able to get 2 concurrent replica registrations.

Let's also echo the `wait_log` result to make sure the test fails in the future.

Closes tarantool#9617

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 6, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 7, 2024
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 19, 2024
Crash the program after printing the first error report (WARNING: USE AT YOUR OWN RISK!). The flag has an effect only if the code was compiled with the `-fsanitize-recover=address` compile option.
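
As a rough sketch of how such a runtime flag is usually passed, the helper below sets one option inside the `ASAN_OPTIONS` environment variable without dropping the others. The message does not name the flag. `halt_on_error` is an assumption based on the wording matching the flag list in reference 2, and `with_asan_option` is a hypothetical helper.

```python
from typing import Dict


def with_asan_option(env: Dict[str, str], name: str, value: object) -> Dict[str, str]:
    """Return a copy of env with name=value set in ASAN_OPTIONS.

    Options are treated as a colon-separated list of key=value pairs.
    Values that themselves contain colons are not handled.
    """
    options: Dict[str, str] = {}
    for item in env.get("ASAN_OPTIONS", "").split(":"):
        if item:
            key, _, val = item.partition("=")
            options[key] = val
    options[name] = str(value)
    new_env = dict(env)
    new_env["ASAN_OPTIONS"] = ":".join(f"{k}={v}" for k, v in options.items())
    return new_env


if __name__ == "__main__":
    # Assumed flag name; see the note above.
    env = with_asan_option({"ASAN_OPTIONS": "detect_leaks=1"}, "halt_on_error", 0)
    print(env["ASAN_OPTIONS"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the instrumented binary.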

```
[061] replication/gh-5430-cluster-mvcc.test.lua                     [ pass ]
[050]
[050]
[050] [Instance "box" returns with non-zero exit code: 1]
[050]
[050] [test-run server "box"] Last 15 lines of the log file /tmp/t/050_box/box.log:
[050]     #9 0x55d6c6868851  (<unknown module>)
[050]
[050] Direct leak of 342 byte(s) in 5 object(s) allocated from:
[050]     #0 0x55d69b184cae in malloc (/__w/tarantool/tarantool/src/tarantool+0x1268cae) (BuildId: 4f3fed4334a726219fb69119e67d451f0cb1ccfa)
[050]     #1 0x55d69d50040c in small_asan_alloc /__w/tarantool/tarantool/src/lib/small/small/util.c:94:24
[050]     #2 0x55d69d4fcb3c in smalloc /__w/tarantool/tarantool/src/lib/small/small/small_asan.c:57:5
[050]     #3 0x55d69ce3782f in runtime_tuple_new /__w/tarantool/tarantool/src/box/tuple.c:138:27
[050]     #4 0x55d69ce33fac in tuple_new /__w/tarantool/tarantool/src/box/tuple.h:801:9
[050]     #5 0x55d69ce34844 in box_tuple_new /__w/tarantool/tarantool/src/box/tuple.c:845:22
[050]     #6 0x55d69b523021 in session_settings_index_get /__w/tarantool/tarantool/src/box/session_settings.c:261:12
[050]     #7 0x55d69b284077 in index_get(index*, char const*, unsigned int, tuple**) /__w/tarantool/tarantool/src/box/index.h:909:9
[050]     #8 0x55d69b282794 in box_index_get /__w/tarantool/tarantool/src/box/index.cc:390:11
[050]     #9 0x55d6c685ea09  (<unknown module>)
[050]
[050] SUMMARY: AddressSanitizer: 5627 byte(s) leaked in 83 allocation(s).
[055] box-luatest/gh_8530_alter_space_snapshot_test.>               [ pass ]
```
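
For triaging logs like the one above, the leak summary line can be extracted mechanically. A minimal sketch, assuming the summary always has the shape shown in this excerpt. `leak_summary` is a hypothetical helper, not part of test-run.

```python
import re
from typing import Optional, Tuple

# Matches the summary line format seen in the excerpt above.
SUMMARY_RE = re.compile(
    r"SUMMARY: AddressSanitizer: (\d+) byte\(s\) leaked in (\d+) allocation\(s\)"
)


def leak_summary(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (bytes_leaked, allocations) from the first summary line, or None."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

The function returns `None` when no summary line is present, so the caller has to check the result rather than assume a leak report exists.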

1. https://github.com/tarantool/tarantool/actions/runs/10454868034/job/28948757147?pr=10431
2. https://github.com/google/sanitizers/wiki/AddressSanitizerFlags

NO_CHANGELOG=internal
NO_DOC=internal
NO_TEST=internal
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 19, 2024
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 20, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 21, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 21, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Aug 21, 2024
Gerold103 pushed a commit that referenced this issue Aug 26, 2024
The test blocked on the `wait_log` for registering replica2 on the master
node. The `wait_log` does not throw an error on timeout, and its return
value is ignored in the test, so the test simply continued after the
timeout and succeeded.

The reason the test blocked was that after 70a6883 we are unable to
register an anonymous replica while its relay has not stopped:
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/box/box.cc#L4422-L4427

At the same time, the anonymous replica's relay cannot finish, because it
is blocked by `cbus_unpair` in `wal_clear_watcher`:
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/box/relay.cc#L1068-L1074
https://github.com/tarantool/tarantool/blob/9e616ab6f2f94ba2bcad3044187c474a1c05a8c5/src/lib/core/cbus.h#L440-L442

To fix this let's move `ERRINJ_REPLICA_JOIN_DELAY` before
`box_register_replica`. This error injection is not used right now, so we
can reuse it here instead of `ERRINJ_WAL_DELAY`. This way the WAL won't get
blocked, and we will be able to get 2 concurrent replica registrations.

Let's also echo the `wait_log` result to make sure the test fails in the future if the wait times out.

Closes #9617

NO_CHANGELOG=<test fix>
NO_DOC=<test fix>
ligurio added a commit to ligurio/tarantool that referenced this issue Aug 30, 2024
ligurio added a commit to ligurio/tarantool that referenced this issue Sep 5, 2024
ligurio added a commit to ligurio/tarantool that referenced this issue Sep 5, 2024