Open
Description
状況
Frugalos 0.10.2 https://github.com/frugalos/frugalos/releases/tag/0.10.2 を用いて比較的規模の大きなクラスタを運用している状況で、あるfrugalosサーバを再起動(stopした後にstartする)した際に、以下の104行から107行でのassertに失敗し、raft nodeが停止した:
Lines 102 to 110 in c0bd2d8
注意 frugalos_raft
が関連する話であり、調査内容によってはfrugalos側のissueに移動する可能性がある。
計算フロー
raftlog/src/node_state/loader.rs
Lines 48 to 71 in c0bd2d8
Loaderでログ末尾まで読み込み、
common.handle_log_appended(&suffix)
を呼び出してその処理中で問題が起こっている。
CRIT Node down: Other (cause; assertion failed: `self.last_record().head.prev_term < tail.prev_term`)
HISTORY:
[0] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/log/history.rs:104
[1] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/common/mod.rs:78
[2] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/loader.rs:58
[3] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/node_state/mod.rs:113
[4] at /home/vagrant/.cargo/registry/src/github.com-1ecc6299db9ec823/raftlog-0.4.0/src/replicated_log.rs:260 -- node=Node { id: NodeId("b503.ba@*.*.*.*:14278"), role: Follower, ballot: Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") } }
[5] at frugalos_mds/src/node/node.rs:730
[6] at frugalos_segment/src/service.rs:265
[7] at frugalos_segment/src/service.rs:280
, node: b503, module: frugalos_segment::service:282
当該ノードが直前まで行っていた計算の流れ
INFO Add a node: bucket_no=0; segment_no=181; device_no=186; device_id=DeviceId("frugalos15_05"); node=NodeId { local_id: Loc
alNodeId("b503"), instance: 186, addr: V4(*.*.*.*:14278) }; , module: frugalos::service:219
INFO [START] LoadBallot: lump_id=LumpId("0000000000b503000000000000000000"); , node: b503, module: frugalos_raft::storage::ballot:21
INFO [FINISH] LoadBallot: ballot=Some(Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") }); , node:b503, module: frugalos_raft::storage::ballot:42
INFO [START] LoadLogPrefixIndex: lump_id=LumpId("0000000000b503020000000000000000"); , node: b503, module: frugalos_raft::storage::log_prefix::load:87
INFO [START] LoadLogPrefix, node: b503, module: frugalos_raft::storage::log_prefix::load:23
INFO [START] LoadLogSuffix: head=LogPosition { prev_term: Term(0), index: LogIndex(0) }; , node: b503, module: frugalos_raft:
INFO New raft election term: ballot=Ballot { term: Term(315), voted_for: NodeId("b501.120@*.*.*.*:14278") }, node: b503, module: frugalos_mds::node::node:442
INFO [FINISH] LoadLogPrefixIndex: index=Some(2840..2841); , node: b503, module: frugalos_raft::storage::log_prefix::load:113
INFO [START] LoadLogPrefixBytes: prefix_index=2840..2841; , node: b503, module: frugalos_raft::storage::log_prefix::load:135
INFO [PROGRESS] LoadLogPrefixBytes: index=2840; lump_id=LumpId("0000000000b503030000000000000b18"); , node: b503, module: frugalos_raft::storage::log_prefix::load:175
INFO [FINISH] LoadLogPrefixBytes: bytes.len()=1617414; , node: b503, module: frugalos_raft::storage::log_prefix::load:165
INFO [FINISH] LoadLogPrefix: prefix.tail=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; prefix.config=Clust
erConfig { new: {...}, old: {}, state: Sta
ble }; bytes.len()=1617414; , node: b503, module: frugalos_raft::storage::log_prefix::load:58
INFO Event::LogPrefixUpdated: self.log_suffix.head=LogPosition { prev_term: Term(0), index: LogIndex(0) }; new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; , node: b503, module: frugalos_raft::storage:222
INFO [START] LoadLogSuffix: head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }; , node: b503, module: frugalos_raft::storage::log_suffix:25
INFO New snapshot is loaded: new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }, bytes=1617139, node: b503, module: frugalos_mds::node::node:450
INFO Snapshot decoded: 1617139 bytes, node: b503, module: frugalos_mds::node::node:460
INFO Snapshot decoded: new_head=LogPosition { prev_term: Term(308), index: LogIndex(28405496) }, node: b503, module: frugalos_mds::node::node:693
INFO
62D1
[FINISH] LoadLogSuffix: self.current=LogIndex(28405504); , node: b503, module: frugalos_raft::storage::log_suffix:48
疑問
Lines 95 to 101 in c0bd2d8
ここが実行されてしまうと、
self.last_record().head.prev_term == tail.prev_term
が成立しないか?Metadata
Metadata
Assignees
Labels
No labels