[Aeron Cluster] Consensus Module snapshots are different across nodes · Issue #1739 · aeron-io/aeron
[Aeron Cluster] Consensus Module snapshots are different across nodes #1739
Closed
@zyulyaev

Description

For context: as part of our Aeron Cluster monitoring, we compare snapshots taken at the same log position across nodes. This lets us detect state divergence caused by bugs that introduce non-deterministic logic.
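
A minimal sketch of this kind of check, assuming each node's snapshot for a given log position is available as a byte array. The class name `SnapshotComparison`, the node names, and the choice of SHA-256 are illustrative assumptions, not part of Aeron or of our actual monitoring code:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not an Aeron API.
final class SnapshotComparison
{
    /** Hex-encoded SHA-256 digest of the raw snapshot bytes. */
    static String digest(final byte[] snapshotBytes)
    {
        try
        {
            final byte[] hash = MessageDigest.getInstance("SHA-256").digest(snapshotBytes);
            final StringBuilder sb = new StringBuilder();
            for (final byte b : hash)
            {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
        catch (final NoSuchAlgorithmException ex)
        {
            throw new IllegalStateException(ex);
        }
    }

    /** True if every node produced a byte-identical snapshot at the same log position. */
    static boolean allMatch(final Map<String, byte[]> snapshotsByNode)
    {
        final Set<String> distinct = new HashSet<>();
        for (final byte[] bytes : snapshotsByNode.values())
        {
            distinct.add(digest(bytes));
        }
        return distinct.size() <= 1;
    }

    public static void main(final String[] args)
    {
        // Placeholder payloads standing in for real snapshot contents.
        final Map<String, byte[]> snapshots = new LinkedHashMap<>();
        snapshots.put("node0", new byte[]{ 0, 0, 0, 2 });
        snapshots.put("node1", new byte[]{ 0, 0, 0, 1 });
        snapshots.put("node2", new byte[]{ 0, 0, 0, 1 });
        System.out.println("snapshots match: " + allMatch(snapshots));
    }
}
```

Any difference in the encoded state, including a single counter such as nextSessionId, makes the digests differ, which is how the divergence described here shows up.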

From time to time we detect the Consensus Module producing different snapshots on different nodes. The difference is in the nextSessionId field. After some investigation I found that ConsensusModuleAgent#nextSessionId is updated on the leader at the moment it appends the "session open" message to the log, while on the followers it is updated only when they reach the "session open" message in the log.

Consider the following scenario:

  1. A snapshot command is issued
  2. The leader adds the snapshot message to the log
  3. A new client connects
  4. The leader increments nextSessionId and adds the "session open" message to the log
  5. Nodes reach the snapshot message and take a snapshot (at this point the leader and the followers have different nextSessionId values)
  6. Followers reach the "session open" message in the log and increment nextSessionId (now all nodes have the same nextSessionId)
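
The steps above can be reproduced with a toy model. This is only a sketch of the ordering problem, not Aeron's actual code: the `Node` class, the `EntryType` enum, the in-memory list standing in for the log, and the initial counter value of 1 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: none of these types exist in Aeron.
final class NextSessionIdDivergence
{
    enum EntryType { SNAPSHOT, SESSION_OPEN }

    static final class Node
    {
        final boolean isLeader;
        long nextSessionId = 1;          // arbitrary starting value for the sketch
        long snapshotNextSessionId = -1; // value captured when the snapshot is taken
        int replayIndex = 0;

        Node(final boolean isLeader)
        {
            this.isLeader = isLeader;
        }

        /** Leader only: the counter is bumped at append time, before the entry is processed. */
        void append(final List<EntryType> log, final EntryType type)
        {
            if (type == EntryType.SESSION_OPEN)
            {
                nextSessionId++;
            }
            log.add(type);
        }

        /** Process log entries up to (excluding) limit. */
        void replay(final List<EntryType> log, final int limit)
        {
            while (replayIndex < limit)
            {
                final EntryType entry = log.get(replayIndex++);
                if (entry == EntryType.SNAPSHOT)
                {
                    snapshotNextSessionId = nextSessionId;
                }
                else if (entry == EntryType.SESSION_OPEN && !isLeader)
                {
                    nextSessionId++; // followers update only when they reach the entry
                }
            }
        }
    }

    /** Returns { leader snapshot value, follower snapshot value, leader final, follower final }. */
    static long[] run()
    {
        final List<EntryType> log = new ArrayList<>();
        final Node leader = new Node(true);
        final Node follower = new Node(false);

        leader.append(log, EntryType.SNAPSHOT);     // steps 1-2
        leader.append(log, EntryType.SESSION_OPEN); // steps 3-4

        leader.replay(log, 1);                      // step 5: both nodes take the snapshot
        follower.replay(log, 1);

        leader.replay(log, 2);                      // step 6: followers catch up
        follower.replay(log, 2);

        return new long[]
        {
            leader.snapshotNextSessionId, follower.snapshotNextSessionId,
            leader.nextSessionId, follower.nextSessionId
        };
    }

    public static void main(final String[] args)
    {
        final long[] r = run();
        System.out.println("snapshot nextSessionId: leader=" + r[0] + " follower=" + r[1]);
        System.out.println("final nextSessionId:    leader=" + r[2] + " follower=" + r[3]);
    }
}
```

In this model the leader's snapshot records nextSessionId 2 while the follower's records 1, even though both nodes end with the same value once the follower has processed the "session open" entry.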

Is it expected that nodes in the cluster may have different consensus module snapshots? Or should the leader write the same nextSessionId value as a follower would?
