Description
As discussed in #15355, TransferLease is now sequenced with respect to concurrent replica changes by the command queue. RequestLease has no such protection because, unlike all other commands, it is evaluated on followers. It is instead guarded by the RaftCommand.ProposerLease field, which ensures that the replica requesting the lease has an up-to-date view of the current lease. This allows for a race in which a replica is removed from the range immediately after it has attempted to take the lease: a replica change does not modify the lease, so the ProposerLease check still passes. (This race is difficult to hit in practice: the range must be healthy to execute the ChangeReplicas transaction, but the current lease holder must be, or appear to be, unhealthy for a follower to attempt to grab the lease.) When this occurs, we hit the log.Fatal that prevents ranges from getting stuck with a lease on a non-member store.
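A minimal sketch of the race, assuming a heavily simplified model: the types Lease, RangeState and LeaseRequest and the helpers proposerLeaseOK and removeReplica are hypothetical stand-ins, not CockroachDB's actual code. It models the ProposerLease guard as a plain equality check between the proposer's view of the lease and the applied lease, and shows that a removal landing between proposal and application slips past it.

```go
package main

import "fmt"

// ReplicaID is a hypothetical stand-in for a replica identifier.
type ReplicaID int

// Lease is a simplified, hypothetical model of roachpb.Lease.
type Lease struct {
	Holder ReplicaID
	Epoch  int64
}

// RangeState models the replicated state seen when a command applies.
type RangeState struct {
	Lease    Lease
	Replicas map[ReplicaID]bool
}

// LeaseRequest models a RequestLease proposal. ProposerLease is the
// lease the proposer believed to be current when it proposed.
type LeaseRequest struct {
	ProposerLease Lease
	NewLease      Lease
}

// proposerLeaseOK models the existing guard: the request may apply only
// if the proposer's view of the lease matches the applied state.
func proposerLeaseOK(s RangeState, req LeaseRequest) bool {
	return s.Lease == req.ProposerLease
}

// removeReplica models a ChangeReplicas that drops a member but does
// not touch the lease.
func removeReplica(s *RangeState, id ReplicaID) {
	delete(s.Replicas, id)
}

// raceOutcome replays the race and returns the final state.
func raceOutcome() RangeState {
	s := RangeState{
		Lease:    Lease{Holder: 1, Epoch: 1},
		Replicas: map[ReplicaID]bool{1: true, 2: true, 3: true},
	}
	// Replica 3 thinks the holder is unhealthy and proposes to take the lease.
	req := LeaseRequest{ProposerLease: s.Lease, NewLease: Lease{Holder: 3, Epoch: 2}}

	// The ChangeReplicas applies first and removes replica 3.
	removeReplica(&s, 3)

	// The lease request applies next; the guard passes because the
	// lease itself was not modified by the replica change.
	if proposerLeaseOK(s, req) {
		s.Lease = req.NewLease
	}
	return s
}

func main() {
	s := raceOutcome()
	// A non-member lease holder is the state the log.Fatal guards against.
	fmt.Println("holder:", s.Lease.Holder, "is member:", s.Replicas[s.Lease.Holder])
}
```

Running it prints `holder: 3 is member: false`: the lease ends up on a replica that is no longer part of the range.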
The simplest fix I see is to add a counter to roachpb.Lease that is incremented on every replica change, so that the ProposerLease will be seen as outdated when a RequestLease crosses over a rebalance.
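A minimal sketch of the proposed counter, under the same simplified model. The field name ReplicaChangeSeq and the helpers changeReplicas and applyLeaseRequest are hypothetical illustrations, and the sketch assumes the counter lives in the lease and is bumped when the replica change applies. Because the ProposerLease captured at proposal time carries the old counter value, the equality check fails once a replica change has landed in between.

```go
package main

import "fmt"

// ReplicaID is a hypothetical stand-in for a replica identifier.
type ReplicaID int

// Lease is a simplified, hypothetical model of roachpb.Lease with the
// proposed counter added. ReplicaChangeSeq is an illustrative name.
type Lease struct {
	Holder           ReplicaID
	Epoch            int64
	ReplicaChangeSeq int64
}

// RangeState models the replicated state seen when a command applies.
type RangeState struct {
	Lease    Lease
	Replicas map[ReplicaID]bool
}

// LeaseRequest models a RequestLease proposal and the proposer's view.
type LeaseRequest struct {
	ProposerLease Lease
	NewLease      Lease
}

// changeReplicas removes a member and bumps the counter in the current
// lease, so lease requests proposed against the old lease become stale.
func changeReplicas(s *RangeState, remove ReplicaID) {
	delete(s.Replicas, remove)
	s.Lease.ReplicaChangeSeq++
}

// applyLeaseRequest applies req only if the proposer's view of the lease,
// counter included, still matches the applied lease.
func applyLeaseRequest(s *RangeState, req LeaseRequest) bool {
	if s.Lease != req.ProposerLease {
		return false
	}
	s.Lease = req.NewLease
	return true
}

func main() {
	s := RangeState{
		Lease:    Lease{Holder: 1, Epoch: 1},
		Replicas: map[ReplicaID]bool{1: true, 2: true, 3: true},
	}
	// Replica 3 proposes to take the lease, carrying the counter forward.
	req := LeaseRequest{
		ProposerLease: s.Lease,
		NewLease:      Lease{Holder: 3, Epoch: 2, ReplicaChangeSeq: s.Lease.ReplicaChangeSeq},
	}
	// The rebalance removes replica 3 before the lease request applies.
	changeReplicas(&s, 3)

	applied := applyLeaseRequest(&s, req)
	fmt.Println("applied:", applied, "holder:", s.Lease.Holder)
}
```

This prints `applied: false holder: 1`. Note that in this sketch any RequestLease crossing any replica change is rejected, not only one from the removed replica; that is conservative, and a still-valid proposer would simply have to retry.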
Jira issue: CRDB-44008