Fix the recovery issue in kubernetes #126
Closed
fred.rs sometimes fails to recover when it is connected to a Redis cluster running inside Kubernetes. The issue can be reproduced with the following commands in k8s:
kubectl scale statefulset redis --replicas 0
kubectl scale statefulset redis --replicas 8
After the Redis cluster recovers, fred.rs sometimes fails to reconnect. I've implemented the DNS interface. According to the logs, fred.rs appears to keep connecting to the old IPs of the Redis pods, and its attempt to fetch the cluster nodes returns NULL.
The fix is to release the backchannel when the result is null. fred.rs then falls back to the configured Redis URL (in our case, the FQDN of the k8s service) and performs a DNS query. The query returns one of the IPs of the new Redis pods, so fred.rs can establish a new connection to that IP and fetch the up-to-date cluster nodes.
I've implemented this change in our testing tools and tested it under traffic for a long time. It now always recovers.
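For reviewers, here is a minimal, self-contained sketch of the idea behind the fix. It is not fred.rs code: `Connection`, `Backchannel`, `Env`, and `refresh_cluster_nodes` are hypothetical stand-ins made up for illustration, and the DNS lookup and the cluster nodes query are simulated with in-memory data. It only shows the control flow: if the cached backchannel returns a null result, drop it and re-resolve the configured host.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a control connection pinned to one pod IP.
#[derive(Debug, Clone, PartialEq)]
struct Connection {
    ip: String,
}

/// Hypothetical stand-in for the client's cached backchannel.
struct Backchannel {
    conn: Option<Connection>,
}

/// Simulated environment: a DNS table plus the set of IPs that currently
/// answer the cluster nodes query. Both are assumptions for this sketch.
struct Env {
    dns: HashMap<String, String>,
    live_ips: Vec<String>,
}

impl Env {
    /// Simulated DNS query for the configured host name.
    fn resolve(&self, host: &str) -> Option<String> {
        self.dns.get(host).cloned()
    }

    /// Simulated cluster nodes query; `None` models the NULL result seen
    /// when the connection still points at a pod IP that no longer exists.
    fn cluster_nodes(&self, conn: &Connection) -> Option<String> {
        if self.live_ips.contains(&conn.ip) {
            Some(format!("{}:6379 master", conn.ip))
        } else {
            None
        }
    }
}

/// Try the cached backchannel first. On a null result, release it and fall
/// back to the configured host, which forces a fresh DNS query.
fn refresh_cluster_nodes(
    bc: &mut Backchannel,
    env: &Env,
    configured_host: &str,
) -> Option<String> {
    // `take()` empties the cache; the connection is restored only on success.
    if let Some(conn) = bc.conn.take() {
        if let Some(nodes) = env.cluster_nodes(&conn) {
            bc.conn = Some(conn);
            return Some(nodes);
        }
        // Null result: `conn` is dropped here, which releases the backchannel.
    }

    // Fall back to the configured host and connect to whatever it resolves to.
    let ip = env.resolve(configured_host)?;
    let conn = Connection { ip };
    let nodes = env.cluster_nodes(&conn)?;
    bc.conn = Some(conn);
    Some(nodes)
}
```

In this sketch, a backchannel pinned to a stale IP returns `None` on the first attempt, is released, and is then replaced by a connection to the address that the configured host currently resolves to.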