Instance gets deleted when one Nacos server goes down · Issue #1873 · alibaba/nacos · GitHub

Instance gets deleted when one Nacos server goes down #1873


Closed
EZLippi opened this issue Sep 25, 2019 · 3 comments

Comments

@EZLippi
Contributor
EZLippi commented Sep 25, 2019

Instance gets deleted when one Nacos server goes down

Describe what happened (or what feature you want)

As we know, Nacos servers use DistroMapper to shard register/heartbeat requests, so only one Nacos server holds the latest heartbeat timestamp of a given service instance. If a Nacos server shuts down for more than 30 seconds, then during the first 30 seconds clients cannot deliver heartbeats, because the other servers still treat the dead server as alive and redirect heartbeat requests to it. After 30 seconds the other servers mark the dead server as unhealthy and DistroMapper re-shards, so the ClientBeatCheck task marks the services previously handled by the dead server unhealthy, because their heartbeat timestamps are too old.
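The re-sharding behavior can be sketched as follows. This is illustrative Python, not Nacos's actual DistroMapper code; the MD5 hash and the server names are assumptions:

```python
import hashlib

def responsible_server(service_name, healthy_servers):
    """Map a service to the one server that owns its heartbeats,
    by hashing the service name over the healthy-server list.
    (Illustrative only; the real DistroMapper uses its own hash.)"""
    digest = hashlib.md5(service_name.encode("utf-8")).hexdigest()
    return healthy_servers[int(digest, 16) % len(healthy_servers)]

service = "tracklist-listenlist-service:thrift"
all_servers = ["nacos-1", "nacos-2", "nacos-3"]
owner = responsible_server(service, all_servers)

# For the first ~30 s after the owner dies, the other servers still list it
# as healthy, so heartbeats keep getting redirected to the dead owner.
# Once it is marked unhealthy the list shrinks and the mapping re-shards:
survivors = [s for s in all_servers if s != owner]
new_owner = responsible_server(service, survivors)
# new_owner has no recent heartbeat timestamp for the service, so its
# ClientBeatCheck task sees a stale lastBeat and marks the instance unhealthy.
```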

The delete log is as follows:
2019-09-25 13:40:08.763 INFO [ClientBeatCheckTask] [AUTO-DELETE-IP] service: tracklist-listenlist-service:thrift, ip: {"app":"","clusterName":"default-cluster","enabled":true,"ephemeral":true,"healthy":false,"instanceHeartBeatInterval":29217,"instanceHeartBeatTimeOut":87651,"instanceId":"172.28.7.26#9161#default-cluster#tracklist-listenlist-service:thrift","ip":"172.28.7.26","ipDeleteTimeout":146085,"lastBeat":1569388927431,"marked":false,"metadata":{"preserved.heart.beat.timeout":"87651","version":"3.0.17","createTimestamp":"1567762329690","preserved.heart.beat.interval":"29217","preserved.ip.delete.timeout":"146085"},"port":9161,"serviceName":"tracklist-listenlist-service:thrift","tenant":"","weight":1000}
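The fields in that log entry drive the beat-check decision. A minimal sketch of the timeout logic, using the values from the log (hypothetical helper, not the actual ClientBeatCheckTask code):

```python
def check_instance(instance, now_ms):
    """Beat-check sketch: decide an instance's fate from its last heartbeat.
    Thresholds are taken from the instance JSON in the log entry."""
    elapsed = now_ms - instance["lastBeat"]
    if elapsed > instance["ipDeleteTimeout"]:
        return "delete"          # the [AUTO-DELETE-IP] branch in the log
    if elapsed > instance["instanceHeartBeatTimeOut"]:
        return "mark-unhealthy"
    return "ok"

instance = {
    "lastBeat": 1569388927431,          # from the log entry
    "instanceHeartBeatTimeOut": 87651,  # ~87 s: mark unhealthy
    "ipDeleteTimeout": 146085,          # ~146 s: delete the instance
}

# After re-sharding, the new owner inherits a lastBeat that is already old,
# so "elapsed" can jump past both thresholds at once.
```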

Although the client will then send heartbeats to the new server and re-register, consumers will find no active provider for about 5 seconds.

Describe what you expected to happen

@nkorange
Collaborator
nkorange commented Sep 25, 2019

Currently the expire timeout for marking a Nacos server unhealthy is already an entry in SwitchDomain, but it is not configurable via the API.

Since the instance unhealthy timeout and the heartbeat interval are configurable, the Nacos server unhealthy timeout should also be configurable, and its default value should be less than
<instance unhealthy timeout> - <heartbeat interval>.
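That constraint can be checked mechanically. A hedged sketch (the helper name and millisecond units are assumptions; the example values come from the log in the issue body):

```python
def server_expire_timeout_ok(server_expire_ms, instance_unhealthy_ms, beat_interval_ms):
    """The server-unhealthy timeout must be shorter than
    <instance unhealthy timeout> - <heartbeat interval>, so a client can
    reach the re-sharded owner before its instance expires."""
    return server_expire_ms < instance_unhealthy_ms - beat_interval_ms

# With the values from the log (timeout 87651 ms, interval 29217 ms),
# any server expire timeout under 58434 ms satisfies the constraint.
```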

@EZLippi
Contributor Author
EZLippi commented Sep 25, 2019

This will not work for me. I replaced the default UDP push + timer pull with HTTP long polling, so if the server changes an instance to unhealthy, the client will know immediately.

@nkorange
Collaborator

This will not work for me. I replaced the default UDP push + timer pull with HTTP long polling, so if the server changes an instance to unhealthy, the client will know immediately.

Changing the subscribers' behavior doesn't affect the providers' heartbeats or how the Nacos server handles expired instances.

xingguangsixian added a commit to xingguangsixian/nacos that referenced this issue Dec 7, 2019
* fix: closes alibaba#1569

* fix bug

* build main

* alibaba#1529
distro: record heartbeats using local time

* Fix alibaba#1591

* Add unit tests for common.GroupKey and utils.MD5

Fully-qualified classname
com.alibaba.nacos.client.config.common.GroupKey
com.alibaba.nacos.client.config.utils.MD5

These tests were written using Diffblue Cover.

* Fix alibaba#1591

* feat: support change password

* upgrade the fastjson version

* Fix namespace vulnerability

* Fix alibaba#1583

* no message

* Build main.*

* no message

* fix build

* fix FE

* Backend support

* npm build

* fix CI

* Return the original Service list directly when no filter condition is given

* add refresh memory

* remove unneeded const

* no message

* clear code

* build console

* revert error code

* Remove unnecessary code

* Delete the code that caused the list multi-element

* fix bug

* Unified constant use

* reverse `Service Name` and `Group Name`

* Fix bug

* Update version to 1.1.2-SNAPSHOT

* fix: font privatization

* Subject to the actual startup context path

* if not set the context path with the WebServerInitializedEvent then real '/' is context path

* RunningConfig support get from spring.properties configuration file

* Update version to 1.1.2

* Update version to 1.1.3

* Add unit tests for com.alibaba.nacos.config.server.utils.GroupKey

These tests were written using Diffblue Cover

* Use dynamic server version

* 1. Optimize log printing
2. Improve the robustness and readability of the code

* support datum is null case

* fix httpGetLarge: calling entity.getContentType().getElements() throws an NPE when contentType is null

* Normalize http response entity with ResponseEntity by spring

* feat:

* cluster conf supports multiple inline instances separated by ','

* add comma separation, with some cases showing its use

* add comma separation, with some cases showing its use

* add comma separation, with some cases showing its use

* resolve conflict

* fix: fix alibaba#1733

* Page fixes

* use API to create param

* use API to create param

* [ISSUE] alibaba#1671 Unified request header "Client-Version"

* [ISSUE] alibaba#1671 Unified request header "Client-Version"

* 🐛 remove server.contextPath

* Update service description error in Open API Guide

* fix: fix alibaba#1665

* fix alibaba#1764.

* Compatible with older versions

* [Issue] alibaba#1769 Solve the bug of the clone configuration function

* Fix alibaba#1764

* [Issue] alibaba#1769 Solve the bug of the clone configuration function

* [Issue] alibaba#1769 Solve the bug of the clone configuration function

* fix: closes alibaba#1759

* fixed(cluster): fixed raft cluster state

* chore(cluster): delete no used note

* alibaba#1507 close server from current dir

* fix spelling error and add debug log

* Fix alibaba#1621

* fix alibaba#1609

* fix alibaba#1609

* Make error information more specific

* feature(triggerFlag): add triggerFlag for service

* feature(triggerFlag): show triggerFlag in frontend

* style: Modifiers should be declared in the correct order; set initial Map size to avoid resizing

* style: Modifiers should be declared in the correct order; set initial Map size to avoid resizing

* refactor: local variables are thread-safe; make urlPattern static final; refactor GroupKey

* improve(triggerFlag): add pre check for triggerFlag

* refactor: convert IO to try-with-resources; replace the instanceList loop with addAll

* chore(triggerFlag): adjust some details

* improve(instanceHealth): add update logic

* feat:

* Avoiding conflicted for creating directory.

* improve(triggerFlag): adjust triggerFlag calculation chance

* clean controllers code

* chore(triggerFlag): delete unused function

* merge

* improve(triggerFlag): improve instance health flag

* Add synchronized when add/remove instance

* Update jackson version, see https://nvd.nist.gov/vuln/detail/CVE-2019-16335

* Fix alibaba#1874

* alibaba#1873, set default server expire timeout to 10 seconds and configurable.

* fix bug alibaba#1775

* build fe

* Clean up redundant StringUtils

* bug fix alibaba#1841

* fix alibaba#1916

* Bump netty-all from 4.0.42.Final to 4.1.42.Final

Bumps [netty-all](https://github.com/netty/netty) from 4.0.42.Final to 4.1.42.Final.
- [Release notes](https://github.com/netty/netty/releases)
- [Commits](netty/netty@netty-4.0.42.Final...netty-4.1.42.Final)

Signed-off-by: dependabot[bot] <support@github.com>

* alibaba#1409 Introduce MCP server

* alibaba#1409 gRPC server tuned OK.

* alibaba#1409 Update from Nacos

* alibaba#1409 Fix PMD

* Fix alibaba#1906

* fix the getting subscribers error

* Support unique instance index for each registered server

Signed-off-by: dizhe <vettal.wd@alibaba-inc.com>

* Don't include double quotes when building the string, otherwise invocation fails with Error: Unable to access jarfile

* Support snowflake instance id generator

Signed-off-by: Vettal Wu <vettal.wd@alibaba-inc.com>

* Fix test case error.

* clean code

* fix the CI errors

* remove the useless code that make ci errors

* Fix findbugs

* Add switch to turn on/off MCP server

* Update version to 1.1.4

* issue: calling the update-instance API clears the values of parameters that were not passed
alibaba#1957

* Change MCP service port to 8848

* Fix add metadata method NPE.

* Fix PMD

* fix alibaba#1947

* alibaba#1947 add test cases.

* issue: calling the update-instance API clears the values of parameters that were not passed
alibaba#1957

* issue: calling the update-instance API clears the values of parameters that were not passed
alibaba#1957

* [alibaba#2006] change to throw NacosException to make client handle the right Exception case

* make RequestVote RPC handler thread-safe

* Modify the string concatenation in getGroupedName()

* Remove debug option in startup script

* fix alibaba#2000

* alibaba#2018 Close input stream instead of connection.

* fix alibaba#1842

* fix alibaba#1858

* Fix client beat task executing when health check is disabled.

* refactor(client/config): increase the client's read timeout

Increase the client's read timeout so that delays in the server's handling of the client's long-polling task do not cause failures.

* refactor(client/config): update timeout compute

* fix startup for java 11

* remove classpath

* Bump jackson-databind from 2.9.10 to 2.9.10.1

Bumps [jackson-databind](https://github.com/FasterXML/jackson) from 2.9.10 to 2.9.10.1.
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

Signed-off-by: dependabot[bot] <support@github.com>

* add toUpperCase

* optimize: the Boolean.parseBoolean(String s) method should be used when converting a String to a Boolean type

* fix 2025

* fix default value of database field

* add client context path config

* Fix alibaba#2098

* add nacos console cors

* format code

* Update version to 1.2.0-SNAPSHOT

* Fix close connection exception.

* Fix alibaba#2123

* Fix alibaba#2020

* Fix alibaba#2123