any way to graceful exit tornado application? #1791
Here is one example using that method: https://gist.github.com/nicky-zs/6304878. Sometimes `main_ioloop._timeouts` keeps getting new `_Timeout` instances, so we have to wait a long time for the `_timeouts` queue to empty.
I came here today to ask the exact same question. After much reading on SO, I came up with this example:
It has two problems:
.. which I think means that the second problem is with `AsyncHandler`, where the request is killed and the client receives a "Remote Disconnected" error. It would be very nice to get an …
You definitely shouldn't be looking at … Or you can do what I do and just stop the IOLoop 5 seconds after the stop is requested. This will be enough for any regular request to finish, and if a request is taking longer than 5 seconds you probably want to let it fail anyway. It's not worth making a more precise measurement just to stop in less than 5 seconds.
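A minimal sketch of this fixed-grace-period stop, using plain `asyncio` (which Tornado 5+ runs on) rather than any Tornado-specific API; the call to `http_server.stop()` mentioned in the comment is shown only as a comment, since `http_server` is application-specific:

```python
import asyncio
import signal

def install_graceful_exit(loop, grace_seconds=5.0):
    """On SIGTERM/SIGINT, stop taking new work, then stop the loop after
    a fixed grace period instead of tracking each pending request."""
    def request_stop():
        # In a real Tornado app you would call http_server.stop() here
        # so no new connections are accepted during the grace period.
        loop.call_later(grace_seconds, loop.stop)

    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, request_stop)
```

The point of the fixed delay is exactly the trade-off described above: any in-flight request gets up to `grace_seconds` to finish, and there is no bookkeeping of how many requests remain.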
This seems like a question that is asked frequently enough that an example solution should be included in the user's guide section of the docs. I would consider adding something to either the "Running and deploying" section or the "Structure" section. Ben, which do you think makes the most sense?
Set a timeout for stop.
Yeah, I think adding something to "Running and deploying" makes sense. There are two tricky things in writing this up: A) the right way to do this depends heavily on your deployment environment (how your load balancers do health checks and/or retries), and B) convincing people who want to wait for all operations to finish that it's better to just use a timeout.

On that latter point: there must be some maximum time that you're willing to wait for an operation to finish, even if it's several minutes. At that point you have to be willing to cut them off or they'll keep consuming resources indefinitely. Once you've decided how long you're willing to wait for the last client operation to finish, you've already committed the resources to keep the old server processes around for that long, so why not leave them running for that duration in every case? Trying to track the number of operations remaining and stopping the server precisely when the count reaches zero is not worth the trouble IMHO.
You are right that there are plenty of caveats to how to really do a graceful shutdown depending on the deployment strategy. But for the guide, maybe it would suffice to show a simple example case (such as what you mentioned where you stop the IOLoop 5 seconds later to give requests time to finish) while explaining that this is not the only way, but that it at least works for simple setups. I'll take a stab at this in the next few days and maybe we can try to iterate through a good solution via a pull request. |
It would be nice if we could specify what Tornado should do on a SIGTERM. This would make it quite easy to gracefully stop it inside Kubernetes, by having a health handler which starts to report "faulty" as soon as .inShutdown() returns true. With this I can ensure that I take the service out of load balancing without killing any active connections (provided they don't take too long), and without losing requests because Kubernetes does not take the service out of the LB instantaneously (by design, so that one faulty response doesn't kick the service out).
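One way to sketch that idea without pinning down any Tornado handler code: a small shutdown flag flipped by SIGTERM, which a hypothetical `/health` endpoint would translate into a 503 so the load balancer drains the instance before the process exits. The `ShutdownState` name and the wiring into a handler are illustrative, not an existing API:

```python
import signal

class ShutdownState:
    """Flips to 'in shutdown' on SIGTERM. A health-check endpoint can then
    start returning 503 so the load balancer removes this instance while
    active connections are allowed to finish."""
    def __init__(self):
        self.in_shutdown = False

    def install(self):
        # Register the SIGTERM handler; safe because it only sets a flag.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.in_shutdown = True

    def health_status(self):
        # A Tornado health handler could call this and pass the result
        # to self.set_status(); 503 tells the LB to stop sending traffic.
        return 503 if self.in_shutdown else 200
```

After the load balancer has drained traffic, the process can stop its IOLoop with a timeout as discussed above.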
We use Ubuntu's upstart to run our services. While terminating the service, I expect the behavior to be consistent across most *nix systems. It would be nice if Tornado could support these two signals by default, along with a custom set of signals.
Tornado works correctly under the upstart configuration you described - it dies by default when it gets SIGTERM. What would you like it to do differently? This always seems to be application- and deployment-dependent (especially with respect to how your load balancers do health checks and retries), but whatever behavior you want can be obtained by defining your own signal handlers. |
@bdarnell Thanks for your quick response. I think the reason it didn't work well for us is because we are using an older version of Tornado (2.1.1). Once I implemented my own signal handler for SIGTERM, things started working. |
How to shutdown tornado 5.1 gracefully? |
Here's how I do it (currently with tornado-4.5.3, but I expect it will work the same with tornado-5.1):

```python
import signal

from tornado import gen, ioloop

# periodic_task, http_server and ws_clients are defined elsewhere in the app

async def shutdown():
    periodic_task.stop()
    http_server.stop()
    for client in ws_clients.values():
        client['handler'].close()
    await gen.sleep(1)
    ioloop.IOLoop.current().stop()

def exit_handler(sig, frame):
    ioloop.IOLoop.instance().add_callback_from_signal(shutdown)

...

if __name__ == '__main__':
    signal.signal(signal.SIGTERM, exit_handler)
    signal.signal(signal.SIGINT, exit_handler)
    ...
```

(instead of just …)
Yeah, something like @ploxiln's approach is what I'd do. I don't think there have been any noteworthy changes between Tornado 4 and 5 here. It all depends on what "gracefully" means for your application. (One missing piece in the snippet above is that you probably want to signal to your load balancer somehow to stop the incoming traffic). |
@ploxiln Hi mate, would you mind describing what …
Sure:

```python
ws_clients = [
    {
        'handler': WebSocketHandler(...),
        'tags': ...,
    },
    ...
]
```

I had a WebSocketHandler which, for each websocket client that connected, would add "self" to this list. On close, it would find and remove "self" from the list. On shutdown, all still-open websocket connections could be cleanly closed in this way (in addition to other plain HTTP requests having a second to finish up, while no new plain HTTP or websocket requests are possible).
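A variant of that bookkeeping sketched as a dict keyed by connection id, which matches the `ws_clients.values()` loop in the earlier `shutdown()` snippet. `TrackedWebSocket` is a stand-in for a real `tornado.websocket.WebSocketHandler` subclass, not actual Tornado code:

```python
import uuid

ws_clients = {}  # conn_id -> {'handler': handler, 'tags': [...]}

class TrackedWebSocket:
    """Stand-in for a WebSocketHandler subclass: Tornado calls open()
    when a client connects and on_close() when the socket goes away."""
    def open(self, tags=None):
        self.conn_id = uuid.uuid4().hex
        ws_clients[self.conn_id] = {'handler': self, 'tags': tags or []}

    def on_close(self):
        # Forget the connection whether it closed cleanly or not.
        ws_clients.pop(self.conn_id, None)

    def close(self):
        # A real WebSocketHandler.close() sends the close frame and
        # Tornado then invokes on_close(); we call it directly here.
        self.on_close()
```

On shutdown, iterating over `list(ws_clients.values())` and calling `close()` on each handler drains the registry cleanly.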
Linking to this gist in case anyone comes across this issue: https://gist.github.com/wonderbeyond/d38cd85243befe863cdde54b84505784 This works for me. |
This is what I'm doing on Firenado: https://github.com/candango/firenado/blob/develop/firenado/launcher.py#L159-L324 |
We managed to gracefully shut down Tornado by implementing the following steps:
See https://github.com/svaponi/tornado-graceful-shutdown/blob/main/server.py |
There are many gists showing how to gracefully exit a Tornado application, like this:
Is it safe to test io_loop._timeouts or _callbacks? Or can _timeouts be ignored?