I want to terminate deployment without graceful exit, but I want the timeout for in-flight requests to be as long as possible.
Hi @V2arK, this was added years ago to guarantee that connections are not dropped during autoscaling. The Knative autoscaler continuously makes decisions about the deployment scale, and that may interrupt connections during pod shutdown.
Could you elaborate on your use case? Do you not care about failing requests?
Hi @skonto, in my use case I just want to terminate the pods as soon as possible (within maybe 3 to 5 seconds) once I trigger the termination, without changing the timeout for requests (e.g., an LLM that takes minutes to emit its response).
How do you trigger that? Are you removing the Knative service? Could you just drop the connections when a SIGTERM is received, either in the LLM Python runtime or on the client side? If you interrupt the connections, draining will happen pretty fast, as QP (the queue-proxy) will not wait for them to finish.
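For illustration, a minimal sketch of the "drop the connections on SIGTERM" idea in a Python runtime, using only the standard library. The names `install_fast_exit`, `EchoHandler`, and `serve`, the injectable `exit_fn` parameter, and port 8080 are assumptions made for this example; they are not part of Knative or of any particular LLM server, and a real runtime would hook its own shutdown path instead.

```python
import os
import signal
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def install_fast_exit(exit_fn=os._exit, exit_code=0):
    """Register a SIGTERM handler that exits at once instead of draining.

    exit_fn is a hypothetical hook: it defaults to os._exit, but can be
    replaced so the handler can be exercised without killing the process.
    """
    def handler(signum, frame):
        # os._exit skips normal interpreter cleanup, so the OS closes every
        # open socket and in-flight requests are dropped, not completed.
        exit_fn(exit_code)

    # Must be called from the main thread, as required by signal.signal.
    signal.signal(signal.SIGTERM, handler)
    return handler


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in for the real (long-running) model-serving handler."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Install the handler, then serve until SIGTERM arrives."""
    install_fast_exit()
    ThreadingHTTPServer(("", port), EchoHandler).serve_forever()
```

Calling `serve()` as the container entrypoint would make the process exit as soon as the kubelet sends SIGTERM, regardless of how long the grace period is. The trade-off is the one raised above: any request still in flight at that moment fails.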
Describe the feature
Right now `TerminationGracePeriodSeconds` is set to `rev.Spec.TimeoutSeconds`:
https://github.com/knative/serving/blob/main/pkg/reconciler/revision/resources/deploy.go#L304
However, `rev.Spec.TimeoutSeconds` also specifies the timeout for in-flight requests.
I think these two values should be separated, because in my project I want to terminate the deployment without a graceful exit, but I want the timeout for in-flight requests to be as long as possible.