I manage a server on k8s that serves an HTTP API whose requests take quite a long time to complete. The pods are deployed as a StatefulSet and use RollingUpdate as the update strategy. The Service type is LoadBalancer.
For maintenance, when I update my server, each pod should wait until all in-flight requests have been answered before exiting (that is, a graceful shutdown).
I read the following articles:
Pod Lifecycle | Kubernetes
Kubernetes best practices: terminating with grace | Google Cloud Blog
After reading them, this is my understanding of the pod termination process:
1. Change the status of the pod to Terminating and remove it from the Service endpoints. Once the pod is in this status, the LoadBalancer doesn’t send new requests to it.
2. Execute the preStop hook, if one is defined.
3. Send SIGTERM to the pod.
4. Wait for the pod to terminate within terminationGracePeriod.
5. If terminationGracePeriod expires, send SIGKILL to the pod.
At step 1, I thought that the LoadBalancer would stop sending new requests to this pod but would NOT disconnect connections that were established before this step.
However, in my environment, it closes all the client connections, and clients get a connection reset by peer error.
On the server side, the server isn’t aware of this; it tries to write a response to the closed connection and blocks.
Regardless of the termination process, I see the same thing when I simply make the pod fail its readiness probe while it is processing client requests.
I’m using my company’s internal k8s platform, and I raised the same issue with its managers.
They said that closing client connections when the pod is removed from the Service endpoints is the official spec of k8s.
However, I think that keeping the connections open and letting the pod handle them gracefully is more reasonable.
Could you please confirm whether this is truly part of the k8s spec or not?
There are several docs that say pods will not receive new connections in Terminating or Not-Ready status, but it is hard to find an official doc that says whether already established connections will be closed.
Also, could you suggest any k8s settings or approaches that I or our platform managers could try to solve this issue?
Thanks!
Cluster information:
Kubernetes version: v1.15.10
I’m sorry, but since I’m using my company’s internal k8s platform, as mentioned above, the detailed cluster information is not visible to me.
From the Pod Lifecycle page you’ve provided:
Pods that shut down slowly cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.
As the pod is removed as a valid endpoint, your client gets a connection reset by peer.
I am no developer, but the 12 factor app says:
Processes shut down gracefully when they receive a SIGTERM signal from the process manager. For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost.
Your description seems to fit in the “long polling” scenario described here, so maybe the application can be updated to retry the un-processed request (on a different pod).
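As a rough illustration of the client-side retry suggested above, here is a minimal Python sketch. The function name call_with_retry and its parameters are hypothetical, and it assumes the request is safe to repeat (idempotent), since a request that was partly processed before the reset may otherwise be applied twice.

```python
import time


def call_with_retry(send, attempts=3, backoff_seconds=0.5, sleep=time.sleep):
    """Call send() and retry when the peer resets or closes the connection.

    send is a hypothetical zero-argument callable that performs one request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as exc:
            last_error = exc
            # Back off exponentially, but do not sleep after the final attempt.
            if attempt < attempts - 1:
                sleep(backoff_seconds * (2 ** attempt))
    raise last_error
```

Because each retry opens a new connection through the Service, it would normally land on a pod that is still listed as an endpoint.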
Best regards,
The unfortunate answer is that it was under-defined. Both behaviors exist.
With the rise of EndpointSlice, we have more metadata to work with, and sig-net is discussing what the ideal behavior should be. That said, we can’t just expect everyone to change their implementations overnight. There’s got to be some amount of “implementation defined” freedom.
In MY opinion, connections MUST survive while an endpoint is marked as terminating but MAY be killed when an endpoint is removed. To do that cleanly, we have open KEPs to track that intermediate state.
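To make the distinction above concrete, here is a small Python sketch of the policy as stated in this opinion. It is not how kube-proxy or any load balancer is implemented. The EndpointConditions class is a hypothetical stand-in whose field names mirror the ready, serving and terminating conditions on EndpointSlice endpoints, and None is an assumed way of representing an endpoint that has been removed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointConditions:
    """Hypothetical stand-in for the conditions of one EndpointSlice endpoint."""
    ready: bool
    serving: bool
    terminating: bool


def existing_connection_policy(conditions: Optional[EndpointConditions]) -> str:
    """Return what a proxy may do with connections that are already established."""
    if conditions is None:
        # The endpoint is gone entirely: connections MAY be killed.
        return "may-close"
    # The endpoint still exists, terminating or not: connections MUST survive.
    return "keep"


def accepts_new_connections(conditions: Optional[EndpointConditions]) -> bool:
    """New connections go only to endpoints that exist and are ready."""
    return conditions is not None and conditions.ready
```

With this model, a terminating endpoint stops receiving new connections but keeps its existing ones until it is removed.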
Hi, Xavi and Thockin.
Thank you for answering my question.
It seems debatable to say that removing a pod from endpoints implies closing the client connections.
I tested the same server in the other k8s environment, but it didn’t close the client connections when the pod was removed from endpoints.
Therefore, I think saying the answer is undefined or under-defined is correct, as thockin said.
I’m not sure which part of the environment decides the behavior, but I hope someday there is an option in k8s to choose behavior explicitly, or to add hooks like preStop that run before pods are removed from endpoints.
For now, I think I have to find other ways to avoid this issue.
Thanks a lot!
I tested the same server in the other k8s environment, but it didn’t close the client connections when the pod was removed from endpoints.
Today we just don’t spec that, and so implementations do what they want. But also, today we do not distinguish “this endpoint exists but is terminating” from “this endpoint doesn’t exist”. Once we have that, I think implementations can be smarter.
I’m not sure which part of the environment decides the behavior, but I hope someday there is an option in k8s to choose behavior explicitly
It’s a combination of the service proxy (kube-proxy, usually) and the LB implementation. I don’t want to add parameters here, but as I said - I think more metadata will allow better impl choices. Coming soon.
Do you have any links to these KEPs?
We’ve built an update recently on the assumption that in-progress connections would not be closed and they would be given some time to complete, so this has caught us out. It’s fair to say we made a mistake in our understanding, but it was also a “reasonable” assumption for connections to stay open, when phrasing like “remove the Pod from the list of endpoints” is used. If there have been any updates that might allow us to configure how this works, that would be extremely helpful. Or any suggested alternative solutions.
We’re using k8s version 1.29.1 (DigitalOcean’s DOKS).
Thank you. Any help will be much appreciated.