I am trying to implement graceful shutdown in our Spring Boot 2 application to eliminate 5xx errors when pods are scaled down.
I have set
management:
  endpoints:
    jmx:
      exposure:
        include: "*"
    web:
      exposure:
        include: "*"
  endpoint:
    shutdown:
      enabled: true
server.shutdown: graceful
inside my application.yaml file, and I am using the /actuator/health endpoint as my liveness and readiness probe inside my deployment.yaml file.
I have set a sleep period of 50 seconds inside the preStop hook, and my terminationGracePeriod is 120 seconds.
My deployment.yaml file:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","sleep 50;","rm -rf /mnt/bixby/log/$HOSTNAME"]
terminationGracePeriod: 120
livenessProbe:
  httpGet:
    port: 8080
    path: /actuator/health
  initialDelaySeconds: 45
  periodSeconds: 5
readinessProbe:
  httpGet:
    port: 8080
    path: /actuator/health
  initialDelaySeconds: 45
  periodSeconds: 5
The weird thing is that this works in one environment (let's call it dev1) but not in the other (dev2).
In dev1, when I delete a pod and send a curl request to the /actuator/health endpoint, it returns {"status":"DOWN","groups":["liveness","readiness"]}. In dev2 it doesn't return any response; I get curl: (52) Empty reply from server after 50 seconds, which is also my preStop hook sleep time. Weirdly, our own custom health check endpoint, which we used earlier, returns the correct response during this period.
After that, if I send a curl request again, I get the following output:
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /actuator/health HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Connection: keep-alive
< Content-Length: 0
Also, strangely, in dev1, where graceful shutdown seems to work properly, the pod exits after 50 seconds, which is odd since terminationGracePeriod is 120 seconds.
I have added the following @PreDestroy code to my Spring Boot application:
@PreDestroy
public void tearDown() {
    logger.log("Application shutting down");
}
The configuration, the Docker image of the Spring Boot application, and the Kubernetes version are the same in both environments.
I have also added this Swagger configuration so that the actuator works together with Swagger:
@Bean // This bean is required when you want swagger and actuator to work together
public static BeanPostProcessor springfoxHandlerProviderBeanPostProcessor() {
    return new BeanPostProcessor() {
        @Override
        public Object postProcessAfterInitialization(Object bean, String beanName)
                throws BeansException {
            if (bean instanceof WebMvcRequestHandlerProvider
                    || bean instanceof RequestHandlerProvider) {
                customizeSpringfoxHandlerMappings(getHandlerMappings(bean));
            }
            return bean;
        }

        private <T extends RequestMappingInfoHandlerMapping> void customizeSpringfoxHandlerMappings(
                List<T> mappings) {
            List<T> copy = mappings.stream().filter(mapping -> mapping.getPatternParser() == null)
                    .collect(Collectors.toList());
            mappings.clear();
            mappings.addAll(copy);
        }

        @SuppressWarnings("unchecked")
        private List<RequestMappingInfoHandlerMapping> getHandlerMappings(Object bean) {
            try {
                Field field = ReflectionUtils.findField(bean.getClass(), "handlerMappings");
                field.setAccessible(true);
                return (List<RequestMappingInfoHandlerMapping>) field.get(bean);
            } catch (IllegalArgumentException | IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    };
}
A few things: Spring Boot provides dedicated liveness and readiness probe endpoints, so you don't have to point both probes at /actuator/health.
In your application.yaml, do:
management:
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
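On Spring Boot 2.3 and later, the liveness and readiness health groups are switched on automatically when the application detects that it is running on Kubernetes. As a hedged alternative to the indicator flags above, the probes property below is meant to expose the same /actuator/health/liveness and /actuator/health/readiness endpoints in other environments as well. Treat it as a sketch and check it against your Spring Boot version.

```yaml
# Sketch only: assumes Spring Boot 2.3+ (the property does not exist in earlier 2.x releases).
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes the "liveness" and "readiness" health groups
```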
Then in your deployment yaml:
livenessProbe:
  httpGet:
    port: 8080
    path: /actuator/health/liveness
  initialDelaySeconds: 45
  periodSeconds: 5
readinessProbe:
  httpGet:
    port: 8080
    path: /actuator/health/readiness
  initialDelaySeconds: 45
  periodSeconds: 5
Also, the pod exiting after 50 seconds is correct. The termination grace period is an upper bound, not a fixed wait: if the preStop hook and the graceful shutdown together only take about 50 seconds, there is no reason for the pod to wait beyond that. If it always waited the full 120 seconds, that would mean graceful shutdown is not completing properly. You should see a message such as "Graceful shutdown complete" at the end of the log, so verify it that way.
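How long the application itself waits for in-flight requests is bounded by a Spring Boot setting rather than by Kubernetes. A minimal sketch, assuming Spring Boot 2.3+ and an illustrative value of 60s (the value is an assumption, not something from the question; the default is 30s):

```yaml
# Sketch only: the 60s value is illustrative.
server:
  shutdown: graceful                  # same setting as server.shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 60s   # upper bound on waiting for active requests
```

The preStop sleep and this timeout both count against the pod's termination grace period, so with a 50-second sleep and a 120-second grace period the timeout needs to stay below roughly 70 seconds.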
As for graceful shutdown working in one environment but not the other, that is indeed strange. Is the exact same deployment file used in both? How are you starting the application? If you really want to verify that the shutdown is working, your best option is to look at the application log.
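One thing worth comparing between the two environments is the rendered pod spec itself. The fragment below is a sketch, not the asker's actual manifest: the container name is hypothetical and the other Deployment fields are omitted. It uses terminationGracePeriodSeconds, which is the field name in the Kubernetes pod spec, and it passes the whole preStop script as a single string, because sh -c treats only the first argument after -c as the command and any later arguments as positional parameters.

```yaml
# Sketch only: "app" is a hypothetical container name; unrelated fields are omitted.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120   # pod-level field, covers preStop plus shutdown
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                # One command string, so both the sleep and the cleanup run.
                command: ["/bin/sh", "-c", "sleep 50; rm -rf /mnt/bixby/log/$HOSTNAME"]
```

Running kubectl get pod with -o yaml against a pod in each environment shows what the cluster actually accepted, which may differ from the file you applied.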