PBS job terminates prematurely and only partially executes its underlying script when it encounters an orphaned Docker container
I have an application that runs every 5 minutes and uses Docker and PBS for its execution.
The application is a Python script that a PBS script executes inside a Docker container.
This is the PBS script, which uses the rm_test Docker image to execute the _regmod_realtime_forecast.py script:
The setup works fine most of the time, but once in a blue moon PBS comes across an orphaned container. When that happens, the PBS job terminates prematurely after only partially executing the underlying Python script.
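To make the failure condition concrete, here is a minimal sketch of how a leftover container could be detected (and optionally removed) before the job starts its own container. It is not part of the current setup. It assumes the container is started with a fixed, known name; `regmod_forecast` below is a hypothetical name, and the helper functions are illustrative only. The `docker ps -a --filter name=... --format` and `docker rm -f` invocations are standard Docker CLI commands.

```python
import subprocess


def find_leftover_containers(name_filter, run=subprocess.run):
    """Return names of all containers (running or exited) matching name_filter."""
    result = run(
        ["docker", "ps", "-a",
         "--filter", f"name={name_filter}",
         "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    # docker prints one container name per line; skip blank lines
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def remove_containers(names, run=subprocess.run):
    """Force-remove each named container (stops it first if still running)."""
    for name in names:
        run(["docker", "rm", "-f", name], check=True)


if __name__ == "__main__":
    # "regmod_forecast" is a hypothetical container name, not taken from the PBS script
    leftovers = find_leftover_containers("regmod_forecast")
    if leftovers:
        print("Orphaned containers found:", leftovers)
        remove_containers(leftovers)
```

Running a check like this at the top of the PBS script, before `docker run`, would at least log whether an orphaned container was present when a job terminates early.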
PBS logs of an irregular execution
PBS logs of a healthy execution
Comparing the two logs closely, the resources used differ between the two scenarios, which reflects the abnormal behavior that occurs when PBS comes across an orphaned Docker container.