Problem statement: I want to visualize granular Kubernetes pod boot events, ideally in a Gantt-like format, to help users debug why their pod is slow to transition from the Pending phase to the Running phase, but I have not yet formulated a strategy with my current toolkit.
Greetings! I’ve noticed significant variance in pod boot time in one of my applications, with some pods differing by 30+ s. I am also a Prometheus user, and a user of the popular kube-state-metrics package. kube-state-metrics provides a very helpful metric called kube_pod_status_phase, which I can use in PromQL/Grafana to visualize pods during their Pending phase. Currently, I have a query where each pod gets a single horizontal line along the x-axis (time!) corresponding to Pending. The wider/longer the line, the slower the pod is to boot. However, Pending is a very coarse, single-dimension metric for trying to understand pod boot. The work that occurs during the Pending phase is vast: image downloads, ConfigMap prep, init container work, and standard container work. All of this is summed up in just a single status.
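To make "finer than Pending" concrete, here is a minimal sketch of one possible starting point, not a finished solution. It assumes the pod object has been dumped with kubectl get pod -o json and reads the timestamps Kubernetes records in status.conditions and in the container statuses. The names parse_ts, boot_milestones, and SAMPLE_POD are hypothetical, and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a whole-second RFC 3339 timestamp such as 2024-01-01T00:00:05Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def boot_milestones(pod):
    """Return [(milestone, seconds_since_pod_creation)] sorted by time."""
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    status = pod.get("status", {})
    points = []

    # Pod conditions (PodScheduled, Initialized, Ready, ...) that are currently true.
    for cond in status.get("conditions", []):
        if cond.get("status") == "True" and cond.get("lastTransitionTime"):
            points.append((cond["type"], parse_ts(cond["lastTransitionTime"])))

    # Init containers that have already run to completion.
    for cs in status.get("initContainerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term and term.get("finishedAt"):
            points.append((f"init:{cs['name']} finished", parse_ts(term["finishedAt"])))

    # Regular containers that are currently running.
    for cs in status.get("containerStatuses", []):
        running = cs.get("state", {}).get("running")
        if running and running.get("startedAt"):
            points.append((f"container:{cs['name']} started", parse_ts(running["startedAt"])))

    points.sort(key=lambda p: p[1])
    return [(name, (t - created).total_seconds()) for name, t in points]


# Made-up sample data, shaped like the output of `kubectl get pod -o json`.
SAMPLE_POD = {
    "metadata": {"name": "demo", "creationTimestamp": "2024-01-01T00:00:00Z"},
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True",
             "lastTransitionTime": "2024-01-01T00:00:02Z"},
            {"type": "Initialized", "status": "True",
             "lastTransitionTime": "2024-01-01T00:00:12Z"},
            {"type": "Ready", "status": "True",
             "lastTransitionTime": "2024-01-01T00:00:31Z"},
        ],
        "containerStatuses": [
            {"name": "app", "state": {"running": {"startedAt": "2024-01-01T00:00:30Z"}}},
        ],
    },
}

for name, offset in boot_milestones(SAMPLE_POD):
    print(f"{offset:6.1f}s  {name}")
```

This only yields milestones that survive in the pod status. Image pulls are not among them; as far as I can tell they only appear as Events (reasons Pulling and Pulled), which would have to be collected separately before they expire.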
It would be amazing if I could somehow generate a Gantt chart that shows the milestones of K8s events for a pod, and then facet by, say, head(pods, n), getting a handful of Gantt charts, one per pod boot.
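To illustrate the shape of output I have in mind, here is a rough text-only sketch. It assumes the per-pod tasks have already been reduced to (label, start_seconds, end_seconds) tuples by some other means. The names render_gantt, facet, and PODS are hypothetical, and the task names and durations are invented sample data, not measurements.

```python
def render_gantt(tasks, width=40):
    """Render (label, start_s, end_s) tuples as one text bar per task."""
    if not tasks:
        return ""
    total = max(end for _, _, end in tasks)
    scale = width / total if total > 0 else 0
    label_w = max(len(label) for label, _, _ in tasks)
    lines = []
    for label, start, end in tasks:
        left = int(round(start * scale))
        # Always draw at least one character so very short tasks stay visible.
        length = max(1, int(round(end * scale)) - left)
        lines.append(f"{label:<{label_w}} |{' ' * left}{'#' * length} {end - start:.0f}s")
    return "\n".join(lines)


def facet(pods, n):
    """Build one chart per pod for the first n pods (the head(pods, n) idea)."""
    charts = []
    for name, tasks in list(pods.items())[:n]:
        charts.append(f"pod/{name}\n{render_gantt(tasks)}")
    return "\n\n".join(charts)


# Invented sample data: seconds since pod creation for each boot task.
PODS = {
    "web-a": [("scheduling", 0, 2), ("image pull", 2, 22),
              ("init containers", 22, 30), ("container start", 30, 40)],
    "web-b": [("scheduling", 0, 1), ("image pull", 1, 4),
              ("init containers", 4, 9), ("container start", 9, 12)],
}

print(facet(PODS, 2))
```

In a chart like this, a slow pod stands out by which bar is long (here the invented "image pull" bar for web-a), rather than by the total length alone.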
Current state: “Boot time is observed to be slow as evidenced by long horizontal lines for pods in the Pending state”
Future state: “Boot time is observed to be slow as evidenced by a long horizontal Gantt bar for my pod, representing one of my pod’s specific boot-time tasks.”
I suppose I’m asking for a lot :), but surely some clever folks have formed such a solution!
Share with me what you’ve done with Grafana, PromQL, …or even just kubectl and charting tools to visualize slow boot times!