I am deploying an Apache Flink cluster on Kubernetes, and I want to set up monitoring with Prometheus and Grafana to gain insight into the cluster's performance and resource usage.
Specifically, I want to collect TaskManager, JobManager, and operator-specific metrics on my Kubernetes setup. I deployed the cluster with the following manifest, based on the custom resource overview in the operator docs (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/custom-resource/overview/):
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: default
  name: flink-cluster
spec:
  image: flink:1.19
  flinkVersion: v1_19
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
    kubernetes.operator.metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: "9999"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1048m"
      cpu: 1
  taskManager:
    resource:
      memory: "1048m"
      cpu: 1
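For comparison, here is a minimal sketch of a variant of the manifest above that targets the JobManager and TaskManager pods. It rests on assumptions that have not been verified in this setup: that the `kubernetes.operator.metrics.reporter.*` keys configure only the operator's own reporter (which the operator reads from its own configuration, typically set through its Helm chart), while the cluster pods read the plain `metrics.reporter.*` keys; and that the `flink-metrics-prometheus` plugin ships in the `flink:1.19` image. The container port name `metrics` is hypothetical. It exists only so a PodMonitor can refer to it by name, and 9249 is the Prometheus reporter's default port.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: default
  name: flink-cluster
spec:
  image: flink:1.19
  flinkVersion: v1_19
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
    # Assumption: plain metrics.reporter.* keys configure the JobManager/TaskManager reporter
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"
  serviceAccount: flink
  podTemplate:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        # The operator merges this into its main container, which must keep this name
        - name: flink-main-container
          ports:
            # Hypothetical port name, referenced by a PodMonitor
            - name: metrics
              containerPort: 9249
              protocol: TCP
  jobManager:
    resource:
      memory: "1048m"
      cpu: 1
  taskManager:
    resource:
      memory: "1048m"
      cpu: 1
```

Because the pod template applies to both the JobManager and TaskManager pods, each pod exposes the reporter on its own IP at the same port.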
I tried to deploy Prometheus as described in this tutorial: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/, and I also deployed the pod-monitor.yaml from the tutorial. However, I couldn't select any Flink-specific metrics in the query. I looked at the logs of the pod-monitor pod; it seemed to be doing something, but I couldn't verify that any metrics were actually being sent.
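If the tutorial's pod-monitor.yaml selects only the operator pod, the JobManager and TaskManager pods are never scraped. The following is a minimal sketch of a second PodMonitor for the cluster pods. It assumes the labels that Flink's native Kubernetes integration puts on its pods (`app: <cluster-id>`, here `flink-cluster`, and `type: flink-native-kubernetes`) and a container port named `metrics` as in a pod template like the one sketched for the manifest. It also assumes a `release: prometheus` label, chosen to match the Prometheus `podMonitorSelector` of a kube-prometheus-stack Helm release named `prometheus`. The PodMonitor name is hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  # Hypothetical name
  name: flink-cluster-metrics
  namespace: default
  labels:
    # Assumption: must match the Prometheus instance's podMonitorSelector
    release: prometheus
spec:
  selector:
    matchLabels:
      # Assumption: labels set on JobManager/TaskManager pods by native Kubernetes mode
      app: flink-cluster
      type: flink-native-kubernetes
  podMetricsEndpoints:
    # Refers to the named container port, not a number
    - port: metrics
```

If the targets show up as healthy in the Prometheus UI, the reporter's metrics should appear with a `flink_` name prefix.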
I would be very grateful for any suggestions or ideas on how to get the Grafana/Prometheus setup working. I am new to Kubernetes, and I am getting a bit overwhelmed by all the operators and related components.
Thanks for your help.