I have a pipeline built with the Python SDK running on Dataflow with Streaming Engine enabled.
While reading the Beam source code, I noticed that apache_beam/runners/worker/worker_status.py exposes several diagnostics that I am very interested in:
- thread dump
- heap dump
- state cache stats
- active processing bundle state
I’d love to see this information, but I haven’t found any documentation on how to enable these worker status diagnostics or how to access them from an external client.
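
To make clear what kind of output I am after, here is a minimal sketch of the thread-dump part using only the Python standard library. This is my own approximation, not Beam's actual code; the function name dump_threads is hypothetical, and I am assuming that a per-thread stack listing is roughly what the worker status module produces.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return a text dump of the current stack of every live thread.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    # Map thread ids to names so each section of the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread id to its topmost stack frame.
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        lines.append("--- Thread %s (%s) ---" % (thread_id, name))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump_threads())
```

Something like this only works from inside the worker process, which is why I am looking for a supported way to get the same information from outside.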