Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?
A. Check the dashboard application to see if it is not displaying correctly.
B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output. (Most voted: 74%)
C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages. (17% of votes)
D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
Option “D” intrigued me. Is there a difference in terms of data loss between pushing data versus pulling data?
Also, it seems to me that messages are delivered at least once, so each message is either delivered to Dataflow or remains unacknowledged in Pub/Sub. In that case, the only way to know whether all messages have been acknowledged is to check Stackdriver Monitoring on Cloud Pub/Sub, which makes “C” a valid option for me.
“B” is not incorrect either, but I would prefer to start with “C”.
Can anyone help clarify why option “C” might not be sufficient, given that most votes favor option “B”?
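For reference, this is roughly how I picture option “B” in practice. It is only a sketch in plain Python, not real Dataflow code: `parse_message` is a hypothetical stand-in for whatever parsing step the pipeline has, the `id` and `amount` fields are an assumed schema, and the fixed dataset is made up. The idea is to feed known messages in and compare the IDs that come out.

```python
import json


def parse_message(raw):
    """Hypothetical stand-in for the pipeline's parsing step.

    Returns the parsed record, or None if the message would be dropped.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Assumed schema: every message is an object with "id" and "amount".
    if not isinstance(record, dict):
        return None
    if "id" not in record or "amount" not in record:
        return None
    return record


def find_missing_ids(fixed_dataset, pipeline_output):
    """Return the IDs that went in but did not come out, sorted."""
    expected = {msg_id for msg_id, _ in fixed_dataset}
    seen = {record["id"] for record in pipeline_output}
    return sorted(expected - seen)


# Made-up fixed dataset: (expected ID, raw JSON payload).
fixed_dataset = [
    ("m1", '{"id": "m1", "amount": 10}'),
    ("m2", '{"id": "m2", "amount": 20'),  # truncated JSON
    ("m3", '{"id": "m3"}'),               # missing "amount"
    ("m4", '{"id": "m4", "amount": 40}'),
]

# Run every payload through the stand-in step and keep what survives.
parsed = (parse_message(raw) for _, raw in fixed_dataset)
output = [record for record in parsed if record is not None]

missing = find_missing_ids(fixed_dataset, output)
print(missing)  # ['m2', 'm3']
```

With a known input, the missing IDs point straight at the payloads the pipeline drops, which is something the Pub/Sub metrics alone would not tell me.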