I have a job in Databricks that runs multiple tasks (some in parallel, others sequentially), and I currently send telemetry information using OpenCensus.
As support for OpenCensus will end on 30 September, I am transitioning to the Azure Monitor OpenTelemetry Python Distro, and with my current changes, not all telemetry events are sent.
I’ve installed the following libraries using pip:
```
azure-monitor-events-extension==0.1.0
azure-monitor-opentelemetry==1.6.1
```
In a functions notebook, I have the following functions:
```python
from azure.monitor.opentelemetry import configure_azure_monitor
from azure.monitor.events.extension import track_event

def get_applicationinsights_connection_string():
    return dbutils.secrets.get(get_parameter('subscription_key'), 'application-insights-connection-string')

def create_azure_connection():
    connection_string = get_applicationinsights_connection_string()
    configure_azure_monitor(connection_string=connection_string)

def send_custom_event(event_name, message_dict):
    print(f'Tracking "{event_name}"')
    create_azure_connection()
    track_event(event_name, message_dict)
```
Then, in each task of the job, I call send_custom_event(event_name, message_dict), so that the telemetry of each task is sent to the customEvents table in Application Insights.
The issue I am facing is that not all events are sent. Sometimes I receive the event for the first task but not for some of the parallel tasks; other times I receive neither the event for the first task nor the events for some of the parallel tasks.
Why is this happening? Is there a way to call flush() to force the events to be sent? That option was available in OpenCensus, and event delivery worked perfectly there.
The telemetry loss you are seeing when sending events to Application Insights from Databricks with OpenTelemetry (specifically the azure-monitor-opentelemetry library) is most likely related to how telemetry data is buffered and exported asynchronously. In OpenCensus you could call flush() to make sure events were sent before the program terminated; OpenTelemetry batches telemetry in the background and requires explicit handling to ensure data is flushed before the process (or Databricks task) ends.
Avoid creating a new connection in every send_custom_event call. Initialize the connection once at the beginning of your notebook or job and reuse it; repeatedly calling configure_azure_monitor adds overhead and can lead to telemetry being dropped. See the sketch after this paragraph.
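For example, here is a minimal sketch of one-time initialization. It reuses the get_applicationinsights_connection_string helper from the question; the module-level _initialized flag is just an illustrative guard, not part of any library API:

```python
from azure.monitor.opentelemetry import configure_azure_monitor
from azure.monitor.events.extension import track_event

_initialized = False

def ensure_azure_connection():
    # Configure Azure Monitor only once per Python process and
    # reuse that configuration for every subsequent event.
    global _initialized
    if not _initialized:
        configure_azure_monitor(
            connection_string=get_applicationinsights_connection_string()
        )
        _initialized = True

def send_custom_event(event_name, message_dict):
    print(f'Tracking "{event_name}"')
    ensure_azure_connection()
    track_event(event_name, message_dict)
```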
azure-monitor-opentelemetry uses opentelemetry.sdk.trace.export.BatchSpanProcessor, which has a force_flush() method you can invoke to flush all pending telemetry data, as in the sketch below.
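Here is a hedged sketch of a flush helper. It assumes configure_azure_monitor() has already installed the SDK providers globally; since custom events sent via track_event travel through the logging pipeline, flushing the logger provider as well is a reasonable precaution:

```python
# Sketch: force-flush buffered telemetry before a Databricks task exits.
# Assumes configure_azure_monitor() has already been called, so the global
# providers below are SDK implementations that expose force_flush().
from opentelemetry import trace
from opentelemetry._logs import get_logger_provider

def flush_telemetry(timeout_millis: int = 10_000) -> None:
    # Flush spans buffered by the BatchSpanProcessor.
    trace.get_tracer_provider().force_flush(timeout_millis)
    # track_event() emits custom events through the logging pipeline,
    # so flush the logger provider too.
    get_logger_provider().force_flush(timeout_millis)
```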
Finally, I solved my problem using force_flush(), as suggested here: https://github.com/Azure/azure-sdk-for-python/issues/37228
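For anyone hitting the same issue, the fix amounts to something like the following hypothetical end-of-task usage, combining the send_custom_event and flush_telemetry sketches above:

```python
# Hypothetical end-of-task usage: send the event, then force a flush so
# buffered telemetry is exported before the task's process exits.
send_custom_event('task_completed', {'task_name': 'ingest', 'status': 'ok'})
flush_telemetry()
```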