Our custom IoTEdge module was functioning correctly until recently, when it started losing connectivity. Our ConnectionStatusChangeHandler reports status “ConnectionStatus.Disconnected” with reasons “ConnectionStatusChangeReason.Retry_Expired” and “ConnectionStatusChangeReason.Communication_Error”.
The handler triggers re-initialization of the client as per recommended practices, but the loss of connectivity is continuous: there are no intermittent successful connections to the hub that would indicate a transient condition.
One thing that stands out is that the EdgeHub module twin of the device does not list the custom module in the “clients” section of its reported properties, which seems to be related to the outage. Only the IoTEdgeMetricsCollector module is listed, which is connected to IoTHub and functioning normally.
},
"clients": {
  "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
  "deviceId/IoTEdgeMetricsCollector": {
    "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
    "status": {
      "$lastUpdated": "2024-05-22T06:41:40.9107753Z"
    },
    "lastConnectedTimeUtc": {
      "$lastUpdated": "2024-05-22T06:41:40.9107753Z"
    }
  }
}
It seems that the local EdgeHub module is ignoring our custom module.
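A minimal sketch of the check described above, assuming the edgeHub reported properties have been exported as JSON. The data is shaped like the fragment we pasted; "deviceId/MyCustomModule" and the helper names `listed_clients` and `is_listed` are hypothetical, used only for illustration:

```python
import json

# Excerpt shaped like the "clients" fragment above.
# "deviceId/MyCustomModule" (used below) is a hypothetical client id.
REPORTED = json.loads("""
{
  "clients": {
    "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
    "deviceId/IoTEdgeMetricsCollector": {
      "$lastUpdated": "2024-05-22T06:41:40.9107753Z",
      "status": {"$lastUpdated": "2024-05-22T06:41:40.9107753Z"},
      "lastConnectedTimeUtc": {"$lastUpdated": "2024-05-22T06:41:40.9107753Z"}
    }
  }
}
""")


def listed_clients(reported):
    """Return client ids under 'clients', skipping '$'-prefixed metadata keys."""
    clients = reported.get("clients", {})
    return sorted(key for key in clients if not key.startswith("$"))


def is_listed(reported, client_id):
    """True if client_id appears in the 'clients' section."""
    return client_id in listed_clients(reported)


if __name__ == "__main__":
    print(listed_clients(REPORTED))
    print(is_listed(REPORTED, "deviceId/MyCustomModule"))
```

Running the same check against the twin after each restart would show whether the custom module ever appears in the list, even briefly.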
Environment
- Host: Debian 11, Arm64
- aziot-edged: 1.4.27
- Edge Agent @ 1.4.35
- Edge Hub @ 1.4.35
- Docker/Moby: 24.0.9-1
Steps to repro
Unfortunately we are unable to reproduce the issue. We are not sure what started it, but we suspect that an accidental change was saved to the edgeHub’s module twin using Azure IoT Explorer.
What we tried so far:
- Removed and re-added the custom module to the device from the Azure portal, and also through a layered deployment
- Rebooted the gateway, restarted system modules, ran “iotedge system restart”
- Deployed the SimulatedTemperatureSensor module on the device; it correctly registers with EdgeHub’s clients and functions correctly, so new modules other than our own seem unaffected
- The “iotedge check” command returns all green with a couple warnings on package versions, no errors.
- No errors in the EdgeHub or EdgeAgent logs after restarting
The custom module is successfully deployed, starts up and runs, but fails to send telemetry downstream, since any attempt to connect to the EdgeHub fails with the aforementioned ConnectionStatus and ConnectionStatusChangeReason.
We also gathered traces from inside the custom module. We saw a WebSocket exception when the module attempts to connect to the hub; this is to be expected, since EdgeHub, which acts as a proxy for the custom module, is unaware of its presence:
HasStack="True" ThreadID="6,006" ProcessorNumber="0" thisOrContextObject="ErrorDelegatingHandler#34683734" memberName="ExecuteWithErrorHandlingAsync" message="Exception caught: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.IO.IOException: Received an unexpected EOF or 0 bytes from the transport stream.
   at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.ClientWebSocket.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.CreateClientWebSocketAsync(Uri websocketUri, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.CreateClientWebSocketTransportAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpIotTransport.InitializeAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpIotConnector.OpenConnectionAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.EnsureConnectionAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.OpenSessionAsync(IDeviceIdentity deviceIdentity, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpUnit.EnsureSessionIsOpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.AmqpIot.AmqpUnit.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpTransportHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ProtocolRoutingDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.<>c__DisplayClass27_0.<<ExecuteWithErrorHandlingAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)"
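The innermost exception in the trace is a TLS handshake that ends with an unexpected EOF. Below is a minimal sketch of how that handshake could be probed on its own, outside the SDK. It assumes Python is available inside the module container and that the IoT Edge runtime has set the IOTEDGE_GATEWAYHOSTNAME environment variable for the module. `probe_tls` is a hypothetical helper, and port 443 is an assumption based on the trace showing AMQP over WebSockets:

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Attempt a bare TLS handshake with host:port; return (ok, detail)."""
    ctx = ssl.create_default_context()
    # Certificate checks are disabled on purpose: this only tests whether the
    # handshake completes, not whether the edgeHub certificate is trusted.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                # Handshake succeeded; report the negotiated protocol version.
                return True, tls.version()
    except OSError as exc:
        # ssl.SSLError and connection errors are both OSError subclasses.
        return False, f"{type(exc).__name__}: {exc}"


# Possible usage inside the module container (assumes the variable is set):
#   import os
#   print(probe_tls(os.environ["IOTEDGE_GATEWAYHOSTNAME"]))
```

If the probe also fails during the handshake, that would suggest the problem sits in the TLS layer between the module and EdgeHub rather than in the SDK's retry logic.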
We thought to fully uninstall and reinstall the IoTEdge runtime on the device, but we would like to exhaust all other options before going there.
At this point we are running out of ideas on how to attack this issue; any help would be appreciated.