I'm making a diagnostic service that checks devices and saves the results in .json files. I have a problem with checking multiple devices of the same type. One type is a device that I connect to over SSH to execute some simple commands (cat, systemctl, etc.); the other is a device that I connect to over WebSocket, where I wait for the first message, which should contain a version string.
On my own machine (Windows 10) and on one Debian machine, the service works as expected: for every device I get results in about 1-3 seconds (which is fine for my case). However, on another Debian machine, the service gets stuck after 10-15 devices, and I have to wait 30-45 seconds for the results (which is not fine).
Some more info: all the devices I'm checking are configured the same way. The only difference is the machine I'm checking from. They are all connected to the same LAN.
Code for checking ssh devices:
DeviceStatus deviceStatus = await base.CheckDevice(cancellationToken);
if (!deviceStatus.Pingable)
    return deviceStatus;
using var client = CreateSshClient();
await client.ConnectAsync(cancellationToken);
string? lcdVersion = await GetLcdAppVersion(client, cancellationToken);
deviceStatus.Components.Add(new("LcdApp", true, lcdVersion));
List<Task> tasks = new();
foreach (ServiceInfo service in _options.Value.LcdServices)
    tasks.Add(Task.Run(async () =>
        await GetServiceInfo(deviceStatus, service, cancellationToken, client)));
await Task.WhenAll(tasks);
client.Disconnect();
return deviceStatus;
Code for checking websocket devices:
DeviceStatus deviceStatus = await base.CheckDevice(cancellationToken);
if (!deviceStatus.Pingable)
    return deviceStatus;
bool connected = false;
string? version = null;
Stopwatch watch = new();
watch.Start();
string address = "ws://" + _device.device_ip + ":" + (_device.device_port ?? 8882) + "/api";
using var ws = new ClientWebSocket();
try
{
    var timedSource = new CancellationTokenSource();
    timedSource.CancelAfter(TimeSpan.FromSeconds(5));
    var source = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, timedSource.Token);
    Task dcTask = Task.Run(async () =>
    {
        try
        {
            await Task.Delay(TimeSpan.FromSeconds(10), source.Token);
        }
        finally
        {
            if (ws.State == WebSocketState.Open)
            {
                _logger.LogDebug("{ip}: WebSocket timeout exceeded, disconnecting", _device.device_ip);
                await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing", cancellationToken);
            }
            if (!source.IsCancellationRequested)
                source.Cancel();
        }
    });
    await ws.ConnectAsync(new Uri(address), source.Token);
    connected = ws.State == WebSocketState.Open;
    if (!connected)
        return deviceStatus;
    var buffer = new byte[1024];
    var result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer), source.Token);
    if (result.MessageType == WebSocketMessageType.Text)
    {
        var message = Encoding.UTF8.GetString(buffer, 0, result.Count);
        //_logger.LogDebug("{ip}: Message received - {message}", _device.device_ip, message);
        if (message.Contains("appInfo"))
        {
            JObject jObject = JObject.Parse(message);
            AppInfo appInfo = jObject["appInfo"]!.ToObject<AppInfo>()!;
            version = appInfo.versionString;
            if (version is null && appInfo.version is not null)
            {
                version = string.Join(".", appInfo.version);
            }
        }
    }
    await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing", source.Token);
    source.Cancel();
}
catch (TaskCanceledException)
{
    _logger.LogInformation("{ip} is taking too long to respond", _device.device_ip);
}
catch (OperationCanceledException)
{
    _logger.LogInformation("{ip} is taking too long to respond", _device.device_ip);
}
catch (Exception e)
{
    if (!e.Message.Contains("close handshake"))
        _logger.LogDebug("Error while checking {ip}: {e}", _device.device_ip, e.ToString());
}
deviceStatus.Components.Add(new("WebSocket", connected, version));
return deviceStatus;
NOTE
The disconnect task doesn't seem to work all the time; it seems like the code is still awaiting the connection even though the CancellationTokenSource has been cancelled.
I tried adding the disconnect task to cancel the check after 5 seconds. I also optimized the SSH check to create only one SshClient per device and run all the commands on that single client. This worked on the first two machines (Windows 10 and the first Debian), but the other Debian machine still had the same problem. I also tried setting ulimit -n to 4096, but it didn't change the outcome.
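For reference, here is a stripped-down sketch of the timeout behaviour I'm after, without the separate disconnect task. The class name WebSocketProbe and the helper TryConnectAsync are made up for illustration and are not part of my service. It assumes that a linked CancellationTokenSource with CancelAfter is enough to abort ConnectAsync, which is exactly the part I'm not sure about on the slow machine.

```csharp
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class WebSocketProbe
{
    // Hypothetical helper, not part of the service shown above.
    public static async Task<bool> TryConnectAsync(
        Uri uri, TimeSpan timeout, CancellationToken cancellationToken)
    {
        using var ws = new ClientWebSocket();

        // Linked source: cancelled when the caller cancels or when the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);

        try
        {
            await ws.ConnectAsync(uri, cts.Token);
            return ws.State == WebSocketState.Open;
        }
        catch (OperationCanceledException)
        {
            // Timeout or caller cancellation (TaskCanceledException derives from this).
            return false;
        }
        catch (WebSocketException)
        {
            // Connection refused, failed handshake, and similar errors.
            return false;
        }
    }
}
```

It would be called roughly like `bool ok = await WebSocketProbe.TryConnectAsync(new Uri("ws://192.0.2.10:8882/api"), TimeSpan.FromSeconds(5), cancellationToken);`, where the IP address is only a placeholder.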