We have a fresh MongoDB 4.4 PSA cluster and it behaves very strangely. It consists of two servers (same configuration, same OS, bare metal) and an arbiter hosted on a weaker virtual machine. We use Grafana to check cluster health and Telegraf to gather metrics. One of them is io_time (it counts the number of milliseconds during which the device has had I/O requests queued), and this metric, graphed as asPercent(perSecond(*.*.diskio.io_time),1000), shows huge disk utilization on the secondary node (green) compared to the primary (orange).

We use the secondary node only for replication, so no queries hit it. The primary workload contains a lot of multi-updates, deletes and bulk writes, so it is understandable that the secondary has to apply all these changes and high disk usage can be expected. But the difference seems abnormal: it is around 4x higher than on the primary node.
Some other observations: CPU load is almost the same, disk space used is the same, memory usage is the same. I ran iotop for about a minute on the secondary node while Grafana was showing high utilization and it didn't look bad at all: IO% for the mongod process was around 5-10%, with a few spikes to 80-90% for a second or two. And there were no non-mongo processes with noticeably high IO%.
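To rule out a measurement artifact, I could cross-check the panel against iostat, which reads the same kernel counter (io_ticks in /proc/diskstats) that Telegraf's diskio.io_time comes from, and against pidstat for a longer per-process view than a one-minute iotop run. A rough sketch (interval and duration are arbitrary):

```sh
# %util should roughly match the Grafana io_time panel; the write
# request and size columns show whether the load is many small writes
# rather than high throughput.
iostat -x 1 60

# Per-process disk I/O over the same window (mongod should be the only
# significant writer on a dedicated replica member).
pidstat -d 1 60
```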
What can be the reason for such nonsense?