I have a C++ client/server application whose two sides run on different machines and communicate through a filesystem shared over NFS.
The server appends the messages it wants to send to a file (messages.txt) and creates a dummy file (.dummy) to notify the client that there are new messages to read in messages.txt.
The client then checks regularly (every minute) whether this .dummy file exists. If it does, the client reads the new messages and deletes the .dummy file. We can assume that the server sends at least one message per minute.
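For reference, the protocol described above can be sketched as follows. This is a minimal illustration, not the actual application code: it uses C++17 `std::filesystem` (whose API closely mirrors `boost::filesystem`), and the function names `serverSend` and `clientPoll`, the one-message-per-line format and the byte `offset` the client keeps between polls are all assumptions made for the example. One deliberate difference from the description: the client removes `.dummy` before reading rather than after, so that a message appended between the read and the delete is not left without a notification until the next send.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Sketch only: names, file layout and message format are assumptions.
// Server side: append one message (one line) and raise the notification marker.
void serverSend(const fs::path& dir, const std::string& message) {
    {
        std::ofstream out(dir / "messages.txt", std::ios::app);
        out << message << '\n';
    }  // stream closed (and flushed) before the marker is created
    if (!fs::exists(dir / ".dummy")) {
        std::ofstream marker(dir / ".dummy");  // creates an empty marker file
    }
}

// Client side, called once per polling period. `offset` is the number of
// bytes of messages.txt already consumed (hypothetical bookkeeping).
std::vector<std::string> clientPoll(const fs::path& dir, std::streamoff& offset) {
    std::vector<std::string> messages;
    // Delete the marker first: remove() returns false if it was not there.
    if (!fs::remove(dir / ".dummy")) {
        return messages;
    }
    std::ifstream in(dir / "messages.txt", std::ios::binary);
    in.seekg(offset);
    std::string line;
    while (std::getline(in, line)) {
        if (in.eof()) {
            break;  // no trailing '\n': presumably still being written, read it next time
        }
        offset += static_cast<std::streamoff>(line.size()) + 1;
        messages.push_back(line);
    }
    return messages;
}
```

`fs::remove` returns `false` when the file does not exist, so it doubles as the existence check. Over NFS it most likely goes through the same kind of name lookup as `exists()`, so on its own it does not address the caching question.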
The mechanism works fine in general, but in one case the client never detected the .dummy file (for more than 24 hours). While debugging this, a simple `ls` of the directory, run outside the application on the server machine, made the client detect the dummy file and unblocked the communication.
I tried to reproduce this scenario but without success.
The detection of the dummy file is done with:
```cpp
boost::filesystem::exists(dummyFile);
```
This seems to translate into `open("/.dummy", O_RDONLY)`.
So I am wondering whether this is an NFS caching issue (I am still not very clear about how NFS caching behaves here, so a clear explanation would be welcome), and whether doing something like the following would make the detection of the file on the client side more robust:
```cpp
// dir and dummyFile are boost::filesystem::path objects
auto files = boost::make_iterator_range(boost::filesystem::directory_iterator(dir), {});
for (const auto& entry : files) {
    if (entry.path() == dummyFile) {
        return true;
    }
}
return false;
```
This seems to translate into:
```
openat(AT_FDCWD, "<dir>", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents(3, [{d_ino=15851, d_off=10, d_reclen=24, d_name=".", d_type=DT_DIR}, {d_ino=67160780, d_off=12, d_reclen=24, d_name="..", d_type=DT_DIR}, {d_ino=468177, d_off=512, d_reclen=24, d_name=".dummy", d_type=DT_REG}], 32768) = 72
```
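A possible refinement of the directory-scan idea, sketched with C++17 `std::filesystem` rather than `boost::filesystem` (the iteration API is very similar): compare only the file name of each entry instead of the full path. `entry.path()` is built as `dir / name`, so `entry.path() == dummyFile` only matches if `dummyFile` was spelled the same way (for example `dir/.dummy` versus `./dir/.dummy`). The helper names `listedInDirectory` and `markerPresent` are hypothetical. Whether the listing sees a file that a plain `exists()` misses depends on the NFS client's caching (on Linux, see the `lookupcache` and `actimeo` family of mount options in nfs(5)); this sketch does not guarantee that either path bypasses the cache.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true if an entry called `name` appears in the listing of `dir`.
// Only the file name is compared, so "dir", "./dir" and "dir/" all work.
bool listedInDirectory(const fs::path& dir, const fs::path& name) {
    std::error_code ec;  // errors (e.g. unreadable directory) count as "not listed"
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        if (it->path().filename() == name) {
            return true;
        }
    }
    return false;
}

// Hypothetical combined check: direct lookup first, directory listing as a fallback.
bool markerPresent(const fs::path& dir) {
    std::error_code ec;
    return fs::exists(dir / ".dummy", ec) || listedInDirectory(dir, ".dummy");
}
```

Errors such as a temporarily unreachable server are reported as "not present" here; a real client would probably want to log them instead.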
Also, on the server side, if the server wants to send new messages while the .dummy file is already there (meaning that the client already has messages to read), the .dummy file is left unchanged.
I am wondering whether changing its attributes, or deleting and recreating it, would help.
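The two server-side options can be sketched as below (C++17 `std::filesystem`; `touchMarker` and `recreateMarker` are hypothetical names). If the client is stuck on a cached "file not found" answer, it is plausible that only the second option helps: updating the marker's timestamp changes the file's own attributes, whereas creating, deleting or renaming an entry changes the directory, which is what the client would have to re-read to see the file appear. This is an assumption about NFS client behavior, not something verified against the failure described above. Instead of a literal delete followed by create, the sketch writes a temporary file and renames it over `.dummy`, so the marker never disappears in between.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Option 1 (hypothetical helper): only change the marker's attributes (mtime),
// creating it if missing. This modifies the file, not the directory's entries.
void touchMarker(const fs::path& marker) {
    if (!fs::exists(marker)) {
        std::ofstream create(marker);
        return;
    }
    fs::last_write_time(marker, fs::file_time_type::clock::now());
}

// Option 2 (hypothetical helper): replace the directory entry on every send.
// Writing a temporary file and renaming it over the marker changes the
// directory itself, and the marker is never missing in between
// (unlike remove + create). The ".tmp" suffix is an arbitrary choice.
void recreateMarker(const fs::path& marker) {
    fs::path tmp = marker;
    tmp += ".tmp";
    { std::ofstream create(tmp); }
    fs::rename(tmp, marker);  // replaces an existing marker in one step
}
```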