I am using MPI to pass data values between processes; some of them have many decimal places, and I need them all to be accurate to around 6-8 decimal places. I have several implementations of my program (Python, C++ with OpenMP, CUDA) that I have run on several different systems and architectures, so I already know what the data values should be and that it is possible to achieve these exact values with different libraries on different architectures.
When using MPI on a single node of my cluster system, I get the exact values; however, when I use 2 nodes I get errors starting at the 4th decimal place (so the results are only accurate to the first 3 decimal places). All nodes have the same architecture and were bought and installed at the same time, so they are identical apart from standard production differences. The specific MPI call is MPI_Reduce(), and it is called three times. I'm assuming the errors are caused by rounding or truncation during the message pass over the network, but my question is: how do I prevent this from happening?
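For reference, the reduction looks roughly like this (a minimal sketch with placeholder variable names and MPI_SUM, not my actual code; the real program calls MPI_Reduce() three times on its own data):

```cpp
// Minimal sketch of the reduction pattern described above
// (hypothetical names/values, not the real program).
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Placeholder per-rank partial result.
    double local_value = 0.123456789 * (rank + 1);
    double global_value = 0.0;

    // Combine the per-rank values on rank 0.
    MPI_Reduce(&local_value, &global_value, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%.8f\n", global_value);  // expecting 6-8 correct decimal places

    MPI_Finalize();
    return 0;
}
```

I launch it with something like `mpirun -np N ./program`, once confined to a single node and once spread across two nodes (the exact launch command depends on my cluster's scheduler).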
Thanks!
What I’m expecting:
- I'm expecting accuracy to 6-8 decimal places when using MPI to pass messages over the network between nodes.
What I tried:
- Using 1 node, which gave correct accuracy.
- Casting everything to long double and using MPI_LONG_DOUBLE (sketch below).
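The long double attempt from the second bullet looked roughly like this (again with placeholder names; the only change is widening the types and the MPI datatype):

```cpp
// Same sketch as above, but with everything widened to long double.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Placeholder per-rank partial result.
    long double local_value = 0.123456789L * (rank + 1);
    long double global_value = 0.0L;

    MPI_Reduce(&local_value, &global_value, 1, MPI_LONG_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%.8Lf\n", global_value);

    MPI_Finalize();
    return 0;
}
```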