I am currently working on a numerical processing system that will be deployed in a performance-critical environment. It takes inputs in the form of numerical arrays (these use the Eigen library, but for the purpose of this question that's perhaps immaterial) and performs a range of numerical computations (matrix products, concatenations, etc.) to produce outputs.
All arrays are allocated statically and their sizes are known at compile time. However, some of the inputs may be invalid. In these exceptional cases, we still want the computation to run, and we still want to use any outputs that are not "polluted" by invalid values.
To give a trivial example (this is pseudo-code):
Matrix a = {1, 2, NAN, 4}; // this is the "input" matrix
Scalar b = 2;
Matrix output = b * a; // this results in {2, 4, NAN, 8}
The idea here is that 2, 4 and 8 are usable values, but the NAN should signal to the recipient of the data that that entry was involved in an operation on an invalid value and should be discarded (this will be detected via a std::isfinite(value) check before the value is used).
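For concreteness, the pseudo-code above could be sketched in plain C++ roughly as follows. This is only an illustration under assumptions: it uses std::array instead of Eigen so that it stands alone, and the names scale and usable are hypothetical helpers, not part of any library.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper: multiply every entry by a scalar.
// No validity checks are needed here; NaN propagates on its own.
template <std::size_t N>
std::array<double, N> scale(double b, const std::array<double, N>& a) {
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = b * a[i];
    }
    return out;
}

// Hypothetical helper: the check the recipient runs before using a value.
// Note that std::isfinite rejects +/-infinity as well as NaN.
inline bool usable(double v) {
    return std::isfinite(v);
}
```

Calling scale(2.0, a) on {1, 2, NAN, 4} leaves the third entry as NaN, and usable() returns false for that entry only.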
Is this a sound way of communicating and propagating unusable values, given that performance is critical and heap allocation is not an option (and neither are other resource-consuming constructs such as boost::optional or pointers)?
Are there better ways of doing this? At this point I’m quite happy with the current setup but I was hoping to get some fresh ideas or productive criticism of the current implementation.
This is a completely reasonable way to go. Note also that there are multiple bit patterns that are interpreted as NaN. There are two main types of NaN: signalling (which can raise a floating-point exception when they are used in an operation, depending on your settings) and quiet (which propagate silently). Even among quiet NaNs, however, there are many distinct bit patterns. If you want to get really into it, you can create your own NaNs that are distinct from the regular ones. For instance, you could use a specific bit pattern to represent NA (which is a different concept from NaN).
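As a rough sketch of the custom-NaN idea, assuming a 64-bit IEEE 754 double: the payload value kNaPayload and the helpers make_na and is_na below are hypothetical names chosen for illustration. Whether a payload survives arithmetic is hardware-dependent, so this is only safe for tagging values that are stored and read back, not for values that pass through computations.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Exponent all ones plus the quiet bit: a quiet NaN with an empty payload.
constexpr std::uint64_t kQuietNaNBits = 0x7FF8000000000000ULL;
// The 51 mantissa bits below the quiet bit are free to carry a tag.
constexpr std::uint64_t kPayloadMask = 0x0007FFFFFFFFFFFFULL;
// Hypothetical tag meaning "NA"; any non-zero value within the mask works.
constexpr std::uint64_t kNaPayload = 0x4E41ULL;

// Build a quiet NaN that carries the NA tag.
inline double make_na() {
    const std::uint64_t bits = kQuietNaNBits | kNaPayload;
    double d;
    std::memcpy(&d, &bits, sizeof d);  // well-defined way to reinterpret bits
    return d;
}

// True only for NaNs whose payload equals the NA tag (sign bit is ignored).
inline bool is_na(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return std::isnan(d) && (bits & kPayloadMask) == kNaPayload;
}
```

Because make_na() still returns a NaN, existing std::isnan and std::isfinite checks keep working; is_na() only adds a way to tell the tagged value apart.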
As for your point about pollution, this gets much trickier. Generally, any mathematical operation involving NaN results in NaN. In other words, NaN is contagious. In some cases, this is what you want. In others, it's not. For instance, suppose you were asked for the mean of the vector you gave. Is it NaN, or 7/3? The scalar * vector product you gave, however, will work out exactly the way you want, and you don't need any std::isfinite check during the computation either. Just multiply the numbers and the NaN pops out automatically, so it's quite performant. If you want a mean of 7/3 for your vector, however, you need to be more clever, because doing it naively will result in NaN. While I can't tell you how to do a fast implementation of this, numpy has one (numpy.nanmean), and it's open source, so you can look at that.
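A straightforward (not necessarily fast) version of such a NaN-skipping mean might look like the sketch below. It is not numpy's implementation; finite_mean is a hypothetical name, and std::array stands in for the Eigen types. It also assumes the code is not built with options such as -ffast-math, which can let the compiler drop the std::isfinite test.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>

// Mean over the finite entries only.
// Returns NaN when no entry is usable, so the "invalid" signal is preserved.
template <std::size_t N>
double finite_mean(const std::array<double, N>& values) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : values) {
        if (std::isfinite(v)) {  // skips NaN and +/-infinity
            sum += v;
            ++count;
        }
    }
    return count > 0 ? sum / static_cast<double>(count)
                     : std::numeric_limits<double>::quiet_NaN();
}
```

For the vector {1, 2, NAN, 4} this yields 7/3, while an input with no finite entries still produces NaN.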
Sounds fine by me as long as you’ve got your floating point model fixed, and you have said NaN semantics.
IEEE 754 is sufficient in your case.
http://en.m.wikipedia.org/wiki/Single-precision_floating-point_format
https://stackoverflow.com/questions/5777484/how-to-check-if-c-compiler-uses-ieee-754-floating-point-standard
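One way to pin the floating-point assumption down at compile time is sketched below, along the lines of the second link. std::numeric_limits<double>::is_iec559 is standard C++; nan_propagates is a hypothetical helper added as a runtime sanity check. Be aware that some compilers keep is_iec559 true even when relaxed-math flags are enabled, so this is a guard rather than a guarantee.

```cpp
#include <limits>

// Refuse to compile on platforms that do not claim IEEE 754 doubles.
static_assert(std::numeric_limits<double>::is_iec559,
              "IEEE 754 (IEC 559) doubles are required");
static_assert(std::numeric_limits<double>::has_quiet_NaN,
              "a quiet NaN representation is required");

// Hypothetical runtime sanity check: NaN should survive a multiplication
// and compare unequal to itself.
inline bool nan_propagates() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double result = 2.0 * nan;
    return result != result;  // true only for NaN under IEEE 754 semantics
}
```

Running nan_propagates() once at start-up, or in a unit test built with the production compiler flags, gives some assurance that the NaN semantics the design relies on are actually in effect.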