I’m reading this guide about network programming, which I’m liking a lot: https://beej.us/guide/bgnet/html/split/slightly-advanced-techniques.html#serialization
I’m confused about something though. In this section about serialization, he talks about serializing ints for byte-ordering reasons, which makes sense to me, but he also includes these two functions pack754 and unpack754 for serializing floats in IEEE-754 format.
uint64_t pack754(long double f, unsigned bits, unsigned expbits)
{
    long double fnorm;
    int shift;
    long long sign, exp, significand;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (f == 0.0) return 0; // get this special case out of the way

    // check sign and begin normalization
    if (f < 0) { sign = 1; fnorm = -f; }
    else { sign = 0; fnorm = f; }

    // get the normalized form of f and track the exponent
    shift = 0;
    while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
    fnorm = fnorm - 1.0;

    // calculate the binary form (non-float) of the significand data
    significand = fnorm * ((1LL<<significandbits) + 0.5f);

    // get the biased exponent
    exp = shift + ((1<<(expbits-1)) - 1); // shift + bias

    // return the final answer
    return (sign<<(bits-1)) | (exp<<(bits-expbits-1)) | significand;
}
long double unpack754(uint64_t i, unsigned bits, unsigned expbits)
{
    long double result;
    long long shift;
    unsigned bias;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (i == 0) return 0.0;

    // pull the significand
    result = (i&((1LL<<significandbits)-1)); // mask
    result /= (1LL<<significandbits); // convert back to float
    result += 1.0f; // add the one back on

    // deal with the exponent
    bias = (1<<(expbits-1)) - 1;
    shift = ((i>>significandbits)&((1LL<<expbits)-1)) - bias;
    while(shift > 0) { result *= 2.0; shift--; }
    while(shift < 0) { result /= 2.0; shift++; }

    // sign it
    result *= (i>>(bits-1))&1? -1.0: 1.0;

    return result;
}
What I’m confused about is that these functions work by looking at the first bit for the sign, then the next X bits for the exponent, then the next Y bits for the mantissa. So doesn’t that mean the float has to already be in IEEE-754 format on the host machine for this to work?
Is this just here to explain the format, or is this something you would actually do in real life?
Is Serializing Floats Necessary for Cross-Platform Network Code?
Yes. FP encoding has many variations across implementations, including variations in size, endianness, precision, exponent range, subnormal support (and possibly even base).
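To see some of these variations on a given machine, a small sketch like the following prints a few of the characteristics that <float.h> exposes. The function name print_fp_traits is made up for illustration; the macros are standard C.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: report a few floating-point traits of this implementation. */
void print_fp_traits(void)
{
    /* FLT_RADIX is the base of the exponent (2 on most hardware, but not guaranteed). */
    printf("radix:                 %d\n", FLT_RADIX);
    /* Number of base-FLT_RADIX digits in the significand of each type. */
    printf("double mantissa digits: %d\n", DBL_MANT_DIG);
    printf("long double mantissa:   %d\n", LDBL_MANT_DIG);
    /* Maximum exponent for long double. */
    printf("long double max exp:    %d\n", LDBL_MAX_EXP);
    /* Storage size, which may include padding bytes. */
    printf("sizeof(long double):    %zu\n", sizeof(long double));
}
```

Running this on two different platforms (say, x86-64 Linux and a compiler where long double is the same as double) would typically show different long double figures, which is exactly why a raw memory copy is not portable.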
So doesn’t that mean the float has to already be in IEEE-754 format on the host machine for this to work?
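No, the pack/unpack will “work” (see following problems) even if long double is not IEEE. The functions never inspect the host's bit pattern of f; they derive the sign, exponent, and significand with ordinary arithmetic (comparisons, divisions, and multiplications by 2), and only the resulting uint64_t is laid out in IEEE-754 form.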
Is this just here to explain the format, or is this something you would actually do in real life?
Looks like learner code. I would not use the provided pack/unpack code, given its weaknesses (below) and especially the 2 very inefficient while loops. Loops may iterate thousands of times with binary128.
The code is a hole-riddled attempt to pack an arbitrarily encoded long double into an IEEE binary64. It does not handle values near 0.0, rounding, overflow, or infinity/NaN well.
pack754() has at least these shortcomings:
- if (f == 0.0) return 0; loses information during serialization as it returns 0 for both +0.0 and -0.0. When testing the FP sign bit, do not use if (f < 0), but if (signbit(f)) to properly extract the sign bit even if f is zero or NaN.
- long double may be more than 64 bits, so uint64_t pack754(long double f, unsigned bits, unsigned expbits) loses info in trying to pack into 64 bits. I suppose OP is tolerating this info loss.
- 1LL<<significandbits is UB on overflow (significandbits >= 63). 1ULL<<significandbits has some advantage, yet overflow (significandbits >= 64) remains a problem.
- Mixing float math (0.5f) with the later long double math is short-sighted. ((1LL<<significandbits) + 0.5L) makes a little more sense.
- Rather than while(fnorm >= 2.0)-like code, use long double frexpl(long double value, int *p) to extract a normalized value and exponent. Use long double ldexpl(long double x, int p) to re-combine. while(fnorm >= 2.0) { fnorm /= 2.0; shift++; } risks an infinite loop when fnorm is infinity.
- + 0.5f for rounding has many corner-case issues. Better to use lround() and friends.
- …
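As a rough illustration of the signbit() and frexpl()/ldexpl() suggestions above, here is a minimal sketch that splits a value into sign, exponent, and fraction without any loops and then rebuilds it. The names fp_parts, fp_split, and fp_join are made up for this example, and it is not a complete replacement for pack754(): it does no bit packing or rounding, and callers would still need to test isfinite() first, since frexpl() leaves the exponent unspecified for infinity and NaN.

```c
#include <math.h>

/* Hypothetical container for the three components of a floating-point value. */
typedef struct {
    int sign;             /* 1 if the sign bit is set, including for -0.0 */
    int exponent;         /* power of two: |f| == fraction * 2^exponent */
    long double fraction; /* in [0.5, 1) for finite non-zero input, 0 for zero */
} fp_parts;

/* Split f into sign, exponent and fraction using library calls, no loops. */
fp_parts fp_split(long double f)
{
    fp_parts p;
    p.sign = signbit(f) ? 1 : 0;                /* works for -0.0, unlike f < 0 */
    p.fraction = frexpl(fabsl(f), &p.exponent); /* normalizes in one call */
    return p;
}

/* Rebuild the value from its components. */
long double fp_join(fp_parts p)
{
    long double magnitude = ldexpl(p.fraction, p.exponent);
    return p.sign ? -magnitude : magnitude;
}
```

Note that frexpl() normalizes the fraction to [0.5, 1) rather than the [1, 2) range the original loops produce, so an IEEE-style packer built on it would need to adjust the exponent by one.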
For simple cross-platform exchange of FP values, I’d consider sprintf(buf, "%La", x) as a first step to pack and strtold() to unpack.
Packing an FP value into a tight intN_t and maintaining precision/range faithfulness across many computer implementations are competing goals.
Which is more important: faithful conversions or small packet size?
Most systems I’ve worked with prize faithful conversions over small packet size.
Packing a long double, for portability, into 64 bits is simply an unwise design.