This is a self-answered question, as encouraged on Stack Overflow for sharing knowledge. As such, it is a follow-on to my previous self-answered question regarding accurate computation of the principal branch of the Lambert W function, W0. For some general background on the Lambert W function, please see there. When I initially looked at the Lambert W function, the negative branch W-1 did not seem particularly useful to me, but I have since come across a number of publications that make practical use of it, for example [1], [2], [3], [4].
The input domain of the negative branch W-1 of the real-valued Lambert W function is very narrow: [-1/e, 0). It is somewhat challenging to compute accurately due to the branch point at -1/e and the asymptotic behavior near zero. How can one accurately compute the negative branch of the real-valued Lambert W function, W-1, using the ISO C99 standard math library?
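To recap the definition: for x in [-1/e, 0), the equation w·exp(w) = x has two real solutions; the solution with w ≤ -1 constitutes the negative branch W-1(x), while the solution with w ≥ -1 constitutes the principal branch W0(x). The two branches meet at the branch point x = -1/e, where W-1(-1/e) = W0(-1/e) = -1, and W-1(x) → -∞ as x → 0 from below.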
Accurate computation shall be defined as a maximum error not exceeding 4 ulps, which is the same error bound that applies to the LA profile of the Intel math library, and which I have found to be a reasonable error bound for non-trivial transcendental functions in practical use. Support for IEEE-754 (2008) binary floating-point arithmetic and functionally correct support for fused multiply-add (FMA) operations, accessible via the fma() and fmaf() standard library functions, can be assumed.
[1] A. C. Gormaz-Matamala, et al., "New Hydrodynamic Solutions for Line-driven Winds of Hot Massive Stars Using the Lambert W-function." The Astrophysical Journal, 920:64 (30pp), Oct. 2021
[2] Santiago Pindado, et al., "Simplified Lambert W-function math equations when applied to photovoltaic systems modeling." IEEE Transactions on Industry Applications, Vol. 57, No. 2, Jan. 2021, pp. 1779-1788
[3] Joaquín F. Pedrayes, et al., "Lambert W function based closed-form expressions of supercapacitor electrical variables in constant power applications." Energy, 218:119364, Mar. 2021
[4] Armengol Gasull, et al., "Maxima of Weibull-like distributions and the Lambert W function." Test, Vol. 24, 2015, pp. 714-733
I will first present a single-precision implementation lambert_wm1f() for float mapped to IEEE-754 binary32.
As in the computation of the principal branch, functional iteration is not numerically stable near the branch point at -1/e. Therefore, a polynomial minimax approximation p(t) with t = √(x + 1/e), generated with the Remez algorithm, is used in that region.
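For context, the series expansion of the Lambert W function about the branch point (see, e.g., R. M. Corless, et al., "On the Lambert W function." Advances in Computational Mathematics, Vol. 5, 1996, pp. 329-359) is

W(x) = -1 + p - (1/3)·p² + (11/72)·p³ - ..., where p = ±√(2·(e·x + 1)),

with the negative sign of p selecting the branch W-1. Since p = -√(2e)·t for t = √(x + 1/e), the low-order terms are W-1(x) ≈ -1 - √(2e)·t - (2e/3)·t² - ..., which is why the trailing coefficients of the minimax polynomials below are close to -1, -√(2e) ≈ -2.33164, and -2e/3 ≈ -1.81219 (most closely in the double-precision version).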
For functional iteration, a starting approximation with a maximum relative error of 1.488e-4 from the literature is used. Just one iteration with quadratic convergence is required to reach full single-precision accuracy. I determined experimentally that the Newton iteration works best near zero, while use of the Iacono-Boyd iteration is advantageous over the balance of the input domain.
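Written out for the defining equation w·exp(w) = x, the two iteration schemes are

Newton:       w ← w - (w·exp(w) - x) / ((w + 1)·exp(w))
Iacono-Boyd:  w ← (w / (1 + w)) · (1 + ln(x / w))

In the code below, the Newton step evaluates exp(w) scaled by 2^32, with x scaled to match, because for the very negative w that occur near zero an unscaled exp(w) would underflow in single precision.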
When built against Intel's implementation of the standard C math library, the maximum error of the ISO C99 implementation below is less than 2.5 ulps in an exhaustive test; a similar error bound should be achievable with any other high-quality implementation of the standard C math library. For an overview of the accuracy of commonly-used implementations, see:
Brian Gladman, Vincenzo Innocente, John Mather, Paul Zimmermann, “Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision.” HAL preprint hal-03141101v6, Feb. 2024
#include <math.h>

/*
  Compute the negative branch of the real-valued Lambert W function, W_(-1).
  Maximum error when built against the Intel MKL: 2.41426181 ulps @ -7.32652619e-2
*/
float lambert_wm1f (float x)
{
    const float c0 = 1.68090820e-1f; // 0x1.584000p-3
    const float c1 = -2.96497345e-3f; // -0x1.84a000p-9
    const float c2 = -2.87322998e-2f; // -0x1.d6c000p-6
    const float c3 = 7.07275391e-1f; // 0x1.6a2000p-1
    const float ln2 = 0.69314718f;
    const float l2e = 1.44269504f;
    const float em1_fact_0 = 0.625529587f; // exp(-1)_factor_0
    const float em1_fact_1 = 0.588108778f; // exp(-1)_factor_1
    float redx, r, s, t, w, e, num, den, rden;
    if (isnan (x)) return x + x;
    if (x == 0.0f) return -INFINITY;
    if (x > 0.0f) return INFINITY / INFINITY; // NaN
    redx = fmaf (em1_fact_0, em1_fact_1, x); // x + exp(-1)
    if (redx <= 0.09765625f) { // expansion at -(exp(-1))
        r = sqrtf (redx);
        w = -3.30250000e+2f; // -0x1.4a4000p+8
        w = fmaf (w, r, 3.53563141e+2f); // 0x1.61902ap+8
        w = fmaf (w, r, -1.91617889e+2f); // -0x1.7f3c5cp+7
        w = fmaf (w, r, 4.94172478e+1f); // 0x1.8b5686p+5
        w = fmaf (w, r, -1.23464909e+1f); // -0x1.8b1674p+3
        w = fmaf (w, r, -1.38704872e+0f); // -0x1.6315a0p+0
        w = fmaf (w, r, -1.99431837e+0f); // -0x1.fe8ba6p+0
        w = fmaf (w, r, -1.81044364e+0f); // -0x1.cf793cp+0
        w = fmaf (w, r, -2.33166337e+0f); // -0x1.2a73f2p+1
        w = fmaf (w, r, -1.00000000e+0f); // -0x1.000000p+0
    } else {
        /* Initial approximation based on: D. A. Barry, L. Li, and D.-S. Jeng,
           "Comments on 'Numerical Evaluation of the Lambert Function and
           Application to Generation of Generalized Gaussian Noise with
           Exponent 1/2'", IEEE Transactions on Signal Processing, Vol. 52,
           No. 5, May 2004, pp. 1456-1457
        */
        s = fmaf (log2f (-x), -ln2, -1.0f);
        t = sqrtf (s);
        w = -1.0f - s - (1.0f / fmaf (exp2f (c2 * t), c1 * t,
                                      fmaf (1.0f / t, c3, c0)));
        if (x > -0x1.0p-116f) {
            /* Newton iteration, for arguments near zero */
            e = exp2f (fmaf (w, l2e, 32));
            num = fmaf (w, e, -0x1.0p32 * x);
            den = fmaf (w, e, e);
            rden = 1.0f / den;
            w = fmaf (-num, rden, w);
        } else {
            /* Roberto Iacono and John Philip Boyd, "New approximations to the
               principal real-valued branch of the Lambert W function",
               Advances in Computational Mathematics, Vol. 43, No. 6,
               December 2017, pp. 1403-1436
            */
            t = w / (1.0f + w);
            w = fmaf (logf (x / w), t, t);
        }
    }
    return w;
}
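Below is a sketch of the kind of exhaustive test harness referred to above. This is my own scaffolding, not part of the implementation: main(), the loop bounds, and the ulp estimate (based on the spacing of binary32 numbers around the reference result) are assumptions made for illustration, and the double-precision lambert_wm1() shown further below serves as the reference.

/* Test sketch: walk all float inputs in [-1/e, 0) by bit pattern. The bound
   -0x1.78b562p-2f is the float of largest magnitude not exceeding 1/e; the
   ulp error estimate is approximate. Requires lambert_wm1() from below. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

float lambert_wm1f (float x);
double lambert_wm1 (double z);

int main (void)
{
    double max_err = 0.0;
    float max_err_arg = 0.0f;
    for (uint32_t i = 0x80000001u; ; i++) { // from -0x1p-149 towards -1/e
        float x;
        memcpy (&x, &i, sizeof x);
        if (x < -0x1.78b562p-2f) break;     // stop past the branch point
        double ref = lambert_wm1 ((double)x);
        double res = (double)lambert_wm1f (x);
        double ulp = nextafterf ((float)ref, INFINITY) -
                     nextafterf ((float)ref, -INFINITY);
        double err = fabs (res - ref) / (0.5 * ulp);
        if (err > max_err) {
            max_err = err;
            max_err_arg = x;
        }
    }
    printf ("max. err. = %.5f ulps @ %.8e\n", max_err, (double)max_err_arg);
    return 0;
}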
The double-precision implementation lambert_wm1() for double mapped to IEEE-754 binary64 is structurally very similar to the single-precision implementation. A minimax polynomial approximation is used near the branch point, and the balance of the input domain is covered via three rounds of Newton iteration. A custom exponentiation function with built-in scaling by powers of two is used, as well as a custom logarithm function, thus largely isolating this code from differences in standard C math library implementations. It is not feasible to test the double-precision implementation exhaustively; in one billion random trials all errors found were less than 2.6 ulps.
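The purpose of the power-of-two scaling can be read off from the code: the initial approximation needs s = -ln(-z) - 1, which is computed as

s = -(ln(-2^64·z) - 64·ln 2) - 1

so that log_pos_normal() always sees a positive normal argument, even when z itself is subnormal. Likewise, the Newton steps evaluate exp(w)·2^64 via exp_scale_pos_normal(w, 64), with z scaled by 2^64 to match; by my estimate this keeps the exponential within the positive normal range across the entire input domain (the most negative w, reached at the smallest-magnitude subnormal input, is about -751, and exp(-751)·2^64 is still a normal double).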
#include <math.h>
#include <string.h>

double exp_scale_pos_normal (double a, int scale);
double log_pos_normal (double a);

/*
  Compute the negative branch of the real-valued Lambert W function, W_(-1).
  Max. err. found across 1B random trials: 2.56233 ulps @ -0.3678692378632617
*/
double lambert_wm1 (double z)
{
    const double c0 = 1.6729676723480225e-1; // 0x1.569fbp-3
    const double c1 = -2.7966443449258804e-3; // -0x1.6e8fdp-9
    const double c2 = -2.1342277526855469e-2; // -0x1.5dac0p-6
    const double c3 = 7.0781660079956055e-1; // 0x1.6a66fp-1
    const double ln2 = 0.6931471805599453094172;
    const double exp2_64 = 1.8446744073709552e+19; // 0x1.0p64
    const double em1_fact_0 = 0.57086272525975246; // 0x1.24481e7efdfcep-1 // exp(-1)_factor_0
    const double em1_fact_1 = 0.64442715366299452; // 0x1.49f25b1b461b7p-1 // exp(-1)_factor_1
    double redz, r, s, t, w, e, num, den, rden;
    int i;
    if (isnan (z)) return z + z;
    if (z == 0.0) return -INFINITY;
    if (z > 0.0) return INFINITY / INFINITY; // NaN
    redz = fma (em1_fact_0, em1_fact_1, z); // z + exp(-1)
    if (redz < 0.04296875) { // expansion at -(exp(-1))
        r = sqrt (redz);
        w = -3.1102051749530146e+3;
        w = fma (w, r, 3.3514583413659661e+3);
        w = fma (w, r, -2.0376505203571792e+3);
        w = fma (w, r, 6.6470674321336662e+2);
        w = fma (w, r, -1.9891092047488328e+2);
        w = fma (w, r, 1.1792563777850908e+1);
        w = fma (w, r, -1.6044530625408662e+1);
        w = fma (w, r, -8.0468700837359766e+0);
        w = fma (w, r, -5.8822749101364442e+0);
        w = fma (w, r, -4.1741334842726321e+0);
        w = fma (w, r, -3.0669009628894104e+0);
        w = fma (w, r, -2.3535502058947735e+0);
        w = fma (w, r, -1.9366311293653595e+0);
        w = fma (w, r, -1.8121878855163436e+0);
        w = fma (w, r, -2.3316439815975385e+0);
        w = fma (w, r, -1.0000000000000000e+0);
        return w;
    } else {
        /* Initial approximation based on: D. A. Barry, L. Li, D.-S. Jeng,
           "Comments on 'Numerical Evaluation of the Lambert Function and
           Application to Generation of Generalized Gaussian Noise with
           Exponent 1/2'", IEEE Transactions on Signal Processing, Vol. 52,
           No. 5, May 2004, pp. 1456-1457
        */
        s = - fma (ln2, -64.0, log_pos_normal (-exp2_64 * z)) - 1.0;
        t = sqrt (s);
        w = -1.0 - s - (1.0 / fma (exp_scale_pos_normal (c2 * t, 0),
                                   c1 * t, fma (1.0 / t, c3, c0)));
        /* Newton iterations */
        for (i = 0; i < 3; i++) {
            e = exp_scale_pos_normal (w, 64);
            num = fma (w, e, -exp2_64 * z);
            den = fma (w, e, e);
            rden = 1.0 / (den);
            w = fma (-num, rden, w);
        }
    }
    return w;
}
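/* re-interpret the 32 most significant bits of an IEEE-754 double as a signed integer */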
int double2hiint (double a)
{
    unsigned long long int t;
    memcpy (&t, &a, sizeof t);
    return (int)(t >> 32);
}
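/* re-interpret the 32 least significant bits of an IEEE-754 double as a signed integer */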
int double2loint (double a)
{
    unsigned long long int t;
    memcpy (&t, &a, sizeof t);
    return (int)(unsigned int)t;
}
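/* re-assemble an IEEE-754 double from its high and low 32-bit halves */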
double hiloint2double (int hi, int lo)
{
    double r;
    unsigned long long int t;
    t = ((unsigned long long int)(unsigned int)hi << 32) | (unsigned int)lo;
    memcpy (&r, &t, sizeof r);
    return r;
}
/* exp(a) * 2**scale; pos. normal results only! Max. err. found: 0.89028 ulp */
double exp_scale_pos_normal (double a, int scale)
{
    const double ln2_hi = 6.9314718055829871e-01; // 0x1.62e42fefa00000p-01
    const double ln2_lo = 1.6465949582897082e-12; // 0x1.cf79abc9e3b3a0p-40
    const double l2e = 1.4426950408889634; // 0x1.71547652b82fe0p+00 // log2(e)
    const double cvt = 6755399441055744.0; // 0x1.80000000000000p+52 // 3*2**51
    double f, r;
    int i;
    /* exp(a) = exp(i + f); i = rint (a / log(2)) */
    r = fma (l2e, a, cvt);
    i = double2loint (r);
    r = r - cvt;
    f = fma (r, -ln2_hi, a);
    f = fma (r, -ln2_lo, f);
    /* approximate r = exp(f) on interval [-log(2)/2, +log(2)/2] */
    r = 2.5022018235176802e-8; // 0x1.ade0000000000p-26
    r = fma (r, f, 2.7630903497145818e-7); // 0x1.28af3fcbbf09bp-22
    r = fma (r, f, 2.7557514543490574e-6); // 0x1.71dee623774fap-19
    r = fma (r, f, 2.4801491039409158e-5); // 0x1.a01997c8b50d7p-16
    r = fma (r, f, 1.9841269589068419e-4); // 0x1.a01a01475db8cp-13
    r = fma (r, f, 1.3888888945916566e-3); // 0x1.6c16c1852b805p-10
    r = fma (r, f, 8.3333333334557735e-3); // 0x1.11111111224c7p-7
    r = fma (r, f, 4.1666666666519782e-2); // 0x1.55555555502a5p-5
    r = fma (r, f, 1.6666666666666477e-1); // 0x1.5555555555511p-3
    r = fma (r, f, 5.0000000000000122e-1); // 0x1.000000000000bp-1
    r = fma (r, f, 1.0000000000000000e+0); // 0x1.0000000000000p+0
    r = fma (r, f, 1.0000000000000000e+0); // 0x1.0000000000000p+0
    /* exp(a) = 2**(i+scale) * r */
    r = hiloint2double (double2hiint (r) + ((i + scale) << 20),
                        double2loint (r));
    return r;
}
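The first three lines of the argument reduction above rely on a classic trick: adding cvt = 3·2^51 to a double of magnitude well below 2^51 forces the sum onto the integer grid under round-to-nearest, so the rounded integer can be read straight from the low 32 bits of the sum (the extra 2^51 in cvt keeps negative results readable as two's complement), and the rounded value can be recovered as a double by subtracting cvt again. A standalone illustration, written by me for this answer rather than taken from the code above:

#include <stdio.h>
#include <string.h>

int main (void)
{
    const double cvt = 6755399441055744.0; // 0x1.8p+52 = 3 * 2**51
    double a = -3.7;
    double r = a + cvt;           // rounds a to the nearest integer
    unsigned long long int t;
    memcpy (&t, &r, sizeof t);    // low 32 bits now hold rint(a)
    int i = (int)(unsigned int)t;
    printf ("rint (%g) = %d; as double: %g\n", a, i, r - cvt);
    return 0;
}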
/* compute natural logarithm of positive normals; max. err. found: 0.86902 ulp */
double log_pos_normal (double a)
{
    const double ln2_hi = 6.9314718055994529e-01; // 0x1.62e42fefa39efp-01
    const double ln2_lo = 2.3190468138462996e-17; // 0x1.abc9e3b39803fp-56
    double m, r, i, s, t, p, q;
    int e;
    /* log(a) = log(m * 2**i) = i * log(2) + log(m) */
    e = (double2hiint (a) - double2hiint (0.70703125)) & 0xfff00000;
    m = hiloint2double (double2hiint (a) - e, double2loint (a));
    t = hiloint2double (0x41f00000, 0x80000000 ^ e);
    i = t - (hiloint2double (0x41f00000, 0x80000000));
    /* m now in [181/256, 362/256]. Compute q = (m-1) / (m+1) */
    p = m + 1.0;
    r = 1.0 / p;
    q = fma (m, r, -r);
    m = m - 1.0;
    /* Compute (2*atanh(q)/q-2*q) as p(q**2), q in [-75/437, 53/309] */
    s = q * q;
    r = 1.4794533702196025e-1; // 0x1.2efdf700d7135p-3
    r = fma (r, s, 1.5314187748152339e-1); // 0x1.39a272db730f7p-3
    r = fma (r, s, 1.8183559141306990e-1); // 0x1.746637f2f191bp-3
    r = fma (r, s, 2.2222198669309609e-1); // 0x1.c71c522a64577p-3
    r = fma (r, s, 2.8571428741489319e-1); // 0x1.24924941c9a2fp-2
    r = fma (r, s, 3.9999999999418523e-1); // 0x1.999999998006cp-2
    r = fma (r, s, 6.6666666666667340e-1); // 0x1.5555555555592p-1
    r = r * s;
    /* log(a) = 2*atanh(q) + i*log(2) = ln2_lo*i + p(q**2)*q + 2q + ln2_hi * i.
       Use K.C. Ng's trick to improve the accuracy of the computation, like so:
       p(q**2)*q + 2q = p(q**2)*q + q*t - t + m, where t = m**2/2.
    */
    t = m * m * 0.5;
    r = fma (q, t, fma (q, r, ln2_lo * i)) - t + m;
    r = fma (ln2_hi, i, r);
    return r;
}