According to the Wikipedia article on Spurious Wakeups
“a thread might be awoken from its waiting state even though no thread signaled the condition variable”.
While I’ve known about this ‘feature’, I never knew what actually caused it until I read, in the same article:
“Spurious wakeups may sound strange, but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations.”
Sounds like a bug that just isn’t worth fixing. Is that right?
1
TL;DR: Assuming (as a “contract”) that spurious wakeups can happen is a sensible architectural decision, made to allow for realistically robust implementations of a thread scheduler.
“Performance considerations” are irrelevant here; they are just a misunderstanding that became widespread because it was stated in a published authoritative reference. (Authoritative references might have errors, y’know – just ask Galileo Galilei.) The Wikipedia article keeps the reference to the note you quoted just because it perfectly matches their formal guidelines for citing published references.
A much more compelling reason for introducing the concept of spurious wakeups is provided in this answer at SO, which is based on additional details provided in (an older version of) that very article:
The Wikipedia article on spurious wakeups has this tidbit:
The pthread_cond_wait() function in Linux is implemented using the futex system call. Each blocking system call on Linux returns abruptly with EINTR when the process receives a signal. … pthread_cond_wait() can’t restart the waiting because it may miss a real wakeup in the little time it was outside the futex system call…
Just think of it… like any code, a thread scheduler may experience a temporary blackout due to something abnormal happening in the underlying hardware or software. Of course, care should be taken to make this happen as rarely as possible, but since there’s no such thing as 100% robust software, it is reasonable to assume this can happen and to provide for graceful recovery in case the scheduler detects it (e.g. by observing missing heartbeats).
Now, how could the scheduler recover, taking into account that during the blackout it could have missed some signals intended to notify waiting threads? If the scheduler does nothing, those “unlucky” threads will just hang, waiting forever. To avoid this, the scheduler would simply send a signal to all the waiting threads.
This makes it necessary to establish a “contract” that a waiting thread can be notified without a reason. To be precise, there would be a reason (the scheduler blackout), but since a thread is designed (for a good reason) to be oblivious to the scheduler’s internal implementation details, this reason is likely better presented as “spurious”.
From the thread’s perspective, this somewhat resembles Postel’s law (aka the robustness principle):
be conservative in what you do, be liberal in what you accept from others
The assumption of spurious wakeups forces a thread to be conservative in what it does (set the condition when notifying other threads) and liberal in what it accepts (check the condition upon any return from wait, and wait again if it’s not there yet).
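To make the “conservative sender, liberal receiver” pattern concrete, here is a minimal sketch in Python using threading.Condition. The Mailbox class and its put/poke/get methods are hypothetical names invented for this illustration; poke() merely simulates a wakeup that arrives without the condition being set, it is not how a real scheduler produces one.

```python
import threading


class Mailbox:
    """One-slot mailbox guarded by a condition variable (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False
        self._value = None

    def put(self, value):
        with self._cond:
            # Conservative sender: establish the condition before notifying.
            self._value = value
            self._ready = True
            self._cond.notify_all()

    def poke(self):
        # Wake any waiters WITHOUT establishing the condition.
        # This stands in for a spurious wakeup.
        with self._cond:
            self._cond.notify_all()

    def get(self):
        with self._cond:
            # Liberal receiver: a return from wait() is only a hint,
            # so re-check the predicate and wait again if it is not set.
            while not self._ready:
                self._cond.wait()
            self._ready = False
            return self._value


def demo():
    box = Mailbox()
    result = []
    waiter = threading.Thread(target=lambda: result.append(box.get()))
    waiter.start()
    box.poke()    # harmless whether or not the waiter is already waiting
    box.put(42)   # the real notification
    waiter.join()
    return result
```

Because get() re-checks the flag under the lock, it does not matter whether the poke arrives before, during, or after the wait: the waiter only proceeds once put() has actually set the condition.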
4
It isn’t worth fixing, since the calling code should apply the same treatment (checking the condition) anyway, in order to deal with race conditions.
One treatment for two issues, which I summarize by the following:
Spurious wakeup: the waiting thread is scheduled before the condition has been established.
Forced oversleep: the waiting thread is scheduled after the condition has been falsified again.
Since the latter might happen, some went as far as introducing spurious wakeups into the contract:
- to enforce good practices by requiring predicate loops.
- to give some liberty to the scheduler implementation (including an emergency recovery option, as pointed out by @gnat).
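The “forced oversleep” case can be sketched without any spurious wakeup at all. In the hypothetical Python example below (the function name, the state dictionary, and the "job" item are all made up for illustration), two consumers are woken for a single item; whichever runs second finds the condition already falsified and must go back to waiting, which is exactly what the predicate loop handles.

```python
import threading
from collections import deque


def run_two_consumers_one_item():
    cond = threading.Condition()
    items = deque()
    state = {"closed": False}
    taken = []

    def consumer(name):
        with cond:
            # Predicate loop: being woken does not mean an item is still
            # there, because another consumer may have taken it first.
            while not items and not state["closed"]:
                cond.wait()
            if items:
                taken.append((name, items.popleft()))

    workers = [threading.Thread(target=consumer, args=(n,)) for n in ("a", "b")]
    for w in workers:
        w.start()

    with cond:
        items.append("job")
        cond.notify_all()   # both consumers may wake, but only one item exists

    with cond:
        state["closed"] = True
        cond.notify_all()   # release the consumer that found nothing

    for w in workers:
        w.join()
    return taken
```

If the inner loop were replaced by a single check followed by one wait, the consumer that loses the race could reach popleft() on an empty deque. The loop costs nothing extra and covers both this race and genuinely spurious wakeups.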
SO reference
2