I have heard it said that managing concurrent TCP connections using user-space coroutines consumes fewer resources per open connection than using an OS thread per connection.
Actually, OS threads and coroutines are essentially different ways of representing the same thing: a combination of control and state.
An OS thread stores:
- A program counter (representing control)
- A stack pointer + other registers (representing state)
A coroutine requires:
- A function pointer (representing control)
- An environment pointer (representing state)
Why should we believe user-space coroutines to be a more resource-efficient approach?
It’s about scheduling overhead, and about how some solutions fit specific problems better than others.
Scheduling is the activity of deciding who is executing right now, and switching between processes/threads. Cooperative scheduling is simple to implement, and requires that each participating thread must yield to the scheduler when a sensible pause state is reached. Imagine a few threads in a discussion:
A: So, how was your day? YIELD
B: Today I visited a zoo. YIELD
A: YIELD
B: There were kangaroos and llamas. YIELD
A: Can you pass me the butter? YIELD
B: Here you are 🙂 YIELD
Of course, there’s now a problem when process B never yields:
A: So, how was your day? YIELD
B: Today I visited a zoo and there were like kangaroos and llamas and all kinds of animals and stuff and…
(A tries to make themselves noticed and raises a hand, but B never stops)
B: … their fluffy fur and I took tons of pictures and do you want to see them here they are this is me next to a …
(A would really like that butter, but B still won’t stop)
B: … and here I made a selfie with an orang-utan and look how it …
(at this point, A starts plotting to kill B)
Cooperative scheduling only works when we can rely on everyone to yield regularly to give others a chance to execute as well. That works well for a single program that was designed by a single organization which has control over all parts and is aware of what specific soft real-time requirements have to be met. But for most operating systems, programs can be written by third parties, and these might forget to yield regularly. Therefore, we build a preemptive scheduler that yields for them. Ideally, the conversation works like this:
A: So, how was your day? dum dee dum
Scheduler: NEXT
B: Today I visited a zoo. There were
Scheduler: NEXT
A: dum dee dum twiddly thumb
Scheduler: NEXT
B: kangaroos and llamas. dum dee dum
Scheduler: NEXT
A: Can you pass me the butter? dum dee dum
Scheduler: NEXT
B: Here you are 🙂 dum dee dum
However, the scheduler might interrupt all processes when they are in the middle of a thought. Afterwards they have to remember where they left off.
A: So, how was your
Scheduler: NEXT!
A: I was asking: How was your day?
Scheduler: NEXT!
B: Today I visited a
Scheduler: NEXT!
B: A zoo. I visited a zoo. And there were
Scheduler: NEXT!
B: Will you let me talk! There were kangaroos and
Scheduler: NEXT!
B: Where was I? There were kangaroos and llamas.
Scheduler: NEXT!
etc.
The scheduler slows things down: not only does scheduling itself take time that could otherwise have been used productively, it also requires the state of the thread to be restored. That means reloading the CPU registers from values saved in memory, resetting the instruction pointer, and flushing the CPU instruction pipeline. And when execution continues, the CPU cache will still contain data from other threads, so memory reads will be slow until it warms up again.
These context switches are what makes preemptive scheduling so expensive. Still, they are better than the alternative: a computer that freezes up because some badly written process forgets to yield. The cost can be tuned by changing the interrupt frequency, but a lower frequency also means the computer takes longer to react to input.
All things being equal, user space has fewer layers of abstraction to work through. OS threads require relatively slow calls through the kernel for scheduling and synchronization. A user-space implementation, done correctly, avoids these kernel crossings, which creates the opportunity for higher performance.