How is async logic implemented natively without threads? What would be the high level structure of the system?
Is it just a separate OS thread that pulls requests from one queue and pushes results into another?
I keep reading about state machines and event loops when implementing async, but I’m not quite sure about the general structure.
EDIT: Rethinking my question after reading the answer below, I realize what I might actually be asking is how something like io_uring is implemented in Linux.
Thank you in advance,
Well, “async” is an umbrella term, and different pieces of async, across different runtimes, are implemented in different ways.
For network I/O, such a runtime typically takes advantage of some async API exposed by the OS. For example, on Linux you have the epoll syscalls, which are widely used by pretty much every relevant async runtime out there. Given a set of file descriptors, epoll waits until one or more of them is ready to read or write. The OS does not spawn threads that separately wait on each descriptor. Instead, the network card raises an interrupt whenever it receives data, and the kernel processes that data and marks the appropriate file descriptor as ready. This kind of interrupt has nothing to do with threads; it is a normal thing the CPU handles all the time (e.g. whenever you move the mouse or type on the keyboard), regardless of how many threads are running.
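To make that concrete, here is a minimal sketch of the epoll pattern using Python’s select.epoll wrapper (Linux only). The socketpair stands in for a real network connection; this is illustrative, not a full runtime:

```python
# Minimal sketch of the epoll pattern: register interest in a descriptor,
# then block in a single place until the kernel says it is ready.
import select
import socket

reader, writer = socket.socketpair()
reader.setblocking(False)

ep = select.epoll()
ep.register(reader.fileno(), select.EPOLLIN)  # ask to be told when readable

writer.send(b"hello")  # simulate the network card delivering data

# The single blocking point of the whole event loop: the kernel wakes us
# only when one of the registered descriptors is ready.
for fd, events in ep.poll(timeout=1.0):
    if events & select.EPOLLIN:
        print(fd, "is readable:", reader.recv(4096))

ep.unregister(reader.fileno())
ep.close()
```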
But there are other pieces of async that do run over threads. For example the libuv library, the foundation of Node.js (which runs JavaScript on Google’s V8 engine), performs all file I/O as synchronous calls on a thread pool. There are reasons for that; you may want to read this: https://stackoverflow.com/questions/68092226/is-the-libuv-thread-pool-used-for-async-file-i-o-in-node-js-why
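Python’s asyncio happens to expose the same pattern directly, which makes for a convenient illustration. A minimal sketch, assuming an ordinary Unix system (the file path is just an example):

```python
# Sketch of the libuv-style approach: synchronous file I/O executed on a
# thread pool and surfaced to the event loop as an awaitable.
import asyncio

def read_file_sync(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()  # ordinary blocking read, runs on a worker thread

async def read_file(path: str) -> bytes:
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the loop's default thread pool.
    return await loop.run_in_executor(None, read_file_sync, path)

async def main():
    data = await read_file("/etc/hostname")  # example path only
    print(data)

asyncio.run(main())
```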
Other async calls, for example all sorts of delays and timeouts, can be implemented in various ways: maybe using epoll (which does support timeouts), maybe differently. So as you can see, the answer is not that simple. But the general idea is: you need some kind of support from the OS to enable true async, and it has nothing to do with threads.
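For instance, a common way an event loop implements delays is a priority queue of deadlines, with the nearest deadline used as the timeout passed to epoll. A rough sketch of that idea (not any particular runtime’s code):

```python
# Timers in an event loop: keep a heap of deadlines and use the nearest
# one as the timeout handed to epoll.poll().
import heapq
import itertools
import time

timers = []               # (deadline, seq, callback) min-heap
_seq = itertools.count()  # tie-breaker so callbacks are never compared

def call_later(delay, callback):
    heapq.heappush(timers, (time.monotonic() + delay, next(_seq), callback))

def next_timeout():
    # Time until the nearest deadline, or None to block indefinitely.
    if not timers:
        return None
    return max(0.0, timers[0][0] - time.monotonic())

def fire_due_timers():
    now = time.monotonic()
    while timers and timers[0][0] <= now:
        _, _, callback = heapq.heappop(timers)
        callback()

call_later(0.5, lambda: print("half a second later"))
# A real loop would do: ep.poll(next_timeout()); fire_due_timers(); repeat.
time.sleep(next_timeout())
fire_due_timers()
```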
In the context of io_uring, it’s important to recognize that block device I/O operations are inherently asynchronous: requests are sent over PCIe or whatever bus, and some time later a response is received. It’s natural to keep queues of requests waiting to be sent, and of requests already sent and awaiting a response.
Exposing this to userspace has historically been hard because we expect kernel code to carefully track state, but want userspace code to use nice procedural abstractions.
This raw state (queued requests and pending completions) is what io_uring exposes directly. The fact that it’s so manual is the reason this style isn’t more popular outside the kernel.
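The two-queue shape itself is easy to model. The sketch below is purely conceptual: plain Python deques stand in for io_uring’s shared-memory rings, and the Request/Completion types are invented for illustration, not the real API:

```python
# Conceptual model only: real io_uring uses lock-free rings in memory
# shared between kernel and userspace; these names are made up.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:        # analogous to a submission queue entry (SQE)
    op: str           # e.g. "read" or "write"
    fd: int
    user_data: int    # opaque token echoed back in the completion

@dataclass
class Completion:     # analogous to a completion queue entry (CQE)
    user_data: int
    result: int

submission_queue = deque()   # userspace pushes requests here
completion_queue = deque()   # "kernel" pushes results here

# Userspace: queue work, then enter the kernel once for the whole batch.
submission_queue.append(Request("read", fd=3, user_data=1))
submission_queue.append(Request("read", fd=4, user_data=2))

# Stand-in for the kernel side: drain submissions, post completions.
while submission_queue:
    req = submission_queue.popleft()
    completion_queue.append(Completion(req.user_data, result=4096))

# Userspace: reap completions whenever convenient, matching by user_data.
while completion_queue:
    cqe = completion_queue.popleft()
    print("request", cqe.user_data, "finished with", cqe.result)
```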
How is async logic implemented natively without threads? What would be the high level structure of the system?
There are two approaches that could be described as “async” and that don’t require threads: co-operative and pre-emptive multitasking.
1. Co-operative multitasking, or co-routines.
This doesn’t require much language or runtime support, but without that support it demands a huge amount of discipline from every developer.
Every task, at least every one that might be slow or might need to perform some blocking system call, must be coded so that it can explicitly yield control back to the scheduler so that another task can make progress.
One way to do this is to express every task as a finite-state machine, because you can just advance every FSM a single step in turn.
Another, much easier, way is with explicit language support such as Python coroutines; the sketch below shows both ideas.
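Python generators make each task a resumable state machine, and a trivial round-robin scheduler advances each one a step at a time (the task bodies are made up for illustration):

```python
# Minimal co-operative scheduler: each task is a Python generator, i.e. a
# state machine the runtime builds for us; `yield` is the explicit "give
# control back" point.
from collections import deque

def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # co-operative yield point: scheduler may run someone else

ready = deque([task("A", 3), task("B", 2)])

while ready:                   # round-robin event loop
    current = ready.popleft()
    try:
        next(current)          # advance this task's state machine one step
        ready.append(current)  # not finished: back of the queue
    except StopIteration:
        pass                   # task ran to completion
```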
2. Pre-emptive multitasking.
In this case you just let code run without explicitly yielding or awaiting, but use an interrupt to park it and switch control to another task.
The same blocking calls that would interact with a co-operative scheduler (e.g. await-able ones) will now have hooks which give the pre-emptive scheduler the chance to run a different task while this one would be blocked anyway.
One advantage over co-operative multi-tasking is that you’re less vulnerable to being blocked forever by a badly-behaved task (it will get interrupted eventually). Another is that the tasks themselves don’t have to do anything special to save their execution context.
The disadvantage is that that execution context still needs to be managed, and now you have to be very conservative (preserving the stack, all registers, everything). This is relatively heavy-weight.
This is effectively how threads are implemented anyway, either by the kernel, in userspace when you don’t have kernel support, or historically when kernel threads were considered too expensive or a limited resource. These days most systems would just use kernel threads and be done with it.
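The interrupt half of this is easy to demonstrate from userspace with a POSIX interval timer; the hard part, saving and swapping the execution context, is what a real scheduler (or the kernel) does, so the handler in this sketch only marks where the switch would happen:

```python
# Sketch of the pre-emption mechanism only (Unix): a POSIX interval timer
# delivers SIGALRM while the task runs flat out, never yielding. A real
# scheduler would save the interrupted task's registers/stack here and
# resume another task; this handler just counts the interruptions.
import signal

preemptions = 0

def on_tick(signum, frame):
    global preemptions
    preemptions += 1
    # <-- in a real implementation: save context, pick next task, switch

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # fire every 10 ms

total = 0
for i in range(5_000_000):   # a "badly-behaved" task that never yields
    total += i

signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
print(f"task finished; pre-empted {preemptions} times")
```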
For most purposes you’d consider #2 to be a threading model where you write regular procedural code with blocking calls, even if they’re userspace threads, and #1 to be async.
It’s also possible to use explicitly non-blocking I/O calls, as in the UNIX synchronous multiplexing style (select()/poll()). Technically you’re not using async, but it’s practically very similar to explicit co-operative multi-tasking.