CPython implementation detail: In CPython, due to the Global
Interpreter Lock, only one thread can execute Python code at once
(even though certain performance-oriented libraries might overcome
this limitation). If you want your application to make better use of
the computational resources of multi-core machines, you are advised to
use multiprocessing. However, threading is still an appropriate model
if you want to run multiple I/O-bound tasks simultaneously.
-> threading
multiprocessing is a package that supports spawning processes using an
API similar to the threading module.
-> multiprocessing
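The sketch below makes the "API similar to the threading module" point concrete. One hypothetical `worker` function runs once in a `threading.Thread` and once in a `multiprocessing.Process`, and both use the same `target`/`args`/`start()`/`join()` calls.

The `if __name__ == "__main__":` guard is needed because, under the spawn start method (the default on Windows and macOS), child processes re-import the main module.

```python
import multiprocessing
import threading


def worker(label, results):
    # Hypothetical task: report which kind of worker ran it.
    results.put(f"{label} done")


def run_both():
    # multiprocessing.Queue is safe to share between threads *and*
    # processes, so one queue serves both workers here.
    results = multiprocessing.Queue()

    # Same constructor arguments for both classes...
    workers = [
        threading.Thread(target=worker, args=("thread", results)),
        multiprocessing.Process(target=worker, args=("process", results)),
    ]

    # ...and the same start()/join() lifecycle.
    for w in workers:
        w.start()
    # Drain the queue before joining: a process that has put data on a
    # multiprocessing.Queue may not exit until that data is consumed.
    done = sorted(results.get() for _ in workers)
    for w in workers:
        w.join()
    return done


if __name__ == "__main__":
    print(run_both())  # ['process done', 'thread done']
```

At the source level, moving the task into its own process only requires changing the class name. What changes underneath is what the trade-offs below are about.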
All modern PC processors are multicore. What are the downsides to spawning new processes, instead of threads? If they are not significant enough, why does the threading module even exist?
tl;dr
- threads are fast, cheap and lightweight in principle, but achieving all 3 without sacrificing safety is hard work
- processes are slower, costlier and heavyweight in principle, but in practice they’re so much easier to use it often doesn’t matter
Threads exist inside a process, sharing a common address space.
Thread pros:
- it’s generally cheaper (in time and memory) to create a thread than to fork a new process, because whatever book-keeping overhead your OS keeps per process (e.g. page tables, file descriptor tables, process table entry) already exists and doesn’t have to be copied
- inter-thread communication is free in the sense that you don’t have to send stuff through a socket or other kernel interface, so it is fast
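The following is a minimal sketch of "free" inter-thread communication; `fill_squares` is a hypothetical name. Every thread writes directly into one list that lives in the shared address space, and the main thread reads the results with no copying or kernel involvement. Each thread writes to its own index, so no locking is needed in this particular case.

```python
import threading


def fill_squares(n_threads):
    # One list, allocated once in the process's (shared) address space.
    results = [None] * n_threads

    def work(i):
        # Each thread writes straight into the shared list: no pickling,
        # no pipe, no kernel copy. Distinct indices mean no two threads
        # ever touch the same slot.
        results[i] = i * i

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # After join(), the main thread sees every write directly.
    return results


print(fill_squares(5))  # [0, 1, 4, 9, 16]
```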
Thread cons:
- inter-thread communication is free in the sense that it’s unregulated, so it’s very easy to cause data races, deadlocks and livelocks. This means you need to carefully craft some synchronization regime for your application, which can easily eat all the notional benefits of sharing your address space: the GIL is a case in point.
- because your OS probably manages resources at the process level, a fatal operation in one thread (a segfault in a C extension, say, or a call to os._exit()) will kill the whole process
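Here is a sketch of the kind of synchronization the first point refers to; `Counter` and `hammer` are hypothetical names. `self.value += 1` is a read-modify-write, so concurrent unlocked increments can lose updates, and a `threading.Lock` serializes them. Whether the unlocked version actually loses updates in a given run depends on the interpreter version and scheduling. That unpredictability is exactly what makes such bugs hard to catch.

```python
import threading


class Counter:
    """A shared counter; `value += 1` is a read-modify-write, not atomic."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Another thread may run between the read and the write,
        # so increments can be lost (a data race).
        self.value += 1

    def safe_increment(self):
        # The lock makes the read-modify-write happen one thread at a time.
        with self._lock:
            self.value += 1


def hammer(increment, n_threads=4, n_iters=100_000):
    # Call `increment` n_iters times from each of n_threads threads.
    def work():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    c = Counter()
    hammer(c.safe_increment)
    print(c.value)  # 400000, every run

    c = Counter()
    hammer(c.unsafe_increment)
    print(c.value)  # may come out lower, depending on interpreter and timing
```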
Processes are isolated from each other, with independent address spaces and OS resources.
Process cons:
- it’s generally more expensive to fork a new process than to create a new thread
- inter-process communication (IPC) is relatively expensive, since you generally need a kernel mechanism to copy data from one process to another
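This sketch shows what "copying data from one process to another" means in Python terms. It assumes the standard `multiprocessing.Queue` as the IPC channel; `append_in_child` and `run` are hypothetical names. The child works on its own copy of the list, so its mutation is invisible to the parent. The only way to get data back is to send it through the queue, which pickles the object and writes the bytes to a pipe.

```python
import multiprocessing


def append_in_child(data, out):
    # `data` is the child's own copy (inherited via fork, or pickled
    # and sent under spawn); mutating it does not touch the parent's list.
    data.append("child")
    # To get anything back it must go through an IPC channel:
    # Queue.put pickles the object and writes the bytes to a pipe.
    out.put(data)


def run():
    data = ["parent"]
    out = multiprocessing.Queue()
    p = multiprocessing.Process(target=append_in_child, args=(data, out))
    p.start()
    returned = out.get()  # drain the queue before join() to avoid blocking
    p.join()
    return data, returned


if __name__ == "__main__":
    original, copy_back = run()
    print(original)   # ['parent']  (unchanged in the parent)
    print(copy_back)  # ['parent', 'child']  (a new, unpickled copy)
```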
Process pros:
- IPC via the kernel means there is no shared memory to race on, so there is little or no synchronization to design, implement, test and enforce. The simplicity is often worth more than the notional speed (and complexity) of copy-free access with threads.
- because processes are isolated, one can’t accidentally kill another (except by explicitly sending a signal) or damage another process’ memory
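The next sketch illustrates the isolation point. A child process dies abruptly via `os._exit(3)`, which stands in for any fatal failure such as a crash in a C extension; `crash` and `run_isolated` are hypothetical names. The parent keeps running and reads the child’s status from `Process.exitcode`. The same `os._exit()` call made from a thread would end the entire program.

```python
import multiprocessing
import os


def crash():
    # Simulate a fatal failure: os._exit() ends this process immediately,
    # skipping cleanup. Called from a thread, it would end the whole
    # program; here it only ends the child process.
    os._exit(3)


def run_isolated():
    p = multiprocessing.Process(target=crash)
    p.start()
    p.join()
    # The parent is still alive and can inspect how the child ended.
    return p.exitcode


if __name__ == "__main__":
    print(run_isolated())  # 3
```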