The global interpreter lock (GIL) seems to be often cited as a major reason why threading and the like is a touch tricky in Python – which raises the question “Why was that done in the first place?”
Being Not A Programmer, I’ve got no clue why that might be – what was the logic behind putting in the GIL?
There are several implementations of Python, for example, CPython, IronPython, PyPy, etc.
Some of them have a GIL, some don’t. For example, CPython has the GIL:
From http://en.wikipedia.org/wiki/Global_Interpreter_Lock
Applications written in programming languages with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL.
Benefits of the GIL
- Increased speed of single-threaded programs.
- Easy integration of C libraries that usually are not thread-safe.
Why Python (CPython and others) uses the GIL
- From http://wiki.python.org/moin/GlobalInterpreterLock
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe.
The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
- From http://www.grouplens.org/node/244
Python has a GIL as opposed to fine-grained locking for several reasons:
- It is faster in the single-threaded case.
- It is faster in the multi-threaded case for i/o bound programs.
- It is faster in the multi-threaded case for cpu-bound programs that do their compute-intensive work in C libraries.
- It makes C extensions easier to write: there will be no switch of Python threads except where you allow it to happen (i.e. between the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros).
- It makes wrapping C libraries easier. You don’t have to worry about thread-safety. If the library is not thread-safe, you simply keep the GIL locked while you call it.
The GIL can be released by C extensions. Python’s standard library releases the GIL around each blocking i/o call. Thus the GIL has no consequence for performance of i/o bound servers. You can thus create networking servers in Python using processes (fork), threads or asynchronous i/o, and the GIL will not get in your way.
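To see the point about blocking calls, here is a minimal sketch. It assumes CPython and uses `time.sleep` as a stand-in for a blocking i/o call (the function names are made up for illustration). Since the GIL is released while each thread waits, four waits of 0.2 s should overlap instead of adding up to 0.8 s.

```python
import threading
import time


def blocking_call(delay):
    # time.sleep stands in for a blocking i/o call; CPython releases
    # the GIL while the thread is waiting here.
    time.sleep(delay)


def run_in_threads(n_threads, delay):
    # Start n_threads threads that each block for `delay` seconds and
    # return the wall-clock time until all of them have finished.
    threads = [
        threading.Thread(target=blocking_call, args=(delay,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
print(f"4 threads blocking 0.2 s each took {elapsed:.2f} s in total")
```

The exact timing depends on the machine, but the total should be much closer to one wait than to four.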
Numerical libraries in C or Fortran can similarly be called with the GIL released. While your C extension is waiting for an FFT to complete, the interpreter will be executing other Python threads. A GIL is thus easier and faster than fine-grained locking in this case as well. This constitutes the bulk of numerical work. The NumPy extension releases the GIL whenever possible.
Threads are usually a bad way to write most server programs. If the load is low, forking is easier. If the load is high, asynchronous i/o and event-driven programming (e.g. using Python’s Twisted framework) is better. The only excuse for using threads is the lack of os.fork on Windows.
The GIL is a problem if, and only if, you are doing CPU-intensive work in pure Python. Here you can get cleaner design using processes and message-passing (e.g. mpi4py). There is also a ‘processing’ module in Python cheese shop, that gives processes the same interface as threads (i.e. replace threading.Thread with processing.Process).
Threads can be used to maintain responsiveness of a GUI regardless of the GIL. If the GIL impairs your performance (cf. the discussion above), you can let your thread spawn a process and wait for it to finish.
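The "thread spawns a process and waits" idea could look roughly like the sketch below. The child job (`CHILD_CODE`) and the `results` dictionary are invented for this example; the child runs in its own interpreter, with its own GIL, while the parent thread simply waits for it.

```python
import subprocess
import sys
import threading

# Hypothetical CPU-heavy job, executed by a separate Python interpreter
# that has its own GIL.
CHILD_CODE = "print(sum(i * i for i in range(1_000_000)))"

results = {}


def run_job():
    # subprocess.run blocks only this worker thread while the child
    # process does the actual computation.
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    results["sum_of_squares"] = int(completed.stdout)


worker = threading.Thread(target=run_job)
worker.start()
# The main thread (e.g. a GUI event loop) stays free to do other work here.
worker.join()
print(results["sum_of_squares"])
```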
First off: Python doesn’t have a GIL. Python is a programming language. A programming language is a set of abstract mathematical rules and restrictions. There is nothing in the Python Language Specification which says that there must be a GIL.
There are many different implementations of Python. Some have a GIL, some don’t.
One simple explanation for having a GIL is that writing concurrent code is hard. By placing a giant lock around your code, you force it to always run serially. Problem solved!
In CPython, in particular, one important goal is to make it easy to extend the interpreter with plugins written in C. Again, writing concurrent code is hard, so by guaranteeing that there will be no concurrency, it makes it easier to write extensions for the interpreter. Plus, many of those extensions are just thin wrappers around existing libraries which may not have been written with concurrency in mind.
What is the purpose of a GIL?
The C API documentation has this to say on the subject:
The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
In other words, the GIL prevents corruption of the interpreter's state. Python programs should never produce a segmentation fault, because only memory-safe operations are permitted. The GIL extends this assurance to multi-threaded programs.
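The reference counts mentioned in the quote can be observed from Python itself. This is a small CPython-specific sketch using `sys.getrefcount`; the variable names are arbitrary. It only shows that a plain assignment updates the count, which is the kind of bookkeeping the GIL keeps consistent across threads.

```python
import sys

obj = object()

# getrefcount reports one extra reference for its own argument, so only
# the difference between the two readings is meaningful.
before = sys.getrefcount(obj)

alias = obj  # binding a second name increments the reference count
after = sys.getrefcount(obj)

print(f"references before: {before}, after adding an alias: {after}")
```

If two threads performed that increment at the same moment without any lock, one update could be lost, and the object might later be freed while still in use.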
What are the alternatives?
If the purpose of the GIL is to protect state from corruption, then one obvious alternative is to lock at a much finer grain, perhaps at a per-object level. The problem with this is that although it has been demonstrated to increase the performance of multi-threaded programs, it has more overhead, and single-threaded programs suffer as a result.
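As a rough illustration of what per-object locking means, here is a Python-level sketch. It is not how any interpreter implements it internally; `LockedCounter` and `hammer` are hypothetical names. Each object carries its own lock, so threads working on different objects never wait on each other, but every single operation pays for acquiring and releasing a lock, even when only one thread exists.

```python
import threading


class LockedCounter:
    """A counter guarded by its own lock (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Only threads touching this particular object contend here.
        with self._lock:
            self.value += 1


def hammer(counter, n):
    # Increment the given counter n times.
    for _ in range(n):
        counter.increment()


counters = [LockedCounter(), LockedCounter()]
# Two threads per counter, 10,000 increments each.
threads = [
    threading.Thread(target=hammer, args=(c, 10_000))
    for c in counters
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([c.value for c in counters])
```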