In this talk, Guido van Rossum discusses (at 27:30) attempts to write a compiler for Python code, commenting:
turns out it’s not so easy to write a compiler that maintains all the
nice dynamic typing properties and also maintains semantic correctness
of your program, so that it actually does the same thing no matter
what kind of weirdness you do somewhere under the covers and actually
runs any faster
What are the (possible) challenges related to typing in writing a compiler for a dynamically typed language like Python?
You oversimplified Guido’s statement in phrasing your question. The problem isn’t writing a compiler for a dynamically typed language. The problem is writing one that is (criterion 1) always correct, (criterion 2) keeps dynamic typing, and (criterion 3) is noticeably faster for a significant amount of code.
It’s easy to implement 90% of Python (failing criterion 1) and be consistently fast at it. Similarly, it’s easy to create a faster Python variant with static typing (failing criterion 2). Implementing 100% is also easy (insofar as implementing a language this complex is easy at all), but so far every easy way to implement it turns out to be relatively slow (failing criterion 3).
Implementing an interpreter plus JIT that’s correct, implements the entire language, and is faster for some code turns out to be feasible, though significantly harder (cf. PyPy), and apparently only if you automate the creation of the JIT compiler (Psyco did it by hand, but was very limited in what code it could speed up). But note that this is explicitly out of scope, as we’re talking about static (aka ahead-of-time) compilers.

I only mention the JIT approach to explain why it does not carry over to static compilers (or at least, there’s no existing counterexample): a JIT first has to interpret and observe the program, then generate code for a specific iteration of a loop (or another linear code path), then optimize the hell out of that based on assumptions that are only true for that specific iteration (or at least, not for all possible iterations). The hope is that many later executions of that code will also satisfy those assumptions and thus benefit from the optimizations. Some (relatively cheap) checks are added to ensure correctness. To do all this, you need an idea of what to specialize for, and a slow but general implementation to fall back to. AOT compilers have neither. They can’t specialize at all based on code they can’t see (e.g. dynamically loaded code), and specializing carelessly means generating more code, which has a number of problems (icache utilization, binary size, compile time, additional branches).
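To make that concrete, here is a rough Python sketch (not PyPy’s actual output; the names generic_add and specialized_loop are invented) of what the specialized code for a loop body like total = total + item conceptually looks like: a cheap guard, a fast path valid under the observed assumptions, and a fallback to the general implementation.

```python
import operator

def generic_add(a, b):
    # Slow, fully general path: the same dynamic dispatch the interpreter
    # would perform (simplified here to the operator module).
    return operator.add(a, b)

def specialized_loop(items, total):
    for item in items:
        # Guard: a cheap check that the assumption observed while tracing
        # still holds for this iteration.
        if type(total) is int and type(item) is int:
            # Fast path: a real JIT would use unboxed machine integers here;
            # plain int addition stands in for that.
            total = total + item
        else:
            # Assumption violated: fall back to the general implementation.
            total = generic_add(total, item)
    return total

print(specialized_loop([1, 2, 3], 0))    # 6: always takes the fast path
print(specialized_loop(["a", "b"], ""))  # 'ab': always takes the fallback
```

A real JIT emits machine code and its guards are far cheaper than a Python type() check, but the overall shape is the same.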
Implementing an AOT compiler that correctly implements the entire language is also relatively easy: generate code that calls into the runtime to do what the interpreter would do when fed this code. Nuitka (mostly) does this. However, this doesn’t yield much performance benefit (failing criterion 3), as you still have to do just as much unnecessary work as an interpreter; the only saving is the dispatch from bytecode to the block of C code that does the actual work. But that’s only a rather small cost: significant enough to be worth optimizing in an existing interpreter, but not significant enough to justify a whole new implementation with its own problems.
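As a sketch of what such generated code amounts to, imagine z = x + y being compiled into a call to a generic runtime helper. The real output is C code calling into the CPython runtime; the Python stand-in below, with the invented name binary_add, only models the idea.

```python
import operator

def binary_add(a, b):
    # Invented stand-in for a runtime helper: it performs the same dynamic
    # dispatch the interpreter's binary-add bytecode would, based on the
    # runtime types of a and b.
    return operator.add(a, b)

def compiled_statement(x, y):
    # The "compiled" form of "z = x + y": the bytecode dispatch is gone,
    # but every operation still pays for the full dynamic lookup.
    z = binary_add(x, y)
    return z

print(compiled_statement(1, 2))        # 3
print(compiled_statement("a", "b"))    # 'ab'
```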
What would be needed to fulfill all three criteria? We have no idea. There are some static analysis schemes which can extract some information about concrete types, control flow, etc. from Python programs. The ones that yield accurate data beyond the scope of a single basic block are extremely slow and need to see the whole program, or at least most of it. Still, you can’t do much with that information, other than perhaps optimize a few operations on builtin types.
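As a rough illustration of where such analyses hit a wall (this sketch is mine, not taken from any particular tool):

```python
def f():
    x = 1          # inferable: x is an int
    y = x + 2      # inferable: y is an int, so this "+" could be specialized
    return y

def g(a):
    # Not inferable from g alone: a could be an int, a str, a list, or any
    # object defining __add__. A whole-program analysis would have to find
    # (and trust) every call site of g to narrow this down.
    return a + a
```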
Why can’t more be done with that information? To put it bluntly, a compiler either removes the ability to execute Python code loaded at runtime (failing criterion 1), or it makes no assumptions that can be invalidated by any Python code at all. Unfortunately, that includes pretty much everything useful for optimizing programs: globals, including functions, can be rebound; classes can be mutated or replaced entirely; modules can be modified arbitrarily; importing can be hijacked in several ways; etc. A single string passed to eval, exec, __import__, or numerous other functions can do any of that. In effect, that means almost no big optimizations can be applied, yielding little performance benefit (failing criterion 3). Back to the above paragraph.
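Here is a small, deliberately nasty sketch of the kind of thing that defeats those assumptions; the values in the comments are simply what this snippet prints:

```python
def total_length(items):
    # Looks like it obviously uses the builtin len() ...
    return sum(len(s) for s in items)

print(total_length(["ab", "cde"]))    # 5

# ... but nothing in total_length's source rules this out:
exec("import builtins; builtins.len = lambda s: 42")

print(total_length(["ab", "cde"]))    # 84: len now resolves to the lambda
```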
The hardest problem is to figure out what type everything has at any given time.
In a static language like C or Java, once you have seen the type declaration, you know what that object is and what it can do. If a variable is declared int, it is an integer. It is not, for example, a callable function reference.
In Python, it can be. This is horrible Python, but legal:
i = 2
x = 3 + i

def prn(s):
    print(s)

i = prn
i(x)
Now, this example is pretty stupid, but it illustrates the general idea.
More realistically, you might replace a built-in function with a user-defined function that does something slightly different (like a version that logs its arguments when you call it).
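For instance, a sketch of what such a replacement might look like (logging_print and the calls.log file are invented for this example):

```python
import builtins

_original_print = builtins.print

def logging_print(*args, **kwargs):
    # Record the arguments, then behave like the original print().
    with open("calls.log", "a") as log:
        log.write(repr(args) + "\n")
    _original_print(*args, **kwargs)

builtins.print = logging_print    # every later call to print() now logs

print("hello")    # printed as usual, and also recorded in calls.log
```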
PyPy uses Just-In-Time compilation after watching what the code actually does, and this lets PyPy speed things up a lot. PyPy can watch a loop and verify that every time the loop runs, the variable foo is always an integer; then PyPy can optimize away the code that looks up the type of foo on every pass through the loop, and can often even get rid of the Python object that represents an integer, so foo can just become a number sitting in a register on the CPU. This is how PyPy can be faster than CPython: CPython does the type lookup as fast as possible, but skipping the lookup entirely is faster still.
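As a rough example, a loop like the one below is the kind of code a tracing JIT handles well; the comments describe the idea conceptually, not PyPy’s actual internals:

```python
def count_up(n):
    foo = 0
    for _ in range(n):
        # After a few traced iterations, the JIT can guard on "foo is an
        # int", drop the per-iteration type lookup, and keep foo as a raw
        # machine integer instead of a boxed Python object.
        foo = foo + 1
    return foo

print(count_up(1_000_000))    # 1000000
```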
I don’t know the details, but I remember there was a project called Unladen Swallow that tried to apply static compiler technology to speed up Python (using LLVM). You might want to search for Unladen Swallow and see if you can find a discussion of why it didn’t work out as they had hoped.
As the other answer says, the key problem is figuring out type information. To the extent you can do that statically, you can directly generate good code.
But even if you can’t do it statically, you can still generate reasonable code at runtime, once you have actual type information. That information often turns out to be stable, or to take at most a few different values for any particular entity at a particular code point. The SELF programming language pioneered many of the ideas of aggressive runtime type collection and runtime code generation. Its ideas are widely used in modern JIT-based implementations of languages like Java and C#.
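As a toy illustration of that idea, here is a minimal Python sketch of a monomorphic inline cache in the spirit of SELF’s type feedback; the InlineCache class is invented for the example, and real systems do this in generated machine code rather than in Python:

```python
class InlineCache:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None

    def call(self, receiver, *args):
        if type(receiver) is self.cached_type:
            # Hit: skip the generic attribute lookup entirely.
            return self.cached_method(receiver, *args)
        # Miss: do the slow generic lookup, then remember the result.
        method = getattr(type(receiver), self.method_name)
        self.cached_type = type(receiver)
        self.cached_method = method
        return method(receiver, *args)

class Square:
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

area_cache = InlineCache("area")
print(area_cache.call(Square(3)))    # 9: miss, fills the cache
print(area_cache.call(Square(4)))    # 16: hit, takes the fast path
```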