In this talk, Guido van Rossum discusses (at 27:30) attempts to write a compiler for Python code, commenting:
turns out it’s not so easy to write a compiler that maintains all the
nice dynamic typing properties and also maintains semantic correctness
of your program, so that it actually does the same thing no matter
what kind of weirdness you do somewhere under the covers and actually
runs any faster
What are the (possible) challenges related to typing in writing a compiler for a dynamically typed language like Python?
You oversimplified Guido’s statement in phrasing your question. The problem isn’t writing a compiler for a dynamically typed language. The problem is writing one that is (criterion 1) always correct, (criterion 2) keeps dynamic typing, and (criterion 3) is noticeably faster for a significant amount of code.
It’s easy to implement 90% of Python (failing criterion 1) and be consistently fast at it. Similarly, it’s easy to create a faster Python variant with static typing (failing criterion 2). Implementing 100% is also easy (insofar as implementing a language this complex is easy at all), but so far every easy way to implement it turns out to be relatively slow (failing criterion 3).
Implementing an interpreter plus JIT that’s correct, implements the entire language, and is faster for some code turns out to be feasible, though significantly harder (cf. PyPy), and apparently only if you automate the creation of the JIT compiler (Psyco did it by hand, but was very limited in what code it could speed up). But note that this is explicitly out of scope, as we’re talking about static (aka ahead-of-time) compilers.

I only mention the JIT approach to explain why it does not carry over to static compilers (or at least, there’s no existing counterexample): a JIT first has to interpret and observe the program, then generate code for a specific iteration of a loop (or another linear code path), then optimize the hell out of that based on assumptions that are only true for that specific iteration (or at least, not for all possible iterations). The hope is that many later executions of that code will also satisfy those assumptions and thus benefit from the optimizations. Some (relatively cheap) checks are added to ensure correctness. To do all this, you need an idea of what to specialize for, and a slow but general implementation to fall back to. AOT compilers have neither. They can’t specialize at all based on code they can’t see (e.g. dynamically loaded code), and specializing carelessly means generating more code, which has a number of problems (icache utilization, binary size, compile time, additional branches).
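To make that concrete, here is a rough Python sketch (not PyPy’s actual output; the names generic_add and specialized_loop are invented) of what the specialized code for a loop body like total = total + item conceptually looks like: a cheap guard, a fast path valid under the observed assumptions, and a fallback to the general implementation.

```python
import operator

def generic_add(a, b):
    # Slow, fully general path: the same dynamic dispatch the interpreter
    # would perform (simplified here to the operator module).
    return operator.add(a, b)

def specialized_loop(items, total):
    for item in items:
        # Guard: a cheap check that the assumption observed while tracing
        # still holds for this iteration.
        if type(total) is int and type(item) is int:
            # Fast path: a real JIT would use unboxed machine integers here;
            # plain int addition stands in for that.
            total = total + item
        else:
            # Assumption violated: fall back to the general implementation.
            total = generic_add(total, item)
    return total

print(specialized_loop([1, 2, 3], 0))    # 6: always takes the fast path
print(specialized_loop(["a", "b"], ""))  # 'ab': always takes the fallback
```

A real JIT emits machine code and its guards are far cheaper than a Python type() check, but the overall shape is the same.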
Implementing an AOT compiler that correctly implements the entire language is also relatively easy: generate code that calls into the runtime to do what the interpreter would do when fed this code. Nuitka (mostly) does this. However, this doesn’t yield much performance benefit (failing criterion 3), as you still have to do just as much unnecessary work as an interpreter; the only saving is the dispatch from bytecode to the block of C code that does the actual work. But that’s only a rather small cost: significant enough to be worth optimizing in an existing interpreter, but not significant enough to justify a whole new implementation with its own problems.
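As a sketch of what such generated code amounts to, imagine z = x + y being compiled into a call to a generic runtime helper. The real output is C code calling into the CPython runtime; the Python stand-in below, with the invented name binary_add, only models the idea.

```python
import operator

def binary_add(a, b):
    # Invented stand-in for a runtime helper: it performs the same dynamic
    # dispatch the interpreter's binary-add bytecode would, based on the
    # runtime types of a and b.
    return operator.add(a, b)

def compiled_statement(x, y):
    # The "compiled" form of "z = x + y": the bytecode dispatch is gone,
    # but every operation still pays for the full dynamic lookup.
    z = binary_add(x, y)
    return z

print(compiled_statement(1, 2))        # 3
print(compiled_statement("a", "b"))    # 'ab'
```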
What would be needed to fulfill all three criteria? We have no idea. There are some static analysis schemes which can extract some information about concrete types, control flow, etc. from Python programs. The ones that yield accurate data beyond the scope of a single basic block are extremely slow and need to see the whole program, or at least most of it. Still, you can’t do much with that information, other than perhaps optimize a few operations on builtin types.
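As a rough illustration of where such analyses hit a wall (this sketch is mine, not taken from any particular tool):

```python
def f():
    x = 1          # inferable: x is an int
    y = x + 2      # inferable: y is an int, so this "+" could be specialized
    return y

def g(a):
    # Not inferable from g alone: a could be an int, a str, a list, or any
    # object defining __add__. A whole-program analysis would have to find
    # (and trust) every call site of g to narrow this down.
    return a + a
```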
Why can’t more be done with that information? To put it bluntly, a compiler either removes the ability to execute Python code loaded at runtime (failing criterion 1), or it makes no assumptions that can be invalidated by any Python code at all. Unfortunately, that includes pretty much everything useful for optimizing programs: globals, including functions, can be rebound; classes can be mutated or replaced entirely; modules can be modified arbitrarily; importing can be hijacked in several ways; etc. A single string passed to eval, exec, __import__, or numerous other functions can do any of that. In effect, that means almost no big optimizations can be applied, yielding little performance benefit (failing criterion 3). Back to the above paragraph.
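Here is a small, deliberately nasty sketch of the kind of thing that defeats those assumptions; the values in the comments are simply what this snippet prints:

```python
def total_length(items):
    # Looks like it obviously uses the builtin len() ...
    return sum(len(s) for s in items)

print(total_length(["ab", "cde"]))    # 5

# ... but nothing in total_length's source rules this out:
exec("import builtins; builtins.len = lambda s: 42")

print(total_length(["ab", "cde"]))    # 84: len now resolves to the lambda
```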
The hardest problem is to figure out what type everything has at any given time.
In a static language like C or Java, once you have seen the type declaration, you know what that object is and what it can do. If a variable is declared int, it is an integer. It is not, for example, a callable function reference.
In Python, it can be. This is horrible Python, but legal:
i = 2
x = 3 + i

def prn(s):
    print(s)

i = prn
i(x)
Now, this example is pretty stupid, but it illustrates the general idea.
More realistically, you might replace a built-in function with a user-defined function that does something slightly different (like a version that logs its arguments when you call it).
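For instance, a sketch of what such a replacement might look like (logging_print and the calls.log file are invented for this example):

```python
import builtins

_original_print = builtins.print

def logging_print(*args, **kwargs):
    # Record the arguments, then behave like the original print().
    with open("calls.log", "a") as log:
        log.write(repr(args) + "\n")
    _original_print(*args, **kwargs)

builtins.print = logging_print    # every later call to print() now logs

print("hello")    # printed as usual, and also recorded in calls.log
```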
PyPy uses Just-In-Time compilation after watching what the code actually does, and this lets PyPy speed things up a lot. PyPy can watch a loop and verify that every time the loop runs, the variable foo is always an integer; then PyPy can optimize away the code that looks up the type of foo on every pass through the loop, and can often even get rid of the Python object that represents an integer, so foo can just become a number sitting in a register on the CPU. This is how PyPy can be faster than CPython: CPython does the type lookup as fast as possible, but skipping the lookup entirely is faster still.
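As a rough example, a loop like the one below is the kind of code a tracing JIT handles well; the comments describe the idea conceptually, not PyPy’s actual internals:

```python
def count_up(n):
    foo = 0
    for _ in range(n):
        # After a few traced iterations, the JIT can guard on "foo is an
        # int", drop the per-iteration type lookup, and keep foo as a raw
        # machine integer instead of a boxed Python object.
        foo = foo + 1
    return foo

print(count_up(1_000_000))    # 1000000
```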
I don’t know the details, but I remember there was a project called Unladen Swallow that tried to apply static compiler technology to speed up Python (using LLVM). You might want to search for Unladen Swallow and see if you can find a discussion of why it didn’t work out as they had hoped.
As the other answer says, the key problem is figuring out type information. To the extent you can do that statically, you can directly generate good code.
But even if you can’t do it statically, you can still generate reasonable code at runtime, once you have actual type information. That information often turns out to be stable, or to take at most a few different values for any particular entity at a particular code point. The SELF programming language pioneered many of the ideas of aggressive runtime type collection and runtime code generation. Its ideas are widely used in modern JIT-based implementations of languages like Java and C#.
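As a toy illustration of that idea, here is a minimal Python sketch of a monomorphic inline cache in the spirit of SELF’s type feedback; the InlineCache class is invented for the example, and real systems do this in generated machine code rather than in Python:

```python
class InlineCache:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None

    def call(self, receiver, *args):
        if type(receiver) is self.cached_type:
            # Hit: skip the generic attribute lookup entirely.
            return self.cached_method(receiver, *args)
        # Miss: do the slow generic lookup, then remember the result.
        method = getattr(type(receiver), self.method_name)
        self.cached_type = type(receiver)
        self.cached_method = method
        return method(receiver, *args)

class Square:
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

area_cache = InlineCache("area")
print(area_cache.call(Square(3)))    # 9: miss, fills the cache
print(area_cache.call(Square(4)))    # 16: hit, takes the fast path
```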