Is Ken Thompson’s compiler hack still a threat?

Ken Thompson Hack (1984)

Ken Thompson outlined a method for corrupting a compiler binary (and other compiled software, like a login script on a *nix system) in 1984. I was curious to know if modern compilation has addressed this security flaw or not.

Short description:

Re-write compiler code to contain 2 flaws:

When compiling its own binary, the compiler must compile these flaws
When compiling some other preselected code (login function) it must compile some arbitrary backdoor

Thus, the compiler works normally – when it compiles a login script or similar, it can create a security backdoor, and when it compiles newer versions of itself in the future, it retains the previous flaws – and the flaws will only exist in the compiler binary so are extremely difficult to detect.

Questions:

I could not find any answers to these on the web:

How does this relate to just-in-time compilation?
Are functions like the program handling logins on a *nix system compiled when they are
run?
Is this still a valid threat, or have there been developments in
the security of compilation since 1984 that prevent this from being a
significant issue?
Does this affect all languages?

Why do I want to know?

I came across this while doing some homework, and it seemed interesting but I lack the background to understand in a concrete way whether this is a current issue, or a solved issue.

Reference Material

Overview
Some code

This hack has to be understood in context. It was published at a time and in a culture where Unix running on all kinds of different hardware was the dominant system.

What made the attack so scary was that the C compiler was the central piece of software for these systems. Almost everything in the system went through the compiler when it was first installed (binary distributions were rare due to the heterogenous hardware). Everyone compiled stuff all the time. People regularly inspected source code (they often had to make adjustments to get it to compile at all), so having the compiler inject backdoors seemed to be a kind of “perfect crime” scenario where you could not be caught.

Nowadays, hardware is much more compatible and compilers therefore have a much smaller role in the day-to-day operation of a system. A compromised compiler is not the most scary scenario anymore – rootkits and a compromised BIOS are even harder to detect and get rid of.

The purpose of that speech wasn’t to highlight a vulnerability that needs to be addressed, or even to propose a theoretical vulnerability that we need to be aware of.

The purpose was that, when it comes to security, we’d like to not have to trust anyone, but unfortunately that’s impossible. You always have to trust someone (hence the title: “Reflections On Trusting Trust”)

Even if you’re the paranoid type who encrypts his desktop hard-drive and refuses to run any software you didn’t compile yourself, you still need to trust your operating system. And even if you compile the operating system yourself, you still need to trust the compiler you used. And even if you compile your own compiler, you still need to trust that compiler! And that’s not even mentioning the hardware manufacturers!

You simply can’t get away with trusting no one. That’s the point he was trying to get across.

No

The attack, as originally described, was never a threat. While a compiler could theoretically do this, actually pulling off the attack would require programming the compiler to

Recognize when the source code being compiled is of a compiler, and
Figure out how to modify arbitrary source code to insert the hack into it.

This entails figuring out how the compiler works from its source code, in order that it can modify it without breakage.

For instance, imagine that the linking format stores the data lengths or offset of the compiled machine code somewhere in the executable. The compiler would have to figure out for itself which of these need to be updated, and where, when inserting the exploit payload. Subsequent versions of the compiler (innocuous version) can arbitrarily change this format, so the exploit code would effectively need to understand these concepts.

This is high-level self-directed programming, a hard AI problem (last I checked, the state of the art was generating code that is practically determined by its types). Look: few humans can even do this; you would have to learn the programming language and understand the code-base first.

Even if the AI problem is solved, people would notice if compiling their tiny compiler results in a binary with a huge AI library linked into it.

Analogous attack: bootstrapping trust

However, a generalization of the attack is relevant. The basic issue is that your chain of trust has to start somewhere, and in many domains its origin could subvert the entire chain in a hard-to-detect way.

An example that could easily be pulled off in real life

Your operating system, say Ubuntu Linux, ensures security (integrity) of updates by checking downloaded update packages against the repository’s signing key (using public-key cryptography). But this only guarantees authenticity of the updates if you can prove that the signing key is owned by a legitimate source.

Where did you get the signing key? When you first downloaded the operating system distribution.

You have to trust that the source of your chain of trust, this signing key, isn’t evil.

Anyone that can MITM the Internet connection between you and the Ubuntu download server—this could be your ISP, a government that controls Internet access (e.g. China), or Ubuntu’s hosting provider—could have hijacked this process:

Detect that you’re downloading the Ubuntu CD image. This is simple: see that the request is going to any of the (publicly-listed) Ubuntu mirrors and asks for the filename of the ISO image.
Serve the request from their own server, giving you a CD image containing the attacker’s public key and repository location instead of Ubuntu’s.

Thenceforth, you will get your updates securely from the attacker’s server. Updates run as root, so the attacker has full control.

You can prevent the attack by making sure the original is authentic. But this requires that you validate the downloaded CD image using a hash (few people actually do this)—and the hash must itself be downloaded securely, e.g. over HTTPS. And if your attacker can add a certificate on your computer (common in a corporate environment) or controls a certificate authority (e.g. China), even HTTPS provides no protection.

First, my favorite writeup of this hack is called Strange Loops.

This particular hack could certainly (*) be done today in any of the major open source OS projects, particularly Linux, *BSD, and the like. I would expect it would work almost identically. For example, you download a copy of FreeBSD that has an exploited compiler to modify openssh. From then on, every time you upgrade openssh or the compiler by source, you will continue the problem. Assuming the attacker has exploited the system used to package FreeBSD in the first place (likely, since the image itself is corrupted, or the attacker is in fact the packager), then every time that system rebuilds FreeBSD binaries, it will reinject the problem. There are lots of ways for this attack to fail, but they’re not fundamentally different than how Ken’s attack could have failed (**). The world really hasn’t changed that much.

Of course, similar attacks could just as easily (or more easily) be injected by their owners into systems like Java, the iOS SDK, Windows, or any other system. Certain kinds of security flaws can even be engineered into the hardware (particularly weakening random number generation).

(*) But by “certainly” I mean “in pricinciple.” Should you expect that this kind of hole exists in any particular system? No. I would consider it quite unlikely for various practical reasons. Over time, as the code changes and changes, the likelihood that this kind of hack would cause strange bugs increases. And that raises the likelihood that it would be discovered. Less ingenious backdoors would require conspiracies to maintain. Of course we know for a fact that “lawful intercept” backdoors have been installed in various telecommunications and networking systems, so in many cases this kind of elaborate hack is unnecessary. The hack is installed overtly.

So always, defense in depth.

(**) Assuming Ken’s attack ever actually existed. He just discussed how it could be done. He didn’t say he actually did it as far as I know.

Does this affect all languages?

This attack primarily affects languages that are self-hosting. That is languages where the compiler is written in the language itself. C, Squeak Smalltalk, and the PyPy Python interpreter would be affected by this. Perl, JavaScript, and the CPython Python interpreter would not.

How does this relate to just-in-time compilation?

Not very much. It is the self-hosting nature of the compiler that allows the hack to be hidden. I don’t know of any self-hosting JIT compilers. (Maybe LLVM?)

Are functions like the program handling logins on a *nix system compiled when they are run?

Not usually. But the question isn’t when it is compiled, but by which compiler. If the login program is compiled by a tainted compiler, it will be tainted. If it is compiled by a clean compiler, it will be clean.

Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?

This is still a theoretical threat, but is not very likely.

One thing you could do to mitigate it is to use multiple compilers. For example, an LLVM Compiler which is, itself compiled by GCC will not pass along a back door. Similarly, a GCC compiled by LLVM will not pass along a back door. So, if you are worried about this sort of attack, then you could compile your compiler with another breed of compiler. That means that the evil hacker (at your OS vendor?) Will have to taint both compilers to recognize each other; A much more difficult problem.

There’s a theoretical chance for this to happen. There is, however, a way of checking if a specific compiler (with available source code) has been compromised, through David A. Wheeler’s Diverse double-compiling.

Basically, use both the suspected compiler and another independently developed compiler to compile the source of the suspect compiler. This gives you SC_sc and SC_T. Now, compile the suspect source using both of these binaries. If the resulting binaries are identical (with exception of a variety of things that may well legitimately vary, like assorted timestamps), the suspect compiler was not actually abusing trust.

As a specific attack, it’s as much of a threat as it ever was, which is pretty much no threat at all.

How does this relate to just-in-time compilation?

Not sure what you mean by that. Is a JITter immune to this? No. Is it more vulnerable? Not really. As a developer YOUR app is more vulnerable simply because you can’t validate that it’s not been done. Note that your as yet undeveloped app is basically immune to this and all practical variations, you only have to worry about a compiler that is newer than your code.

Are functions like the program handling logins on a *nix system compiled when they are run?

That’s not really relevant.

Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?

There is no real security of compilation, and can’t be. That was really the point of his talk, that at some point you have to trust someone.

Does this affect all languages?

Yes. Fundamentally, at some time or another, your instructions have to be turned into something the computer execeutes, and that translation can be done incorrectly.

For all we know.. this is happening at this very moment:

It’s hard to know for sure…

David Wheeler has a good article: http://www.dwheeler.com/trusting-trust/

Me, I’m more worried about hardware attacks. I think we need a totally VLSI design toolchain with FLOSS source code, that we can modify and compile ourselves, that lets us build a microprocessor that has no backdoors inserted by the tools. The tools should also let us understand the purpose of any transistor on the chip. Then we could pop open a sample of the finished chips and inspect them with a microscope, making sure they had the same circuitry that the tools said they were supposed to have.

Systems in which the end users have access to the source code are the ones for which you would have to hide this type of attack. Those would be open source systems in today’s world. The problem is that although there is a dependence on a single compiler for all Linux systems, the attack would have to get onto the build servers for all of the major Linux distributions. Since those don’t download the compiler binaries directly for each compiler release, the source for the attack would have had to be on their build servers in at least one previous release of the compiler. Either that or the very first version of the compiler that they downloaded as a binary would have to have been compromised.

If one has source code for a compiler/build system whose output should not depend on anything other than the content of the supplied source files, and if one has several other compilers and knows that they do not all contain the same compiler hack, one can make sure one gets an executable that depends upon nothing other than the source code.

Suppose one has source code for a compiler/linker package (say the Groucho Suite) written in such a way that its output will not depend upon any unspecified behaviors, nor on anything other than the content of the input source files, and one compiles/links that code on a variety of independently-produced compilers/linker packages (say the Harpo Suite, the Chico suite, and the Zeppo Suite), yielding a different set of exeuctables for each (call them G-Harpo, G-Chico, and G-Zeppo). It would not be unexpected for these executables to contain different sequences of instructions, but they should be functionally identical. Proving that they are functionally identical in all cases, however, would likely be an intractable problem.

Fortunately, such proof won’t be necessary if one only uses the resulting executables for one single purpose: compiling the Groucho suite again. If one compilers the Groucho suite using using G-Harpo (yielding G-G-Harpo), G-Chico (G-G-Chico), and G-Zeppo (G-G-Zeppo), then all three resulting files, G-G-Harpo, G-G-Chico, and G-G-Zeppo, should all byte-for-byte identical. If the files match, that would imply that any “compiler virus” that exists in any of them must exist identically in all of them (since the all three files are byte-for-byte identical, there’s no way their behaviors could differ in any way).

Depending upon the age and lineage of the other compilers, it may be possible to ensure that such a virus could not plausibly exist in them. For example, if one uses an antique Macintosh to feed a compiler that was written from scratch in 2007 through a version of MPW that was written in the 1980’s, the 1980’s compilers wouldn’t know where to insert a virus in the 2007 compiler. It may be possible for a compiler today to do fancy enough code analysis to figure it out, but the level of computation required for such analysis would far exceed the level of computation required to simply compile the code, and could not very well have gone unnoticed in a marketplace where compilation speed was a major selling point.

I would posit that if one is working with compilation tools where the bytes in an executable file to be produced should not depend in any way upon anything other than the content of the submitted source files, it is possible to achieve reasonably good immunity from a Thompson-style virus. Unfortunately, for some reason, non-determinism in compilation seems to be regarded as normal in some environments. I recognize that on a multi-CPU system it may be possible for a compiler to run faster if it is allowed to have certain aspects of code generation vary depending upon which of two threads finishes a piece of work first.

On the other hand, I’m not sure I see any reason that compilers/linkers shouldn’t provide a “canonical output” mode where the output depends only upon the source files and a “compilation date” which may be overridden by the user. Even if compiling code in such a mode took twice as long as normal compilation, I would suggest that there would be considerable value in being able to recreate any “release build”, byte for byte, entirely from source materials, even if it meant that release builds would take longer than “normal builds”.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: softwareengineering - @ 22:33

Thẻ: compiler, hacking, linux, unix

Thiết kế website giá rẻ

Danh mục

Is Ken Thompson’s compiler hack still a threat?

Ken Thompson Hack (1984)

Short description:

Questions:

Why do I want to know?

Reference Material

No

Analogous attack: bootstrapping trust

An example that could easily be pulled off in real life

Does this affect all languages?

How does this relate to just-in-time compilation?

Are functions like the program handling logins on a *nix system compiled when they are run?

Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?