C is one of the most widely used languages in the world. It accounts for a huge proportion of existing code and continues to be used for a vast amount of new code. It’s beloved by its users, it’s so widely ported that being able to run C is, to many, the informal definition of a platform, and it is praised by its fans for being a “small” language with a relatively clean set of features.
So where are all the compilers?
On the desktop, there are (realistically) two: GCC and Clang. Thinking about it for a few seconds, you’ll probably remember that Intel exists as well. There are a handful of others, far too obscure for the average person to name, and almost universally not bothering to support a recent language version (or often even a well-defined language subset, just “a subset”). Half of the members of this list are historical footnotes; most of the rest are very specialized and still don’t actually implement the full language. Very few actually seem to be open source.
Scheme and Forth – other small languages beloved by their fans for exactly that smallness – probably have more compilers than actual users. Even something like SML has more “serious” implementations to choose between than C. Meanwhile, the announcement of a new (unfinished) C compiler aiming at verification actually draws some pretty negative responses, and veteran implementations struggle to get enough contributors even to catch up to C99.
Why? Is implementing C so hard? It isn’t C++. Do users simply have a very skewed idea of which complexity group it falls into (i.e. that it is actually closer to C++ than to Scheme)?
21
Today, a real C compiler needs to be an optimizing compiler, notably because C is no longer a language close to the hardware: current processors are incredibly complex (out-of-order, pipelined, superscalar, with complex caches and TLBs, hence needing instruction scheduling, etc.). Today’s x86 processors are not like the i386 processors of the previous century, even if both are able to run the same machine code. See David Chisnall’s paper “C Is Not a Low-Level Language (Your Computer Is Not a Fast PDP-11)”.
Few people use naive non-optimizing C compilers like tinycc or nwcc, since they produce code that is several times slower than what optimizing compilers can give.
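For a rough sense of where that gap comes from, consider a plain reduction loop (a sketch only; the actual speedup depends on the compiler, the flags and the target CPU):

```c
#include <stddef.h>

/* The kind of loop where an optimizing compiler earns its keep:
 * at -O2/-O3, GCC and Clang will typically unroll and auto-vectorize
 * it, while a naive code generator emits one scalar load, multiply
 * and add per iteration, often several times slower on modern CPUs. */
long dot(const int *a, const int *b, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (long)a[i] * b[i];
    return sum;
}
```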
Coding an optimizing compiler is difficult. Notice that both GCC and Clang optimize a “source-language-neutral” code representation (GIMPLE for GCC, LLVM IR for Clang). The complexity of a good C compiler is not in the parsing phase!
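You can look at that shared representation yourself; assuming a reasonably recent GCC and Clang, something like the following dumps the intermediate code the optimizers actually work on:

```c
/* square.c -- a trivial input for inspecting the intermediate code.
 *
 *   gcc   -O2 -c -fdump-tree-gimple square.c   # writes a *.gimple dump file
 *   clang -O2 -S -emit-llvm square.c           # writes square.ll (LLVM IR)
 *
 * Both dumps are already language-neutral: nothing in them says the
 * source was C rather than C++ or Fortran. */
int square(int x) {
    return x * x;
}
```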
In particular, making a C++ compiler is not much harder than making a C compiler: parsing C++ and transforming it into some internal code representation is complex (because the C++ specification is complex) but well understood, while the optimization parts are even more complex (inside GCC, the middle-end optimizations, which are source-language- and target-processor-neutral, form the majority of the compiler, with the rest split between front-ends for several languages and back-ends for several processors). Hence most optimizing C compilers are also able to compile some other languages, like C++, Fortran, D, … The C++-specific parts of GCC are about 20% of the compiler…
Also, C (or C++) is so widely used that people expect their code to compile even when it does not exactly follow the official standards, which do not define the semantics of the language precisely enough (so each compiler may have its own interpretation of it). Look also at the CompCert proved C compiler, and the Frama-C static analyzer, which care about a more formal semantics of C.
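A classic example of such an under-specified corner is signed overflow, which the standard leaves undefined (the behaviour described below is what GCC and Clang typically do, not something the standard mandates):

```c
#include <limits.h>
#include <stdio.h>

/* Signed overflow is undefined behaviour, so an optimizing compiler
 * is allowed to assume it never happens and fold this test to
 * "always true", while a naive translation performs the wrapping
 * comparison literally.  For INT_MAX, GCC and Clang typically print
 * 1 at -O2 and 0 at -O0 -- two interpretations, both "correct". */
static int will_not_overflow(int x) {
    return x + 1 > x;
}

int main(void) {
    printf("%d\n", will_not_overflow(INT_MAX));
    return 0;
}
```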
And optimizations are a long-tail phenomenon: implementing a few simple optimizations is easy, but they won’t make a compiler competitive! You need to implement a lot of different optimizations, and to organize and combine them cleverly, to get a real-world compiler that is competitive. In other words, a real-world optimizing compiler has to be a complex piece of software. BTW, both GCC and Clang/LLVM have several internal specialized C/C++ code generators. And both are huge beasts (several million source lines of code, with a growth rate of several percent each year) with a large developer community (a few hundred people, working mostly full-time, or at least half-time).
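As a small illustration of how passes only pay off in combination (a sketch, not a description of GCC’s or LLVM’s actual pass pipelines):

```c
/* No single optimization gets very far here, but inlining square(),
 * unrolling the loop, propagating the constants and eliminating the
 * dead code together let a good optimizer reduce compute() to a
 * plain "return 42;". */
static int square(int x) { return x * x; }

int compute(void) {
    int acc = 0;
    for (int i = 0; i < 3; ++i)
        acc += square(i + 1);   /* 1 + 4 + 9 = 14 */
    return acc * 3;             /* 14 * 3 = 42    */
}
```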
Notice that there is (to the best of my knowledge) no multi-threaded C compiler, even if some parts of a compiler could run in parallel (e.g. intra-procedural optimization, register allocation, instruction scheduling…). And a parallel build with make -j is not always enough (especially with LTO).
Also, it is difficult to get funding for coding a C compiler from scratch, and such an effort needs to last several years. Finally, most C or C++ compilers today are free software (there is no longer a market for new proprietary compilers sold by startups) or at least monopolistic commodities (like Microsoft Visual C++), and being free software is nearly a requirement for compilers (because they need contributions from many different organizations).
I’d be delighted to get funding to work on a C compiler from scratch as free software, but I am not naive enough to believe that is possible today!
20
I would like to contest your underlying assumption that there are only a small number of C implementations.
I don’t even know C, I don’t use C, I am not a member of the C community, and yet even I know of far more compilers than the few you mentioned.
First and foremost, there is the compiler which probably completely dwarfs both GCC and Clang on the desktop: Microsoft Visual C. Despite the inroads that both OS X and Linux have been making on the desktop, and the market share that iOS and Android have “stolen” away from former traditional desktop users, Windows is still the dominant desktop OS, and the majority of Windows desktop C programs are probably compiled using Microsoft tools.
Traditionally, every OS vendor and every chip vendor had their own compilers. Microsoft, as an OS vendor, has Microsoft Visual C. IBM, as both an OS vendor and a chip vendor, has XLC (which is the default system compiler for AIX, and the compiler with which both AIX and i/OS are compiled). Intel has their own compiler. Sun/Oracle have their own compiler in Sun Studio.
Then, there are the high-performance compiler vendors like PathScale and The Portland Group, whose compilers (and OpenMP libraries) are used for number crunching.
Digital Mars is also still in business. I believe Walter Bright has the unique distinction of being the only person on the planet who managed to create a production-quality C++ compiler (mostly) by himself.
Last but not least, we have all the proprietary compilers for embedded microcontrollers. IIRC, more microcontrollers are sold every year than desktop, mobile, server, workstation, and mainframe CPUs have been sold in the entire history of computing combined. So those are definitely not niche products.
An honorary mention goes out to TruffleC, a C interpreter(!) running on the JVM(!), written using the Truffle AST interpreter framework, that is only 7% slower than GCC and Clang (whichever is fastest on any given benchmark) across the Computer Language Benchmarks Game, and faster than both on microbenchmarks. Using TruffleC, the Truffle team was able to get their version of JRuby+Truffle to execute Ruby C extensions faster than the actual C Ruby implementation!
So, those are 6 implementations, in addition to the ones you listed, that I can name off the top of my head, without even knowing anything about C.
14
How many compilers do you need?
If they have different feature sets, you create a portability problem. If they’re commoditised, you choose the “default” (GCC, Clang, or VS). If you care about the last 5% of performance, you have a benchmark-off.
If you’re doing programming language work recreationally or for research purposes, it’s likely to be in a more modern language. Hence the proliferation of toy compilers for Scheme and ML. Although OCaml seems to be getting some traction for non-toy non-academic uses.
Note that this varies a lot by language. Java has essentially the Sun/Oracle toolchain and the GNU one. Python has various compilers, none of which is really respected compared to the standard interpreter. Rust and Go have exactly one implementation each. C# has Microsoft and Mono.
4
C/C++ is unique amongst compiled languages in that it has 3 major implementations of a common specification.
Going by the rule of dismissing anything that’s not used much, every other compiled language has 0 to 1.
And I think JavaScript is the only reason you need to specify ‘compiled’.
6
So what is your target language?
SML compilers often target C or something like LLVM (or, as seen in your link, the JVM or JavaScript).
If you’re compiling C, it’s not because you’re going to the JVM. You’re going to something worse than C. Far worse. And then you get to duplicate that minor hell a bunch of times for all your target platforms.
And sure, C isn’t C++, but I’d say that it’s closer to C++ than to Scheme. It has its own share of undefined-behavior evilness and implementation-defined minutiae (I’m looking at you, sizes of built-in types). And if you screw up that minutiae (or do it “correctly” but unexpectedly), then you have decades of existing code on vital systems that will tell you how terrible you are. If you screw up an SML compiler, it just won’t work – and someone might notice. Someday.
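For what it’s worth, that minutiae is why portable C tends to spell out its assumptions rather than trusting the built-in types – something like this sketch (the asserted sizes are assumptions about the target, not guarantees from the standard):

```c
#include <limits.h>
#include <stdint.h>

/* The standard only guarantees minimum ranges for the built-in types;
 * their exact sizes are implementation-defined, so code that assumes
 * more should say so explicitly. */
_Static_assert(CHAR_BIT == 8,    "this code assumes 8-bit bytes");
_Static_assert(sizeof(int) == 4, "this code assumes 32-bit int");

/* long is 64-bit on LP64 Unix but 32-bit on LLP64 Windows, which is
 * exactly the sort of detail a new C compiler has to get right for
 * decades of existing code. */
int64_t widen(long x) { return (int64_t)x; }
```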
7