I often hear it said that language A is written in language B. For example, PHP is written in C, and C# is written in C++.
Can someone please explain what that means, and whether it is even correct? Does it have anything to do with the compiler or interpreter used by the language?
In addition, what factors determine the choice of the implementing language?
6
Most programming languages fall into two categories: interpreted and compiled.
A compiled language is translated by a compiler into machine code, the language the CPU directly executes step by step. An interpreted language, on the other hand, uses an intermediary, an interpreter, to run the language code. The interpreter is itself another program, usually itself compiled to machine code.
PHP is an interpreted language. You need a separate program to run PHP code; the computer does not run the program directly. That separate program, the PHP interpreter, is itself written in C.
C# is a compiled language, but it is not compiled to machine code. Instead, it is compiled to a specialist language, bytecode, to be run on a virtual machine. Java is another example of such a setup. You could see it as a hybrid between compilation and interpretation, where the virtual machine is an interpreter. The virtual machine for C# (the CLR, or Common Language Runtime, which is Microsoft's implementation of the CLI, or Common Language Infrastructure) is written in C++.
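To make the "compile to bytecode, then run it on a virtual machine" idea concrete, here is a minimal sketch in Python. It is not how the C# or Java toolchains actually work; the postfix input format and the instruction names (`PUSH`, `ADD`, `MUL`) are invented for illustration.

```python
# A toy "compile to bytecode, then run on a virtual machine" pipeline.
# The instruction set (PUSH, ADD, MUL) is made up for this sketch.

def compile_rpn(source):
    """Compile a postfix expression such as "2 3 + 4 *" into bytecode."""
    bytecode = []
    for token in source.split():
        if token == "+":
            bytecode.append(("ADD", None))
        elif token == "*":
            bytecode.append(("MUL", None))
        else:
            bytecode.append(("PUSH", int(token)))
    return bytecode


def run(bytecode):
    """The "virtual machine": a loop that interprets one instruction at a time."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


program = compile_rpn("2 3 + 4 *")
print(program)       # the bytecode: a list of (instruction, argument) pairs
print(run(program))  # 20
```

Here both the compiler and the virtual machine happen to be written in Python, so in the sense of the question this toy language "is written in Python".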
Other examples are:
- Python: The Python interpreter compiles Python code to Python bytecode, then interprets the bytecode. The interpreter itself is written in C. New implementations have since been added, including one that compiles Python to run on the same CLI used for C#, called IronPython, and one that runs on the Java virtual machine, Jython. To complete the circle, there is a Python version written in (a subset of) Python, PyPy.
- Ruby: Ruby started out as a pure interpreted language, but the most recent version switched to using bytecode. For Ruby, too, there is a project that compiles to the CLI, named IronRuby, and one for the Java VM, JRuby.
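The two-step process described for Python can be observed from within Python itself. The sketch below uses the built-in `compile` and `eval` functions and the standard `dis` module; the expression and the variable name `x` are arbitrary, and the exact bytecode listing differs between Python versions.

```python
import dis

source = "x * 2 + 1"

# Step 1: compile the source text into a code object holding bytecode.
code = compile(source, "<example>", "eval")
print(type(code).__name__)  # code

# Show the bytecode in human-readable form (output varies by version).
dis.dis(code)

# Step 2: have the interpreter execute that bytecode.
result = eval(code, {"x": 20})
print(result)  # 41
```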
5
You are basically right. If it is said that Ruby is written in C, this means that the language interpreter and parts of the core library are written in C.
So the Ruby interpreter is a C program that takes a text file as input, processes it, and then calls functions that are either defined in another text file (if written in Ruby) or are compiled C code. Much of the basic functionality is compiled C code because it needs to access system resources such as memory and the file system directly, and so are some functions that require very high performance.
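The same split exists in other interpreted languages. As a rough analogy (in Python rather than Ruby, and assuming the CPython implementation), the sketch below tells apart a function that is compiled C code from one that is written in the language itself. The choice of `len` and `textwrap.dedent` is just for illustration.

```python
import textwrap
import types

# In CPython, len is implemented in C and compiled into the interpreter.
print(isinstance(len, types.BuiltinFunctionType))         # True on CPython

# textwrap.dedent is ordinary Python source shipped in the standard library.
print(isinstance(textwrap.dedent, types.FunctionType))    # True

# Only functions written in Python carry Python bytecode.
print(hasattr(textwrap.dedent, "__code__"))  # True
print(hasattr(len, "__code__"))              # False on CPython
```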
So you have different parts of a language that can or have to be written in other languages. Nothing would keep you from writing the interpreter in C and the libraries in C++ (though maybe making a few things more difficult). You could even have multiple steps and use a language that is very good at text processing to generate some intermediate data which then is processed by some C code.
The factors in the decision are much the same as for other complex applications. Performance is one. The ability to write code that can access system resources directly is another. So in most cases it has to be a compiled language (though in theory you could write a Ruby interpreter in Python). Availability on different systems is important if you want your language to run on Linux, Windows, OS X and others.
6
It simply means that most of the core of language A is written in language B. What “core of language A” means might differ from language to language, but in general terms you guessed right: it means its compiler or interpreter. The deciding factor in picking a language to write another language in is, as with almost every project, which languages the developers are most familiar with.
That said, “language A is written in language B” is an oversimplification for most modern languages. If we take Python as an example, while the reference implementation, CPython, was indeed written in C, there are implementations written in other languages, like Jython (written in Java), IronPython (written in C#), PyPy (written in Python), CLPython (written in Common Lisp), Stackless Python (written in C and Python) and Unladen Swallow (written in C++).
A programming language is a definition, and as the Python example shows, there aren’t really any restrictions on what languages its compiler, interpreter and libraries can be written in. And of course it’s also possible for a language to be written in itself, through a process called bootstrapping.
8
From the perspective of using a programming language, a programming language is just a program. It might be a compiler, or it might be an interpreter, or it might be some sort of virtual machine. All of those things are just computer programs, and thus can be written in any language.
So, if you wanted to create your own version of PHP, you might start out with whatever language you are most fluent in. You would then write a program that can read PHP-formatted code and do whatever the PHP spec says your program should do. You are thus creating the PHP language in language X.
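As a minimal sketch of that idea, here is an interpreter for a tiny made-up language, written in Python. The language (its `set`, `add` and `print` commands) is invented for this example and has nothing to do with PHP; the point is only that "the language" is an ordinary program that reads source text and does what the rules say.

```python
def interpret(source):
    """Run a program in an invented mini-language; return the lines it printed."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, args = words[0], words[1:]
        if command == "set" and len(args) == 2:      # set NAME NUMBER
            variables[args[0]] = int(args[1])
        elif command == "add" and len(args) == 2:    # add NAME NUMBER
            variables[args[0]] = variables.get(args[0], 0) + int(args[1])
        elif command == "print" and len(args) == 1:  # print NAME
            output.append(str(variables[args[0]]))
        else:
            raise SyntaxError(f"line {line_number}: cannot parse {line!r}")
    return output


program = """
set total 40
add total 2
print total
"""
print("\n".join(interpret(program)))  # 42
```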
7
A very similar phrasing with completely different meaning is “writing language A in language B”, e.g. “writing C in Java”.
This describes code that is syntactically correct in one language, but uses structures, idioms and conventions from another language. In the “writing C in Java” example, signs of this would be declaring all local variables at the top of each method, using integer constants instead of enums, using identifiers_with_underscores, etc.
Typically this happens when someone has worked with one language for a long time (especially when they have worked only with that language) and is very new to the current language (or not interested in writing clean code).
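The same effect shows up in any pair of languages. As an illustration only (using Python instead of Java, with function names made up for the example), here is "writing C in Python" next to the idiomatic version:

```python
# "Writing C in Python": valid Python, but carrying C habits such as
# variables declared up front and a manually managed index.
def sum_of_squares_c_style(values):
    i = 0
    total = 0
    n = len(values)
    while i < n:
        total = total + values[i] * values[i]
        i = i + 1
    return total


# Idiomatic Python for the same computation.
def sum_of_squares(values):
    return sum(v * v for v in values)


data = [1, 2, 3]
print(sum_of_squares_c_style(data), sum_of_squares(data))  # 14 14
```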
2
Technology is an inherently iterative process. We start with simple tools and then use those tools to make better ones. The first assembly languages were pretty much 1:1 translations of the machine instructions of the chip. Languages that could be translated down into assembly, like FORTRAN, COBOL, Pascal and C, were then developed on top of them, for the 8086 and for other architectures such as the Z80. The first program that translates the source code of such a language has to be written in something that already exists, usually something more primitive; otherwise you end up in a chicken-and-egg argument: if the source code for the first C compiler was written in C, then what compiled that C source code, and wouldn’t that, by definition, be the first C compiler?
Basically, “C# is written in C++” should be taken to mean that the first and/or most popular compiler and runtime/core libraries that obey the specification of the C# language (those being Microsoft’s .NET Framework, and the command-line compiler program CSC.exe) are written in C++.
“Language A is written in language B” means that the only implementation of language A (or the only one that is widely used) is the one that is actually a project developed in language B, and the only complete, up-to-date specification of A is the B source code which implements it such that if the documentation and the B program disagree, the B program is usually deemed correct.
2