I am reading “C in a Nutshell” and there are a lot of sentences similar to this one:
A statement specifies one or more actions to be performed such as
assigning a value to a variable, passing control to a function, or
jumping to another statement.
My question is what is the thing that “performs” these actions?
I have read here and there that C was defined to run on an abstract machine,
so my guess is that the abstract machine is supposed to perform these actions, and the job of actual compilers like gcc is to ensure that
if you evaluate a program mentally based on the way the abstract machine works then you would get the same result as when you actually
run the object file generated by the compiler (of course, evaluating a program mentally is not possible in most cases, but I am speaking
theoretically here).
So is the abstract machine supposed to interpret C code (after preprocessing) directly? Is C supposed to be translated to some
intermediate code that the abstract machine interprets? What exactly is the relationship between the abstract machine and C?
What is the state of the abstract machine visible to programs? Only the main memory? If the abstract machine really interprets C code directly, how are
declarations evaluated, and how do they change the state of the abstract machine? This last series of questions only serves the purpose of giving
you an idea of what I mean by a precise relationship between C and its abstract machine.
The abstract machine does not exist – it is, after all, literally abstract (“existing in thought or as an idea but not having a physical or concrete existence”). The abstract machine is an imaginary machine that precisely follows the rules of the standard.
The C program is compiled by a compiler for a concrete machine, which might (and usually does) have semantics distinct from those of the abstract machine. The actual machine might have things like speculative execution, out-of-order execution and parallelism.
A compliant compiler must produce an executable that, when run, has the same observable behaviour as if the program were executed on the abstract machine following the rules of the standard.
The abstract machine is a formal C term for the model of program execution.
It is loosely related to the abstract model called a Turing machine and refers to the very core of the language. The abstract machine is defined by the whole of section 5.1.2.3 (Program execution) of C17, where the first line says:
The semantic descriptions in this International Standard describe the behavior of an
abstract machine in which issues of optimization are irrelevant.
In other words, the abstract machine is a model for the specified outcome of a program, regardless of optimizations. It specifies the term sequencing of expressions (order of execution), the rules for determining if an optimization is allowed or not and the observable behavior of a program.
Very simply put, the abstract machine is what specifies that statements are to be read as if executed one after another, in the order they appear in the source.
Take this example:
int a = 1;
int b = 1;
int c = a + b + 1;
printf("%d", c);
The abstract machine is what specifies that the initializations of a and b are performed first, then the line int c = a + b + 1; and finally the printf. The result must be 3. This means that the compiler is not allowed to re-order these lines if doing so would affect the outcome of the program. There are sequence points at the ; of each line, where all previous evaluations and side effects must be complete.
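The compiler is however free to evaluate the operand a first, or b first, as they are not sequenced in relation to each other. The order of evaluation is unspecified. (Strictly speaking, a + b + 1 parses as (a + b) + 1, so b + 1 is not a sub-expression, but the compiler may still compute it that way as long as the observable result is the same.) Similarly, it could initialize b before a since the order wouldn’t matter.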
The compiler is also free to replace the code with c = 1 + 1 + 1; or with c = 3; or just replace it all with printf("3");. None of these would affect the observable behavior of the program, so they would all be valid optimizations to make.