Why does the source file include the header and not the other way around? I googled it but only found questions about how to use and include header files; nothing explained why it works this way.
If the header is merely the declaration, how does the compiler know the definition only from it?
For example, take foo.cpp, bar.h, and bar.cpp. This is what everybody does in foo.cpp:
#include "bar.h"
but bar.cpp is included in neither bar.h nor foo.cpp. That is why it seems logical to me that bar.cpp should be included in bar.h and so, indirectly, in foo.cpp.
2
Compilation of C and C++ code is done in two distinct steps.
In the first step, the source code in a single .c or .cpp file is compiled to object code. While compiling foo.cpp, the compiler needs to know that bar contains a method doSomething, what parameters that method expects, and what type it returns, but it doesn’t need to know what the function does internally.
The compiler checks that the call in foo.cpp matches how the function is declared (the declaration comes from the #include "bar.h"), and then makes an annotation in the object code that at position X the function bar::doSomething is being called.
In the second step, all the object files that make up the application are linked together to create the final executable. At this point, the linker tries to replace the annotations that the compiler made in the object code with the actual address of the corresponding function. The linker is explicitly told which object files it should look at.
It is only at this point that the definitions of all functions need to be available.
Which files are needed to create an application is typically defined in a project file or a makefile.
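To make the two steps concrete, here is a minimal sketch collapsed into one listing so it compiles on its own. The class bar and the method doSomething come from the question; the int parameter, the int return type, the function body, and the helper callFromFoo are assumptions made up for illustration. In a real project the three marked parts would live in bar.h, foo.cpp, and bar.cpp.

```cpp
// --- what bar.h would contain: the declaration only ---
class bar {
public:
    int doSomething(int value);  // hypothetical signature; no body here
};

// --- what foo.cpp would contain: it uses bar knowing only the declaration ---
int callFromFoo() {  // hypothetical helper name
    bar b;
    // The compiler checks the argument and return types against the
    // declaration above and records a reference to bar::doSomething.
    return b.doSomething(20);
}

// --- what bar.cpp would contain: the definition the linker looks for ---
int bar::doSomething(int value) {
    return value + 1;  // arbitrary body, just so there is something to link
}
```

Note that callFromFoo compiles before the body of doSomething has been seen. When the parts sit in separate files, that missing body is exactly what the linker supplies from bar.o.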
13
Your question indicates that you don’t understand the compilation process (most of this applies to C as well as C++).
The fundamental unit of compilation is a source file; for C++ the convention is to use the .cpp file extension. The extension could be anything, but most compilers use it to determine how to process the file, e.g. .cpp means C++, .o means object, .c means plain C, etc. Source files are compiled into object (nominally .obj or .o) files. Object files are linked together to create executable files.
The compiler doesn’t know anything about header files. A pre-processor replaces each #include directive with the content of the named header file and processes the other pre-processor directives; the resulting output is what the compiler sees as input. This is the reason why headers are included in source files and not vice versa.
Header files aren’t strictly necessary to compile source to object; they are a tool to manage complexity, organize and modularize your source files. As the other answers have explained, when foo.cpp needs to reference something in bar.cpp, say baz(), foo.cpp needs to have some descriptive information about baz() so it can generate a reference that the linker can resolve.
One could simply put this information in foo.cpp, but this would duplicate the information in bar.cpp. The solution to this duplication is to put all the descriptive information in another file, which we call a header file (nominally .h or .hpp.) With bar.h the information that needs to be included in foo.cpp and bar.cpp is only in one file, which is inserted into the file when the pre-processor finds the #include directive. This becomes invaluable as the complexity of the project increases.
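As a rough sketch of that arrangement, the listing below shows what the three files might hold, collapsed into one block so it stands alone. Only baz() comes from the answer; the int return type, the include-guard name BAR_H, the function body, and the helper useBaz are assumptions for illustration.

```cpp
// ---- bar.h (hypothetical contents) ----
#ifndef BAR_H   // include guard: the text is only pasted once per source file
#define BAR_H
int baz();      // the single shared description of baz(): a declaration
#endif

// ---- bar.cpp: would begin with #include "bar.h" ----
// Including its own header lets the compiler catch a definition whose
// return type disagrees with the published declaration.
int baz() {
    return 42;  // arbitrary value for the sketch
}

// ---- foo.cpp: would also begin with #include "bar.h" ----
int useBaz() {  // hypothetical caller
    return baz() * 2;
}
```

If the signature of baz() ever changes, only bar.h and bar.cpp need editing; every file that includes bar.h picks up the new declaration the next time it is compiled.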
Much of the compilation process is hidden behind an IDE or a makefile. Compiling a small project by hand is very instructive for learning how the process actually works. The first time I had to write a makefile from scratch in school, all the mystery about the compilation process vanished; I had to learn all the details to accomplish the task.
Compiling and linking your example would be 3 operations, in two stages:
- Compile the source files:
  - Compile foo.cpp into foo.o (e.g. g++ -c -o foo.o foo.cpp)
  - Compile bar.cpp into bar.o (e.g. g++ -c -o bar.o bar.cpp)
- Link the object files:
  - Link foo.o and bar.o into program (e.g. g++ -o program foo.o bar.o)
During the compilation stage each file is processed individually. The resulting object files contain machine code and external references (things not in the source file, but located in some other source file).
The final step is to link the objects; this is when the references are resolved. If you forget to include an object file in the link command, it will fail with unresolved external references. This is the step where all the information is needed. Modern compilers can handle all those steps with a single command, but that hides the process.
10
If bar.cpp included bar.h and vice versa, then there would be no point in having the two as separate files, since they would be effectively identical.
But there is a point: you don’t need the definition for most tasks. Compiling a module that uses the services of bar only needs to know that bar contains a method doThings() which takes an int and returns an int. It does not need to know how bar achieves this (indeed, that is the entire point of modules). Therefore, when compiling another module, the compiler only bothers to read the header file, but when compiling bar itself it reads both files. In a large system composed of many modules, this saves a huge amount of effort because every line of definition code is compiled only once and not many times.
4
If you’re a compiler, then life is easy so long as you know what everything “looks like”. For example, ‘X’ is a 64-bit integer, ‘Y’ is a pointer to something of that type, ‘Z’ is an array of this type and size, and so on.
Take this to the extreme and you get an early Pascal compiler, where the main method has to be the very last thing in the source file! Everything else has to be defined first. Writing code in this way is painful, because the order is natural to the compiler but not to the likes of you and me.
Enter the Header (.h) file.
It describes all the methods that appear in your .cpp file so the compiler can “make a note” of what they look like (known as “forward declarations”) and then recognise them when they appear again further down in the code. Otherwise, it has to assume some “default” declaration for any method and then complain like mad when it finds a different implementation signature (some Solaris compilers assume “int methodName( int )” as a default, IIRC).
But, even with all this, you still don’t need a Header file.
You can do all that within the .cpp file itself.
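For instance, here is a small sketch of a forward declaration living inside a single .cpp file with no header at all. The functions isEven and isOdd are hypothetical; they call each other, so whichever comes first in the file needs the other one declared in advance.

```cpp
// Forward declaration: tells the compiler what isOdd "looks like"
// before its body appears further down in the same file.
bool isOdd(unsigned n);

bool isEven(unsigned n) {
    // Without the declaration above, this call would not compile,
    // because isOdd has not been seen yet at this point.
    return n == 0 ? true : isOdd(n - 1);
}

bool isOdd(unsigned n) {
    return n == 0 ? false : isEven(n - 1);
}
```

Moving the line `bool isOdd(unsigned n);` into a header only becomes worthwhile once some other .cpp file needs it too.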
Now add “reusable code” into the mix. Other .cpp files want to use the methods in your .cpp file but the compiler still needs to know what those methods “look like”.
By splitting the method declarations out into a Header file, you’re effectively “publishing” the API provided by your .cpp file. (Ada has “package” and “package body”; most of the Microsoft world just smooshes all the declaration and implementation details into a single file, the dll.)
So the Header file gives you a single declaration of your methods, usable by both you and any number of others who might need to use those methods. Note, however, that the people using your Header file don’t need your .cpp file at all; [object] code compiled in this way always has “holes” in it (unresolved function pointers); it’s the linker’s job to join all the dots together.
1