Why do we need to include both the .h and .cpp files when we can make it work solely by including the .cpp file?
For example: creating a file.h containing declarations, then creating a file.cpp containing definitions, and including both in main.cpp.
Alternatively: creating a file.cpp containing declarations/definitions (no prototypes) and including it in main.cpp.
Both work for me. I can’t see the difference. Maybe some insight into the compiling and linking process may help.
While you can include .cpp files as you mentioned, this is a bad idea.
As you mentioned, declarations belong in header files. These cause no problems when included in multiple compilation units because they do not include implementations. Including the definition of a function or class member multiple times will normally cause a problem (but not always), because the linker will find more than one definition and report an error.
What should happen is that each .cpp file includes definitions for a subset of the program, such as a class, a logically organized group of functions, global static variables (use sparingly if at all), etc.
Each compilation unit (.cpp file) then includes whatever declarations it needs to compile the definitions it contains. It keeps track of the functions and classes it references but does not contain, so the linker can resolve them later when it combines the object code into an executable or library.
Example
Foo.h -> contains declaration (interface) for class Foo.
Foo.cpp -> contains definition (implementation) for class Foo.
Main.cpp -> contains main method, program entry point. This code instantiates a Foo and uses it.
Both Foo.cpp and Main.cpp need to include Foo.h. Foo.cpp needs it because it is defining the code that backs the class interface, so it needs to know what that interface is. Main.cpp needs it because it is creating a Foo and invoking its behavior, so it has to know what that behavior is, the size of a Foo in memory, how to find its functions, etc., but it does not need the actual implementation just yet.
The compiler will generate Foo.o from Foo.cpp, which contains all of the Foo class code in compiled form. It also generates Main.o, which includes the main method and unresolved references to class Foo.
Now comes the linker, which combines the two object files Foo.o and Main.o into an executable file. It sees the unresolved Foo references in Main.o but sees that Foo.o contains the necessary symbols, so it “connects the dots”, so to speak. A function call in Main.o is now connected to the actual location of the compiled code, so at runtime the program can jump to the correct location.
If you had included the Foo.cpp file in Main.cpp, there would be two definitions of Foo’s member functions: one in Foo.o and one in Main.o. The linker would see this and say “I don’t know which one to pick, so this is an error.” The compiling step would succeed, but linking would not. (Unless you just do not compile Foo.cpp, but then why is it in a separate .cpp file?)
Finally, the idea of different file types is irrelevant to a C/C++ compiler. It compiles “text files” which hopefully contain valid code for the desired language. Sometimes it may be able to tell the language based on the file extension. For example, compile a .c file with no compiler options and it will assume C, while a .cc or .cpp extension would tell it to assume C++. However, I can easily tell a compiler to compile a .h or even .docx file as C++ (for example with GCC’s -x c++ option), and it will emit an object (.o) file if it contains valid C++ code in plain text format. These extensions are more for the benefit of the programmer. If I see Foo.h and Foo.cpp, I immediately assume that the first contains the declaration of the class and the second contains the definition.
Read more on the role of the C and C++ preprocessor, which is conceptually the first “phase” of the C or C++ compiler (historically it was a separate program, /lib/cpp; now, for performance reasons, it is integrated into the compiler proper, cc1 or cc1plus). Read in particular the documentation of the GNU cpp preprocessor. So in practice the compiler conceptually first preprocesses your compilation unit (or translation unit) and then works on the preprocessed form.
You’ll probably need to always include the header file file.h if it contains (as dictated by conventions and habits):
- macro definitions
- type definitions (e.g. typedef, struct, class, etc.)
- definitions of static inline functions
- declarations of external functions.
Notice that it is a matter of conventions (and convenience) to put these in a header file.
Of course, your implementation file.cpp needs all of the above, so it wants to #include "file.h" first.
This is a convention (but a very common one). You could avoid header files and copy and paste their content into implementation files (i.e. translation units). But you don’t want that (except perhaps if your C or C++ code is automatically generated; then you could make the generator program do that copy & paste, mimicking the role of the preprocessor).
The point is that the preprocessor performs only textual operations. You could (in principle) avoid it entirely by copy & paste, or replace it with another “preprocessor” or C code generator (like gpp or m4).
An additional issue is that the recent C (or C++) standards define several standard headers. Most implementations really implement these standard headers as (implementation-specific) files, but I believe that it would be possible for a conforming implementation to implement standard includes (like #include <stdio.h> for C, or #include <vector> for C++) with some magic tricks (e.g. using some database or some information inside the compiler).
If using GCC compilers (e.g. gcc or g++), you can use the -H flag to get informed about every inclusion, and the -C -E flags to obtain the preprocessed form. Of course there are many other compiler flags affecting preprocessing (e.g. -I /some/dir/ to add /some/dir/ for searching included files, and -D to predefine some preprocessor macro, etc.).
NB. C++20 introduced C++ modules, which offer an alternative to textual header inclusion.
Due to C++’s multiple-unit build model, you need a way to have code that appears in your program only once (definitions), and you need a way to have code that appears in each translation unit of your program (declarations).
From this is born the C++ header idiom. It’s convention for a reason.
You can dump your entire program into a single translation unit, but this introduces problems with code re-use, unit testing and inter-module dependency handling. It’s also just a big mess.
The selected answer from “Why do we need to write a header file?” is a reasonable explanation, but I wanted to add additional detail.
It seems that the rationale for header files tends to get lost in teaching and discussing C/C++.
Header files provide a solution to two application development problems:
- Separation of interface and implementation
- Improved compile/link times for large programs
C/C++ can scale from small programs to very large multi-million line, multi-thousand file programs. Application development can scale from teams of one developer to hundreds of developers.
You can wear several hats as a developer. In particular, you can be the user of an interface to functions and classes, or you can be the writer of an interface of functions and classes.
When you are using a function, you need to know the function interface: what parameters to use, what the function returns, and what the function does. This is easily documented in the header file without ever looking at the implementation. When have you read the implementation of printf? But we use it every day.
When you are the developer of an interface, the hat faces the other direction. The header file provides the declaration of the public interface: it defines what another implementation needs in order to use this interface. Information internal and private to this new interface is not (and should not be) declared in the header file. The public header file should be all anyone needs to use the module.
For large-scale development, compiling and linking can take a long time: anywhere from many minutes to many hours (even many days!). Dividing the software into interfaces (headers) and implementations (sources) makes it possible to recompile only the files that need it rather than rebuilding everything.
Additionally, header files allow a developer to provide an already-compiled library together with a header file. Users of the library may never see the implementation proper, but can still use the library through the header file. You do this every day with the C/C++ standard library.
Even if you are developing a small application, using large-scale software development techniques is a good habit. But we also need to remember why we use these habits.
You may as well be asking why not just put all your code into one file.
The simplest answer is code maintenance.
There are times when it is reasonable to create a class:
- all inlined within a header
- all within a compilation unit and no header at all.
The time when it is reasonable to totally inline in a header is when the class is really a data struct with a few basic getters and setters and perhaps a constructor that takes values to initialise its members.
(Templates, which generally need to be defined entirely in the header, are a slightly different issue.)
The other time to create a class entirely within a header is when you might use the class in multiple projects and specifically want to avoid having to link in libraries.
A time when you might include a whole class within a compilation unit and not expose its header at all is:
- An “impl” class that is used only by the class that it implements. It is an implementation detail of that class and is not used externally.
- An implementation of an abstract base class that is created by some kind of factory method returning a pointer, reference, or smart pointer to the base class. The factory method would be exposed; the class itself would not be. (In addition, if the class has an instance that registers itself in a table through a static instance, it doesn’t even need to be exposed via a factory.)
- A “functor” type class.
In other words, where you do not want anyone to include the header.
I know what you might be thinking… If it’s just for maintainability, by including the cpp file (or a totally inlined header) you are able to easily edit a file to “find” the code and just rebuild.
However, “maintainability” isn’t just about having tidy-looking code. It is a matter of the impact of change. It is generally known that if you keep a header unchanged and change only an implementation (.cpp) file, you will not need to rebuild the other source files, because there should be no side effects.
This makes it “safer” to make such changes without worrying about the knock-on effect and that is what really is meant by “maintainability”.
Snowman’s example needs a little extension to show you exactly why .h files are needed.
Add another class, Bar, into play, which also depends on class Foo.
Foo.h -> contains declaration for class Foo
Foo.cpp -> contains definition (implementation) of class Foo
Main.cpp -> uses variable of the type Foo.
Bar.cpp -> uses variable of type Foo too.
Now all of these cpp files need to include Foo.h.
It would be an error to include Foo.cpp in more than one other cpp file.
The linker would fail because Foo’s member functions would be defined more than once.
If you write all the code in the same file, it will make your code ugly.
Secondly, you will not be able to share your class with others. According to software engineering principles, you should write client code separately. The client should not need to know how your program works; they just need the output. If you write the whole program in the same file, you expose your program’s internals.