I’m new to C++, coming from Java.
In Java, all variables (except for primitives) are essentially pointers. They hold the address of whatever they’re ‘holding’.
So any Java data structure stores its data by reference. You can also store by value, i.e. save and return a copy of any item you store, but that takes extra work and isn’t native to the language.
For example, the collections ArrayList, HashSet, and a simple array all store the addresses of the items they ‘store’, and not the actual items.
However in C++, you have a choice: when implementing a container class, you can either store and return to the user items by value or by reference.
For example, here’s a simple Stack class I wrote (omitted irrelevant stuff):
    template <typename T> class Stack {
    public:
        Stack(...) : ... { }
        void push(const T& item) {
            if(size == capacity - 1)
                enlargeArray();
            data[indexToInsert++] = &item;
            size++;
        }
        const T& pop() {
            const T& item = *data[indexToInsert - 1];
            data[indexToInsert - 1] = 0;
            indexToInsert--;
            size--;
            return item;
        }
        int getSize() const {
            return size;
        }
    private:
        const T** data;
        int indexToInsert;
        int size;
        int capacity;
        void enlargeArray() {
            // omitted
        }
    };
This data structure takes and returns data by reference. push takes a const reference, and pop returns a const reference. The backing array is an array of pointers, not objects.
However, push could also look like so:
    void push(T item) {
        if(size == capacity - 1)
            enlargeArray();
        data[indexToInsert++] = item;
        size++;
    }
And pop could return a T, not a const T&, etc.
My question is: what is the preferred approach in C++? Is there a preferred approach? Which approach should I normally take when implementing ‘container’ classes?
3
Firstly, you probably shouldn’t implement a container class. 95% of the time you should use one included in the standard library. If you just want to learn, or are in the 5%, carry on.
If you are defining a template, leave the decision up to your users. Your users can use:
- Stack<Foo> if they want by value.
- Stack<Foo*> if they want by pointer.
- Stack<std::unique_ptr<Foo>> if they want pointers that clean up after themselves.
When choosing which to use, you should default to by value, unless you’ve got a good reason to do something different. Inside your stack class, just store everything by value. If the user of the template needs indirection, they can use a pointer type for T.
Looking at your code:
    void push(const T& item) {
        if(size == capacity - 1)
            enlargeArray();
        data[indexToInsert++] = &item;
        size++;
    }
You can’t do that. &item records the address of whatever was passed in, but you have no idea how long that address will stay valid. It could become invalid right after push finishes, for example when the caller passed a temporary or a local variable that then goes out of scope. In that case, you’ve stored a pointer to an invalid place. In general, you can’t assume that a pointer remains valid. You should instead be copying the item.
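A minimal sketch of the by-value alternative, assuming C++14 and using std::vector as the backing store in place of the hand-managed array from the question. The names ValueStack, pushTemporaryThenPop and pushOwnedIntThenPop are hypothetical and only serve the illustration. push takes its argument by value and moves it into storage, so the stack never keeps the address of something the caller controls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical by-value stack; std::vector stands in for the manual array.
template <typename T>
class ValueStack {
public:
    // Take by value, then move into storage: lvalues are copied once,
    // rvalues are moved.
    void push(T item) { data.push_back(std::move(item)); }

    // Return by value; the element is moved out before it is removed.
    // Precondition: the stack is not empty.
    T pop() {
        assert(!data.empty());
        T item = std::move(data.back());
        data.pop_back();
        return item;
    }

    int getSize() const { return static_cast<int>(data.size()); }

private:
    std::vector<T> data;
};

// The stack owns its own string, so the pushed temporary going away is harmless.
inline std::string pushTemporaryThenPop() {
    ValueStack<std::string> s;
    s.push(std::string("temporary"));
    return s.pop();
}

// The same template also accepts a move-only owning pointer as T.
inline int pushOwnedIntThenPop() {
    ValueStack<std::unique_ptr<int>> s;
    s.push(std::make_unique<int>(42));
    std::unique_ptr<int> p = s.pop();
    return *p;
}
```

Because the container only ever deals in T, the choice between values and pointers stays with whoever instantiates the template.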
5
My question is: what is the preferred approach in C++? Is there a preferred approach? Which approach should I normally take when implementing ‘container’ classes?
In C++ you can keep objects by:
- value
- reference
- pointer
- smart pointer (std::unique_ptr, std::shared_ptr, YourPointerClass).
(you didn’t mention the last two).
Each of these is valid for different situations and imposes different constraints concerning object ownership, lifetime management and polymorphic behavior (that is, there isn’t a preferred approach – there are many of them 🙂):
- use storing by value when:
  - you are not storing objects with polymorphic behavior (i.e. objects are concrete classes with no base class and no virtual functions, or all objects are of the same runtime type – you are not storing specializations of a base class)
  - your container owns the objects (i.e. the contained objects should have a lifetime equal to that of the container, and will be destroyed when the container is destroyed)
- use storing by reference when:
  - the stored objects are not owned by the container
  - your contained objects have a larger scope and lifetime than the container.
  - you are interested in polymorphic behavior (in this case you should store references to a base class).
    Normally you should not do this, as there is a chance of slicing on assignment (unless your class hierarchy supports polymorphic assignment – which is another discussion altogether)
- use storing by raw pointer when:
  - your stored objects expose polymorphic behavior (base class and/or virtual functions)
  - the container doesn’t own the stored objects
- use storing by smart pointer when:
  - your stored objects expose polymorphic behavior
  - your objects are owned by the container (std::unique_ptr), ownership is shared (std::shared_ptr) or usage of a smart pointer is imposed by client code constraints.
  - the stored objects are expensive to copy/instantiate (edit cf. @NirFriedman)
To support all this, you will probably want to template your class by the stored type and fill in as needed, in client code.
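As a rough illustration of choosing the storage policy through the template argument, the sketch below uses std::vector as the container and a hypothetical Counter type; the function names are made up for the example. Note that a standard container cannot hold T& directly, so std::reference_wrapper is used here for the by-reference case.

```cpp
#include <functional>
#include <memory>
#include <vector>

struct Counter {  // hypothetical element type
    int value;
};

// By value: the container holds an independent copy of c.
inline int byValueAfterOriginalChanges() {
    Counter c{1};
    std::vector<Counter> v;
    v.push_back(c);
    c.value = 2;        // does not affect the stored copy
    return v[0].value;
}

// By raw pointer: non-owning, so c must outlive v.
inline int byPointerAfterOriginalChanges() {
    Counter c{1};
    std::vector<Counter*> v;
    v.push_back(&c);
    c.value = 2;        // visible through the stored pointer
    return v[0]->value;
}

// By reference: std::vector<Counter&> is not allowed, so wrap the reference.
inline int byReferenceAfterOriginalChanges() {
    Counter c{1};
    std::vector<std::reference_wrapper<Counter>> v;
    v.push_back(std::ref(c));
    c.value = 2;        // visible through the stored reference
    return v[0].get().value;
}

// By shared_ptr: the container and the caller share ownership.
inline long ownersWhileStored() {
    auto p = std::make_shared<Counter>(Counter{1});
    std::vector<std::shared_ptr<Counter>> v;
    v.push_back(p);
    return p.use_count();  // one owner is p, the other is the element in v
}
```

In the pointer and reference variants the container is only safe to use while the original object is alive; in the value and shared_ptr variants the container keeps the data alive itself.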
2
It depends on whether the container “owns” the objects and whether their type is a base class.
- If the container doesn’t own the object, then you should use pointers and be careful of dangling pointers (make sure no container still holds a pointer to the object when the object is destroyed).
- If the container owns the object and the object’s type is not a base class, then you should store by value.
- Otherwise (the container owns the object and there are subclasses), you should store using a std::unique_ptr to ensure that no slicing happens on store and that destruction is handled properly.
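To make the slicing point concrete, here is a small sketch assuming C++14 and a hypothetical Animal/Dog hierarchy (none of these names come from the question). Storing a Dog in a container of Animal copies only the base part, while a container of std::unique_ptr<Animal> keeps the full derived object and destroys it through the virtual destructor.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Animal {  // hypothetical base class
    virtual ~Animal() = default;  // virtual so deletion through Animal* is correct
    virtual std::string name() const { return "animal"; }
};

struct Dog : Animal {  // hypothetical derived class
    std::string name() const override { return "dog"; }
};

// By value: the Dog is sliced down to an Animal when stored.
inline std::string nameWhenStoredByValue() {
    std::vector<Animal> v;
    v.push_back(Dog{});  // only the Animal subobject is copied
    return v[0].name();
}

// By unique_ptr: the container owns the whole Dog, so the override is kept.
inline std::string nameWhenStoredByUniquePtr() {
    std::vector<std::unique_ptr<Animal>> v;
    v.push_back(std::make_unique<Dog>());
    return v[0]->name();
}
```

The first function reports the base-class name because the stored element really is an Animal; the second reports the derived name because virtual dispatch still reaches Dog.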