I run into this all the time in the functions I write: which should I use?
void sth(int* a)
void sth(int& a)
Which one is faster in two separate cases: when a is a small variable, and when a is a large data structure?
I would like an in-depth answer that relates to the actual hardware and to how the stack is used.
Most compilers will implement references as pointers. So the deep answer to your question is that there will be absolutely no difference in terms of performance between the two. (Doesn’t change aliasing analysis either as far as I know.)
If you want to be 100% sure of that statement, inspect your compiler’s output.
struct Small {
    int s;
};
void foo(Small* s)
{
    s->s = 1;
}
void bar(Small& s)
{
    s.s = 1;
}
Compiled with clang++ -O2, saving the assembly:
_Z3fooP5Small: # @_Z3fooP5Small
.cfi_startproc
# BB#0:
movl $1, (%rdi)
ret
_Z3barR5Small: # @_Z3barR5Small
.cfi_startproc
# BB#0:
movl $1, (%rdi)
ret
You can try that with a large struct or an enormously complex struct – doesn’t matter, all you’re passing in to the function is a pointer.
That being said, there are semantic differences between the two. The most important one being that, as long as your program is free of undefined behavior, the overload that takes a reference is guaranteed to get a reference to a valid, live object. The pointer overload isn’t.
Also, assigning to s in these two examples has completely different meanings. In the first function it would replace the pointer (i.e. whatever it pointed to remains unchanged but becomes unreachable from within that function; the caller is unaffected by the assignment).
In the second, it would call the appropriate assignment operator on the object passed in (effect visible from the caller).
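The difference in what assignment means can be sketched as follows. The function names assign_via_pointer and assign_via_reference are made up for this illustration, and the second parameter is an assumption added so that there is something to assign from.

```cpp
#include <cassert>

struct Small {
    int s;
};

// Assigning to the pointer parameter only re-seats the function's local copy
// of the pointer; the caller's object is left untouched.
void assign_via_pointer(Small* s, Small* other)
{
    s = other;
    (void)s; // nothing else happens; this only silences unused-value warnings
}

// Assigning to the reference parameter copy-assigns into the caller's object.
void assign_via_reference(Small& s, const Small& other)
{
    s = other;
}

// Small usage check of both behaviours.
void demo_assignment()
{
    Small a{1};
    Small b{2};
    assign_via_pointer(&a, &b);
    assert(a.s == 1); // caller unaffected
    assign_via_reference(a, b);
    assert(a.s == 2); // effect visible from the caller
}
```

Calling demo_assignment() from main should pass both assertions.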
So your choice shouldn’t be made on a potential performance difference (there will generally be none), but on semantics. What you need the function to be able to do, and how you should be able to call it, will dictate what overload(s) you need to provide.
The main semantic difference between int* and int& is that the former allows passing NULL or uninitialized values, and the latter does not. So the implementation of a function using pointers should look like this:
void sth(int* a)
{
    if (a == NULL)
    {
        // handle NULL case
    }
    else
    {
        // do something with *a
    }
}
When using references, you can omit that special NULL handling within the function.
So if the function you are going to write does not explicitly need to accept NULL as input, use int&. See also this Wikipedia entry.
Note that you should not make your decision based on which of the two alternatives is faster. Your first priority should be correct code, not micro-optimizations, which I would expect to be negligible in this case.
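As a minimal sketch of the two styles side by side (increment_ptr and increment_ref are hypothetical names, and returning bool to report the NULL case is just one possible way to handle it):

```cpp
#include <cassert>
#include <cstddef>

// Pointer version: the caller may pass NULL, so the function has to check.
// Returns false when there was nothing to increment.
bool increment_ptr(int* a)
{
    if (a == NULL) {
        return false; // handle the NULL case
    }
    ++*a;
    return true;
}

// Reference version: a well-formed caller always supplies an object,
// so no NULL branch is needed.
void increment_ref(int& a)
{
    ++a;
}
```

With int x = 0, calling increment_ptr(&x) and then increment_ref(x) leaves x at 2, while increment_ptr(NULL) reports failure instead of crashing.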
The most important difference between references and pointers is that you cannot free an object through a reference (short of taking its address), while it is possible to do so through a pointer.
Thus, selecting the reference type instead of the pointer type for an argument a in a method of an object b advertises that ownership of a is not transferred to b.
(The common belief that it is not possible to pass a dereferenced NULL as a reference to a method without cheating is wrong. Most methods that create an object, e.g. clone or factories, will return a pointer, NULL or not. If the method you want to call with the freshly created object uses references, you have to dereference the pointer, and nothing checks that it is not NULL, even though dereferencing a NULL pointer is undefined behavior.)
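The ownership convention described above might look like this in practice. Widget, inspect and consume are hypothetical names, and treating a raw pointer parameter as a transfer of ownership is a convention assumed for this sketch, not something the language enforces.

```cpp
#include <cassert>

struct Widget {
    static int live; // number of Widget objects currently alive
    int id;
    explicit Widget(int i) : id(i) { ++live; }
    ~Widget() { --live; }
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
};
int Widget::live = 0;

// Takes a reference: advertises that the caller keeps ownership.
int inspect(const Widget& w)
{
    return w.id;
}

// Takes a pointer: in this sketch the callee becomes the owner and frees it.
void consume(Widget* w)
{
    delete w;
}
```

A caller can pass the same heap object to inspect(*w) any number of times, but must not touch w again after consume(w).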
References and pointers are two implementations of the same concept: indirection (that is, “talking about something through a pronoun”).
At the machine level they are the same thing (the index of a memory cell), so there is no performance distinction.
At language level the main difference is mostly in being “explicit” and “mutable”:
- pointer dereferencing is explicit: given pa = &a; a.x is the same as pa->x
- reference dereferencing is implicit: given ra = a; a.x is the same as ra.x
The identical syntax inside expressions makes references more suitable in generic functions, since the way they are written won’t change whether the access to the variable is direct or indirect.
- pointers are mutable: pa = &a1; ...; pa = &a2; or ++pa or pa[x] are all possible
- references are immutable: ra = a1; ...; ra = a2; in fact assigns the value of a2 to a1 (thus playing a different game)
The mutable nature of pointers makes them more suitable for implementing generic iterators.
It is like comparing fixed and adjustable wrenches. Their different shapes make one or the other more usable in a given context, but from the screw’s standpoint they are both just wrenches.
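The “explicit” and “mutable” points above can be checked with a short sketch. The Point type and the function name are assumptions made for the illustration; pa, ra, a1 and a2 follow the names used in the list.

```cpp
#include <cassert>

struct Point {
    int x;
};

// Returns true if every property listed above holds for this small scenario.
bool check_pointer_vs_reference()
{
    Point a1{1};
    Point a2{2};

    Point* pa = &a1;                 // explicit: take the address...
    bool ok = (pa->x == a1.x);       // ...and dereference with ->
    pa = &a2;                        // a pointer can be re-seated
    ok = ok && (pa->x == 2) && (a1.x == 1);

    Point& ra = a1;                  // implicit: no & at the binding site
    ok = ok && (ra.x == a1.x);       // same syntax as the object itself
    ra = a2;                         // does NOT re-seat: copies a2 into a1
    ok = ok && (a1.x == 2) && (&ra == &a1);
    return ok;
}
```

The last check is the “different game”: after ra = a2 the reference still names a1, whose value has been overwritten.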
This may be implementation dependent, but I believe references do not physically exist in memory; in this sense they are somewhat like an alias.
For example
int a;
int* ptr1= &a;
int* ptr2= &a;
Creates two pointers in memory.
While
int a;
int* ptr1= &a;
int& ref= a;
Only creates one pointer in memory whereas the reference does not occupy any physical space in memory.
Since pointers take up very little space, however, the performance differences are going to be negligible in most cases.
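The alias behaviour can be observed from within the language, as in the sketch below (reference_is_alias is a hypothetical name). Whether the compiler actually reserves storage for the reference is not visible from C++ code, so this only shows that the reference behaves as another name for a.

```cpp
#include <cassert>

// Returns true if the reference behaves as just another name for a.
bool reference_is_alias()
{
    int a = 5;
    int* ptr1 = &a;
    int& ref = a;

    ref = 7; // writes to a

    return a == 7
        && &ref == &a                   // taking the address yields a's address
        && ptr1 == &ref                 // the pointer and the reference agree
        && sizeof(ref) == sizeof(int);  // sizeof reports the referred-to type
}
```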
The main benefits of references are:
- They are guaranteed to be non-null
- What they “point” to cannot be changed
Also, from a programming perspective:
- They use the same syntax as regular variables, so you don’t need to dereference with * and you use . instead of ->
I think this is why most people recommend you use references over pointers whenever possible: they are “safer” and have slightly less overhead (although the last point may be implementation dependent).
References are converted to pointers by the compiler, so at run time there is no difference in speed. Speed is not the reason to choose references over pointers, although references might make your overall code a bit faster simply because you don’t need to constantly check for invalid references (NULL pointers).
If a is a small variable (no larger than the machine address size), then there is no difference between passing a by value and passing it by reference (&a). If a is larger than the machine’s address size (e.g. a class object), then passing by reference is faster and should be used rather than passing by value.
Of course you need to consider whether you want to be able to edit a in the called function. If a is large and you want to forbid changing of a, pass a constant reference to a.
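A minimal sketch of the large-object case, assuming a hypothetical Big struct of 1024 ints. Conceptually, the by-value call copies the whole struct while the const-reference call passes only an address; an optimizer may blur that distinction when it inlines the call.

```cpp
#include <cassert>
#include <cstddef>

struct Big {
    int values[1024];
};

static_assert(sizeof(Big) > sizeof(void*), "Big is larger than an address");

// By value: the callee works on its own copy of all 1024 ints.
long sum_by_value(Big b)
{
    long total = 0;
    for (std::size_t i = 0; i < 1024; ++i) {
        total += b.values[i];
    }
    return total;
}

// By const reference: only an address is passed, and the callee
// cannot modify the caller's object.
long sum_by_const_ref(const Big& b)
{
    long total = 0;
    for (std::size_t i = 0; i < 1024; ++i) {
        total += b.values[i];
    }
    return total;
}
```

Both functions are called the same way, e.g. sum_by_value(b) and sum_by_const_ref(b), and return the same result.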
If using C++, you should always use references when possible. A good C++ book will list what the differences between references and pointers are. You should consider the following:
References are implemented underneath as pointers. So why use a reference? Because it allows the function writer to determine how a function works without affecting how it is called. With references, the caller of a function doesn’t need to know if the function takes a pointer or the object itself. For example, you call the following two functions the same way; notice that we can change add1 to use references without affecting any of its callers:
int add1 (int a, int b);
int add2 (int &a, int &b) {
    // this actually gets converted by the compiler to
    // *a + *b
    return a + b;
}
If you were using pointers, you’d have to know that the function is actually taking the address, not the object itself. In other words, references add to information hiding.
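To make the call-site difference concrete, here is a small sketch with three hypothetical variants of the same function. The const qualifiers are an assumption added so that the reference version also accepts literals and temporaries.

```cpp
#include <cassert>

// Takes copies of the arguments.
int add_by_value(int a, int b)
{
    return a + b;
}

// Takes references: called exactly like the by-value version.
int add_by_ref(const int& a, const int& b)
{
    return a + b;
}

// Takes pointers: the caller has to pass addresses explicitly.
int add_by_ptr(const int* a, const int* b)
{
    return *a + *b;
}
```

With int x = 2, y = 3, the calls are add_by_value(x, y), add_by_ref(x, y) and add_by_ptr(&x, &y); only the pointer version forces a change at the call site.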
So to sum up, references are just syntactic sugar. The compiler will convert all references to pointers, and operations on references to valid operations on pointers.
Second, unlike a pointer, a reference is the object. With pointers, you can change the object pointed to or you can change the pointer itself (in which case it will point to something else). With a reference there’s only one thing you can change: the referred object.
Third, references cannot be re-assigned to refer to another object like a pointer can. Often, in linked lists you have to move pointers forward/backwards. You can’t do this with a reference. Think of a reference as an alias to an object.
Lastly, about NULL: in C code you will see a lot of NULL pointer checking. References are meant to help with that by allowing the compiler to catch cases where a reference doesn’t refer to a valid object. The C++ standard also says that a reference cannot refer to an invalid object (e.g. a NULL). However, I think it’s up to the compiler how it enforces that, so you need to double check with your compiler. I think some compilers will not even warn you.
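Since a compiler may not diagnose a reference formed from a NULL pointer, one option is to do the check by hand at the point where a pointer is turned into a reference. The helper below, checked_deref, is a hypothetical name and only a sketch of that idea; throwing std::invalid_argument is one possible policy among several.

```cpp
#include <cassert>
#include <stdexcept>

// Converts a possibly-null pointer into a reference, throwing instead of
// dereferencing NULL (which would be undefined behavior).
int& checked_deref(int* p)
{
    if (p == nullptr) {
        throw std::invalid_argument("checked_deref: null pointer");
    }
    return *p;
}
```

A caller can then write checked_deref(ptr) wherever a function expects an int&, and gets an exception rather than a silently invalid reference when ptr is NULL.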