After more than two decades of C++ programming, I have finally reached a point where I do not understand which types of pointer reinterpret-casts yield well-defined behaviour and which ones result in undefined behaviour due to strict aliasing rules…
I will be referring to the pointer-interconvertibility clause of the C++ standard, in my case the C++17 standard, because this is the latest version I have access to, but feel free to refer to any newer version. In section 6.9.2 Compound Types of the C++17 standard, paragraph 4 states:
Two objects a and b are pointer-interconvertible if:
- (4.1) – they are the same object, or
- (4.2) – one is a standard-layout union object and the other is a non-static data member of that object (12.3), or
- (4.3) – one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object (12.2), or
- (4.4) – there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast (8.2.10). [Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
Now let’s consider the following piece of example code:
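```cpp
#include <iostream>

struct Foo
{
    int x;
    int y;
};

struct Bar
{
    int a[2];
};

struct Qux
{
    union
    {
        int a[2];
        int dummy;
    };
};

int main()
{
    // A heap buffer holding the four ints 2, 3, 5, 7.
    int* primes = new int[4];
    primes[0] = 2;
    primes[1] = 3;
    primes[2] = 5;
    primes[3] = 7;

    // Cast 1: treat the buffer as Foo[2] and read the last int.
    Foo* foo = reinterpret_cast<Foo*>(primes);
    std::cout << "Foo[1].y = " << foo[1].y << std::endl;

    // Cast 2: treat the buffer as Bar[2] and read the last int.
    Bar* bar = reinterpret_cast<Bar*>(primes);
    std::cout << "Bar[1].a[1] = " << bar[1].a[1] << std::endl;

    // Cast 3: treat the buffer as Qux[2] and read the last int.
    Qux* qux = reinterpret_cast<Qux*>(primes);
    std::cout << "qux[1].a[1] = " << qux[1].a[1] << std::endl;

    delete[] primes;
}
```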
I am now trying to figure out which (if any) of the three `reinterpret_cast` calls and the subsequent console outputs have well-defined behaviour.
According to my understanding of the quoted section of the standard, the cast `Foo* foo = reinterpret_cast<Foo*>(primes)` should be well-defined, because the first non-static data member of `Foo` is an object of type `int`, and therefore by (4.3) `Foo` and `int` should be pointer-interconvertible.
On the other hand, the cast `Bar* bar = reinterpret_cast<Bar*>(primes)` should result in undefined behaviour, because the first non-static data member of `Bar` is an array of `int`, which according to the note is not pointer-interconvertible with `int`.
Now let’s take a look at the third cast, `Qux* qux = reinterpret_cast<Qux*>(primes)`: the first non-static data member of `Qux` is an anonymous union, so by (4.3) `Qux` and that anonymous union are pointer-interconvertible. Furthermore, the non-static data member `dummy` of the anonymous union is an `int` object, so by (4.2) the anonymous union is pointer-interconvertible with `int`. Finally, by applying (4.4), `Qux` is pointer-interconvertible with `int`, and therefore the cast should be well-defined.
Now, since both the `foo` and the `qux` casts should be well-defined, accessing the fourth element of `primes`, which stores the value 7, should be safe via both `foo[1].y` and `qux[1].a[1]`, right?
Am I right that the above code does not break strict aliasing rules, since in all of the above examples the objects accessed via the class members are of type `int`, and therefore no type punning is happening?