In the Wikipedia article for “Undefined behaviour”, there is a section showing that this is undefined behaviour in C/C++:
int arr[4] = { 0, 1, 2, 3 };
int* p = arr + 5; // undefined behavior for indexing out of bounds
p = NULL;
int a = *p; // undefined behavior for dereferencing a null pointer
When you do int* p = arr + 5; you’re incrementing the value of the pointer by 5 * sizeof(int), but there’s no dereferencing done, and no indexing of the array, so why is this undefined behaviour according to this?
It is undefined behavior, as that term is used in the C standard, because the C standard does not define pointer arithmetic for this case. Below, I quote the entire paragraph in the C standard about this, but here is a simpler explanation. For adding an integer N to a pointer P, the standard says that if P points to A[i], where A[i] is either an element of the array or is what would be one beyond the last element, then P+N is defined to point to A[i+N], as long as A[i+N] is also either an element of the array or is what would be one beyond the last element.
That means if A[i+N] would be anything else, the behavior is not defined, because there is no definition of what the result should be. But the standard goes further than just leaving a definition unstated; it explicitly states that if A[i+N] would be anything else, the behavior is not defined.
So, if you have int A[4];, then &A[2] + 2 is defined to point to just beyond the last element, but &A[2] + 3 is undefined.
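As a minimal sketch of those cases (the function names read_last and distance_to_end are made up for illustration), the defined additions can be written and used like this, with the undefined one left in a comment:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value read through &A[2] + 1, which points to A[3]. */
int read_last(void) {
    int A[4] = { 0, 1, 2, 3 };
    int *last = &A[2] + 1;      /* defined: points to A[3] */
    return *last;
}

/* Returns the distance from A to the one-past-the-end pointer. */
ptrdiff_t distance_to_end(void) {
    int A[4] = { 0, 1, 2, 3 };
    int *end = &A[2] + 2;       /* defined: one past the last element */
    assert(end == A + 4);       /* comparing is fine; *end would not be */
    /* int *bad = &A[2] + 3;       undefined: the addition itself, with no dereference */
    return end - A;
}
```

The one-past-the-end pointer can be compared and subtracted, which is all this sketch does with it.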
You might think of address arithmetic in hardware terms. If you have the address of A[2] in hardware and add the number of bytes of 3 elements, you get the address where A[5] would be if A were bigger. But a C implementation is not obliged to work with hardware addresses when doing its pointer calculations. Conforming to the C standard only requires it to support the calculations defined by the standard.
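If the offset is not known to be in range, one way to stay inside what the standard defines is to do the range check on integers first and only then form the pointer. The helper below is a sketch under that assumption; checked_offset and its parameters are hypothetical names, not anything from the standard library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to base[i + n] if that is an
   element of the array or one past its end, and NULL otherwise.
   The check is done on integers, so no out-of-range pointer is formed. */
int *checked_offset(int *base, size_t len, size_t i, ptrdiff_t n)
{
    assert(i <= len);                 /* the starting position must itself be valid */
    if (n >= 0) {
        if ((size_t)n > len - i)      /* would land beyond one past the end */
            return NULL;
        return base + (i + (size_t)n);
    } else {
        /* Magnitude of n, computed without overflowing on PTRDIFF_MIN. */
        size_t back = (size_t)(-(n + 1)) + 1;
        if (back > i)                 /* would land before the first element */
            return NULL;
        return base + (i - back);
    }
}
```

For int A[4], checked_offset(A, 4, 2, 2) gives the one-past-the-end pointer, while checked_offset(A, 4, 2, 3) returns NULL instead of evaluating &A[2] + 3.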
Text of the C standard
C 2018 6.5.6 8 says:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
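The last two sentences of that paragraph are what make the usual end-pointer loop valid: forming the one-past-the-end pointer is defined, and the loop never applies unary * to it. A small sketch, with sum_ints as an illustrative name:

```c
#include <assert.h>
#include <stddef.h>

/* Sums len elements starting at a, using a one-past-the-end pointer
   as the loop bound. */
int sum_ints(const int *a, size_t len)
{
    assert(a != NULL);
    const int *end = a + len;   /* defined: one past the last element */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;            /* p != end here, so p points to a real element */
    return total;
}
```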