I’m just now learning about C.
I find it odd that the creators chose the asterisk (*) as the symbol for pointers rather than a symbol that actually looks like a pointer (->).
Considering how confusing dereferencing and function pointers can be, is there a historical, or even practical, reason for using the asterisk?
Why does C use the asterisk for pointers?
Simply – because B did.
Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and BCPL supplies an operator for this purpose. In the original language it was spelled rv, and later !, while B uses the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.
From The Development of the C Language
That’s it. At this point, the question is as uninteresting as “why does Python 3 use . to call a method? Why not ->?” Well… because Python 2 uses . to call a method.
Rarely does a language exist from nothing. It has influences and is based on something that came before.
So, why didn’t B use ! for dereferencing a pointer like its predecessor BCPL did?
Well, BCPL was a bit wordy. Instead of && or ||, BCPL used logand and logor. This was because most keyboards didn’t have ∧ or ∨ keys, and the “not equivalent” operator was actually the word NEQV (see The BCPL Reference Manual).
B appears to have been partially motivated by a desire to tighten up the syntax rather than have long words for all these logical operators that programmers used fairly frequently. And thus ! for dereference became * so that ! could be used for logical negation. Note there’s a difference between the unary * operator (indirection) and the binary * operator (multiplication).
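To make the unary/binary distinction concrete, here is a minimal C sketch. The helper name double_in_place is made up for illustration; the point is only that the same character means indirection when it has one operand and multiplication when it has two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles the int that p points to and returns the new value. */
int double_in_place(int *p)
{
    assert(p != NULL);  /* dereferencing a null pointer is undefined behavior */

    /* The first and second * are unary (indirection through p);
       the third * is binary (multiplication by 2). */
    *p = *p * 2;
    return *p;
}
```

Calling it with the address of an int variable holding 21 leaves 42 in that variable. The compiler tells the two uses apart purely by position: a * with an operand only on its right is the unary form.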
Well, what about other options, like ->?
The -> was taken as syntactic sugar for field dereferences: struct_pointer->field is the same as (*struct_pointer).field.
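A small sketch of that equivalence in C. The struct point type and both function names are invented for this example; the two functions should always return the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative type, not from any real library. */
struct point {
    int x;
    int y;
};

/* Field access through a pointer using the -> shorthand. */
int sum_with_arrow(const struct point *p)
{
    assert(p != NULL);
    return p->x + p->y;
}

/* The same access spelled out: dereference first, then select the field.
   The parentheses are required because . binds tighter than unary *. */
int sum_with_star(const struct point *p)
{
    assert(p != NULL);
    return (*p).x + (*p).y;
}
```

The need for those parentheses in the long form is a fair part of why the shorthand exists at all.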
Other options like <- could create ambiguous parsings. For example:
foo <- bar
Is that to be read as:
(foo) <- (bar)
or
(foo) < (-bar)
Making a unary operator that is composed of a binary operator and another unary operator is quite likely to have problems as the second unary operator may be a prefix for another expression.
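You can see the second reading in today’s C, where <- is not a token at all, so the lexer splits it into < followed by unary -. A minimal sketch (the function name is made up for illustration):

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper showing how C reads "foo <- bar". */
int less_than_negated(int foo, int bar)
{
    assert(bar != INT_MIN);  /* negating INT_MIN overflows, which is undefined */

    /* Tokenized as: foo  <  -  bar, i.e. foo < (-bar). */
    return foo <- bar;
}
```

So with foo = 1 and bar = 2 this compares 1 < -2 and yields 0. A language that also wanted <- as a unary operator would have to pick one of the two readings and surprise anyone who expected the other.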
Furthermore, it is again important to try to keep the things being typed frequently to a minimum. I would hate to have to write:
int main(int argc, char->-> argv, char->-> envp)
This also becomes difficult to read.
Other characters might have been possible (the @ wasn’t used until Objective-C appropriated it). Though again, this goes to the core of ‘C uses * because B did’. Why didn’t B use @? Well, B didn’t use all the characters. There was no bpp program (compare cpp) and other characters were available in B (such as #, which was later used by cpp).
If I may hazard a guess as to why: it’s because of where the keys are. From a manual on B:
To facilitate manipulation of addresses when it seems advisable, B provides two unary address operators, * and &. & is the address operator so &x is the address of x, assuming it has one. * is the indirection operator; *x means “use the content of x as an address.”
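The same pair of operators carried over to C, so the manual’s description can be sketched there. Note the assumptions: B was typeless, while this example uses C’s int and int *, and both function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: "use the content of addr as an address" and store there. */
void store_through(int *addr, int value)
{
    assert(addr != NULL);
    *addr = value;  /* indirection on the left side of an assignment */
}

/* &x is the address of x; applying * to that address gets x back. */
int round_trip(int x)
{
    return *&x;
}
```

Passing &cell to store_through changes cell itself, which is the everyday use of the two operators together: & on the caller’s side, * on the callee’s side.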
Note that & is shift-7 and * is shift-8. Their proximity to each other may have been a hint to the programmer as to what they do… but that’s only a guess. One would have to ask Ken Thompson about why that choice was made.
So, there you have it. C is that way because B was. B is that way because it wanted to change from how BCPL was.
I was asked by a student if & and * were chosen because they were next to each other on the keyboard (something I had never noticed before). Much googling led me to B and BCPL documentation, and this thread. However, I couldn’t find much at all. It seemed like there were lots of reasons for * in B, but I couldn’t find anything for &.
So following @MichaelT’s suggestion, I asked Ken Thompson:
From: Ken Thompson
near on the keyboard: no.
c copied from b so & and * are same there.
b got * from earlier languages – some assembly,
bcpl and i think pl/1.
i think that i used & because the name (ampersand)
sounds like “address.” b was designed to be run with
a teletype model 33 teletype. (5 bit baud-o code)
so the use of symbols was restricted.