I found an apparent contradiction in the C++ text having to do with the result of the c_str() function operating on a std::string (in my copy, the definition and contradiction are on p. 1040).
First it defines the c_str() function as something that produces a ‘C-style’ (zero-terminated) string, but later it says that a C++ c_str() value can have ‘C’-style end-of-string terminators (i.e. NULs) embedded in the string (which is itself defined by being NUL-terminated).
Um… does anyone else feel that this is ‘stretching’ the notion of a C-string beyond its definition? I.e. I think what it means is that if you were to apply the length() function to the string, it would show a different end of string than the C definition of a z-string would: one that can contain any character except NUL, and is terminated by NUL.
I likely don’t have to worry about it in any of my programs, but it seems like a subtle distinction that makes the result of a C++ c_str() not really a ‘C’-string. Am I misunderstanding this issue?
Thanks!
The c_str() function returns a pointer to the string’s contents, ensuring that the data is followed by a NUL (zero) character.
┌─────────────data─────────────┐ end
48 65 6C 6C 6F 20 77 6F 72 6C 64 00
H e l l o ␠ w o r l d ␀
However, the string’s data itself may contain a NUL character. (This is possible because std::string has to store its length explicitly instead of depending on null termination.) When this happens, strlen(str.c_str()) will return a smaller value than str.length().
┌─────────────data─────────────┐ end
48 65 6C 6C 6F 00 77 6F 72 6C 64 00
H e l l o ␀ w o r l d ␀
└────────────┘
string seen by strlen() etc.
The above is the equivalent of doing strlen("Hello\0world") in C. The string as seen by the C function is a left-substring of the original string.
Sometimes this causes data loss or even a security risk, but what else could c_str() do in this situation?
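In C, NUL is the character '\0', a byte with the value 0x00.
And just to be clear, NUL is not quite the same thing as NULL: NUL names the zero character, while NULL is the null pointer constant. Both compare equal to zero, which is why the two are often conflated.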
So Stroustrup’s text is correct about how the result of a C++ c_str() is terminated.
To provide a bit more context: it’s fairly common practice in C to memset a character array after creating it, to make sure it will be NUL-terminated.
char foo[20];
memset(foo, 0x00, 20);
strlen(foo); /* Should return 0 */
Please forgive me if my syntax is off a bit. And for the use of magic numbers.