
6.8b — C-style string symbolic constants

By Alex on August 15th, 2015 | last modified by Alex on March 3rd, 2018

C-style string symbolic constants

In the lesson 6.6 -- C-style strings, we discussed how you could create and initialize a C-style string, like this:

#include <iostream>

int main()
{
    char myName[] = "Alex";
    std::cout << myName;

    return 0;
}

C++ also supports a way to create C-style string symbolic constants using pointers:

#include <iostream>

int main()
{
    const char* myName = "Alex";
    std::cout << myName;

    return 0;
}

Although the two programs above behave identically and produce the same output, C++ handles the memory allocation for them slightly differently.

In the fixed array case, the program allocates memory for a fixed array of length 5, and initializes that memory with the string “Alex\0”. Because memory has been specifically allocated for the array, you’re free to alter the contents of the array. The array itself is treated as a normal local variable, so when the array goes out of scope, the memory used by the array is freed up for other uses.

In the symbolic constant case, how the compiler handles this is implementation defined. What usually happens is that the compiler places the string “Alex\0” into read-only memory somewhere, and then sets the pointer to point to it. Because this memory may be read-only, best practice is to make sure the string is const.

For optimization purposes, multiple string literals may be consolidated into a single value. For example:

const char* name1 = "Alex";
const char* name2 = "Alex";

These are two different string literals with the same value. The compiler may opt to combine these into a single shared string literal, with both name1 and name2 pointed at the same address. Thus, if name1 was not const, making a change to name1 could also impact name2 (which might not be expected).

Also, because string literals persist for the life of the program (they have static duration rather than the automatic duration of most locally defined values), we don’t have to worry about scoping issues. Thus, the following is okay:

const char* getName()
{
    return "Alex";
}

In the above code, getName() will return a pointer to C-style string “Alex”. This is okay since “Alex” will not go out of scope when getName() terminates, so the caller can still successfully access it.

Rule: Feel free to use C-style string symbolic constants if you need read-only strings in your program, but always make them const!

std::cout and char pointers

At this point, you may have noticed something interesting about the way std::cout handles pointers of different types.

Consider the following example:

#include <iostream>

int main()
{
    int nArray[5] = { 9, 7, 5, 3, 1 };
    char cArray[] = "Hello!";
    const char* name = "Alex";

    std::cout << nArray << '\n'; // nArray will decay to type int*
    std::cout << cArray << '\n'; // cArray will decay to type char*
    std::cout << name << '\n'; // name is already type const char*

    return 0;
}

On the author’s machine, this printed:

003AF738
Hello!
Alex

Why did the int array print an address, while the character array and the char pointer printed strings?

The answer is that std::cout makes some assumptions about your intent. If you pass it a non-char pointer, it will simply print the contents of that pointer (the address that the pointer is holding). However, if you pass it an object of type char* or const char*, it will assume you’re intending to print a string. Consequently, instead of printing the pointer’s value, it will print the string being pointed to instead!

While this is great 99% of the time, it can lead to unexpected results. Consider the following case:

#include <iostream>

int main()
{
    char c = 'Q';
    std::cout << &c;

    return 0;
}

In this case, the programmer is intending to print the address of variable c. However, &c has type char*, so std::cout tries to print this as a string! On the author’s machine, this printed:

Q╠╠╠╠╜╡4;¿■A

Why did it do this? Well, it assumed &c (which has type char*) was a string. So it printed the ‘Q’, and then kept going. Next in memory was a bunch of garbage. Eventually, it ran into some memory holding a 0 value, which it interpreted as a null terminator, so it stopped. What you see may be different depending on what’s in memory after variable c.

This case is somewhat unlikely to occur in real-life (as you’re not likely to actually want to print memory addresses), but it is illustrative of how things work under the hood, and how programs can inadvertently go off the rails.

Just to clarify in regards to printing the address of a char with a single character. Because std::cout assumes it's dealing with a string, does this mean that a char with a single character doesn't have a null terminator and that's why it keeps printing garbage until it hits a 0?

In the article, you state the following:
"...Multiple string literals with the same content may point to the same location."

I was surprised to see two different memory addresses. I believe I'm not quite understanding what's happening with the getName() function and how memory is being allocated for the string literal.

Also, as a follow-up question, you state this in one of your comments:
"Personally, I only use const char* if I’m hardcoding a string that will only be displayed (e.g. the application name). Otherwise, std::string for everything (or my own string class)."

I'm wondering - why did you implement getName() as a function? If I'd like to use some sort of constant string in my code, is there any advantage of using a function over something like:

You're getting different addresses because you're printing the addresses of variables name1 and name2, not printing the address that they're holding (which is the address of "Drew"). You need to remove the ampersands from name1 and name2 in your cout statements.

In the case where you want to reuse a C-style string from different areas of the code, you really have two options: use a function like getName(), or define the string as a global constant. I used a function here just to show it could be done, but in reality I'd favor a global constant.

Now, if I got it right, both of them are possible, but they are different assignments: in the second case I am assigning a char pointer to a char pointer, and in the first case I am assigning a char value to the dereferenced char pointer, right?

So, this is creating me a little confusion. How can the compiler tell the two situations apart?

Hello Alex,
1) As you mentioned, there are no scoping issues in getName() because the string has static duration.
I have shown the printed results in comments.
- Does the value returned by the function "int *subFun1()" below have automatic duration?
The value returned by the second function, getName(), has static duration, as you stated above;
objects with static storage duration last for the entire life of the program.
- How can we check whether the value returned by either function (subFun1() or getName()) lasts for the entire life of the program?
Or
- I didn't understand why the dereferenced pointer below prints 0 in main. Maybe it has automatic duration. std::cout<<*ptrFun1<<'\n';//prints 0
So it is a bit confusing here.
//=============================================================================
#include<iostream>

return 0;
}
//=====================================================================================
2) This is something new for me. I have been reading from the first chapter up to this one. The chapters "4.3a -- Scope, duration, and linkage summary" and "4.3 -- Static duration variables" only mentioned a few things about this.
- Is it an important part of C++? If so, is there a reference for us?

In subFun1(), ptr has automatic duration and will be destroyed at the end of the function. Because the variable it points to is destroyed as well, this function returns an invalid address, and dereferencing it will lead to undefined behavior (hence the inconsistent results).

In getName(), "Alex" has static duration because string literals are treated specially. For this reason, you can return their addresses back to the caller and use them.

I don't understand what you are asking in question #2. Can you clarify?

1a)
>Because ptr is destroyed, this function returns an invalid address, and dereferencing it will lead to undefined behavior (hence the inconsistent results)
-Is it a mistake for a function to return the address of a stack-allocated local variable?
For example, in the program below, a pointer to int (ptr) is returned to main. By the time we are back in main, x has been destroyed, so ptrFun1 holds the address of an x that no longer exists (an invalid address).
It works well and prints the result 6, as you see.
-Even though it prints the correct result, that is most probably down to chance.
Am I right?

1b) If subFun1() returns the address of x directly to main (return &x) instead of returning it through a pointer (return ptr), my compiler (DevC++) gives a warning:
([Warning] address of local variable 'a' returned [-Wreturn-local-addr])

It seems, both of them are identical whether you return addresses of x directly (&x) or through pointer (ptr)

- What is the difference? Why does it give warning only when the address of x (return &x) is returned directly not through pointer. (return ptr)
//=========================================================
2)
>I don’t understand what you are asking in question #2. Can you clarify?

I meant: will we have a separate lesson on whether string literals have automatic or static duration when they are returned to main? I think it is not necessary now (because you have almost explained it), if you could clarify the points below.

The C++ standard mandates that C-style string literals have static duration. As you said, string literals are treated specially.
-What about C++ std::string? Does the C++ standard mandate that std::string literals have static duration?
-Does the value returned by getName() have static duration (see the code below)?

1a) Yes, you should never return a local variable by reference or address. Once that variable goes out of scope, your reference or address will point to memory that may be reused for other purposes, leading to undefined behavior.
1b) The programs work identically, but in some cases your compiler can be smarter about warning you when you're making a mistake. If you directly return the address of a local variable, your compiler can easily detect this, and should complain. If you're returning a pointer, the compiler has a harder time determining whether it's pointing to something valid or not, so it may not complain.
2) Only C-style string literals have special handling. std::string is treated normally, like any other object. For the getName() case, no, the returned std::string has expression scope (it will die at the end of the expression it's returned into). So you either need to assign the return value to another variable, or use it to initialize a const reference, which will extend the lifetime of the return value to match the lifetime of the reference.

With respect to the second point of your last reply in which you wrote:
"For the getName() case, no, the returned std::string has expression scope (it will die at the end of the expression it’s returned into). So you either need to assign the return value to another variable, or use it to initialize a const reference, which will extend the lifetime of the return value to match the lifetime of the reference."

Let's say I have the following:

std::string str = getName(); // method 1
const std::string& str1 = getName(); // method 2

Left to me, I would have adopted the KISS mantra in programming and used method 1 but I want to believe there are scenarios that may warrant using method 2. What are the special cases in which one method would be preferred over the other? In other words, what are the peculiarities and advantages of choosing one method over the other?

Method #2 avoids making a copy of the returned value, whereas method #1 may make a copy (depending on whether a move constructor exists; in this case it does, so the copy would be avoided, but that's not always the case). So method #2 is possibly more performant.

Yes. getName() returns a char pointer to the C-style string. If you dereference this string (via operator*), it will give you the value that char pointer is actually pointing to, which is the first element of the array.

Hello, i have a question about the sentence: "string declared this way are persisted throughout the life of the program, we don't have to worry about scoping issues." Why is that true?
Thanks! and needless to say, incredible tutorial.

The C++ standard mandates that C-style string literals have static duration (instead of automatic duration like most other locally defined values). This was probably done for efficiency reasons, as copying strings is expensive, and it would be easy to inadvertently end up with a dangling pointer otherwise.

If you remove the ampersands from in front of the array elements, it'll print each array element (of type char) individually. However, with the ampersand, you're passing std::cout a pointer to a char, which it will interpret as a C-style string, and print all of the array elements onwards until it encounters a null terminator.

This only happens for char pointers due to the way std::cout interprets those. std::cout won't exhibit that behavior for pointers to other types.

Hi Alex, many thanks for the effort put into these tutorials. They're so much clearer on the concepts than the books I've come across!

I understand that std::cout implicitly interprets a char* pointer as the array's string value. In this case, how do I find out the memory address of the string in read-only memory, given that the & operator only gives me the address of the pointer? Thanks.

In one of the comments you mentioned:
"String literal “Alex” is treated as a C-style array of const chars. As you’ve learned, arrays can decay into pointers to the first element of the array. Thus, const char* myName = “Alex” assigns myName the address of the first character in “Alex”."

However, because std::cout prints char pointers as strings, I cast the pointer to a const void pointer using static_cast, and found that this was indeed the case!

We can see proof of this in the following code:

#include <iostream>

int main()
{
    const char* name = "Alex";
    std::cout << name << "\n\n"; // prints 'Alex'

    // prints the address held by name, which is the address of
    // the first element of the array (using static_cast)
    std::cout << static_cast<const void*>(name) << "\n\n";

    // prints the address of each char using static_cast
    std::cout << static_cast<const void*>(&name[0]) << '\n'; // address of 'A'
    std::cout << static_cast<const void*>(&name[1]) << '\n'; // address of 'l'
    std::cout << static_cast<const void*>(&name[2]) << '\n'; // address of 'e'
    std::cout << static_cast<const void*>(&name[3]) << '\n'; // address of 'x'

    return 0;
}

On my machine the above printed:
Alex

010A9B30

010A9B30
010A9B31
010A9B32
010A9B33

As you can see the address of "Alex", which decays into a pointer to the first element of the array (in this case ‘A’), is exactly the same as the address of ‘A’.

Perhaps you could consider adding this example to the lesson (as well as your comment I've quoted above), to facilitate the comprehension of C-style symbolic constants.

You're welcome! The only reason I suggested adding the code to the lesson is because when I read through your lessons I don't always read through the comments. Maybe you could make a reference to the code I've provided. However, I'll leave that up to you 😉

> “Multiple string literals with the same content may point to the same location. Because there’s no guarantee that this memory will be writable, best practice is to make sure the string is const.”

Let's say I have this:
char *name1 = "Alex";
char *name2 = "Alex";

You'd expect if I did this: name1[1]='r', that name1 would now be "Arex" and name2 would now be "Alex". But some compilers will only keep one copy of string literal "Alex", and set both name1 and name2 to point at that literal.

I tried to execute this scenario, which resulted in a segmentation fault.
#include<iostream>
#include<cstring>
#include<stdio.h>
using namespace std;

If you want a pointer to be available in the scope of multiple functions, you have a few options:
1) Best option: pass it as a parameter to the functions that need it
2) Worst option: declare it as a global variable

I'm a little confused about the usage of const. The lesson states that the string is placed into read-only memory. But I think that I'm still able to change what's at that memory address, for example if I do the following. I'm pretty sure that I'm misunderstanding something here.

const char* myName = "John";
myName = "Doe";

Can you give an example of something that you can do without const that isn't possible when you use the const keyword?

This means we're declaring a normal (non-const) pointer named myName that points to a C-style string of type const char. The pointer treats the value being pointed to as const. However, because the pointer itself is a non-const pointer, it can be changed to point at a different string (which is what you're doing when you assign it the string "Doe").

With the above string, you couldn't do this:

const char* myName = "John";
myName[0] = 'B'; // error: can't change the J to a B because myName is pointing to a const value

"Because there’s no guarantee that this memory will be writable, best practice is to make sure the string is const."

Do you mean unwritable? If you meant that you would WANT it to be writable, wouldn't const defeat that purpose?

EDIT: I read a post of yours above and now get it. I'd like to suggest (just an innocent suggestion, not trying to be rude) that perhaps you mention that you make it const so it doesn't become affected by what happens to other entries.

Thanks for the reply. Does it mean that the string literal "Alex" was put in a char array "variable" somewhere in memory prior to its address being assigned to the pointer? I am somewhat confused about how it skips the step of initializing a char array "variable" with the string literal "Alex". But the rest of the process I now understand.

In the previous lessons, you taught that pointers only hold memory addresses, and thus we can't initialize one with anything other than a memory address. So how come the pointer "myName" is initialized with a string? Arrays are somewhat similar to pointers, as the previous lessons noted, but differ in size and so on, and C-style strings are of array type. Does this have something to do with it?

String literal "Alex" is treated as a C-style array of const chars. As you've learned, arrays can decay into pointers to the first element of the array. Thus, const char* myName = "Alex" assigns myName the address of the first character in "Alex". We can then pass this to std::cout, which knows that char pointers should be printed as strings.

In the assignment
myString="Andy";
only the pointer myString changes and points to a new string. But the constant string "Name" does not change in the memory. However, this brings up another question.

After the above assignment I lose the address of the constant string "Name" which is sitting somewhere in the memory. Since this string was defined to be constant, its corresponding memory should not be assigned to another application as long as it does not go out of scope. Does this imply memory leak?

> After the above assignment I lose the address of the constant string "Name" which is sitting somewhere in the memory.

Yes, unless you've copied the address into another pointer, once you've assigned myString to another string literal, the address of the original string literal is lost.

> Since this string was defined to be constant, its corresponding memory should not be assigned to another application as long as it does not go out of scope. Does this imply memory leak?

Not really. A "memory leak" means we've lost track of some bit of dynamically allocated memory that now can't be returned to the OS for reassignment to another program while your program is running.

String literals aren't dynamically allocated, and there's no way to return them to the OS. They get set up when your program starts, and destroyed when your program ends, so your program owns that memory for the entire time it's running, whether you're using it or not.

The thing to note here is that myString points to a const string (of type const char*). It is not a const pointer itself! This means the pointer can be changed to point at another string, which you do in your example by assigning it to a different string literal.

> “Multiple string literals with the same content may point to the same location. Because there’s no guarantee that this memory will be writable, best practice is to make sure the string is const.”

Let's say I have this:
char *name1 = "Alex";
char *name2 = "Alex";

You'd expect if I did this: name1[1]='r', that name1 would now be "Arex" and name2 would now be "Alex". But some compilers will only keep one copy of string literal "Alex", and set both name1 and name2 to point at that literal. So when you change the value of name1, you end up inadvertently changing the value of any other string pointing to that location (in this case, name2).

Because of this, it's really only safe to use C-style strings for "read-only" purposes.

Hey, thanks Alex.
When code gets longer, these small errors and small but important issues become a nightmare.
I modified my program according to your suggestion, and it saves a lot of run time and avoids the runtime error. Pre-increment helps too. One more thing I was missing was a limit on the number of shifts, which gives the while loop some direction:
if(shift>25||shift<0)
break;
And yes, you are absolutely correct. I saved my code without commenting it, and when I opened it again a day later it took me ages to work out what was going on. I will never forget to comment my code now; I am finding out how much of a life saver all these things are.

Hey Alex, please help me figure out the runtime error in this program. The program seems to run smoothly, but there is still some runtime error.
#include<iostream>
#include<string>
using namespace std;
int main(){

Thanks very much for the reply.
The problem is to shift each letter 2 places further through the alphabet (e.g. 'A' shifts to 'C', 'R' shifts to 'T', etc.). At the end of the alphabet we wrap around, that is, 'Y' shifts to 'A'. We can, of course, try shifting by any number.
The input contains several test cases. The first line of input contains an integer that indicates the number of test cases. Each test case is composed of two lines. The first line contains a string that is a codified sentence. This string will contain between 1 and 50 characters, inclusive. Each character is an uppercase letter ('A'-'Z'). The second line contains the number of right shifts; this value is between 0 and 25, inclusive.

Awesome...we'll wait.
May be a typo here:
"Why did the int array print an address, but the strings printed strings?
should be: "Why did the int array prints an address, but the char array printed string?"

On the first line below the "std::cout and char pointers" headline - I believe you meant to write std::cout instead of std::string. As in "At this point, you may have noticed something interesting about the way std::COUT handles pointers of different types."