Sunday, August 14, 2016

Transition from C to C++: Rule of Three (C++98)

In the previous post, we have looked at how we can declare constructor and destructor to take care of memory management behind the scene, so that the programmer does not need to worry about memory allocation when using local objects. However, the code has some significant mistakes that need to be fixed. Let's see what will happen in the following code.

As the error message states, the problem is that upon exiting main function, destructor for Word hello and Word hi is called. However, because the two objects are identical, due to calling the implicit copy constructor, their member word is pointing to the same location. In the destructor, free(word) is called, this is OK for the first time, but when it is called the second time, it crashes and provides the error message.

It can be more clear by looking at the disassembly code from gdb.

Dump of assembler code for function main(int, char**):

0x00000000004006b6 <+0>:push rbp

0x00000000004006b7 <+1>:mov rbp,rsp

0x00000000004006ba <+4>:push rbx

0x00000000004006bb <+5>:sub rsp,0x38

0x00000000004006bf <+9>:mov DWORD PTR [rbp-0x34],edi

0x00000000004006c2 <+12>:mov QWORD PTR [rbp-0x40],rsi

0x00000000004006c6 <+16>:mov rax,QWORD PTR fs:0x28

0x00000000004006cf <+25>:mov QWORD PTR [rbp-0x18],rax

0x00000000004006d3 <+29>:xor eax,eax

0x00000000004006d5 <+31>:lea rax,[rbp-0x30]

0x00000000004006d9 <+35>:mov rdi,rax

0x00000000004006dc <+38>:call 0x400734 <Word::Word()>

0x00000000004006e1 <+43>:lea rax,[rbp-0x30]

0x00000000004006e5 <+47>:mov esi,0x400884

0x00000000004006ea <+52>:mov rdi,rax

0x00000000004006ed <+55>:call 0x400774 <Word::setWord(char*)>

0x00000000004006f2 <+60>:mov rax,QWORD PTR [rbp-0x30]

0x00000000004006f6 <+64>:mov QWORD PTR [rbp-0x20],rax

0x00000000004006fa <+68>:mov ebx,0x0

0x00000000004006ff <+73>:lea rax,[rbp-0x20]

0x0000000000400703 <+77>:mov rdi,rax

0x0000000000400706 <+80>:call 0x40074a <Word::~Word()>

0x000000000040070b <+85>:lea rax,[rbp-0x30]

0x000000000040070f <+89>:mov rdi,rax

0x0000000000400712 <+92>:call 0x40074a <Word::~Word()>

0x0000000000400717 <+97>:mov eax,ebx

0x0000000000400719 <+99>:mov rdx,QWORD PTR [rbp-0x18]

0x000000000040071d <+103>:xor rdx,QWORD PTR fs:0x28

0x0000000000400726 <+112>:je 0x40072d <main(int, char**)+119>

0x0000000000400728 <+114>:call 0x400570 <__stack_chk_fail@plt>

0x000000000040072d <+119>:add rsp,0x38

0x0000000000400731 <+123>:pop rbx

0x0000000000400732 <+124>:pop rbp

0x0000000000400733 <+125>:ret

In both <+80> and <+92>, destructor Word::~Word is called. However, they are exact copies, so we are trying to free char* word twice. Note that Word hello is constructed in <+38>, whose stack address is $rbp-0x30, and its entire content is copied over to $rbp-0x20 in <+60> and <+64>, which is Word hi.

So, how should we fix this? Well, we simply need to explicitly declare copy constructor for class Word, in which we create a copy of the string pointed by word and have the copy object's word point to it. In fact, we also need to explicitly declare copy operator in a similar manner, because implicit copy operator will, again, make the exact copy, having two different objects point to the same string.

Below is the code that correct these.

#include <cstdio>

#include <cstring>

#include <vector>

#include <cstdlib>

class Word {

char* word;

public:

Word() {

word = NULL;

}

Word(const Word &that) {

if (that.word == NULL) {

this->word = NULL;

} else {

this->word = (char*)malloc(strlen(that.word));

strcpy(this->word, that.word);

}

}

~Word() {

if (word != NULL)

free(word);

}

Word& operator=(const Word& that) {

if (that.word == NULL) {

this->word = NULL;

} else {

this->word = (char*)realloc(this->word, strlen(that.word));

strcpy(this->word, that.word);

}

return *this;

}

void setWord(const char *word) {

if (this->word != NULL) {

this->word = (char*)realloc(this->word, strlen(word));

}

else {

this->word = (char*)malloc(strlen(word));

}

strcpy(this->word, word);

}

const char* getWord() {

return word;

}

};

int main (int argc, char **argv) {

Word hello; // default constructor

hello.setWord("hello");

Word hi(hello); // implicit copy constructor

printf("hello: %s\n", hello.getWord());

printf("hi: %s\n", hi.getWord());

hi.setWord("hi");

printf("hello: %s\n", hello.getWord());

printf("hi: %s\n", hi.getWord());

Word bye;

bye = hello;

printf("bye: %s\n", bye.getWord());

bye.setWord("bye");

printf("hello: %s\n", hello.getWord());

printf("bye: %s\n", bye.getWord());

return 0;

}

/* output

hello: hello

hi: hello

hello: hello

hi: hi

bye: hello

hello: hello

bye: bye

*/

With the proper modification, we now do not see any error, and the program runs as expected. In C++, there is so-called the rule of three:

If one has to explicitly declare any of class destructor, copy constructor, or copy assignment operator, then perhaps one needs to explicitly declare all of them.

Initially, we only declared explicit destructor, which caused a serious problem. Now that we have declared all three of them, we are in good shape.

Some other minor fixes to the above code is declaring the argument as const char* in setWord, and returning const char* for getWord method.