Saturday, April 20, 2013

Copy Constructor and Assignment Operator Overloading in C++

Copy Constructor


A copy constructor is a special class constructor. It is used to make a copy of an existing instance of a class. Consider the code segment below:

//Calls Circle Constructor
Circle c1(5.0);
//Calls Circle Copy Constructor
Circle c2 = c1;

Because of the "=" in the second statement, you might expect the assignment operator to be called. However, the copy constructor is the one which is actually called. The reason is that c2 is being created in that statement, and initializing a newly created object from an existing one is the copy constructor's job.

There are 3 general cases where the copy constructor is called:

1. When instantiating one object and initializing it with values from another object (as shown above)
2. When passing an object by value
3. When an object is returned from a function by value 

We'll see examples of all three cases, but I want to stress one important point before we go on: when do we really need to define a copy constructor? First, note that the compiler implicitly defines a copy constructor unless we define one ourselves. However, the compiler-defined copy constructor does a member-wise copy of the source object. In other words, it does shallow copying. If the class has pointer members that refer to dynamically allocated memory, then we need to define a copy constructor which does deep copying. Otherwise we would have two objects whose pointers point to the same memory location. In this case, when one object deletes that memory, the other is left with a dangling pointer, and deleting through it again is a double delete. This behavior is one reason for segmentation faults and heap corruption. You can have a look at my previous post to learn more about segmentation faults and dangling pointers.
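To make the difference between shallow and deep copying concrete, here is a minimal sketch. The names ShallowRadius, DeepRadius and the two helper functions are made up for illustration and are not part of the Circle class used in this post. ShallowRadius relies on the compiler-generated copy, so both objects share one double, while DeepRadius allocates its own storage in its copy constructor.

```cpp
// Hypothetical type that relies on the compiler-generated member-wise copy.
// It does not own its pointee, so it needs no destructor.
struct ShallowRadius {
    double* ptr;
};

// Hypothetical type that owns its pointee and copies it deeply.
class DeepRadius {
public:
    explicit DeepRadius(double radius) : ptr(new double(radius)) {}
    // Deep copy: allocate new storage and copy the value into it.
    DeepRadius(const DeepRadius& other) : ptr(new double(*other.ptr)) {}
    DeepRadius& operator=(const DeepRadius& other) {
        if (this != &other) {
            *ptr = *other.ptr;  // storage already exists, copy the value only
        }
        return *this;
    }
    ~DeepRadius() { delete ptr; }

    double Get() const { return *ptr; }
    void Set(double radius) { *ptr = radius; }

private:
    double* ptr;
};

// Returns true if writing through a shallow copy changes the original.
bool ShallowCopySharesStorage() {
    double storage = 5.0;
    ShallowRadius a = { &storage };
    ShallowRadius b = a;  // member-wise copy: b.ptr == a.ptr
    *b.ptr = 7.0;
    return *a.ptr == 7.0;
}

// Returns true if writing to a deep copy leaves the original untouched.
bool DeepCopyIsIndependent() {
    DeepRadius a(5.0);
    DeepRadius b = a;     // user-defined copy constructor runs here
    b.Set(7.0);
    return a.Get() == 5.0 && b.Get() == 7.0;
}
```

Both helpers return true: the shallow copy sees the change made through its twin, while the deep copy keeps its own value.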

Let's go on with a simple example:

#include <iostream>
using namespace std;

class Circle {
private:
       double* ptr;
public:
       //Constructor
       Circle(double radius) {
             cout<<"Calling Constructor"<<endl;
             ptr = new double;
             *ptr = radius;
       }
       //Default Constructor
       Circle() {
             cout<<"Calling Default Constructor"<<endl;
             //Initialize the value so GetRadius() never reads garbage
             ptr = new double(0.0);
       }
       //Copy Constructor
       Circle(const Circle& obj) {
             cout<<"Calling Copy Constructor"<<endl;
             ptr = new double;
             //Deep Copy
             *ptr = *obj.ptr;
             ////Shallow Copy
             //ptr = obj.ptr;
       }
       //Destructor
       ~Circle() {
             cout<<"Deallocating dynamic memory!"<<endl;
             delete ptr;
       }
       double GetRadius() {
             return *ptr;
       }
       Circle GetCircle() {
             return *this;
       }
};

void PrintRadius(Circle obj) {
       cout<<"Radius = "<<obj.GetRadius()<<endl;
}

int main()
{
    //Normal Constructor will be called
    Circle c1(5.0);

    //Call by value: The argument object will be copied, Copy Constructor will be called
    PrintRadius(c1);

    //Object return by value: The return object will be copied, Copy Constructor will be called
    Circle c2 = c1.GetCircle();

    return 0;
}    

The output is as follows:

Calling Constructor
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Calling Copy Constructor
Deallocating dynamic memory!
Deallocating dynamic memory!

Please note that the class destructor is called for every object that goes out of scope, regardless of whether the object was created with a normal constructor or a copy constructor.

Let's go on with another example of the same class "Circle":

int main()
{
     //Normal Constructor will be called
     Circle c1(5.0);

     //A new object is initialized with the values of another object: Copy Constructor will be called
     Circle c2 = c1;

     //Call by value: The argument object will be copied, Copy Constructor will be called
     PrintRadius(c1);

    //Call by value: The argument object will be copied, Copy Constructor will be called
    PrintRadius(c2);
      
    return 0;
}

The output is as follows:

Calling Constructor
Calling Copy Constructor
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Deallocating dynamic memory!
Deallocating dynamic memory!


Assignment Operator 


The Assignment Operator is used to copy the values of one object to another "already existing" object. Consider the following example:

//Calls Circle Constructor
Circle c1(5.0);
//Calls Circle Default Constructor
Circle c2;

//Calls Assignment Operator for Circle
c2 = c1; 

An object for "c2" has already been created. Therefore, the third statement does not call the copy constructor; it calls the assignment operator. The copy constructor and the assignment operator serve almost the same purpose: both copy one object to another. However, the assignment operator copies to an existing object, while the copy constructor copies to a newly created one.
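If you want to check which of the two is called for a given statement, a tiny tracking type helps. The following is only a sketch; Tracker, InitializeFrom and AssignFrom are hypothetical names introduced for this illustration.

```cpp
#include <string>

// Hypothetical type that records which special member function ran last.
struct Tracker {
    std::string last;

    Tracker() : last("default constructor") {}
    Tracker(const Tracker&) : last("copy constructor") {}
    Tracker& operator=(const Tracker&) {
        last = "assignment operator";
        return *this;
    }
};

// The object t is created in this statement, so the copy constructor runs.
std::string InitializeFrom(const Tracker& source) {
    Tracker t = source;
    return t.last;
}

// The object t already exists when "=" is reached, so the assignment
// operator runs.
std::string AssignFrom(const Tracker& source) {
    Tracker t;
    t = source;
    return t.last;
}
```

InitializeFrom reports "copy constructor" and AssignFrom reports "assignment operator", even though both use the "=" sign.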

As with the copy constructor, the compiler implicitly defines an assignment operator if you don't define one, or in other words, if you don't overload it. However, the one defined by the compiler does member-wise copying. As you remember, a class with pointers and dynamic memory allocations needs an overloaded assignment operator that performs a deep copy.

Now, it's time for a more complicated example which involves both copy constructor and assignment operator calls. To achieve this, we enhance our "Circle" class with assignment operator overloading:

#include <iostream>
using namespace std;

class Circle {
private:
       double* ptr;
public:
       //Constructor
       Circle(double radius) {
             cout<<"Calling Constructor"<<endl;
             ptr = new double;
             *ptr = radius;
       }
       //Default Constructor
       Circle() {
             cout<<"Calling Default Constructor"<<endl;
             //Initialize the value so GetRadius() never reads garbage
             ptr = new double(0.0);
       }
       //Copy Constructor
       Circle(const Circle& obj) {
             cout<<"Calling Copy Constructor"<<endl;
             ptr = new double;
             //Deep Copy
             *ptr = *obj.ptr;
             ////Shallow Copy
             //ptr = obj.ptr;
       }
       //Assignment Operator Overloading
       Circle& operator=(const Circle& obj) {
             cout<<"Calling Assignment Operator"<<endl;

             if(this == &obj) {
                    return *this;
             }
             delete ptr;
             ptr = new double;
             //Deep Copy
             *ptr = *obj.ptr;

             return *this;
       }
       //Destructor
       ~Circle() {
             cout<<"Deallocating dynamic memory!"<<endl;
             delete ptr;
       }
       double GetRadius() {
             return *ptr;
       }
       Circle GetCircle() {
             return *this;
       }
};

void PrintRadius(Circle obj) {
       cout<<"Radius = "<<obj.GetRadius()<<endl;
}


int main()
{
    //Constructor
    Circle c1(5.0);
    //Copy Constructor
    Circle c2 = c1;
    //Default Constructor
    Circle c3;
    //Assignment Operator
    c3 = c2;

    //Copy Constructor
    PrintRadius(c1);
    //Copy Constructor
    PrintRadius(c2);
    //Copy Constructor
    PrintRadius(c3);

    return 0;
}

And the output is as follows:

Calling Constructor
Calling Copy Constructor
Calling Default Constructor
Calling Assignment Operator
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Calling Copy Constructor
Radius = 5
Deallocating dynamic memory!
Deallocating dynamic memory!
Deallocating dynamic memory!
Deallocating dynamic memory!
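One detail of the assignment operator above is worth a note: it deletes the old memory before allocating the new one, so if the allocation throws, the object is left with a dangling pointer. A commonly used alternative is the copy-and-swap idiom, sketched below. This is a different technique from the one used in the Circle class, and SwapCircle and the helper functions are hypothetical names chosen for this example.

```cpp
#include <utility>  // std::swap

// Hypothetical variant of the Circle class that uses copy-and-swap.
class SwapCircle {
public:
    explicit SwapCircle(double radius) : ptr(new double(radius)) {}

    // Deep copy, same as in the Circle class.
    SwapCircle(const SwapCircle& other) : ptr(new double(*other.ptr)) {}

    // The parameter is taken by value, so the copy constructor has already
    // made a deep copy before the body runs. Swapping the pointers gives
    // this object the new storage and hands the old storage to "other".
    SwapCircle& operator=(SwapCircle other) {
        std::swap(ptr, other.ptr);
        return *this;  // the destructor of "other" frees the old storage
    }

    ~SwapCircle() { delete ptr; }

    double GetRadius() const { return *ptr; }

private:
    double* ptr;
};

// Assigns one circle to another and returns the radius of the target.
double AssignAndRead() {
    SwapCircle a(5.0);
    SwapCircle b(1.0);
    b = a;
    return b.GetRadius();
}

// Self-assignment needs no special check with this idiom.
bool SelfAssignmentKeepsValue() {
    SwapCircle a(5.0);
    SwapCircle& alias = a;
    a = alias;  // copies a into the parameter, then swaps
    return a.GetRadius() == 5.0;
}
```

The price is one extra allocation on self-assignment, which is usually acceptable because self-assignment is rare.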

Sunday, April 14, 2013

Smart Pointers, Reference Counting and Implementing Basic Garbage Collection in C++

Introduction

Many programming languages allow the usage of pointers (especially lower level programming languages such as C and C++). A pointer, simply, references a memory location and is used to obtain the value stored at that location. Pointers give the code considerable flexibility. They allow different sections of code to share information easily. Besides, they enable complex "linked" data structures like linked lists and binary trees. However, they can cause serious problems unless they are properly used. So, what are the most serious problems pointers may cause?

1. Segmentation Faults
2. Memory Leakage

Segmentation Fault is a specific kind of error which happens when you access a memory location that "does not belong to you". It's actually a helper mechanism that keeps you away from corrupting the memory. Getting a segmentation fault means that you are doing something wrong with the memory, e.g. you may be accessing a pointer that is not initialized or whose pointee in the heap is already deallocated (dangling pointers) or you may be writing to a read-only portion of memory. 

A memory leak is the gradual loss of available dynamic memory that occurs when a program repeatedly fails to deallocate memory it has obtained for temporary use. As a result, the memory available to that application might become exhausted and the program might stop working.

Smart Pointers

A smart pointer is a wrapper class that encapsulates a regular C++ pointer in order to manage the life-cycle of the object being pointed to. To look and feel like a pointer, a smart pointer needs to have the same interface that a pointer does: it supports pointer operations like "dereferencing" (operator *) and "indirection" (operator ->). To be smarter than a regular pointer, a smart pointer needs to do some smart stuff. The memory management problems described above, such as memory leaks and dangling pointers, are what smart pointers are designed to handle.

Now, let's take a look at the code segment below:

#include <cstdio>
#include <iostream>
using namespace std;

class SampleClass {
private:
       int num;
       //String literals are read-only, so the pointer must be const char*
       const char* text;
public:
       SampleClass() : num(0), text(0) { }

       ~SampleClass() { }

       void DoSomething() {
             num++;
             text = "Do something";
       }
};

int main()
{
       //Initialize (Allocate dynamic memory from heap)
       int* a = new int[3];
       double* b = new double[4];
       char* c = new char[5];
       SampleClass* s = new SampleClass();

       s->DoSomething();

       //Now, deallocate memory
       //Arrays allocated with new[] must be released with delete[]
       if(a != 0) { delete[] a; }
       if(b != 0) { delete[] b; }
       if(c != 0) { delete[] c; }
       if(s != 0) { delete s; }

       cout<<"enter a key to end the program"<<endl;
       getchar();
       return 0;
}

As you see, you need to explicitly delete the memory after you use it. Otherwise, you will cause memory leaks. But, what if the DoSomething() function call above throws an exception? In this case, execution leaves the function before it reaches the delete calls that deallocate the dynamic memory. Since the dynamic memory is never given back, your program leaks it.

It would be great if there were a mechanism like a Garbage Collector (in .NET or Java) which took care of the dynamic memory we had allocated and gave it back when we didn't need it anymore, so that we didn't have to explicitly deallocate that memory. Such a mechanism would prevent memory leaks and provide exception-safe memory management. This is actually what smart pointers do for us.
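The exception-safety part can be sketched in a few lines. In C++, destructors of local objects run while an exception propagates out of a function, so an object that frees memory in its destructor releases it even when the code in between throws. The names below (IntArrayGuard, g_released and the two functions) are hypothetical and exist only for this illustration; the "= delete" syntax assumes a C++11 compiler.

```cpp
#include <stdexcept>

// Hypothetical counter so the example can observe destructor calls.
static int g_released = 0;

// Hypothetical guard that owns an int array and frees it in its destructor.
class IntArrayGuard {
public:
    explicit IntArrayGuard(int size) : data(new int[size]) {}
    ~IntArrayGuard() {
        delete[] data;
        ++g_released;
    }

    // Copying is disabled so that two guards never own the same array.
    IntArrayGuard(const IntArrayGuard&) = delete;
    IntArrayGuard& operator=(const IntArrayGuard&) = delete;

    int* get() { return data; }

private:
    int* data;
};

// Allocates through the guard and then fails before any explicit cleanup.
void DoSomethingThatThrows() {
    IntArrayGuard a(3);
    a.get()[0] = 42;
    throw std::runtime_error("something went wrong");
    // No delete[] is needed: the destructor of "a" runs during unwinding.
}

// Returns true if the guard released its memory although the call threw.
bool MemoryReleasedDespiteException() {
    g_released = 0;
    try {
        DoSomethingThatThrows();
    } catch (const std::runtime_error&) {
        return g_released == 1;
    }
    return false;
}
```

MemoryReleasedDespiteException() returns true, because the guard's destructor has already run by the time the exception reaches the catch block.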

Here is another example:

class Student {
private:
       //const char* because the name is passed as a string literal
       const char* pName;
       int score;
public:
       Student(const char* name, int grade): pName(name), score(grade) { }

       ~Student() { }

       void Study(){
             if(score ==  99) {
                    score++;
             }
             else if(score <= 98){
                    score += 2;
             }
             cout<<pName<<" is studying. His/Her new score is "<<score<<endl;
       }

       void Sleep(){
             if(score >= 1){
                    score--;
             }
             cout<<pName<<" is sleeping. His/Her new score is "<<score<<endl;
       }

       void Party() {
             if(score >= 1){
                    score--;
             }
             cout<<pName<<" is partying. His/Her new score is "<<score<<endl;
       }
};

int main()
{
       Student* s = new Student("Mark Zuckerberg", 80);

       s->Study();
       s->Party();
       s->Sleep();

       delete s;

       getchar();
}

Let us define a basic smart pointer class for the Student* and use it:

class SmartPointer {
private:
       Student* s;
public:
       SmartPointer(Student* ptr):s(ptr) { }

       ~SmartPointer() {
             delete s;
       }

       Student& operator* () {
             return *s;
       }

       Student* operator-> () {
             return s;
       }
};

int main()
{
       SmartPointer p(new Student("Mark Zuckerberg", 80));

       p->Study();
       p->Party();
       p->Sleep();

       getchar();
}

The SmartPointer class wraps the Student* pointer and provides the pointer operations "dereferencing" (operator *) and "indirection" (operator ->). As you see, the pointer is explicitly deleted inside the SmartPointer's destructor. Since the destructor is automatically called when the object goes out of scope, the dynamic memory deallocation is handled automatically by the SmartPointer.

We might go one step further and define a generic smart pointer class which can wrap not only a Student* pointer but a pointer to any type:


template <typename T> class SmartPointer {
private:
       T* pData;
public:
       SmartPointer(T* p) : pData(p) { }

       ~SmartPointer() {
             delete pData;
       }

       //Operator Overloadings
       //Dereferencing
       T& operator* () {
             return *pData;
       }
       //Indirection
       T* operator-> () {
             return pData;
       }
};

And here is how we use it:


int main()
{
       SmartPointer<Student> p(new Student("Mark Zuckerberg", 80));
       p->Study();
       p->Party();
       p->Sleep();
}

Hoping that everything is clear up to here, I want to show you something else. Look at the code segment below:


       SmartPointer<Student> p(new Student("Mark Zuckerberg", 80));
       p->Study();
       {
             SmartPointer<Student> q = p;
             q->Party();
             //Lifecycle of q ends here. q's destructor is called. Remember that both p and q
             //point to the same address.
             //q's destructor deletes the Student object, so p is left as a
             //dangling pointer.
       }
       //Undefined behavior: p points to an object that has already been deleted
       p->Sleep();

It does not matter whether it is a smart or a regular pointer: if two pointers point to the same memory location and one of them is deleted, the object pointed to by both pointers is released. The other pointer is left dangling, and any call through it is undefined behavior. In the example above, p's destructor would also delete the same object a second time.
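One simple way to rule this problem out, if sharing is not needed, is to forbid copying the smart pointer altogether, so that the faulty code is rejected by the compiler. The sketch below assumes a C++11 compiler (for "= delete" and <type_traits>); UniquePointer and ReadThroughPointer are hypothetical names for this example. Reference counting, described in the next section, is the approach to use when copies must be allowed.

```cpp
#include <type_traits>

// Hypothetical smart pointer that cannot be copied, so two instances can
// never end up owning the same object.
template <typename T>
class UniquePointer {
public:
    explicit UniquePointer(T* p) : pData(p) {}
    ~UniquePointer() { delete pData; }

    // Deleted copy operations turn accidental sharing into a compile error.
    UniquePointer(const UniquePointer&) = delete;
    UniquePointer& operator=(const UniquePointer&) = delete;

    T& operator*() { return *pData; }
    T* operator->() { return pData; }

private:
    T* pData;
};

// Checked at compile time: the type really is non-copyable.
static_assert(!std::is_copy_constructible<UniquePointer<int>>::value,
              "UniquePointer must not be copy constructible");
static_assert(!std::is_copy_assignable<UniquePointer<int>>::value,
              "UniquePointer must not be copy assignable");

// Uses the pointer normally; the commented line shows what is rejected.
int ReadThroughPointer() {
    UniquePointer<int> p(new int(80));
    // UniquePointer<int> q = p;  // would not compile: copy constructor is deleted
    *p += 1;
    return *p;
}
```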


Reference Counting

In order to solve the problem explained above, a method called Reference Counting is used. Reference Counting basically means counting the references to an object, so that the object is deleted only when the last reference to it goes away. Here is a basic implementation of a Reference Counting class:

class ReferenceCounter {
private:
       int count;
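public:
       //Initialize explicitly so the count never depends on how the object is created
       ReferenceCounter() : count(0) { }
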
       void Increment() {
             ++count;
       }

       int Decrement() {
             return --count;
       }
};

Smart Pointer with Reference Counting

By explicitly deleting its pointer in its destructor, a smart pointer already handles the memory deallocation. A smart pointer class can also use reference counting in order to prevent the dangling pointer problem. Such a smart pointer implementation is a basic example of garbage collection.


template <typename T> class SmartPointer {
private:
       T* pData;
       ReferenceCounter* pReference;
public:
       //Default Constructor
       SmartPointer() : pData(0), pReference(0) {
             //Create an instance of Reference Counter
             pReference = new ReferenceCounter();
             //Increment the Reference Count
             pReference->Increment();
       }

       //Constructor
       SmartPointer(T* pValue) : pData(pValue), pReference(0) {
             //Create an instance of Reference Counter
             pReference = new ReferenceCounter();
             //Increment the Reference Count
             pReference->Increment();
       }

       //Copy constructor
       SmartPointer(const SmartPointer<T> & p) : pData(p.pData), pReference(p.pReference) {
             //Increment the Reference Count
             pReference->Increment();
       }

       //Destructor
       ~SmartPointer() {
             if(pReference->Decrement() == 0){
                    delete pData;
                    delete pReference;
             }
       }

       // * Operator Overloading
       T& operator* () {
             return *pData;
       }

       // -> Operator Overloading
       T* operator-> () {
             return pData;
       }

       // = Operator Overloading
       SmartPointer<T>& operator= (const SmartPointer<T>& p) {
             //Check if they are the same
             if(this != &p) {
                    //Decrement the Old Reference Count
                    //If it is zero, delete old data
                    if(pReference->Decrement() == 0) {
                           delete pData;
                           delete pReference;
                    }
                   
                    pData = p.pData;
                    pReference = p.pReference;
                    pReference->Increment();
             }
             //Return the assigned object, as the return type requires
             return *this;
       }
};
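
For comparison, the C++11 standard library ships a smart pointer built on the same reference counting idea, std::shared_ptr, and its use_count() member makes the counting visible. The sketch below uses it instead of the SmartPointer class above; Sample and the two helper functions are hypothetical names for this illustration.

```cpp
#include <memory>

// Hypothetical payload type for the example.
struct Sample {
    int value;
    explicit Sample(int v) : value(v) {}
};

// Returns the reference count observed while two owners exist.
long CountWhileShared() {
    std::shared_ptr<Sample> p(new Sample(80));
    std::shared_ptr<Sample> q = p;  // the copy increments the shared count
    return p.use_count();
}

// Repeats the scenario that failed with the simple smart pointer:
// a copy is destroyed in an inner scope and the original is used afterwards.
int ReadAfterCopyDestroyed() {
    std::shared_ptr<Sample> p(new Sample(80));
    {
        std::shared_ptr<Sample> q = p;
        q->value += 1;
        // q is destroyed here; the count drops to 1, so the object survives.
    }
    return p->value;  // safe: p still owns the object
}
```

CountWhileShared() returns 2 and ReadAfterCopyDestroyed() returns 81, because the object is deleted only when the last owner goes away.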