Sunday, April 14, 2013

Smart Pointers, Reference Counting and Implementing Basic Garbage Collection in C++

Introduction

Many programming languages allow the usage of pointers (especially lower level programming languages such as C and C++). A pointer, simply, references a memory location  and it is used to obtain the value stored at that location. Pointers provide prominent flexibility to the code. They allow different sections of code to share information easily. Besides, they enable complex "linked" data structures like linked lists and binary trees. However, they can cause serious problems unless they are properly used. So, what are the most serious problems pointers may cause?

1. Segmentation Faults
2. Memory Leakage

Segmentation Fault is a specific kind of error which happens when you access a memory location that "does not belong to you". It's actually a helper mechanism that keeps you away from corrupting the memory. Getting a segmentation fault means that you are doing something wrong with the memory, e.g. you may be accessing a pointer that is not initialized or whose pointee in the heap is already deallocated (dangling pointers) or you may be writing to a read-only portion of memory. 

Memory leak is gradually losing the available dynamic memory when a program repeatedly fails to deallocate memory it has obtained for temporary use. As a result, the available memory for that application might become exhausted so the program might stop working.

Smart Pointers

A smart pointer is a wrapper class that encapsulates a regular C++ pointer in order to manage the life-cycle of the object being pointed to. To look and feel like a pointer, a smart pointer needs to have the same interface that a pointer does: it supports pointer operations like "dereferencing" (operator *) and "indirection" (operator ->). To be smarter than a regular pointer, a smart pointer needs to do some smart stuff. As I mentioned above, memory management issues like memory leaks, dangling pointers and allocation failures are handled by smart pointers.

Now, let's take a look at the code segment below:

public class SampleClass{
private:
       int num;
       char * text;
public:
       SampleClass() : num(0), text(0) { }

       ~SampleClass() { }

       void DoSomething() {
             num ++;
             text="Do something";
       }
};

int main()
{
       //Initialize (Allocate dynamic memory from heap)
       int* a = new int[3];
       double* b = new double[4];
       char* c = new char[5];
       SampleClass* s  = new SampleClass();

            s->DoSomething();
       //Now, deallocate memory
       if(a != 0) { delete a; }
       if(b != 0) { delete b; }
       if(c != 0) { delete c; }
       if(s != 0) { delete s; }
      
       cout<<"enter a key to end the program"<<endl;
       getchar();
       return 0;
}

As you see, you need to explicitly delete the memory after you use it. Otherwise, you will cause memory leaks. But, what if the DoSomething() function call above throws an exception? In this case, the program stops running right before you have the delete calls in order to deallocate the dynamic memory. Since the dynamic memory is not given back, your program causes memory leak.

That would be great if there were a mechanism like a Garbage Collector (in .NET or Java) which took care of the dynamic memory we had allocated, gave it back when we didn't need it anymore so that we didn't have to explicitly deallocate that memory portion. Such a mechanism would prevent memory leaks and provide exception-safe memory management. This is what smart pointers do for us actually.

Here is another example:

class Student {
private:
       char* pName;
       int score;
public:
       Student(char * name, int grade): pName(name), score(grade) { }

       ~Student() { }

       void Study(){
             if(score ==  99) {
                    score++;
             }
             else if(score <= 98){
                    score += 2;
             }
             cout<<pName<<" is studying. His/Her new score is "<<score<<endl;
       }

       void Sleep(){
             if(score >= 1){
                    score--;
             }
             cout<<pName<<" is sleeping. His/Her new score is "<<score<<endl;
       }

       void Party() {
             if(score >= 1){
                    score--;
             }
             cout<<pName<<" is partying. His/Her new score is "<<score<<endl;
       }
};

int main()
{
       Student* s = new Student("Mark Zuckerberg", 80);

       s->Study();
       s->Party();
       s->Sleep();

       delete s;

       getchar();
}

Let us define a basic smart pointer class for the Student* and use it:

public class SmartPointer{
private:
       Student* s;
public:
       SmartPointer(Student* ptr):s(ptr) { }

       ~SmartPointer() {
             delete s;
       }

       Student& operator* () {
             return *s;
       }

       Student* operator-> () {
             return s;
       }
};

int main()
{
       SmartPointer p(new Student("Mark Zuckerberg", 80));

       p->Study();
       p->Party();
       p->Sleep();

       getchar();
}

SmartPointer class wraps the pointer Student* and provides pointer operations "dereferencing" (operator *) and "indirection" (operator ->). As you see, the pointer is explicitly deleted inside the SmartPointer's destructor. This way, since the destructor is automatically called when the object goes out of the scope, the dynamic memory  deallocation is automatically handled by the SmartPointer.

We might go one step further and define a generic smart pointer class which does not only wrap Student* pointer but, might wrap any pointers:


template <typename T> class SmartPointer {
private:
       T* pData;
public:
       SmartPointer(T* p) : pData(p) { }

       ~SmartPointer() {
             delete pData;
       }

       //Operator Overloadings
       //Dereferencing
       T& operator* () {
             return *pData;
       }
       //Indirection
       T* operator-> () {
             return pData;
       }
};
And here is how we use it:


int main()
{
       SmartPointer<Student> p(new Student("Mark Zuckerberg", 80));
       p->Study();
       p->Party();
       p->Sleep();
}

Hoping that everything is clear up to here, I want to show you something else. Look at the code segment below:


       SP<Student> p(new Student("Mark Zuckerberg", 80));
       p->Study();
       {
             SP<Student> q = p;
             q->Party();
             //Lifecycle of q ends here. q's destructor is called. Remember that both p and q  
             //point to the same address.
             //If two pointers point to the same address and one of the pointers is deleted, 
             //meaning that the object pointed by that pointer is released, then the other       
             //pointer will be left with a dangling pointer. 
       }
       //This call will fail
       p->Sleep();

It does not matter whether it is a smart or a regular pointer, if two pointers point to the same memory location and one of them is deleted, it means that the object pointed by both pointes will be released, so, the other pointer will be left with a dangling pointer and any call by that pointer will fail.


Reference Counting

In order to solve the problem which is explained above, a method called Reference Counting is used. Reference Counting basically means counting references to an object. Here is a basic implementation of a Reference Counting Class:

class ReferenceCounter {
private:
       int count;
public:
       void Increment() {
             ++count;
       }

       int Decrement() {
             return --count;
       }
};

Smart Pointer with Reference Counting

By explicitly deleting its pointer in its destructor, smart pointer already handles the memory deallocation. A smart pointer class can also use reference counting in order to prevent dangling pointer effect. Such a smart pointer implementation is a basic example of garbage collection. 


template <typename T> class SmartPointer {
private:
       T* pData;
       ReferenceCounter* pReference;
public:
       //Default Constructor
       SmartPointer() : pData(0), pReference(0) {
             //Create an instance of Reference Counter
             pReference = new ReferenceCounter();
             //Increment the Reference Count
             pReference->Increment();
       }

       //Constructor
       SmartPointer(T* pValue) : pData(pValue), pReference(0) {
             //Create an instance of Reference Counter
             pReference = new ReferenceCounter();
             //Increment the Reference Count
             pReference->Increment();
       }

       //Copy constructor
       SmartPointer(const SmartPointer<T> & p) : pData(p.pData), pReference(p.pReference) {
             //Increment the Reference Count
             pReference->Increment();
       }

       //Destructor
       ~SmartPointer() {
             if(pReference->Decrement() == 0){
                    delete pData;
                    delete pReference;
             }
       }

       // * Operator Overloading
       T& operator* () {
             return *pData;
       }

       // -> Operator Overloading
       T* operator-> () {
             return pData;
       }

       // = Operator Overloading
       SmartPointer<T>& operator= (const SmartPointer<T>& p) {
             //Check if they are the same
             if(this != &p) {
                    //Decrement the Old Reference Count
                    //If it is zero, delete old data
                    if(pReference->Decrement() == 0) {
                           delete pData;
                           delete pReference;
                    }
                   
                    pData = p.pData;
                    pReference = p.pReference;
                    pReference->Increment();
             }
       }
};


         

1 comment: