33 C++ Object Model Layout

Bhushan Trivedi

Introduction

 

We have already seen the different styles of coding possible in a C++ program. We have also seen that object-oriented coding is more flexible but incurs some runtime overhead. In this module, we look at the conventional layout that C++ compilers use to map C++ code into memory, and at the consequences of that layout.

 

The C++ object model layout

 

The object model layout describes how and where the members are stored, so that the design remains consistent with the performance requirements of the object model and provides all the features the language promises to the user.

class Employee
{
public:
    Employee() { … }                 // a default constructor
    Employee(…) { … }                // another, parameterized constructor
    virtual ~Employee();             // a destructor
    string getName();                // a function to get a name
    virtual void DisplayDetails();   // a virtual function to display details of an employee
    static int NumberOfEmployees();  // returns the number of employee objects currently alive

private:
    int EmpID;
    string Name;

public:
    static int No;                   // number of employee objects
};

Figure 35.1 Example of a class which we want to object model

Let us throw some light on how the C++ object model lays out this content. Consider the class described in Figure 35.1. There are a few ways this content can be modeled in memory. The first method, which we call the pointer-to-all-members layout, gives every object as many pointers as the class has members. Every object of the class contains only pointers, one per member, without discriminating between data and functions. The size of such an object is therefore the number of members multiplied by the size of a pointer for that compiler. Every object of the Employee class contains nine pointers: the first points to the default constructor Employee::Employee(), the second to the other constructor Employee::Employee(..), and so on, up to the pointer to the static integer member No. The compiler's code becomes simple: for any member access, it just follows the corresponding pointer and works on the item. All objects of the class have the same length; on a 64-bit system every pointer is 64 bits (8 bytes), so our Employee object occupies 9 × 8 = 72 bytes. This design makes accessing any element very simple. If we know the address at which the object starts and the position of the element, we can calculate its offset from the beginning of the object and access it. One can think of the object as an array of 8-byte entries and address any item using an index (or a pointer; it is the programmer's choice). For example, to reach the EmpID of an Employee object called Sachin, one could conceptually write Sachin[6], as EmpID is the seventh item of the array. Another way to write the same access is *(Sachin + 6), treating Sachin as a pointer to the object and thus to the first element of the array.

Figure 35.2 showcases this simple structure for the object Sachin. The object contains nine slots, each of them a pointer to a specific member of the object Sachin.

This simple method, however, is not practical. First, data members do not really need an indirection; it is better, and faster, to access them directly. Second, keeping pointers to all functions inside every object does not make much sense. As there may be thousands of employees, one needs only a single copy of all functions for all of them. This point needs elaboration.

 

The second solution depicts another way to store members. We call it the separated data and functions model. In this case, we keep all data members together in a single table, and the object holds a single pointer to that data member table. This move, to an extent, addresses our first problem, although data members are still reached through one pointer. All functions are stored separately, and another table containing pointers to all these functions is also generated. Thus, an object of the class now has only two pointers: one pointing to the table containing the data members and another pointing to the table containing pointers to the functions of the class Employee. We can call these two tables the data member table and the function member table. Keeping pointers, rather than functions, in the function member table also makes sense, as we do not want to store the functions separately for each object. The non-static data members describe the identity of an object; for example, EmpID 5 and Name Sachin form the identity of that object, but the DisplayDetails() function does not. That function is the same for all objects. We must find an alternative solution where the identity of the object is preserved but unnecessary items are not stored within it.

 

This is a better model than the previous one, but it still has a problem. Why should every function call depend on pointer-based information looked up at run time? Non-virtual functions like getName() do not need to be resolved at run time, so we do not need a pointer for them. Such pointers make sense only when the resolution has to be done at run time, that is, for functions that cannot be decided at compile time: the virtual functions. This reasoning resulted in the actual C++ object model, depicted in Figure 35.4.

 

You can see that the actual C++ object model is implemented in a slightly different fashion. It stores all non-static data members inside the object itself, which removes the indirection of the second version and saves that time. It stores all non-virtual functions outside the objects, as a single copy shared by all objects of the class. It also stores all static data members as a single copy. Additionally, it keeps a table of pointers to all virtual functions, and each object holds a pointer to that table as a member.

 

This is a very smart move: a virtual function call is resolved through a table that holds pointers to the virtual functions of a particular class. The virtual table is a table containing pointers to all virtual functions a class contains. Every object of a class that has at least one virtual function is given a pointer, the virtual table pointer (vptr for short), which points to the virtual table of the object's actual class. So when an object is referred to through a pointer, even a base class pointer, the call goes through that object's vptr and reaches the virtual functions of its own class at run time. This model also stores the non-virtual functions and static members separately, so no indirection is needed to access them at run time. A question might then arise in your mind: how are those function calls resolved? For example, suppose we have the following call in our program:

 

Sachin.getName();

 

The C++ compiler makes a smart move yet again. It replaces the call with something like the following code:

getName(&Sachin);      // getName() is rewritten as getName(Employee *this)

That means the member function is rewritten to take a pointer (or reference) to the calling object as its first argument. The getName() function has no other arguments, but if it had any, they would be placed after this one. This argument is popularly known as the this pointer. The non-virtual function call, thus, is resolved at compile time, and at run time that resolved code is simply executed. If the function is inline, its code is pasted at the place where the call is made; if it is not inline, the call is replaced by an instruction that transfers control directly to the function's code. At run time, a stack frame is set up for the call and control is transferred to the function.

 

Interestingly, the static int member No is not really a member of an object in this sense: it is not stored inside any object, so we cannot actually access it as part of one. That is why Employee::No is a better representation than Sachin.No. Though expressions like Sachin.No are allowed, they are internally converted into Employee::No. Semantically, too, Employee::No is better. No, which represents the number of employees, is an attribute of the class Employee and not an attribute of an object Sachin or Mahesh or any other. In the expression Sachin.No, the object name only serves to identify the class. Having seen the layout, let us see how the compiler applies the object model in some common cases, beginning with the constructors.

 

Constructors

 

Constructors are possibly where the object model has to do the most work. We start with the default constructor. The programmer may or may not provide a default constructor for a class; the compiler crafts one only when needed.

For example, consider a class defined as in Figure 35.5.

 

class SomeClass {
private:
    int ID;

public:
    // some functions but no default constructor
};

 

Figure 35.5 Neither the user nor the compiler needs a default constructor

While compiling the program described in Figure 35.5, the compiler notionally has to conjure up a default constructor,

SomeClass::SomeClass();

However, does any physical construction take place? Not really. Only memory of the required size has to be allocated for all members (here there is only one, named ID). The compiler only needs to allocate sufficient memory for the object, with no other initialization or processing. For example, consider the following statement.

 

SomeClass SC;

 

The compiler does not need to do anything with the data, so there is no need to synthesize a default constructor here. Let us take another case. In Figure 35.6 we define a class called Node, familiar to anyone who has ever coded a singly linked list.

class Node {
private:
    int Info;
    Node *ptr;

public:
    // some functions but no constructor
};

Figure 35.6 Class where the user wants a default constructor but the compiler does not

 

What will the compiler do? From the user's point of view, it should ideally define a default constructor like the one shown in Figure 35.7, initializing Info to zero and ptr to null (zero converts to a null pointer), so that if the user later reads these two members, they do not hold junk values.

Node::Node()
    : Info(0), ptr(0)    // 0 converts to a null pointer
{
}

However, the compiler does not generate such a default constructor. It constructs only the part that it needs, not what the user wants; whenever the implementation needs one, the compiler provides a default constructor itself. There are four situations in which the compiler actually synthesizes a default constructor when the programmer does not provide one. In the case above, it is enough to provide the memory needed to store a Node object; the compiler is not compelled to generate anything else, so it does not need to synthesize a constructor. So in which cases does a compiler need a default constructor? Let us try to see and understand them.

 

Here is a list of four such cases

 

1.      Base class has a default constructor

2.      A class embeds objects of other class

3.      A class with a virtual function is inherited

4.      A virtual base class is inherited

 

The first such case involves a base class having a default constructor while the derived class doesn’t.

 

Base class with default constructor

class Student {
    // …
public:
    Student() { … }          // user-defined default constructor
};

class MCAStudent : public Student {
    // … other functions but no default constructor
};

MCAStudent M;

Consider the case depicted in Figure 35.8. Figure 35.9 depicts the memory layout for objects of both classes. Note that the derived object contains the base class subobject embedded within it. Why this is done is explained later, when we discuss inheritance.

 

 

The construction of object M requires the Student subobject within it, and thus the constructor of the Student class must be invoked. That means the compiler needs to generate a default constructor for MCAStudent itself. So code like that depicted in Figure 35.10 is inserted into the MCAStudent class, defining a default constructor for MCAStudent. It does just one thing: it initializes the Student subobject within MCAStudent with an explicit call to the Student constructor. It cannot just allocate memory and forget.

inline MCAStudent::MCAStudent()     // compiler-generated default constructor for MCAStudent
    : Student()                     // explicit call to construct the Student subobject
{
}

The fact that the programmer defined a default constructor for Student indicates that the compiler's default behavior was not enough and something additional was needed. It is important that this intent is carried everywhere a Student object is used, including in the derived class MCAStudent. The fact that the programmer has not defined a default constructor for MCAStudent indicates that he is happy with the compiler's default behavior for the rest of the members of MCAStudent, which is only to allocate memory for them. However, the compiler must also remain consistent with the embedded Student subobject, so it has to provide a default constructor for MCAStudent that does just that. Figure 35.10 depicts what the compiler might construct.

 

What if the programmer has already provided a default constructor? For example, Figure 35.7 displays one such default constructor. It just initializes Name with a typical value, "No Name", to indicate that the name is not yet assigned. That means the programmer is not happy with the compiler's default behavior and has provided MCAStudent with an initialization of the member called Name. This user-defined default constructor satisfies what the user needs.

MCAStudent::MCAStudent()     // user-defined default constructor
{
    Name = "No Name";
}

What shall the compiler do now? The default constructor that the user defines satisfies the user's need but not the compiler's.¹ Unlike the previous case, the compiler cannot define a default constructor itself, because one already exists. The only option left to the compiler is to step in and augment the user-defined default constructor as follows. Now it has two parts: one already provided by the user and another that the compiler adds from its side.

¹ A compiler could, in principle, flag an error and expect the user to call the Student default constructor explicitly. Instead, the compiler applies its own logic, decides what the user expects, and helps the user with its own additions, as above, without flagging an error. Such user-friendly behavior turns out to be exactly the opposite when the user does something the compiler misunderstands.

MCAStudent::MCAStudent()     // compiler-augmented default constructor
    : Student()              // this call is added by the compiler
{
    Name = "No Name";        // this is the user-provided line
}

The idea is to provide whatever the compiler needs for its processing: if the user has not provided a default constructor, the compiler provides one itself; if the user has provided one, the compiler augments it with the required statements.

 

Embedded objects

 

The second case involves objects embedded in a composite object. This is another case where the compiler has to synthesize a default constructor. Consider the case depicted in Figure 35.9, taken from Reference 3.

 

class Face {
    Circle LeftEye;
    Circle RightEye;
    Triangle Nose;
    Square Mouth;
    Point Position;

public:
    // some functions
};

Figure 35.9 Face with multiple objects embedded within

Suppose no default constructor is defined for class Face, but all the other classes, Point, Circle, Triangle, and Square, have their default constructors defined. Whenever an object of Face is defined, its embedded objects LeftEye, RightEye, Nose, Mouth, and Position have to be constructed using their respective constructors. That is why the compiler crafts a default constructor for the class Face, as depicted in Figure 35.10. It adds five constructor calls, one for each of the five objects embedded within Face.

Face::Face()           // compiler-synthesized default constructor
    : LeftEye(),       // Circle::Circle()
      RightEye(),      // Circle::Circle()
      Nose(),          // Triangle::Triangle()
      Mouth(),         // Square::Square()
      Position()       // Point::Point()
{
}

Look at these calls. Each of the embedded objects, LeftEye, RightEye, Nose, Mouth, and Position, must be constructed in order to construct the face, so their default constructors are called. If the programmer ever defines a default constructor for Face, the compiler augments it as in the earlier case.

 

A class with a virtual function is inherited

 

The third case is where a class is derived from another class that has a virtual function. Look at the following code.

class Student {
public:
    virtual int getMarks();
};

class MCAStudent : public Student {
    // …
};

MCAStudent Ganesh;

 

Before we proceed further, let us try to understand what exactly a compiler needs to do when a virtual function is defined for a class. We have already seen the layout, which indicates that both a virtual table and a virtual pointer need to be initialized in such a case. The following is to be done for the MCAStudent class as well as for Ganesh, the object of that class.

  1. For the MCAStudent class, the compiler must generate a virtual table, with one entry for getMarks(). If the MCAStudent class declares a few virtual functions of its own, the virtual table has additional entries for them.
  2. For every object like Ganesh, a vptr is added to the list of data members as an additional pointer member. This vptr is initialized with the address of the vtble constructed for that class, and the compiler must intervene to initialize the vptr of every object created for this class. That means the following is done when Ganesh is defined:

Ganesh.__vptr__ = &__vtble__MCAStudent;

This assignment cannot be fixed at compile time, because objects, and the pointers that may point to them, come into existence only at run time; it is executed at run time when each object is created, that is, inside a constructor. A call to getMarks() then fetches the function through the vptr using the following statement:²

(*Ganesh.__vptr__[1])(&Ganesh);

 

A call to Student's getMarks() through a pointer, for example, changes to something like what is shown below.

Student *TempStudent;
// TempStudent is assigned to point to some type of student here
TempStudent->getMarks();
// …
// the getMarks() call is converted to the following
(*TempStudent->__vptr__[1])(TempStudent);

The index 1 is used because getMarks() occupies the first function entry of the vtble; in this model, slot 0 is reserved (commonly for run-time type information). This index is kept fixed in all classes of the hierarchy. That is not difficult, as every subsequent class has an entry for getMarks() for virtual function invocation, and it is always stored at the same location throughout. The only argument to the converted function, TempStudent, is the this pointer passed to every member function. The statement picks up the right function only if the vptr points to the respective vtble. So a compiler must not only define an additional member __vptr__ but also initialize it with the address of the respective vtble, which is why it has to synthesize (or augment) a default constructor in this case.

² Expressions like Pointer->__vptr__ or object.__vptr__ yield the vptr, which points to the vtble of that class; for Ganesh this is __vtble__MCAStudent. Subscripting the vptr with 1 yields entry 1 of that vtble, the pointer to getMarks() of that class, and the leading * dereferences that function pointer so that it can be called.
A virtual base class is inherited

The fourth case is where a class has a virtual base class in its inheritance chain. Consider the code depicted in Figure 35.13 and closely observe the statement pD1->BInt = 100; inside Initialize().

Do we know exactly where BInt will be found? We need to find the base class subobject inside whatever object pD1 points to, which may be a D1 or an object of a class further derived from D1. Under normal (non-virtual) inheritance, this operation is straightforward: the access is converted to pD1->B::BInt, and the B subobject is found at the same fixed offset within the D1 part, whether pD1 points to a D1 or to the D1 part of a DD. That means it picks up the base class subobject within the object passed as the argument: the D1 object in the first case and the DD object in the second case.

class B {
public:
    int BInt;
};

class D1 : virtual public B {
public:
    int D1Int;
};

class D2 : virtual public B {
public:
    int D2Int;
};

class DD : public D1, public D2 {
public:
    int DDInt2;
};

void Initialize(D1* pD1) { pD1->BInt = 100; }   // not const: the function modifies the object

int main()
{
    D1* pd1 = new D1;
    DD* pdd = new DD;
    Initialize(pd1);
    Initialize(pdd);      // a DD* converts implicitly to a D1*
    delete pd1;
    delete pdd;
}

However, a virtual base class complicates the matter. Closely look at how normal and virtual inheritance are laid out in Figures 35.18 and 35.19. For a complete D1 object, it is straightforward: the B subobject is always found at the same position relative to the start of the object, so there is no issue. In a DD object, however, there is only one B subobject, shared by the D1 part and the D2 part, and it is no longer where a stand-alone D1 would keep it. Relative to the D1 part, the B subobject has moved, so the location of a virtual base class subobject depends on the type of the complete object rather than on the class that pD1 is declared to point to. We need to move our pointers to point to the right subobject, and the compiler must intervene and adjust such pointers accordingly.

 

The pointer movement depends on the sizes of the other subobjects and on how many classes are involved in the inheritance. That means it is not possible to find the displacement when Initialize() is compiled, because it depends on the type of the object pD1 actually points to, which is either a D1 or an object of some class in the inheritance chain that originates from D1 (DD is one of them). Thus, it cannot be decided at compile time. A solution similar to the one for virtual functions must be devised. One such solution is to keep, for each such class, a table containing pointers to its virtual base classes. Let us call the pointer to the virtual base class B within D1 __vbctB (pointer to the virtual base class table for B).

The Initialize() function is object modeled as something like the following.

void Initialize(D1* pD1) {
    pD1->__vbctB->BInt = 100;   // reach B through the virtual base class pointer
}

 

As a concluding remark, we have busted two popular myths here.

  1. The default constructor, when a program does not provide one, is always generated by the compiler. We have narrated four cases where the compiler actually generates one; in no other case does it generate one.
  2. The compiler-generated default constructor initializes all data members. This is again wrong, as the compiler handles only the cases that affect its own operation. For example, when a user defines a pointer member, he must initialize it to null himself to avoid subsequent problems; the compiler does not do that for him.

 

Summary

 

In this module, we have extended our discussion of the C++ object model and learned how the object model lays out a C++ program: how functions, non-static data members, and static members are laid out. We have also looked at four different cases where the compiler needs to synthesize a default constructor.

 

 

References

 

Reference 1: Stanley Lippman, Inside the C++ Object Model, Addison-Wesley

Reference 2: www.stroustrup.com, homepage of Bjarne Stroustrup, the creator of C++

Reference 3: Bhushan Trivedi, Introduction to ANSI C++, Oxford University Press