32 Introduction to the C++ Object Model

Bhushan Trivedi

Introduction

The C++ language was designed by Bjarne Stroustrup with the aim of providing an extension to the C language (and that is why ++ is added after C). Let me repeat, C++ is not a replacement for C but an extension of it. C has already proven itself in the arena of performance-optimized coding. C is used for writing operating systems, compilers, databases and so on, and is considered next only to assembly language in terms of speed of execution. The problem with C is that once the code gets larger, the conventional method of functional coding does not remain as readable as it is for shorter programs, and one needs an extension for better abstraction and visibility, usually provided by classes and objects. C++ has done just that for C. That means C++ is C with added functionality for better visibility and coding.

While C++ has provided great extensions to C and has been hailed by many system-level programmers who earlier preferred to code in C, there are a few questions that invariably come to a diligent reader’s mind. Does C++ work in the same manner as C, or does it require more overhead to do its job? What does C++ do to manage additional services like inheritance, multiple inheritance, constructors, static functions and static members, templates, virtual functions, virtual base classes etc.? How much more is a programmer or a user paying for these services, in terms of both memory and execution time? These questions and a few other related queries can only be answered if we know exactly how C++ maps these coding structures into a final structure, and then compare that with the way C does it. Only after such a comparison can we say something definite. These three modules are an attempt to throw some light on these issues and provide answers to some of the questions that we raised here. For a detailed description and answers to many similar queries, you may refer to the book mentioned in reference-1, which is written exclusively to describe the C++ object model, or reference-2, which is the home page of the creator of C++, Bjarne Stroustrup. In fact, at many places in my own book, which is mentioned in reference-3, I have discussed the C++ object model and its characteristics. You can read these references if you want to gain further insight into this part of the C++ course.

Maintaining performance by code conversion

The major design goal of C was performance, and C++ was to maintain that performance while adding features that provide better visibility. That means the C++ designers have taken quite a few decisions in favor of performance rather than any other parameter. A C++ program, after compilation, is converted to another form which is executed later, when the user deems fit. The C++ object model changes the user-supplied code for reasons of efficiency, optimization and consistency. We call it code conversion.

There are many different ways to convert, and thus this choice invites a lot of deliberation on how this modeling should be done and a lot of trade-offs to be made. The idea is to make sure the code runs fast and uses as little memory as possible. The language provides many features which the compiler cannot provide directly without compromising performance. The conversion therefore also involves converting the statements using such features into a simpler, more efficient form. Let us try to understand it with a simple example. One of the great features of C++ is function overloading. Consider the case depicted in figure 34.1. We have defined three different functions with the same name and with different sets of arguments. If the C++ object code contained three functions with the same name, it would be hard for the runtime system to decide which function to call at run time. It is not impossible, but doing so would demand a decision-making process at run time. Such a solution is not performance optimized. C++ solves this problem by doing something called name mangling. C++ mangles the names of the functions into a form which names them differently internally. For example, it might change the names of the functions as depicted in figure 34.2.

int addInt(int, int)

float addFloat(float, float)

Complex addComplex(Complex, Complex)

Figure 34.2 Name Mangled overloaded functions

Not only that, the compiler plays smart and decides which function is called from the arguments. For example, it looks at the code depicted in figure 34.3 and replaces it with the code depicted in figure 34.4.

Can you see the difference? Now, there is no decision to be made at run time. The appropriate functions are decided at compile time and that code is inserted instead of the user’s code. It is just a normal function call and not an overloaded call with the same name as some other function. Function overloading allows multiple functions with the same name, but the converted code names all functions differently to eliminate runtime overhead. Changing the names of overloaded functions to give each overloaded function a unique name is called name mangling or decorating. This process also demands changing the calls to such functions so that they use the new name given by the compiler. That means the overloaded function is now managed in the same way C functions are managed. The C++ overloaded function will execute as fast as the equivalent C function (and faster if inline!).
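The conversion described above can be sketched as follows. This is only a minimal illustration: the overloaded name add and the Complex struct are assumptions made for the example, and addInt, addFloat and addComplex stand in for mangled names, which in a real compiler are compiler specific and far less readable.

```cpp
#include <cassert>

// Hypothetical Complex type, assumed only for this illustration.
struct Complex { double re; double im; };

// What the programmer writes: three overloads that share the name add.
int add(int a, int b) { return a + b; }
float add(float a, float b) { return a + b; }
Complex add(Complex a, Complex b) { return Complex{a.re + b.re, a.im + b.im}; }

// Roughly what the converted code contains: one uniquely named function
// per overload, exactly as if it had been written in C.
int addInt(int a, int b) { return a + b; }
float addFloat(float a, float b) { return a + b; }
Complex addComplex(Complex a, Complex b) { return Complex{a.re + b.re, a.im + b.im}; }

// Each overloaded call is bound at compile time from its argument types,
// so it behaves like a direct call to the uniquely named function.
void checkOverloadsMatch()
{
    assert(add(2, 3) == addInt(2, 3));                // int arguments
    assert(add(1.5f, 2.5f) == addFloat(1.5f, 2.5f));  // float arguments
    Complex x{1.0, 2.0};
    Complex y{3.0, 4.0};
    assert(add(x, y).re == addComplex(x, y).re);      // Complex arguments
    assert(add(x, y).im == addComplex(x, y).im);
}
```

Calling checkOverloadsMatch() exercises all three pairs; no type test is performed while the program runs, because every call site already names exactly one function.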

Many similar conversions are made to give C++ programmers like us flexibility while keeping the efficiency level of C. This smart conversion process enables a C++ program to perform as fast as the equivalent C code in most cases.

The C++ Object Model

The C++ object model describes this conversion process. The objects the program is dealing with, for example, variables, static and non-static data members, static, non-static and virtual functions, template functions and classes, inherited and multiple-inherited objects etc., are to be stored and processed in a fashion that does not hinder the speed of the program. Only when the program uses features which are not C-like and demand additional flexibility does it incur more overhead. The meaning of this statement will be clear in due course. The C++ standard does not require this mapping to be uniform, and thus every C++ compiler designer decides the mapping as he deems fit and suitable for the target OS. In short, there is no standard which mandates a designer to do the conversion in a particular form. This seems to make it pointless to discuss the mapping in a generic course meant for undergraduates. However, even a non-standard domain like this has some definite ways of solving things which are followed by more or less all vendors, as well as preferred by different operating systems.

The study of the C++ object model enables the reader to get an idea of how C++ does things under the hood. It can help one understand seemingly strange behavior of a program (see footnote 1). It can also help one choose between alternatives. For example, consider a case where one has the option of choosing a normal function or a virtual function. This study will help him understand the additional overhead of using a virtual function. The user may decide to go for the virtual function or decide against it, but will be able to do so with more assurance. The code that a programmer writes after gaining a clear understanding of the C++ object model will contain fewer errors and will be more efficient. When the programmer is aware of how his code is modeled by the language, he is more confident about the output as well. So, let us plunge into the deeper water and understand how C++ does the things that we have seen it doing so far in this course.

1 For example, if a library contains a base class and that library is changed, one must recompile a program which uses a class derived from it, even when there is no change in the program itself. We will soon see why this is needed; in short, it is because a derived class does not hold a pointer to its base class but has a copy of the base class embedded within it. So when the base class changes, the embedded copy in the derived class becomes invalid, and the program needs to be recompiled with a fresh copy of the base class.
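The embedding mentioned in this footnote can be made visible with a small sketch. The class names Base and Derived and their members are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical class that would live in a library.
struct Base { int id; int flags; };

// Hypothetical application class derived from the library class.
struct Derived : Base { int extra; };

// The Base part is stored inside every Derived object, so a Derived
// object can never be smaller than a Base object.
static_assert(sizeof(Derived) >= sizeof(Base), "Derived embeds a Base subobject");

void checkEmbedding()
{
    Derived d{};
    d.id = 1;
    d.flags = 2;
    d.extra = 3;
    const Base& basePart = d;   // refers to the Base subobject inside d
    assert(basePart.id == 1 && basePart.flags == 2);
}
```

Because the layout of Base is compiled into every Derived object, adding a member to Base changes the layout of Derived as well, which is why code using Derived has to be recompiled.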

The Object Model Transformations

The building blocks of the code conversion process are object model transformations. The code conversion process demands transformations from one typical form of code into another. Such transformations are known as object model transformations. As per Lippman (Reference 1), when source code (designed based on the semantics of the language) is converted to object code, the object model usually performs three different types of transformations. Let us see what those are.

First is an implementation-dependent transformation. This transformation depends on the type of compiler as well as the type of operating system for which the transformation is being made. Some operating systems demand a specific type of code conversion for their own convenience. For example, some compilers allowed void main() while some compilers did not; the latter always want main() to be preceded by int. When no return value is specified, the compiler decides that value based on its own preference. That is an example of an implementation-dependent transformation. Such demands usually stem from operating system requirements. When main() completes, its return statement returns an int value, which is why main() is declared to return int. When any function is invoked and completes its job, control returns to the calling function. The question that you may ask is, who is calling main()? It is the first function of the program, isn’t it? When a C++ program is compiled and executed using any method, the process (the running instance of a C++ program) is usually spawned by some system process, which is thus the parent process of our C++ program (in execution). It is therefore possible for our C++ program to return an integer value to that system process using this return statement. When an OS demands that every process return an int value, we must write int main() and we cannot write void main(). When we write just main(), that declaration is either converted to the OS-favored form or thrown back to us as an error. Where the returned int value is stored depends on the specific operating system. For example, in a Linux shell it is available in the special variable $?, while in a Windows batch file it is available in the ERRORLEVEL variable.

Sometimes the transformation depends on the default chosen by the particular compiler. For example, if we have a statement like the following, the compiler usually decides it to be a function call, but it could be an overloaded () operator for a class as well.

My_function ();

The above call may be a call to a function My_function(), but My_function could also be an object of a class which has an overloaded () operator. As this part is operating system and compiler dependent, we will not discuss it any further in this text, which describes the C++ object model in a general sense.
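The two readings of the same call syntax can be sketched as below. The names plainCall and CallCounter are hypothetical; only My_function comes from the text.

```cpp
#include <cassert>

// Reading 1: the name refers to an ordinary function.
int plainCall() { return 1; }

// Reading 2: the name refers to an object whose class overloads operator().
struct CallCounter
{
    int calls = 0;
    int operator()() { return ++calls; }
};

void checkCallSyntax()
{
    CallCounter My_function;      // an object, not a function
    assert(plainCall() == 1);     // ordinary function call
    assert(My_function() == 1);   // same syntax, but it invokes operator()
    assert(My_function() == 2);   // the object keeps state between calls
}
```

From the statement alone a reader cannot tell the two apart; the compiler decides from the declaration of the name that is in scope.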

The second type of transformation implements the general features of the language. This is called feature-based transformation. The user needs the compiler to implement the general features of the language for two reasons: one is the consistency of the code and the second is optimization. Consistency is needed when a programmer codes in an inconsistent way, for example, defining a default constructor in the base class but not in the derived class. This transformation covers how constructors and destructors are automatically provided when not defined, how that additional code is synthesized to augment the program that we have written, how a member initialization list is converted to code, how overloaded functions are name mangled, how a copy of an object is made (member by member or bitwise) etc. This is the most critical part as there are always a few trade-offs to be made.
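A small sketch of such synthesized code, assuming a hypothetical StudentRecord struct for which the programmer writes no constructor, copy constructor or destructor at all:

```cpp
#include <cassert>
#include <string>

// Nothing but data members is written here. The compiler supplies the
// default constructor, copy constructor and destructor, because the
// std::string member needs them.
struct StudentRecord
{
    std::string name;
    int rollNumber = 0;
};

void checkSynthesizedMembers()
{
    StudentRecord first;              // synthesized default constructor
    assert(first.name.empty());
    first.name = "Ganesh";
    first.rollNumber = 7;

    StudentRecord second = first;     // synthesized member-by-member copy
    second.name = "Meera";
    assert(first.name == "Ganesh");   // the two objects do not share storage
    assert(second.rollNumber == 7);   // the int member was copied as well
}
```

Here a bitwise copy would not be safe, since the string owns memory; the compiler therefore copies member by member using the copy constructor of std::string.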

The third type of transformation is known as OO-based transformation. It is about the special features of the language which demand more flexibility and therefore result in costlier representations, in terms of memory and execution time. Examples are virtual functions and virtual base classes, inheritance and multiple inheritance, static data members and static member functions etc.
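The memory side of this cost can be observed with a minimal sketch. PlainPoint and VirtualPoint are hypothetical classes; the size difference is not guaranteed by the C++ standard, but it holds on mainstream compilers, which typically store a hidden pointer to a table of virtual functions in every object of a class that has virtual functions.

```cpp
#include <cassert>

// No virtual functions: the object holds only its two int members.
struct PlainPoint
{
    int x;
    int y;
    int getX() const { return x; }
};

// Same data, but with virtual functions.
struct VirtualPoint
{
    int x;
    int y;
    virtual ~VirtualPoint() {}
    virtual int getX() const { return x; }
};

void checkVirtualOverhead()
{
    // Assumption: a typical implementation, where the object grows because
    // of the hidden pointer added for virtual dispatch.
    assert(sizeof(VirtualPoint) > sizeof(PlainPoint));
}
```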

There are two things one would like to understand for writing efficient programs. The first is the object-related features provided by C++ and the second is how compiler writers prefer to implement them. We will primarily focus on the second aspect in the subsequent parts of this module as well as in the other two modules. The first aspect is discussed at length in the previous modules. However, if you are not clear about the terms mentioned in the above discussion, it is strongly recommended that you revisit those topics before embarking on these three modules, as they are a compulsory prerequisite.

Four paradigms to code

There are four different ways one can program using a C++ compiler. A coding paradigm is the style of programming in which a program is written. One can code in C style, ADT style, object-based style or object-oriented style in C++.

First is C-type procedural coding. It is about coding without using classes and objects, using plain vanilla C code to write a C++ program. It is, in fact, a valid way to write C++ code. Most valid C programs will run as valid C++ programs without any change. C++ is designed to support such portable code from legacy C libraries. The C++ designers consider this portability so sacred that they devised many complex solutions to keep it possible in every conceivable case (see footnote 2). Keeping this in mind, a C++ compiler must be able to encounter legacy C code (and constructs like struct student) and still be able to work.

2 One such requirement is the struct keyword and the public-by-default behavior of the members of a struct. Ideally, one does not need struct for writing a C++ program, and keeping something public by default is a violation of object-based coding, but C++ has to allow it as it is allowed in C.
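As a minimal sketch of this first paradigm, the following code is written entirely in the C subset and is still accepted by a C++ compiler. The function names and the member layout of struct student are assumptions made for the example.

```cpp
#include <assert.h>
#include <string.h>

/* A C-style struct: all members are public by default. */
struct student
{
    char name[32];
    int marks;
};

/* A free function operating on the data, as a C library would provide. */
void set_student(struct student *s, const char *name, int marks)
{
    strncpy(s->name, name, sizeof(s->name) - 1);
    s->name[sizeof(s->name) - 1] = '\0';
    s->marks = marks;
}

void check_c_style(void)
{
    struct student ganesh;   /* the struct keyword is optional in C++ */
    set_student(&ganesh, "Ganesh", 82);
    assert(ganesh.marks == 82);
    assert(strcmp(ganesh.name, "Ganesh") == 0);
}
```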

Second is the Abstract Data Type or ADT model. It is used when C-style coding is augmented with objects, with the function and data members of those objects, and with their use as user-defined data types. In this method, the class is available for use but the implementation is not seen by the user. This is a common method for building conventional libraries. For example, we can have a class Student which supports all the functions a conventional university system might need. Now one can write code as depicted in figure 34.5. All the functions that one needs are available in the definition of the class and the user just needs to call them when there is a need.

Student Ganesh;
Ganesh.enrol();
Ganesh.registerForExam();
Ganesh.getMarks();
Ganesh.printMarksheet();
…

In a way, the class Student is defined as a user-defined data type with its own functions. This is a method for legacy systems usually designed using conventional languages and databases etc. One can define classes like Student, Teacher, Examiner etc. and make sure the system is solving the problem elegantly. When a user calls Ganesh.printMarksheet(); it reads from the database, finds a printer and prints the mark sheet, but all of that is hidden from the user. That is why this process is called abstraction. It simplifies the user’s view. That is why such classes are known as abstract data types. The string class that we all use so often is another example of the ADT model of programming. Usually, an ADT demands a large class with all possible functions embedded in it. Another method is to use a small class with the minimum possible members. However, we will not discuss that difference here. The ADT model is a simplified method of providing user-defined objects. The library designers design these ADTs based on the needs of the system, and other users use these objects from the libraries for their own purposes. This is similar to the C model of having function libraries. The ADT model is a better design than C’s functional design as it confines functions to specific objects, and that makes it easier for the users to choose and use those functions.
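A minimal sketch of such an abstract data type is given below. The member function names come from the calls shown above; everything inside the class is an assumption made for illustration, including the placeholder mark and the choice to have printMarksheet() return text instead of reading a database and driving a printer.

```cpp
#include <cassert>
#include <string>

class Student
{
public:
    void enrol() { enrolled_ = true; }
    void registerForExam() { if (enrolled_) registered_ = true; }
    int getMarks() const { return registered_ ? marks_ : 0; }

    // Assumption: returns the mark sheet as text so that the sketch stays
    // self-contained.
    std::string printMarksheet() const
    {
        return "Marks: " + std::to_string(getMarks());
    }

private:
    // Hidden implementation: users of the class never touch these members.
    bool enrolled_ = false;
    bool registered_ = false;
    int marks_ = 75;   // placeholder value, hypothetical
};

void checkStudentAdt()
{
    Student Ganesh;
    Ganesh.enrol();
    Ganesh.registerForExam();
    assert(Ganesh.getMarks() == 75);
    assert(Ganesh.printMarksheet() == "Marks: 75");
}
```

Every call uses the Object.Function() form, so the compiler binds it directly; the abstraction costs nothing at run time.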

The third method is called object-based coding. It is designed to use classes and objects, along with templates for generic classes and generic algorithms, together known as the generic programming model. This method is quite general and provides many of the benefits of reusability without any runtime overhead. For example, we can define a template function like SortedList and use it in our student management system at multiple places. In the code shown in figure 34.6, we are calling the generic function with int as the type argument. This indicates that we want the sorting to be performed based on an int value. The second argument of the function indicates the field number. The third argument indicates whether we want ascending or descending order. Here we are listing students roll number wise; the roll number is the 1st field and hence the second argument is 1. The third argument indicates that we want the student objects to be sorted in ascending order (of field 1, the roll number).

Student UniversityStudents[];

SortedList <int> (UniversityStudents, 1, a);

Figure 34.6 using a template with type int

To generate the attendance sheet of the students, where we need the list to be sorted name wise, we can have another call as shown in figure 34.7. The name is the second field and we want the students to be listed in ascending order of their names.

SortedList <string> (UniversityStudents,2, a);

If we want the students’ merit list, we would like to sort the same list using their final marks, which may be stored in field 20, in descending order. So the function call is made using the following statement.

SortedList <int> (UniversityStudents, 20, d);

Such a way of coding is quite efficient as all template calls are resolved at compile time. There is no runtime overhead. This process is called generic as we can use the same SortedList function to sort objects of some other class, for example, an Employee class. We can also have generic classes which can be used with any data type that a programmer needs.
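The idea behind such a generic sorting function can be sketched as follows. This is not the SortedList function of the text: sortedCopy is a hypothetical helper which takes a key-extracting function instead of a field number, and the three-member Student struct is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified student record.
struct Student
{
    int rollNumber;
    std::string name;
    int finalMarks;
};

// Hypothetical generic helper: returns a copy of the items sorted by the
// key that the caller extracts. The compiler generates one version per
// combination of item type and key function.
template <typename T, typename KeyFn>
std::vector<T> sortedCopy(std::vector<T> items, KeyFn key, bool ascending)
{
    std::sort(items.begin(), items.end(),
              [&](const T& left, const T& right) {
                  return ascending ? key(left) < key(right)
                                   : key(right) < key(left);
              });
    return items;
}

void checkSortedCopy()
{
    std::vector<Student> students = {
        {3, "Asha", 61}, {1, "Vijay", 88}, {2, "Meera", 74}};

    auto byRoll = sortedCopy(students, [](const Student& s) { return s.rollNumber; }, true);
    auto byName = sortedCopy(students, [](const Student& s) { return s.name; }, true);
    auto byMarks = sortedCopy(students, [](const Student& s) { return s.finalMarks; }, false);

    assert(byRoll.front().rollNumber == 1);    // roll number list
    assert(byName.front().name == "Asha");     // attendance sheet
    assert(byMarks.front().finalMarks == 88);  // merit list
}
```

The same template would sort a vector of any other class; which comparison runs is fixed when the program is compiled.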

The fourth method is to use object oriented coding. In this method, we have a hierarchy of classes inherited from their parent classes. They have virtual functions and also virtual base classes sometimes. The idea is to provide a generic interface for all these classes using virtual functions defined in the base class.

For example, consider the hierarchy shown in figure 34.9. There, MCAStudent is defined as a class inherited from another class called Student, and GLSMCAStudent is a class inherited from MCAStudent. A virtual function getMarks() is defined in each class, with a body specific to the requirements of that class. The MCAStudent class includes the MCA-related subjects and the practical list which are common to all MCA students. The GLSMCAStudent class provides the additional subjects GLS university offers and some other subjects that it might allow under a choice based credit system (see footnote 3). Also consider two more colleges, LD and AES, both having MCA students; we store information about those students in the respective classes AESMCAStudent and LDMCAStudent.

class Student
{
    string Name;
    …
public:
    virtual Marks getMarks();
    Student(string TempName, …)
    {… }
};

class MCAStudent : public Student
{
public:
    MCAStudent(string TempName, …)
        : Student(TempName, …)
    {… }
    virtual Marks getMarks()
    {… }
};

class GLSMCAStudent : public MCAStudent
{
public:
    GLSMCAStudent(string TempName, …)
        : MCAStudent(TempName, …)
    {… }
    virtual Marks getMarks()
    {… }
};

Now, consider the case of a company where three types of students are working on a final semester project. One of the groups is from GLS, another from LD and the third from AES. We have three inherited classes here, GLSMCAStudent, LDMCAStudent, and AESMCAStudent. Suppose we want to write a program to look at their marks. Assume all three classes have their getMarks() functions defined, so that a student belonging to a particular class always gets the getMarks() function which provides his or her marks in the fashion defined by the respective college. Now consider the version of the DisplayMarks() function shown below. An interesting point is that a parent class pointer (a pointer to the Student class) can always be made to point to a derived class object, which we take advantage of in this code. As we have to deal with a mix of students from all three colleges, a pointer to Student is quite handy. It can point to a student of any one of the colleges as they are all inherited from it. That pointer is passed as an argument to the DisplayMarks() function, which calls getMarks() through it, and the processing required for that student is carried out.

Marks DisplayMarks(Student *pTempStudent )

{

return pTempStudent -> getMarks();

}

3 Some universities additionally offer subjects like magic, guitar playing, drama, singing etc. in such a system.

The DisplayMarks() code above works for any one of the three students, provided we have defined an appropriate getMarks() for those classes. If the Student pointer pTempStudent points to an AESMCAStudent, we get the subject marks for that student; if the pointer points to a GLSMCAStudent, we get the marks for the particular set of subjects offered at GLS, and so on. The code is not only simplified; it becomes more dynamic as well. It is possible to make such code independent of the class hierarchy, so that the addition of new classes (for new MCA colleges) or the removal of classes (for colleges which have closed down or moved out of the university) does not affect the code. The requirement is that every new class must define the virtual function getMarks(). It will be called automatically when a student of that type is pointed to by pTempStudent. This type of programming is called object oriented programming.
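A compilable sketch of this arrangement is given below. The class and function names follow the text; the typedef of Marks to int and the fixed values returned by each getMarks() are assumptions made only so that the dispatch can be checked.

```cpp
#include <cassert>

typedef int Marks;   // assumption: marks reduced to a single integer

class Student
{
public:
    virtual ~Student() {}
    virtual Marks getMarks() { return 0; }
};

class MCAStudent : public Student
{
public:
    Marks getMarks() override { return 50; }   // hypothetical value
};

class GLSMCAStudent : public MCAStudent
{
public:
    Marks getMarks() override { return 60; }   // hypothetical value
};

class AESMCAStudent : public MCAStudent
{
public:
    Marks getMarks() override { return 70; }   // hypothetical value
};

// Written once; the function body never mentions a concrete college.
Marks DisplayMarks(Student *pTempStudent)
{
    return pTempStudent->getMarks();   // function chosen at run time
}

void checkDispatch()
{
    GLSMCAStudent gls;
    AESMCAStudent aes;
    MCAStudent mca;
    assert(DisplayMarks(&gls) == 60);
    assert(DisplayMarks(&aes) == 70);
    assert(DisplayMarks(&mca) == 50);
}
```

Adding an LDMCAStudent class later would require no change to DisplayMarks(), as long as the new class overrides getMarks().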

It is also possible to use a function pointer or a reference to provide this type of coding (we have shown a pointer to a Student object here). A similar type of coding is also possible using a pointer to a member function of a class (see footnote 4). The biggest advantage of such coding is that we need to write only one function and it will work for any student type. Even when new types are added, such functions still work.

The C++ object model is not only good because it provides these solutions, it is great because these choices are offered to the programmer. The programmer chooses the coding style that he deems fit for the system he is developing. Many other languages, most notably Java in its original form, make functions virtual by default and leave the programmer little choice (see footnote 5).

The object oriented programming version sounds great and attracts attention, but it is a heavy-duty solution and should be used with care. If efficiency is of prime importance, OO programming should be used with extra care as it demands runtime decision making. Interestingly, object-based solutions can offer the same advantages as OO solutions in many cases (but not in all cases) without any performance overhead at runtime. Experts press for using the object-based model rather than the OO model in C++ coding for this simple reason. The STL or standard template library is an excellent example of how generality can be achieved without paying a high price for it.
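As a minimal sketch of the object-based alternative, the following function template works for any student type without a common base class or virtual functions. The class names and the values returned are hypothetical.

```cpp
#include <cassert>

// Two unrelated classes: no common base class and no virtual functions.
struct GLSStudent
{
    int getMarks() const { return 60; }   // hypothetical value
};

struct LDStudent
{
    int getMarks() const { return 65; }   // hypothetical value
};

// One function template; the compiler generates a separate, directly
// bound version for every student type it is called with.
template <typename StudentType>
int displayMarks(const StudentType& student)
{
    return student.getMarks();   // resolved at compile time
}

void checkStaticBinding()
{
    GLSStudent gls;
    LDStudent ld;
    assert(displayMarks(gls) == 60);
    assert(displayMarks(ld) == 65);
}
```

The price is that the exact type must be known where the call is written; a single list holding a mix of student types would still need the OO approach.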

An interesting problem occurs when multiple approaches are mixed in a program. The flexibility given to a C++ programmer can result in indecipherable code or in programs which spring surprises when we run them. One such common problem is using a base class pointer to access derived class object contents when the base class pointer is NOT pointing to a derived class object. One needs to use dynamic_cast properly for handling such cases. However, as this is not mandated, a programmer may fall into the trap. The bottom line is, C++ has given programmers the freedom both to excel and to err.

4 Programs 4.15 and 4.16 of Reference 3 describe these two cases and showcase the advantage of using a function pointer.

5 With the advent of Java 8, the language has introduced some functional features (such as lambda expressions), but the model is still highly object oriented.

Another point: even when the hierarchy is defined and virtual functions are in place, the OO paradigm is applied only when pointers or references are used to access those objects. When the Object.Function() notation is used, it is always ADT-type or object-based access, which is resolved at compile time, and any error in that statement is picked up by the compiler. When the Object.Function() notation is used, the C++ compiler knows at compile time which function is to be called and does not need to defer the decision to the runtime system. On the other hand, a pointer may point to any object at run time, and if a pointer is used in such a way that the object being pointed to cannot be determined at compile time, the decision is deferred until the code is executed and the pointer value is available.
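The difference between the two kinds of access can be seen in a short sketch. The two-class hierarchy and the values returned are assumptions made for illustration.

```cpp
#include <cassert>

struct Student
{
    virtual ~Student() {}
    virtual int getMarks() { return 0; }
};

struct MCAStudent : Student
{
    int getMarks() override { return 50; }   // hypothetical value
};

void checkBinding()
{
    MCAStudent mca;
    Student copy = mca;    // copies only the Student part of mca
    Student& ref = mca;    // refers to the whole MCAStudent object
    Student* ptr = &mca;   // points to the whole MCAStudent object

    assert(mca.getMarks() == 50);    // Object.Function(): type known at compile time
    assert(copy.getMarks() == 0);    // a Student object always runs Student::getMarks
    assert(ref.getMarks() == 50);    // reference: function chosen at run time
    assert(ptr->getMarks() == 50);   // pointer: function chosen at run time
}
```

The copy line shows why the Object.Function() form needs no runtime decision: an object of type Student can only ever be a Student.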

Another example of OO programming, taken as an excerpt from reference-3, is shown below.

class News
{
public:
    virtual ~News(); // a virtual destructor is declared to make News polymorphic
};

News::~News()
{ }

class TextNews : public News
{…
public:
    void Print_News() {};
};

// other class hierarchy is defined here

void OnRightClick(News & NewsItem )
{…
    case PRINT:
        try
        {
            // throws bad_cast unless NewsItem refers to a TextNews or a subtype of it
            TextNews & ReTextNews = dynamic_cast<TextNews&> (NewsItem);
            ReTextNews.Print_News();
        }
        catch (bad_cast)
        { //appropriate action
        }
    …
}

This example shows another reason why OO programming should be chosen for flexibility. Consider a news reporting application which continuously receives various types of news. The user is allowed to right click on an incoming news item and do various operations on that item. Some operations are allowed for all types, but for some of the items, some operations are not allowed. For example, printing is enabled only if the object belongs to a specific hierarchy which is basically text-based, and not otherwise, for example, when the news item is an image or a video.

The cast to the reference ReTextNews is successful only if the item being cast (NewsItem) is a text news item or any subtype of text news. When the user right clicks on any news item, the menu that appears allows or disallows printing based on the type the item belongs to. In a news channel, there is a lot of news coming in, of various types, one item after another. When such news is pouring in, it is possible for such a function to provide specific options depending on the type of news. Some categories, which contain text or other types which are basically text (like HTML), can be printed, but categories like images cannot. This function decides what to display when a user right clicks on the item. The dynamic_cast enables us to check whether the news belongs to an entire hierarchy in just one go. That is the power of object oriented programming. Another great part of such coding is that we can go ahead and add a few other news types under the text and non-text categories, and the program segment is not going to change a bit. The dynamic_cast is going to work the same.

However, the dynamic_cast cannot be resolved at compile time because the news item that the NewsItem reference refers to is not available until the program is running. So when the running program encounters this statement, it checks at that time, determines the type of news NewsItem is referring to and decides the course of action then. This slows down the execution. However, the beauty of the C++ design is that a programmer is not forced into this unwillingly. Only when he wants such flexibility and the overhead is acceptable to him will he go for such a design.
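The excerpt can be turned into a small compilable sketch as follows. The News and TextNews names and the reference name ReTextNews follow the excerpt; HtmlNews, ImageNews and the function tryPrint() are hypothetical additions, and Print_News() returns true here only so that the outcome can be checked.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

class News
{
public:
    virtual ~News() {}   // makes News a polymorphic class
};

class TextNews : public News
{
public:
    bool Print_News() { return true; }   // stands in for real printing
};

class HtmlNews : public TextNews {};   // hypothetical subtype of text news
class ImageNews : public News {};      // hypothetical non-text news

// Returns true when the item belongs to the text hierarchy and was printed.
bool tryPrint(News& NewsItem)
{
    try
    {
        // Checked at run time against the actual type of NewsItem.
        TextNews& ReTextNews = dynamic_cast<TextNews&>(NewsItem);
        return ReTextNews.Print_News();
    }
    catch (const std::bad_cast&)
    {
        return false;   // not text news: printing is not offered
    }
}

void checkPrintOption()
{
    TextNews text;
    HtmlNews html;
    ImageNews image;
    assert(tryPrint(text));
    assert(tryPrint(html));     // a subtype of TextNews passes the same check
    assert(!tryPrint(image));
}
```

A new text-based news type only has to inherit from TextNews; tryPrint() stays as it is.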

The user decides which model to adopt for programming based on many parameters. If the user needs to extend a legacy system developed in C, for example, that will demand coding in a similar fashion. Different programmers deploy different styles of coding based on their own preferences and expertise, and that sometimes decides what type of programming will be done. Another point is the code extension expected in the future. OB and OO types of programming provide better reusability and extensibility than other designs. The same holds for flexibility: the system is more flexible with the OB and OO models than with the other models.

In the subsequent two modules, we will see, with suitable examples, how the C++ object model maps the code that we write into a structure which is extremely efficient. We will also see how much overhead the additional facilities of OO programming impose.

Summary

In this module, we have introduced the C++ Object Model. We have seen name mangling as an example of how the C++ Object Model intelligently converts C++ code into efficient code which does not demand any runtime overhead. We have seen the different types of coding possible with the C++ design, and stressed the point that the flexibility of choosing the type of coding lies with the programmer.

References

Reference 1: Inside the C++ Object Model, Stanley B. Lippman, Addison-Wesley

Reference 2: www.stroustrup.com, home page of Bjarne Stroustrup, the creator of C++

Reference 3: Introduction to ANSI C++, Bhushan Trivedi, Oxford University Press