31 The Standard Template Library
Bhushan Trivedi
Introduction
We have looked at templates in module 16th and 17th. We learned that it is possible for a designer to construct a library of templates for any purpose that the user deems fit for his own job. C++ designers have provided one such library for all of us to use which is popularly known as the Standard Template Library or STL for short. It is standard as provided by the standardization committee and thus available under all compilers. It is also a collection of templates and thus called a template library. STL contains both, class templates as well as function templates. The terminology used by STL, however, is different. The class templates are called containers (as these classes are designed to contain other objects) and the function templates are named as generic algorithms. We will look at both, the containers as well as the generic algorithms one after another. However, we will begin with why an STL like solution was needed and how the C++ designers provided that.
Standard Template Library
STL is a collection of generic software entities, containers, and generic algorithms. There is a third member to STL, known as iterators. An iterator is used by generic algorithms to move along the list of members of the containers. It allows iterating through these members and hence the name.
We have mentioned how libraries are generated in C++. Designers design classes and hierarchy of classes based on their need and allow the users to use those classes and program over those classes. Many times users extend those classes by inheriting them into a version that they need to work on, with the addition of data and function members that they feel necessary for solving their problems. Normally, these classes represent some real world entities (for example employees) or some other entities which are needed for designing the system (for example furniture, there is no real furniture in the real world, we only have chairs, tables and sofa etc.). The member functions are operations performed on objects of those classes. STL is not of that type.
STL defines generic algorithms as nonmember functions. These nonmember functions can be instantiated into specific types needed by specific types of containers to work on. Containers themselves are generic classes. For example, there is a generic algorithm called find (). One can use that with any of the containers to find a specific element from that container. Being nonmember, these generic algorithms can be applied to many classes and not confined to a specific class. These generic algorithms are designed by experts and thus designed with the prime goal of efficiency and optimization. The C++ designers expect users to forgo writing their own routines for such cases and use the STL routines. Such an approach offers two distinct advantages. First, programmers get a tested and efficient solution off the shelf and second, programmers code, which uses STL components, become standard and can be debugged or used with more confidence by users. The second point needs little more explanation. Let us take an example to understand. If you are a user and you have two options, a code which a programmer’s novice algorithm and a code with STL. Which option do you prefer? People normally trust STL based code more than the programmer’s code. The programmer’s code, however efficient or good he claims, you cannot trust unless it runs for a longer period without any problem. On the contrary, thousands of users are already using STL code and when my programmer gives you a code based on STL, you have reasons to be more confident about that code.
Another point, when the designers are taking the functions out of the class and make it available for multiple classes, they are to be written once and used for multiple classes. When a single routine is written for multiple classes, it is usually written with utmost care for optimality which is exactly done for STL. These routines are designed to keep the efficiency level as much as possible and designed by experts who can do so. It is unlike that normal user could code with such an optimized manner.
Like other components of C++, STL can also be extended with additional containers and algorithms if one needs.
Generic algorithms and containers
Generic algorithms are function templates which can operate on multiple classes as being nonmember. Containers can contain objects of other classes. Vector, for example, is one such container works exactly like a dynamic array which can extend itself when the number of elements increases beyond the capacity. Another container, the list, is basically an efficient implementation of a doubly linked list. Insertion and deletion at random places are the most efficient operation with lists. Deque is another container which is an implementation of a double ended queue. Applying various constraints, the deque can also act as a queue or a stack. Stack and queue are generated from deque and are known as adapted containers. As these containers are keeping the objects in a sequence of their insertion, they are known as sequence containers.
Unlike the sequence containers, there are some other containers which can provide insertion in a sorted form of a key element. They provide sorted elements at any given point of time as the insertion is always done in the right place in the sorted order.
Set, map and their multi-key versions multiset and multimap are known as associated sorted containers. They keep the contained objects in a typical sorted form. There is a field in the contained object known as a key. The sorted associative containers make sure the elements are inserted in the sorted order of the key. The elements of the sorted associative containers are inserted in the data structure, known as the height balanced tree, which is a very efficient structure for such a case. The difference between the normal associative container and their multi-key version is that the multi key versions allow them to have more than one element with the same key while normal containers do not allow them.
An important component which glues containers and generic algorithms together is called an iterator. Iterators are pointer like objects and used to refer to the contained objects. The generic algorithms are written in the form that they act on iterators and no other data structures. The containers are also designed in form of iterators of some type. Iterators are categorized into different hierarchical types. Algorithms can work on iterators of the type they support or any higher type. For example, an input iterator can only read. A generic algorithm find() acts using an input iterator to find an element from a sequence container. The find() can also work with any container which offers any higher order iterator, for example, the vector which offers random access iterator which is of a higher order than input iterator.
Generic algorithms are designed to be extremely efficient, for example, sort() is one such algorithm which implements a version of quick sort internally. Interestingly, when one uses such algorithms, they aren’t equally efficient for all containers. Sort() demands random access to data, for example, and a container of type list is not stored in a form where random access is possible. As the search for an element happens in sequential fashion, sort algorithm would have worked painfully slow for such a container. (There is a member function sort provided for list which can do a reasonably good job in sorting it though). Here the sort () is defined to have a type of iterators known as random access iterators. The list () works on bi-directional iterators and thus sort is not allowed to work on the list as bidirectional iterators are of a lower order than the random access iterators. One would get a compiler error when such an operation is carried out. Intelligently categorizing iterators allows and prohibits generic algorithms to work only in an efficient manner and not otherwise.
Long back in 1979 Alexander Stepanov, a programmer working with GE R&D center, started working on a concept of generic programming. The idea had nothing to do with C++ in the beginning. The ideas were implemented in a language known as Ada. Later on, when Stepanov joined AT&T, his idea of generic programming merged with C++. The eventual result was STL. Andrew Koenig, one of the standardization committee members, helped the committee to invite Stepanov to present on his idea and the idea of Generic Programming
become part of ANSI C++ standard. Many experts believe that the STL is the biggest addition to the C++ standard.
Generic Programming Principles
Now let us discuss the generic programming principles on which the STL is designed on. As we have mentioned before, it is not a normal library. STL is based on generic programming principles which are independent of C++ itself.
Generic programming is a mechanism to design generic software components. Generic algorithms work on these generic components without keeping any specific components in mind. Rather one of the critical ideas is to keep the algorithm as less dependent on the component design as possible. The algorithms are designed with minimum requirements. The requirements are specified often in terms of the typical type of iterators. Any container needs the type of iterator offered by the algorithm or any lesser version can work with these algorithms. Thus most algorithms work with most containers without any change.
These algorithms and the containers are based on templatized design and thus is different than object oriented model based design. The generic programming model tries to provide a similar job in more efficient fashion. It cleverly uses templates, iterators, and good design to do its job. We have already seen that unlike the OO model, this model does not offer any overhead at runtime. However, it is also possible to combine both models and inherit the containers and use base class pointers to access the items and have the best of both worlds.
It has been stressed a few times before but it is important to reiterate that C++ offers many choices to the programmers and programmers design their system based on any combination of multiple options that they deem fit. Choosing the right mix and judicious choices from the options available is critical to code well in C++. STL is another very good option to a programmer.
The vector, list deque etc. are designed in a way that any programmer can use them for his system. One can, for example, use the queue to model customer queue in a banking application or a process queue in an OS implementation. When hundreds of programmers across the world are using these components, they are well debugged as well.
There are total 12 such containers defined in STL at the point of this writing1. These small number of containers are designed in such a generic fashion that it is possible for programmers to use them in solving a myriad of problems. Such a small number can be easily mastered. They are also designed keeping efficiency in mind. Each component specifies a typical time requirement and tested to provide that time requirement. A
- 1 As per the standard. There are three adapted containers, hash based unordered associative containers, and two more containers valarray and bitset.
program using STL is more portable as it is available on all known platforms. The queues and vectors and lists are available on every platform in the same form and work the same on every platform. The generic algorithms are also available on all platforms so a program using them is portable across all platforms. Templatized design makes it independent of types and more reusable.
The algorithms operate on these software components does not depend much on how those components are designed. They only dependency is the iterators the components work on and the algorithm demands. The algorithms also work consistently and the programming with them is easier for programmers. For example, testing the end of the sequence is same in all containers. So learning about how to do it in one container, one does not need to learn to do it for another.
The containers contain multiple contained objects and there is a need to have an efficient method to traverse through the collection of these objects. Pointer like objects called Iterators is designed precisely for the same.
Let us elaborate this point a little further. When a programming system is dealing with multiple objects, there are two primary methods to work on them. Consider a case of an array and a file. We can define a pointer and access elements of the array using a simple pointer increment-decrement. For example, consider following code.
int dummy[]={0,1,2,3,4,5,6,7,8,9};
int * p = dummy;
int iterate = 0;
while (iterate++ < 10)
cout << *p++ << “,”;
The code is simple. There is a pointer p, pointing to the members of the array dummy one after another and print their content. We have used two things, dereferencing (adding * in front of p to make it *p) to get the value and incrementing p (using p++). This method of accessing each element of the array is quite straight forward and is done in a minimum number of steps.
Another way to access elements is exemplified when one access elements of files. We can use get() and put() methods to access elements of a file. We cannot address any element directly. This method is not direct and usually modeled using a much more complex process. For example, the file get() demands finding out where the next element is, in which sector of the disk, get that complete sector and fetch the said element from it. A functional approach hides that complexity from the user and thus quite appropriate in these circumstances.
The pointer to elements of the array and iterating through them directly represents the model called addressing mechanism. Providing pointers have a distinct advantage, it is a very fast way to access an element. Unfortunately, pointers can create serious consequences if used in an inappropriate way. Designers of some other languages, most notably Java, felt that this approach of exposing the raw addresses to the programmer is too risky to be allowed and they forgo the pointer mechanism, thereby eliminating the address mechanism to access elements. The functional approach is simpler but slower. The addressing mechanism gives the programmer the ability to control the process in a complete way unlike functional approach. Those who want such detailed control over that process prefer the address mechanism. The programmer can use this method to tailor their own solution in the most efficient way. The functional approach forgoes this finer control to provide better readability and easier interface.
The programmers can define and use iterators like pointers, they can dereference to get the object they are pointing to, can assign them, compare them, passing to a function or returning from a function, exactly like pointers. Thus it is consistent with the philosophy of the C++ design. It provides complete freedom to programmer and access to raw addresses, assuming the programmer knows what he is doing.
The above discussion might raise one more query in your mind. If iterators are so much like pointers, why not the designers use pointers instead? Why an another indirection using iterators?
There are two reasons for using iterators instead. First, unlike pointers the iterators are categorized and their categories help the system determine which algorithm can be applied to which type of container, automatically2. Second, through pointers are de facto implementation method for the iterators, one can use a more fitting method for a given case. A programmer can actually design his own iterator for his collection of object and not confined to pointers. Iterators are chosen for two characteristics C++ programmers love to have.
- An open design where a programmer can manipulate an iterator directly
- Addressing mechanism which does the job in the least possible steps
The containers also possess some other characteristics which make them special.
- They can grow as per user’s need. A user may not specify the size while defining a container and start using it. As long as the number of elements is less than the
- 2 Lowest level of iterators is Input iterators, next is output, next is forward iterators, bi-directional iterators come next, and finally, we have random access iterators. Their order is also decided in that fashion.
default size, the elements are added. As soon as it becomes equal to the size, C++ runtime system increases the size and reallocate all elements if need be and start inserting additional elements thereon. This is unlike an array where once the maximum size is reached, it is not possible for the user to add any element into it.
- An STL container behaves as a normal object as long as possible. It can be defined as a normal object (using templates, i.e. with a specific type declaration), can be assigned to other objects and copied to other objects etc.
- New algorithms and containers can be added, if need be. The default behavior of a standard container can be altered if need be. The design of STL allows options for extending the original design by providing different values to default parameters or specifying other methods for working (for example, the way memory is allocated).
Sequence Containers: vector
Let us begin with the first sequence container, the vector. The vector is much like an array and stores the element in a fashion random access is possible. Any element of the vector can be accessed with a constant time. Let us see how vector works using a simple example depicted in 32.1.
We define two vectors, one of type int and another is of type cricketer.
vector <Cricketer> Vc;
vector <int> vi;
The function push_back inserts at the end of the vector and that function are called to insert 6 cricketers into a vector of type Cricketers while 6 integers are inserted into the int vector.
Vc.push_back(Mithali);
…
Vc.push_back(Veda);
and
vi.push_back(1);
…
vi.push_back(4);
Once the elements are inserted, it is time to access them one after another. We need to define iterators for the same. Following statements do that for us.
vector<int>::iterator IntIndex;
vector<Cricketer>::iterator CricketerIndex;
Thus two iterators, one for traversing integer elements and another for traversing Cricketers are defined. After defining, using them is equally straight forward. There are two member functions begin () and end () which are available to all containers. Function begin () returns the first element address while end () returns one address past the last element, kind of invisible final boundary after the last element. These two functions are used to iterate through the elements of the vector one after another. We start with and thus start with the CricketerIndex pointing to the first element of Vc. We iterate through the elements using simple increment using CricketerIndex++. We can test if the last element is traversed by CricketerIndex.end() being true or not. We can refer to the cricket object pointed to by the CricketerIndex iterator just by dereferencing it as *CricketerIndex. The functions begin() and end() returns an iterator pointing to first and one past last element. When a for loop starts with first = Vc.begin() and go till < last = Vc.end(), it includes the range [first, last) including the first value but not the last value.
Program 32.1
#include <vector>
#include <iostream>
#include <string>
using namespace std;
class Cricketer
{
int JerseyNo;
string Nm;
public:
Cricketer(int t_rNo, string t_Nm): JerseyNo(t_rNo), Nm(t_Nm) { };
bool operator < (Cricketer t_Cric)
{
return (JerseyNo < t_Cric.JerseyNo);
}
friend ostream & operator << (ostream & t_Out, Cricketer t_Cric)
{
t_Out<< “Jersey Number: ” << t_Cric.JerseyNo;
t_Out<< ” Name : ” << t_Cric.Nm << endl;
return t_Out;
}
};
int main()
{
vector <Cricketer> Vc;
vector <int> vi;
vi.push_back(1);
vi.push_back(2);
vi.push_back(6);
vi.push_back(9);
vi.push_back(4);
Cricketer Mithali (1, “Mithali Raj”);
Cricketer Sushma (2, “Shushma Verma”);
Cricketer Harman (3, “Harmanpreet Kaur”);
Cricketer Dipti (4, “Dipti Sharma”);
Cricketer Zulan (5, “Zulan Goswami”);
Cricketer Veda (6, “Veda Krishnamurthi”);
Vc.push_back(Mithali);
Vc.push_back(Sushma);
Vc.push_back(Harman);
Vc.push_back(Dipti);
Vc.push_back(Zulan);
Vc.push_back(Veda);
vector<int>::iterator IntIndex;
vector<Cricketer>::iterator CricketerIndex;
for (IntIndex = vi.begin(); IntIndex < vi.end(); IntIndex++)
{
cout << *IntIndex << endl;
}
for (CricketerIndex = Vc.begin(); CricketerIndex < Vc.end(); CricketerIndex++)
{
cout << *CricketerIndex;
}
}
Output:
1
2
6
9
4
Jersey Number: 1 Name: Mithali Raj
Jersey Number: 2 Name: Sushma Verma
Jersey Number: 3 Name: Harmanpreet Kaur
Jersey Number: 4 Name: Dipti Sharma
Jersey Number: 5 Name: Zulan Goswami
Jersey Number: 6 Name: Veda Krishnamurthi
Program ended with exit code: 0
Extensibility of a vector container
The vector works quite the same as the array but there is more to it. What if the number of elements inserted goes beyond the size? Remember our discussion on vector being extendable, it is time to see how it does so. Here is the code that showcases that. A member function capacity() is called to describe the current size of the vector. The output clearly indicates that at the time of definition, both the vectors have size zero and the size is doubled every time a new element is inserted when the vector is full. We inserted 200 elements in both types of vectors and the output clearly shows how the vector increases its size to 256, starting from 1 then 2 and then 4 and so on and finally from 128 to 256. In fact, to keep the random access working, the vectors not only increases in size, they also relocate their elements in contiguous positions after having a much larger memory space. For example, when the vector grows from 128 to 256, it occupies a new memory area where it can store all 256 elements together in contiguous fashion and thus enabling random access to any element. The author has worked with a few other compilers and found that the increment of vector size may not be in the same fashion always. It is also important to note that when vectors increase their size, they also have to acquire more memory for a bigger size, which they may get at some other place, so need to relocate current elements. It is because vector provides random access and for that the elements need to be stored at contiguous location. It is not possible to have random access if some elements of the vector are stored at one place and others at other places. This relocation introduces significant amount of overhead but it also provides amount of flexibility no other container provides. It is like having an array which can extend when need be.
#include <vector>
#include <iostream>
#include <string>
using namespace std;
class Cricketer
{
int JerseyNo;
string Nm;
public:
Cricketer(int t_rNo, string t_Nm): JerseyNo(t_rNo), Nm(t_Nm) { };
bool operator < (Cricketer t_Cric)
{
return (JerseyNo < t_Cric.JerseyNo);
}
friend ostream & operator << (ostream & t_Out, Cricketer t_Cric)
{
t_Out<< “Jersey Number: ” << t_Cric.JerseyNo;
t_Out<< ” Name : ” << t_Cric.Nm << endl;
return t_Out;
}
};
int main()
{
vector <Cricketer> Vc;
vector <int> vi;
cout << “The capacity of Cricketer vector is “<< Vc.capacity()<< endl; cout << “The capacity of integer vector is “<< vi.capacity()<< endl; unsigned long t_Size = vi.capacity();
unsigned long t_S_Size = Vc.capacity();
for (int i=0; i< 200;i++)
{
vi.push_back(i);
if (vi.capacity() != t_Size)
{
cout << ” the capacity of the vector changes to ” << vi.capacity() << endl;
t_Size = vi.capacity();
}
}
Cricketer Mithali(1,”Mithali Raj”);
for (int i=0; i< 200;i++)
{
Vc.push_back(Mithali);
if (Vc.capacity() != t_S_Size)
{
cout << ” the capacity of the vector changes to ” << Vc.capacity() << endl; t_S_Size = Vc.capacity();
}
}
}
The capacity of Cricketer vector is 0
The capacity of integer vector is 0
the capacity of the vector changes to 1
the capacity of the vector changes to 2
the capacity of the vector changes to 4
the capacity of the vector changes to 8
the capacity of the vector changes to 16
the capacity of the vector changes to 32
the capacity of the vector changes to 64
the capacity of the vector changes to 128
the capacity of the vector changes to 256
the capacity of the vector changes to 1
the capacity of the vector changes to 2
the capacity of the vector changes to 4
the capacity of the vector changes to 8
the capacity of the vector changes to 16
the capacity of the vector changes to 32
the capacity of the vector changes to 64
the capacity of the vector changes to 128
the capacity of the vector changes to 256
Program ended with exit code: 0
You might wonder why we have overloaded the operator <. It is to decide the order in case one needs to sort the data. If we call a sort function, we need to pass only the range of the vector to sort. The sorting process will decide the order based on how we have overloaded the operator < in the contained object. For example, if we ever call the following statement, the cricketers will be sorted based on the jersey number.
sort(Vc.begin(),Vc.end();
Why? Because the overloaded operator < indicates so.
Summary
In this module, we have introduced the Standard Template Library, which is a significant addition to the ANSI C++ standard. STL is a collection of containers or template classes which can contain other objects and algorithms, template functions which are designed in a fashion that they can be used with multiple containers. Iterators are pointer-like objects which can be used to connect the containers and algorithms. We have also seen a sequence container named vector in this module.
References
Reference 1: Introduction to ANSI C++, Bhushan Trivedi, Oxford University Press
Reference2: www.stroustrup.com, homepage of Bjarne Stroustrup, the creator of C++
Reference 3: The C++ Primer, Stanley Lippmann, Addition Wesley