
4: Data Abstraction

C++ is a productivity enhancement tool. Why else would you make the effort (and it is an effort, regardless of how easy we attempt to make the transition) to switch from some language that you already know and are productive with to a new language in which you're going to be less productive for a while, until you get the hang of it? It's because you've become convinced that you're going to get big gains by using this new tool.

Productivity, in computer programming terms, means that fewer people can make much more complex and impressive programs in less time. There are certainly other issues when it comes to choosing a language, such as efficiency (does the nature of the language cause slowdown and code bloat?), safety (does the language help you ensure that your program will always do what you plan, and handle errors gracefully?), and maintenance (does the language help you create code that is easy to understand, modify, and extend?). These are important factors that will be examined in this book.

But raw productivity means a program that formerly took three of you a week to write now takes one of you a day or two. This touches several levels of economics. You're happy because you get the rush of power that comes from building something, your client (or boss) is happy because products are produced faster and with fewer people, and the customers are happy because they get products more cheaply. The only way to get massive increases in productivity is to leverage other people's code; that is, to use libraries.

A library is simply a bunch of code that someone else has written and packaged together. Often, the most minimal package is a file with an extension like .lib and one or more header files to tell your compiler what's in the library. The linker knows how to search through the library file and extract the appropriate compiled code. But that's only one way to deliver a library. On platforms that span many architectures, such as Linux/Unix, often the only sensible way to deliver a library is with source code, so it can be reconfigured and recompiled on the new target.

Thinking in C++

Thus, libraries are probably the most important way to improve productivity, and one of the primary design goals of C++ is to make library use easier. This implies that there's something hard about using libraries in C. Understanding this factor will give you a first insight into the design of C++, and thus insight into how to use it.

A tiny C-like library

A library usually starts out as a collection of functions, but if you have used third-party C libraries you know there's usually more to it than that, because there's more to life than behavior, actions, and functions. There are also characteristics (blue, pounds, texture, luminance), which are represented by data. And when you start to deal with a set of characteristics in C, it is very convenient to clump them together into a struct, especially if you want to represent more than one similar thing in your problem space. Then you can make a variable of this struct for each thing.

Thus, most C libraries have a set of structs and a set of functions that act on those structs. As an example of what such a system looks like, consider a programming tool that acts like an array, but whose size can be established at runtime, when it is created. I'll call it a CStash. Although it's written in C++, it has the style of what you'd write in C:

//: C04:CLib.h
// Header file for a C-like library
// An array-like entity created at runtime

typedef struct CStashTag {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
} CStash;

void initialize(CStash* s, int size);
void cleanup(CStash* s);
int add(CStash* s, const void* element);
void* fetch(CStash* s, int index);
int count(CStash* s);
void inflate(CStash* s, int increase);
///:~

A tag name like CStashTag is generally used for a struct in case you need to reference the struct inside itself. For example, when creating a linked list (each element in your list contains a pointer to the next element), you need a pointer to the next struct variable, so you need a way to identify the type of that pointer within the struct body. Also, you'll almost universally see the typedef as shown above for every struct in a C library. This is done so you can treat the struct as if it were a new type and define variables of that struct like this:

CStash A, B, C;
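The self-reference case is easiest to see in a small linked list. The following is a minimal sketch in the same C style, not code from the CStash library; NodeTag, Node, and sumList( ) are hypothetical names. Inside the braces the typedef name does not exist yet, so the member pointer has to be spelled with the tag:

```cpp
#include <cassert>

// Inside the braces the typedef name Node is not yet declared,
// so the pointer to the next element must use the tag: struct NodeTag*.
typedef struct NodeTag {
  int value;
  struct NodeTag* next;  // Next element in the list (0 ends the list)
} Node;

// Walk the list by following each element's next pointer.
int sumList(const Node* head) {
  int sum = 0;
  for (const Node* p = head; p != 0; p = p->next)
    sum += p->value;
  return sum;
}
```

With three nodes a, b, and c holding 1, 2, and 3 and linked in that order, sumList(&a) returns 6.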

The storage pointer is an unsigned char*. An unsigned char is the smallest piece of storage a C compiler supports. By definition it occupies exactly one byte (sizeof(unsigned char) is always 1), but the number of bits in a byte is implementation dependent: it is at least 8 (the value of CHAR_BIT), and on some machines a byte is as wide as the larger integer types. You might think that because the CStash is designed to hold any type of variable, a void* would be more appropriate here. However, the purpose is not to treat this storage as a block of some unknown type, but rather as a block of contiguous bytes.
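To make the byte view concrete, here is a minimal sketch (C++11, for static_assert); byteAt( ) is a hypothetical helper, not part of CLib. It shows why unsigned char* is used: a void* cannot be indexed, because void has no size, while an unsigned char* steps through memory exactly one byte at a time:

```cpp
#include <cassert>
#include <climits>

// sizeof(unsigned char) is 1 by definition; CHAR_BIT (at least 8)
// gives the number of bits in that one byte.
static_assert(sizeof(unsigned char) == 1, "unsigned char is always one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// void has no size, so a void* cannot be indexed or incremented.
// Viewing the block through unsigned char* makes each byte addressable.
unsigned char byteAt(const void* block, int i) {
  const unsigned char* bytes = static_cast<const unsigned char*>(block);
  return bytes[i];  // The i-th byte of the block
}
```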

The source code for the implementation file (which you may not get if you buy a library commercially; you might get only a compiled .obj, .lib, or .dll file, etc.) looks like this:

//: C04:CLib.cpp {O}
// Implementation of example C-like library
// Declare structure and functions:
#include "CLib.h"
#include <iostream>
#include <cassert>
using namespace std;

// Quantity of elements to add
// when increasing storage:
const int increment = 100;

void initialize(CStash* s, int sz) {
  s->size = sz;
  s->quantity = 0;
  s->storage = 0;
  s->next = 0;
}

int add(CStash* s, const void* element) {
  if(s->next >= s->quantity) // Enough space left?
    inflate(s, increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = s->next * s->size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < s->size; i++)
    s->storage[startBytes + i] = e[i];
  s->next++;
  return(s->next - 1); // Index number
}

void* fetch(CStash* s, int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= s->next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(s->storage[index * s->size]);
}

int count(CStash* s) {
  return s->next; // Elements in CStash
}

void inflate(CStash* s, int increase) {
  assert(increase > 0);
  int newQuantity = s->quantity + increase;
  int newBytes = newQuantity * s->size;
  int oldBytes = s->quantity * s->size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = s->storage[i]; // Copy old to new
  delete [](s->storage); // Old storage
  s->storage = b; // Point to new memory
  s->quantity = newQuantity;
}

void cleanup(CStash* s) {
  if(s->storage != 0) {
    cout << "freeing storage" << endl;
    delete []s->storage;
  }
} ///:~

initialize( ) performs the necessary setup for struct CStash by setting the internal variables to appropriate values. Initially, the storage pointer is set to zero; no initial storage is allocated.

The add( ) function inserts an element into the CStash at the next available location. First, it checks to see if there is any available space left. If not, it expands the storage using the inflate( ) function, described later.

Because the compiler doesn't know the specific type of the variable being stored (all the function gets is a const void*), you can't just do an assignment, which would certainly be the convenient thing. Instead, you must copy the variable byte-by-byte. The most straightforward way to perform the copying is with array indexing. Typically, there are already data bytes in storage, and this is indicated by the value of next. To start with the right byte offset, next is multiplied by the size of each element (in bytes) to produce startBytes. Then the argument element is cast to an unsigned char* so that it can be addressed byte-by-byte and copied into the available storage space. next is incremented so that it indicates the next available piece of storage, and the "index number" where the value was stored is returned, so that the value can be retrieved using this index number with fetch( ).
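The offset arithmetic and byte loop in add( ) can be isolated in a small sketch. copyIn( ) and copyInMemcpy( ) are hypothetical helpers, not part of the CStash code; the second swaps in the standard library's std::memcpy( ) from <cstring>, which performs the same byte copy and is what you would typically reach for in practice:

```cpp
#include <cassert>
#include <cstring>

// Mirrors add(): copy one element of `size` bytes into slot `index`
// of a byte buffer, using the same offset arithmetic.
void copyIn(unsigned char* storage, int index, int size, const void* element) {
  int startBytes = index * size;  // Byte offset of the slot
  const unsigned char* e = static_cast<const unsigned char*>(element);
  for (int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
}

// The same copy written with the standard library's std::memcpy.
void copyInMemcpy(unsigned char* storage, int index, int size, const void* element) {
  std::memcpy(storage + index * size, element, static_cast<std::size_t>(size));
}
```

Both functions leave identical bytes in the buffer, so a double copied into slot 1 reads back unchanged from offset sizeof(double).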

fetch( ) checks to see that the index isn't out of bounds and then returns the address of the desired variable, calculated using the index argument. Since index indicates the number of elements to offset into the CStash, it must be multiplied by the number of bytes occupied by each piece to produce the numerical offset in bytes. When this offset is used to index into storage using array indexing, you don't get the address, but instead the byte at the address. To produce the address, you must use the address-of operator &.
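A minimal sketch of fetch( )'s address calculation on its own; elementAddress( ) is a hypothetical helper. Indexing yields the byte itself, and the & turns that back into an address, which is the same pointer as storage + index * size:

```cpp
#include <cassert>

// storage[index * size] names the first byte of the element (a value);
// applying & to it yields that byte's address, which is what fetch() returns.
void* elementAddress(unsigned char* storage, int index, int size) {
  return &storage[index * size];  // Equivalent to storage + index * size
}
```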

count( ) may look a bit strange at first to a seasoned C programmer. It seems like a lot of trouble to go through to do something that would probably be a lot easier to do by hand. If you have a struct CStash called intStash, for example, it would seem much more straightforward to find out how many elements it has by saying intStash.next instead of making a function call (which has overhead), such as count(&intStash). However, if you wanted to change the internal representation of CStash and thus the way the count was calculated, the function call interface allows the necessary flexibility. But alas, most programmers won't bother to find out about your "better" design for the library. They'll look at the struct and grab the next value directly, and possibly even change next without your permission. If only there were some way for the library designer to have better control over things like this! (Yes, that's foreshadowing.)
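To see the flexibility count( ) buys, consider a hypothetical alternative representation that stores a byte count instead of next. This is only an illustrative sketch (AltStash is not part of the library): callers that use count( ) compile and behave the same, whereas code that reached in for a next member would no longer compile:

```cpp
#include <cassert>

// Hypothetical alternative representation: records how many bytes are
// in use instead of the index of the next empty space.
typedef struct AltStashTag {
  int size;       // Size of each space
  int bytesUsed;  // Bytes occupied by stored elements
} AltStash;

// The element count is now derived, but the interface is unchanged.
int count(AltStash* s) {
  return s->bytesUsed / s->size;
}
```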

Dynamic storage allocation

You never know the maximum amount of storage you might need for a CStash, so the memory pointed to by storage is allocated from the heap. The heap is a big block of memory used for allocating smaller pieces at runtime. You use the heap when you don't know the size of the memory you'll need while you're writing a program. That is, only at runtime will you find out that you need space to hold 200 Airplane variables instead of 20. In Standard C, dynamic-memory allocation functions include malloc( ), calloc( ), realloc( ), and free( ). Instead of library calls, however, C++ has a more sophisticated (albeit simpler to use) approach to dynamic memory that is integrated into the language via the keywords new and delete.
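A minimal sketch contrasting the two approaches; allocateBothWays( ) is a hypothetical helper. Note the explicit cast on malloc( )'s void* result, which C++ requires, and that each allocation is paired with its matching release (malloc( ) with free( ), new[] with delete[]):

```cpp
#include <cassert>
#include <cstdlib>

// Allocate n ints both ways, fill each with 0..n-1,
// release them, and return the sum of all elements.
int allocateBothWays(int n) {
  // C library: malloc() returns void*, which C++ requires you to cast.
  int* a = static_cast<int*>(std::malloc(n * sizeof(int)));
  if (a == 0) return -1;  // malloc() reports failure with a null pointer
  // C++: new[] returns int* directly (and throws std::bad_alloc on failure).
  int* b = new int[n];
  int sum = 0;
  for (int i = 0; i < n; i++) {
    a[i] = i;
    b[i] = i;
    sum += a[i] + b[i];
  }
  std::free(a);  // Memory from malloc() goes back through free()
  delete[] b;    // Memory from new[] goes back through delete[]
  return sum;
}
```

allocateBothWays(4) fills both arrays with 0, 1, 2, 3 and returns 12.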

The inflate( ) function uses new to get a bigger chunk of space for the CStash. In this situation, we will only expand memory and not shrink it, and the assert( ) checks (unless assertions are disabled by defining NDEBUG) that a zero or negative number is not passed to inflate( ) as the increase value. The new number of elements that can be held (after inflate( ) completes) is calculated as newQuantity, and this is multiplied by the number of bytes per element to produce newBytes, which will be the number of bytes in the allocation. So that we know how many bytes to copy over from the old location, oldBytes is calculated using the old quantity.
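The three sizes inflate( ) computes can be checked with a small worked example. Growth and planGrowth( ) are hypothetical names for this sketch; the arithmetic is the same as in inflate( ):

```cpp
#include <cassert>

// The three quantities inflate() computes before allocating.
struct Growth {
  int newQuantity;  // Spaces after growing
  int newBytes;     // Bytes to request with new
  int oldBytes;     // Bytes to copy from the old block
};

Growth planGrowth(int quantity, int size, int increase) {
  assert(increase > 0);  // inflate() only ever grows the storage
  Growth g;
  g.newQuantity = quantity + increase;
  g.newBytes = g.newQuantity * size;
  g.oldBytes = quantity * size;
  return g;
}
```

For instance, with 4-byte elements (sizeof(int) is commonly, but not necessarily, 4) and 100 existing spaces, growing by increment = 100 gives newQuantity = 200, newBytes = 800, and oldBytes = 400.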

The actual storage allocation occurs in the new-expression, which is the expression involving the new keyword:

new unsigned char[newBytes];

The general form of the new-expression is:

new Type;

in which Type describes the type of variable you want allocated on the heap. In this case, we want an array of unsigned char that is newBytes long, so that is what appears as the Type. You can also allocate something as simple as an int by saying:

new int;

and although this is rarely done, you can see that the form is consistent.

A new-expression returns a pointer to an object of the exact type that you asked for. So if you say new Type, you get back a pointer to a Type. If you say new int, you get back a pointer to an int. If you want a new unsigned char array, you get back a pointer to the first element of that array. The compiler will ensure that you assign the return value of the new-expression to a pointer of the correct type.
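The claim that a new-expression yields a pointer to exactly the requested type can be checked by the compiler. This sketch (C++11, for static_assert and decltype) is illustrative; newExpressionDemo( ) is a hypothetical helper. The decltype operand is never evaluated, so the static_asserts allocate nothing:

```cpp
#include <cassert>
#include <type_traits>

// Checked at compile time: each new-expression has pointer-to-Type type.
static_assert(std::is_same<decltype(new int), int*>::value,
              "new int yields int*");
static_assert(std::is_same<decltype(new unsigned char[10]), unsigned char*>::value,
              "the array form yields a pointer to the first element");

// Allocates with both forms, uses them, and releases them.
int newExpressionDemo() {
  int* ip = new int;                             // new Type
  *ip = 7;
  unsigned char* bytes = new unsigned char[10];  // new Type[count]
  bytes[9] = 3;
  int result = *ip + bytes[9];
  delete ip;       // Single object: plain delete
  delete[] bytes;  // Array: delete[]
  return result;
}
```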

Of course, any time you request memory it's possible for the request to fail, if there is no more memory. As you will learn, C++ has mechanisms that come into play if the memory-allocation operation is unsuccessful.
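As a preview, and only as a sketch: in Standard C++ a plain new-expression reports failure by throwing std::bad_alloc, and the std::nothrow form from <new> returns a null pointer instead. allocateOrReport( ) and tryAllocate( ) are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Plain new reports failure by throwing std::bad_alloc;
// this helper turns that into a true/false result.
bool allocateOrReport(std::size_t n, unsigned char*& out) {
  try {
    out = new unsigned char[n];
    return true;
  } catch (const std::bad_alloc&) {
    out = 0;
    return false;
  }
}

// The nothrow form returns a null pointer on failure instead of throwing.
unsigned char* tryAllocate(std::size_t n) {
  return new (std::nothrow) unsigned char[n];
}
```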

Once the new storage is allocated, the data in the old storage must be copied to the new storage; this is again accomplished with array indexing, copying one byte at a time in a loop. After the data is copied, the old storage must be released so that it can be used by other parts of the program if they need new storage. The delete keyword is the complement of new, and must be applied to release any storage that is allocated with new (if you forget to use delete, that storage remains unavailable, and if this so-called memory leak happens enough, you'll run out of memory). In addition, there's a special syntax when you're deleting an array. It's as if you must remind the compiler that this pointer is not just pointing to one object, but to an array of objects: you put a set of empty square brackets in front of the pointer to be deleted:

delete []myArray;

Once the old storage has been deleted, the pointer to the new storage can be assigned to the storage pointer, the quantity is adjusted, and inflate( ) has completed its job.
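The allocate, copy, and release steps of inflate( ) can be pulled out into one hypothetical helper, growBytes( ), as a sketch. It also works for the very first call, when storage is still zero, because applying delete[] to a null pointer does nothing:

```cpp
#include <cassert>

// Allocate a larger block, copy the old bytes across,
// release the old block, and return the new one.
unsigned char* growBytes(unsigned char* old, int oldBytes, int newBytes) {
  assert(newBytes > oldBytes);
  unsigned char* b = new unsigned char[newBytes];
  for (int i = 0; i < oldBytes; i++)
    b[i] = old[i];  // Copy old to new
  delete[] old;     // Array form of delete; a null pointer is ignored
  return b;         // Caller keeps this, as inflate() does with storage = b
}
```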

Note that the heap manager is fairly primitive. It gives you chunks of memory and takes them back when you delete them. There's no inherent facility for heap compaction, which compresses the heap to provide bigger free chunks. If a program allocates and frees heap storage for a while, you can end up with a fragmented heap that has lots of memory free, but without any pieces that are big enough to allocate the size you're looking for at the moment. A heap compactor complicates a program because it moves memory chunks around, so your pointers won't retain their proper values. Some operating environments have heap compaction built in, but they require you to use special memory handles (which can be temporarily converted to pointers, after locking the memory so the heap compactor can't move it) instead of pointers. You can also build your own heap-compaction scheme, but this is not a task to be undertaken lightly.

When you create a variable on the stack, the storage for that variable is automatically created and freed for you. The compiler knows at compile time exactly how much storage is needed, and it knows the lifetime of the variable because of scoping. With dynamic memory allocation, however, the compiler doesn't know how much storage you're going to need, and it doesn't know the lifetime of that storage. That is, the storage doesn't get cleaned up automatically. Therefore, you're responsible for releasing the storage using delete, which tells the heap manager that the storage can be used by the next call to new. The logical place for this to happen in the library is in the cleanup( ) function, because that is where all the closing-up housekeeping is done.
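A small sketch of the lifetime difference; makeHeapInt( ) is a hypothetical helper. The local variable disappears when the function returns, while the int created with new survives until the caller deletes it:

```cpp
#include <cassert>

// A local's storage ends with its scope; heap storage lives until deleted.
int* makeHeapInt(int value) {
  int local = value;        // Freed automatically when the function returns
  int* p = new int(local);  // Still valid after the function returns
  return p;                 // Returning &local instead would dangle
}
```

The caller owns the returned pointer and must eventually say delete p;, which is the same responsibility cleanup( ) takes on for the CStash storage.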

To test the library, two CStashes are created. The first holds ints and the second holds arrays of 80 chars:

//: C04:CLibTest.cpp
//{L} CLib
// Test the C-like library
#include "CLib.h"
#include <fstream>
#include <iostream>
#include <string>
#include <cstring>
#include <cassert>
using namespace std;

int main() {
  // Define variables at the beginning
  // of the block, as in C:
  CStash intStash, stringStash;
  int i;
  char* cp;
  ifstream in;
  string line;
  const int bufsize = 80;
  // Now remember to initialize the variables:
  initialize(&intStash, sizeof(int));
  for(i = 0; i < 100; i++)
    add(&intStash, &i);
  for(i = 0; i < count(&intStash); i++)
    cout << "fetch(&intStash, " << i << ") = "
         << *(int*)fetch(&intStash, i)
         << endl;
  // Holds 80-character strings:
  initialize(&stringStash, sizeof(char)*bufsize);
  in.open("CLibTest.cpp");
  assert(in);
  while(getline(in, line)) {
    // add() always copies bufsize bytes, so copy each line into a
    // zero-filled buffer first: this avoids reading past the end of
    // short lines and keeps truncated long lines null-terminated.
    char buf[bufsize] = {0};
    strncpy(buf, line.c_str(), bufsize - 1);
    add(&stringStash, buf);
  }
  i = 0;
  while((cp = (char*)fetch(&stringStash, i++)) != 0)
    cout << "fetch(&stringStash, " << i - 1 << ") = "
         << cp << endl;
  cleanup(&intStash);
  cleanup(&stringStash);
} ///:~

Following the form required by C (before C99), all the variables are created at the beginning of the scope of main( ). Of course, you must remember to initialize the CStash variables later in the block by calling initialize( ). One of the problems with C libraries is that you must carefully convey to the user the importance of the initialization and cleanup functions. If these functions aren't called, there will be a lot of trouble. Unfortunately, the user doesn't always wonder if initialization and cleanup are mandatory. They know what they want to accomplish, and they're not as concerned about you jumping up and down saying, "Hey, wait, you have to do this first!" Some users have even been known to initialize the elements of a structure themselves. There's certainly no mechanism in C to prevent it (more foreshadowing).

The intStash is filled up with integers, and the stringStash is filled with character arrays. These character arrays are produced by opening the source code file, CLibTest.cpp, and reading the lines from it into a string called line, and then producing a pointer to the character representation of line using the member function c_str( ).

After each Stash is loaded, it is displayed. The intStash is printed using a for loop, which uses count( ) to establish its limit. The stringStash is printed with a while, which breaks out when fetch( ) returns zero to indicate it is out of bounds.

You'll also notice an additional cast in

cp = (char*)fetch(&stringStash,i++)

This is due to the stricter type checking in C++, which does not allow you to simply assign a void* to any other type (C allows this).

Bad guesses

There is one more important issue you should understand before we look at the general problems in creating a C library. Note that the CLib.h header file must be included in any file that refers to CStash, because the compiler can't even guess at what that structure looks like. However, it can guess at what a function looks like; this sounds like a feature, but it turns out to be a major C pitfall.

Although you should always declare functions by including a header file, function declarations aren't essential in C. It's possible in C89 (but not in C++, and not in C99 or later, which removed implicit function declarations) to call a function that you haven't declared. A good compiler will warn you that you probably ought to declare a function first, but the C89 standard doesn't enforce it. This is a dangerous practice, because the C compiler can assume that a function that you call with an int argument has an argument list containing int, even if it may actually contain a float. This can produce bugs that are very difficult to find, as you will see.

Each separate C implementation file (with an extension of .c) is a translation unit. That is, the compiler is run separately on each translation unit, and when it is running it is aware of only that unit. Thus, any information you provide by including header files is quite important, because it determines the compiler's understanding of the rest of your program. Declarations in header files are particularly important, because everywhere the header is included, the compiler will know exactly what to do. If, for example, you have a declaration in a header file that says void func(float), the compiler knows that if you call that function with an integer argument, it should convert the int to a float as it passes the argument (an implicit arithmetic conversion, often loosely called promotion). Without the declaration, the C compiler would simply assume that a function func(int) existed, it wouldn't do the conversion, and the wrong data would quietly be passed into func( ).
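A short C++ sketch of what the declaration buys you. Because C++ requires a declaration before the call, the int argument below is always converted before func( ) runs; func( ) is the hypothetical function from the text, given a body here only so the example runs:

```cpp
#include <cassert>

// With this declaration visible at the call site, func(3)
// converts the int 3 to 3.0f before the call.
float func(float f) {
  return f / 2;  // Floating-point division, not integer division
}
```

Here func(3) returns 1.5f: the 3 arrives as 3.0f, so the division is done in floating point.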

For each translation unit, the compiler creates an object file, with an extension of .o or .obj or something similar. These object files, along with the necessary start-up code, must be collected by the linker into the executable program. During linking, all the external references must be resolved. For example, in CLibTest.cpp, functions such as initialize( ) and fetch( ) are declared (that is, the compiler is told what they look like) and used, but not defined. They are defined elsewhere, in CLib.cpp. Thus, the calls in CLibTest.cpp are external references. The linker must, when it puts all the object files together, take the unresolved external references and find the addresses they actually refer to. Those addresses are put into the executable program to replace the external references.

It's important to realize that in C, the external references that the linker searches for are simply function names, generally with an underscore in front of them. So all the linker has to do is match up the function name where it is called and the function body in the object file, and it's done. If you accidentally made a call that the compiler interpreted as func(int) and there's a function body for func(float) in some other object file, the linker will see _func in one place and _func in another, and it will think everything's OK. The func( ) at the calling location will push an int onto the stack, and the func( ) function body will expect a float to be on the stack (this assumes a calling convention that passes arguments on the stack; with register-based conventions the details differ, but the function body still reads the wrong data). If the function only reads the value and doesn't write to it, it won't blow up the stack. In fact, the float value it reads off the stack might even make some kind of sense. That's worse, because it's harder to find the bug.

What's wrong?

We are remarkably adaptable, even in situations in which perhaps we shouldn't adapt. The style of the CStash library has been a staple for C programmers, but if you look at it for a while, you might notice that it's rather . . . awkward. When you use it, you have to pass the address of the structure to every single function in the library. When reading the code, the mechanism of the library gets mixed with the meaning of the function calls, which is confusing when you're trying to understand what's going on.

One of the biggest obstacles, however, to using libraries in C is the problem of name clashes. C has a single name space for functions; that is, when the linker looks for a function name, it looks in a single master list. In addition, when the compiler is working on a translation unit, it can work only with a single function with a given name.

Now suppose you decide to buy two libraries from two different vendors, and each library has a structure that must be initialized and cleaned up. Both vendors decided that initialize( ) and cleanup( ) are good names. If you include both their header files in a single translation unit, what does the C compiler do? Fortunately, C gives you an error, telling you there's a type mismatch in the two different argument lists of the declared functions. But even if you don't include them in the same translation unit, the linker will still have problems. A good linker will detect that there's a name clash, but some linkers take the first function name they find, by searching through the list of object files in the order you give them in the link list. (This can even be thought of as a feature because it allows you to replace a library function with your own version.)

In either event, you can't use two C libraries that contain a function with the identical name. To solve this problem, C library vendors will often prepend a sequence of unique characters to the beginning of all their function names. So initialize( ) and cleanup( ) might become CStash_initialize( ) and CStash_cleanup( ). This is a logical thing to do because it "decorates" the name of the function with the name of the struct it works on.
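A minimal sketch of the prefixing convention; Widget, Gadget, and their functions are hypothetical stand-ins for two vendors' libraries. Because each function name carries its struct's name, both initialization functions can live in one program:

```cpp
#include <cassert>

// Two hypothetical C-style "libraries" that both need an initialize().
typedef struct { int count; } Widget;
typedef struct { double level; } Gadget;

// Prefixing with the struct name keeps the global names distinct.
void Widget_initialize(Widget* w) { w->count = 0; }
void Gadget_initialize(Gadget* g) { g->level = 1.0; }
```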

Now it's time to take the first step toward creating classes in C++. Variable names inside a struct do not clash with global variable names. So why not take advantage of this for function names, when those functions operate on a particular struct? That is, why not make functions members of structs?

The basic object

Step one is exactly that. C++ functions can be placed inside structs as "member functions." Here's what it looks like after converting the C version of CStash to the C++ Stash:

//: C04:CppLib.h
// C-like library converted to C++

struct Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  // Functions!
  void initialize(int size);
  void cleanup();
  int add(const void* element);
  void* fetch(int index);
  int count();
  void inflate(int increase);
}; ///:~

First, notice there is no typedef. Instead of requiring you to create a typedef, the C++ compiler turns the name of the structure into a new type name for the program (just as int, char, float, and double are type names).

All the data members are exactly the same as before, but now the functions are inside the body of the struct. In addition, notice that the first argument from the C version of the library has been removed. In C++, instead of forcing you to pass the address of the structure as the first argument to all the functions that operate on that structure, the compiler secretly does this for you. Now the only arguments for the functions are concerned with what the function does, not the mechanism of the function's operation.
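The hidden argument can be made visible by writing the same operation both ways. Counter, bump( ), and Counter_bump( ) are hypothetical names for this sketch; the member version receives the object's address implicitly, while the C-style version takes it explicitly:

```cpp
#include <cassert>

struct Counter {
  int n;
  void bump(int by);  // The object's address is passed implicitly
};

// Inside the member function, n means the n of the object it was called for.
void Counter::bump(int by) { n += by; }

// C-library style: the address is an explicit first argument.
void Counter_bump(Counter* self, int by) { self->n += by; }
```

Calling c.bump(2) and then Counter_bump(&c, 3) on the same object leaves c.n at 5; both functions modify the object whose address they were given.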

It's important to realize that the function code is effectively the same as it was with the C version of the library. The number of arguments is the same (even though you don't see the structure address being passed in, it's still there), and there's only one function body for each function. That is, just because you say

Stash A, B, C;

doesn't mean you get a different add( ) function for each variable. So the code that's generated is almost identical to what you would have written for the C version of the library. Interestingly enough, this includes the "name decoration" you probably would have done to produce Stash_initialize( ), Stash_cleanup( ), and so on. When the function name is inside the struct, the compiler effectively does the same thing. Therefore, initialize( ) inside the structure Stash will not collide with a function named initialize( ) inside any other structure, or even a global function named initialize( ). Most of the time you don't have to worry about the function name decoration; you use the undecorated name. But sometimes you do need to be able to specify that this initialize( ) belongs to the struct Stash, and not to any other struct. In particular, when you're defining the function you need to fully specify which one it is. To accomplish this full specification, C++ has an operator (::) called the scope resolution operator (named so because names can now be in different scopes: at global scope or within the scope of a struct). For example, if you want to specify initialize( ), which belongs to Stash, you say Stash::initialize(int size). You can see how the scope resolution operator is used in the function definitions:

//: C04:CppLib.cpp {O}
// C library converted to C++
// Declare structure and functions:
#include "CppLib.h"
#include <iostream>
#include <cassert>
using namespace std;

// Quantity of elements to add
// when increasing storage:
const int increment = 100;

void Stash::initialize(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}

int Stash::add(const void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}

void* Stash::fetch(int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}

int Stash::count() {
  return next; // Number of elements in Stash
}

void Stash::inflate(int increase) {
  assert(increase > 0);
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete []storage; // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}

void Stash::cleanup() {
  if(storage != 0) {
    cout << "freeing storage" << endl;
    delete []storage;
  }
} ///:~
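To see that member names don't collide, here is a hypothetical sketch in which two structs each declare initialize( ) and a global initialize( ) also exists. Stack and Queue are illustrative names only; the scope resolution operator in each definition says which initialize( ) is being defined:

```cpp
#include <cassert>

// Each struct declares its own member named initialize().
struct Stack { int top; void initialize(); };
struct Queue { int head; void initialize(); };

// Scope resolution selects which initialize() each definition belongs to.
void Stack::initialize() { top = 0; }
void Queue::initialize() { head = 0; }

// A global function with the same name coexists with both members.
int initialize(int seed) { return seed * 2; }
```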

There are several other things that are different between C and C++. First, the declarations in the header files are required by the compiler. In C++ you cannot call a function without declaring it first. The compiler will issue an error message otherwise. This is an important way to ensure that function calls are consistent between the point where they are called and the point where they are defined. By forcing you to declare the function before you call it, the C++ compiler virtually ensures that you will perform this declaration by including the header file. If you also include the same header file in the place where the functions are defined, then the compiler checks to make sure that the declaration in the header and the function definition match up. This means that the header file becomes a validated repository for function declarations and ensures that functions are used consistently throughout all translation units in the project.

Of course, global functions can still be declared by hand every place where they are defined and used. (This is so tedious that it becomes very unlikely.) However, structures must always be declared before they are defined or used, and the most convenient place to put a structure definition is in a header file, except for those you intentionally hide in a file.

You can see that all the member functions look almost the same as when they were C functions, except for the scope resolution and the fact that the first argument from the C version of the library is no longer explicit. It's still there, of course, because the function has to be able to work on a particular struct variable. But notice, inside the member function, that the member selection is also gone! Thus, instead of saying s->size = sz; you say size = sz; and eliminate the tedious s->, which didn't really add anything to the meaning of what you were doing anyway. The C++ compiler is apparently doing this for you. Indeed, it is taking the "secret" first argument (the address of the structure that we were previously passing in by hand) and applying the member selector whenever you refer to one of the data members of a struct. This means that whenever you are inside a member function of a struct, you can refer to any member (including another member function) by simply giving its name. The compiler will search through the local structure's names before looking for a global version of that name. You'll find that this feature means that not only is your code easier to write, it's a lot easier to read.
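A small sketch of that lookup order; Gauge and the global limit are hypothetical. Inside the member functions, the plain name refers to the member; the scope resolution operator with nothing in front of it (::limit) reaches the global instead:

```cpp
#include <cassert>

int limit = 999;  // A global that shares its name with a member

struct Gauge {
  int limit;
  // The struct's own names are searched first: this assigns the member.
  void initialize(int sz) { limit = sz; }
  // A leading :: explicitly selects the global name.
  int globalLimit() { return ::limit; }
};
```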

But what if, for some reason, you want to be able to get your hands

on the address of the structure? In the C version of the library it

was easy because each function's first argument was a CStash*

called s. In C++, things are even more consistent. There's a special keyword, called this, which produces the address of the struct for which the member function was called. It's the equivalent of the 's' in the C version of the library. So we can revert to the C style of things by saying

this->size = sz;

The code generated by the compiler is exactly the same, so you don't need to use this in such a fashion; occasionally, you'll see code where people explicitly use this-> everywhere, but it doesn't add anything to the meaning of the code and often indicates an inexperienced programmer. You won't need this often, but when you need it, it's there (some of the examples later in the book will use this).

There's one last item to mention. In C, you could assign a void* to

any other pointer like this:

int i = 10;

void* vp = &i; // OK in both C and C++

int* ip = vp; // Only acceptable in C

and there was no complaint from the compiler. But in C++, this

statement is not allowed. Why? Because C is not so particular about

type information, so it allows you to assign a pointer with an

unspecified type to a pointer with a specified type. Not so with

C++. Type is critical in C++, and the compiler stamps its foot when

there are any violations of type information. This has always been

important, but it is especially important in C++ because you have

member functions in structs. If you could pass pointers to structs

around with impunity in C++, then you could end up calling a

member function for a struct that doesn't even logically exist for

that struct! A real recipe for disaster. Therefore, while C++ allows

the assignment of any type of pointer to a void* (this was the

original intent of void*, which is required to be large enough to

hold a pointer to any object type), it will not allow you to assign a void

pointer to any other type of pointer. A cast is always required to tell

the reader and the compiler that you really do want to treat it as the

destination type.

Thinking in C++

This brings up an interesting issue. One of the important goals for

C++ is to compile as much existing C code as possible to allow for

an easy transition to the new language. However, this doesn't mean

any code that C allows will automatically be allowed in C++. There

are a number of things the C compiler lets you get away with that

are dangerous and error-prone. (We'll look at them as the book

progresses.) The C++ compiler generates warnings and errors for

these situations. This is often much more of an advantage than a

hindrance. In fact, there are many situations in which you are

trying to run down an error in C and just can't find it, but as soon

as you recompile the program in C++, the compiler points out the

problem! In C, you'll often find that you can get the program to

compile, but then you have to get it to work. In C++, when the

program compiles correctly, it often works, too! This is because the

language is a lot stricter about type.

You can see a number of new things in the way the C++ version of

Stash is used in the following test program:

//: C04:CppLibTest.cpp
//{L} CppLib
// Test of C++ library
#include "CppLib.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
  Stash intStash;
  intStash.initialize(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  // Holds 80-character strings:
  Stash stringStash;
  const int bufsize = 80;
  stringStash.initialize(sizeof(char) * bufsize);
  ifstream in("CppLibTest.cpp");
  assure(in, "CppLibTest.cpp");
  string line;
  while(getline(in, line)) {
    // add() copies a full bufsize bytes, so copy each line into a
    // zero-filled buffer first: this avoids reading past the end of a
    // short string and keeps long lines (truncated) null-terminated.
    char buf[bufsize] = {0};
    line.copy(buf, bufsize - 1);
    stringStash.add(buf);
  }
  int k = 0;
  char* cp;
  while((cp = (char*)stringStash.fetch(k++)) != 0)
    cout << "stringStash.fetch(" << k << ") = "
         << cp << endl;
  intStash.cleanup();
  stringStash.cleanup();
} ///:~

One thing you'll notice is that the variables are all defined "on the fly" (as introduced in the previous chapter). That is, they are defined at any point in the scope, rather than being restricted to the beginning of the scope as in C (before C99).

The code is quite similar to CLibTest.cpp, but when a member function is called, the call occurs using the member selection operator '.' preceded by the name of the variable. This is a convenient syntax because it mimics the selection of a data member of the structure. The difference is that this is a member function, so it has an argument list.

Of course, the call that the compiler actually generates looks much more like the original C library function. Thus, considering name decoration and the passing of this, the C++ function call

intStash.initialize(sizeof(int))

becomes something like

Stash_initialize(&intStash, sizeof(int))

If you ever wonder what's going on underneath the covers, remember that the original C++ compiler, cfront from AT&T, produced C code as its output, which was then compiled by the underlying C compiler. This approach meant that cfront could be quickly ported to any machine that had a C compiler, and it helped to rapidly disseminate C++ compiler technology. But because the C++ compiler had to generate C, you know that there must be some way to represent C++ syntax in C (some compilers still allow you to produce C code).

There's one other change from CLibTest.cpp, which is the introduction of the require.h header file. This is a header file that I created for this book to perform more sophisticated error checking than that provided by assert( ). It contains several functions, including the one used here called assure( ), which is used for files. This function checks to see if the file has successfully been opened, and if not it reports to standard error that the file could not be opened (thus it needs the name of the file as the second argument) and exits the program. The require.h functions will be used throughout the book, in particular to ensure that there are the right number of command-line arguments and that files are opened properly. The require.h functions replace repetitive and distracting error-checking code, and yet they still provide useful error messages. These functions will be fully explained later in the book.

What's an object?

Now that you've seen an initial example, it's time to step back and

take a look at some terminology. The act of bringing functions

inside structures is the root of what C++ adds to C, and it

introduces a new way of thinking about structures: as concepts. In

C, a struct is an agglomeration of data, a way to package data so

you can treat it in a clump. But it's hard to think about it as

anything but a programming convenience. The functions that

operate on those structures are elsewhere. However, with functions

in the package, the structure becomes a new creature, capable of

describing both characteristics (like a C struct does) and behaviors.

The concept of an object, a free-standing, bounded entity that can

remember and act, suggests itself.

In C++, an object is just a variable, and the purest definition is "a

region of storage" (this is a more specific way of saying, "an object

must have a unique identifier," which in the case of C++ is a

unique memory address). It's a place where you can store data, and

it's implied that there are also operations that can be performed on

this data.

Unfortunately, there isn't complete consistency across languages when it comes to these terms, although they are fairly well-accepted. You will also sometimes encounter disagreement about

what an object-oriented language is, although that seems to be

reasonably well sorted out by now. There are languages that are

object-based, which means that they have objects like the C++

structures-with-functions that you've seen so far. This, however, is

only part of the picture when it comes to an object-oriented

language, and languages that stop at packaging functions inside

data structures are object-based, not object-oriented.

Abstract data typing

The ability to package data with functions allows you to create a new data type. This is often called encapsulation¹. An existing data type may have several pieces of data packaged together. For example, a float has an exponent, a mantissa, and a sign bit. You can tell it to do things: add to another float or to an int, and so on. It has characteristics and behavior.

The definition of Stash creates a new data type. You can add( ), fetch( ), and inflate( ). You create one by saying Stash s, just as you create a float by saying float f. A Stash also has characteristics and behavior. Even though it acts like a real, built-in data type, we refer to it as an abstract data type, perhaps because it allows us to abstract a concept from the problem space into the solution space. In addition, the C++ compiler treats it like a new data type, and if you say a function expects a Stash, the compiler makes sure you pass a Stash to that function. So the same level of type checking happens with abstract data types (sometimes called user-defined types) as with built-in types.

1 This term can cause debate. Some people use it as defined here; others use it to describe access control, discussed in the following chapter.

You can immediately see a difference, however, in the way you perform operations on objects. You say object.memberFunction(arglist). This is "calling a member function for an object." But in object-oriented parlance, this is also referred to as "sending a message to an object." So for a Stash s, the statement s.add(&i) "sends a message to s" saying, "add( ) this to yourself." In fact, object-oriented programming can be summed up in a single phrase: sending messages to objects. Really, that's all you do: create a bunch of objects and send messages to them. The trick, of course, is figuring out what your objects and messages are, but once you accomplish this the implementation in C++ is surprisingly straightforward.

Object details

A question that often comes up in seminars is, "How big is an

object, and what does it look like?" The answer is "about what you

expect from a C struct." In fact, the code the C compiler produces

for a C struct (with no C++ adornments) will usually look exactly

the same as the code produced by a C++ compiler. This is

reassuring to those C programmers who depend on the details of

size and layout in their code, and for some reason directly access

structure bytes instead of using identifiers (relying on a particular

size and layout for a structure is a nonportable activity).

The size of a struct is at least the combined size of all of its members. Sometimes when the compiler lays out a struct, it adds extra bytes (padding) so that members fall on suitable alignment boundaries; this may increase execution efficiency, and on some hardware it is required. In Chapter 15, you'll see how in some cases "secret" pointers are added to the structure, but you don't need to worry about that right now.

You can determine the size of a struct using the sizeof operator.

Here's a small example:

//: C04:Sizeof.cpp
// Sizes of structs
#include "CLib.h"
#include "CppLib.h"
#include <iostream>
using namespace std;

struct A {
  int i[100];
};

struct B {
  void f();
};

void B::f() {}

int main() {
  cout << "sizeof struct A = " << sizeof(A)
       << " bytes" << endl;
  cout << "sizeof struct B = " << sizeof(B)
       << " bytes" << endl;
  cout << "sizeof CStash in C = "
       << sizeof(CStash) << " bytes" << endl;
  cout << "sizeof Stash in C++ = "
       << sizeof(Stash) << " bytes" << endl;
} ///:~

On my machine (your results may vary) the first print statement produces 200 because each int occupies two bytes; with the four-byte ints used by most current compilers, it produces 400. struct B is something of an anomaly because it is a struct with no data members. In C, this is illegal, but in C++ we need the option of creating a struct whose sole task is to scope function names, so it is allowed. Still, the result produced by the second print statement is a somewhat surprising nonzero value. In early versions of the language, the size was zero, but an awkward situation arises when you create such objects: They have the same address as the object created directly after them, and so are not distinct. One of the fundamental rules of objects is that each object must have a unique address, so structures with no data members will always have some minimum nonzero size.

The last two sizeof statements show you that the size of the

structure in C++ is the same as the size of the equivalent version in

C. C++ tries not to add any unnecessary overhead.

Header file etiquette

When you create a struct containing member functions, you are

creating a new data type. In general, you want this type to be easily

accessible to yourself and others. In addition, you want to separate

the interface (the declaration) from the implementation (the

definition of the member functions) so the implementation can be

changed without forcing a re-compile of the entire system. You

achieve this end by putting the declaration for your new type in a

header file.

When I first learned to program in C, the header file was a mystery

to me. Many C books don't seem to emphasize it, and the compiler

didn't enforce function declarations, so it seemed optional most of

the time, except when structures were declared. In C++ the use of

header files becomes crystal clear. They are virtually mandatory for

easy program development, and you put very specific information

in them: declarations. The header file tells the compiler what is

available in your library. You can use the library even if you only

possess the header file along with the object file or library file; you

don't need the source code for the cpp file. The header file is where

the interface specification is stored.

Although it is not enforced by the compiler, the best approach to

building large projects in C is to use libraries; collect associated

functions into the same object module or library, and use a header

file to hold all the declarations for the functions. It is de rigueur in

C++; you could throw any function into a C library, but the C++

abstract data type determines the functions that are associated by

dint of their common access to the data in a struct. Any member

function must be declared in the struct declaration; you cannot put

it elsewhere. The use of function libraries was encouraged in C and

institutionalized in C++.

Importance of header files

When using a function from a library, C allows you the option of

ignoring the header file and simply declaring the function by hand.

In the past, people would sometimes do this to speed up the

compiler just a bit by avoiding the task of opening and including

the file (this is usually not an issue with modern compilers). For example, here's an extremely lazy declaration of the C function printf( ) (from <stdio.h>):

int printf(...);

The ellipsis specifies a variable argument list², which says: printf( ) has some arguments, each of which has a type, but ignore that. Just take whatever arguments you see and accept them. By using this kind of declaration, you suspend all error checking on the arguments.

This practice can cause subtle problems. If you declare functions by

hand, in one file you may make a mistake. Since the compiler sees

only your hand-declaration in that file, it may be able to adapt to

your mistake. The program will then link correctly, but the use of

the function in that one file will be faulty. This is a tough error to

find, and is easily avoided by using a header file.

If you place all your function declarations in a header file, and include that header everywhere you use the function and where you define the function, you ensure a consistent declaration across the whole system. You also ensure that the declaration and the definition match by including the header in the definition file.

2 To write a function definition for a function that takes a true variable argument list, you must use varargs (<stdarg.h> in C, <cstdarg> in C++), although these should be avoided in C++. You can find details about the use of varargs in your C manual.

If a struct is declared in a header file in C++, you must include the

header file everywhere a struct is used and where struct member

functions are defined. The C++ compiler will give an error message

if you try to call a regular function, or to call or define a member

function, without declaring it first. By enforcing the proper use of

header files, the language ensures consistency in libraries, and

reduces bugs by forcing the same interface to be used everywhere.

The header is a contract between you and the user of your library.

The contract describes your data structures, and states the

arguments and return values for the function calls. It says, "Here's

what my library does." The user needs some of this information to

develop the application and the compiler needs all of it to generate

proper code. The user of the struct simply includes the header file,

creates objects (instances) of that struct, and links in the object

module or library (i.e., the compiled code).

The compiler enforces the contract by requiring you to declare all

structures and functions before they are used and, in the case of

member functions, before they are defined. Thus, you're forced to

put the declarations in the header and to include the header in the

file where the member functions are defined and the file(s) where

they are used. Because a single header file describing your library is

included throughout the system, the compiler can ensure

consistency and prevent errors.

There are certain issues that you must be aware of in order to

organize your code properly and write effective header files. The

first issue concerns what you can put into header files. The basic

rule is "only declarations," that is, only information to the compiler

but nothing that allocates storage by generating code or creating

variables. This is because the header file will typically be included

in several translation units in a project, and if storage for one

identifier is allocated in more than one place, the linker will come

up with a multiple definition error (this is C++'s one definition rule:

You can declare things as many times as you want, but there can be

only one actual definition for each thing).

This rule isn't completely hard and fast. If you define a variable that is "file static" (has visibility only within a file) inside a header file, there will be multiple instances of that data across the project, but the linker won't have a collision³. Basically, you don't want to do anything in the header file that will cause an ambiguity at link time.

The multiple-declaration problem

The second header-file issue is this: when you put a struct

declaration in a header file, it is possible for the file to be included

more than once in a complicated program. Iostreams are a good

example. Any time a struct does I/O it may include one of the

iostream headers. If the cpp file you are working on uses more than

one kind of struct (typically including a header file for each one),

you run the risk of including the <iostream> header more than once and redeclaring iostreams.

The compiler considers the redeclaration of a structure (that is, seeing its full definition, body and all, a second time; this includes both structs and classes) to be an error, since it would otherwise allow you to use the same name for different types. To prevent this error when multiple header files are included, you need to build some intelligence into your header files using the preprocessor (Standard C++ header files like <iostream> already have this "intelligence").

Both C and C++ allow you to redeclare a function, as long as the two declarations match, but neither will allow a structure to be defined twice. In C++ this rule is especially important because if the compiler allowed you to redefine a structure and the two definitions differed, which one would it use?

3 In C++98, file static was deprecated in favor of unnamed namespaces; C++11 removed that deprecation, so file static is again a fully supported feature.

The problem of redeclaration comes up quite a bit in C++ because

each data type (structure with functions) generally has its own

header file, and you have to include one header in another if you

want to create another data type that uses the first one. In any cpp

file in your project, it's likely that you'll include several files that

include the same header file. During a single compilation, the

compiler can see the same header file several times. Unless you do

something about it, the compiler will see the redeclaration of your

structure and report a compile-time error. To solve the problem,

you need to know a bit more about the preprocessor.

The preprocessor directives

#define, #ifdef, and #endif

The preprocessor directive #define can be used to create compile-

time flags. You have two choices: you can simply tell the

preprocessor that the flag is defined, without specifying a value:

#define FLAG

or you can give it a value (which is the typical C way to define a

constant):

#define PI 3.14159

In either case, the label can now be tested by the preprocessor to see

if it has been defined:

#ifdef FLAG

This will yield a true result, and the code following the #ifdef will

be included in the package sent to the compiler. This inclusion

stops when the preprocessor encounters the statement

#endif

or

#endif // FLAG

Any non-comment after the #endif on the same line is illegal, even though some compilers may accept it. The #ifdef/#endif pairs may be nested within each other.

The complement of #define is #undef (short for "un-define"), which will make an #ifdef statement using the same label yield a false result. #undef will also cause the preprocessor to stop using a macro. The complement of #ifdef is #ifndef, which will yield a true result if the label has not been defined (this is the one we will use in header files).

There are other useful features in the C preprocessor. You should

check your local documentation for the full set.

A standard for header files

In each header file that contains a structure, you should first check

to see if this header has already been included in this particular cpp

file. You do this by testing a preprocessor flag. If the flag isn't set,

the file wasn't included and you should set the flag (so the

structure can't get re-declared) and declare the structure. If the flag

was set then that type has already been declared so you should just

ignore the code that declares it. Here's how the header file should

look:

#ifndef HEADER_FLAG

#define HEADER_FLAG

// Type declaration here...

#endif // HEADER_FLAG

As you can see, the first time the header file is included, the contents of the header file (including your type declaration) will be included by the preprocessor. All the subsequent times it is included in a single compilation unit, the type declaration will be ignored. The name HEADER_FLAG can be any unique name, but a reliable standard to follow is to capitalize the name of the header file and replace periods with underscores. Avoid leading underscores, however: names that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved for the implementation. Here's an example:

//: C04:Simple.h
// Simple header that prevents re-definition
#ifndef SIMPLE_H
#define SIMPLE_H

struct Simple {
  int i, j, k;
  void initialize() { i = j = k = 0; }
};
#endif // SIMPLE_H ///:~

Although the SIMPLE_H after the #endif is commented out and thus ignored by the preprocessor, it is useful for documentation.

These preprocessor statements that prevent multiple inclusion are

often referred to as include guards.

Namespaces in headers

You'll notice that using directives are present in nearly all the cpp

files in this book, usually in the form:

using namespace std;

Since std is the namespace that surrounds the entire Standard C++

library, this particular using directive allows the names in the

Standard C++ library to be used without qualification. However,

you'll virtually never see a using directive in a header file (at least,

not outside of a scope). The reason is that the using directive

eliminates the protection of that particular namespace, and the

effect lasts until the end of the current compilation unit. If you put

a using directive (outside of a scope) in a header file, it means that

this loss of "namespace protection" will occur with any file that

includes this header, which often means other header files. Thus, if

you start putting using directives in header files, it's very easy to

end up "turning off" namespaces practically everywhere, and

thereby neutralizing the beneficial effects of namespaces.

In short: don't put using directives in header files.

Using headers in projects

When building a project in C++, you'll usually create it by bringing

together a lot of different types (data structures with associated

functions). You'll usually put the declaration for each type or group

of associated types in a separate header file, then define the

functions for that type in a translation unit. When you use that

type, you must include the header file to perform the declarations

properly.

Sometimes that pattern will be followed in this book, but more often the examples will be very small, so everything (the structure declarations, function definitions, and the main( ) function) may appear in a single file. However, keep in mind that you'll want to use separate files and header files in practice.

Nested structures

The convenience of taking data and function names out of the

global name space extends to structures. You can nest a structure

within another structure, and therefore keep associated elements

together. The declaration syntax is what you would expect, as you

can see in the following structure, which implements a push-down

stack as a simple linked list so it "never" runs out of memory:

//: C04:Stack.h
// Nested struct in linked list
#ifndef STACK_H
#define STACK_H

struct Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK_H ///:~

The nested struct is called Link, and it contains a pointer to the

next Link in the list and a pointer to the data stored in the Link. If

the next pointer is zero, it means you're at the end of the list.

Notice that the head pointer is defined right after the declaration for struct Link, instead of in a separate definition Link* head. This is a syntax that came from C, but it emphasizes the importance of the semicolon after the structure declaration; the semicolon indicates the end of the comma-separated list of definitions of that structure type. (Usually the list is empty.)

The nested structure has its own initialize( ) function, like all the structures presented so far, to ensure proper initialization. Stack has both an initialize( ) and a cleanup( ) function, as well as push( ), which takes a pointer to the data you wish to store (it assumes this has been allocated on the heap), and pop( ), which returns the data pointer from the top of the Stack and removes the top element. (When you pop( ) an element, you are responsible for destroying the object pointed to by the data.) The peek( ) function also returns the data pointer from the top element, but it leaves the top element on the Stack.

Here are the definitions for the member functions:

//: C04:Stack.cpp {O}
// Linked list with nesting
#include "Stack.h"
#include "../require.h"
using namespace std;

void
Stack::Link::initialize(void* dat, Link* nxt) {
  data = dat;
  next = nxt;
}

void Stack::initialize() { head = 0; }

void Stack::push(void* dat) {
  Link* newLink = new Link;
  newLink->initialize(dat, head);
  head = newLink;
}

void* Stack::peek() {
  require(head != 0, "Stack empty");
  return head->data;
}

void* Stack::pop() {
  if(head == 0) return 0;
  void* result = head->data;
  Link* oldHead = head;
  head = head->next;
  delete oldHead;
  return result;
}

void Stack::cleanup() {
  require(head == 0, "Stack not empty");
} ///:~

The first definition is particularly interesting because it shows you how to define a member of a nested structure. You simply use an additional level of scope resolution to specify the name of the enclosing struct. Stack::Link::initialize( ) takes the arguments and assigns them to its members.

Stack::initialize( ) sets head to zero, so the object knows it has an empty list.

Stack::push( ) takes the argument, which is a pointer to the variable you want to keep track of, and pushes it on the Stack. First, it uses new to allocate storage for the Link it will insert at the top. Then it calls Link's initialize( ) function to assign the appropriate values to the members of the Link. Notice that the next pointer is assigned to the current head; then head is assigned to the new Link pointer. This effectively pushes the Link in at the top of the list.

Stack::pop( ) captures the data pointer at the current top of the Stack; then it moves the head pointer down and deletes the old top of the Stack, finally returning the captured pointer. When pop( ) removes the last element, head again becomes zero, meaning the Stack is empty.

Stack::cleanup( ) doesn't actually do any cleanup. Instead, it establishes a firm policy that "you (the client programmer using this Stack object) are responsible for popping all the elements off this Stack and deleting them." The require( ) is used to indicate that a programming error has occurred if the Stack is not empty.

Why couldn't the Stack destructor be responsible for all the objects

that the client programmer didn't pop( )? The problem is that the

Stack is holding void pointers, and you'll learn in Chapter 13 that

calling delete for a void* doesn't clean things up properly. The

subject of "who's responsible for the memory" is not even that

simple, as we'll see in later chapters.

Here's an example to test the Stack:

//: C04:StackTest.cpp
//{L} Stack
//{T} StackTest.cpp
// Test of nested linked list
#include "Stack.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  Stack textlines;
  textlines.initialize();
  string line;
  // Read file and store lines in the Stack:
  while(getline(in, line))
    textlines.push(new string(line));
  // Pop the lines from the Stack and print them:
  string* s;
  while((s = (string*)textlines.pop()) != 0) {
    cout << *s << endl;
    delete s;
  }
  textlines.cleanup();
} ///:~

This is similar to the earlier example, but it pushes lines from a file

(as string pointers) on the Stack and then pops them off, which

results in the file being printed out in reverse order. Note that the

pop( ) member function returns a void* and this must be cast back

to a string* before it can be used. To print the string, the pointer is

dereferenced.

As textlines is being filled, the contents of line are "cloned" for each push( ) by making a new string(line). The value returned from the new-expression is a pointer to the new string that was created and that copied the information from line. If you had simply passed the address of line to push( ), you would end up with a Stack filled with identical addresses, all pointing to line. You'll learn more about this "cloning" process later in the book.

The file name is taken from the command line. To guarantee that there are enough arguments on the command line, you see a second function used from the require.h header file: requireArgs( ), which compares argc to the desired number of arguments and prints an appropriate error message and exits the program if there aren't enough arguments.
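The check itself is simple once you remember that argv[0] holds the program name, so a program that wants one file name needs argc to be 2. The following is a hypothetical sketch in the spirit of requireArgs( ), not the code from require.h: the names argsOk( ) and requireArgsSketch( ) are invented here, and the exact-count rule is an assumption (a version that only demands a minimum would test argc >= args + 1 instead).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical predicate: argv[0] is the program name, so a program
// that expects `args` arguments should see argc == args + 1.
bool argsOk(int argc, int args) {
  return argc == args + 1;
}

// Hypothetical sketch of a requireArgs()-style guard: print an error
// message to stderr and exit if the argument count is wrong.
void requireArgsSketch(int argc, int args) {
  if(!argsOk(argc, args)) {
    std::fprintf(stderr, "Must use %d arguments\n", args);
    std::exit(1);
  }
}
```

In a program like StackTest.cpp, the call would be requireArgsSketch(argc, 1), placed before argv[1] is used.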

Global scope resolution

The scope resolution operator gets you out of situations in which the name the compiler chooses by default (the "nearest" name) isn't what you want. For example, suppose you have a structure with a local identifier a, and you want to select a global identifier a from inside a member function. The compiler would default to choosing the local one, so you must tell it to do otherwise. When you want to specify a global name using scope resolution, you use the operator with nothing in front of it. Here's an example that shows global scope resolution for both a variable and a function:

//: C04:Scoperes.cpp
// Global scope resolution
int a;
void f() {}

struct S {
  int a;
  void f();
};

void S::f() {
  ::f(); // Would be recursive otherwise!
  ::a++; // Select the global a
  a--;   // The a at struct scope
}
int main() { S s; f(); } ///:~

Without scope resolution in S::f( ), the compiler would default to selecting the member versions of f( ) and a.
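Scoperes.cpp compiles but produces no visible effect. The variant below makes the effect observable: both the global a and the member a start at zero, and a hypothetical counter, fCalls, records calls to the global f( ). After one call to s.f( ), the global a has been incremented and the member a decremented.

```cpp
#include <cassert>

int a = 0;      // the global a
int fCalls = 0; // hypothetical counter that makes ::f() observable

void f() { ++fCalls; }

struct S {
  int a = 0;    // inside S's member functions this hides the global a
  void f();
};

void S::f() {
  ::f(); // the global f(); a bare f() here would call S::f() recursively
  ::a++; // the global a
  a--;   // the a at struct scope (this->a)
}
```

After S s; s.f();, the global a is 1, s.a is -1, and fCalls is 1.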

Summary

In this chapter, you've learned the fundamental "twist" of C++: that you can place functions inside of structures. This new type of structure is called an abstract data type, and variables you create using this structure are called objects, or instances, of that type. Calling a member function for an object is called sending a message to that object. The primary action in object-oriented programming is sending messages to objects.

Although packaging data and functions together is a significant benefit for code organization and makes library use easier because it prevents name clashes by hiding the names, there's a lot more you can do to make programming safer in C++. In the next chapter, you'll learn how to protect some members of a struct so that only you can manipulate them. This establishes a clear boundary between what the user of the structure can change and what only the programmer may change.

Exercises

Solutions to selected exercises can be found in the electronic document The Thinking in C++ Annotated Solution Guide, available for a small fee from http://.

1. In the Standard C library, the function puts( ) prints a char array to the console (so you can say puts("hello")). Write a C program that uses puts( ) but does not include <stdio.h> or otherwise declare the function. Compile this program with your C compiler. (Some C++ compilers are not distinct from their C compilers; in this case you may need to discover a command-line flag that forces a C compilation.) Now compile it with the C++ compiler and note the difference.

2. Create a struct declaration with a single member function, then create a definition for that member function. Create an object of your new data type, and call the member function.

3. Change your solution to Exercise 2 so the struct is declared in a properly "guarded" header file, with the definition in one cpp file and your main( ) in another.

4. Create a struct with a single int data member, and two global functions, each of which takes a pointer to that struct. The first function has a second int argument and sets the struct's int to the argument value; the second displays the int from the struct. Test the functions.

5. Repeat Exercise 4 but move the functions so they are member functions of the struct, and test again.

6. Create a class that (redundantly) performs data member selection and a member function call using the this keyword (which refers to the address of the current object).

7. Make a Stash that holds doubles. Fill it with 25 double values, then print them out to the console.

8. Repeat Exercise 7 with Stack.

9. Create a file containing a function f( ) that takes an int argument and prints it to the console using the printf( ) function in <stdio.h> by saying: printf("%d\n", i), in which i is the int you wish to print. Create a separate file containing main( ), and in this file declare f( ) to take a float argument. Call f( ) from inside main( ). Try to compile and link your program with the C++ compiler and see what happens. Now compile and link the program using the C compiler, and see what happens when it runs. Explain the behavior.

10. Find out how to produce assembly language from your C and C++ compilers. Write a function in C and a struct with a single member function in C++. Produce assembly language from each and find the function names that are produced by your C function and your C++ member function, so you can see what sort of name decoration occurs inside the compiler.

11. Write a program with conditionally-compiled code in main( ), so that when a preprocessor value is defined one message is printed, but when it is not defined another message is printed. Compile this code experimenting with a #define within the program, then discover the way your compiler takes preprocessor definitions on the command line and experiment with that.

12. Write a program that uses assert( ) with an argument that is always false (zero) to see what happens when you run it. Now compile it with #define NDEBUG and run it again to see the difference.

13. Create an abstract data type that represents a videotape in a video rental store. Try to consider all the data and operations that may be necessary for the Video type to work well within the video rental management system. Include a print( ) member function that displays information about the Video.

14. Create a Stack object to hold the Video objects from Exercise 13. Create several Video objects, store them in the Stack, then display them using Video::print( ).

15. Write a program that prints out all the sizes for the fundamental data types on your computer using sizeof.

16. Modify Stash to use a vector<char> as its underlying data structure.

17. Dynamically create pieces of storage of the following types, using new: int, long, an array of 100 chars, an array of 100 floats. Print the addresses of these and then free the storage using delete.

18. Write a function that takes a char* argument. Using new, dynamically allocate an array of char that is the size of the char array that's passed to the function. Using array indexing, copy the characters from the argument to the dynamically allocated array (don't forget the null terminator) and return the pointer to the copy. In your main( ), test the function by passing a static quoted character array, then take the result of that and pass it back into the function. Print both strings and both pointers so you can see they are different storage. Using delete, clean up all the dynamic storage.

19. Show an example of a structure declared within another structure (a nested structure). Declare data members in both structs, and declare and define member functions in both structs. Write a main( ) that tests your new types.

20. How big is a structure? Write a piece of code that prints the size of various structures. Create structures that have data members only and ones that have data members and function members. Then create a structure that has no members at all. Print out the sizes of all these. Explain the reason for the result of the structure with no data members at all.

21. C++ automatically creates the equivalent of a typedef for structs, as you've seen in this chapter. It also does this for enumerations and unions. Write a small program that demonstrates this.

22. Create a Stack that holds Stashes. Each Stash will hold five lines from an input file. Create the Stashes using new. Read a file into your Stack, then reprint it in its original form by extracting it from the Stack.

23. Modify Exercise 22 so that you create a struct that encapsulates the Stack of Stashes. The user should only add and get lines via member functions, but under the covers the struct happens to use a Stack of Stashes.

24. Create a struct that holds an int and a pointer to another instance of the same struct. Write a function that takes the address of one of these structs and an int indicating the length of the list you want created. This function will make a whole chain of these structs (a linked list), starting from the argument (the head of the list), with each one pointing to the next. Make the new structs using new, and put the count (which object number this is) in the int. In the last struct in the list, put a zero value in the pointer to indicate that it's the end. Write a second function that takes the head of your list and moves through to the end, printing out both the pointer value and the int value for each one.

25. Repeat Exercise 24, but put the functions inside a struct instead of using "raw" structs and functions.
