Data Abstraction: The basic object, Abstract data typing, Header file etiquette

4: Data Abstraction
C++ is a productivity enhancement tool. Why else
would you make the effort (and it is an effort,
regardless of how easy we attempt to make the
transition)
to switch from some language that you already know and are
productive with to a new language in which you're going to be less
productive for a while, until you get the hang of it? It's because
you've become convinced that you're going to get big gains by
using this new tool.
Productivity, in computer programming terms, means that fewer
people can make much more complex and impressive programs in
less time. There are certainly other issues when it comes to
choosing a language, such as efficiency (does the nature of the
language cause slowdown and code bloat?), safety (does the
language help you ensure that your program will always do what
you plan, and handle errors gracefully?), and maintenance (does
the language help you create code that is easy to understand,
modify, and extend?). These are certainly important factors that
will be examined in this book.
But raw productivity means a program that formerly took three of
you a week to write now takes one of you a day or two. This
touches several levels of economics. You're happy because you get
the rush of power that comes from building something, your client
(or boss) is happy because products are produced faster and with
fewer people, and the customers are happy because they get
products more cheaply. The only way to get massive increases in
productivity is to leverage off other people's code. That is, to use
libraries.
A library is simply a bunch of code that someone else has written
and packaged together. Often, the most minimal package is a file
with an extension like lib and one or more header files to tell your
compiler what's in the library. The linker knows how to search
through the library file and extract the appropriate compiled code.
But that's only one way to deliver a library. On platforms that span
many architectures, such as Linux/Unix, often the only sensible
way to deliver a library is with source code, so it can be
reconfigured and recompiled on the new target.
Thinking in C++
Thus, libraries are probably the most important way to improve
productivity, and one of the primary design goals of C++ is to
make library use easier. This implies that there's something hard
about using libraries in C. Understanding this factor will give you a
first insight into the design of C++, and thus insight into how to use
it.
A tiny C-like library
A library usually starts out as a collection of functions, but if you
have used third-party C libraries you know there's usually more to
it than that because there's more to life than behavior, actions, and
functions. There are also characteristics (blue, pounds, texture,
luminance), which are represented by data. And when you start to
deal with a set of characteristics in C, it is very convenient to clump
them together into a struct, especially if you want to represent
more than one similar thing in your problem space. Then you can
make a variable of this struct for each thing.
Thus, most C libraries have a set of structs and a set of functions
that act on those structs. As an example of what such a system
looks like, consider a programming tool that acts like an array, but
whose size can be established at runtime, when it is created. I'll call
it a CStash. Although it's written in C++, it has the style of what
you'd write in C:
//: C04:CLib.h
// Header file for a C-like library
// An array-like entity created at runtime
typedef struct CStashTag {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
} CStash;
void initialize(CStash* s, int size);
void cleanup(CStash* s);
int add(CStash* s, const void* element);
void* fetch(CStash* s, int index);
int count(CStash* s);
void inflate(CStash* s, int increase);
///:~
A tag name like CStashTag is generally used for a struct in case
you need to reference the struct inside itself. For example, when
creating a linked list (each element in your list contains a pointer to
the next element), you need a pointer to the next struct variable, so
you need a way to identify the type of that pointer within the struct
body. Also, you'll almost universally see the typedef as shown
above for every struct in a C library. This is done so you can treat
the struct as if it were a new type and define variables of that struct
like this:
CStash A, B, C;
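To make the tag-name point concrete, here is a minimal sketch of the linked-list case, written in the same C style as CLib.h. The names NodeTag, Node, and sumList are hypothetical, chosen only for this illustration. Inside the struct body the typedef name Node does not exist yet, so the self-referencing pointer has to be spelled struct NodeTag*.

```cpp
#include <cassert>

// Hypothetical linked-list node in the C style used by CLib.h.
// The tag name NodeTag is needed because the typedef name Node
// does not exist yet while the struct body is being read.
typedef struct NodeTag {
  int value;
  struct NodeTag* next; // Next node in the list, or 0 at the end
} Node;

// Walk the list from n to the end and add up the values.
int sumList(const Node* n) {
  int total = 0;
  for(; n != 0; n = n->next)
    total += n->value;
  return total;
}
```

Once the typedef is in place, client code can write Node a; just as it writes CStash A, B, C;.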
The storage pointer is an unsigned char*. An unsigned char is the
smallest piece of storage a C compiler supports, although on some
machines it can be the same size as the largest. By definition it
occupies one byte (sizeof(unsigned char) is always 1), but the
number of bits in a byte (CHAR_BIT) is implementation dependent;
it is usually eight. You might think that because the CStash is
designed to hold any type of variable, a void* would be more
appropriate here. However, the purpose is not to treat this storage
as a block of some unknown type, but rather as a block of
contiguous bytes.
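Two facts behind this choice can be checked directly. The sketch below assumes a C++11 compiler (for static_assert), and byteAt is a hypothetical helper. It shows that sizeof counts in units of char, so an unsigned char is always exactly one byte, and that reading storage byte by byte needs an unsigned char* view, because a void* cannot be indexed or used in pointer arithmetic.

```cpp
#include <cassert>
#include <climits>

// sizeof measures in units of char, so this always holds; only the
// number of bits per byte (CHAR_BIT, at least 8) varies by platform.
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: read byte i of any object. Indexing a void*
// is not allowed, so the pointer is first viewed as unsigned char*.
unsigned char byteAt(const void* p, int i) {
  const unsigned char* bytes = static_cast<const unsigned char*>(p);
  return bytes[i];
}
```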
The source code for the implementation file (which you may not
get if you buy a library commercially; you might get only a
compiled obj or lib or dll, etc.) looks like this:
//: C04:CLib.cpp {O}
// Implementation of example C-like library
// Declare structure and functions:
#include "CLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;
void initialize(CStash* s, int sz) {
  s->size = sz;
  s->quantity = 0;
  s->storage = 0;
  s->next = 0;
}
int add(CStash* s, const void* element) {
  if(s->next >= s->quantity) // Enough space left?
    inflate(s, increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = s->next * s->size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < s->size; i++)
    s->storage[startBytes + i] = e[i];
  s->next++;
  return(s->next - 1); // Index number
}
void* fetch(CStash* s, int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= s->next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(s->storage[index * s->size]);
}
int count(CStash* s) {
  return s->next;  // Elements in CStash
}
void inflate(CStash* s, int increase) {
  assert(increase > 0);
  int newQuantity = s->quantity + increase;
  int newBytes = newQuantity * s->size;
  int oldBytes = s->quantity * s->size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = s->storage[i]; // Copy old to new
  delete [](s->storage); // Old storage
  s->storage = b; // Point to new memory
  s->quantity = newQuantity;
}
void cleanup(CStash* s) {
  if(s->storage != 0) {
    cout << "freeing storage" << endl;
    delete []s->storage;
  }
} ///:~
initialize( ) performs the necessary setup for struct CStash by
setting the internal variables to appropriate values. Initially, the
storage pointer is set to zero; no initial storage is allocated.
The add( ) function inserts an element into the CStash at the next
available location. First, it checks to see if there is any available
space left. If not, it expands the storage using the inflate( ) function,
described later.
Because the compiler doesn't know the specific type of the variable
being stored (all the function gets is a void*), you can't just do an
assignment, which would certainly be the convenient thing.
Instead, you must copy the variable byte-by-byte. The most
straightforward way to perform the copying is with array indexing.
Typically, there are already data bytes in storage, and this is
indicated by the value of next. To start with the right byte offset,
next is multiplied by the size of each element (in bytes) to produce
startBytes. Then the argument element is cast to an unsigned char*
so that it can be addressed byte-by-byte and copied into the
available storage space. Finally, next is incremented so that it
indicates the next available piece of storage, and add( ) returns the
"index number" where the value was stored, so that the value can
later be retrieved using this index number with fetch( ).
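The copy step can be pulled out on its own to show that it is nothing more than a byte copy. In this sketch, copyIntoSlot and copyIntoSlotMemcpy are hypothetical names. The first repeats the loop from add( ), assuming the caller advances next afterward; the second does the same job with the standard library's std::memcpy, which copies a given number of bytes between two non-overlapping blocks.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical free-standing version of the copy step in add():
// copy `size` bytes of `element` into slot `next` of `storage`
// and return that slot's index (the caller then advances next).
int copyIntoSlot(unsigned char* storage, int next, int size,
                 const void* element) {
  int startBytes = next * size; // Byte offset of the slot
  const unsigned char* e = static_cast<const unsigned char*>(element);
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  return next; // The "index number" the caller keeps for fetch( )
}

// The same copy done with the standard library in a single call.
int copyIntoSlotMemcpy(unsigned char* storage, int next, int size,
                       const void* element) {
  std::memcpy(storage + next * size, element, size);
  return next;
}
```

Calling either one with next equal to 1 and a size of sizeof(double) writes the double into the second slot, starting at byte offset sizeof(double).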
fetch( ) checks to see that the index isn't out of bounds and then
returns the address of the desired variable, calculated using the
index argument. Since index indicates the number of elements to
offset into the CStash, it must be multiplied by the number of bytes
occupied by each piece to produce the numerical offset in bytes.
When this offset is used to index into storage using array indexing,
you don't get the address, but instead the byte at the address. To
produce the address, you must use the address-of operator &.
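As a worked example of the offset arithmetic: with elements of 4 bytes, index 2 starts at byte 2 * 4 = 8. The hypothetical helper below repeats the expression from fetch( ); &storage[index * size] and storage + index * size name the same address.

```cpp
#include <cassert>

// Hypothetical helper mirroring the last line of fetch(): indexing
// yields the byte at the offset, and & turns it back into an address.
void* elementAddress(unsigned char* storage, int index, int size) {
  return &storage[index * size]; // Same as storage + index * size
}
```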
count( ) may look a bit strange at first to a seasoned C programmer.
It seems like a lot of trouble to go through to do something that
would probably be a lot easier to do by hand. If you have a struct
CStash called intStash, for example, it would seem much more
straightforward to find out how many elements it has by saying
intStash.next instead of making a function call (which has
overhead), such as count(&intStash). However, if you wanted to
change the internal representation of CStash and thus the way the
count was calculated, the function call interface allows the
necessary flexibility. But alas, most programmers won't bother to
find out about your "better" design for the library. They'll look at
the struct and grab the next value directly, and possibly even
change next without your permission. If only there were some way
for the library designer to have better control over things like this!
(Yes, that's foreshadowing.)
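Here is one way the representation could change behind count( ). This is a minimal sketch with a hypothetical layout, CStashBytes, that tracks the number of bytes in use instead of an element counter. A client that had read next directly would no longer compile, while a client that called count( ) would not notice the change.

```cpp
#include <cassert>

// Hypothetical alternative layout: instead of an element counter,
// it records how many bytes of storage are in use.
struct CStashBytes {
  int size;      // Size of each element in bytes
  int usedBytes; // Bytes filled so far
};

// Clients still ask the same question through the same kind of call;
// only the calculation inside has changed.
int count(const CStashBytes* s) {
  return s->usedBytes / s->size;
}
```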
Dynamic storage allocation
You never know the maximum amount of storage you might need
for a CStash, so the memory pointed to by storage is allocated from
the heap. The heap is a big block of memory used for allocating
smaller pieces at runtime. You use the heap when you don't know
the size of the memory you'll need while you're writing a program.
That is, only at runtime will you find out that you need space to
hold 200 Airplane variables instead of 20. In Standard C, dynamic-memory
allocation functions include malloc( ), calloc( ), realloc( ),
and free( ). Instead of library calls, however, C++ has a more
sophisticated (albeit simpler to use) approach to dynamic memory
that is integrated into the language via the keywords new and
delete.
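For comparison, here is a sketch that allocates room for n ints both ways. The helper names makeIntsC and makeIntsCpp are hypothetical. malloc( ) works in raw bytes and returns a void*, so C++ requires a cast; the new-expression takes the element type and count and returns the right pointer type directly.

```cpp
#include <cassert>
#include <cstdlib>

// C style (hypothetical helper): malloc( ) takes a byte count and
// returns void*, which C++ requires you to cast. Release with free( ).
int* makeIntsC(int n) {
  return static_cast<int*>(std::malloc(n * sizeof(int)));
}

// C++ style (hypothetical helper): the new-expression knows the element
// type and count and returns int* directly. Release with delete [].
int* makeIntsCpp(int n) {
  return new int[n];
}
```

Memory from malloc( ) must be returned with free( ), and memory from new [] with delete []; the two pairs must not be mixed.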
The inflate( ) function uses new to get a bigger chunk of space for
the CStash. In this situation, we will only expand memory and not
shrink it, and the assert( ) will catch a zero or negative increase
value passed to inflate( ) (when assertions are enabled, that is, when
NDEBUG is not defined). The new number of elements that can be held
(after inflate( ) completes) is calculated as newQuantity, and this is
multiplied by the number of bytes per element to produce newBytes,
which will be the number of bytes in the allocation. So that we know
how many bytes to copy over from the old location, oldBytes is
calculated using the old quantity.
The actual storage allocation occurs in the new-expression, which is
the expression involving the new keyword:
new unsigned char[newBytes];
The general form of the new-expression is:
new Type;
in which Type describes the type of variable you want allocated on
the heap. In this case, we want an array of unsigned char that is
newBytes long, so that is what appears as the Type. You can also
allocate something as simple as an int by saying:
new int;
and although this is rarely done, you can see that the form is
consistent.
A new-expression returns a pointer to an object of the exact type
that you asked for. So if you say new Type, you get back a pointer
to a Type. If you say new int, you get back a pointer to an int. If
you want a new unsigned char array, you get back a pointer to the
first element of that array. The compiler will ensure that you assign
the return value of the new-expression to a pointer of the correct
type.
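The claim that a new-expression yields exactly the requested pointer type can be checked at compile time. This sketch assumes C++11 (for static_assert, decltype, and <type_traits>); decltype only inspects the expression, so the checks themselves allocate nothing. The helper useBothForms is hypothetical and also shows which form of delete goes with each form of new.

```cpp
#include <cassert>
#include <type_traits>

// The type of a new-expression is a pointer to what was requested.
// decltype only inspects the expression; nothing is allocated here.
static_assert(std::is_same<decltype(new int), int*>::value,
              "new int yields an int*");
static_assert(std::is_same<decltype(new unsigned char[16]),
                           unsigned char*>::value,
              "new unsigned char[n] yields a pointer to the first element");

// Hypothetical helper that uses both forms and releases them correctly.
int useBothForms() {
  int* ip = new int;                         // Single object
  unsigned char* bp = new unsigned char[16]; // Array
  *ip = 42;
  bp[0] = 1;
  int result = *ip + bp[0];
  delete ip;     // Single object: plain delete
  delete [] bp;  // Array: delete []
  return result;
}
```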
Of course, any time you request memory it's possible for the
request to fail, if there is no more memory. As you will learn, C++
has mechanisms that come into play if the memory-allocation
operation is unsuccessful.
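As a preview of those mechanisms, here is a minimal sketch assuming a Standard C++ compiler: by default, a new-expression that cannot obtain memory throws a std::bad_alloc exception, and the std::nothrow form returns a null pointer instead. The helpers tryAllocate and allocateOrNull are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// By default, a new-expression that cannot get memory throws
// std::bad_alloc. Hypothetical helper: report success or failure.
bool tryAllocate(std::size_t n) {
  try {
    unsigned char* p = new unsigned char[n];
    delete [] p;
    return true;
  } catch(const std::bad_alloc&) {
    return false;
  }
}

// The nothrow form returns a null pointer on failure instead.
unsigned char* allocateOrNull(std::size_t n) {
  return new(std::nothrow) unsigned char[n];
}
```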
Once the new storage is allocated, the data in the old storage must
be copied to the new storage; this is again accomplished with array
indexing, copying one byte at a time in a loop. After the data is
copied, the old storage must be released so that it can be used by
other parts of the program if they need new storage. The delete
keyword is the complement of new, and must be applied to release
any storage that is allocated with new (if you forget to use delete,
that storage remains unavailable, and if this so-called memory leak
happens enough, you'll run out of memory). In addition, there's a
special syntax when you're deleting an array. It's as if you must
remind the compiler that this pointer is not just pointing to one
object, but to an array of objects: you put a set of empty square
brackets in front of the pointer to be deleted:
delete []myArray;
Once the old storage has been deleted, the pointer to the new
storage can be assigned to the storage pointer, the quantity is
adjusted, and inflate( ) has completed its job.
Note that the heap manager is fairly primitive. It gives you chunks
of memory and takes them back when you delete them. There's no
inherent facility for heap compaction, which compresses the heap to
provide bigger free chunks. If a program allocates and frees heap
storage for a while, you can end up with a fragmented heap that has
lots of memory free, but without any pieces that are big enough to
allocate the size you're looking for at the moment. A heap
compactor complicates a program because it moves memory
chunks around, so your pointers won't retain their proper values.
Some operating environments have heap compaction built in, but
they require you to use special memory handles (which can be
temporarily converted to pointers, after locking the memory so the
heap compactor can't move it) instead of pointers. You can also
build your own heap-compaction scheme, but this is not a task to
be undertaken lightly.
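The handle idea can be sketched in a few lines. This is a toy, hypothetical design for illustration only, not any real operating system's API: clients keep an integer handle, a table maps it to the block's current address, and move( ) relocates a block the way a compactor would (a real compactor would also slide blocks together to close the gaps). Any pointer obtained from lock( ) is stale after a move, which is exactly why such systems require locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy handle table (hypothetical, for illustration only). Clients keep
// an int handle; the table maps it to the block's current address, so
// blocks can be moved, as a compactor would, without breaking clients.
struct HandleHeap {
  std::vector<unsigned char*> blocks; // Handle -> current address
  std::vector<int> sizes;             // Handle -> block size in bytes

  int allocate(int n) {
    blocks.push_back(new unsigned char[n]);
    sizes.push_back(n);
    return static_cast<int>(blocks.size()) - 1;
  }
  // Turn a handle into a pointer; valid only until the next move().
  unsigned char* lock(int h) { return blocks[h]; }
  // Simulate compaction: copy the block elsewhere and update the table.
  void move(int h) {
    unsigned char* fresh = new unsigned char[sizes[h]];
    std::memcpy(fresh, blocks[h], sizes[h]);
    delete [] blocks[h];
    blocks[h] = fresh;
  }
  void releaseAll() {
    for(std::size_t i = 0; i < blocks.size(); i++)
      delete [] blocks[i];
    blocks.clear();
    sizes.clear();
  }
};
```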
When you define a local variable on the stack, the compiler
generates code that automatically creates and frees its storage.
The compiler knows exactly how much storage is needed,
and it knows the lifetime of the variables because of scoping. With
dynamic memory allocation, however, the compiler doesn't know
how much storage you're going to need, and it doesn't know the
lifetime of that storage. That is, the storage doesn't get cleaned up
automatically. Therefore, you're responsible for releasing the
storage using delete, which tells the heap manager that storage can
be used by the next call to new. The logical place for this to happen
in the library is in the cleanup( ) function because that is where all
the closing-up housekeeping is done.
To test the library, two CStashes are created. The first holds ints
and the second holds arrays of 80 chars:
//: C04:CLibTest.cpp
//{L} CLib
// Test the C-like library
#include "CLib.h"
#include <fstream>
#include <iostream>
#include <string>
#include <cstring>
#include <cassert>
using namespace std;
int main() {
  // Define variables at the beginning
  // of the block, as in C:
  CStash intStash, stringStash;
  int i;
  char* cp;
  ifstream in;
  string line;
  const int bufsize = 80;
  char buf[bufsize]; // Fixed-size copy of one line
  // Now remember to initialize the variables:
  initialize(&intStash, sizeof(int));
  for(i = 0; i < 100; i++)
    add(&intStash, &i);
  for(i = 0; i < count(&intStash); i++)
    cout << "fetch(&intStash, " << i << ") = "
         << *(int*)fetch(&intStash, i)
         << endl;
  // Holds lines of up to 79 characters plus a terminating zero:
  initialize(&stringStash, sizeof(char)*bufsize);
  in.open("CLibTest.cpp");
  assert(in);
  while(getline(in, line)) {
    // add() copies exactly bufsize bytes, so copy the line into a
    // zero-filled buffer first (longer lines are truncated):
    strncpy(buf, line.c_str(), bufsize - 1);
    buf[bufsize - 1] = '\0';
    add(&stringStash, buf);
  }
  i = 0;
  while((cp = (char*)fetch(&stringStash,i++))!=0) {
    // i has already moved past the index just fetched:
    cout << "fetch(&stringStash, " << i - 1 << ") = "
         << cp << endl;
  }
  cleanup(&intStash);
  cleanup(&stringStash);
} ///:~
Following the form required by C, all the variables are created at
the beginning of the scope of main( ). Of course, you must
remember to initialize the CStash variables later in the block by
calling initialize( ). One of the problems with C libraries is that you
must carefully convey to the user the importance of the
initialization and cleanup functions. If these functions aren't called,
there will be a lot of trouble. Unfortunately, the user doesn't always
wonder if initialization and cleanup are mandatory. They know
what they want to accomplish, and they're not as concerned about
you jumping up and down saying, "Hey, wait, you have to do this
first!" Some users have even been known to initialize the elements
of a structure themselves. There's certainly no mechanism in C to
prevent it (more foreshadowing).
The intStash is filled up with integers, and the stringStash is filled
with character arrays. These character arrays are produced by
opening the source code file, CLibTest.cpp, and reading the lines
from it into a string called line, and then producing a pointer to the
character representation of line using the member function c_str( ).
After each CStash is loaded, it is displayed. The intStash is printed
using a for loop, which uses count( ) to establish its limit. The
stringStash is printed with a while, which breaks out when fetch( )
returns zero to indicate it is out of bounds.
You'll also notice an additional cast in
cp = (char*)fetch(&stringStash,i++)
This is due to the stricter type checking in C++, which does not
allow you to simply assign a void* to any other type (C allows
this).
Bad guesses
There is one more important issue you should understand before
we look at the general problems in creating a C library. Note that
the CLib.h header file must be included in any file that refers to
CStash because the compiler can't even guess at what that
structure looks like. However, it can guess at what a function looks
like; this sounds like a feature but it turns out to be a major C
pitfall.
Although you should always declare functions by including a
header file, function declarations aren't essential in older C. It's
possible in C89 (but not in C99 or later, and not in C++) to call a
function that you haven't declared. A good compiler will warn you that
you probably ought to declare a function first, but the C89 standard
doesn't enforce it. This
is a dangerous practice, because the C compiler can assume that a
function that you call with an int argument has an argument list
containing int, even if it may actually contain a float. This can
produce bugs that are very difficult to find, as you will see.
Each separate C implementation file (with an extension of .c) is a
translation unit. That is, the compiler is run separately on each
translation unit, and when it is running it is aware of only that unit.
Thus, any information you provide by including header files is
quite important because it determines the compiler's
understanding of the rest of your program. Declarations in header
files are particularly important, because everywhere the header is
included, the compiler will know exactly what to do. If, for
example, you have a declaration in a header file that says void
func(float), the compiler knows that if you call that function with
an integer argument, it should convert the int to a float as it passes
the argument (an implicit conversion). Without the declaration, the
C compiler would simply assume that a function func(int) existed,
it wouldn't do the conversion, and the wrong data would quietly be
passed into func( ).
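C++ cannot reproduce the C89 failure, because it refuses to compile a call to an undeclared function, but it can show the correct behavior that the declaration buys you. In this sketch, halve is a hypothetical stand-in for func(float); because its declaration is visible at the call site, an int argument such as 3 is converted to 3.0f before the call.

```cpp
#include <cassert>

// Hypothetical stand-in for func(float). Because this declaration is
// visible at the call site, an int argument is converted to float
// before the call is made.
float halve(float f);

float halve(float f) {
  return f / 2;
}
```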
For each translation unit, the compiler creates an object file, with an
extension of .o or .obj or something similar. These object files, along
with the necessary start-up code, must be collected by the linker
into the executable program. During linking, all the external
references must be resolved. For example, in CLibTest.cpp,
functions such as initialize( ) and fetch( ) are declared (that is, the
compiler is told what they look like) and used, but not defined.
They are defined elsewhere, in CLib.cpp. Thus, the calls in
CLibTest.cpp are external references. The linker must, when it puts all
the object files together, take the unresolved external references and
find the addresses they actually refer to. Those addresses are put
into the executable program to replace the external references.
It's important to realize that in C, the external references that the
linker searches for are simply function names, generally with an
underscore in front of them. So all the linker has to do is match up
the function name where it is called and the function body in the
object file, and it's done. If you accidentally made a call that the
compiler interpreted as func(int) and there's a function body for
func(float) in some other object file, the linker will see _func in one
place and _func in another, and it will think everything's OK. The
func( ) at the calling location will push an int onto the stack, and
the func( ) function body will expect a float to be on the stack. If the
function only reads the value and doesn't write to it, it won't blow
up the stack. In fact, the float value it reads off the stack might even
make some kind of sense. That's worse because it's harder to find
the bug.
What's wrong?
We are remarkably adaptable, even in situations in which perhaps
we shouldn't adapt. The style of the CStash library has been a staple
for C programmers, but if you look at it for a while, you might
notice that it's rather . . . awkward. When you use it, you have to
pass the address of the structure to every single function in the
library. When reading the code, the mechanism of the library gets
mixed with the meaning of the function calls, which is confusing
when you're trying to understand what's going on.
One of the biggest obstacles, however, to using libraries in C is the
problem of name clashes. C has a single name space for functions;
that is, when the linker looks for a function name, it looks in a
single master list. In addition, when the compiler is working on a
translation unit, it can work only with a single function with a
given name.
Now suppose you decide to buy two libraries from two different
vendors, and each library has a structure that must be initialized
and cleaned up. Both vendors decided that initialize( ) and
cleanup( ) are good names. If you include both their header files in
a single translation unit, what does the C compiler do? Fortunately,
C gives you an error, telling you there's a type mismatch in the two
different argument lists of the declared functions. But even if you
don't include them in the same translation unit, the linker will still
have problems. A good linker will detect that there's a name clash,
but some linkers take the first function name they find, by
searching through the list of object files in the order you give them
in the link list. (This can even be thought of as a feature because it
allows you to replace a library function with your own version.)
In either event, you can't use two C libraries that contain a function
with the identical name. To solve this problem, C library vendors
will often prepend a sequence of unique characters to the beginning
of all their function names. So initialize( ) and cleanup( ) might
become CStash_initialize( ) and CStash_cleanup( ). This is a
logical thing to do because it "decorates" the name of the function
with the name of the struct it works on.
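A minimal sketch of the prefixing convention, with two hypothetical vendors' structs (the Widget type and the ready member exist only for this illustration). Because every function name carries its struct's name, both initialize/cleanup pairs can live in the same program.

```cpp
#include <cassert>

// Two hypothetical vendors' structs. Prefixing each function with the
// struct name keeps the two initialize/cleanup pairs from clashing.
typedef struct CStashTag { int ready; } CStash;
typedef struct WidgetTag { int ready; } Widget;

void CStash_initialize(CStash* s) { s->ready = 1; }
void CStash_cleanup(CStash* s)    { s->ready = 0; }
void Widget_initialize(Widget* w) { w->ready = 1; }
void Widget_cleanup(Widget* w)    { w->ready = 0; }
```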
Now it's time to take the first step toward creating classes in C++.
Variable names inside a struct do not clash with global variable
names. So why not take advantage of this for function names, when
those functions operate on a particular struct? That is, why not
make functions members of structs?
The basic object
Step one is exactly that. C++ functions can be placed inside structs
as "member functions." Here's what it looks like after converting
the C version of CStash to the C++ Stash:
//: C04:CppLib.h
// C-like library converted to C++
struct Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  // Functions!
  void initialize(int size);
  void cleanup();
  int add(const void* element);
  void* fetch(int index);
  int count();
  void inflate(int increase);
}; ///:~
First, notice there is no typedef. Instead of requiring you to create a
typedef, the C++ compiler turns the name of the structure into a
new type name for the program (just as int, char, float and double
are type names).
All the data members are exactly the same as before, but now the
functions are inside the body of the struct. In addition, notice that
the first argument from the C version of the library has been
removed. In C++, instead of forcing you to pass the address of the
structure as the first argument to all the functions that operate on
that structure, the compiler secretly does this for you. Now the only
arguments for the functions are concerned with what the function
does, not the mechanism of the function's operation.
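A small side-by-side sketch may help. Counter, bump( ), and Counter_bump( ) are hypothetical names: the free function receives the struct's address explicitly, as in the C library, while the member function receives it implicitly from the compiler. Both modify the object they were called for.

```cpp
#include <cassert>

// Hypothetical struct used to compare the two calling styles.
struct Counter {
  int n;
  // C++ style: the object's address is passed implicitly.
  void bump() { n++; }
};

// C style: the address is passed explicitly as the first argument.
void Counter_bump(Counter* self) { self->n++; }
```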
It's important to realize that the function code is effectively the
same as it was with the C version of the library. The number of
arguments is the same (even though you don't see the structure
address being passed in, it's still there), and there's only one
function body for each function. That is, just because you say
Stash A, B, C;
doesn't mean you get a different add( ) function for each variable.
So the code that's generated is almost identical to what you would
have written for the C version of the library. Interestingly enough,
this includes the "name decoration" you probably would have
done to produce Stash_initialize( ), Stash_cleanup( ), and so on.
When the function name is inside the struct, the compiler
effectively does the same thing. Therefore, initialize( ) inside the
structure Stash will not collide with a function named initialize( )
inside any other structure, or even a global function named
initialize( ). Most of the time you don't have to worry about the
function name decoration; you use the undecorated name. But
sometimes you do need to be able to specify that this initialize( )
belongs to the struct Stash, and not to any other struct. In
particular, when you're defining the function you need to fully
specify which one it is. To accomplish this full specification, C++
has an operator (::) called the scope resolution operator (named so
because names can now be in different scopes: at global scope or
within the scope of a struct). For example, if you want to specify
initialize( ), which belongs to Stash, you say Stash::initialize(int
size). You can see how the scope resolution operator is used in the
function definitions:
//: C04:CppLib.cpp {O}
// C library converted to C++
// Declare structure and functions:
#include "CppLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;
void Stash::initialize(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}
int Stash::add(const void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}
void* Stash::fetch(int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}
int Stash::count() {
  return next; // Number of elements in Stash
}
void Stash::inflate(int increase) {
  assert(increase > 0);
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete []storage; // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}
void Stash::cleanup() {
  if(storage != 0) {
    cout << "freeing storage" << endl;
    delete []storage;
  }
} ///:~
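The scope-resolution point can be isolated in a smaller sketch. Gadget, globalCalls, and the global initialize( ) are hypothetical: the global function and the member function share a name without clashing, and Gadget:: in the definition says which initialize( ) is being defined.

```cpp
#include <cassert>

int globalCalls = 0; // Hypothetical counter for the demonstration

// A global function named initialize( )...
void initialize() { globalCalls++; }

// ...and a member function with the same name do not clash.
struct Gadget {
  int value;
  void initialize(int v); // Declared inside the struct
};

// Gadget:: says which initialize( ) this definition belongs to.
void Gadget::initialize(int v) {
  value = v;
}
```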
There are several other things that are different between C and
C++. First, the declarations in the header files are required by the
compiler. In C++ you cannot call a function without declaring it
first. The compiler will issue an error message otherwise. This is an
important way to ensure that function calls are consistent between
the point where they are called and the point where they are
defined. By forcing you to declare the function before you call it,
the C++ compiler virtually ensures that you will perform this
declaration by including the header file. If you also include the
same header file in the place where the functions are defined, then
the compiler checks to make sure that the declaration in the header
and the function definition match up. This means that the header
file becomes a validated repository for function declarations and
ensures that functions are used consistently throughout all
translation units in the project.
Of course, global functions can still be declared by hand every
place where they are defined and used. (This is so tedious that it
becomes very unlikely.) However, structures must always be
declared before they are defined or used, and the most convenient
place to put a structure definition is in a header file, except for
those you intentionally hide in a file.
You can see that all the member functions look almost the same as
when they were C functions, except for the scope resolution and
the fact that the first argument from the C version of the library is
no longer explicit. It's still there, of course, because the function has
to be able to work on a particular struct variable. But notice, inside
the member function, that the member selection is also gone! Thus,
instead of saying s->size = sz; you say size = sz; and eliminate the
tedious s->, which didn't really add anything to the meaning of
what you were doing anyway. The C++ compiler is apparently
doing this for you. Indeed, it is taking the "secret" first argument
(the address of the structure that we were previously passing in by
hand) and applying the member selector whenever you refer to one
of the data members of a struct. This means that whenever you are
inside a member function of a struct, you can refer to any
member (including another member function) by simply giving its
name. The compiler will search through the local structure's names
before looking for a global version of that name. You'll find that
this feature means that not only is your code easier to write, it's a
lot easier to read.
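A small sketch of that lookup order, with hypothetical names: a member and a global are both called width. Inside a member function, plain width finds the member; a leading :: with no struct name in front of it skips the members and selects the global.

```cpp
#include <cassert>

int width = 100; // A global that happens to share a member's name

struct Box {
  int width;
  int memberWidth();
  int globalWidth();
};

// Inside a member function, plain `width` finds the member first.
int Box::memberWidth() { return width; }

// The :: prefix with no struct name skips the members and
// selects the global variable.
int Box::globalWidth() { return ::width; }
```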
But what if, for some reason, you want to be able to get your hands
on the address of the structure? In the C version of the library it
was easy because each function's first argument was a CStash*
called s. In C++, things are even more consistent. There's a special
keyword, called this, which produces the address of the struct. It's
the equivalent of the s pointer in the C version of the library. So we can
revert to the C style of things by saying
this->size = sz;
The code generated by the compiler is exactly the same, so you
don't need to use this in such a fashion; occasionally, you'll see
code where people explicitly use this-> everywhere but it doesn't
add anything to the meaning of the code and often indicates an
inexperienced programmer. Usually, you don't use this often, but
when you need it, it's there (some of the examples later in the book
will use this).
There's one last item to mention. In C, you could assign a void* to
any other pointer like this:
int i = 10;
void* vp = &i; // OK in both C and C++
int* ip = vp; // Only acceptable in C
and there was no complaint from the compiler. But in C++, this
statement is not allowed. Why? Because C is not so particular about
type information, so it allows you to assign a pointer with an
unspecified type to a pointer with a specified type. Not so with
C++. Type is critical in C++, and the compiler stamps its foot when
there are any violations of type information. This has always been
important, but it is especially important in C++ because you have
member functions in structs. If you could pass pointers to structs
around with impunity in C++, then you could end up calling a
member function for a struct that doesn't even logically exist for
that struct! A real recipe for disaster. Therefore, while C++ allows
the assignment of any object pointer to a void* (this was the
original intent of void*, which is required to be large enough to
hold a pointer to any object type), it will not allow you to assign a
void pointer to any other type of pointer. A cast is always required
to tell the reader and the compiler that you really do want to treat
it as the destination type.
This brings up an interesting issue. One of the important goals for
C++ is to compile as much existing C code as possible to allow for
an easy transition to the new language. However, this doesn't mean
any code that C allows will automatically be allowed in C++. There
are a number of things the C compiler lets you get away with that
are dangerous and error-prone. (We'll look at them as the book
progresses.) The C++ compiler generates warnings and errors for
these situations. This is often much more of an advantage than a
hindrance. In fact, there are many situations in which you are
trying to run down an error in C and just can't find it, but as soon
as you recompile the program in C++, the compiler points out the
problem! In C, you'll often find that you can get the program to
compile, but then you have to get it to work. In C++, when the
program compiles correctly, it often works, too! This is because the
language is a lot stricter about type.
You can see a number of new things in the way the C++ version of
Stash is used in the following test program:
//: C04:CppLibTest.cpp
//{L} CppLib
// Test of C++ library
#include "CppLib.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main() {
  Stash intStash;
  intStash.initialize(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  // Holds 80-character strings:
  Stash stringStash;
  const int bufsize = 80;
  stringStash.initialize(sizeof(char) * bufsize);
  ifstream in("CppLibTest.cpp");
  assure(in, "CppLibTest.cpp");
  string line;
  while(getline(in, line)) {
    // add() copies bufsize bytes, so copy each line into a
    // zero-filled buffer first (long lines are truncated)
    char buf[bufsize] = {0};
    line.copy(buf, bufsize - 1);
    stringStash.add(buf);
  }
  int k = 0;
  char* cp;
  while((cp = (char*)stringStash.fetch(k)) != 0)
    cout << "stringStash.fetch(" << k++ << ") = "
         << cp << endl;
  intStash.cleanup();
  stringStash.cleanup();
} ///:~
One thing you'll notice is that the variables are all defined "on the
fly" (as introduced in the previous chapter). That is, they are
defined at any point in the scope, rather than being restricted (as
in C) to the beginning of the scope.
The code is quite similar to CLibTest.cpp, but when a member
function is called, the call occurs using the member selection
operator '.' preceded by the name of the variable. This is a
convenient syntax because it mimics the selection of a data member
of the structure. The difference is that this is a function member, so
it has an argument list.
Of course, the call that the compiler actually generates looks much
more like the original C library function. Thus, considering name
decoration and the passing of this, the C++ function call
intStash.initialize(sizeof(int))
becomes something like
Stash_initialize(&intStash, sizeof(int))
If you ever wonder what's going on underneath the covers, remember
that the original C++ compiler cfront from AT&T produced C code as
its output, which was then compiled by the underlying C compiler.
This approach meant that cfront could be quickly ported to any
machine that had a C compiler, and it helped to rapidly disseminate
C++ compiler technology. But because the C++ compiler had to generate
C, you know that there must be some way to represent C++ syntax
in C (some compilers still allow you to produce C code).
There's one other change from CLibTest.cpp, which is the
introduction of the require.h header file. This is a header file that I
created for this book to perform more sophisticated error checking
than that provided by assert( ). It contains several functions,
including the one used here called assure( ), which is used for files.
This function checks to see if the file has successfully been opened,
and if not it reports to standard error that the file could not be
opened (thus it needs the name of the file as the second argument)
and exits the program. The require.h functions will be used
throughout the book, in particular to ensure that there are the right
number of command-line arguments and that files are opened
properly. The require.h functions replace repetitive and distracting
error-checking code, and yet they still provide useful error
messages. These functions will be fully explained later in the book.
What's an object?
Now that you've seen an initial example, it's time to step back and
take a look at some terminology. The act of bringing functions
inside structures is the root of what C++ adds to C, and it
introduces a new way of thinking about structures: as concepts. In
C, a struct is an agglomeration of data, a way to package data so
you can treat it in a clump. But it's hard to think about it as
anything but a programming convenience. The functions that
operate on those structures are elsewhere. However, with functions
in the package, the structure becomes a new creature, capable of
describing both characteristics (like a C struct does) and behaviors.
The concept of an object, a free-standing, bounded entity that can
remember and act, suggests itself.
In C++, an object is just a variable, and the purest definition is "a
region of storage" (this is a more specific way of saying, "an object
must have a unique identifier," which in the case of C++ is a
unique memory address). It's a place where you can store data, and
it's implied that there are also operations that can be performed on
this data.
Unfortunately, there's not complete consistency across languages
when it comes to these terms, although they are fairly well-
accepted. You will also sometimes encounter disagreement about
what an object-oriented language is, although that seems to be
reasonably well sorted out by now. There are languages that are
object-based, which means that they have objects like the C++
structures-with-functions that you've seen so far. This, however, is
only part of the picture when it comes to an object-oriented
language, and languages that stop at packaging functions inside
data structures are object-based, not object-oriented.
Abstract data typing
The ability to package data with functions allows you to create a
new data type. This is often called encapsulation[1]. An existing data
type may have several pieces of data packaged together. For
example, a float has an exponent, a mantissa, and a sign bit. You
can tell it to do things: add to another float or to an int, and so on.
It has characteristics and behavior.
The definition of Stash creates a new data type. You can add( ),
fetch( ), and inflate( ). You create one by saying Stash s, just as you
create a float by saying float f. A Stash also has characteristics and
behavior. Even though it acts like a real, built-in data type, we refer
to it as an abstract data type, perhaps because it allows us to abstract
a concept from the problem space into the solution space. In
addition, the C++ compiler treats it like a new data type, and if you
say a function expects a Stash, the compiler makes sure you pass a
Stash to that function. So the same level of type checking happens
with abstract data types (sometimes called user-defined types) as
with built-in types.
[1] This term can cause debate. Some people use it as defined here; others use it to
describe access control, discussed in the following chapter.
You can immediately see a difference, however, in the way you
perform operations on objects. You say
object.memberFunction(arglist)
This is "calling a member function for an object." But in
object-oriented parlance, this is also referred to as "sending a
message to an object." So for a Stash s, the statement s.add(&i)
"sends a message to s" saying, "add( ) this to yourself." In fact,
object-oriented programming can be summed up in a single phrase:
sending messages to objects. Really, that's all you do: create a
bunch of objects and send messages to them. The trick, of course,
is figuring out what your objects and messages are, but once you
accomplish this the implementation in C++ is surprisingly
straightforward.
Object details
A question that often comes up in seminars is, "How big is an
object, and what does it look like?" The answer is "about what you
expect from a C struct." In fact, the code the C compiler produces
for a C struct (with no C++ adornments) will usually look exactly
the same as the code produced by a C++ compiler. This is
reassuring to those C programmers who depend on the details of
size and layout in their code, and for some reason directly access
structure bytes instead of using identifiers (relying on a particular
size and layout for a structure is a nonportable activity).
The size of a struct is at least the combined size of all of its
members. Sometimes when the compiler lays out a struct, it adds
extra bytes (padding) to make the boundaries come out neatly; this
may increase execution efficiency. In Chapter 15, you'll see how in
some cases "secret" pointers are added to the structure, but you
don't need to worry about that right now.
You can determine the size of a struct using the sizeof operator.
Here's a small example:
//: C04:Sizeof.cpp
// Sizes of structs
#include "CLib.h"
#include "CppLib.h"
#include <iostream>
using namespace std;
struct A {
  int i[100];
};
struct B {
  void f();
};
void B::f() {}
int main() {
  cout << "sizeof struct A = " << sizeof(A)
       << " bytes" << endl;
  cout << "sizeof struct B = " << sizeof(B)
       << " bytes" << endl;
  cout << "sizeof CStash in C = "
       << sizeof(CStash) << " bytes" << endl;
  cout << "sizeof Stash in C++ = "
       << sizeof(Stash) << " bytes" << endl;
} ///:~
On my machine (your results may vary) the first print statement
produces 200 because each int occupies two bytes; on most current
platforms an int occupies four bytes, so you'll probably see 400.
struct B is something of an anomaly because it is a struct with no data
members. In C, this is illegal, but in C++ we need the option of
creating a struct whose sole task is to scope function names, so it is
allowed. Still, the result produced by the second print statement is
a somewhat surprising nonzero value. In early versions of the
language, the size was zero, but an awkward situation arises when
you create such objects: They have the same address as the object
created directly after them, and so are not distinct. One of the
fundamental rules of objects is that each object must have a unique
address, so structures with no data members will always have
some minimum nonzero size.
The last two sizeof statements show you that the size of the
structure in C++ is the same as the size of the equivalent version in
C. C++ tries not to add any unnecessary overhead.
Header file etiquette
When you create a struct containing member functions, you are
creating a new data type. In general, you want this type to be easily
accessible to yourself and others. In addition, you want to separate
the interface (the declaration) from the implementation (the
definition of the member functions) so the implementation can be
changed without forcing a re-compile of the entire system. You
achieve this end by putting the declaration for your new type in a
header file.
When I first learned to program in C, the header file was a mystery
to me. Many C books don't seem to emphasize it, and the compiler
didn't enforce function declarations, so it seemed optional most of
the time, except when structures were declared. In C++ the use of
header files becomes crystal clear. They are virtually mandatory for
easy program development, and you put very specific information
in them: declarations. The header file tells the compiler what is
available in your library. You can use the library even if you only
possess the header file along with the object file or library file; you
don't need the source code for the cpp file. The header file is where
the interface specification is stored.
Although it is not enforced by the compiler, the best approach to
building large projects in C is to use libraries: collect associated
functions into the same object module or library, and use a header
file to hold all the declarations for the functions. It is de rigueur in
C++; you could throw any function into a C library, but the C++
abstract data type determines the functions that are associated by
dint of their common access to the data in a struct. Any member
function must be declared in the struct declaration; you cannot put
it elsewhere. The use of function libraries was encouraged in C and
institutionalized in C++.
Importance of header files
When using a function from a library, C allows you the option of
ignoring the header file and simply declaring the function by hand.
In the past, people would sometimes do this to speed up the
compiler just a bit by avoiding the task of opening and including
the file (this is usually not an issue with modern compilers). For
example, here's an extremely lazy declaration of the C function
printf( ) (from <stdio.h>):
printf(...);
The ellipsis specifies a variable argument list[2], which says: printf( )
has some arguments, each of which has a type, but ignore that. Just
take whatever arguments you see and accept them. By using this
kind of declaration, you suspend all error checking on the
arguments.
This practice can cause subtle problems. If you declare functions by
hand, you may make a mistake in one file. Since the compiler sees
only your hand-declaration in that file, it may be able to adapt to
your mistake. The program will then link correctly, but the use of
the function in that one file will be faulty. This is a tough error to
find, and is easily avoided by using a header file.
If you place all your function declarations in a header file, and
include that header everywhere you use the function and where
you define the function, you ensure a consistent declaration across
the whole system. You also ensure that the declaration and the
definition match by including the header in the definition file.
[2] To write a function definition for a function that takes a true variable argument list,
you must use varargs, although these should be avoided in C++. You can find details
about the use of varargs in your C manual.
If a struct is declared in a header file in C++, you must include the
header file everywhere a struct is used and where struct member
functions are defined. The C++ compiler will give an error message
if you try to call a regular function, or to call or define a member
function, without declaring it first. By enforcing the proper use of
header files, the language ensures consistency in libraries, and
reduces bugs by forcing the same interface to be used everywhere.
The header is a contract between you and the user of your library.
The contract describes your data structures, and states the
arguments and return values for the function calls. It says, "Here's
what my library does." The user needs some of this information to
develop the application and the compiler needs all of it to generate
proper code. The user of the struct simply includes the header file,
creates objects (instances) of that struct, and links in the object
module or library (i.e., the compiled code).
The compiler enforces the contract by requiring you to declare all
structures and functions before they are used and, in the case of
member functions, before they are defined. Thus, you're forced to
put the declarations in the header and to include the header in the
file where the member functions are defined and the file(s) where
they are used. Because a single header file describing your library is
included throughout the system, the compiler can ensure
consistency and prevent errors.
There are certain issues that you must be aware of in order to
organize your code properly and write effective header files. The
first issue concerns what you can put into header files. The basic
rule is "only declarations," that is, only information to the compiler
but nothing that allocates storage by generating code or creating
variables. This is because the header file will typically be included
in several translation units in a project, and if storage for one
identifier is allocated in more than one place, the linker will come
up with a multiple definition error (this is C++'s one definition rule:
You can declare things as many times as you want, but there can be
only one actual definition for each thing).
This rule isn't completely hard and fast. If you define a variable
that is "file static" (has visibility only within a file) inside a header
file, there will be multiple instances of that data across the project,
but the linker won't have a collision[3]. Basically, you don't want to
do anything in the header file that will cause an ambiguity at link
time.
The multiple-declaration problem
The second header-file issue is this: when you put a struct
declaration in a header file, it is possible for the file to be included
more than once in a complicated program. Iostreams are a good
example. Any time a struct does I/O it may include one of the
iostream headers. If the cpp file you are working on uses more than
one kind of struct (typically including a header file for each one),
you run the risk of including the <iostream> header more than
once and re-declaring iostreams.
The compiler considers the redeclaration of a structure (this
includes both structs and classes) to be an error, since it would
otherwise allow you to use the same name for different types. To
prevent this error when multiple header files are included, you
need to build some intelligence into your header files using the
preprocessor (Standard C++ header files like <iostream> already
have this "intelligence").
Both C and C++ allow you to redeclare a function, as long as the
two declarations match, but neither will allow the redeclaration of a
structure. In C++ this rule is especially important because if the
compiler allowed you to redeclare a structure and the two
declarations differed, which one would it use?
[3] However, Standard C++ (C++98) deprecated file static in favor of unnamed namespaces; C++11 later withdrew that deprecation.
The problem of redeclaration comes up quite a bit in C++ because
each data type (structure with functions) generally has its own
header file, and you have to include one header in another if you
want to create another data type that uses the first one. In any cpp
file in your project, it's likely that you'll include several files that
include the same header file. During a single compilation, the
compiler can see the same header file several times. Unless you do
something about it, the compiler will see the redeclaration of your
structure and report a compile-time error. To solve the problem,
you need to know a bit more about the preprocessor.
The preprocessor directives
#define, #ifdef, and #endif
The preprocessor directive #define can be used to create compile-
time flags. You have two choices: you can simply tell the
preprocessor that the flag is defined, without specifying a value:
#define FLAG
or you can give it a value (which is the typical C way to define a
constant):
#define PI 3.14159
In either case, the label can now be tested by the preprocessor to see
if it has been defined:
#ifdef FLAG
This will yield a true result, and the code following the #ifdef will
be included in the package sent to the compiler. This inclusion
stops when the preprocessor encounters the statement
#endif
or
#endif // FLAG
Any non-comment after the #endif on the same line is illegal, even
though some compilers may accept it. The #ifdef/#endif pairs
may be nested within each other.
The complement of #define is #undef (short for "un-define"),
which will make an #ifdef statement using the same label yield
a false result. #undef will also cause the preprocessor to stop using
a macro. The complement of #ifdef is #ifndef, which will yield
true if the label has not been defined (this is the one we will use in
header files).
There are other useful features in the C preprocessor. You should
check your local documentation for the full set.
A standard for header files
In each header file that contains a structure, you should first check
to see if this header has already been included in this particular cpp
file. You do this by testing a preprocessor flag. If the flag isn't set,
the file wasn't included and you should set the flag (so the
structure can't get re-declared) and declare the structure. If the flag
was set then that type has already been declared so you should just
ignore the code that declares it. Here's how the header file should
look:
#ifndef HEADER_FLAG
#define HEADER_FLAG
// Type declaration here...
#endif // HEADER_FLAG
As you can see, the first time the header file is included, the
contents of the header file (including your type declaration) will be
included by the preprocessor. All the subsequent times it is
included (in a single compilation unit), the type declaration will
be ignored. The name HEADER_FLAG can be any unique name,
but a reliable standard to follow is to capitalize the name of the
header file and replace periods with underscores (leading
underscores, however, are reserved for system names). Here's an
example:
//: C04:Simple.h
// Simple header that prevents re-definition
#ifndef SIMPLE_H
#define SIMPLE_H
struct Simple {
  int i,j,k;
  void initialize() { i = j = k = 0; }
};
#endif // SIMPLE_H ///:~
Although the SIMPLE_H after the #endif is commented out and
thus ignored by the preprocessor, it is useful for documentation.
These preprocessor statements that prevent multiple inclusion are
often referred to as include guards.
Namespaces in headers
You'll notice that using directives are present in nearly all the cpp
files in this book, usually in the form:
using namespace std;
Since std is the namespace that surrounds the entire Standard C++
library, this particular using directive allows the names in the
Standard C++ library to be used without qualification. However,
you'll virtually never see a using directive in a header file (at least,
not outside of a scope). The reason is that the using directive
eliminates the protection of that particular namespace, and the
effect lasts until the end of the current compilation unit. If you put
a using directive (outside of a scope) in a header file, it means that
this loss of "namespace protection" will occur with any file that
includes this header, which often means other header files. Thus, if
you start putting using directives in header files, it's very easy to
end up "turning off" namespaces practically everywhere, and
thereby neutralizing the beneficial effects of namespaces.
In short: don't put using directives in header files.
Using headers in projects
When building a project in C++, you'll usually create it by bringing
together a lot of different types (data structures with associated
functions). You'll usually put the declaration for each type or group
of associated types in a separate header file, then define the
functions for that type in a translation unit. When you use that
type, you must include the header file to perform the declarations
properly.
Sometimes that pattern will be followed in this book, but more
often the examples will be very small, so everything (the structure
declarations, function definitions, and the main( ) function) may
appear in a single file. However, keep in mind that you'll want to
use separate files and header files in practice.
Nested structures
The convenience of taking data and function names out of the
global name space extends to structures. You can nest a structure
within another structure, and therefore keep associated elements
together. The declaration syntax is what you would expect, as you
can see in the following structure, which implements a push-down
stack as a simple linked list so it "never" runs out of memory:
//: C04:Stack.h
// Nested struct in linked list
#ifndef STACK_H
#define STACK_H
struct Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK_H ///:~
The nested struct is called Link, and it contains a pointer to the
next Link in the list and a pointer to the data stored in the Link. If
the next pointer is zero, it means you're at the end of the list.
Notice that the head pointer is defined right after the declaration
for struct Link, instead of in a separate definition Link* head. This is
a syntax that came from C, but it emphasizes the importance of the
semicolon after the structure declaration; the semicolon indicates
the end of the comma-separated list of definitions of that structure
type. (Usually the list is empty.)
The nested structure has its own initialize( ) function, like all the
structures presented so far, to ensure proper initialization. Stack
has both an initialize( ) and cleanup( ) function, as well as push( ),
which takes a pointer to the data you wish to store (it assumes this
has been allocated on the heap), and pop( ), which returns the data
pointer from the top of the Stack and removes the top element.
(When you pop( ) an element, you are responsible for destroying
the object pointed to by the data.) The peek( ) function also returns
the data pointer from the top element, but it leaves the top element
on the Stack.
Here are the definitions for the member functions:
//: C04:Stack.cpp {O}
// Linked list with nesting
#include "Stack.h"
#include "../require.h"
using namespace std;
void Stack::Link::initialize(void* dat, Link* nxt) {
  data = dat;
  next = nxt;
}
void Stack::initialize() { head = 0; }
void Stack::push(void* dat) {
  Link* newLink = new Link;
  newLink->initialize(dat, head);
  head = newLink;
}
void* Stack::peek() {
  require(head != 0, "Stack empty");
  return head->data;
}
void* Stack::pop() {
  if(head == 0) return 0;
  void* result = head->data;
  Link* oldHead = head;
  head = head->next;
  delete oldHead;
  return result;
}
void Stack::cleanup() {
  require(head == 0, "Stack not empty");
} ///:~
The first definition is particularly interesting because it shows you
how to define a member of a nested structure. You simply use an
additional level of scope resolution to specify the name of the
enclosing struct. Stack::Link::initialize( ) takes the arguments and
assigns them to its members.
Stack::initialize( ) sets head to zero, so the object knows it has an
empty list.
Stack::push( ) takes the argument, which is a pointer to the variable
you want to keep track of, and pushes it on the Stack. First, it uses
new to allocate storage for the Link it will insert at the top. Then it
calls Link's initialize( ) function to assign the appropriate values to
the members of the Link. Notice that the next pointer is assigned to
the current head; then head is assigned to the new Link pointer.
This effectively pushes the Link in at the top of the list.
Stack::pop( ) captures the data pointer at the current top of the
Stack; then it moves the head pointer down and deletes the old top
of the Stack, finally returning the captured pointer. When pop( )
removes the last element, then head again becomes zero, meaning
the Stack is empty.
Stack::cleanup( ) doesn't actually do any cleanup. Instead, it
establishes a firm policy that "you (the client programmer using
this Stack object) are responsible for popping all the elements off
this Stack and deleting them." The require( ) is used to indicate
that a programming error has occurred if the Stack is not empty.
Why couldn't the Stack destructor be responsible for all the objects
that the client programmer didn't pop( )? The problem is that the
Stack is holding void pointers, and you'll learn in Chapter 13 that
calling delete for a void* doesn't clean things up properly. The
subject of "who's responsible for the memory" is not even that
simple, as we'll see in later chapters.
Here's an example to test the Stack:
//: C04:StackTest.cpp
//{L} Stack
//{T} StackTest.cpp
// Test of nested linked list
#include "Stack.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  Stack textlines;
  textlines.initialize();
  string line;
  // Read file and store lines in the Stack:
  while(getline(in, line))
    textlines.push(new string(line));
  // Pop the lines from the Stack and print them:
  string* s;
  while((s = (string*)textlines.pop()) != 0) {
    cout << *s << endl;
    delete s;
  }
  textlines.cleanup();
} ///:~
This is similar to the earlier example, but it pushes lines from a file
(as string pointers) on the Stack and then pops them off, which
results in the file being printed out in reverse order. Note that the
pop( ) member function returns a void* and this must be cast back
to a string* before it can be used. To print the string, the pointer is
dereferenced.
As textlines is being filled, the contents of line is "cloned" for each
push( ) by making a new string(line). The value returned from the
new-expression is a pointer to the new string that was created and
that copied the information from line. If you had simply passed the
address of line to push( ), you would end up with a Stack filled
with identical addresses, all pointing to line. You'll learn more
about this "cloning" process later in the book.
The file name is taken from the command line. To guarantee that
there are enough arguments on the command line, you see a
second function used from the require.h header file:
requireArgs( ), which compares argc to the desired number of
arguments and prints an appropriate error message and exits the
program if there aren't enough arguments.
Global scope resolution
The scope resolution operator gets you out of situations in which
the name the compiler chooses by default (the "nearest" name) isn't
what you want. For example, suppose you have a structure with a
local identifier a, and you want to select a global identifier a from
inside a member function. The compiler would default to choosing
the local one, so you must tell it to do otherwise. When you want to
specify a global name using scope resolution, you use the operator
with nothing in front of it. Here's an example that shows global
scope resolution for both a variable and a function:
//: C04:Scoperes.cpp
// Global scope resolution
int a;
void f() {}
struct S {
  int a;
  void f();
};
void S::f() {
  ::f();  // Would be recursive otherwise!
  ::a++;  // Select the global a
  a--;    // The a at struct scope
}
int main() { S s; f(); } ///:~
Without scope resolution in S::f( ), the compiler would default to
selecting the member versions of f( ) and a.
Summary
In this chapter, you've learned the fundamental "twist" of C++:
that you can place functions inside of structures. This new type of
structure is called an abstract data type, and variables you create
using this structure are called objects, or instances, of that type.
Calling a member function for an object is called sending a message
to that object. The primary action in object-oriented programming
is sending messages to objects.
Although packaging data and functions together is a significant
benefit for code organization and makes library use easier because
it prevents name clashes by hiding the names, there's a lot more
you can do to make programming safer in C++. In the next chapter,
you'll learn how to protect some members of a struct so that only
you can manipulate them. This establishes a clear boundary
between what the user of the structure can change and what only
the programmer may change.
Exercises
Solutions to selected exercises can be found in the electronic document The Thinking in C++ Annotated
Solution Guide, available for a small fee from http://.
1. In the Standard C library, the function puts( ) prints a
char array to the console (so you can say puts("hello")).
Write a C program that uses puts( ) but does not include
<stdio.h> or otherwise declare the function. Compile this
program with your C compiler. (Some C++ compilers are
not distinct from their C compilers; in this case you may
need to discover a command-line flag that forces a C
compilation.) Now compile it with the C++ compiler and
note the difference.
2. Create a struct declaration with a single member
function, then create a definition for that member
function. Create an object of your new data type, and call
the member function.
3. Change your solution to Exercise 2 so the struct is
declared in a properly "guarded" header file, with the
definition in one cpp file and your main( ) in another.
4. Create a struct with a single int data member, and two
global functions, each of which takes a pointer to that
struct. The first function has a second int argument and
sets the struct's int to the argument value, the second
displays the int from the struct. Test the functions.
5. Repeat Exercise 4 but move the functions so they are
member functions of the struct, and test again.
6. Create a class that (redundantly) performs data member
selection and a member function call using the this
keyword (which refers to the address of the current
object).
7. Make a Stash that holds doubles. Fill it with 25 double
values, then print them out to the console.
8. Repeat Exercise 7 with Stack.
9. Create a file containing a function f( ) that takes an int
argument and prints it to the console using the printf( )
function in <stdio.h> by saying printf("%d\n", i), in
which i is the int you wish to print. Create a separate file
containing main( ), and in this file declare f( ) to take a
float argument. Call f( ) from inside main( ). Try to
compile and link your program with the C++ compiler
and see what happens. Now compile and link the
program using the C compiler, and see what happens
when it runs. Explain the behavior.
10. Find out how to produce assembly language from your C
and C++ compilers. Write a function in C and a struct
with a single member function in C++. Produce assembly
language from each and find the function names that are
produced by your C function and your C++ member
function, so you can see what sort of name decoration
occurs inside the compiler.
11. Write a program with conditionally-compiled code in
main( ), so that when a preprocessor value is defined one
message is printed, but when it is not defined another
message is printed. Compile this code experimenting
with a #define within the program, then discover the
way your compiler takes preprocessor definitions on the
command line and experiment with that.
12. Write a program that uses assert( ) with an argument that
is always false (zero) to see what happens when you run
it. Now compile it with #define NDEBUG and run it
again to see the difference.
13. Create an abstract data type that represents a videotape
in a video rental store. Try to consider all the data and
operations that may be necessary for the Video type to
work well within the video rental management system.
Include a print( ) member function that displays
information about the Video.
14. Create a Stack object to hold the Video objects from
Exercise 13. Create several Video objects, store them in
the Stack, then display them using Video::print( ).
15. Write a program that prints out all the sizes for the
fundamental data types on your computer using sizeof.
16. Modify Stash to use a vector<char> as its underlying
data structure.
17. Dynamically create pieces of storage of the following
types, using new: int, long, an array of 100 chars, an
array of 100 floats. Print the addresses of these and then
free the storage using delete (use delete [] for the arrays).
18. Write a function that takes a char* argument. Using new,
dynamically allocate an array of char that is the size of
the char array that's passed to the function. Using array
indexing, copy the characters from the argument to the
dynamically allocated array (don't forget the null
terminator) and return the pointer to the copy. In your
main( ), test the function by passing a static quoted
character array, then take the result of that and pass it
back into the function. Print both strings and both
pointers so you can see they are different storage. Using
delete [], clean up all the dynamic storage.
19. Show an example of a structure declared within another
structure (a nested structure). Declare data members in
both structs, and declare and define member functions in
both structs. Write a main( ) that tests your new types.
20. How big is a structure? Write a piece of code that prints
the size of various structures. Create structures that have
data members only and ones that have data members
and function members. Then create a structure that has
no members at all. Print out the sizes of all these. Explain
the reason for the result of the structure with no data
members at all.
21. C++ automatically creates the equivalent of a typedef for
structs, as you've seen in this chapter. It also does this for
enumerations and unions. Write a small program that
demonstrates this.
22. Create a Stack that holds Stashes. Each Stash will hold
five lines from an input file. Create the Stashes using
new. Read a file into your Stack, then reprint it in its
original form by extracting it from the Stack.
23. Modify Exercise 22 so that you create a struct that
encapsulates the Stack of Stashes. The user should only
add and get lines via member functions, but under the
covers the struct happens to use a Stack of Stashes.
24. Create a struct that holds an int and a pointer to another
instance of the same struct. Write a function that takes
the address of one of these structs and an int indicating
the length of the list you want created. This function will
make a whole chain of these structs (a linked list), starting
from the argument (the head of the list), with each one
pointing to the next. Make the new structs using new,
and put the count (which object number this is) in the int.
In the last struct in the list, put a zero value in the pointer
to indicate that it's the end. Write a second function that
takes the head of your list and moves through to the end,
printing out both the pointer value and the int value for
each one.
25. Repeat Exercise 24, but put the functions inside a struct
instead of using "raw" structs and functions.