4: Data Abstraction
C++
is a productivity enhancement tool. Why
else
would
you make the effort (and it is an
effort,
regardless
of how easy we attempt to make
the
transition)
to
switch from some language
that you already know and
are
productive
with to a new language in which you're
going to be less
productive
for a while, until you get the
hang of it? It's
because
you've
become convinced that you're
going to get big gains
by
using
this new tool.
Productivity,
in computer programming terms,
means that fewer
people
can make much more complex
and impressive programs in
less
time. There are certainly
other issues when it comes
to
choosing
a language, such as efficiency
(does the nature of
the
language
cause slowdown and code bloat?),
safety (does the
language
help you ensure that your
program will always do what
you
plan, and handle errors
gracefully?), and maintenance
(does
the
language help you create
code that is easy to
understand,
modify,
and extend?). These are
certainly important factors
that
will
be examined in this
book.
But
raw productivity means a program that
formerly took three of
you
a week to write now takes one of you a
day or two. This
touches
several levels of economics. You're happy
because you get
the
rush of power that comes from
building something, your
client
(or
boss) is happy because products
are produced faster and
with
fewer
people, and the customers
are happy because they
get
products
more cheaply. The only way to
get massive increases
in
productivity
is to leverage off other people's
code. That is, to use
libraries.
A
library is simply a bunch of code that
someone else has
written
and
packaged together. Often,
the most minimal package is a
file
with
an extension like lib
and
one or more header files to
tell your
compiler
what's in the library. The
linker knows how to search
through
the library file and extract
the appropriate compiled
code.
But
that's only one way to deliver a
library. On platforms that
span
many
architectures, such as Linux/Unix, often
the only sensible
way
to deliver a library is with source
code, so it can be
reconfigured
and recompiled on the new
target.
Thinking
in C++
Thus,
libraries are probably the
most important way to
improve
productivity,
and one of the primary design
goals of C++ is to
make
library use easier. This
implies that there's
something hard
about
using libraries in C. Understanding
this factor will give you
a
first
insight into the design of
C++, and thus insight into how to
use
it.
A tiny C-like library
A
library usually starts out as a
collection of functions, but if
you
have
used third-party C libraries you know
there's usually more to
it
than that because there's
more to life than behavior,
actions, and
functions.
There are also
characteristics (blue, pounds,
texture,
luminance),
which are represented by data. And when
you start to
deal
with a set of characteristics in C, it is very
convenient to clump
them
together into a struct,
especially if you want to
represent
more
than one similar thing in your problem
space. Then you can
make
a variable of this struct
for
each thing.
Thus,
most C libraries have a set
of structs
and a set of functions
that
act on those structs.
As an example of what such a
system
looks
like, consider a programming
tool that acts like an
array, but
whose
size can be established at
runtime, when it is created. I'll
call
it
a CStash.
Although it's written in C++, it has
the style of what
you'd
write in C:
//: C04:CLib.h
// Header file for a C-like library
// An array-like entity created at runtime

typedef struct CStashTag {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
} CStash;

void initialize(CStash* s, int size);
void cleanup(CStash* s);
int add(CStash* s, const void* element);
void* fetch(CStash* s, int index);
int count(CStash* s);
void inflate(CStash* s, int increase);
///:~
A tag name like CStashTag is generally used for a struct in case you need to reference the struct inside itself. For example, when creating a linked list (each element in your list contains a pointer to the next element), you need a pointer to the next struct variable, so you need a way to identify the type of that pointer within the struct body. Also, you'll almost universally see the typedef as shown above for every struct in a C library. This is done so you can treat the struct as if it were a new type and define variables of that struct like this:
CStash A, B, C;
The storage pointer is an unsigned char*. An unsigned char is the smallest piece of storage a C compiler supports, although on some machines it can be the same size as the largest. It's implementation dependent, but is often one byte long. You might think that because the CStash is designed to hold any type of variable, a void* would be more appropriate here. However, the purpose is not to treat this storage as a block of some unknown type, but rather as a block of contiguous bytes.
The source code for the implementation file (which you may not get if you buy a library commercially; you might get only a compiled obj or lib or dll, etc.) looks like this:
//: C04:CLib.cpp {O}
// Implementation of example C-like library
// Declare structure and functions:
#include "CLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;

void initialize(CStash* s, int sz) {
  s->size = sz;
  s->quantity = 0;
  s->storage = 0;
  s->next = 0;
}

int add(CStash* s, const void* element) {
  if(s->next >= s->quantity) // Enough space left?
    inflate(s, increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = s->next * s->size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < s->size; i++)
    s->storage[startBytes + i] = e[i];
  s->next++;
  return(s->next - 1); // Index number
}

void* fetch(CStash* s, int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= s->next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(s->storage[index * s->size]);
}

int count(CStash* s) {
  return s->next; // Elements in CStash
}

void inflate(CStash* s, int increase) {
  assert(increase > 0);
  int newQuantity = s->quantity + increase;
  int newBytes = newQuantity * s->size;
  int oldBytes = s->quantity * s->size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = s->storage[i]; // Copy old to new
  delete [](s->storage); // Old storage
  s->storage = b; // Point to new memory
  s->quantity = newQuantity;
}

void cleanup(CStash* s) {
  if(s->storage != 0) {
    cout << "freeing storage" << endl;
    delete []s->storage;
  }
}
///:~
initialize( ) performs the necessary setup for struct CStash by setting the internal variables to appropriate values. Initially, the storage pointer is set to zero; no initial storage is allocated.
The add( ) function inserts an element into the CStash at the next available location. First, it checks to see if there is any available space left. If not, it expands the storage using the inflate( ) function, described later.
Because the compiler doesn't know the specific type of the variable being stored (all the function gets is a void*), you can't just do an assignment, which would certainly be the convenient thing. Instead, you must copy the variable byte-by-byte. The most straightforward way to perform the copying is with array indexing. Typically, there are already data bytes in storage, and this is indicated by the value of next. To start with the right byte offset, next is multiplied by the size of each element (in bytes) to produce startBytes. Then the argument element is cast to an unsigned char* so that it can be addressed byte-by-byte and copied into the available storage space. next is incremented so that it indicates the next available piece of storage, and the "index number" where the value was stored is returned so that the value can be retrieved using this index number with fetch( ).
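The byte-by-byte copy can be tried in isolation. The following is only a sketch: copyIntoSlot and secondOfTwo are hypothetical names invented for illustration and are not part of CLib; the loop itself mirrors the one in add( ).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CLib): copy `size` bytes from `element`
// into `storage`, starting at byte offset slot * size.
void copyIntoSlot(unsigned char* storage, int slot, int size,
                  const void* element) {
  assert(storage != 0 && element != 0 && slot >= 0 && size > 0);
  int startBytes = slot * size;
  // View the element as raw bytes so it can be indexed one byte at a time.
  const unsigned char* e = static_cast<const unsigned char*>(element);
  for (int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
}

// Store two ints in a raw byte buffer, then rebuild the second one
// from its bytes.
int secondOfTwo(int first, int second) {
  unsigned char buffer[2 * sizeof(int)];
  copyIntoSlot(buffer, 0, sizeof(int), &first);
  copyIntoSlot(buffer, 1, sizeof(int), &second);
  int result;
  unsigned char* r = reinterpret_cast<unsigned char*>(&result);
  for (std::size_t i = 0; i < sizeof(int); i++)
    r[i] = buffer[sizeof(int) + i];
  return result;
}
```

Calling secondOfTwo(7, 42) should give back 42, because the bytes of the second int were copied into slot 1 and copied out again unchanged.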
fetch( ) checks to see that the index isn't out of bounds and then returns the address of the desired variable, calculated using the index argument. Since index indicates the number of elements to offset into the CStash, it must be multiplied by the number of bytes occupied by each piece to produce the numerical offset in bytes. When this offset is used to index into storage using array indexing, you don't get the address, but instead the byte at the address. To produce the address, you must use the address-of operator &.
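The offset arithmetic can be sketched on its own, outside the library. The function names below (elementAddress, thirdInt, pastEndIsNull) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the arithmetic in fetch( ).
// Returns the address of element `index` in a buffer holding `count`
// elements of `size` bytes each, or 0 if the index is past the end.
void* elementAddress(unsigned char* storage, int count, int size, int index) {
  assert(0 <= index);
  if (index >= count)
    return 0;
  // storage[index * size] is a single byte; & produces its address.
  return &(storage[index * size]);
}

// Treat an int array as raw bytes and fetch the third element.
int thirdInt() {
  int values[] = {10, 20, 30};
  unsigned char* bytes = reinterpret_cast<unsigned char*>(values);
  void* p = elementAddress(bytes, 3, sizeof(int), 2);
  return *static_cast<int*>(p);
}

// An index equal to the element count is reported with a null pointer.
bool pastEndIsNull() {
  int values[] = {10, 20, 30};
  unsigned char* bytes = reinterpret_cast<unsigned char*>(values);
  return elementAddress(bytes, 3, sizeof(int), 3) == 0;
}
```

The caller still has to cast the returned void* back to the real element type, just as the test program later does with fetch( ).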
count( ) may look a bit strange at first to a seasoned C programmer. It seems like a lot of trouble to go through to do something that would probably be a lot easier to do by hand. If you have a struct CStash called intStash, for example, it would seem much more straightforward to find out how many elements it has by saying intStash.next instead of making a function call (which has overhead), such as count(&intStash). However, if you wanted to change the internal representation of CStash and thus the way the count was calculated, the function call interface allows the necessary flexibility. But alas, most programmers won't bother to find out about your "better" design for the library. They'll look at the struct and grab the next value directly, and possibly even change next without your permission. If only there were some way for the library designer to have better control over things like this! (Yes, that's foreshadowing.)
Dynamic storage allocation
You never know the maximum amount of storage you might need for a CStash, so the memory pointed to by storage is allocated from the heap. The heap is a big block of memory used for allocating smaller pieces at runtime. You use the heap when you don't know the size of the memory you'll need while you're writing a program. That is, only at runtime will you find out that you need space to hold 200 Airplane variables instead of 20. In Standard C, dynamic-memory allocation functions include malloc( ), calloc( ), realloc( ), and free( ). Instead of library calls, however, C++ has a more sophisticated (albeit simpler to use) approach to dynamic memory that is integrated into the language via the keywords new and delete.
The inflate( ) function uses new to get a bigger chunk of space for the CStash. In this situation, we will only expand memory and not shrink it, and the assert( ) will guarantee that a negative number or zero is not passed to inflate( ) as the increase value. The new number of elements that can be held (after inflate( ) completes) is calculated as newQuantity, and this is multiplied by the number of bytes per element to produce newBytes, which will be the number of bytes in the allocation. So that we know how many bytes to copy over from the old location, oldBytes is calculated using the old quantity.
The actual storage allocation occurs in the new-expression, which is the expression involving the new keyword:
new unsigned char[newBytes];
The general form of the new-expression is:
new Type;
in which Type describes the type of variable you want allocated on the heap. In this case, we want an array of unsigned char that is newBytes long, so that is what appears as the Type. You can also allocate something as simple as an int by saying:
new int;
and although this is rarely done, you can see that the form is consistent.
A new-expression returns a pointer to an object of the exact type that you asked for. So if you say new Type, you get back a pointer to a Type. If you say new int, you get back a pointer to an int. If you want a new unsigned char array, you get back a pointer to the first element of that array. The compiler will ensure that you assign the return value of the new-expression to a pointer of the correct type.
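A short sketch of the two forms side by side may help. The function name newExpressionDemo is hypothetical; the pointer types on the left are the ones the compiler insists on.

```cpp
#include <cassert>

// Allocate one int and a small byte array, use them, and release both.
int newExpressionDemo() {
  int* one = new int;                          // new int yields int*
  unsigned char* many = new unsigned char[4];  // array form yields a pointer
                                               // to the first element
  *one = 5;
  for (int i = 0; i < 4; i++)
    many[i] = static_cast<unsigned char>(i);
  int sum = *one + many[3];  // 5 + 3
  delete one;                // single object: plain delete
  delete [] many;            // array: delete with brackets
  return sum;
}
```

Writing something like `char* wrong = new int;` would be rejected at compile time, which is the type check the paragraph above describes.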
Of
course, any time you request memory
it's possible for the
request
to fail, if there is no more memory. As
you will learn, C++
has
mechanisms that come into play if
the memory-allocation
operation
is unsuccessful.
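In Standard C++, a new-expression that cannot obtain memory throws std::bad_alloc by default, while the new (std::nothrow) form returns a null pointer instead. The sketch below only shows the shape of both checks; the function names are hypothetical, and the example library in this chapter does not use either technique.

```cpp
#include <cassert>
#include <cstddef>
#include <new>  // std::bad_alloc, std::nothrow

// Null-pointer style: ask for memory without exceptions and test the result.
bool canAllocate(std::size_t bytes) {
  unsigned char* p = new (std::nothrow) unsigned char[bytes];
  if (p == 0)
    return false;
  delete [] p;
  return true;
}

// Exception style: a failed allocation arrives as std::bad_alloc.
bool canAllocateThrowing(std::size_t bytes) {
  try {
    unsigned char* p = new unsigned char[bytes];
    delete [] p;
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```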
Once
the new storage is allocated,
the data in the old storage
must
be
copied to the new storage;
this is again accomplished with
array
indexing,
copying one byte at a time
in a loop. After the data
is
copied,
the old storage must be released so
that it can be used
by
other
parts of the program if they
need new storage. The
delete
keyword
is the complement of new,
and must be applied to release
any
storage that is allocated with
new
(if
you forget to use delete,
that
storage remains unavailable, and if
this so-called memory
leak
happens
enough, you'll run out of memory). In
addition, there's a
special
syntax when you're deleting an
array. It's as if you must
remind
the compiler that this
pointer is not just pointing to
one
object,
but to an array of objects: you put a set
of empty square
brackets
in front of the pointer to be
deleted:
delete
[]myArray;
Once the old storage has been deleted, the pointer to the new storage can be assigned to the storage pointer, the quantity is adjusted, and inflate( ) has completed its job.
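The allocate, copy, delete sequence can be sketched as a free-standing function. The names grow and growDemo are hypothetical. One detail worth noticing: the very first call to inflate( ) runs with storage equal to zero, which works because applying delete [] to a null pointer does nothing.

```cpp
#include <cassert>

// Hypothetical helper: return a new block of newBytes bytes that starts
// with the first oldBytes bytes of `old`, and release `old`.
unsigned char* grow(unsigned char* old, int oldBytes, int newBytes) {
  assert(oldBytes >= 0 && newBytes > oldBytes);
  unsigned char* b = new unsigned char[newBytes];
  for (int i = 0; i < oldBytes; i++)
    b[i] = old[i];   // Copy old to new
  delete [] old;     // Harmless when old is a null pointer
  return b;
}

// Start with no storage at all, grow twice, and check the data survived.
int growDemo() {
  unsigned char* p = grow(0, 0, 2);
  p[0] = 1;
  p[1] = 2;
  p = grow(p, 2, 4);
  int result = p[0] * 10 + p[1];
  delete [] p;
  return result;
}
```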
Note
that the heap manager is
fairly primitive. It gives you
chunks
of
memory and takes them back when you
delete
them.
There's no
inherent
facility for heap
compaction, which
compresses the heap to
provide
bigger free chunks. If a
program allocates and frees
heap
storage
for a while, you can end up with a
fragmented
heap
that has
lots
of memory free, but without any pieces that
are big enough to
allocate
the size you're looking for at
the moment. A heap
compactor
complicates a program because it
moves memory
chunks
around, so your pointers won't retain
their proper values.
Some
operating environments have
heap compaction built in, but
they
require you to use special memory
handles
(which can
be
temporarily
converted to pointers, after
locking the memory so
the
heap
compactor can't move it)
instead of pointers. You can
also
build
your own heap-compaction scheme, but this
is not a task to
be
undertaken lightly.
When
you create a variable on the
stack at compile-time,
the
storage
for that variable is automatically
created and freed by
the
compiler.
The compiler knows exactly how much
storage is needed,
and
it knows the lifetime of the
variables because of scoping.
With
dynamic
memory allocation, however, the
compiler doesn't know
how
much storage you're going to need,
and
it doesn't
know the
lifetime
of that storage. That is,
the storage doesn't get
cleaned up
automatically.
Therefore, you're responsible for
releasing the
storage
using delete,
which tells the heap manager
that storage can
be
used by the next call to
new.
The logical place for this to happen in the library is in the cleanup( ) function because that is where all the closing-up housekeeping is done.
To
test the library, two
CStashes
are created. The first
holds ints
and
the second holds arrays of
80 chars:
//: C04:CLibTest.cpp
//{L} CLib
// Test the C-like library
#include "CLib.h"
#include <fstream>
#include <iostream>
#include <string>
#include <cassert>
using namespace std;

int main() {
  // Define variables at the beginning
  // of the block, as in C:
  CStash intStash, stringStash;
  int i;
  char* cp;
  ifstream in;
  string line;
  const int bufsize = 80;
  // Now remember to initialize the variables:
  initialize(&intStash, sizeof(int));
  for(i = 0; i < 100; i++)
    add(&intStash, &i);
  for(i = 0; i < count(&intStash); i++)
    cout << "fetch(&intStash, " << i << ") = "
         << *(int*)fetch(&intStash, i)
         << endl;
  // Holds 80-character strings:
  initialize(&stringStash, sizeof(char)*bufsize);
  in.open("CLibTest.cpp");
  assert(in);
  while(getline(in, line))
    add(&stringStash, line.c_str());
  i = 0;
  while((cp = (char*)fetch(&stringStash,i++))!=0)
    cout << "fetch(&stringStash, " << i << ") = "
         << cp << endl;
  cleanup(&intStash);
  cleanup(&stringStash);
}
///:~
Following the form required by C, all the variables are created at the beginning of the scope of main( ). Of course, you must remember to initialize the CStash variables later in the block by calling initialize( ). One of the problems with C libraries is that you must carefully convey to the user the importance of the initialization and cleanup functions. If these functions aren't called, there will be a lot of trouble. Unfortunately, the user doesn't always wonder if initialization and cleanup are mandatory. They know what they want to accomplish, and they're not as concerned about you jumping up and down saying, "Hey, wait, you have to do this first!" Some users have even been known to initialize the elements of a structure themselves. There's certainly no mechanism in C to prevent it (more foreshadowing).
The intStash is filled up with integers, and the stringStash is filled with character arrays. These character arrays are produced by opening the source code file, CLibTest.cpp, and reading the lines from it into a string called line, and then producing a pointer to the character representation of line using the member function c_str( ).
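One caveat about this use of c_str( ): add( ) always copies a full element (80 bytes here), while the character array returned by c_str( ) extends only as far as the terminating null of a possibly much shorter line, and a line longer than 79 characters would be stored without a terminator. A cautious variation, sketched below with hypothetical helper names that are not part of the original program, first moves each line into a zero-filled buffer of exactly bufsize bytes.

```cpp
#include <cassert>
#include <cstring>
#include <string>

const int bufsize = 80;  // same element size the test program uses

// Hypothetical helper (not in the original library): copy `line` into a
// zero-filled buffer of exactly bufsize bytes, truncating long lines so
// the result is always null-terminated.
void toFixedBuffer(const std::string& line, char (&buf)[bufsize]) {
  std::memset(buf, 0, bufsize);
  std::strncpy(buf, line.c_str(), bufsize - 1);
}

// Length of the text that ends up in the fixed-size buffer.
int storedLength(const std::string& line) {
  char buf[bufsize];
  toFixedBuffer(line, buf);
  return static_cast<int>(std::strlen(buf));
}
```

With such a buffer in hand, a call like add(&stringStash, buf) would copy 80 bytes that all belong to buf.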
After each CStash is loaded, it is displayed. The intStash is printed using a for loop, which uses count( ) to establish its limit. The stringStash is printed with a while, which breaks out when fetch( ) returns zero to indicate it is out of bounds.
You'll also notice an additional cast in
cp = (char*)fetch(&stringStash,i++)
This is due to the stricter type checking in C++, which does not allow you to simply assign a void* to any other type (C allows this).
Bad guesses
There
is one more important issue
you should understand
before
we
look at the general problems
in creating a C library. Note
that
the
CLib.h
header
file must
be
included in any file that
refers to
CStash
because
the compiler can't even
guess at what that
structure
looks like. However, it can
guess at
what a function looks
like;
this sounds like a feature
but it turns out to be a major C
pitfall.
Although
you should always declare
functions by including a
header
file, function declarations
aren't essential in C. It's
possible
in
C (but not
in C++) to
call a function that you
haven't declared. A
good
compiler will warn you that you probably
ought to declare a
function
first, but it isn't enforced by
the C language standard.
This
is
a dangerous practice, because
the C compiler can assume
that a
function
that you call with an int
argument
has an argument list
containing
int,
even if it may actually contain a
float.
This can
produce
bugs that are very difficult
to find, as you will see.
Each
separate C implementation file (with an
extension of .c)
is a
translation
unit. That is,
the compiler is run separately on
each
translation
unit, and when it is running it is aware of only that
unit.
Thus,
any information you provide by including
header files is
quite
important because it determines
the compiler's
understanding of the rest of your program. Declarations in header files are particularly important, because everywhere the header is included, the compiler will know exactly what to do. If, for example, you have a declaration in a header file that says void func(float), the compiler knows that if you call that function with an integer argument, it should convert the int to a float as it passes the argument (this is called promotion). Without the declaration, the C compiler would simply assume that a function func(int) existed, it wouldn't do the promotion, and the wrong data would quietly be passed into func( ).
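The effect of a visible declaration can be sketched in C++, where the declaration is mandatory anyway. The function half is hypothetical and stands in for func(float); the point is only that the caller's int is converted because the compiler has seen the parameter type.

```cpp
#include <cassert>

// Declaration, as it would appear in a header file.
float half(float x);

// The argument is an int, but the declaration tells the compiler to
// convert it to float before the call.
float callWithInt() {
  int i = 3;
  return half(i);  // half receives 3.0f
}

// Definition, as it would appear in some other implementation file.
float half(float x) {
  return x / 2;
}
```

Without the declaration, a C++ compiler refuses to compile the call at all, whereas the C behavior described above would guess the signature and pass the raw int.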
For each translation unit, the compiler creates an object file, with an extension of .o or .obj or something similar. These object files, along with the necessary start-up code, must be collected by the linker into the executable program. During linking, all the external references must be resolved. For example, in CLibTest.cpp, functions such as initialize( ) and fetch( ) are declared (that is, the compiler is told what they look like) and used, but not defined. They are defined elsewhere, in CLib.cpp. Thus, the calls in CLibTest.cpp are external references. The linker must, when it puts all the object files together, take the unresolved external references and find the addresses they actually refer to. Those addresses are put into the executable program to replace the external references.
It's important to realize that in C, the external references that the linker searches for are simply function names, generally with an underscore in front of them. So all the linker has to do is match up the function name where it is called and the function body in the object file, and it's done. If you accidentally made a call that the compiler interpreted as func(int) and there's a function body for func(float) in some other object file, the linker will see _func in one place and _func in another, and it will think everything's OK. The func( ) at the calling location will push an int onto the stack, and the func( ) function body will expect a float to be on the stack. If the function only reads the value and doesn't write to it, it won't blow up the stack. In fact, the float value it reads off the stack might even make some kind of sense. That's worse because it's harder to find the bug.
What's wrong?
We
are remarkably adaptable,
even in situations in which
perhaps
we
shouldn't
adapt.
The style of the CStash
library
has been a staple
for
C programmers, but if you look at it for a
while, you might
notice
that it's rather . . . awkward. When you
use it, you have to
pass
the address of the structure
to every single function in
the
library.
When reading the code, the
mechanism of the library
gets
mixed
with the meaning of the
function calls, which is
confusing
when
you're trying to understand what's
going on.
One
of the biggest obstacles,
however, to using libraries in C is
the
problem
of name
clashes. C has a
single name space for
functions;
that
is, when the linker looks
for a function name, it looks in
a
single
master list. In addition, when
the compiler is working on a
translation
unit, it can work only with a single
function with a
given
name.
Now suppose you decide to buy two libraries from two different vendors, and each library has a structure that must be initialized and cleaned up. Both vendors decided that initialize( ) and cleanup( ) are good names. If you include both their header files in a single translation unit, what does the C compiler do?
Fortunately,
C
gives you an error, telling you
there's a type mismatch in the
two
different
argument lists of the
declared functions. But even if
you
don't
include them in the same
translation unit, the linker will
still
have
problems. A good linker will
detect that there's a name
clash,
but
some linkers take the
first function name they
find, by
searching
through the list of object
files in the order you give
them
in
the link list. (This can
even be thought of as a feature because
it
allows
you to replace a library function with
your own version.)
In either event, you can't use two C libraries that contain a function with the identical name. To solve this problem, C library vendors will often prepend a sequence of unique characters to the beginning of all their function names. So initialize( ) and cleanup( ) might become CStash_initialize( ) and CStash_cleanup( ). This is a logical thing to do because it "decorates" the name of the function with the name of the struct it works on.
Now
it's time to take the
first step toward creating classes in
C++.
Variable
names inside a struct
do
not clash with global
variable
names.
So why not take advantage of this for
function names, when
those
functions operate on a particular
struct?
That is, why not
make
functions members of structs?
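A minimal sketch of that idea follows. VendorA and VendorB are hypothetical structs invented for illustration; each has its own initialize( ), and the two never clash because each name lives inside its own struct.

```cpp
#include <cassert>

// Two unrelated structs, as if they came from two different libraries.
struct VendorA {
  int state;
  void initialize() { state = 1; }
};

struct VendorB {
  int state;
  void initialize() { state = 2; }
};

// Both can be used in the same translation unit without a name clash.
int bothInitialized() {
  VendorA a;
  VendorB b;
  a.initialize();  // VendorA's version
  b.initialize();  // VendorB's version
  return a.state * 10 + b.state;
}
```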
The basic object
Step
one is exactly that. C++
functions can be placed
inside structs
as
"member functions." Here's what it
looks like after
converting
the
C version of CStash
to
the C++ Stash:
//: C04:CppLib.h
// C-like library converted to C++

struct Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  // Functions!
  void initialize(int size);
  void cleanup();
  int add(const void* element);
  void* fetch(int index);
  int count();
  void inflate(int increase);
};
///:~
First,
notice there is no typedef.
Instead of requiring you to create
a
typedef,
the C++ compiler turns the
name of the structure into
a
new
type name for the program
(just as int,
char,
float
and
double
are
type names).
All
the data members are
exactly the same as before,
but now the
functions
are inside the body of the
struct.
In addition, notice
that
the
first argument from the C
version of the library has
been
removed.
In C++, instead of forcing you to
pass the address of
the
structure
as the first argument to all
the functions that operate
on
that
structure, the compiler
secretly does this for you. Now
the only
arguments
for the functions are
concerned with what the
function
does, not the
mechanism of the function's
operation.
It's
important to realize that
the function code is
effectively the
same
as it was with the C version of
the library. The number
of
arguments
is the same (even though you don't
see the structure
address
being passed in, it's still
there), and there's only
one
function
body for each function. That
is, just because you
say
Stash
A, B, C;
doesn't
mean you get a different
add(
) function
for each variable.
So the code that's generated is almost identical to what you would have written for the C version of the library. Interestingly enough, this includes the "name decoration" you probably would have done to produce Stash_initialize( ), Stash_cleanup( ), and so on. When the function name is inside the struct, the compiler effectively does the same thing. Therefore, initialize( ) inside the structure Stash will not collide with a function named initialize( ) inside any other structure, or even a global function named initialize( ). Most of the time you don't have to worry about the function name decoration; you use the undecorated name. But sometimes you do need to be able to specify that this initialize( ) belongs to the struct Stash, and not to any other struct. In particular, when you're defining the function you need to fully specify which one it is. To accomplish this full specification, C++ has an operator (::) called the scope resolution operator (named so because names can now be in different scopes: at global scope or within the scope of a struct). For example, if you want to specify initialize( ), which belongs to Stash, you say Stash::initialize(int size). You can see how the scope resolution operator is used in the function definitions:
//: C04:CppLib.cpp {O}
// C library converted to C++
// Declare structure and functions:
#include "CppLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;

void Stash::initialize(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}

int Stash::add(const void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}

void* Stash::fetch(int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}

int Stash::count() {
  return next; // Number of elements in Stash
}

void Stash::inflate(int increase) {
  assert(increase > 0);
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete []storage; // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}

void Stash::cleanup() {
  if(storage != 0) {
    cout << "freeing storage" << endl;
    delete []storage;
  }
}
///:~
There
are several other things
that are different between C
and
C++.
First, the declarations in
the header files are
required
by
the
compiler.
In C++ you cannot call a function without
declaring it
first.
The compiler will issue an
error message otherwise.
This is an
important
way to ensure that function
calls are consistent
between
the
point where they are called
and the point where they
are
defined.
By forcing you to declare the
function before you call
it,
the
C++ compiler virtually ensures that you
will perform this
declaration
by including the header
file. If you also include
the
same
header file in the place
where the functions are
defined, then
the
compiler checks to make sure
that the declaration in the
header
and
the function definition
match up. This means that
the header
file
becomes a validated repository for
function declarations and
ensures
that functions are used
consistently throughout all
translation
units in the project.
Of
course, global functions can
still be declared by hand
every
place
where they are defined and
used. (This is so tedious
that it
becomes
very unlikely.) However, structures must
always be
declared
before they are defined or
used, and the most
convenient
place
to put a structure definition is in a
header file, except
for
those
you intentionally hide in a
file.
You
can see that all the
member functions look almost
the same as
when
they were C functions, except for
the scope resolution
and
the
fact that the first
argument from the C version of
the library is
no
longer explicit. It's still
there, of course, because
the function has
to
be able to work on a particular struct
variable.
But notice, inside
the
member function, that the
member selection is also
gone! Thus,
instead of saying s->size = sz; you say size = sz; and eliminate the tedious s->, which didn't really add anything to
the meaning of
what
you were doing anyway. The C++
compiler is apparently
doing
this for you. Indeed, it is taking
the "secret" first
argument
(the
address of the structure
that we were previously
passing in by
hand)
and applying the member
selector whenever you refer to
one
of
the data members of a
struct.
This means that whenever you
are
inside
the member function of
another struct,
you can refer to any
member
(including another member
function) by simply giving
its
name.
The compiler will search through
the local structure's
names
before
looking for a global version of
that name. You'll find
that
this
feature means that not only is your
code easier to write, it's
a
lot
easier to read.
But
what if, for some reason, you
want
to be able
to get your hands
on
the address of the
structure? In the C version of
the library it
was
easy because each function's
first argument was a
CStash*
called
s.
In C++, things are even
more consistent. There's a
special
keyword,
called this,
which produces the address of
the struct.
It's the equivalent of the 's' in the C version of the library. So we can revert to the C style of things by saying
this->size = sz;
The
code generated by the
compiler is exactly the
same, so you
don't
need to use this
in
such a fashion; occasionally, you'll
see
code
where people explicitly use
this->
everywhere
but it doesn't
add
anything to the meaning of
the code and often indicates
an
inexperienced
programmer. Usually, you don't use
this
often,
but
when
you need it, it's there
(some of the examples later
in the book
will
use this).
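A small sketch can confirm what this holds. Probe is a hypothetical struct used only for this illustration; its self( ) member simply hands back this so it can be compared with the address taken from outside.

```cpp
#include <cassert>

struct Probe {
  int size;
  // Explicit this-> is allowed but means the same as plain: size = sz;
  void initialize(int sz) { this->size = sz; }
  // Expose the hidden first argument so it can be inspected.
  Probe* self() { return this; }
};

// Inside a member function, this is the address of the object the
// function was called for.
bool thisIsObjectAddress() {
  Probe p;
  p.initialize(4);
  return p.self() == &p && p.size == 4;
}
```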
There's
one last item to mention. In
C, you could assign a void*
to
any
other pointer like
this:
int i = 10;
void* vp = &i; // OK in both C and C++
int* ip = vp; // Only acceptable in C
and
there was no complaint from
the compiler. But in C++,
this
statement
is not allowed. Why? Because C is not so
particular about
type
information, so it allows you to assign a
pointer with an
unspecified
type to a pointer with a specified type.
Not so with
C++.
Type is critical in C++, and the
compiler stamps its foot
when
there
are any violations of type information.
This has always
been
important,
but it is especially important in C++
because you have
member
functions in structs.
If you could pass pointers to
structs
around
with impunity in C++, then you could end
up calling a
member
function for a struct
that
doesn't even logically exist
for
that
struct!
A real recipe for disaster.
Therefore, while C++ allows
the
assignment of any type of pointer to a
void*
(this
was the
original
intent of void*,
which is required to be large enough
to
hold
a pointer to any type), it will not
allow you to
assign a void
pointer
to any other type of pointer. A cast is
always required to
tell
the
reader and the compiler that
you really do want to treat it as
the
destination
type.
This
brings up an interesting issue. One of
the important goals
for
C++
is to compile as much existing C code as
possible to allow for
an
easy transition to the new
language. However, this doesn't
mean
any
code that C allows will
automatically be allowed in C++.
There
are
a number of things the C compiler
lets you get away with
that
are
dangerous and error-prone. (We'll
look at them as the
book
progresses.)
The C++ compiler generates
warnings and errors for
these
situations. This is often much
more of an advantage than a
hindrance.
In fact, there are many
situations in which you are
trying
to run down an error in C and just can't
find it, but as soon
as
you recompile the program in
C++, the compiler points out
the
problem!
In C, you'll often find that you can
get the program to
compile,
but then you have to get it to work. In
C++, when the
program
compiles correctly, it often
works, too! This is because
the
language
is a lot stricter about
type.
You
can see a number of new things in
the way the C++ version
of
Stash
is
used in the following test
program:
//: C04:CppLibTest.cpp
//{L} CppLib
// Test of C++ library
#include "CppLib.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
  Stash intStash;
  intStash.initialize(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  // Holds 80-character strings:
  Stash stringStash;
  const int bufsize = 80;
  stringStash.initialize(sizeof(char) * bufsize);
  ifstream in("CppLibTest.cpp");
  assure(in, "CppLibTest.cpp");
  string line;
  while(getline(in, line))
    stringStash.add(line.c_str());
  int k = 0;
  char* cp;
  while((cp =(char*)stringStash.fetch(k++)) != 0)
    cout << "stringStash.fetch(" << k << ") = "
         << cp << endl;
  intStash.cleanup();
  stringStash.cleanup();
} ///:~
One
thing you'll notice is that the
variables are all defined "on
the
fly"
(as introduced in the
previous chapter). That is, they
are
defined
at any point in the scope,
rather than being restricted
as
in
C to the beginning of the
scope.
The
code is quite similar to CLibTest.cpp, but when a member
function is called, the call occurs using the member selection
operator `.' preceded by the name of the
variable. This is a
convenient
syntax because it mimics the
selection of a data
member
of
the structure. The
difference is that this is a
function member, so
it
has an argument list.
Of
course, the call that
the compiler actually
generates
looks much
more
like the original C library
function. Thus, considering
name
decoration
and the passing of this,
the C++ function call
intStash.initialize(sizeof(int))
becomes something like
Stash_initialize(&intStash, sizeof(int)). If you ever wonder
what's
going on underneath the
covers, remember that the
original
C++
compiler cfront
from
AT&T produced C code as its
output,
which
was then compiled by the underlying C
compiler. This
approach
meant that cfront
could
be quickly ported to any
machine
that
had a C compiler, and it helped to
rapidly disseminate C++
compiler
technology. But because the C++
compiler had to generate
C,
you know that there must be some way to
represent C++ syntax
in
C (some compilers still allow you to
produce C code).
There's one other change from CLibTest.cpp, which is the
introduction of the require.h header file. This is a header file
that I created for this book to perform more sophisticated error
checking than that provided by assert( ). It contains several
functions, including the one used here called assure( ), which
is used for files.
This
function checks to see if
the file has successfully
been opened,
and
if not it reports to standard error
that the file could not
be
opened
(thus it needs the name of
the file as the second
argument)
and
exits the program. The
require.h functions
will be used
throughout
the book, in particular to
ensure that there are
the right
number
of command-line arguments and that
files are opened
properly.
The require.h functions
replace repetitive and
distracting
error-checking
code, and yet they provide essentially
useful error
messages.
These functions will be fully explained
later in the book.
What's an object?
Now
that you've seen an initial
example, it's time to step
back and
take
a look at some terminology.
The act of bringing
functions
inside
structures is the root of what C++
adds to C, and it
introduces
a new way of thinking about structures: as
concepts. In
C,
a struct
is
an agglomeration of data, a way to
package data so
you
can treat it in a clump. But
it's hard to think about it as
anything
but a programming convenience. The
functions that
operate
on those structures are
elsewhere. However, with functions
in
the package, the structure
becomes a new creature, capable
of
describing
both characteristics (like a C
struct
does)
and
behaviors.
The
concept of an object, a free-standing,
bounded entity that
can
remember
and
act,
suggests itself.
In
C++, an object is just a
variable, and the purest
definition is "a
region
of storage" (this is a more
specific way of saying, "an
object
must
have a unique identifier," which in
the case of C++ is a
unique
memory address). It's a place where you
can store data, and
it's
implied that there are
also operations that can be
performed on
this
data.
Unfortunately,
there's not complete consistency
across languages
when
it comes to these terms,
although they are fairly well-
accepted.
You will also sometimes encounter
disagreement about
what
an object-oriented language is,
although that seems to
be
reasonably
well sorted out by now. There are
languages that are
object-based, which
means that they have objects
like the C++
structures-with-functions
that you've seen so far.
This, however, is
only
part of the picture when it
comes to an object-oriented
language,
and languages that stop at
packaging functions
inside
data
structures are object-based, not
object-oriented.
Abstract data typing
The ability to package data with functions allows you to create
a new data type. This is often called encapsulation [1]. An existing
data type may have several pieces of data packaged together.
For
example,
a float
has
an exponent, a mantissa, and a sign
bit. You
can
tell it to do things: add to
another float
or
to an int,
and so on.
It
has characteristics and
behavior.
The definition of Stash creates a new data type. You can add( ),
fetch( ), and inflate( ). You create one by saying Stash s, just as
you create a float by saying float f.
A Stash
also
has characteristics and
behavior.
Even though it acts like a
real, built-in data type, we
refer
to
it as an abstract
data type, perhaps
because it allows us to
abstract
a
concept from the problem
space into the solution
space. In
addition,
the C++ compiler treats it
like a new data type, and if
you
say
a function expects a Stash, the compiler makes sure you pass a
Stash to that function. So the same level of type checking happens
with abstract data types (sometimes called user-defined types) as
with built-in types.

[1] This term can cause debate. Some people use it as defined here;
others use it to describe access control, discussed in the following
chapter.
You
can immediately see a
difference, however, in the way
you
perform
operations on objects. You say
object.memberFunction(arglist). This is "calling a member function
for an object." But in object-oriented parlance, this is also referred
to as "sending a message to an object." So for a Stash s, the
statement s.add(&i) "sends a message to s" saying, "add( ) this to
yourself."
In fact, object-oriented programming
can be summed up
in
a single phrase: sending
messages to objects. Really,
that's all you
do: create a bunch of objects and send
messages to them. The
trick,
of
course, is figuring out what your objects
and messages are, but
once
you accomplish this the
implementation in C++ is
surprisingly
straightforward.
Object details
A
question that often comes up
in seminars is, "How big is
an
object,
and what does it look like?"
The answer is "about what
you
expect
from a C struct."
In fact, the code the C
compiler produces
for
a C struct
(with
no C++ adornments) will usually look
exactly
the
same as the code produced by
a C++ compiler. This is
reassuring
to those C programmers who depend on
the details of
size
and layout in their code, and for some
reason directly
access
structure
bytes instead of using
identifiers (relying on a
particular
size
and layout for a structure is a nonportable
activity).
The
size of a struct
is
the combined size of all of
its members.
Sometimes
when the compiler lays out a
struct,
it adds extra bytes
to
make the boundaries come out
neatly; this may
increase
execution
efficiency. In Chapter 15, you'll
see how in some cases
"secret"
pointers are added to the
structure, but you don't need to
worry
about that right now.
You
can determine the size of a
struct
using
the sizeof
operator.
Here's
a small example:
//: C04:Sizeof.cpp
// Sizes of structs
#include "CLib.h"
#include "CppLib.h"
#include <iostream>
using namespace std;

struct A {
  int i[100];
};

struct B {
  void f();
};

void B::f() {}

int main() {
  cout << "sizeof struct A = " << sizeof(A)
       << " bytes" << endl;
  cout << "sizeof struct B = " << sizeof(B)
       << " bytes" << endl;
  cout << "sizeof CStash in C = "
       << sizeof(CStash) << " bytes" << endl;
  cout << "sizeof Stash in C++ = "
       << sizeof(Stash) << " bytes" << endl;
} ///:~
On
my machine (your results may vary) the
first print statement
produces
200 because each int
occupies
two bytes. struct
B is
something
of an anomaly because it is a struct
with
no data
members.
In C, this is illegal, but in C++ we need
the option of
creating
a struct
whose
sole task is to scope
function names, so it is
allowed.
Still, the result produced
by the second print statement
is
a
somewhat surprising nonzero
value. In early versions of
the
language,
the size was zero, but an
awkward situation arises when
you
create such objects: They
have the same address as
the object
created
directly after them, and so
are not distinct. One of
the
fundamental
rules of objects is that
each object must have a
unique
address,
so structures with no data members will
always have
some
minimum nonzero size.
The
last two sizeof
statements
show you that the size of
the
structure
in C++ is the same as the
size of the equivalent
version in
C.
C++ tries not to add any unnecessary
overhead.
Header file etiquette
When
you create a struct
containing
member functions, you
are
creating
a new data type. In general, you want
this type to be easily
accessible
to yourself and others. In addition, you
want to separate
the
interface (the declaration) from
the implementation
(the
definition
of the member functions) so
the implementation can
be
changed
without forcing a re-compile of the
entire system. You
achieve
this end by putting the
declaration for your new type in a
header
file.
When
I first learned to program in C,
the header file was a
mystery
to
me. Many C books don't seem to
emphasize it, and the
compiler
didn't
enforce function declarations, so it
seemed optional most
of
the
time, except when structures
were declared. In C++ the
use of
header
files becomes crystal clear.
They are virtually mandatory for
easy
program development, and you put very
specific information
in
them: declarations. The
header file tells the
compiler what is
available
in your library. You can use
the library even if you
only
possess
the header file along with
the object file or library
file; you
don't
need the source code for
the cpp
file.
The header file is
where
the
interface specification is
stored.
Although
it is not enforced by the compiler,
the best approach to
building
large projects in C is to use
libraries; collect
associated
functions
into the same object module
or library, and use a
header
file
to hold all the declarations for the
functions. It is de
rigueur in
C++;
you could throw any function into a C
library, but the C++
abstract
data type determines the
functions that are
associated by
dint
of their common access to
the data in a struct.
Any member
function
must be declared in the struct
declaration;
you cannot put
it
elsewhere. The use of
function libraries was
encouraged in C and
institutionalized
in C++.
Importance of header files
When
using a function from a library, C
allows you the option
of
ignoring
the header file and simply
declaring the function by
hand.
In
the past, people would
sometimes do this to speed up
the
compiler
just a bit by avoiding the
task of opening and
including
the
file (this is usually not an issue with
modern compilers).
For example, here's an extremely lazy declaration of the C function
printf( ) (from <stdio.h>):
printf(...);
The ellipses specify a variable argument list [2], which says:
printf( ) has some arguments, each of which has a type, but ignore that.
Just
take
whatever arguments you see and
accept them. By using
this
kind
of declaration, you suspend all error
checking on the
arguments.
This
practice can cause subtle
problems. If you declare functions
by
hand,
in one file you may make a
mistake. Since the compiler
sees
only
your hand-declaration in that file, it
may be able to adapt to
your
mistake. The program will then link
correctly, but the use
of
the
function in that one file
will be faulty. This is a tough error
to
find,
and is easily avoided by using a
header file.
If
you place all your function declarations
in a header file, and
include
that header everywhere you
use the function and
where
you
define the function, you
ensure a consistent declaration
across the whole system. You also ensure that the declaration and
the definition match by including the header in the definition
file.

[2] To write a function definition for a function that takes a true
variable argument list, you must use varargs, although these should be
avoided in C++. You can find details about the use of varargs in your
C manual.
If
a struct
is
declared in a header file in
C++, you must
include
the
header
file everywhere a struct
is
used and where struct
member
functions
are defined. The C++
compiler will give an error
message
if
you try to call a regular function, or to
call or define a
member
function,
without declaring it first. By enforcing
the proper use of
header
files, the language ensures
consistency in libraries, and
reduces
bugs by forcing the same
interface to be used
everywhere.
The
header is a contract between you and
the user of your
library.
The
contract describes your data
structures, and states
the
arguments
and return values for the function
calls. It says,
"Here's
what
my library does." The user
needs some of this
information to
develop
the application and the
compiler needs all of it to
generate
proper
code. The user of the
struct
simply
includes the header
file,
creates
objects (instances) of that
struct,
and links in the
object
module
or library (i.e.: the
compiled code).
The
compiler enforces the
contract by requiring you to declare
all
structures
and functions before they are
used and, in the case
of
member
functions, before they are
defined. Thus, you're forced
to
put
the declarations in the
header and to include the
header in the
file
where the member functions
are defined and the file(s)
where
they
are used. Because a single
header file describing your
library is
included
throughout the system, the
compiler can ensure
consistency
and prevent errors.
There
are certain issues that you
must be aware of in order to
organize
your code properly and write effective
header files. The
first
issue concerns what you can put into
header files. The
basic
rule
is "only declarations," that is, only
information to the
compiler
but
nothing that allocates
storage by generating code or
creating
variables.
This is because the header
file will typically be
included
in
several translation units in a
project, and if storage for
one
identifier
is allocated in more than one
place, the linker will
come
up
with a multiple definition error
(this is C++'s one
definition rule:
You
can declare things as many
times as you want, but there can
be
only
one actual definition for
each thing).
This rule isn't completely hard and fast. If you define a variable
that is "file static" (has visibility only within a file) inside a
header file, there will be multiple instances of that data across the
project, but the linker won't have a collision [3]. Basically, you
don't want to do anything in the header file that will cause an
ambiguity at link time.
The multiple-declaration problem
The
second header-file issue is
this: when you put a struct
declaration
in a header file, it is possible for
the file to be
included
more
than once in a complicated program.
Iostreams are a good
example.
Any time a struct
does
I/O it may include one of
the
iostream
headers. If the cpp
file
you are working on uses more
than
one
kind of struct
(typically
including a header file for
each one),
you
run the risk of including
the <iostream> header
more than
once
and re-declaring iostreams.
The
compiler considers the
redeclaration of a structure
(this
includes
both structs
and classes)
to be an error, since it would
otherwise
allow you to use the same
name for different types.
To
prevent
this error when multiple header
files are included,
you
need
to build some intelligence into your
header files using
the
preprocessor
(Standard C++ header files
like <iostream> already
have
this "intelligence").
Both
C and C++ allow you to redeclare a function, as long
as the
two
declarations match, but neither will
allow the redeclaration of a
structure.
In C++ this rule is especially important because if the compiler
allowed you to redeclare a structure and the two declarations
differed, which one would it use?

[3] However, in Standard C++ (C++98) file static is a deprecated
feature; C++11 later removed the deprecation.
The
problem of redeclaration comes up
quite a bit in C++
because
each
data type (structure with functions)
generally has its own
header
file, and you have to include
one header in another if
you
want
to create another data type
that uses the first
one. In any cpp
file
in your project, it's likely
that you'll include several
files that
include
the same header file. During
a single compilation,
the
compiler
can see the same
header file several times.
Unless you do
something
about it, the compiler will
see the redeclaration of
your
structure
and report a compile-time error. To
solve the problem,
you
need to know a bit more
about the
preprocessor.
The preprocessor directives #define, #ifdef, and #endif
The
preprocessor directive #define
can
be used to create
compile-
time
flags. You have two choices: you
can simply tell the
preprocessor
that the flag is defined,
without specifying a value:
#define FLAG
or
you can give it a value (which is
the typical C way to define
a
constant):
#define PI 3.14159
In
either case, the label can
now be tested by the preprocessor to
see
if
it has been defined:
#ifdef FLAG
This
will yield a true result, and the
code following the #ifdef
will
be
included in the package sent
to the compiler. This
inclusion
stops
when the preprocessor encounters
the statement
#endif
or
#endif // FLAG
Any
non-comment after the
#endif
on
the same line is illegal,
even
though
some compilers may accept
it. The #ifdef/#endif
pairs
may
be nested within each
other.
The
complement of #define
is
#undef
(short
for "un-define"),
which
will make an #ifdef
statement
using the same variable
yield
a
false result. #undef
will
also cause the preprocessor
to stop using
a
macro. The complement of
#ifdef
is
#ifndef,
which will yield a
true
if the label has not been
defined (this is the one we
will use in
header
files).
There
are other useful features in
the C preprocessor. You
should
check
your local documentation for the full
set.
A standard for header files
In
each header file that
contains a structure, you should
first check
to
see if this header has
already been included in
this particular cpp
file.
You do this by testing a preprocessor
flag. If the flag isn't
set,
the
file wasn't included and you
should set the flag
(so the
structure
can't get re-declared) and
declare the structure. If
the flag
was
set then that type has
already been declared so you
should just
ignore
the code that declares
it. Here's how the header
file should
look:
#ifndef HEADER_FLAG
#define HEADER_FLAG
// Type declaration here...
#endif // HEADER_FLAG
As
you can see, the first
time the header file is
included, the
contents
of the header file
(including your type declaration) will
be
included
by the preprocessor. All the
subsequent times it is
included
in a single compilation unit
the type declaration
will
be
ignored. The name
HEADER_FLAG can be any unique
name,
but
a reliable standard to follow is to
capitalize the name of
the
header
file and replace periods with
underscores (leading
underscores,
however, are reserved for
system names). Here's
an
example:
//: C04:Simple.h
// Simple header that prevents re-definition
#ifndef SIMPLE_H
#define SIMPLE_H

struct Simple {
  int i,j,k;
  void initialize() { i = j = k = 0; }
};
#endif // SIMPLE_H ///:~
Although
the SIMPLE_H after
the #endif
is
commented out and
thus
ignored by the preprocessor, it is
useful for documentation.
These
preprocessor statements that
prevent multiple inclusion
are
often
referred to as include
guards.
Namespaces in headers
You'll
notice that using
directives are
present in nearly all the
cpp
files
in this book, usually in the
form:
using
namespace std;
Since
std
is
the namespace that surrounds
the entire Standard
C++
library,
this particular using
directive allows the names
in the
Standard
C++ library to be used without
qualification. However,
you'll
virtually never see a using
directive in a header file
(at least,
not
outside of a scope). The
reason is that the using
directive
eliminates
the protection of that
particular namespace, and
the
effect
lasts until the end of the
current compilation unit. If you
put
a
using directive (outside of a
scope) in a header file, it
means that
this
loss of "namespace protection" will
occur with any file
that
includes
this header, which often
means other header files.
Thus, if
you
start putting using directives in
header files, it's very easy
to
end
up "turning off" namespaces practically
everywhere, and
thereby
neutralizing the beneficial
effects of namespaces.
In
short: don't put using directives in
header files.
Using headers in projects
When
building a project in C++, you'll usually
create it by bringing
together
a lot of different types (data
structures with associated
functions).
You'll usually put the declaration for
each type or group
of
associated types in a separate
header file, then define
the
functions
for that type in a translation unit. When you
use that
type,
you must include the header
file to perform the
declarations
properly.
Sometimes
that pattern will be followed in
this book, but more
often
the examples will be very small, so
everything (the structure declarations, function definitions, and the
main( ) function) may appear
in a single file. However, keep in mind
that you'll want to
use
separate files and header
files in practice.
Nested structures
The
convenience of taking data and
function names out of
the
global
name space extends to
structures. You can nest a
structure
within
another structure, and therefore
keep associated
elements
together.
The declaration syntax is what you would
expect, as you
can
see in the following
structure, which implements a
push-down
stack
as a simple linked list so it "never"
runs out of memory:
//: C04:Stack.h
// Nested struct in linked list
#ifndef STACK_H
#define STACK_H

struct Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK_H ///:~
The
nested struct
is
called Link,
and it contains a pointer to
the
next
Link
in
the list and a pointer to
the data stored in the
Link.
If
the
next
pointer
is zero, it means you're at
the end of the
list.
Notice
that the head
pointer
is defined right after the
declaration
for
struct Link instead of as a separate definition (Link* head;). This is
a
syntax that came from C, but it
emphasizes the importance of
the
semicolon
after the structure
declaration; the semicolon
indicates
the
end of the comma-separated
list of definitions of that
structure
type.
(Usually the list is
empty.)
The nested structure has its own initialize( ) function, like all the
structures presented so far, to ensure proper initialization. Stack
has both an initialize( ) and a cleanup( ) function, as well as
push( ), which takes a pointer to the data you wish to store (it
assumes this has been allocated on the heap), and pop( ), which
returns the data pointer from the top of the Stack and removes the
top element. (When you pop( ) an element, you are responsible for
destroying the object pointed to by the data.) The peek( ) function
also returns the data pointer from the top element, but it leaves the
top element on the Stack.
Here
are the definitions for the
member functions:
//: C04:Stack.cpp {O}
// Linked list with nesting
#include "Stack.h"
#include "../require.h"
using namespace std;

void
Stack::Link::initialize(void* dat, Link* nxt) {
  data = dat;
  next = nxt;
}

void Stack::initialize() { head = 0; }

void Stack::push(void* dat) {
  Link* newLink = new Link;
  newLink->initialize(dat, head);
  head = newLink;
}

void* Stack::peek() {
  require(head != 0, "Stack empty");
  return head->data;
}

void* Stack::pop() {
  if(head == 0) return 0;
  void* result = head->data;
  Link* oldHead = head;
  head = head->next;
  delete oldHead;
  return result;
}

void Stack::cleanup() {
  require(head == 0, "Stack not empty");
} ///:~
The
first definition is particularly
interesting because it shows
you
how
to define a member of a nested
structure. You simply use an
additional
level of scope resolution to
specify the name of
the
enclosing
struct.
Stack::Link::initialize( ) takes the arguments and
assigns them to its members.
Stack::initialize(
)
sets
head
to
zero, so the object knows it
has an
empty
list.
Stack::push( ) takes the argument, which is a pointer to the variable
you want to keep track of, and pushes it on the Stack. First, it uses
new to allocate storage for the Link it will insert at the top. Then
it calls Link's initialize( ) function to assign the appropriate
values to the members of the Link. Notice that the next pointer is
assigned to the current head; then head is assigned to the new Link
pointer. This effectively pushes the Link in at the top of the list.
Stack::pop( ) captures the data pointer at the current top of the
Stack; then it moves the head pointer down and deletes the old top of
the Stack, finally returning the captured pointer. When pop( )
removes the last element, then head again becomes zero, meaning the
Stack is empty.
Stack::cleanup( ) doesn't actually do any cleanup. Instead, it
establishes a firm policy that "you (the client programmer using this
Stack object) are responsible for popping all the elements off this
Stack and deleting them." The require( ) is used to indicate that a
programming error has occurred if the Stack is not empty.
Why
couldn't the Stack
destructor
be responsible for all the
objects
that
the client programmer didn't
pop(
)?
The problem is that
the
Stack
is
holding void
pointers,
and you'll learn in Chapter 13
that
calling
delete
for
a void*
doesn't
clean things up properly.
The
subject
of "who's responsible for the memory" is not
even that
simple,
as we'll see in later
chapters.
Here's
an example to test the
Stack:
//: C04:StackTest.cpp
//{L} Stack
//{T} StackTest.cpp
// Test of nested linked list
#include "Stack.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  Stack textlines;
  textlines.initialize();
  string line;
  // Read file and store lines in the Stack:
  while(getline(in, line))
    textlines.push(new string(line));
  // Pop the lines from the Stack and print them:
  string* s;
  while((s = (string*)textlines.pop()) != 0) {
    cout << *s << endl;
    delete s;
  }
  textlines.cleanup();
} ///:~
This
is similar to the earlier
example, but it pushes lines from a
file
(as
string
pointers)
on the Stack
and
then pops them off, which
results
in the file being printed
out in reverse order. Note that
the
pop(
) member
function returns a void*
and
this must be cast
back
to
a string*
before
it can be used. To print the
string,
the pointer is
dereferenced.
As textlines is being filled, the contents of line is "cloned" for
each push( ) by making a new string(line). The value returned from
the new-expression is a pointer to the new string that was created
and that copied the information from line.
If you had simply passed the
address
of line
to
push(
),
you would end up with a Stack
filled
with
identical addresses, all pointing to
line.
You'll learn more
about
this "cloning" process later
in the book.
The
file name is taken from the
command line. To guarantee
that
there
are enough arguments on the
command line, you see
a
second
function used from the require.h header file: requireArgs( ), which
compares argc to the desired number of arguments and prints an
appropriate error message and exits the program if there aren't
enough arguments.
Global scope resolution
The
scope resolution operator
gets you out of situations in
which
the
name the compiler chooses by
default (the "nearest" name)
isn't
what
you want. For example, suppose you
have a structure with a
local
identifier a,
and you want to select a global
identifier a
from
inside
a member function. The
compiler would default to
choosing
the
local one, so you must tell it to do
otherwise. When you want to
specify
a global name using scope
resolution, you use the
operator
with
nothing in front of it. Here's an
example that shows
global
scope
resolution for both a variable and a
function:
//: C04:Scoperes.cpp
// Global scope resolution
int a;
void f() {}

struct S {
  int a;
  void f();
};

void S::f() {
  ::f();  // Would be recursive otherwise!
  ::a++;  // Select the global a
  a--;    // The a at struct scope
}
int main() { S s; f(); } ///:~
Without
scope resolution in S::f(
),
the compiler would default
to
selecting
the member versions of
f(
) and
a.
Summary
In
this chapter, you've learned
the fundamental "twist" of
C++:
that
you can place functions
inside of structures. This new
type of
structure
is called an abstract
data type, and
variables you create
using
this structure are called
objects, or instances, of that
type.
Calling
a member function for an object is
called sending
a message
to
that object. The primary
action in object-oriented
programming
is
sending messages to
objects.
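As a compact reminder of that vocabulary, here is a small sketch. The type Tally and the function demoMessages are hypothetical names invented for this illustration, written in the plain struct style used in this chapter.

```cpp
#include <cassert>

// Hypothetical abstract data type: the data and the functions that act
// on it are packaged together in one struct.
struct Tally {
  int count;
  void reset();
  void add(int amount);
  int total();
};

void Tally::reset() { count = 0; }
void Tally::add(int amount) { count += amount; }
int Tally::total() { return count; }

// Self-checking demonstration of creating an object and sending messages.
void demoMessages() {
  Tally t;        // t is an object (an instance) of type Tally
  t.reset();      // "send the reset message to t"
  t.add(3);       // "send the add message to t"
  t.add(4);
  assert(t.total() == 7);
}
```

Each call of the form t.add(3) is what the text calls sending a message: the object t is selected first, then the member function that acts on it.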
Although packaging data and functions together is a significant benefit for code organization and makes library use easier because it prevents name clashes by hiding the names, there's a lot more you can do to make programming safer in C++. In the next chapter, you'll learn how to protect some members of a struct so that only you can manipulate them. This establishes a clear boundary between what the user of the structure can change and what only the programmer may change.
Exercises
Solutions to selected exercises can be found in the electronic document The Thinking in C++ Annotated Solution Guide, available for a small fee from http://.
1. In the Standard C library, the function puts( ) prints a char array to the console (so you can say puts("hello")). Write a C program that uses puts( ) but does not include <stdio.h> or otherwise declare the function. Compile this program with your C compiler. (Some C++ compilers are not distinct from their C compilers; in this case you may need to discover a command-line flag that forces a C compilation.) Now compile it with the C++ compiler and note the difference.
2. Create a struct declaration with a single member function, then create a definition for that member function. Create an object of your new data type, and call the member function.
3. Change your solution to Exercise 2 so the struct is declared in a properly "guarded" header file, with the definition in one cpp file and your main( ) in another.
4. Create a struct with a single int data member, and two global functions, each of which takes a pointer to that struct. The first function has a second int argument and sets the struct's int to the argument value; the second displays the int from the struct. Test the functions.
5. Repeat Exercise 4 but move the functions so they are member functions of the struct, and test again.
6. Create a class that (redundantly) performs data member selection and a member function call using the this keyword (which refers to the address of the current object).
7. Make a Stash that holds doubles. Fill it with 25 double values, then print them out to the console.
8. Repeat Exercise 7 with Stack.
9. Create a file containing a function f( ) that takes an int argument and prints it to the console using the printf( ) function in <stdio.h> by saying: printf("%d\n", i) in which i is the int you wish to print. Create a separate file containing main( ), and in this file declare f( ) to take a float argument. Call f( ) from inside main( ). Try to compile and link your program with the C++ compiler and see what happens. Now compile and link the program using the C compiler, and see what happens when it runs. Explain the behavior.
10. Find out how to produce assembly language from your C and C++ compilers. Write a function in C and a struct with a single member function in C++. Produce assembly language from each and find the function names that are produced by your C function and your C++ member function, so you can see what sort of name decoration occurs inside the compiler.
11. Write a program with conditionally-compiled code in main( ), so that when a preprocessor value is defined one message is printed, but when it is not defined another message is printed. Compile this code experimenting with a #define within the program, then discover the way your compiler takes preprocessor definitions on the command line and experiment with that.
12. Write a program that uses assert( ) with an argument that is always false (zero) to see what happens when you run it. Now compile it with #define NDEBUG and run it again to see the difference.
13. Create an abstract data type that represents a videotape in a video rental store. Try to consider all the data and operations that may be necessary for the Video type to work well within the video rental management system. Include a print( ) member function that displays information about the Video.
14. Create a Stack object to hold the Video objects from Exercise 13. Create several Video objects, store them in the Stack, then display them using Video::print( ).
15. Write a program that prints out all the sizes for the fundamental data types on your computer using sizeof.
16. Modify Stash to use a vector<char> as its underlying data structure.
17. Dynamically create pieces of storage of the following types, using new: int, long, an array of 100 chars, an array of 100 floats. Print the addresses of these and then free the storage using delete.
18. Write a function that takes a char* argument. Using new, dynamically allocate an array of char that is the size of the char array that's passed to the function. Using array indexing, copy the characters from the argument to the dynamically allocated array (don't forget the null terminator) and return the pointer to the copy. In your main( ), test the function by passing a static quoted character array, then take the result of that and pass it back into the function. Print both strings and both pointers so you can see they are different storage. Using delete, clean up all the dynamic storage.
19. Show an example of a structure declared within another structure (a nested structure). Declare data members in both structs, and declare and define member functions in both structs. Write a main( ) that tests your new types.
20. How big is a structure? Write a piece of code that prints the size of various structures. Create structures that have data members only and ones that have data members and function members. Then create a structure that has no members at all. Print out the sizes of all these. Explain the reason for the result of the structure with no data members at all.
21. C++ automatically creates the equivalent of a typedef for structs, as you've seen in this chapter. It also does this for enumerations and unions. Write a small program that demonstrates this.
22. Create a Stack that holds Stashes. Each Stash will hold five lines from an input file. Create the Stashes using new. Read a file into your Stack, then reprint it in its original form by extracting it from the Stack.
23. Modify Exercise 22 so that you create a struct that encapsulates the Stack of Stashes. The user should only add and get lines via member functions, but under the covers the struct happens to use a Stack of Stashes.
24. Create a struct that holds an int and a pointer to another instance of the same struct. Write a function that takes the address of one of these structs and an int indicating the length of the list you want created. This function will make a whole chain of these structs (a linked list), starting from the argument (the head of the list), with each one pointing to the next. Make the new structs using new, and put the count (which object number this is) in the int. In the last struct in the list, put a zero value in the pointer to indicate that it's the end. Write a second function that takes the head of your list and moves through to the end, printing out both the pointer value and the int value for each one.
25. Repeat Exercise 24, but put the functions inside a struct instead of using "raw" structs and functions.