|
|||||
![]() 5
Subroutines
5.1.
PROGRAM FLOW
Till
now we have accumulated the
very basic tools of assembly
language
programming.
A very important weapon in our
arsenal is the
conditional
jump
instruction. During the
course of last two chapters we
used these tools
to
write two very useful
algorithms of sorting and
multiplication. The
multiplication
algorithm is useful even
though there is a MUL instruction
in
the
8088 instruction set, which
can multiply 8bit and
16bit operands. This
is
because
of the extensibility of our algorithm, as
it is not limited to 16bits
and
can do
32bit or 64bit multiplication with minor
changes.
Both of
these algorithms will be used a
number of times in any
program of
a
reasonable size and
complexity. An application does
not only need to
multiply
at a single point in code; it
multiplies at a number of places.
If
multiplication
or sorting is needed at 100
places in code, copying it
100
times
is a totally infeasible solution.
Maintaining such a code is
an
impossible
task.
The
straightforward solution to this
problem using the concepts
we have
acquainted
till now is to write the
code at one place with a
label, and
whenever
we need to sort we jump to this
label. But there is problem
with
this
logic, and the problem is
that after sorting is
complete how the
processor
will know
where to go back. The
immediate answer is to jump back to a
label
following
the jump to bubble sort. But we
have jumped to bubble sort
from
100
places in code. Which of the
100 positions in code should
we jump
back?
Jump back at the first
invocation, but jump has a single
fixed target.
How will
the second invocation work?
The second jump to bubble
sort will
never
have control back at the
next line.
Instruction
are tied to one another
forming an execution thread,
just like a
knitted
thread where pieces of
cotton of different sizes
are twisted together
to
form a
thread. This thread of
execution is our program. The jump
instruction
breaks
this thread permanently,
making a permanent diversion,
like a turn
on a
highway. The conditional jump
selects one of the two
possible
directions,
like right or left turn on a
road. So there is no concept
of
returning.
However
there are roundabouts on
roads as well that take us
back from
where
we started after having
traveled on the boundary of
the round. This is
the
concept of a temporary diversion.
Two or more permanent
diversions can
take us
back from where we started,
just like two or more road
turns can
take us
back to the starting point,
but they are still permanent
diversions in
their
nature.
We need
some way to implement the
concept of temporary diversion
in
assembly
language. We want to create a roundabout
of bubble sort,
another
roundabout
of our multiplication algorithm, so that
we can enter into
the
roundabout
whenever we need it and
return back to wherever we
left from
after
completing the round.
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
Program
Bubble
Sort
Swap
Key
point in the above
discussion is returning to where we
left from, like a
loop in
a knitted thread. Diversion
should be temporary and not
permanent.
The
code of bubble sort written
at one place, multiply at
another, and we
temporarily
divert to that place, thus
avoiding a repetition of code at a
100
places.
CALL
and RET
In
every processor, instructions
are available to divert
temporarily and to
divert
permanently. The instructions
for permanent diversion in
8088 are the
jump
instructions, while the
instruction for temporary
diversion is the CALL
instruction.
The word call must be
familiar to the readers from
subroutine
call in
higher level languages. The
CALL instruction allows
temporary
diversion
and therefore reusability of
code. Now we can place the
code for
bubble
sort at one place and
reuse it again and again.
This was not possible
with
permanent diversion.
Actually the 8088
permanent diversion
mechanism
can be tricked to achieve
temporary diversion. However it is
not
possible
without getting into a lot
of trouble. The key idea in
doing it this way
is to
use the jump instruction
form that takes a register
as argument.
Therefore
this is not impossible but
this is not the way it is
done.
The
natural way to do this is to use
the CALL instruction
followed by a
label,
just like JMP is followed by a
label. Execution will divert to
the code
following
the label. Till now the
operation has been similar
to the JMP
instruction.
When the subroutine
completes we need to return.
The RET
instruction
is used for this purpose.
The word return holds in
its meaning
that we
are to return from where we
came and need no explicit
destination.
Therefore
RET takes no arguments and
transfers control back to
the
instruction
following the CALL that
took us in this subroutine.
The actual
technical
process that informs RET
where to return will be discussed
later
after
we have discussed the system
stack.
CALL
takes a label as argument
and execution starts from
that label, until
the
RET instruction is encountered
and it takes execution back
to the
instruction
following the CALL. Both
the instructions are
commonly used as
a pair,
however technically they are
independent in their operation.
The RET
works
regardless of the CALL and
the CALL works regardless of
the RET. If
you
CALL a subroutine it will not
complain if there is no RET
present and
similarly
if you RET without being
called it won't complain. It is a
logical pair
and is
used as a pair in every
decent code. However
sometimes we play
tricks
with
the processor and we use
CALL or RET alone. This will
become clear
when we
need to play such tricks in
later chapters.
56
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
Parameters
We
intend to write the bubble
sort code at one place
and CALL it whenever
needed.
An immediately visible problem is
that whenever we call
this
subroutine
it will sort the same array
in the same order. However
in a real
application
we will need to sort various
arrays of various sizes. We
might
sometimes
need an ascending sort and
descending at other times.
Similarly
our
data may be signed or unsigned.
Such pieces of information
that may
change
from invocation to invocation
and should be passed from
the caller to
the
subroutine are called
parameters.
There
must be some way of passing these
parameters to the
subroutine.
Revising
the subroutine temporary
flow breakage mechanism, the
most
straightforward
way is to use registers. The
CALL mechanism breaks
the
thread
of execution and does not
change registers, except IP which
must
change
for processor to start
executing at another place,
and SP whose
change
will be discussed in detail later. Any of
the other registers can
hold
parameters
for the subroutine.
5.2.
OUR FIRST
SUBROUTINE
Now we want to
modify the bubble sort
code so that it works as
a
subroutine.
We place a label at the
start of bubble sort code,
which works as
the
anchor point and will be
used in the CALL instruction
to call the
subroutine.
We also place a RET at the
end of the algorithm to
return from
where
we called the
subroutine.
Example
5.1
01
; bubble sort algorithm as a
subroutine
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
swap:
db
0
07
08
bubblesort:
dec
cx
; last element not
compared
09
shl
cx, 1
; turn into byte
count
10
11
mainloop:
mov
si, 0
; initialize array index to
zero
12
mov
byte [swap],
0
; reset swap flag
to no swaps
13
14
innerloop:
mov
ax, [bx+si]
; load number in
ax
15
cmp
ax,
[bx+si+2]
; compare with
next number
16
jbe
noswap
; no swap if
already in order
17
18
mov
dx,
[bx+si+2]
;
load second
element in dx
19
mov
[bx+si], dx
;
store first number
in second
20
mov
[bx+si+2],
ax
;
store second
number in first
21
mov
byte [swap],
1
;
flag that a swap
has been done
22
23
noswap:
add
si, 2
; advance si to
next index
24
cmp
si, cx
; are we at last
index
25
jne
innerloop
; if not compare
next two
26
27
cmp
byte [swap],
1
; check if a
swap has been
done
28
je
mainloop
; if yes
make another pass
29
30
ret
; go back to
where we came from
31
32
start:
mov bx,
data
; send start of
array in bx
33
mov cx,
10
; send count of
elements in cx
34
call
bubblesort
; call our
subroutine
35
36
mov
ax,
0x4c00
; terminate
program
37
int
0x21
The
routine has received the
count of elements in CX. Since it
makes
08-09
one
less comparison than the
number of elements it decrements
it.
Then it
multiplies it by two since this a
word array and each
element
57
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
takes
two bytes. Left shifting has
been used to multiply by
two.
14
Base+index+offset
addressing has been used. BX
holds the start of
array,
SI the offset into it and an
offset of 2 when the next
element is
to be
read. BX can be directly
changed but then a separate
counter
would
be needed, as SI is directly compared
with CX in our case.
32-37
The
code starting from the
start label is our main
program
analogous
to the main in the C language. BX
and CX hold our
parameters
for the bubblesort
subroutine and the CALL is
made to
invoke
the subroutine.
Inside
the debugger we observe the
same unsigned data that we
are so
used to
now. The number 0103 is
passed via BX to the
subroutine which is
the
start of our data and the
number 000A via CX which is
the number of
elements
in our data. If we step over
the CALL instruction we see
our data
sorted
in a single step and we are
at the termination instructions.
The
processor
has jumped to the bubblesort
routine, executed it to
completion,
and
returned back from it but
the process was hidden due
to the step over
command.
If however we trace into the
CALL instruction, we land at
the first
instruction
of our routine. At the end of
the routine, when the
RET
instruction
is executed, we immediately land
back to our termination
instructions,
to be precise the instruction
following the CALL.
Also
observe that with the CALL
instruction SP is decremented by two
from
FFFE to
FFFC, and the stack
windows shows 0150 at its
top. As the RET is
executed
SP is recovered and the 0150
is also removed from the
stack. Match
it with
the address of the
instruction following the
CALL which is 0150 as
well.
The 0150 removed from
the stack by the RET
instruction has been
loaded
into the IP register thereby
resuming execution from
address 0150.
CALL
placed where to return on
the stack for the
RET instruction. The
stack
is
automatically used with the
CALL and RET instructions.
Stack will be
explained
in detail later, however the
idea is that the one who is
departing
stores
the address to return at a known
place. This is the place
using which
CALL
and RET coordinate. How this
placed is actually used by
the CALL and
RET
instructions will be described after
the stack is
discussed.
After
emphasizing reusability so much, it is
time for another
example
which
uses the same bubblesort
routine on two different arrays of
different
sizes.
Example
5.2
01
; bubble sort subroutine called
twice
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
data2:
dw
328, 329, 898, 8923, 8293,
2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200,
761, 167, 90, 5
08
swap:
db
0
09
10
bubblesort:
dec
cx
; last element not
compared
11
shl
cx, 1
; turn into byte
count
12
13
mainloop:
mov
si, 0
; initialize array index to
zero
14
mov
byte [swap],
0
; reset swap flag
to no swaps
15
16
innerloop:
mov
ax, [bx+si]
; load number in
ax
17
cmp
ax,
[bx+si+2]
; compare with
next number
18
jbe
noswap
; no swap if
already in order
19
20
mov
dx,
[bx+si+2]
;
load second
element in dx
21
mov
[bx+si], dx
;
store first number
in second
22
mov
[bx+si+2],
ax
;
store second
number in first
23
mov
byte [swap],
1
;
flag that a swap
has been done
24
25
noswap:
add
si, 2
; advance si to
next index
58
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
26
cmp
si, cx
; are we at last
index
27
jne
innerloop
; if not compare
next two
28
29
cmp
byte [swap],
1
; check if a
swap has been
done
30
je
mainloop
; if yes
make another pass
31
32
ret
; go back to
where we came from
33
34
start:
mov bx,
data
; send start of
array in bx
35
mov cx,
10
; send count of
elements in cx
36
call
bubblesort
; call our
subroutine
37
38
mov bx,
data2
; send start of
array in bx
39
mov cx,
20
; send count of
elements in cx
40
call
bubblesort
; call our subroutine
again
41
42
mov
ax,
0x4c00
; terminate
program
43
int
0x21
There
are two different data
arrays declared. One of 10
elements and
05-07
the
other of 20 elements. The
second array is declared on two
lines,
where
the second line is
continuation of the first. No
additional label
is
needed since they are
situated consecutively in
memory.
34-40
The
other change is in the main
where the bubblesort
subroutine is
called
twice, once on the first
array and once on the
second.
Inside
the debugger observe that
stepping over the first
call, the first
array
is
sorted and stepping over
the second call the
second array is sorted.
If
however
we step in SP is decremented and
the stack holds 0178 which
is the
address
of the instruction following
the call. The RET
consumes that 0178
and
restores SP. The next
CALL places 0181 on the
stack and SP is again
decremented.
The RET consumes this
number and execution resumes
from
the
instruction at 0181. This is
the coordinated function of
CALL and RET
using
the stack.
In both
of the above examples, there
is a shortcoming. The subroutine
to
sort
the elements is destroying
the registers AX, CX, DX, and
SI. That means
that
the caller of this routine
has to make sure that it
does not hold
any
important
data in these registers
before calling this
function, because
after
the
call has returned the
registers will be containing meaningless
data for the
caller.
With a program containing thousands of
subroutines expecting
the
caller
to remember the set of
modified registers for each
subroutine is
unrealistic
and unreasonable. Also
registers are limited in
number, and
restricting
the caller on the use of
register will make the
caller's job very
tough.
This shortcoming will be removed
using the very important
system
stack.
5.3.
STACK
Stack
is a data structure that
behaves in a first in last out
manner. It can
contain
many elements and there is
only one way in and out of
the container.
When an
element is inserted it sits on
top of all other elements
and when an
element
is removed the one sitting
at top of all others is
removed first. To
visualize
the structure consider a
test tube and put some
balls in it. The
second
ball will come above the
first and the third will
come above the
second.
When a ball is taken out
only the one at the
top can be removed.
The
operation
of placing an element on top of
the stack is called pushing
the
element
and the operation of
removing an element from the
top of the stack
is
called popping the element.
The last thing pushed is
popped out first; the
last in
first out behavior.
We can
peek at any ball inside
the test tube but we cannot
remove it
without
removing every ball on top
of it. Similarly we can read
any element
from
the stack but cannot remove
it without removing everything
above it.
The
stack operations of pushing
and popping only work at the
top of the
59
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
stack.
This top of stack is
contained in the SP register.
The physical address
of the
stack is obtained by the
SS:SP combination. The stack
segment
registers
tells where the stack is
located and the stack
pointer marks the top
of
stack inside this
segment.
Whenever
an element is pushed on the
stack SP is decremented by two as
the
8088 stack works on word
sized elements. Single bytes
cannot be pushed
or
popped from the stack.
Also it is a decrementing stack.
Another possibility
is an
incrementing stack. A decrementing
stack moves from higher
addresses
to
lower addresses as elements
are added in it while an
incrementing stack
moves
from lower addresses to
higher addresses as elements
are added.
There
is no special reason or argument in
favor of one or another, and
more
or less
depends on the choice of the
designers. Another processor
8051 by
the
same manufacturer has an
incrementing stack while
8088 has a
decrementing
one.
Memory
is like a shelf numbered as
zero at the top and
the maximum at
the
bottom. If a decrementing stack
starts at shelf 5, the first
item is placed
in
shelf 5, the next item is
placed in shelf 4, the next
in shelf 3 and so on.
The
operations of placing items on
the stack and removing them
from there
are
called push and pop. The
push operation copies its
operand on the stack,
while
the pop operation makes a
copy from the top of
the stack into
its
operand.
When an item is pushed on a
decrementing stack, the top
of the
stack
is first decremented and the
element is then copied into
this space.
With a
pop the element at the
top of the stack is copied
into the pop
operand
and
the top of stack is
incremented afterwards.
The
basic use of the stack is to
save things and recover
from there when
needed.
For example we discussed the
shortcoming in our last example
that
it
destroyed the caller's
registers, and the callers
are not supposed to
remember
which registers are destroyed by
the thousand routines they
use.
Using
the stack the subroutine
can save the caller's
value of the registers
on
the
stack, and recover them from
there before returning.
Meanwhile the
subroutine
can freely use the
registers. From the caller's
point of view if the
registers
contain the same value
before and after the
call, it doesn't matter
if
the
subroutine used them
meanwhile.
Similarly
during the CALL operation,
the current value of the
instruction
pointer
is automatically saved on the
stack, and the destination
of CALL is
loaded
in the instruction pointer.
Execution therefore resumes
from the
destination
of CALL. When the RET
instruction is executed, it recovers
the
value
of the instruction pointer
from the stack. The
next instruction
executed
is
therefore the one following
the CALL. Observe how
playing with the
instruction
pointer affects the program
flow.
There
is a form of the RET
instruction called "RET n"
where n is a numeric
argument.
After performing the
operation of RET, it further
increments the
stack
pointer by this number, i.e.
SP is first incremented by two and
then by
n. Its
function will become clear when
parameter passing is
discussed.
Now we
describe the operation of
the stack in CALL and
RET with an
example.
The top of stack stored in
the stack pointer is
initialized at 2000.
The
space above SP is considered
empty and free. When
the stack pointer is
decremented
by two, we took a word from
the empty space and
can use it for
our
purpose. The unit of stack
operations is a word. Some
instructions push
multiple
words; however byte pushes
cannot be made. Now the
value 017B is
stored
in the word reserved on the
stack. The RET will copy
this value in the
instruction
pointer and increment the
stack pointer by two making it
2000
again,
thereby reverting the
operation of CALL.
This is
how CALL and RET behave
for near calls. There is
also a far version
of
these functions when the
target routine is in another
segment. This
version
of CALL takes a segment
offset pair just like
the far jump
instruction.
The
CALL will push both the
segment and the offset on
the stack in this
case,
followed by loading CS and IP with
the values given in the
instruction.
60
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
The
corresponding instruction RETF will
pop the offset in the
instruction
pointer
followed by popping the
segment in the code segment
register.
Apart
from CALL and RET,
the operations that use
the stack are PUSH
and
POP.
Two other operations that
will be discussed later are INT
and IRET.
Regarding
the stack, the operation of
PUSH is similar to CALL however
with
a
register other than the
instruction pointer. For
example "push ax" will
push
the
current value of the AX
register on the stack. The
operation of PUSH is
shown
below.
SP
SP Â 2
[SP]
AX
The
operation of POP is the
reverse of this. A copy of
the element at the
top
of the
stack is made in the
operand, and the top of
the stack is
incremented
afterwards.
The operation of "pop ax" is
shown below.
AX
[SP]
SP
SP + 2
Making
corresponding PUSH and POP
operations is the responsibility
of
the
programmer. If "push ax" is
followed by "pop dx" effectively
copying the
value
of the AX register in the DX
register, the processor
won't complain.
Whether
this sequence is logically
correct or not should be
ensured by the
programmer.
For example when PUSH and
POP are used to save
and restore
registers
from the stack, order must
be correct so that the saved
value of AX
is
reloaded in the AX register
and not any other
register. For this the
order of
POP
operations need to be the
reverse of the order of PUSH
operations.
Now we
consider another example
that is similar to the
previous examples,
however
the code to swap the two
elements has been extracted
into another
subroutine,
so that the formation of
stack can be observed during
nested
subroutine
calls.
Example
5.3
01
; bubble sort subroutine using
swap subroutine
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
data2:
dw
328, 329, 898, 8923, 8293,
2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200,
761, 167, 90, 5
08
swapflag:
db
0
09
10
swap:
mov ax,
[bx+si]
;
load first number
in ax
11
xchg ax,
[bx+si+2]
;
exchange with
second number
12
mov [bx+si],
ax
;
store second
number in first
13
ret
;
go back to
where we came from
14
15
bubblesort:
dec
cx
; last element not
compared
16
shl
cx, 1
; turn into byte
count
17
18
mainloop:
mov
si, 0
; initialize array index to
zero
19
mov
byte [swapflag], 0 ; reset
swap flag to no swaps
20
21
innerloop:
mov
ax, [bx+si]
; load number in
ax
22
cmp
ax,
[bx+si+2]
; compare with
next number
23
jbe
noswap
; no swap if
already in order
24
25
call
swap
; swaps
two elements
26
mov byte
[swapflag], 1 ; flag that a swap has been
done
27
28
noswap:
add
si, 2
; advance si to
next index
29
cmp
si, cx
; are we at last
index
30
jne
innerloop
; if not compare
next two
31
32
cmp
byte [swapflag], 1 ;
check if a swap has been
done
33
je
mainloop
; if yes
make another pass
34
ret
; go back to
where we came from
35
36
start:
mov
bx, data
; send start of
array in bx
37
mov
cx, 10
; send count of
elements in cx
61
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
38
call
bubblesort
; call our
subroutine
39
40
mov bx,
data2
; send start of
array in bx
41
mov cx,
20
; send count of
elements in cx
42
call
bubblesort
; call our subroutine
again
43
44
mov
ax,
0x4c00
; terminate
program
45
int
0x21
A new
instruction XCHG has been
introduced. The
instruction
11
swaps
its source and its
destination operands however at
most one
of the
operands could be in memory, so
the other has to be loaded
in
a
register. The instruction
has reduced the code
size by one
instruction.
13
The
RET at the end of swap
makes it a subroutine.
Inside
the debugger observe the
use of stack by CALL and
RET
instructions,
especially the nested
CALL.
5.4.
SAVING AND RESTORING
REGISTERS
The
subroutines we wrote till now
have been destroying certain
registers
and our
calling code has been
carefully written to not use
those registers.
However
this cannot be remembered
for a good number of
subroutines.
Therefore
our subroutines need to implement
some mechanism of
retaining
the
callers' value of any
registers used.
The
trick is to use the PUSH and
POP operations and save
the callers'
value
on the stack and recover it
from there on return. Our
swap subroutine
destroyed
the AX register while the
bubblesort subroutine destroyed AX,
CX,
and
SI. BX was not modified in
the subroutine. It had the
same value at
entry
and at exit; it was only
used by the subroutine. Our
next example
improves
on the previous version by
saving and restoring any
registers that it
will
modify using the PUSH and
POP operations.
Example
5.4
01
; bubble sort and
swap subroutines saving and restoring
registers
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
data2:
dw
328, 329, 898, 8923, 8293,
2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200,
761, 167, 90, 5
08
swapflag:
db
0
09
10
swap:
push
ax
; save old value
of ax
11
12
mov ax,
[bx+si]
; load first
number in ax
13
xchg ax,
[bx+si+2]
; exchange with
second number
14
mov [bx+si],
ax
; store second
number in first
15
16
pop
ax
; restore old value of
ax
17
ret
; go back to
where we came from
18
19
bubblesort:
push
ax
; save old value
of ax
20
push
cx
; save old value
of cx
21
push
si
; save old value
of si
22
23
dec
cx
; last element not
compared
24
shl
cx, 1
; turn into byte
count
25
26
mainloop:
mov
si, 0
; initialize array index to
zero
27
mov
byte [swapflag], 0 ; reset
swap flag to no swaps
28
29
innerloop:
mov
ax, [bx+si]
; load number in
ax
30
cmp
ax,
[bx+si+2]
; compare with
next number
31
jbe
noswap
; no swap if
already in order
32
62
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
33
call
swap
; swaps
two elements
34
mov byte
[swapflag], 1 ; flag that a swap has been
done
35
36
noswap:
add
si, 2
; advance si to
next index
37
cmp
si, cx
; are we at last
index
38
jne
innerloop
; if not compare
next two
39
40
cmp
byte [swapflag], 1 ;
check if a swap has been
done
41
je
mainloop
; if yes
make another pass
42
43
pop
si
;
restore
old value of
si
44
pop
cx
;
restore
old value of
cx
45
pop
ax
;
restore
old value of
ax
46
ret
;
go
back
to where we
came from
47
48
start:
mov bx,
data
; send start of
array in bx
49
mov cx,
10
; send count of
elements in cx
50
call
bubblesort
; call our
subroutine
51
52
mov bx,
data2
; send start of
array in bx
53
mov cx,
20
; send count of
elements in cx
54
call
bubblesort
; call our subroutine
again
55
56
mov
ax,
0x4c00
; terminate
program
57
int
0x21
When
multiple registers are
pushed, order is very
important. If AX,
19-21
CX, and
SI are pushed in this order,
they must be popped in
the
reverse
order of SI, CX, and AX.
This is again because the
stack
behaves
in a Last In First Out
manner.
Inside
the debugger we can observe
that the registers before
and after the
CALL
operation are exactly
identical. Effectively the
caller can assume
the
registers
are untouched. By tracing
into the subroutines we can
observe how
their
value is saved on the stack
by the PUSH instructions and
recovered
from
their before exit. Saving
and restoring registers this
way in subroutines
is a
standard way and must be
followed.
PUSH
PUSH
decrements SP (the stack
pointer) by two and then
transfers a word
from
the source operand to the
top of stack now pointed to by
SP. PUSH
often
is used to place parameters on
the stack before calling a
procedure;
more
generally, it is the basic
means of storing temporary
data on the stack.
POP
POP
transfers the word at the
current top of stack
(pointed to by SP) to
the
destination
operand and then increments
SP by two to point to the new
top
of
stack. POP can be used to
move temporary variables
from the stack to
registers
or memory.
Observe
that the operand of PUSH is
called a source operand
since the
data is
moving to the stack from
the operand, while the
operand of POP is
called
destination since data is
moving from the stack to
the operand.
CALL
CALL
activates an out-of-line procedure,
saving information on the
stack to
permit
a RET (return) instruction in
the procedure to transfer
control back to
the
instruction following the
CALL. For an intra segment
direct CALL, SP is
decremented
by two and IP is pushed onto
the stack. The target
procedure's
relative
displacement from the CALL
instruction is then added to
the
instruction
pointer. For an inter
segment direct CALL, SP is
decremented by
two,
and CS is pushed onto the
stack. CS is replaced by the
segment word
contained
in the instruction. SP again is
decremented by two. IP is
pushed
onto
the stack and replaced by
the offset word in the
instruction.
63
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
The
out-of-line procedure is the
temporary division, the
concept of
roundabout
that we discussed. Near
calls are also called
intra segment calls,
while
far calls are called
inter-segment calls. There
are also versions that
are
called
indirect calls; however they
will be discuss later when they
are used.
RET
RET
(Return) transfers control
from a procedure back to the
instruction
following
the CALL that activated
the procedure. RET pops
the word at the
top of
the stack (pointed to by
register SP) into the
instruction pointer
and
increments
SP by two. If RETF (inter
segment RET) is used the
word at the
top of
the stack is popped into
the IP register and SP is
incremented by two.
The
word at the new top of stack
is popped into the CS
register, and SP is
again
incremented by two. If an optional
pop value has been
specified, RET
adds
that value to SP. This
feature may be used to discard
parameters
pushed
onto the stack before
the execution of the CALL
instruction.
5.5.
PARAMETER PASSING THROUGH
STACK
Due to
the limited number of
registers, parameter passing by
registers is
constrained
in two ways. The maximum parameters a
subroutine can
receive
are
seven when all the general
registers are used. Also,
with the subroutines
are
themselves limited in their
use of registers, and this
limited increases
when
the subroutine has to make a
nested call thereby using
certain
registers
as its parameters. Due to
this, parameter passing by
registers is not
expandable
and generalizable. However
this is the fastest
mechanism
available
for passing parameters and
is used where speed is
important.
Considering
stack as an alternate, we observe
that whatever data is
placed
there,
it stays there, and across
function calls as well. For
example the
bubble
sort subroutine needs an
array address and the
count of elements. If
we
place both of these on the
stack, and call the
subroutine afterwards, it
will
stay there. The subroutine
is invoked with its return
address on top of
the
stack and its parameters
beneath it.
To
access the arguments from
the stack, the immediate
idea that strikes is
to pop
them off the stack. And this
is the only possibility
using the given
set
of
information. However the
first thing popped off
the stack would be
the
return
address and not the
arguments. This is because
the arguments were
first
pushed on the stack and
the subroutine was called
afterwards. The
arguments
cannot be popped without
first popping the return
address. If a
heaving
thing falls on someone's
leg, the heavy thing is
removed first and
the
leg is
not pulled out to reduce the
damage. Same is the case
with our
parameters
on which the return address
has fallen.
To
handle this using PUSH and
POP, we must first pop the
return address
in a
register, then pop the
operands, and push the
return address back
on
the
stack so that RET will
function normally. However so much
effort doesn't
seem to
pay back the price.
Processor designers should
have provided a
logical
and neat way to perform this
operation. They did provided
a way and
infact
we will do this without introducing
any new instruction.
Recall
that the default segment
association of the BP register is
the stack
segment
and the reason for
this association had been
deferred for now.
The
reason
is to peek inside the stack
using the BP register and
read the
parameters
without removing them and
without touching the stack
pointer.
The
stack pointer could not be
used for this purpose, as it
cannot be used in
an
effective address. It is automatically
used as a pointer and cannot
be
explicitly
used. Also the stack
pointer is a dynamic pointer
and sometimes
changes
without telling us in the
background. It is just that
whenever we
touch
it, it is where we expect it to
be. The base pointer is
provided as a
replacement
of the stack pointer so that
we can peek inside the
stack
without
modifying the structure of
the stack.
64
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
When
the bubble sort subroutine
is called, the stack pointer
is pointing to
the
return address. Two bytes
below it is the second
parameter and four
bytes
below is the first
parameter. The stack pointer
is a reference point to
these
parameters. If the value of SP is
captured in BP, then the
return
address
is located at [bp+0], the
second parameter is at [bp+2],
and the first
parameter
is at [bp+4]. This is because SP
and BP both had the
same value
and
they both defaulted to the
same segment, the stack
segment.
This
copying of SP into BP is like
taking a snapshot or like
freezing the
stack
at that moment. Even if more
pushes are made on the
stack
decrementing
the stack pointer, our
reference point will not
change. The
parameters
will still be accessible at the
same offsets from the
base pointer.
If
however the stack pointer
increments beyond the base
pointer, the
references
will become invalid. The
base pointer will act as the
datum point
to
access our parameters. However we
have destroyed the original
value of
BP in
the process, and this will
cause problems in nested
calls where both
the
outer and the inner
subroutines need to access
their own parameters.
The
outer subroutine will have
its base pointer destroyed
after the call
and
will be
unable to access its
parameters.
To
solve both of these
problems, we reach at the
standard way of accessing
parameters
on the stack. The first two
instructions of any
subroutines
accessing
its parameters from the
stack are given
below.
push
bp
mov bp,
sp
As a
result our datum point has
shifted by a word. Now the
old value of BP
will be
contained in [bp] and the
return address will be at [bp+2].
The second
parameters
will be [bp+4] while the
first one will be at [bp+6]. We
give an
example
of bubble sort subroutine
using this standard way of
argument
passing
through stack.
Example
5.5
01
; bubble sort subroutine taking
parameters from stack
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
data2:
dw
328, 329, 898, 8923, 8293,
2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200,
761, 167, 90, 5
08
swapflag:
db
0
09
10
bubblesort:
push
bp
;
save
old value of
bp
11
mov
bp, sp
;
make
bp our reference
point
12
push
ax
;
save
old value of
ax
13
push
bx
;
save
old value of
bx
14
push
cx
;
save
old value of
cx
15
push
si
;
save
old value of
si
16
17
mov
bx, [bp+6]
;
load
start of array in
bx
18
mov
cx, [bp+4]
;
load
count of elements
in cx
19
dec
cx
;
last
element not
compared
20
shl
cx, 1
;
turn
into byte
count
21
22
mainloop:
mov
si, 0
; initialize array index to
zero
23
mov
byte [swapflag], 0 ; reset
swap flag to no swaps
24
25
innerloop:
mov
ax, [bx+si]
; load number in
ax
26
cmp
ax,
[bx+si+2]
; compare with
next number
27
jbe
noswap
; no swap if
already in order
28
29
xchg ax,
[bx+si+2]
; exchange ax with
second number
30
mov [bx+si],
ax
; store second
number in first
31
mov byte
[swapflag], 1 ; flag that a swap has been
done
32
33
noswap:
add
si, 2
; advance si to
next index
34
cmp
si, cx
; are we at last
index
35
jne
innerloop
; if not compare
next two
36
65
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
37
cmp
byte [swapflag], 1 ;
check if a swap has been
done
38
je
mainloop
; if yes
make another pass
39
40
pop
si
;
restore
old
value of si
41
pop
cx
;
restore
old
value of cx
42
pop
bx
;
restore
old
value of bx
43
pop
ax
;
restore
old
value of ax
44
pop
bp
;
restore
old
value of bp
45
ret
4
;
go
back
and
remove
two params
46
47
start:
mov
ax, data
48
push
ax
; place start of array on
stack
49
mov
ax, 10
50
push
ax
; place element count on
stack
51
call
bubblesort
; call our
subroutine
52
53
mov
ax, data2
54
push
ax
; place start of array on
stack
55
mov
ax, 20
56
push
ax
; place element count on
stack
57
call
bubblesort
; call our subroutine
again
58
59
mov
ax,
0x4c00
; terminate
program
60
int
0x21
The
value of the stack pointer
is captured in the base
pointer. With
11
further
pushes SP will change but BP will not
and therefore we will
read
parameters from bp+4 and
bp+6.
45
The
form of RET that takes an
argument is used causing
four to be
added
to SP after the return
address has been popped in
the
instruction
pointer. This will effectively
discard the parameters
that
are
still there on the
stack.
47-50
We push
the address of the array we
want to sort followed by
the
count
of elements. As immediate cannot be
directly pushed in
the
8088
architecture, we first load it in
the AX register and then
push
the AX
register on the
stack.
Inside
the debugger, concentrate on
the operation of BP and the
stack. The
parameters
are placed on the stack by
the caller, the subroutine
accesses
them
using the base pointer,
and the special form of
RET removes them
without
any extra instruction. The
value of stack pointer of
FFF6 is turned
into
FFFE by the RET instruction.
This was the value in SP
before any of the
parameters
was pushed.
Stack
Clearing by Caller or Callee
Parameters
pushed for a subroutine are
a waste after the subroutine
has
returned.
They have to be cleared from
the stack. Either of the
caller and the
callee
can take the responsibility
of clearing them from there. If
the callee
has to
clear the stack it cannot do
this easily unless RET n
exists. That is
why
most general processors have
this instruction. Stack
clearing by the
caller
needs an extra instruction on
behalf of the caller after
every call made
to the
subroutine, unnecessarily increasing
instructions in the program.
If
there
are thousand calls to a
subroutine the code to clear
the stack is
repeated
a thousand times. Therefore
the prevalent convention in
most high
level
languages is stack clearing by
the callee; even though
the other
convention
is still used in some
languages.
If RET
n is not available, stack
clearing by the callee is a
complicated
process.
It will have to save the
return address in a register,
then remove the
parameters,
and then place back
the return address so that
RET will
function.
When this instruction was
introduced in processors, only
then high
level
language designers switched to
stack clearing by the
callee. This is also
exactly
why RET n adds n to SP after
performing the operation of
RET. The
other
way around would be totally
useless for our purpose.
Consider the
stack
condition at the time of RET
and this will become clear
why this will be
66
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
useless.
Also observe that RET n
has discarded the arguments
rather than
popping
them as they were no longer of
any use either of the
caller or the
callee.
The
strong argument in favour of
callee cleared stacks is
that the
arguments
were placed on the stack
for the subroutine, the
caller did not
needed
them for itself, so the
subroutine is responsible for
removing them.
Removing
the arguments is important as if
the stack is not cleared or
is
partially
cleared the stack will
eventually become full, SP will
reach 0, and
thereafter
wraparound producing unexpected
results. This is called
stack
overflow.
Therefore clearing anything
placed on the stack is very
important.
5.6.
LOCAL VARIABLES
Another
important role of the stack
is in the creation of local
variables that
are
only needed while the
subroutine is in execution and
not afterwards.
They
should not take permanent
space like global variables.
Local variables
should
be created when the subroutine is
called and discarded
afterwards.
So that
the spaced used by them can
be reused for the local
variables of
another
subroutine. They only have
meaning inside the
subroutine and no
meaning
outside it.
The
most convenient place to
store these variables is the
stack. We need
some
special manipulation of the
stack for this task. We
need to produce a
gap in
the stack for our variables.
This is explained with the
help of the
swapflag
in the bubble sort
example.
The
swapflag we have declared as a
word occupying space
permanently is
only
needed by the bubble sort
subroutine and should be a
local variable.
Actually
the variable was introduced
with the intent of making it a
local
variable
at this time. The stack
pointer will be decremented by an extra
two
bytes
thereby producing a gap in which a
word can reside. This
gap will be
used
for our temporary, local, or
automatic variable; however we
name it. We
can
decrement it as much as we want producing
the desired space,
however
the
decrement must be by an even number, as
the unit of stack operation
is
a word.
In our case we needed just
one word. Also the
most convenient
position
for this gap is immediately
after saving the value of SP
in BP. So that
the
same base pointer can be
used to access the local
variables as well;
this
time
using negative offsets. The
standard way to start a subroutine
which
needs
to access parameters and has
local variables is as
under.
push
bp
mov bp,
sp
sub sp,
2
The
gap could have been
created with a dummy push, but the
subtraction
makes
it clear that the value
pushed is not important and
the gap will be
used
for our local variable. Also
gap of any size can be
created in a single
instruction
with subtraction. The parameters
can still be accessed at
bp+4
and
bp+6 and the swapflag
can be accessed at bp-2. The
subtraction in SP
was
after taking the snapshot;
therefore BP is above the
parameters but
below
the local variables. The
parameters are therefore
accessed using
positive
offsets from BP and the
local variables are accessed
using negative
offsets.
We
modify the bubble sort
subroutine to use a local
variable to store the
swap
flag. The swap flag
remembered whether a swap
has been done in a
particular
iteration of bubble
sort.
Example
5.6
01
; bubble sort subroutine using
a local variable
02
[org
0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30,
10, 0
06
data2:
dw
328, 329, 898, 8923, 8293,
2345, 10, 877, 355, 98
67
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
07
dw
888, 533, 2000, 1020, 30, 200,
761, 167, 90, 5
08
09
bubblesort:
push
bp
;
save
old value of
bp
10
mov bp,
sp
;
make
bp our reference
point
11
sub sp,
2
;
make
two byte
space on stack
12
push
ax
;
save
old value of
ax
13
push
bx
;
save
old value of
bx
14
push
cx
;
save
old value of
cx
15
push
si
;
save
old value of
si
16
17
mov
bx, [bp+6]
;
load
start of array in
bx
18
mov
cx, [bp+4]
;
load
count of elements
in cx
19
dec
cx
;
last
element not
compared
20
shl
cx, 1
;
turn
into byte
count
21
22
mainloop:
mov
si, 0
; initialize array index to
zero
23
mov
word [bp-2],
0
; reset swap flag
to no swaps
24
25
innerloop:
mov
ax, [bx+si]
; load number in
ax
26
cmp
ax,
[bx+si+2]
; compare with
next number
27
jbe
noswap
; no swap if
already in order
28
29
xchg ax,
[bx+si+2]
; exchange ax with
second number
30
mov [bx+si],
ax
; store second
number in first
31
mov
word [bp-2], 1
; flag that a swap
has been done
32
33
noswap:
add
si, 2
; advance si to
next index
34
cmp
si, cx
; are we at last
index
35
jne
innerloop
; if not compare
next two
36
37
cmp
word [bp-2],
1
; check if a
swap has been
done
38
je
mainloop
; if yes
make another pass
39
40
pop
si
;
restore old value of
si
41
pop
cx
;
restore old value of
cx
42
pop
bx
;
restore old value of
bx
43
pop
ax
;
restore old value of
ax
44
mov
sp, bp
;
remove
space created on stack
45
pop
bp
;
restore old value of
bp
46
ret
4
;
go back
and remove two
params
47
48
start:
mov
ax, data
49
push
ax
; place start of array on
stack
50
mov
ax, 10
51
push
ax
; place element count on
stack
52
call
bubblesort
; call our
subroutine
53
54
mov
ax, data2
55
push
ax
; place start of array on
stack
56
mov
ax, 20
57
push
ax
; place element count on
stack
58
call
bubblesort
; call our subroutine
again
59
60
mov
ax,
0x4c00
; terminate
program
61
int
0x21
A word
gap has been created
for swap flag. This is
equivalent to a
11
dummy
push. The registers are
pushed above this
gap.
23
The
swapflag is accessed with [bp-2].
The parameters are
accessed
in the
same manner as the last
examples.
44
We are
removing the hole that we
created. The hole is removed
by
restoring
the value of SP that it had
at the time of snapshot or at
the
value
it had before the local
variable was created. This
can be
replaced
with "add sp, 2" however the
one used in the code
is
preferred
since it does not require to
remember how much space
for
local
variables was allocated in the
start. After this operation
SP
points
to the old value of BP from
where we can proceed as
usual.
We
needed memory to store the
swap flag. The fact
that it is in the
stack
segment
or the data segment doesn't
bother us. This will just
change the
addressing
scheme.
68
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
EXERCISES
1.
Replace the following valid
instruction with a single instruction
that
has
the same effect. Don't
consider the effect on
flags.
push
word L1
jmp L2
L1:
2.
Replace the following
invalid instructions with a single
instruction
that
has the same
effect.
a.
pop
ip
b.
mov
ip, L5
c.
sub
sp, 2
mov
[ss:sp], ax
d.
mov ax,
[ss:sp]
add sp,
2
e.
add
sp, 6
mov
ip,
[ss:sp-6]
3.
Write a recursive function to
calculate the Fibonacci of a
number.
The
number is passed as a parameter
via the stack and
the
calculated
Fibonacci number is returned in
the AX register. A
local
variable
should be used to store the
return value from the
first
recursive
call. Fibonacci function is
defined as follows:
Fibonacci(0) =
0
Fibonacci(1) =
1
Fibonacci(n) = Fibonacci(n-1) +
Fibonacci(n-2)
4.
Write the above Fibonacci
function iteratively.
HINT:
Use two registers to hold
the current and the
previous
Fibonacci
numbers in a loop.
5.
Write a function switch_stack
meant to change the current
stack and
will be
called as below. The
function should destroy no
registers.
push
word [new_stack_segment]
push
word [new_stack_offset]
call
switch_stack
6.
Write a function "addtoset"
that takes offset of a
function and
remembers
this offset in an array that
can hold a maximum of 8
offsets.
It does nothing if there are
already eight offsets in the
set.
Write
another function "callset"
that makes a call to all
functions in
the
set one by one.
7. Do
the above exercise such
that "callset" does not
use a CALL or a
JMP to
invoke the functions.
HINT:
Setup the stack
appropriately such that the
RET will execute
the
first function, its RET
execute the next and so on
till the last
RET
returns
to the caller of
"callset."
8. Make
an array of 0x80 bytes and
treat it as one of 0x400
bits. Write
a
function myalloc that takes
one argument, the number of
bits. It
finds
that many consecutive zero
bits in the array, makes
them one,
and
returns in AX the index of
the first bit. Write
another function
myfree
that takes two arguments,
index of a bit in the array,
and the
number
of bits. It makes that many
consecutive bits zero,
whatever
their
previous values are,
starting from the index in
the first
argument.
9. [Circular
Queue] Write functions to
implement circular
queues.
Declare
16x32 words of data for 16
queues numbered from 0 to
15.
Each
queue has a front index, a
rear index and 30 locations
for data
totaling
to 32 words. Declare another
word variable whose 16
bits
correspond
to the 16 queues and a 1 bit
signals that the
corresponding
queue is used and a 0 bit
signals that it is free.
Write
a
function "qcreate" that
returns a queue number after
finding a free
queue
or -1 if it failed. Write a function
"qdestroy" that marks
the
queue
as free. Write two other
functions "qadd" and
"qremove" that
69
![]() Computer
Architecture & Assembly Language
Programming
Course
Code: CS401
CS401@vu.edu.pk
can
add and remove items
from the circular queue.
The two
functions
return 0 if they failed and
1 otherwise.
10.
[Linked List] Declare 1024
nodes of four bytes each.
The first 2 bytes
will be
used for data and
the next 2 bytes for
storing the offset of
another
node. Also declare a word
variable "firstfree" to store
the
offset
of the first free node.
Write the following five
functions:
a.
"init" chains all 1024
nodes into a list with
offset of first
node in
firstfree, offset of the
second node in the later
two
bytes
of the first node and so
on. The later two bytes of
the
last
node contains zero.
b.
"createlist" returns the
offset of the node stored in
firstfree
through
AX. It sets firstfree to the
offset stored in the
later
two
bytes of that node, and it
sets the later two bytes of
that
node to
zero.
c.
"insertafter" takes two parameters,
the offset of a node
and
a word
data. It removes one node
from freelist just
like
"createlist"
and inserts it after the
said node and
updates
the new
node's data part.
d.
"deleteafter" takes a node as
its parameter and removes
the
node
immediately after it in the
linked list if there is
one.
e.
"deletelist" takes a node as
its parameters and traverses
the
linked
list starting at this node
and removes all nodes
from
it and
add them back to the free
list.
70
Table of Contents:
|
|||||