ZeePedia

Subroutines: Program Flow, Stack, Saving and Restoring Registers

<< Manipulations: Multiplication Algorithm, Shifting and Rotations, Bitwise Logical Operations
Display Memory: ASCII Codes, Display Memory Formation, Assembly Language >>
img
5
Subroutines
5.1. PROGRAM FLOW
Till now we have accumulated the very basic tools of assembly language
programming. A very important weapon in our arsenal is the conditional
jump instruction. During the course of last two chapters we used these tools
to write two very useful algorithms of sorting and multiplication. The
multiplication algorithm is useful even though there is a MUL instruction in
the 8088 instruction set, which can multiply 8bit and 16bit operands. This is
because of the extensibility of our algorithm, as it is not limited to 16bits and
can do 32bit or 64bit multiplication with minor changes.
Both of these algorithms will be used a number of times in any program of
a reasonable size and complexity. An application does not only need to
multiply at a single point in code; it multiplies at a number of places. If
multiplication or sorting is needed at 100 places in code, copying it 100
times is a totally infeasible solution. Maintaining such a code is an
impossible task.
The straightforward solution to this problem using the concepts we have
acquainted till now is to write the code at one place with a label, and
whenever we need to sort we jump to this label. But there is problem with
this logic, and the problem is that after sorting is complete how the processor
will know where to go back. The immediate answer is to jump back to a label
following the jump to bubble sort. But we have jumped to bubble sort from
100 places in code. Which of the 100 positions in code should we jump
back? Jump back at the first invocation, but jump has a single fixed target.
How will the second invocation work? The second jump to bubble sort will
never have control back at the next line.
Instruction are tied to one another forming an execution thread, just like a
knitted thread where pieces of cotton of different sizes are twisted together to
form a thread. This thread of execution is our program. The jump instruction
breaks this thread permanently, making a permanent diversion, like a turn
on a highway. The conditional jump selects one of the two possible
directions, like right or left turn on a road. So there is no concept of
returning.
However there are roundabouts on roads as well that take us back from
where we started after having traveled on the boundary of the round. This is
the concept of a temporary diversion. Two or more permanent diversions can
take us back from where we started, just like two or more road turns can
take us back to the starting point, but they are still permanent diversions in
their nature.
We need some way to implement the concept of temporary diversion in
assembly language. We want to create a roundabout of bubble sort, another
roundabout of our multiplication algorithm, so that we can enter into the
roundabout whenever we need it and return back to wherever we left from
after completing the round.
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
Program
Bubble Sort
Swap
Key point in the above discussion is returning to where we left from, like a
loop in a knitted thread. Diversion should be temporary and not permanent.
The code of bubble sort written at one place, multiply at another, and we
temporarily divert to that place, thus avoiding a repetition of code at a 100
places.
CALL and RET
In every processor, instructions are available to divert temporarily and to
divert permanently. The instructions for permanent diversion in 8088 are the
jump instructions, while the instruction for temporary diversion is the CALL
instruction. The word call must be familiar to the readers from subroutine
call in higher level languages. The CALL instruction allows temporary
diversion and therefore reusability of code. Now we can place the code for
bubble sort at one place and reuse it again and again. This was not possible
with  permanent  diversion.  Actually  the  8088  permanent  diversion
mechanism can be tricked to achieve temporary diversion. However it is not
possible without getting into a lot of trouble. The key idea in doing it this way
is to use the jump instruction form that takes a register as argument.
Therefore this is not impossible but this is not the way it is done.
The natural way to do this is to use the CALL instruction followed by a
label, just like JMP is followed by a label. Execution will divert to the code
following the label. Till now the operation has been similar to the JMP
instruction. When the subroutine completes we need to return. The RET
instruction is used for this purpose. The word return holds in its meaning
that we are to return from where we came and need no explicit destination.
Therefore RET takes no arguments and transfers control back to the
instruction following the CALL that took us in this subroutine. The actual
technical process that informs RET where to return will be discussed later
after we have discussed the system stack.
CALL takes a label as argument and execution starts from that label, until
the RET instruction is encountered and it takes execution back to the
instruction following the CALL. Both the instructions are commonly used as
a pair, however technically they are independent in their operation. The RET
works regardless of the CALL and the CALL works regardless of the RET. If
you CALL a subroutine it will not complain if there is no RET present and
similarly if you RET without being called it won't complain. It is a logical pair
and is used as a pair in every decent code. However sometimes we play tricks
with the processor and we use CALL or RET alone. This will become clear
when we need to play such tricks in later chapters.
56
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
Parameters
We intend to write the bubble sort code at one place and CALL it whenever
needed. An immediately visible problem is that whenever we call this
subroutine it will sort the same array in the same order. However in a real
application we will need to sort various arrays of various sizes. We might
sometimes need an ascending sort and descending at other times. Similarly
our data may be signed or unsigned. Such pieces of information that may
change from invocation to invocation and should be passed from the caller to
the subroutine are called parameters.
There must be some way of passing these parameters to the subroutine.
Revising the subroutine temporary flow breakage mechanism, the most
straightforward way is to use registers. The CALL mechanism breaks the
thread of execution and does not change registers, except IP which must
change for processor to start executing at another place, and SP whose
change will be discussed in detail later. Any of the other registers can hold
parameters for the subroutine.
5.2. OUR FIRST SUBROUTINE
Now we want to modify the bubble sort code so that it works as a
subroutine. We place a label at the start of bubble sort code, which works as
the anchor point and will be used in the CALL instruction to call the
subroutine. We also place a RET at the end of the algorithm to return from
where we called the subroutine.
Example 5.1
01
; bubble sort algorithm as a subroutine
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
swap:
db
0
07
08
bubblesort:
dec
cx
; last element not compared
09
shl
cx, 1
; turn into byte count
10
11
mainloop:
mov
si, 0
; initialize array index to zero
12
mov
byte [swap], 0
; reset swap flag to no swaps
13
14
innerloop:
mov
ax, [bx+si]
; load number in ax
15
cmp
ax, [bx+si+2]
; compare with next number
16
jbe
noswap
; no swap if already in order
17
18
mov
dx, [bx+si+2]
;
load second element in dx
19
mov
[bx+si], dx
;
store first number in second
20
mov
[bx+si+2], ax
;
store second number in first
21
mov
byte [swap], 1
;
flag that a swap has been done
22
23
noswap:
add
si, 2
; advance si to next index
24
cmp
si, cx
; are we at last index
25
jne
innerloop
; if not compare next two
26
27
cmp
byte [swap], 1
; check if a swap has been done
28
je
mainloop
; if yes make another pass
29
30
ret
; go back to where we came from
31
32
start:
mov  bx, data
; send start of array in bx
33
mov  cx, 10
; send count of elements in cx
34
call bubblesort
; call our subroutine
35
36
mov
ax, 0x4c00
; terminate program
37
int
0x21
The routine has received the count of elements in CX. Since it makes
08-09
one less comparison than the number of elements it decrements it.
Then it multiplies it by two since this a word array and each element
57
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
takes two bytes. Left shifting has been used to multiply by two.
14
Base+index+offset addressing has been used. BX holds the start of
array, SI the offset into it and an offset of 2 when the next element is
to be read. BX can be directly changed but then a separate counter
would be needed, as SI is directly compared with CX in our case.
32-37
The code starting from the start label is our main program
analogous to the main in the C language. BX and CX hold our
parameters for the bubblesort subroutine and the CALL is made to
invoke the subroutine.
Inside the debugger we observe the same unsigned data that we are so
used to now. The number 0103 is passed via BX to the subroutine which is
the start of our data and the number 000A via CX which is the number of
elements in our data. If we step over the CALL instruction we see our data
sorted in a single step and we are at the termination instructions. The
processor has jumped to the bubblesort routine, executed it to completion,
and returned back from it but the process was hidden due to the step over
command. If however we trace into the CALL instruction, we land at the first
instruction of our routine. At the end of the routine, when the RET
instruction is executed, we immediately land back to our termination
instructions, to be precise the instruction following the CALL.
Also observe that with the CALL instruction SP is decremented by two from
FFFE to FFFC, and the stack windows shows 0150 at its top. As the RET is
executed SP is recovered and the 0150 is also removed from the stack. Match
it with the address of the instruction following the CALL which is 0150 as
well. The 0150 removed from the stack by the RET instruction has been
loaded into the IP register thereby resuming execution from address 0150.
CALL placed where to return on the stack for the RET instruction. The stack
is automatically used with the CALL and RET instructions. Stack will be
explained in detail later, however the idea is that the one who is departing
stores the address to return at a known place. This is the place using which
CALL and RET coordinate. How this placed is actually used by the CALL and
RET instructions will be described after the stack is discussed.
After emphasizing reusability so much, it is time for another example
which uses the same bubblesort routine on two different arrays of different
sizes.
Example 5.2
01
; bubble sort subroutine called twice
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
data2:
dw
328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
08
swap:
db
0
09
10
bubblesort:
dec
cx
; last element not compared
11
shl
cx, 1
; turn into byte count
12
13
mainloop:
mov
si, 0
; initialize array index to zero
14
mov
byte [swap], 0
; reset swap flag to no swaps
15
16
innerloop:
mov
ax, [bx+si]
; load number in ax
17
cmp
ax, [bx+si+2]
; compare with next number
18
jbe
noswap
; no swap if already in order
19
20
mov
dx, [bx+si+2]
;
load second element in dx
21
mov
[bx+si], dx
;
store first number in second
22
mov
[bx+si+2], ax
;
store second number in first
23
mov
byte [swap], 1
;
flag that a swap has been done
24
25
noswap:
add
si, 2
; advance si to next index
58
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
26
cmp
si, cx
; are we at last index
27
jne
innerloop
; if not compare next two
28
29
cmp
byte [swap], 1
; check if a swap has been done
30
je
mainloop
; if yes make another pass
31
32
ret
; go back to where we came from
33
34
start:
mov  bx, data
; send start of array in bx
35
mov  cx, 10
; send count of elements in cx
36
call bubblesort
; call our subroutine
37
38
mov bx, data2
; send start of array in bx
39
mov cx, 20
; send count of elements in cx
40
call bubblesort
; call our subroutine again
41
42
mov
ax, 0x4c00
; terminate program
43
int
0x21
There are two different data arrays declared. One of 10 elements and
05-07
the other of 20 elements. The second array is declared on two lines,
where the second line is continuation of the first. No additional label
is needed since they are situated consecutively in memory.
34-40
The other change is in the main where the bubblesort subroutine is
called twice, once on the first array and once on the second.
Inside the debugger observe that stepping over the first call, the first array
is sorted and stepping over the second call the second array is sorted. If
however we step in SP is decremented and the stack holds 0178 which is the
address of the instruction following the call. The RET consumes that 0178
and restores SP. The next CALL places 0181 on the stack and SP is again
decremented. The RET consumes this number and execution resumes from
the instruction at 0181. This is the coordinated function of CALL and RET
using the stack.
In both of the above examples, there is a shortcoming. The subroutine to
sort the elements is destroying the registers AX, CX, DX, and SI. That means
that the caller of this routine has to make sure that it does not hold any
important data in these registers before calling this function, because after
the call has returned the registers will be containing meaningless data for the
caller. With a program containing thousands of subroutines expecting the
caller to remember the set of modified registers for each subroutine is
unrealistic and unreasonable. Also registers are limited in number, and
restricting the caller on the use of register will make the caller's job very
tough. This shortcoming will be removed using the very important system
stack.
5.3. STACK
Stack is a data structure that behaves in a first in last out manner. It can
contain many elements and there is only one way in and out of the container.
When an element is inserted it sits on top of all other elements and when an
element is removed the one sitting at top of all others is removed first. To
visualize the structure consider a test tube and put some balls in it. The
second ball will come above the first and the third will come above the
second. When a ball is taken out only the one at the top can be removed. The
operation of placing an element on top of the stack is called pushing the
element and the operation of removing an element from the top of the stack
is called popping the element. The last thing pushed is popped out first; the
last in first out behavior.
We can peek at any ball inside the test tube but we cannot remove it
without removing every ball on top of it. Similarly we can read any element
from the stack but cannot remove it without removing everything above it.
The stack operations of pushing and popping only work at the top of the
59
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
stack. This top of stack is contained in the SP register. The physical address
of the stack is obtained by the SS:SP combination. The stack segment
registers tells where the stack is located and the stack pointer marks the top
of stack inside this segment.
Whenever an element is pushed on the stack SP is decremented by two as
the 8088 stack works on word sized elements. Single bytes cannot be pushed
or popped from the stack. Also it is a decrementing stack. Another possibility
is an incrementing stack. A decrementing stack moves from higher addresses
to lower addresses as elements are added in it while an incrementing stack
moves from lower addresses to higher addresses as elements are added.
There is no special reason or argument in favor of one or another, and more
or less depends on the choice of the designers. Another processor 8051 by
the same manufacturer has an incrementing stack while 8088 has a
decrementing one.
Memory is like a shelf numbered as zero at the top and the maximum at
the bottom. If a decrementing stack starts at shelf 5, the first item is placed
in shelf 5, the next item is placed in shelf 4, the next in shelf 3 and so on.
The operations of placing items on the stack and removing them from there
are called push and pop. The push operation copies its operand on the stack,
while the pop operation makes a copy from the top of the stack into its
operand. When an item is pushed on a decrementing stack, the top of the
stack is first decremented and the element is then copied into this space.
With a pop the element at the top of the stack is copied into the pop operand
and the top of stack is incremented afterwards.
The basic use of the stack is to save things and recover from there when
needed. For example we discussed the shortcoming in our last example that
it destroyed the caller's registers, and the callers are not supposed to
remember which registers are destroyed by the thousand routines they use.
Using the stack the subroutine can save the caller's value of the registers on
the stack, and recover them from there before returning. Meanwhile the
subroutine can freely use the registers. From the caller's point of view if the
registers contain the same value before and after the call, it doesn't matter if
the subroutine used them meanwhile.
Similarly during the CALL operation, the current value of the instruction
pointer is automatically saved on the stack, and the destination of CALL is
loaded in the instruction pointer. Execution therefore resumes from the
destination of CALL. When the RET instruction is executed, it recovers the
value of the instruction pointer from the stack. The next instruction executed
is therefore the one following the CALL. Observe how playing with the
instruction pointer affects the program flow.
There is a form of the RET instruction called "RET n" where n is a numeric
argument. After performing the operation of RET, it further increments the
stack pointer by this number, i.e. SP is first incremented by two and then by
n. Its function will become clear when parameter passing is discussed.
Now we describe the operation of the stack in CALL and RET with an
example. The top of stack stored in the stack pointer is initialized at 2000.
The space above SP is considered empty and free. When the stack pointer is
decremented by two, we took a word from the empty space and can use it for
our purpose. The unit of stack operations is a word. Some instructions push
multiple words; however byte pushes cannot be made. Now the value 017B is
stored in the word reserved on the stack. The RET will copy this value in the
instruction pointer and increment the stack pointer by two making it 2000
again, thereby reverting the operation of CALL.
This is how CALL and RET behave for near calls. There is also a far version
of these functions when the target routine is in another segment. This
version of CALL takes a segment offset pair just like the far jump instruction.
The CALL will push both the segment and the offset on the stack in this
case, followed by loading CS and IP with the values given in the instruction.
60
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
The corresponding instruction RETF will pop the offset in the instruction
pointer followed by popping the segment in the code segment register.
Apart from CALL and RET, the operations that use the stack are PUSH and
POP. Two other operations that will be discussed later are INT and IRET.
Regarding the stack, the operation of PUSH is similar to CALL however with
a register other than the instruction pointer. For example "push ax" will push
the current value of the AX register on the stack. The operation of PUSH is
shown below.
SP
SP ­ 2
[SP]
AX
The operation of POP is the reverse of this. A copy of the element at the top
of the stack is made in the operand, and the top of the stack is incremented
afterwards. The operation of "pop ax" is shown below.
AX
[SP]
SP
SP + 2
Making corresponding PUSH and POP operations is the responsibility of
the programmer. If "push ax" is followed by "pop dx" effectively copying the
value of the AX register in the DX register, the processor won't complain.
Whether this sequence is logically correct or not should be ensured by the
programmer. For example when PUSH and POP are used to save and restore
registers from the stack, order must be correct so that the saved value of AX
is reloaded in the AX register and not any other register. For this the order of
POP operations need to be the reverse of the order of PUSH operations.
Now we consider another example that is similar to the previous examples,
however the code to swap the two elements has been extracted into another
subroutine, so that the formation of stack can be observed during nested
subroutine calls.
Example 5.3
01
; bubble sort subroutine using swap subroutine
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
data2:
dw
328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
08
swapflag:
db
0
09
10
swap:
mov  ax, [bx+si]
;
load first number in ax
11
xchg ax, [bx+si+2]
;
exchange with second number
12
mov  [bx+si], ax
;
store second number in first
13
ret
;
go back to where we came from
14
15
bubblesort:
dec
cx
; last element not compared
16
shl
cx, 1
; turn into byte count
17
18
mainloop:
mov
si, 0
; initialize array index to zero
19
mov
byte [swapflag], 0 ; reset swap flag to no swaps
20
21
innerloop:
mov
ax, [bx+si]
; load number in ax
22
cmp
ax, [bx+si+2]
; compare with next number
23
jbe
noswap
; no swap if already in order
24
25
call swap
; swaps two elements
26
mov  byte [swapflag], 1 ; flag that a swap has been done
27
28
noswap:
add
si, 2
; advance si to next index
29
cmp
si, cx
; are we at last index
30
jne
innerloop
; if not compare next two
31
32
cmp
byte [swapflag], 1 ; check if a swap has been done
33
je
mainloop
; if yes make another pass
34
ret
; go back to where we came from
35
36
start:
mov
bx, data
; send start of array in bx
37
mov
cx, 10
; send count of elements in cx
61
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
38
call bubblesort
; call our subroutine
39
40
mov bx, data2
; send start of array in bx
41
mov cx, 20
; send count of elements in cx
42
call bubblesort
; call our subroutine again
43
44
mov
ax, 0x4c00
; terminate program
45
int
0x21
A new instruction XCHG has been introduced. The instruction
11
swaps its source and its destination operands however at most one
of the operands could be in memory, so the other has to be loaded in
a register. The instruction has reduced the code size by one
instruction.
13
The RET at the end of swap makes it a subroutine.
Inside the debugger observe the use of stack by CALL and RET
instructions, especially the nested CALL.
5.4. SAVING AND RESTORING REGISTERS
The subroutines we wrote till now have been destroying certain registers
and our calling code has been carefully written to not use those registers.
However this cannot be remembered for a good number of subroutines.
Therefore our subroutines need to implement some mechanism of retaining
the callers' value of any registers used.
The trick is to use the PUSH and POP operations and save the callers'
value on the stack and recover it from there on return. Our swap subroutine
destroyed the AX register while the bubblesort subroutine destroyed AX, CX,
and SI. BX was not modified in the subroutine. It had the same value at
entry and at exit; it was only used by the subroutine. Our next example
improves on the previous version by saving and restoring any registers that it
will modify using the PUSH and POP operations.
Example 5.4
01
; bubble sort and swap subroutines saving and restoring registers
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
data2:
dw
328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
08
swapflag:
db
0
09
10
swap:
push ax
; save old value of ax
11
12
mov  ax, [bx+si]
; load first number in ax
13
xchg ax, [bx+si+2]
; exchange with second number
14
mov  [bx+si], ax
; store second number in first
15
16
pop
ax
; restore old value of ax
17
ret
; go back to where we came from
18
19
bubblesort:
push ax
; save old value of ax
20
push cx
; save old value of cx
21
push si
; save old value of si
22
23
dec
cx
; last element not compared
24
shl
cx, 1
; turn into byte count
25
26
mainloop:
mov
si, 0
; initialize array index to zero
27
mov
byte [swapflag], 0 ; reset swap flag to no swaps
28
29
innerloop:
mov
ax, [bx+si]
; load number in ax
30
cmp
ax, [bx+si+2]
; compare with next number
31
jbe
noswap
; no swap if already in order
32
62
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
33
call swap
; swaps two elements
34
mov  byte [swapflag], 1 ; flag that a swap has been done
35
36
noswap:
add
si, 2
; advance si to next index
37
cmp
si, cx
; are we at last index
38
jne
innerloop
; if not compare next two
39
40
cmp
byte [swapflag], 1 ; check if a swap has been done
41
je
mainloop
; if yes make another pass
42
43
pop
si
;
restore
old value of si
44
pop
cx
;
restore
old value of cx
45
pop
ax
;
restore
old value of ax
46
ret
;
go back
to where we came from
47
48
start:
mov  bx, data
; send start of array in bx
49
mov  cx, 10
; send count of elements in cx
50
call bubblesort
; call our subroutine
51
52
mov  bx, data2
; send start of array in bx
53
mov  cx, 20
; send count of elements in cx
54
call bubblesort
; call our subroutine again
55
56
mov
ax, 0x4c00
; terminate program
57
int
0x21
When multiple registers are pushed, order is very important. If AX,
19-21
CX, and SI are pushed in this order, they must be popped in the
reverse order of SI, CX, and AX. This is again because the stack
behaves in a Last In First Out manner.
Inside the debugger we can observe that the registers before and after the
CALL operation are exactly identical. Effectively the caller can assume the
registers are untouched. By tracing into the subroutines we can observe how
their value is saved on the stack by the PUSH instructions and recovered
from their before exit. Saving and restoring registers this way in subroutines
is a standard way and must be followed.
PUSH
PUSH decrements SP (the stack pointer) by two and then transfers a word
from the source operand to the top of stack now pointed to by SP. PUSH
often is used to place parameters on the stack before calling a procedure;
more generally, it is the basic means of storing temporary data on the stack.
POP
POP transfers the word at the current top of stack (pointed to by SP) to the
destination operand and then increments SP by two to point to the new top
of stack. POP can be used to move temporary variables from the stack to
registers or memory.
Observe that the operand of PUSH is called a source operand since the
data is moving to the stack from the operand, while the operand of POP is
called destination since data is moving from the stack to the operand.
CALL
CALL activates an out-of-line procedure, saving information on the stack to
permit a RET (return) instruction in the procedure to transfer control back to
the instruction following the CALL. For an intra segment direct CALL, SP is
decremented by two and IP is pushed onto the stack. The target procedure's
relative displacement from the CALL instruction is then added to the
instruction pointer. For an inter segment direct CALL, SP is decremented by
two, and CS is pushed onto the stack. CS is replaced by the segment word
contained in the instruction. SP again is decremented by two. IP is pushed
onto the stack and replaced by the offset word in the instruction.
63
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
The out-of-line procedure is the temporary division, the concept of
roundabout that we discussed. Near calls are also called intra segment calls,
while far calls are called inter-segment calls. There are also versions that are
called indirect calls; however they will be discuss later when they are used.
RET
RET (Return) transfers control from a procedure back to the instruction
following the CALL that activated the procedure. RET pops the word at the
top of the stack (pointed to by register SP) into the instruction pointer and
increments SP by two. If RETF (inter segment RET) is used the word at the
top of the stack is popped into the IP register and SP is incremented by two.
The word at the new top of stack is popped into the CS register, and SP is
again incremented by two. If an optional pop value has been specified, RET
adds that value to SP. This feature may be used to discard parameters
pushed onto the stack before the execution of the CALL instruction.
5.5. PARAMETER PASSING THROUGH STACK
Due to the limited number of registers, parameter passing by registers is
constrained in two ways. The maximum parameters a subroutine can receive
are seven when all the general registers are used. Also, with the subroutines
are themselves limited in their use of registers, and this limited increases
when the subroutine has to make a nested call thereby using certain
registers as its parameters. Due to this, parameter passing by registers is not
expandable and generalizable. However this is the fastest mechanism
available for passing parameters and is used where speed is important.
Considering stack as an alternate, we observe that whatever data is placed
there, it stays there, and across function calls as well. For example the
bubble sort subroutine needs an array address and the count of elements. If
we place both of these on the stack, and call the subroutine afterwards, it
will stay there. The subroutine is invoked with its return address on top of
the stack and its parameters beneath it.
To access the arguments from the stack, the immediate idea that strikes is
to pop them off the stack. And this is the only possibility using the given set
of information. However the first thing popped off the stack would be the
return address and not the arguments. This is because the arguments were
first pushed on the stack and the subroutine was called afterwards. The
arguments cannot be popped without first popping the return address. If a
heaving thing falls on someone's leg, the heavy thing is removed first and the
leg is not pulled out to reduce the damage. Same is the case with our
parameters on which the return address has fallen.
To handle this using PUSH and POP, we must first pop the return address
in a register, then pop the operands, and push the return address back on
the stack so that RET will function normally. However so much effort doesn't
seem to pay back the price. Processor designers should have provided a
logical and neat way to perform this operation. They did provided a way and
infact we will do this without introducing any new instruction.
Recall that the default segment association of the BP register is the stack
segment and the reason for this association had been deferred for now. The
reason is to peek inside the stack using the BP register and read the
parameters without removing them and without touching the stack pointer.
The stack pointer could not be used for this purpose, as it cannot be used in
an effective address. It is automatically used as a pointer and cannot be
explicitly used. Also the stack pointer is a dynamic pointer and sometimes
changes without telling us in the background. It is just that whenever we
touch it, it is where we expect it to be. The base pointer is provided as a
replacement of the stack pointer so that we can peek inside the stack
without modifying the structure of the stack.
64
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
When the bubble sort subroutine is called, the stack pointer is pointing to
the return address. Two bytes below it is the second parameter and four
bytes below is the first parameter. The stack pointer is a reference point to
these parameters. If the value of SP is captured in BP, then the return
address is located at [bp+0], the second parameter is at [bp+2], and the first
parameter is at [bp+4]. This is because SP and BP both had the same value
and they both defaulted to the same segment, the stack segment.
This copying of SP into BP is like taking a snapshot or like freezing the
stack at that moment. Even if more pushes are made on the stack
decrementing the stack pointer, our reference point will not change. The
parameters will still be accessible at the same offsets from the base pointer.
If however the stack pointer increments beyond the base pointer, the
references will become invalid. The base pointer will act as the datum point
to access our parameters. However we have destroyed the original value of
BP in the process, and this will cause problems in nested calls where both
the outer and the inner subroutines need to access their own parameters.
The outer subroutine will have its base pointer destroyed after the call and
will be unable to access its parameters.
To solve both of these problems, we reach at the standard way of accessing
parameters on the stack. The first two instructions of any subroutines
accessing its parameters from the stack are given below.
push bp
mov  bp, sp
As a result our datum point has shifted by a word. Now the old value of BP
will be contained in [bp] and the return address will be at [bp+2]. The second
parameters will be [bp+4] while the first one will be at [bp+6]. We give an
example of bubble sort subroutine using this standard way of argument
passing through stack.
Example 5.5
01
; bubble sort subroutine taking parameters from stack
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
data2:
dw
328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
07
dw
888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
08
swapflag:
db
0
09
10
bubblesort:
push
bp
;
save
old value of bp
11
mov
bp, sp
;
make
bp our reference point
12
push
ax
;
save
old value of ax
13
push
bx
;
save
old value of bx
14
push
cx
;
save
old value of cx
15
push
si
;
save
old value of si
16
17
mov
bx, [bp+6]
;
load
start of array in bx
18
mov
cx, [bp+4]
;
load
count of elements in cx
19
dec
cx
;
last
element not compared
20
shl
cx, 1
;
turn
into byte count
21
22
mainloop:
mov
si, 0
; initialize array index to zero
23
mov
byte [swapflag], 0 ; reset swap flag to no swaps
24
25
innerloop:
mov
ax, [bx+si]
; load number in ax
26
cmp
ax, [bx+si+2]
; compare with next number
27
jbe
noswap
; no swap if already in order
28
29
xchg ax, [bx+si+2]
; exchange ax with second number
30
mov  [bx+si], ax
; store second number in first
31
mov  byte [swapflag], 1 ; flag that a swap has been done
32
33
noswap:
add
si, 2
; advance si to next index
34
cmp
si, cx
; are we at last index
35
jne
innerloop
; if not compare next two
36
65
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
37
cmp
byte [swapflag], 1 ; check if a swap has been done
38
je
mainloop
; if yes make another pass
39
40
pop
si
;
restore
old
value of si
41
pop
cx
;
restore
old
value of cx
42
pop
bx
;
restore
old
value of bx
43
pop
ax
;
restore
old
value of ax
44
pop
bp
;
restore
old
value of bp
45
ret
4
;
go back
and
remove two params
46
47
start:
mov
ax, data
48
push
ax
; place start of array on stack
49
mov
ax, 10
50
push
ax
; place element count on stack
51
call
bubblesort
; call our subroutine
52
53
mov
ax, data2
54
push
ax
; place start of array on stack
55
mov
ax, 20
56
push
ax
; place element count on stack
57
call
bubblesort
; call our subroutine again
58
59
mov
ax, 0x4c00
; terminate program
60
int
0x21
The value of the stack pointer is captured in the base pointer. With
11
further pushes SP will change but BP will not and therefore we will
read parameters from bp+4 and bp+6.
45
The form of RET that takes an argument is used causing four to be
added to SP after the return address has been popped in the
instruction pointer. This will effectively discard the parameters that
are still there on the stack.
47-50
We push the address of the array we want to sort followed by the
count of elements. As immediate cannot be directly pushed in the
8088 architecture, we first load it in the AX register and then push
the AX register on the stack.
Inside the debugger, concentrate on the operation of BP and the stack. The
parameters are placed on the stack by the caller, the subroutine accesses
them using the base pointer, and the special form of RET removes them
without any extra instruction. The value of stack pointer of FFF6 is turned
into FFFE by the RET instruction. This was the value in SP before any of the
parameters was pushed.
Stack Clearing by Caller or Callee
Parameters pushed for a subroutine are a waste after the subroutine has
returned. They have to be cleared from the stack. Either of the caller and the
callee can take the responsibility of clearing them from there. If the callee
has to clear the stack it cannot do this easily unless RET n exists. That is
why most general processors have this instruction. Stack clearing by the
caller needs an extra instruction on behalf of the caller after every call made
to the subroutine, unnecessarily increasing instructions in the program. If
there are thousand calls to a subroutine the code to clear the stack is
repeated a thousand times. Therefore the prevalent convention in most high
level languages is stack clearing by the callee; even though the other
convention is still used in some languages.
If RET n is not available, stack clearing by the callee is a complicated
process. It will have to save the return address in a register, then remove the
parameters, and then place back the return address so that RET will
function. When this instruction was introduced in processors, only then high
level language designers switched to stack clearing by the callee. This is also
exactly why RET n adds n to SP after performing the operation of RET. The
other way around would be totally useless for our purpose. Consider the
stack condition at the time of RET and this will become clear why this will be
66
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
useless. Also observe that RET n has discarded the arguments rather than
popping them as they were no longer of any use either of the caller or the
callee.
The strong argument in favour of callee cleared stacks is that the
arguments were placed on the stack for the subroutine, the caller did not
needed them for itself, so the subroutine is responsible for removing them.
Removing the arguments is important as if the stack is not cleared or is
partially cleared the stack will eventually become full, SP will reach 0, and
thereafter wraparound producing unexpected results. This is called stack
overflow. Therefore clearing anything placed on the stack is very important.
5.6. LOCAL VARIABLES
Another important role of the stack is in the creation of local variables that
are only needed while the subroutine is in execution and not afterwards.
They should not take permanent space like global variables. Local variables
should be created when the subroutine is called and discarded afterwards.
So that the spaced used by them can be reused for the local variables of
another subroutine. They only have meaning inside the subroutine and no
meaning outside it.
The most convenient place to store these variables is the stack. We need
some special manipulation of the stack for this task. We need to produce a
gap in the stack for our variables. This is explained with the help of the
swapflag in the bubble sort example.
The swapflag we have declared as a word occupying space permanently is
only needed by the bubble sort subroutine and should be a local variable.
Actually the variable was introduced with the intent of making it a local
variable at this time. The stack pointer will be decremented by an extra two
bytes thereby producing a gap in which a word can reside. This gap will be
used for our temporary, local, or automatic variable; however we name it. We
can decrement it as much as we want producing the desired space, however
the decrement must be by an even number, as the unit of stack operation is
a word. In our case we needed just one word. Also the most convenient
position for this gap is immediately after saving the value of SP in BP. So that
the same base pointer can be used to access the local variables as well; this
time using negative offsets. The standard way to start a subroutine which
needs to access parameters and has local variables is as under.
push bp
mov  bp, sp
sub  sp, 2
The gap could have been created with a dummy push, but the subtraction
makes it clear that the value pushed is not important and the gap will be
used for our local variable. Also gap of any size can be created in a single
instruction with subtraction. The parameters can still be accessed at bp+4
and bp+6 and the swapflag can be accessed at bp-2. The subtraction in SP
was after taking the snapshot; therefore BP is above the parameters but
below the local variables. The parameters are therefore accessed using
positive offsets from BP and the local variables are accessed using negative
offsets.
We modify the bubble sort subroutine to use a local variable to store the
swap flag. The swap flag remembered whether a swap has been done in a
particular iteration of bubble sort.
Example 5.6
01
; bubble sort subroutine using a local variable
02
[org 0x0100]
03
jmp start
04
05
data:
dw
60, 55, 45, 50, 40, 35, 25, 30, 10, 0
06
data2:
dw
328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98
67
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
07
dw
888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5
08
09
bubblesort:
push bp
;
save
old value of bp
10
mov  bp, sp
;
make
bp our reference point
11
sub sp, 2
;
make
two byte space on stack
12
push ax
;
save
old value of ax
13
push bx
;
save
old value of bx
14
push cx
;
save
old value of cx
15
push si
;
save
old value of si
16
17
mov
bx, [bp+6]
;
load
start of array in bx
18
mov
cx, [bp+4]
;
load
count of elements in cx
19
dec
cx
;
last
element not compared
20
shl
cx, 1
;
turn
into byte count
21
22
mainloop:
mov
si, 0
; initialize array index to zero
23
mov
word [bp-2], 0
; reset swap flag to no swaps
24
25
innerloop:
mov
ax, [bx+si]
; load number in ax
26
cmp
ax, [bx+si+2]
; compare with next number
27
jbe
noswap
; no swap if already in order
28
29
xchg ax, [bx+si+2]
; exchange ax with second number
30
mov  [bx+si], ax
; store second number in first
31
mov  word [bp-2], 1
; flag that a swap has been done
32
33
noswap:
add
si, 2
; advance si to next index
34
cmp
si, cx
; are we at last index
35
jne
innerloop
; if not compare next two
36
37
cmp
word [bp-2], 1
; check if a swap has been done
38
je
mainloop
; if yes make another pass
39
40
pop
si
;
restore old value of si
41
pop
cx
;
restore old value of cx
42
pop
bx
;
restore old value of bx
43
pop
ax
;
restore old value of ax
44
mov
sp, bp
;
remove space created on stack
45
pop
bp
;
restore old value of bp
46
ret
4
;
go back and remove two params
47
48
start:
mov
ax, data
49
push
ax
; place start of array on stack
50
mov
ax, 10
51
push
ax
; place element count on stack
52
call
bubblesort
; call our subroutine
53
54
mov
ax, data2
55
push
ax
; place start of array on stack
56
mov
ax, 20
57
push
ax
; place element count on stack
58
call
bubblesort
; call our subroutine again
59
60
mov
ax, 0x4c00
; terminate program
61
int
0x21
A word gap has been created for swap flag. This is equivalent to a
11
dummy push. The registers are pushed above this gap.
23
The swapflag is accessed with [bp-2]. The parameters are accessed
in the same manner as the last examples.
44
We are removing the hole that we created. The hole is removed by
restoring the value of SP that it had at the time of snapshot or at the
value it had before the local variable was created. This can be
replaced with "add sp, 2" however the one used in the code is
preferred since it does not require to remember how much space for
local variables was allocated in the start. After this operation SP
points to the old value of BP from where we can proceed as usual.
We needed memory to store the swap flag. The fact that it is in the stack
segment or the data segment doesn't bother us. This will just change the
addressing scheme.
68
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
EXERCISES
1. Replace the following valid instruction with a single instruction that
has the same effect. Don't consider the effect on flags.
push word L1
jmp L2
L1:
2. Replace the following invalid instructions with a single instruction
that has the same effect.
a.
pop
ip
b.
mov
ip, L5
c.
sub
sp, 2
mov
[ss:sp], ax
d.
mov ax, [ss:sp]
add sp, 2
e.
add
sp, 6
mov
ip, [ss:sp-6]
3. Write a recursive function to calculate the Fibonacci of a number.
The number is passed as a parameter via the stack and the
calculated Fibonacci number is returned in the AX register. A local
variable should be used to store the return value from the first
recursive call. Fibonacci function is defined as follows:
Fibonacci(0) = 0
Fibonacci(1) = 1
Fibonacci(n) = Fibonacci(n-1) + Fibonacci(n-2)
4. Write the above Fibonacci function iteratively.
HINT: Use two registers to hold the current and the previous
Fibonacci numbers in a loop.
5. Write a function switch_stack meant to change the current stack and
will be called as below. The function should destroy no registers.
push word [new_stack_segment]
push word [new_stack_offset]
call switch_stack
6. Write a function "addtoset" that takes offset of a function and
remembers this offset in an array that can hold a maximum of 8
offsets. It does nothing if there are already eight offsets in the set.
Write another function "callset" that makes a call to all functions in
the set one by one.
7. Do the above exercise such that "callset" does not use a CALL or a
JMP to invoke the functions.
HINT: Setup the stack appropriately such that the RET will execute
the first function, its RET execute the next and so on till the last RET
returns to the caller of "callset."
8. Make an array of 0x80 bytes and treat it as one of 0x400 bits. Write
a function myalloc that takes one argument, the number of bits. It
finds that many consecutive zero bits in the array, makes them one,
and returns in AX the index of the first bit. Write another function
myfree that takes two arguments, index of a bit in the array, and the
number of bits. It makes that many consecutive bits zero, whatever
their previous values are, starting from the index in the first
argument.
9. [Circular Queue] Write functions to implement circular queues.
Declare 16x32 words of data for 16 queues numbered from 0 to 15.
Each queue has a front index, a rear index and 30 locations for data
totaling to 32 words. Declare another word variable whose 16 bits
correspond to the 16 queues and a 1 bit signals that the
corresponding queue is used and a 0 bit signals that it is free. Write
a function "qcreate" that returns a queue number after finding a free
queue or -1 if it failed. Write a function "qdestroy" that marks the
queue as free. Write two other functions "qadd" and "qremove" that
69
img
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
can add and remove items from the circular queue. The two
functions return 0 if they failed and 1 otherwise.
10. [Linked List] Declare 1024 nodes of four bytes each. The first 2 bytes
will be used for data and the next 2 bytes for storing the offset of
another node. Also declare a word variable "firstfree" to store the
offset of the first free node. Write the following five functions:
a. "init" chains all 1024 nodes into a list with offset of first
node in firstfree, offset of the second node in the later two
bytes of the first node and so on. The later two bytes of the
last node contains zero.
b. "createlist" returns the offset of the node stored in firstfree
through AX. It sets firstfree to the offset stored in the later
two bytes of that node, and it sets the later two bytes of that
node to zero.
c. "insertafter" takes two parameters, the offset of a node and
a word data. It removes one node from freelist just like
"createlist" and inserts it after the said node and updates
the new node's data part.
d. "deleteafter" takes a node as its parameter and removes the
node immediately after it in the linked list if there is one.
e. "deletelist" takes a node as its parameters and traverses the
linked list starting at this node and removes all nodes from
it and add them back to the free list.
70