Subroutines: Program Flow, Stack, Saving and Restoring Registers

<< Manipulations: Multiplication Algorithm, Shifting and Rotations, Bitwise Logical Operations

Display Memory: ASCII Codes, Display Memory Formation, Assembly Language >>

Subroutines

5.1. PROGRAM FLOW

Till now we have accumulated the very basic tools of assembly language

programming. A very important weapon in our arsenal is the conditional

jump instruction. During the course of last two chapters we used these tools

to write two very useful algorithms of sorting and multiplication. The

multiplication algorithm is useful even though there is a MUL instruction in

the 8088 instruction set, which can multiply 8bit and 16bit operands. This is

because of the extensibility of our algorithm, as it is not limited to 16bits and

can do 32bit or 64bit multiplication with minor changes.

Both of these algorithms will be used a number of times in any program of

a reasonable size and complexity. An application does not only need to

multiply at a single point in code; it multiplies at a number of places. If

multiplication or sorting is needed at 100 places in code, copying it 100

times is a totally infeasible solution. Maintaining such a code is an

impossible task.

The straightforward solution to this problem using the concepts we have

acquainted till now is to write the code at one place with a label, and

whenever we need to sort we jump to this label. But there is problem with

this logic, and the problem is that after sorting is complete how the processor

will know where to go back. The immediate answer is to jump back to a label

following the jump to bubble sort. But we have jumped to bubble sort from

100 places in code. Which of the 100 positions in code should we jump

back? Jump back at the first invocation, but jump has a single fixed target.

How will the second invocation work? The second jump to bubble sort will

never have control back at the next line.

Instruction are tied to one another forming an execution thread, just like a

knitted thread where pieces of cotton of different sizes are twisted together to

form a thread. This thread of execution is our program. The jump instruction

breaks this thread permanently, making a permanent diversion, like a turn

on a highway. The conditional jump selects one of the two possible

directions, like right or left turn on a road. So there is no concept of

returning.

However there are roundabouts on roads as well that take us back from

where we started after having traveled on the boundary of the round. This is

the concept of a temporary diversion. Two or more permanent diversions can

take us back from where we started, just like two or more road turns can

take us back to the starting point, but they are still permanent diversions in

their nature.

We need some way to implement the concept of temporary diversion in

assembly language. We want to create a roundabout of bubble sort, another

roundabout of our multiplication algorithm, so that we can enter into the

roundabout whenever we need it and return back to wherever we left from

after completing the round.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

Program

Bubble Sort

Swap

Key point in the above discussion is returning to where we left from, like a

loop in a knitted thread. Diversion should be temporary and not permanent.

The code of bubble sort written at one place, multiply at another, and we

temporarily divert to that place, thus avoiding a repetition of code at a 100

places.

CALL and RET

In every processor, instructions are available to divert temporarily and to

divert permanently. The instructions for permanent diversion in 8088 are the

jump instructions, while the instruction for temporary diversion is the CALL

instruction. The word call must be familiar to the readers from subroutine

call in higher level languages. The CALL instruction allows temporary

diversion and therefore reusability of code. Now we can place the code for

bubble sort at one place and reuse it again and again. This was not possible

with permanent diversion. Actually the 8088 permanent diversion

mechanism can be tricked to achieve temporary diversion. However it is not

possible without getting into a lot of trouble. The key idea in doing it this way

is to use the jump instruction form that takes a register as argument.

Therefore this is not impossible but this is not the way it is done.

The natural way to do this is to use the CALL instruction followed by a

label, just like JMP is followed by a label. Execution will divert to the code

following the label. Till now the operation has been similar to the JMP

instruction. When the subroutine completes we need to return. The RET

instruction is used for this purpose. The word return holds in its meaning

that we are to return from where we came and need no explicit destination.

Therefore RET takes no arguments and transfers control back to the

instruction following the CALL that took us in this subroutine. The actual

technical process that informs RET where to return will be discussed later

after we have discussed the system stack.

CALL takes a label as argument and execution starts from that label, until

the RET instruction is encountered and it takes execution back to the

instruction following the CALL. Both the instructions are commonly used as

a pair, however technically they are independent in their operation. The RET

works regardless of the CALL and the CALL works regardless of the RET. If

you CALL a subroutine it will not complain if there is no RET present and

similarly if you RET without being called it won't complain. It is a logical pair

and is used as a pair in every decent code. However sometimes we play tricks

with the processor and we use CALL or RET alone. This will become clear

when we need to play such tricks in later chapters.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

Parameters

We intend to write the bubble sort code at one place and CALL it whenever

needed. An immediately visible problem is that whenever we call this

subroutine it will sort the same array in the same order. However in a real

application we will need to sort various arrays of various sizes. We might

sometimes need an ascending sort and descending at other times. Similarly

our data may be signed or unsigned. Such pieces of information that may

change from invocation to invocation and should be passed from the caller to

the subroutine are called parameters.

There must be some way of passing these parameters to the subroutine.

Revising the subroutine temporary flow breakage mechanism, the most

straightforward way is to use registers. The CALL mechanism breaks the

thread of execution and does not change registers, except IP which must

change for processor to start executing at another place, and SP whose

change will be discussed in detail later. Any of the other registers can hold

parameters for the subroutine.

5.2. OUR FIRST SUBROUTINE

Now we want to modify the bubble sort code so that it works as a

subroutine. We place a label at the start of bubble sort code, which works as

the anchor point and will be used in the CALL instruction to call the

subroutine. We also place a RET at the end of the algorithm to return from

where we called the subroutine.

Example 5.1

; bubble sort algorithm as a subroutine

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

swap:

bubblesort:

dec

; last element not compared

shl

cx, 1

; turn into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

byte [swap], 0

; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

mov

dx, [bx+si+2]

;

load second element in dx

mov

[bx+si], dx

;

store first number in second

mov

[bx+si+2], ax

;

store second number in first

mov

byte [swap], 1

;

flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

cmp

byte [swap], 1

; check if a swap has been done

mainloop

; if yes make another pass

ret

; go back to where we came from

start:

mov bx, data

; send start of array in bx

mov cx, 10

; send count of elements in cx

call bubblesort

; call our subroutine

mov

ax, 0x4c00

; terminate program

int

0x21

The routine has received the count of elements in CX. Since it makes

08-09

one less comparison than the number of elements it decrements it.

Then it multiplies it by two since this a word array and each element

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

takes two bytes. Left shifting has been used to multiply by two.

Base+index+offset addressing has been used. BX holds the start of

array, SI the offset into it and an offset of 2 when the next element is

to be read. BX can be directly changed but then a separate counter

would be needed, as SI is directly compared with CX in our case.

32-37

The code starting from the start label is our main program

analogous to the main in the C language. BX and CX hold our

parameters for the bubblesort subroutine and the CALL is made to

invoke the subroutine.

Inside the debugger we observe the same unsigned data that we are so

used to now. The number 0103 is passed via BX to the subroutine which is

the start of our data and the number 000A via CX which is the number of

elements in our data. If we step over the CALL instruction we see our data

sorted in a single step and we are at the termination instructions. The

processor has jumped to the bubblesort routine, executed it to completion,

and returned back from it but the process was hidden due to the step over

command. If however we trace into the CALL instruction, we land at the first

instruction of our routine. At the end of the routine, when the RET

instruction is executed, we immediately land back to our termination

instructions, to be precise the instruction following the CALL.

Also observe that with the CALL instruction SP is decremented by two from

FFFE to FFFC, and the stack windows shows 0150 at its top. As the RET is

executed SP is recovered and the 0150 is also removed from the stack. Match

it with the address of the instruction following the CALL which is 0150 as

well. The 0150 removed from the stack by the RET instruction has been

loaded into the IP register thereby resuming execution from address 0150.

CALL placed where to return on the stack for the RET instruction. The stack

is automatically used with the CALL and RET instructions. Stack will be

explained in detail later, however the idea is that the one who is departing

stores the address to return at a known place. This is the place using which

CALL and RET coordinate. How this placed is actually used by the CALL and

RET instructions will be described after the stack is discussed.

After emphasizing reusability so much, it is time for another example

which uses the same bubblesort routine on two different arrays of different

sizes.

Example 5.2

; bubble sort subroutine called twice

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

data2:

328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98

888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

swap:

bubblesort:

dec

; last element not compared

shl

cx, 1

; turn into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

byte [swap], 0

; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

mov

dx, [bx+si+2]

;

load second element in dx

mov

[bx+si], dx

;

store first number in second

mov

[bx+si+2], ax

;

store second number in first

mov

byte [swap], 1

;

flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

cmp

byte [swap], 1

; check if a swap has been done

mainloop

; if yes make another pass

ret

; go back to where we came from

start:

mov bx, data

; send start of array in bx

mov cx, 10

; send count of elements in cx

call bubblesort

; call our subroutine

mov bx, data2

; send start of array in bx

mov cx, 20

; send count of elements in cx

call bubblesort

; call our subroutine again

mov

ax, 0x4c00

; terminate program

int

0x21

There are two different data arrays declared. One of 10 elements and

05-07

the other of 20 elements. The second array is declared on two lines,

where the second line is continuation of the first. No additional label

is needed since they are situated consecutively in memory.

34-40

The other change is in the main where the bubblesort subroutine is

called twice, once on the first array and once on the second.

Inside the debugger observe that stepping over the first call, the first array

is sorted and stepping over the second call the second array is sorted. If

however we step in SP is decremented and the stack holds 0178 which is the

address of the instruction following the call. The RET consumes that 0178

and restores SP. The next CALL places 0181 on the stack and SP is again

decremented. The RET consumes this number and execution resumes from

the instruction at 0181. This is the coordinated function of CALL and RET

using the stack.

In both of the above examples, there is a shortcoming. The subroutine to

sort the elements is destroying the registers AX, CX, DX, and SI. That means

that the caller of this routine has to make sure that it does not hold any

important data in these registers before calling this function, because after

the call has returned the registers will be containing meaningless data for the

caller. With a program containing thousands of subroutines expecting the

caller to remember the set of modified registers for each subroutine is

unrealistic and unreasonable. Also registers are limited in number, and

restricting the caller on the use of register will make the caller's job very

tough. This shortcoming will be removed using the very important system

stack.

5.3. STACK

Stack is a data structure that behaves in a first in last out manner. It can

contain many elements and there is only one way in and out of the container.

When an element is inserted it sits on top of all other elements and when an

element is removed the one sitting at top of all others is removed first. To

visualize the structure consider a test tube and put some balls in it. The

second ball will come above the first and the third will come above the

second. When a ball is taken out only the one at the top can be removed. The

operation of placing an element on top of the stack is called pushing the

element and the operation of removing an element from the top of the stack

is called popping the element. The last thing pushed is popped out first; the

last in first out behavior.

We can peek at any ball inside the test tube but we cannot remove it

without removing every ball on top of it. Similarly we can read any element

from the stack but cannot remove it without removing everything above it.

The stack operations of pushing and popping only work at the top of the

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

stack. This top of stack is contained in the SP register. The physical address

of the stack is obtained by the SS:SP combination. The stack segment

registers tells where the stack is located and the stack pointer marks the top

of stack inside this segment.

Whenever an element is pushed on the stack SP is decremented by two as

the 8088 stack works on word sized elements. Single bytes cannot be pushed

or popped from the stack. Also it is a decrementing stack. Another possibility

is an incrementing stack. A decrementing stack moves from higher addresses

to lower addresses as elements are added in it while an incrementing stack

moves from lower addresses to higher addresses as elements are added.

There is no special reason or argument in favor of one or another, and more

or less depends on the choice of the designers. Another processor 8051 by

the same manufacturer has an incrementing stack while 8088 has a

decrementing one.

Memory is like a shelf numbered as zero at the top and the maximum at

the bottom. If a decrementing stack starts at shelf 5, the first item is placed

in shelf 5, the next item is placed in shelf 4, the next in shelf 3 and so on.

The operations of placing items on the stack and removing them from there

are called push and pop. The push operation copies its operand on the stack,

while the pop operation makes a copy from the top of the stack into its

operand. When an item is pushed on a decrementing stack, the top of the

stack is first decremented and the element is then copied into this space.

With a pop the element at the top of the stack is copied into the pop operand

and the top of stack is incremented afterwards.

The basic use of the stack is to save things and recover from there when

needed. For example we discussed the shortcoming in our last example that

it destroyed the caller's registers, and the callers are not supposed to

remember which registers are destroyed by the thousand routines they use.

Using the stack the subroutine can save the caller's value of the registers on

the stack, and recover them from there before returning. Meanwhile the

subroutine can freely use the registers. From the caller's point of view if the

registers contain the same value before and after the call, it doesn't matter if

the subroutine used them meanwhile.

Similarly during the CALL operation, the current value of the instruction

pointer is automatically saved on the stack, and the destination of CALL is

loaded in the instruction pointer. Execution therefore resumes from the

destination of CALL. When the RET instruction is executed, it recovers the

value of the instruction pointer from the stack. The next instruction executed

is therefore the one following the CALL. Observe how playing with the

instruction pointer affects the program flow.

There is a form of the RET instruction called "RET n" where n is a numeric

argument. After performing the operation of RET, it further increments the

stack pointer by this number, i.e. SP is first incremented by two and then by

n. Its function will become clear when parameter passing is discussed.

Now we describe the operation of the stack in CALL and RET with an

example. The top of stack stored in the stack pointer is initialized at 2000.

The space above SP is considered empty and free. When the stack pointer is

decremented by two, we took a word from the empty space and can use it for

our purpose. The unit of stack operations is a word. Some instructions push

multiple words; however byte pushes cannot be made. Now the value 017B is

stored in the word reserved on the stack. The RET will copy this value in the

instruction pointer and increment the stack pointer by two making it 2000

again, thereby reverting the operation of CALL.

This is how CALL and RET behave for near calls. There is also a far version

of these functions when the target routine is in another segment. This

version of CALL takes a segment offset pair just like the far jump instruction.

The CALL will push both the segment and the offset on the stack in this

case, followed by loading CS and IP with the values given in the instruction.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

The corresponding instruction RETF will pop the offset in the instruction

pointer followed by popping the segment in the code segment register.

Apart from CALL and RET, the operations that use the stack are PUSH and

POP. Two other operations that will be discussed later are INT and IRET.

Regarding the stack, the operation of PUSH is similar to CALL however with

a register other than the instruction pointer. For example "push ax" will push

the current value of the AX register on the stack. The operation of PUSH is

shown below.

SP 2

[SP]

The operation of POP is the reverse of this. A copy of the element at the top

of the stack is made in the operand, and the top of the stack is incremented

afterwards. The operation of "pop ax" is shown below.

[SP]

SP + 2

Making corresponding PUSH and POP operations is the responsibility of

the programmer. If "push ax" is followed by "pop dx" effectively copying the

value of the AX register in the DX register, the processor won't complain.

Whether this sequence is logically correct or not should be ensured by the

programmer. For example when PUSH and POP are used to save and restore

registers from the stack, order must be correct so that the saved value of AX

is reloaded in the AX register and not any other register. For this the order of

POP operations need to be the reverse of the order of PUSH operations.

Now we consider another example that is similar to the previous examples,

however the code to swap the two elements has been extracted into another

subroutine, so that the formation of stack can be observed during nested

subroutine calls.

Example 5.3

; bubble sort subroutine using swap subroutine

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

data2:

328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98

888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

swapflag:

swap:

mov ax, [bx+si]

;

load first number in ax

xchg ax, [bx+si+2]

;

exchange with second number

mov [bx+si], ax

;

store second number in first

ret

;

go back to where we came from

bubblesort:

dec

; last element not compared

shl

cx, 1

; turn into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

call swap

; swaps two elements

mov byte [swapflag], 1 ; flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

cmp

byte [swapflag], 1 ; check if a swap has been done

mainloop

; if yes make another pass

ret

; go back to where we came from

start:

mov

bx, data

; send start of array in bx

mov

cx, 10

; send count of elements in cx

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

call bubblesort

; call our subroutine

mov bx, data2

; send start of array in bx

mov cx, 20

; send count of elements in cx

call bubblesort

; call our subroutine again

mov

ax, 0x4c00

; terminate program

int

0x21

A new instruction XCHG has been introduced. The instruction

swaps its source and its destination operands however at most one

of the operands could be in memory, so the other has to be loaded in

a register. The instruction has reduced the code size by one

instruction.

The RET at the end of swap makes it a subroutine.

Inside the debugger observe the use of stack by CALL and RET

instructions, especially the nested CALL.

5.4. SAVING AND RESTORING REGISTERS

The subroutines we wrote till now have been destroying certain registers

and our calling code has been carefully written to not use those registers.

However this cannot be remembered for a good number of subroutines.

Therefore our subroutines need to implement some mechanism of retaining

the callers' value of any registers used.

The trick is to use the PUSH and POP operations and save the callers'

value on the stack and recover it from there on return. Our swap subroutine

destroyed the AX register while the bubblesort subroutine destroyed AX, CX,

and SI. BX was not modified in the subroutine. It had the same value at

entry and at exit; it was only used by the subroutine. Our next example

improves on the previous version by saving and restoring any registers that it

will modify using the PUSH and POP operations.

Example 5.4

; bubble sort and swap subroutines saving and restoring registers

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

data2:

328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98

888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

swapflag:

swap:

push ax

; save old value of ax

mov ax, [bx+si]

; load first number in ax

xchg ax, [bx+si+2]

; exchange with second number

mov [bx+si], ax

; store second number in first

pop

; restore old value of ax

ret

; go back to where we came from

bubblesort:

push ax

; save old value of ax

push cx

; save old value of cx

push si

; save old value of si

dec

; last element not compared

shl

cx, 1

; turn into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

call swap

; swaps two elements

mov byte [swapflag], 1 ; flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

cmp

byte [swapflag], 1 ; check if a swap has been done

mainloop

; if yes make another pass

pop

;

restore

old value of si

pop

;

restore

old value of cx

pop

;

restore

old value of ax

ret

;

go back

to where we came from

start:

mov bx, data

; send start of array in bx

mov cx, 10

; send count of elements in cx

call bubblesort

; call our subroutine

mov bx, data2

; send start of array in bx

mov cx, 20

; send count of elements in cx

call bubblesort

; call our subroutine again

mov

ax, 0x4c00

; terminate program

int

0x21

When multiple registers are pushed, order is very important. If AX,

19-21

CX, and SI are pushed in this order, they must be popped in the

reverse order of SI, CX, and AX. This is again because the stack

behaves in a Last In First Out manner.

Inside the debugger we can observe that the registers before and after the

CALL operation are exactly identical. Effectively the caller can assume the

registers are untouched. By tracing into the subroutines we can observe how

their value is saved on the stack by the PUSH instructions and recovered

from their before exit. Saving and restoring registers this way in subroutines

is a standard way and must be followed.

PUSH

PUSH decrements SP (the stack pointer) by two and then transfers a word

from the source operand to the top of stack now pointed to by SP. PUSH

often is used to place parameters on the stack before calling a procedure;

more generally, it is the basic means of storing temporary data on the stack.

POP

POP transfers the word at the current top of stack (pointed to by SP) to the

destination operand and then increments SP by two to point to the new top

of stack. POP can be used to move temporary variables from the stack to

registers or memory.

Observe that the operand of PUSH is called a source operand since the

data is moving to the stack from the operand, while the operand of POP is

called destination since data is moving from the stack to the operand.

CALL

CALL activates an out-of-line procedure, saving information on the stack to

permit a RET (return) instruction in the procedure to transfer control back to

the instruction following the CALL. For an intra segment direct CALL, SP is

decremented by two and IP is pushed onto the stack. The target procedure's

relative displacement from the CALL instruction is then added to the

instruction pointer. For an inter segment direct CALL, SP is decremented by

two, and CS is pushed onto the stack. CS is replaced by the segment word

contained in the instruction. SP again is decremented by two. IP is pushed

onto the stack and replaced by the offset word in the instruction.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

The out-of-line procedure is the temporary division, the concept of

roundabout that we discussed. Near calls are also called intra segment calls,

while far calls are called inter-segment calls. There are also versions that are

called indirect calls; however they will be discuss later when they are used.

RET

RET (Return) transfers control from a procedure back to the instruction

following the CALL that activated the procedure. RET pops the word at the

top of the stack (pointed to by register SP) into the instruction pointer and

increments SP by two. If RETF (inter segment RET) is used the word at the

top of the stack is popped into the IP register and SP is incremented by two.

The word at the new top of stack is popped into the CS register, and SP is

again incremented by two. If an optional pop value has been specified, RET

adds that value to SP. This feature may be used to discard parameters

pushed onto the stack before the execution of the CALL instruction.

5.5. PARAMETER PASSING THROUGH STACK

Due to the limited number of registers, parameter passing by registers is

constrained in two ways. The maximum parameters a subroutine can receive

are seven when all the general registers are used. Also, with the subroutines

are themselves limited in their use of registers, and this limited increases

when the subroutine has to make a nested call thereby using certain

registers as its parameters. Due to this, parameter passing by registers is not

expandable and generalizable. However this is the fastest mechanism

available for passing parameters and is used where speed is important.

Considering stack as an alternate, we observe that whatever data is placed

there, it stays there, and across function calls as well. For example the

bubble sort subroutine needs an array address and the count of elements. If

we place both of these on the stack, and call the subroutine afterwards, it

will stay there. The subroutine is invoked with its return address on top of

the stack and its parameters beneath it.

To access the arguments from the stack, the immediate idea that strikes is

to pop them off the stack. And this is the only possibility using the given set

of information. However the first thing popped off the stack would be the

return address and not the arguments. This is because the arguments were

first pushed on the stack and the subroutine was called afterwards. The

arguments cannot be popped without first popping the return address. If a

heaving thing falls on someone's leg, the heavy thing is removed first and the

leg is not pulled out to reduce the damage. Same is the case with our

parameters on which the return address has fallen.

To handle this using PUSH and POP, we must first pop the return address

in a register, then pop the operands, and push the return address back on

the stack so that RET will function normally. However so much effort doesn't

seem to pay back the price. Processor designers should have provided a

logical and neat way to perform this operation. They did provided a way and

infact we will do this without introducing any new instruction.

Recall that the default segment association of the BP register is the stack

segment and the reason for this association had been deferred for now. The

reason is to peek inside the stack using the BP register and read the

parameters without removing them and without touching the stack pointer.

The stack pointer could not be used for this purpose, as it cannot be used in

an effective address. It is automatically used as a pointer and cannot be

explicitly used. Also the stack pointer is a dynamic pointer and sometimes

changes without telling us in the background. It is just that whenever we

touch it, it is where we expect it to be. The base pointer is provided as a

replacement of the stack pointer so that we can peek inside the stack

without modifying the structure of the stack.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

When the bubble sort subroutine is called, the stack pointer is pointing to

the return address. Two bytes below it is the second parameter and four

bytes below is the first parameter. The stack pointer is a reference point to

these parameters. If the value of SP is captured in BP, then the return

address is located at [bp+0], the second parameter is at [bp+2], and the first

parameter is at [bp+4]. This is because SP and BP both had the same value

and they both defaulted to the same segment, the stack segment.

This copying of SP into BP is like taking a snapshot or like freezing the

stack at that moment. Even if more pushes are made on the stack

decrementing the stack pointer, our reference point will not change. The

parameters will still be accessible at the same offsets from the base pointer.

If however the stack pointer increments beyond the base pointer, the

references will become invalid. The base pointer will act as the datum point

to access our parameters. However we have destroyed the original value of

BP in the process, and this will cause problems in nested calls where both

the outer and the inner subroutines need to access their own parameters.

The outer subroutine will have its base pointer destroyed after the call and

will be unable to access its parameters.

To solve both of these problems, we reach at the standard way of accessing

parameters on the stack. The first two instructions of any subroutines

accessing its parameters from the stack are given below.

push bp

mov bp, sp

As a result our datum point has shifted by a word. Now the old value of BP

will be contained in [bp] and the return address will be at [bp+2]. The second

parameters will be [bp+4] while the first one will be at [bp+6]. We give an

example of bubble sort subroutine using this standard way of argument

passing through stack.

Example 5.5

; bubble sort subroutine taking parameters from stack

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

data2:

328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98

888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

swapflag:

bubblesort:

push

;

save

old value of bp

mov

bp, sp

;

make

bp our reference point

push

;

save

old value of ax

push

;

save

old value of bx

push

;

save

old value of cx

push

;

save

old value of si

mov

bx, [bp+6]

;

load

start of array in bx

mov

cx, [bp+4]

;

load

count of elements in cx

dec

;

last

element not compared

shl

cx, 1

;

turn

into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

byte [swapflag], 0 ; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

xchg ax, [bx+si+2]

; exchange ax with second number

mov [bx+si], ax

; store second number in first

mov byte [swapflag], 1 ; flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

cmp

byte [swapflag], 1 ; check if a swap has been done

mainloop

; if yes make another pass

pop

;

restore

old

value of si

pop

;

restore

old

value of cx

pop

;

restore

old

value of bx

pop

;

restore

old

value of ax

pop

;

restore

old

value of bp

ret

;

go back

and

remove two params

start:

mov

ax, data

push

; place start of array on stack

mov

ax, 10

push

; place element count on stack

call

bubblesort

; call our subroutine

mov

ax, data2

push

; place start of array on stack

mov

ax, 20

push

; place element count on stack

call

bubblesort

; call our subroutine again

mov

ax, 0x4c00

; terminate program

int

0x21

The value of the stack pointer is captured in the base pointer. With

further pushes SP will change but BP will not and therefore we will

read parameters from bp+4 and bp+6.

The form of RET that takes an argument is used causing four to be

added to SP after the return address has been popped in the

instruction pointer. This will effectively discard the parameters that

are still there on the stack.

47-50

We push the address of the array we want to sort followed by the

count of elements. As immediate cannot be directly pushed in the

8088 architecture, we first load it in the AX register and then push

the AX register on the stack.

Inside the debugger, concentrate on the operation of BP and the stack. The

parameters are placed on the stack by the caller, the subroutine accesses

them using the base pointer, and the special form of RET removes them

without any extra instruction. The value of stack pointer of FFF6 is turned

into FFFE by the RET instruction. This was the value in SP before any of the

parameters was pushed.

Stack Clearing by Caller or Callee

Parameters pushed for a subroutine are a waste after the subroutine has

returned. They have to be cleared from the stack. Either of the caller and the

callee can take the responsibility of clearing them from there. If the callee

has to clear the stack it cannot do this easily unless RET n exists. That is

why most general processors have this instruction. Stack clearing by the

caller needs an extra instruction on behalf of the caller after every call made

to the subroutine, unnecessarily increasing instructions in the program. If

there are thousand calls to a subroutine the code to clear the stack is

repeated a thousand times. Therefore the prevalent convention in most high

level languages is stack clearing by the callee; even though the other

convention is still used in some languages.

If RET n is not available, stack clearing by the callee is a complicated

process. It will have to save the return address in a register, then remove the

parameters, and then place back the return address so that RET will

function. When this instruction was introduced in processors, only then high

level language designers switched to stack clearing by the callee. This is also

exactly why RET n adds n to SP after performing the operation of RET. The

other way around would be totally useless for our purpose. Consider the

stack condition at the time of RET and this will become clear why this will be

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

useless. Also observe that RET n has discarded the arguments rather than

popping them as they were no longer of any use either of the caller or the

callee.

The strong argument in favour of callee cleared stacks is that the

arguments were placed on the stack for the subroutine, the caller did not

needed them for itself, so the subroutine is responsible for removing them.

Removing the arguments is important as if the stack is not cleared or is

partially cleared the stack will eventually become full, SP will reach 0, and

thereafter wraparound producing unexpected results. This is called stack

overflow. Therefore clearing anything placed on the stack is very important.

5.6. LOCAL VARIABLES

Another important role of the stack is in the creation of local variables that

are only needed while the subroutine is in execution and not afterwards.

They should not take permanent space like global variables. Local variables

should be created when the subroutine is called and discarded afterwards.

So that the spaced used by them can be reused for the local variables of

another subroutine. They only have meaning inside the subroutine and no

meaning outside it.

The most convenient place to store these variables is the stack. We need

some special manipulation of the stack for this task. We need to produce a

gap in the stack for our variables. This is explained with the help of the

swapflag in the bubble sort example.

The swapflag we have declared as a word occupying space permanently is

only needed by the bubble sort subroutine and should be a local variable.

Actually the variable was introduced with the intent of making it a local

variable at this time. The stack pointer will be decremented by an extra two

bytes thereby producing a gap in which a word can reside. This gap will be

used for our temporary, local, or automatic variable; however we name it. We

can decrement it as much as we want producing the desired space, however

the decrement must be by an even number, as the unit of stack operation is

a word. In our case we needed just one word. Also the most convenient

position for this gap is immediately after saving the value of SP in BP. So that

the same base pointer can be used to access the local variables as well; this

time using negative offsets. The standard way to start a subroutine which

needs to access parameters and has local variables is as under.

push bp

mov bp, sp

sub sp, 2

The gap could have been created with a dummy push, but the subtraction

makes it clear that the value pushed is not important and the gap will be

used for our local variable. Also gap of any size can be created in a single

instruction with subtraction. The parameters can still be accessed at bp+4

and bp+6 and the swapflag can be accessed at bp-2. The subtraction in SP

was after taking the snapshot; therefore BP is above the parameters but

below the local variables. The parameters are therefore accessed using

positive offsets from BP and the local variables are accessed using negative

offsets.

We modify the bubble sort subroutine to use a local variable to store the

swap flag. The swap flag remembered whether a swap has been done in a

particular iteration of bubble sort.

Example 5.6

; bubble sort subroutine using a local variable

[org 0x0100]

jmp start

data:

60, 55, 45, 50, 40, 35, 25, 30, 10, 0

data2:

328, 329, 898, 8923, 8293, 2345, 10, 877, 355, 98

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

888, 533, 2000, 1020, 30, 200, 761, 167, 90, 5

bubblesort:

push bp

;

save

old value of bp

mov bp, sp

;

make

bp our reference point

sub sp, 2

;

make

two byte space on stack

push ax

;

save

old value of ax

push bx

;

save

old value of bx

push cx

;

save

old value of cx

push si

;

save

old value of si

mov

bx, [bp+6]

;

load

start of array in bx

mov

cx, [bp+4]

;

load

count of elements in cx

dec

;

last

element not compared

shl

cx, 1

;

turn

into byte count

mainloop:

mov

si, 0

; initialize array index to zero

mov

word [bp-2], 0

; reset swap flag to no swaps

innerloop:

mov

ax, [bx+si]

; load number in ax

cmp

ax, [bx+si+2]

; compare with next number

jbe

noswap

; no swap if already in order

xchg ax, [bx+si+2]

; exchange ax with second number

mov [bx+si], ax

; store second number in first

mov word [bp-2], 1

; flag that a swap has been done

noswap:

add

si, 2

; advance si to next index

cmp

si, cx

; are we at last index

jne

innerloop

; if not compare next two

cmp

word [bp-2], 1

; check if a swap has been done

mainloop

; if yes make another pass

pop

;

restore old value of si

pop

;

restore old value of cx

pop

;

restore old value of bx

pop

;

restore old value of ax

mov

sp, bp

;

remove space created on stack

pop

;

restore old value of bp

ret

;

go back and remove two params

start:

mov

ax, data

push

; place start of array on stack

mov

ax, 10

push

; place element count on stack

call

bubblesort

; call our subroutine

mov

ax, data2

push

; place start of array on stack

mov

ax, 20

push

; place element count on stack

call

bubblesort

; call our subroutine again

mov

ax, 0x4c00

; terminate program

int

0x21

A word gap has been created for swap flag. This is equivalent to a

dummy push. The registers are pushed above this gap.

The swapflag is accessed with [bp-2]. The parameters are accessed

in the same manner as the last examples.

We are removing the hole that we created. The hole is removed by

restoring the value of SP that it had at the time of snapshot or at the

value it had before the local variable was created. This can be

replaced with "add sp, 2" however the one used in the code is

preferred since it does not require to remember how much space for

local variables was allocated in the start. After this operation SP

points to the old value of BP from where we can proceed as usual.

We needed memory to store the swap flag. The fact that it is in the stack

segment or the data segment doesn't bother us. This will just change the

addressing scheme.

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

EXERCISES

1. Replace the following valid instruction with a single instruction that

has the same effect. Don't consider the effect on flags.

push word L1

jmp L2

L1:

2. Replace the following invalid instructions with a single instruction

that has the same effect.

pop

mov

ip, L5

sub

sp, 2

mov

[ss:sp], ax

mov ax, [ss:sp]

add sp, 2

add

sp, 6

mov

ip, [ss:sp-6]

3. Write a recursive function to calculate the Fibonacci of a number.

The number is passed as a parameter via the stack and the

calculated Fibonacci number is returned in the AX register. A local

variable should be used to store the return value from the first

recursive call. Fibonacci function is defined as follows:

Fibonacci(0) = 0

Fibonacci(1) = 1

Fibonacci(n) = Fibonacci(n-1) + Fibonacci(n-2)

4. Write the above Fibonacci function iteratively.

HINT: Use two registers to hold the current and the previous

Fibonacci numbers in a loop.

5. Write a function switch_stack meant to change the current stack and

will be called as below. The function should destroy no registers.

push word [new_stack_segment]

push word [new_stack_offset]

call switch_stack

6. Write a function "addtoset" that takes offset of a function and

remembers this offset in an array that can hold a maximum of 8

offsets. It does nothing if there are already eight offsets in the set.

Write another function "callset" that makes a call to all functions in

the set one by one.

7. Do the above exercise such that "callset" does not use a CALL or a

JMP to invoke the functions.

HINT: Setup the stack appropriately such that the RET will execute

the first function, its RET execute the next and so on till the last RET

returns to the caller of "callset."

8. Make an array of 0x80 bytes and treat it as one of 0x400 bits. Write

a function myalloc that takes one argument, the number of bits. It

finds that many consecutive zero bits in the array, makes them one,

and returns in AX the index of the first bit. Write another function

myfree that takes two arguments, index of a bit in the array, and the

number of bits. It makes that many consecutive bits zero, whatever

their previous values are, starting from the index in the first

argument.

9. [Circular Queue] Write functions to implement circular queues.

Declare 16x32 words of data for 16 queues numbered from 0 to 15.

Each queue has a front index, a rear index and 30 locations for data

totaling to 32 words. Declare another word variable whose 16 bits

correspond to the 16 queues and a 1 bit signals that the

corresponding queue is used and a 0 bit signals that it is free. Write

a function "qcreate" that returns a queue number after finding a free

queue or -1 if it failed. Write a function "qdestroy" that marks the

queue as free. Write two other functions "qadd" and "qremove" that

Computer Architecture & Assembly Language Programming

Course Code: CS401

CS401@vu.edu.pk

can add and remove items from the circular queue. The two

functions return 0 if they failed and 1 otherwise.

10. [Linked List] Declare 1024 nodes of four bytes each. The first 2 bytes

will be used for data and the next 2 bytes for storing the offset of

another node. Also declare a word variable "firstfree" to store the

offset of the first free node. Write the following five functions:

a. "init" chains all 1024 nodes into a list with offset of first

node in firstfree, offset of the second node in the later two

bytes of the first node and so on. The later two bytes of the

last node contains zero.

b. "createlist" returns the offset of the node stored in firstfree

through AX. It sets firstfree to the offset stored in the later

two bytes of that node, and it sets the later two bytes of that

node to zero.

c. "insertafter" takes two parameters, the offset of a node and

a word data. It removes one node from freelist just like

"createlist" and inserts it after the said node and updates

the new node's data part.

d. "deleteafter" takes a node as its parameter and removes the

node immediately after it in the linked list if there is one.

e. "deletelist" takes a node as its parameters and traverses the

linked list starting at this node and removes all nodes from

it and add them back to the free list.

Table of Contents: