ZeePedia

Addressing Modes: Data Declaration, Direct, Register Indirect , Offset Addressing

2
Addressing Modes
2.1. DATA DECLARATION
The first instruction of our first assembly language program was "mov ax,
5." Here MOV was the opcode; AX was the destination operand, while 5 was
the source operand. The value of 5 in this case was stored as part of the
instruction encoding. In the machine code B80500, B8 was the opcode and 0500
was the operand stored immediately afterwards. Such an operand is called
an immediate operand. It is one of the many types of operands available.
Writing programs using just the immediate operand type is difficult. Every
reasonable program needs some data in memory apart from constants.
Constants cannot be changed, i.e. they cannot appear as the destination
operand. In fact placing them as destination is meaningless and illegal
according to assembly language syntax. Only registers or data placed in
memory can be changed. So the real data of a program is stored in memory,
with only a few constants. There must therefore be a mechanism in assembly
language to store data in memory and retrieve it.
To declare a part of our program as holding data instead of instructions we
need a couple of very basic but special assembler directives. The first
directive is "define byte" written as "db."
            db   somevalue
As a result a cell in memory will be reserved containing the desired value
in it and it can be used in a variety of ways. Now we can add variables
instead of constants. The other directive is "define word" or "dw" with the
same syntax as "db" but reserving a whole word of 16 bits instead of a byte.
There are directives to declare a double or a quad word as well but we will
restrict ourselves to byte and word declarations for now. For single byte we
use db and for two bytes we use dw.
To refer to this variable later in the program, we need the address occupied
by this variable. The assembler is there to help us. We can associate a
symbol with any address that we want to remember and use that symbol in
the rest of the code. The symbol is there for our own comprehension of code.
The assembler will calculate the address of that symbol from our origin
directive and the lengths of the instructions and data declarations in
between, and replace all references to the symbol with the corresponding
address. This is just like variables in a higher level language, where the
compiler translates them into addresses; there the process is just hidden
from the programmer one level further. Such a symbol associated with a point
in the program is called a label and is written as the label name followed by
a colon.
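As a minimal sketch of these directives, the following fragment declares one
byte and one word and gives each a label. The label names small and large are
made up for illustration, and the bracket notation used to read them is
explained in the next section. It assumes the same assembler syntax and
program layout as the other examples in this chapter.

```nasm
; a sketch: declaring a byte and a word and referring to them by label
[org 0x0100]
            mov  al, [small]        ; load the byte at label small into al
            mov  bx, [large]        ; load the word at label large into bx

            mov  ax, 0x4c00         ; terminate program
            int  0x21

small:      db   5                  ; one byte holding 5 (hypothetical name)
large:      dw   1000               ; one word (two bytes) holding 1000
```

The byte variable is paired with a byte register and the word variable with a
word register; the reason is discussed under size mismatch errors.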
2.2. DIRECT ADDRESSING
Now we will rewrite our first program such that the numbers 5, 10, and 15
are stored as memory variables instead of constants and we access them
from there.
Computer Architecture & Assembly Language Programming
Course Code: CS401
CS401@vu.edu.pk
Example 2.1
001     ; a program to add three numbers using memory variables
002     [org 0x0100]
003                 mov  ax, [num1]         ; load first number in ax
004                 mov  bx, [num2]         ; load second number in bx
005                 add  ax, bx             ; accumulate sum in ax
006                 mov  bx, [num3]         ; load third number in bx
007                 add  ax, bx             ; accumulate sum in ax
008                 mov  [num4], ax         ; store sum in num4
009
010                 mov  ax, 0x4c00         ; terminate program
011                 int  0x21
012
013     num1:       dw   5
014     num2:       dw   10
015     num3:       dw   15
016     num4:       dw   0
002     Originate our program at 0100. The first executable instruction
        should be placed at this offset.
003     The source operand is changed from constant 5 to [num1]. The
        bracket signals that the operand is placed in memory at address
        num1. The value 5 will be loaded in ax even though we did not
        specify it in our program code; rather the value will be picked from
        memory. The instruction should be read as "read the contents of
        memory location num1 in the ax register." The label num1 is a
        symbol for us but an address for the processor, while the conversion
        is done by the assembler.
013     The label num1 is defined as a word and the assembler is requested
        to place 5 in that memory location. The colon signals that num1 is a
        label and not an instruction.
Using the same process to assemble as discussed before we examine the
listing file generated as a result with comments removed.
     1
     2                           [org 0x0100]
     3 00000000 A1[1700]                     mov  ax, [num1]
     4 00000003 8B1E[1900]                   mov  bx, [num2]
     5 00000007 01D8                         add  ax, bx
     6 00000009 8B1E[1B00]                   mov  bx, [num3]
     7 0000000D 01D8                         add  ax, bx
     8 0000000F A3[1D00]                     mov  [num4], ax
     9
    10 00000012 B8004C                       mov  ax, 0x4c00
    11 00000015 CD21                         int  0x21
    12
    13 00000017 0500             num1:       dw   5
    14 00000019 0A00             num2:       dw   10
    15 0000001B 0F00             num3:       dw   15
    16 0000001D 0000             num4:       dw   0
The first instruction of our program has changed from B80500 to A11700.
The opcode B8 is used to move constants into AX, while the opcode A1 is
used when moving data into AX from memory. The operand of our new
instruction is 1700, or as a word 0017 (23 decimal), and from the bottom
of the listing file we can observe that this is the offset of num1. The
assembler has calculated the offset of num1 and used it to replace references
to num1 in the whole program. Also the value 0500 can be seen at offset
0017 in the file. We can say the contents of memory location 0017 are 0005 as
a word. Similarly num2, num3, and num4 are placed at addresses 0019, 001B,
and 001D.
When the program is loaded in the debugger, it is loaded at offset 0100,
which displaces all memory accesses in our program. The instruction
A11700 is changed to A11701 meaning that our variable is now placed at
0117 offset. The instruction is shown as mov ax, [0117]. Also the data
window can be used to verify that offset 0117 contains the number 0005.
Execute the program step by step and examine how the memory is read and
the registers are updated, how the instruction pointer moves forward, and
how the result is saved back in memory. Also observe inside the debugger
code window below the code for termination, that the debugger is
interpreting our data as code and showing it as some meaningless
instructions. This is because the debugger sees everything as code in the
code window and cannot differentiate our declared data from opcodes. It is
our responsibility that we terminate execution before our data is executed as
code.
Also observe that our naming of num1, num2, num3, and num4 is no
longer there inside the debugger. The debugger is only showing the numbers
0117, 0119, 011B, and 011D. Our numerical machine can only work with
numbers. We used symbols for our ease to label or tag certain positions in
our program. The assembler converts these symbols into the appropriate
numbers automatically. Also observe that the effect of "dw" is to place 5 in
two bytes as 0005. Had we used "db" this would have been stored as 05 in
one byte.
Given the fact that the assembler knows only numbers we can write the
same program using a single label. As we know that num2 is two ahead of
num1, we can use num1+2 instead of num2 and let the assembler calculate
the sum during assembly process.
Example 2.2
001     ; a program to add three numbers accessed using a single label
002     [org 0x0100]
003                 mov  ax, [num1]         ; load first number in ax
004                 mov  bx, [num1+2]       ; load second number in bx
005                 add  ax, bx             ; accumulate sum in ax
006                 mov  bx, [num1+4]       ; load third number in bx
007                 add  ax, bx             ; accumulate sum in ax
008                 mov  [num1+6], ax       ; store sum at num1+6
009
010                 mov  ax, 0x4c00         ; terminate program
011                 int  0x21
012
013     num1:       dw   5
014                 dw   10
015                 dw   15
016                 dw   0
004     The second number is read from num1+2. Similarly the third
        number is read from num1+4 and the result is accessed at num1+6.
013-016 The labels num2, num3, and num4 are removed and the data there
        will be accessed with reference to num1.
Every location is accessed with reference to num1 in this example. The
expression "num1+2" consists of constants only and can be evaluated at
the time of assembly. There are no variables involved in this expression. As
we open the program inside the debugger we see a verbatim copy of the
previous program. There is no difference at all since the assembler catered
for the differences during assembly. It calculated 0117+2=0119, while in the
previous program it directly knew from the value of num2 that it had to write
0119, but the end result is an exact copy of the previous execution.
Another way to declare the above data and produce exactly same results is
shown in the following example.
Example 2.3
001     ; a program to add three numbers accessed using a single label
002     [org 0x0100]
003                 mov  ax, [num1]         ; load first number in ax
004                 mov  bx, [num1+2]       ; load second number in bx
005                 add  ax, bx             ; accumulate sum in ax
006                 mov  bx, [num1+4]       ; load third number in bx
007                 add  ax, bx             ; accumulate sum in ax
008                 mov  [num1+6], ax       ; store sum at num1+6
009
010                 mov  ax, 0x4c00         ; terminate program
011                 int  0x21
012
013     num1:       dw   5, 10, 15, 0
013     As we do not need to place labels on individual variables we can save
        space and declare all data on a single line separated by commas.
        This declaration will declare four words in consecutive memory
        locations while the address of the first one is num1.
The method used to access memory in the above examples is called direct
addressing. In direct addressing the memory address is fixed and is given in
the instruction. The actual data used is placed in memory and now that data
can be used as the destination operand as well. Also the source and
destination operands must have the same size. For example a memory location
defined as a word is read into a word sized register. A last observation is
that the data 0500 in memory became 0005 when read into a register. The low
byte of a word is stored first in memory, while registers contain data in
proper order as a word.
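The byte order observed above can be checked with a small sketch like the
following. The label value and the number 0x1234 are chosen only for
illustration; the program reads the same word once as a whole and once byte
by byte.

```nasm
; a sketch: a word in memory is stored low byte first
[org 0x0100]
            mov  ax, [value]        ; ax = 0x1234, read in proper order
            mov  bl, [value]        ; bl = 0x34, the byte stored first
            mov  bh, [value+1]      ; bh = 0x12, the byte stored second
                                    ; bx now also holds 0x1234

            mov  ax, 0x4c00         ; terminate program
            int  0x21

value:      dw   0x1234             ; appears in memory as the bytes 34 12
```

In the debugger's data window the two bytes should appear as 34 12, while AX
and BX both show 1234.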
A last variation using direct addressing shows that we can directly add a
memory variable and a register instead of adding a register into another that
we were doing till now.
Example 2.4
01      ; a program to add three numbers directly in memory
02      [org 0x0100]
03                  mov  ax, [num1]         ; load first number in ax
04                  mov  [num1+6], ax       ; store first number in result
05                  mov  ax, [num1+2]       ; load second number in ax
06                  add  [num1+6], ax       ; add second number to result
07                  mov  ax, [num1+4]       ; load third number in ax
08                  add  [num1+6], ax       ; add third number to result
09
10                  mov  ax, 0x4c00         ; terminate program
11                  int  0x21
12
13      num1:       dw   5, 10, 15, 0
We generate the following listing file as a result of the assembly process
described previously. Comments are again removed.
     1
     2                           [org 0x0100]
     3 00000000 A1[1900]                     mov  ax, [num1]
     4 00000003 A3[1F00]                     mov  [num1+6], ax
     5 00000006 A1[1B00]                     mov  ax, [num1+2]
     6 00000009 0106[1F00]                   add  [num1+6], ax
     7 0000000D A1[1D00]                     mov  ax, [num1+4]
     8 00000010 0106[1F00]                   add  [num1+6], ax
     9
    10 00000014 B8004C                       mov  ax, 0x4c00
    11 00000017 CD21                         int  0x21
    12
    13 00000019 05000A000F000000 num1:       dw   5, 10, 15, 0
The opcode of add is changed because the destination is now a memory
location instead of a register. No other significant change is seen in the
listing file. Inside the debugger we observe that a few instructions are
longer now and the location num1 is now translating to 0119 instead of 0117.
This is done automatically by the assembler as a result of using labels
instead of hard coding addresses. During execution we observe that the word
data as it is read into a register is read in correct order. The significant
change in this example is that the destination of addition is memory. The
method used to access memory is direct addressing, whether it is the MOV
instruction or the ADD instruction.
The first two instructions of the last program read a number into AX and
placed it at another memory location. A quick thought reveals that the
following might be a possible single instruction to replace the pair.
            mov  [num1+6], [num1]   ; ILLEGAL
However this form is illegal and not allowed on the Intel architecture. None
of the general operations like mov, add, sub etc. allow moving data from
memory to memory. Only register to register, register to memory, memory to
register, constant to memory, and constant to register operations are
allowed. The other register to constant, memory to constant, and memory to
memory are all disallowed. Only string instructions allow moving data from
memory to memory and will be discussed in detail later. As a rule one
instruction can have at most one operand in brackets, otherwise the assembler
will give an error.
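The usual way around the memory to memory restriction is to go through a
register, as the first two instructions of Example 2.4 did. A minimal sketch,
with the hypothetical labels src and dst, is given below.

```nasm
; a sketch: copying a word from one memory location to another
[org 0x0100]
            mov  ax, [src]          ; memory to register (legal)
            mov  [dst], ax          ; register to memory (legal)
            ; mov  [dst], [src]     ; memory to memory would not assemble

            mov  ax, 0x4c00         ; terminate program
            int  0x21

src:        dw   5                  ; hypothetical source variable
dst:        dw   0                  ; holds 5 after the two moves
```

Each instruction has only one operand in brackets, so both are accepted by
the assembler.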
2.3. SIZE MISMATCH ERRORS
If we change the directive in the last example from DW to DB, the program
will still assemble and debug without errors, however the results will not be
the same as expected. When the first operand is read, 0A05 will be read in the
register, which was actually two operands placed in consecutive byte memory
locations. The second number will be read as 000F, which is the zero byte of
num4 appended to the 15 of num3. The third number will be junk depending
on the current state of the machine. According to our data declaration the
third number should be at 011B but it is accessed at 011D, calculated with
word offsets. This is a logical error of the program. To keep the declarations
and their access synchronized is the responsibility of the programmer and
not the assembler. The assembler allows the programmer to do anything that
can possibly run on the processor. The assembler only keeps us from writing
illegal instructions which the processor cannot execute. This is the
difference between a syntax error and a logic error. So the assembler and
debugger have both done what we asked them to do but the programmer asked
them to do the wrong chore.
The programmer is responsible for accessing the data as a word if it was
declared as a word and accessing it as a byte if it was declared as a byte.
The word case is shown in many of the previous examples. If however the
intent is to treat it as a byte, the following code shows the appropriate way.
Example 2.5
001     ; a program to add three numbers using byte variables
002     [org 0x0100]
003                 mov  al, [num1]         ; load first number in al
004                 mov  bl, [num1+1]       ; load second number in bl
005                 add  al, bl             ; accumulate sum in al
006                 mov  bl, [num1+2]       ; load third number in bl
007                 add  al, bl             ; accumulate sum in al
008                 mov  [num1+3], al       ; store sum at num1+3
009
010                 mov  ax, 0x4c00         ; terminate program
011                 int  0x21
012
013     num1:       db   5, 10, 15, 0
003     The number is read in the AL register, which is a byte register, since
        the memory location read is also of byte size.
004     The second number is now placed at num1+1 instead of num1+2
        because of byte offsets.
013     To declare data db is used instead of dw so that each data declared
        occupies one byte only.
Inside the debugger we observe that the AL register takes appropriate
values and the sum is calculated and stored in num1+3. This time there is
no alignment or synchronization error. The key thing to understand here is
that the processor does not match defines to accesses. It is the programmer's
responsibility. In general assembly language gives a lot of power to the
programmer but power comes with responsibility. Assembly language
programming is not a difficult task but a responsible one.
In the above examples, the processor knew the size of the data movement
operation from the size of the register involved, for example in "mov ax,
[num1]" memory can be accessed as byte or as word, it has no hard and fast
size, but the AX register tells that this operation has to be a word operation.
Similarly in "mov al, [num1]" the AL register tells that this operation has to
be a byte operation. However in "mov ax, bl" the AX register tells that the
operation has to be a word operation while BL tells that this has to be a byte
operation. The assembler will declare that this is an illegal instruction. A
5Kg bag cannot fit inside a 1Kg bag and, according to Intel, a 1Kg bag cannot
be placed in a 5Kg bag either. They must match in size. The instruction
"mov [num1], [num2]" is illegal as previously discussed, not because of data
movement size but because memory to memory moves are not allowed at all.
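When the intent really is to bring a byte value into a word register, one
possible approach is to move it into the low half and clear the high half
separately. The sketch below assumes the byte is an unsigned number; the
value 200 is arbitrary.

```nasm
; a sketch: widening an unsigned byte in bl to a word in ax
[org 0x0100]
            mov  bl, 200            ; some byte value
            ; mov  ax, bl           ; size mismatch, would not assemble
            mov  al, bl             ; byte to byte move is legal
            mov  ah, 0              ; clear upper byte, ax = 00C8 (200)

            mov  ax, 0x4c00         ; terminate program
            int  0x21
```

Both moves have operands of matching size, so the assembler accepts them.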
The instruction "mov [num1], 5" is legal in form but there is no way for the
assembler to know the data movement size in this operation. The variable
num1 can be treated as a byte or as a word and similarly 5 can be treated as
a byte or as a word. Such instructions are declared ambiguous by the
assembler. The assembler has no way to guess the intent of the programmer
as it previously did using the size of the register involved, since there is
no register involved this time. Memory is a linear array and a label is an
address in it. There is no size associated with a label. Therefore to resolve
the ambiguity we clearly tell our intent to the assembler in one of the
following ways.
            mov  byte [num1], 5
            mov  word [num1], 5
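The difference between the two forms can be seen in memory with a sketch such
as the following. Two hypothetical two-byte areas are filled with FF so that
the bytes actually written stand out in the data window.

```nasm
; a sketch: the size specifier decides how many bytes are written
[org 0x0100]
            mov  byte [area1], 5    ; writes one byte: 05
            mov  word [area2], 5    ; writes two bytes: 05 00

            mov  ax, 0x4c00         ; terminate program
            int  0x21

area1:      db   0xFF, 0xFF         ; becomes 05 FF
area2:      db   0xFF, 0xFF         ; becomes 05 00
```

The byte form leaves the second FF untouched, while the word form overwrites
it with the zero upper byte of 0005.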
2.4. REGISTER INDIRECT ADDRESSING
We have done very elementary data access till now. Assume that the
numbers we had were 100 and not just three. This way of adding them would
cost us 200 instructions. There must be some method to do a task repeatedly
on data placed in consecutive memory cells. The key to this is a register
that can hold the address of data, so that we can change the address to
access some other cell of memory using the same instruction. In direct
addressing mode the memory cell accessed was fixed inside the instruction.
There is another method in which the address can be placed in a register so
that it can be changed.
There are four registers in iAPX88 architecture that can hold the address of
data and they are BX, BP, SI, and DI. There are minute differences in their
working which will be discussed later. For the current example, we will use
the BX register and we will take just three numbers and extend the concept
to more numbers in later examples.
Example 2.6
001     ; a program to add three numbers using indirect addressing
002     [org 0x100]
003                 mov  bx, num1           ; point bx to first number
004                 mov  ax, [bx]           ; load first number in ax
005                 add  bx, 2              ; advance bx to second number
006                 add  ax, [bx]           ; add second number to ax
007                 add  bx, 2              ; advance bx to third number
008                 add  ax, [bx]           ; add third number to ax
009                 add  bx, 2              ; advance bx to result
010                 mov  [bx], ax           ; store sum at num1+6
011
012                 mov  ax, 0x4c00         ; terminate program
013                 int  0x21
014
015     num1:       dw   5, 10, 15, 0
003     Observe that no square brackets around num1 are used this time.
        The address is loaded in bx and not the contents. The value of num1
        is 0005 and its address in this program is 011C. So BX will now
        contain 011C.
004     Brackets are now used around BX. In iAPX88 architecture brackets
        can be used around BX, BP, SI, and DI only. In iAPX386 more
        registers are allowed. The instruction will be read as "move into ax
        the contents of the memory location whose address is in bx." Now
        since bx contains the address of num1 the contents of num1 are
        transferred to the ax register. Without square brackets the meaning
        of the instruction would have been totally different.
005     This instruction changes the address. Since we have words, not
        bytes, we add two to bx so that it points to the next word in memory.
        BX now contains 011E, the address of the second word in memory.
        This was the mechanism to change addresses that we needed.
Inside the debugger we observe that the first instruction is "mov bx, 011C."
A constant is moved into BX. This is because we did not use the square
brackets around "num1." The address of "num1" has moved to 011C because
the code size has changed due to changed instructions. In the second
instruction BX points to 011C and the value read in AX is 0005 which can be
verified from the data window. After the addition BX points to 011E
containing 000A, our next word, and so on. This way the BX register points
to our words one after another and we can add them using the same
instruction "add ax, [bx]" without fixing the address of our data in the
instructions. We can also subtract from BX to point to previous cells. The
address to be accessed is now under total program control.
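As a sketch of moving the address backwards, the following variant of the
same program starts at the last number and subtracts from BX to reach the
earlier ones. The data is the same as in the example above; only the
direction of traversal is an illustrative change.

```nasm
; a sketch: adding three numbers from last to first using bx
[org 0x0100]
            mov  bx, num1+4         ; point bx to third number
            mov  ax, [bx]           ; load third number, ax = 15
            sub  bx, 2              ; move bx back to second number
            add  ax, [bx]           ; add second number, ax = 25
            sub  bx, 2              ; move bx back to first number
            add  ax, [bx]           ; add first number, ax = 30
            mov  [num1+6], ax       ; store sum at num1+6

            mov  ax, 0x4c00         ; terminate program
            int  0x21

num1:       dw   5, 10, 15, 0
```

The expression num1+4 is evaluated by the assembler, so BX is again loaded
with a constant address.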
One thing that we needed in our problem to add a hundred numbers was the
capability to change the address. The second thing we need is a way to repeat
the same instruction and a way to know that the repetition has been done 100
times, a terminal condition for the repetition. For the task we are
introducing two new instructions that you should read and understand as
simple English language concepts. For simplicity only 10 numbers are added in
this example. The algorithm is extensible to any size.
Example 2.7
001     ; a program to add ten numbers
002     [org 0x0100]
003                 mov  bx, num1           ; point bx to first number
004                 mov  cx, 10             ; load count of numbers in cx
005                 mov  ax, 0              ; initialize sum to zero
006
007     l1:         add  ax, [bx]           ; add number to ax
008                 add  bx, 2              ; advance bx to next number
009                 sub  cx, 1              ; numbers to be added reduced
010                 jnz  l1                 ; if numbers remain add next
011
012                 mov  [total], ax        ; write back sum in memory
013
014                 mov  ax, 0x4c00         ; terminate program
015                 int  0x21
016
017     num1:       dw   10, 20, 30, 40, 50, 10, 20, 30, 40, 50
018     total:      dw   0
007     Labels can be used on code as well. Just like data labels they
        remember the address at which they are used. The assembler does
        not differentiate between code labels and data labels. The
        programmer is responsible for using a data label as data and a code
        label as code. The label l1 in this case is the address of the
        instruction that follows it.
009     SUB is the counterpart to ADD with the same rules as that of the
        ADD instruction.
010     JNZ stands for "jump if not zero." NZ is the condition in this
        instruction. So the instruction is read as "jump to the location l1 if
        the zero flag is not set." Recall the zero flag definition: "the
        zero flag is set if the last mathematical or logical operation has
        produced a zero in its destination." For example "mov ax, 0" will not
        set the zero flag as it is not a mathematical or logical instruction.
        However subtraction and addition will set it when their result is
        zero. Also it is set even when the destination is not a register. Now
        consider the subtraction immediately preceding the jump. As long as
        CX is not zero after this subtraction, the zero flag is clear and the
        jump is taken. When CX becomes zero the zero flag is set, the jump is
        not taken, and execution continues with the next instruction.
        The processor needs to be told each and everything, and the
        destination is an important part of every jump. Just like when we ask
        someone to go, we mention go to this market or that house. The
        processor is much more logical than us and needs the destination in
        every instruction that asks it to go somewhere. The processor will
        load l1 in the IP register and resume execution from there. The
        processor will blindly go to the label we mention even if it contains
        data and not code.
The CX register is used as a counter in this example, BX contains the
changing address, while AX accumulates the result. We have formed a loop
in assembly language that executes as long as its condition remains true.
Inside the debugger we can observe that the subtract instruction clears the
zero flag the first nine times and sets it on the tenth time, while the jump
instruction moves execution to address l1 the first nine times and to the
following line the tenth time. The jump instruction breaks the sequential
program flow.
The JNZ instruction is from the program control group and is a conditional
jump, meaning that if the condition NZ is true (ZF=0) it will jump to the
address mentioned and otherwise it will progress to the next instruction. It is
a selection between two paths. If the condition is true go right and otherwise
go left. Or we can say if the weather is hot, go this way, and if it is cold, go
that way. Conditional jump is the most important instruction, as it gives the
processor decision making capability, so it must be given careful thought.
Some processors call it branch, probably a more logical name for it, however
the functionality is the same. Intel chose to name it "jump."
An important thing in the above example is that a register is used to
reference memory so this form of access is called register indirect memory
access. We used the BX register for it and the B in BX and BP stands for
base therefore we call register indirect memory access using BX or BP,
"based addressing." Similarly when SI or DI is used we name the method
"indexed addressing." They have the same functionality, with minor
differences because of which the two are called base and index. The
differences will be explained later, however for the above example SI or DI
could be used as well, but we would name it indexed addressing instead of
based addressing.
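As a sketch of the indexed form mentioned above, the ten number program can
be written with SI in place of BX. Nothing else changes; the labels and data
are those of the previous example.

```nasm
; a sketch: adding ten numbers using si (indexed addressing)
[org 0x0100]
            mov  si, num1           ; point si to first number
            mov  cx, 10             ; load count of numbers in cx
            mov  ax, 0              ; initialize sum to zero

l1:         add  ax, [si]           ; add number to ax
            add  si, 2              ; advance si to next number
            sub  cx, 1              ; numbers to be added reduced
            jnz  l1                 ; if numbers remain add next

            mov  [total], ax        ; write back sum (300) in memory

            mov  ax, 0x4c00         ; terminate program
            int  0x21

num1:       dw   10, 20, 30, 40, 50, 10, 20, 30, 40, 50
total:      dw   0
```

DI could be substituted in the same way, since both default to the same
segment as BX.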
2.5. REGISTER + OFFSET ADDRESSING
Direct addressing and indirect addressing using a single register are two
basic forms of memory access. Another possibility is to use different
combinations of direct and indirect references. In the above example we used
BX to access different elements of an array placed consecutively in memory.
We can also place in BX only the array index and not the exact address and
form the exact address when we are going to access the actual memory. This
way the same register can be used for accessing different arrays and also the
register can be used for index comparison. The following example uses this
form of access.
Example 2.8
001     ; a program to add ten numbers using register + offset addressing
002     [org 0x0100]
003                 mov  bx, 0              ; initialize array index to zero
004                 mov  cx, 10             ; load count of numbers in cx
005                 mov  ax, 0              ; initialize sum to zero
006
007     l1:         add  ax, [num1+bx]      ; add number to ax
008                 add  bx, 2              ; advance bx to next index
009                 sub  cx, 1              ; numbers to be added reduced
010                 jnz  l1                 ; if numbers remain add next
011
012                 mov  [total], ax        ; write back sum in memory
013
014                 mov  ax, 0x4c00         ; terminate program
015                 int  0x21
016
017     num1:       dw   10, 20, 30, 40, 50, 10, 20, 30, 40, 50
018     total:      dw   0
003     This time BX is initialized to zero instead of the array base.
007     The format of memory access has changed. The array base is added
        to BX containing the array index at the time of memory access.
008     As the array is of words, BX jumps in steps of two, i.e. 0, 2, 4.
        Higher level languages do appropriate incrementing themselves and
        we always use sequential array indexes. However in assembly
        language we always calculate in bytes and therefore we need to take
        care of the size of one array element which in this case is two.
Inside the debugger we observe that the memory access instruction is
shown as "add ax, [011F+bx]" and the actual memory accessed is the one
whose address is the sum of 011F and the value contained in the BX
register. This form of access is of the register indirect family and is called
base + offset or index + offset depending on whether BX or BP is used or SI
or DI is used.
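The remark that one index register can serve several arrays, and can itself
be compared to end the loop, might be sketched as follows. The array names
first, second, and sum are hypothetical, and the sketch uses CMP, an
instruction not yet introduced in this chapter, which sets the flags as a
subtraction would without changing its operands.

```nasm
; a sketch: one index register used with three arrays
[org 0x0100]
            mov  bx, 0              ; initialize array index to zero

l1:         mov  ax, [first+bx]     ; load element of first array
            add  ax, [second+bx]    ; add matching element of second array
            mov  [sum+bx], ax       ; store in matching element of sum
            add  bx, 2              ; advance bx to next index
            cmp  bx, 10             ; five words of two bytes each done?
            jnz  l1                 ; if not, process next element

            mov  ax, 0x4c00         ; terminate program
            int  0x21

first:      dw   1, 2, 3, 4, 5
second:     dw   10, 20, 30, 40, 50
sum:        dw   0, 0, 0, 0, 0      ; becomes 11, 22, 33, 44, 55
```

No separate counter register is needed here because the index itself tells
when the end of the arrays has been reached.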
2.6. SEGMENT ASSOCIATION
All the addressing mechanisms in iAPX88 return a number called effective
address. For example in base + offset addressing, neither the base nor the
offset alone tells the desired cell in memory to be accessed. It is only after the
addition is done that the processor knows which cell is to be accessed. This
number which came as the result of addition is called the effective address.
But the effective address is just an offset and is meaningless without a
segment. Only after the segment is known, we can form the physical address
that is needed to access a memory cell.
We discussed the segmented memory model of iAPX88 in reasonable detail
at the end of previous chapter. However during the discussion of addressing
modes we have not seen the effect of segments. Segmentation is there and
it's all happening relative to a segment base. We saw DS, CS, SS, and ES
inside the debugger. Everything is relative to its segment base, even though
we have not explicitly explained its functionality. An offset alone is not
complete without a segment. As previously discussed there is a default
segment associated to every register which accesses memory. For example
CS is associated to IP by default; rather it is tied with it. It cannot access
memory in any other segment.
In case of data, there is a bit of relaxation and nothing is tied. Rather
there is a default association which can be overridden. In the case of
register indirect memory access, if the register used is one of SI, DI, or BX
the default segment is DS. If however the register used is BP the default
segment is SS. The stack segment has a very critical and fine use and there
is a reason why BP is attached to SS by default. However these will be
discussed in detail in the chapter on stack. IP is tied to CS while SP is
tied to SS. The association of these registers cannot be changed; they are
locked with no option. Others are not locked and can be changed.
To override the association for one instruction of one of the registers BX,
BP, SI or DI, we use the segment override prefix. For example "mov ax,
[cs:bx]" associates BX with CS for this one instruction. For the next
instruction the default association will come back to act. The assembler
places a special byte, called a prefix, before the instruction, just like
prefixes and suffixes in English language. No prefix is needed or placed for
the default association. For example for CS the byte 2E is placed and for ES
the byte 26 is placed. The opcode has not changed, but the prefix byte has
modified the default association to association with the desired segment
register for this one instruction.
In all our examples, we never declared a segment or used it explicitly, but
everything seemed to work fine. The important thing to note is that CS, DS,
SS, and ES all had the same value. The value itself is not important but the
fact that all had the same value is important. All four segment windows
exactly overlap. Whatever segment register we use, the same physical memory
will be accessed. That is why everything was working without the mention of
a single segment register. This is the format of COM files on the IBM PC. A
single segment contains code, data, and the stack. This format is operating
system dependent, in our case defined by DOS. And our operating system
defines the format of COM files such that all segments have the same value.
Thus the only meaningful thing that remains is the offset.
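The segment override prefix and the overlap of segments in a COM file can be
observed together with a sketch like the following. The same word is read
three times, once with the default association and twice with an override;
the encodings in the comments are what the listing file is expected to show
for these instructions.

```nasm
; a sketch: reading the same word through DS, CS, and ES
[org 0x0100]
            mov  bx, num1           ; point bx to the variable
            mov  ax, [bx]           ; default DS, encoded as 8B07
            mov  ax, [cs:bx]        ; CS override, encoded as 2E8B07
            mov  ax, [es:bx]        ; ES override, encoded as 268B07

            mov  ax, 0x4c00         ; terminate program
            int  0x21

num1:       dw   5
```

Since all segment registers hold the same value in a COM program, AX should
contain 0005 after each of the three reads.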
For example if BX=0100, SI=0200, and CS=1000 and the memory access
under consideration is [cs:bx+si+0x0700], the effective address formed is
bx+si+0700 = 0100 + 0200 + 0700 = 0A00. Now multiplying the segment
value by 16 makes it 10000 and adding the effective address 00A00 forms
the physical address 10A00.
2.7. ADDRESS WRAPAROUND
There are two types of wraparound. One occurs within a single segment and
the other across the whole physical memory. Segment wraparound occurs
when a carry is generated during the effective address calculation. This carry
is dropped, so that when we try to access beyond the segment
limit, we are wrapped around to the first cell in the segment. For
example, if BX=9100, DS=1500, and the access is [bx+0x7000], we form the
effective address 9100 + 7000 = 10100. The carry generated is dropped,
forming the actual effective address of 0100. Just like on a circle, when we
reach the end we start again from the beginning. An arc of 370 degrees
is the same as an arc of 10 degrees. We tried to cross the segment boundary
and were pushed back to the start. This is called segment wraparound. The
physical address in the above example will be 15100.
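The dropped carry can be made visible with a short Python sketch. It assumes
only what the example states (a 16-bit effective address); the helper name
effective_address_with_carry is hypothetical.

```python
def effective_address_with_carry(*parts):
    """Return (16-bit effective address, carry that the processor drops)."""
    total = sum(parts)
    return total & 0xFFFF, total >> 16


# Values from the example: BX=9100, DS=1500, access [bx+0x7000]
ea, carry = effective_address_with_carry(0x9100, 0x7000)
physical = 0x1500 * 16 + ea

print(f"carry = {carry}")                    # 1
print(f"effective address = {ea:04X}")       # 0100
print(f"physical address  = {physical:05X}") # 15100
```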
The same can also happen at the time of physical address calculation. For
example BX=0100, DS=FFF0 and the access under consideration is
[bx+0x0100]. The effective address will be 0200 and the physical address will
be 100100. This is a 21-bit answer and cannot be sent on the address bus,
which is 20 bits wide. The carry is dropped, leaving 00100, and just like the
segment wraparound our physical memory has wrapped around at its very top.
When we try to access beyond the limit, the actual access is made at the very
start. This second wraparound is a bit different in newer processors with more
address lines, but that will be explained in later chapters.
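The same check can be made for the wraparound at the top of physical memory.
This Python sketch assumes the 20-bit address bus described above; the names
ADDRESS_MASK and physical_address_with_carry are illustrative.

```python
ADDRESS_MASK = 0xFFFFF  # 20 address lines


def physical_address_with_carry(segment, offset):
    """Return (20-bit physical address, carry that does not fit on the bus)."""
    total = segment * 16 + offset
    return total & ADDRESS_MASK, total >> 20


# Values from the example: BX=0100, DS=FFF0, access [bx+0x0100]
ea = (0x0100 + 0x0100) & 0xFFFF
pa, carry = physical_address_with_carry(0xFFF0, ea)

print(f"effective address = {ea:04X}")  # 0200
print(f"carry = {carry}")               # 1
print(f"physical address  = {pa:05X}")  # 00100
```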
2.8. ADDRESSING MODES SUMMARY
The iAPX88 processor supports seven modes of memory access. Remember
that immediate is not an addressing mode but an operand type. Operands
can be immediate, register, or memory. If the operand is in memory, one of the
seven addressing modes will be used to access it. The memory access
mechanisms can also be written in the general form "base + index + offset",
and we can define the possible addressing modes by saying that any one or
two of the three parts, or none of them, can be omitted from the general form
to form a legal memory access.
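That the general form yields exactly seven modes can be confirmed by listing
every non-empty selection of its three parts. The Python sketch below only
counts the combinations; the strings "base", "index", and "offset" are labels
taken from the general form, and "offset" on its own corresponds to direct
addressing.

```python
from itertools import combinations

PARTS = ("base", "index", "offset")

# Every non-empty selection of the three parts is one addressing mode.
modes = [" + ".join(chosen)
         for count in (1, 2, 3)
         for chosen in combinations(PARTS, count)]

for mode in modes:
    print(mode)
print(len(modes), "modes")
```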
There are a few common mistakes made in forming a valid memory access.
Part of a register cannot be used to access memory. For example, BX is allowed
to hold an address but BL and BH are not. An address is 16 bits wide and must
be contained in a 16-bit register. BX-SI is not possible. The only thing that
we can do is add a base register to an index register. Any other
operation is disallowed. BX+BP and SI+DI are both disallowed as we cannot
have two base or two index registers in one memory access. One has to be a
base register and the other has to be an index register, and that is the reason
for naming them differently.
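These rules can be written down as a small checker. The following Python
sketch is an assumption-laden model, not an assembler: it only looks at which
registers are added together inside the brackets, treats BX and BP as base
registers and SI and DI as index registers, and does not model subtraction
such as BX-SI, which is never allowed. The function name is hypothetical.

```python
BASE_REGISTERS = {"bx", "bp"}
INDEX_REGISTERS = {"si", "di"}
ADDRESS_REGISTERS = BASE_REGISTERS | INDEX_REGISTERS


def is_valid_register_combination(registers):
    """Check the registers added together inside the brackets.

    `registers` is a list of lowercase register names. A constant offset may
    always be added and is therefore not part of the check.
    """
    bases = [r for r in registers if r in BASE_REGISTERS]
    indexes = [r for r in registers if r in INDEX_REGISTERS]
    others = [r for r in registers if r not in ADDRESS_REGISTERS]
    # No partial or other registers, at most one base, at most one index.
    return not others and len(bases) <= 1 and len(indexes) <= 1


for combo in (["bx", "si"], ["bp", "di"], ["bx", "bp"], ["si", "di"], ["bl"]):
    verdict = "allowed" if is_valid_register_combination(combo) else "disallowed"
    print("+".join(combo), "is", verdict)
```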
Direct
A fixed offset is given in brackets and the memory at that offset is
accessed. For example "mov [1234], ax" stores the contents of the AX
register in the two bytes starting at offset 1234 in the current data segment.
The instruction "mov [1234], al" stores the contents of the AL register in the
byte at offset 1234.
Based Register Indirect
A base register is used in brackets and the actual address accessed
depends on the value contained in that register. For example "mov [bx], ax"
moves the two byte contents of the AX register to the address contained in
the BX register in the current data segment. The instruction "mov [bp], al"
moves the one byte content of the AL register to the address contained in the
BP register in the current stack segment.
Indexed Register Indirect
An index register is used in brackets and the actual address accessed
depends on the value contained in that register. For example "mov [si], ax"
moves the contents of the AX register to the word starting at address
contained in SI in the current data segment. The instruction "mov [di], ax"
moves the word contained in AX to the offset stored in DI in the current data
segment.
Based Register Indirect + Offset
A base register is used with a constant offset in this addressing mode. The
value contained in the base register is added to the constant offset to get
the effective address. For example "mov [bx+300], ax" stores the word
contained in AX at the offset obtained by adding 300 to BX in the current
data segment. The instruction "mov [bp+300], ax" stores the word in AX at
the offset obtained by adding 300 to BP in the current stack segment.
Indexed Register Indirect + Offset
An index register is used with a constant offset in this addressing mode.
The value contained in the index register is added to the constant offset to
get the effective address. For example "mov [si+300], ax" moves the word
contained in AX to the offset obtained by adding 300 to SI in the current data
segment. The instruction "mov [di+300], al" moves the byte contained in
AL to the offset obtained by adding 300 to DI in the current data segment.
Base + Index
One base register and one index register are used in this addressing mode. The
values of the base register and the index register are added together to get
the effective address. For example "mov [bx+si], ax" moves the word contained
in the AX register to the offset obtained by adding BX and SI in the current
data segment. The instruction "mov [bp+di], al" moves the byte contained in AL
to the offset obtained by adding BP and DI in the current stack segment.
Observe that the default segment is based on the base register and not on
the index register. This is why base registers and index registers are named
separately. Other examples are "mov [bx+di], ax" and "mov [bp+si], ax." This
method can be used to access a two dimensional array such that one
dimension is in a base register and the other is in an index register.
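The default segment rule described here can be summarized in a few lines of
Python. This is a sketch of the rule as stated in the text (BP selects the
stack segment, everything else the data segment) and ignores segment override
prefixes; the function name default_segment is an illustrative choice.

```python
def default_segment(registers):
    """Return the segment register used when no override prefix is given."""
    # Only the base register BP switches the default to the stack segment;
    # the index register (SI or DI) has no influence on the choice.
    return "ss" if "bp" in registers else "ds"


for operand in (["bx", "si"], ["bx", "di"], ["bp", "si"], ["bp", "di"]):
    print("[" + "+".join(operand) + "]", "->", default_segment(operand))
```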
Base + Index + Offset
This is the most complex addressing method and is relatively infrequently
used. A base register, an index register, and a constant offset are all used in
this addressing mode. The values of the base register, the index register, and
the constant offset are all added together to get the effective address. For
example "mov [bx+si+300], ax" moves the word contents of the AX register to
the word in memory starting at offset attained by adding BX, SI, and 300 in
the current data segment. Default segment association is again based on the
base register. It might be used with the array base of a two dimensional array
as the constant offset, one dimension in the base register and the other in
the index register. This way all calculation of location of the desired element
has been delegated to the processor.
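The two dimensional array use can be illustrated with a short Python sketch.
All values here are assumptions made for the illustration: an array of words
starting at offset 0300 with four columns per row. Note that the program still
has to place the scaled row offset in the base register and the scaled column
offset in the index register; the processor performs only the final addition.

```python
ARRAY_BASE = 0x0300   # hypothetical offset of the array, used as the constant
COLUMNS = 4           # hypothetical number of elements per row
ELEMENT_SIZE = 2      # each element is a word (two bytes)


def element_offset(row, col):
    """Effective address of array[row][col] formed as base + index + offset."""
    bx = row * COLUMNS * ELEMENT_SIZE   # row offset held in the base register
    si = col * ELEMENT_SIZE             # column offset held in the index register
    return (bx + si + ARRAY_BASE) & 0xFFFF


print(f"{element_offset(0, 0):04X}")  # 0300
print(f"{element_offset(2, 3):04X}")  # 0316
```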
EXERCISES
1. What is a label and how does the assembler differentiate between
   code labels and data labels?
2. List the seven addressing modes available in the 8088 architecture.
3. Differentiate between effective address and physical address.
4. What is the effective address generated by the following
   instructions? Every instruction is independent of the others. Initially
   BX=0x0100, num1=0x1001, [num1]=0x0000, and SI=0x0100.
   a. mov ax, [bx+12]
   b. mov ax, [bx+num1]
   c. mov ax, [num1+bx]
   d. mov ax, [bx+si]
5. What is the effective address generated by the following
   combinations, if they are valid? If not, give the reason. Initially
   BX=0x0100, SI=0x0010, DI=0x0001, BP=0x0200, and SP=0xFFFF.
   a. bx-si
   b. bx-bp
   c. bx+10
   d. bx-10
   e. bx+sp
   f. bx+di
6. Identify the problems in the following instructions and correct them
   by replacing them with one or two instructions having the same
   effect.
   a. mov [02], [22]
   b. mov [wordvar], 20
   c. mov bx, al
   d. mov ax, [si+di+100]
7. What is the function of the segment override prefix and what
   changes does it bring to the opcode?
8. What are the two types of address wraparound? What
   physical address is accessed with [BX+SI] if FFFF is loaded in
   BX, SI, and DS?
9. Write instructions to do the following.
   a. Copy the contents of the memory location with offset 0025 in the
      current data segment into AX.
   b. Copy AX into the memory location with offset 0FFF in the
      current data segment.
   c. Move the contents of the memory location with offset 0010 to
      the memory location with offset 002F in the current data
      segment.
10. Write a program to calculate the square of 20 by using a loop
    that adds 20 to the accumulator 20 times.