Chapter 4
Requirements for Coding
in Assembly Language
This chapter explains the basic requirements for developing an assembly program:
1. Use of comments
2. General coding format
3. The directives for printing a program listing
4. Directives for defining segments and procedures
Assemblers and Compilers
High level versus low level
Advantages of assembly language over a high-level language:
1. Provides more control over handling particular hardware requirements
2. Generates smaller, more compact executable modules
3. Is more likely to result in faster execution
A common practice is to combine the benefits of both programming levels:
Code the bulk of a project in a high-level language, and code critical modules in assembly language.
Assembly Language Comments
A comment begins with a semicolon (;). The assembler treats everything from the semicolon to the end of the line as a comment.
In this book, assembly instructions are in uppercase letters and comments are in lowercase, only as a convention to make the programs more readable. Technically, you can freely use uppercase or lowercase characters for instructions and comments.
Reserved Words
Certain words in assembly language are reserved for its own purposes.
Examples include MOV, ADD, END, SEGMENT, FAR, SIZE, @Data, and @Model.
Using a reserved word for the wrong purpose causes the assembler to generate an error message. Appendix D lists the reserved words.
Identifiers
An identifier is a name that you apply to an item in your program. The two types of identifier are the name, which refers to the address of a data item, and the label, which refers to the address of an instruction. An identifier can use the following characters:
Alphabetic letters: A-Z and a-z
Digits: 0-9 (may not be the first character)
Special characters: question mark (?), underline (_), dollar ($), at (@), and period (.) (the period may not be the first character)
The first character of an identifier must be an alphabetic letter or a special character, except for the period. Since the assembler uses some special words that begin with the @ symbol, you should avoid using it for your own definitions. The assembler is not case sensitive. The maximum length of an identifier is 31 characters (247 since MASM 6.0).
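The rules above can be illustrated with a few definitions. This is a minimal sketch assuming a MASM-style assembler; the segment name DATASG and all item names are made up for illustration. The invalid cases are left as comments so that the fragment still assembles.

```asm
DATASG  SEGMENT PARA 'Data'
COUNT   DB      0               ;letters only
PAGE25  DB      0               ;digits allowed after the first character
$P50    DB      0               ;dollar sign may begin an identifier
_TOTAL  DW      0               ;underline may begin an identifier
count2  DB      0               ;lowercase is accepted as well
; 25PAGE DB     0               ;invalid: begins with a digit
; MOV    DB     0               ;invalid: MOV is a reserved word
DATASG  ENDS
        END
```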
Statements
An assembly language program consists of a set of statements. The two types of statements are:
1. instructions such as MOV and ADD, which the assembler translates to object code; and
2. directives, which tell the assembler to perform a specific action, such as define a data item.
General format for a statement:
| [identifier] | operation | [operand(s)] | [;comment] |
An identifier (if any), operation, and operand (if any) are separated by at least one blank or tab character. There is a maximum of 132 characters on a line (512 since MASM 6.0).
Example:
Directive: COUNT DB 1 ;name, operation, operand
Instruction: MOV AX,0 ;operation, two operands
Identifier
The term name applies to the name of a defined item or directive, and the term label applies to the name of an instruction.
Operation
The operation, which must be coded, is most commonly used for defining data areas and coding instructions. For a data item, an operation such as DB or DW defines a field, work area, or constant. For an instruction, an operation such as MOV or ADD indicates an action to perform.
Operand
The operand (if any) provides information for the operation to act on.
For a data item, the operand defines its initial value.
Example:
COUNTER DB 0
For an instruction, an operand indicates where to perform the action. An instruction may have zero, one, two, or three operands.
Examples:
No operands: RET
One operand: MUL BX
Two operands: MOV CX,10
Three operands: SHRD ECX,EBX,CL
Directives
Assembly language supports a number of statements that enable you to control the way in which a program assembles and lists. These statements act during the assembling of the program and generate no executable code.
Chapter 25 covers all of the directives in detail.
Listing Directives: PAGE and TITLE
PAGE The PAGE directive at the start of a program designates the maximum number of lines to list on a page and the maximum number of characters on a line. Syntax:
PAGE [length][,width] ;default is PAGE 50,80
TITLE The TITLE directive causes a title for a program to print on line 2 of each page of the program listing. You may code it once at the start of the program. Syntax:
TITLE text
Segment Directive
An assembly program in .EXE format consists of one or more segments. A stack segment defines stack storage, a data segment defines data items, and a code segment provides for executable code. The directives for defining a segment, SEGMENT and ENDS, have the following format:
name SEGMENT [options] ;begin segment
name ENDS ;end segment
The SEGMENT statement defines the start of a segment. The segment name must be present, must be unique, and must follow the naming conventions of the language. The ENDS statement indicates the end of the segment and contains the same name as the SEGMENT statement. In real mode, the maximum size of a segment is 64K.
The operand of a SEGMENT statement may contain three types of options:
alignment, combine, and class
Format:
| segment-name | SEGMENT | [align] | [combine] | ['class'] |
... ;segment code goes here
| segment-name | ENDS |
Alignment type. The align entry indicates the boundary on which the segment is to begin. PARA (a paragraph boundary, divisible by 16) is typically used and is the default.
Others are BYTE, WORD, DWORD, and PAGE (divisible by 256).
Combine type. The combine entry indicates whether to combine the segment with other segments when they are linked after assembly. Combine types are STACK, COMMON, PUBLIC, and AT expression. (Explained later in chapter 5)
Example
name SEGMENT PARA STACK
You may use PUBLIC and COMMON where you intend to combine separately assembled programs when linking. When a program is not to be linked with others, you may omit this option or code NONE.
Class type. The class entry, which is enclosed in apostrophes, is used to group related segments when linking. This book uses 'code' for the code segment, 'data' for the data segment, and 'stack' for the stack segment.
Example
name SEGMENT PARA STACK 'Stack'
PROC Directive
The code segment contains the executable code for a program. It also contains one or more procedures, defined with the PROC directive.
Example
segname SEGMENT PARA
procname PROC FAR
procname ENDP
segname ENDS
The procedure name must be present, must be unique, and must follow the naming conventions of the language. The FAR operand indicates the entry point for program execution.
The ENDP directive indicates the end of a procedure and contains the same name as the PROC statement, which enables the assembler to relate the two.
The code segment may contain any number of procedures used as subroutines, each with its own set of PROC and ENDP statements. Each of these is usually coded with the NEAR operand (this is the default).
ASSUME Directive
This statement associates the name of a segment with a segment register.
Syntax:
ASSUME SS:stackname,DS:datasegname,CS:codesegname...
The statement may also contain a reference to the ES register. If the program does not use ES, you can code ES:NOTHING or simply leave it out.
Like other directives, ASSUME is just a message to help the assembler convert symbolic code to machine code.
Note that you may still have to code instructions that load addresses into the segment registers during execution.
END Directive
ENDS ends a segment and ENDP ends a procedure. An END directive ends an entire program.
Syntax:
END [procname]
The operand may be blank if the program is not to execute.
In most programs, the operand contains the name of the first or only PROC designated as FAR, where program execution is to begin.
Ending Program Execution
INT 21H is a common DOS interrupt operation that uses a function code in the AH register to specify an action to be performed. Function code 4CH is a request to end program execution.
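The directives covered so far (PAGE, TITLE, SEGMENT/ENDS, PROC/ENDP, ASSUME, and END) and the INT 21H request can be pulled together in one skeleton. This is a sketch only, assuming a MASM-style assembler and a 16-bit DOS .EXE program; the names A04SKEL, STACKSG, DATASG, CODESG, MAIN, SETCNT, and COUNT are illustrative and not prescribed by the language.

```asm
        PAGE    60,132
        TITLE   A04SKEL (EXE)  Skeleton with conventional segments
;---------------------------------------------------------
STACKSG SEGMENT PARA STACK 'Stack'
        DW      32 DUP(0)       ;32 words of stack storage
STACKSG ENDS
;---------------------------------------------------------
DATASG  SEGMENT PARA 'Data'
COUNT   DB      0               ;sample data item
DATASG  ENDS
;---------------------------------------------------------
CODESG  SEGMENT PARA 'Code'
MAIN    PROC    FAR             ;entry point for execution
        ASSUME  SS:STACKSG,DS:DATASG,CS:CODESG
        MOV     AX,DATASG       ;get address of data segment
        MOV     DS,AX           ;  and load it into DS
        CALL    SETCNT          ;call a near subroutine
        MOV     AX,4C00H        ;AH = 4CH (end), AL = 00 (return code)
        INT     21H
MAIN    ENDP
SETCNT  PROC    NEAR            ;NEAR is the default
        MOV     COUNT,10        ;store 10 in the data item
        RET                     ;return to the caller
SETCNT  ENDP
CODESG  ENDS
        END     MAIN            ;execution begins at MAIN
```

ASSUME only tells the assembler which segment each register is expected to address. The two MOV instructions at the start of MAIN are what actually load DS at execution time.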
Initializing For Protected Mode
In protected mode under the 80386 and later processors, a program may address up to 4 gigabytes of memory. A skeleton model of a program is:
.386 ;processor directive first (.486 may be coded instead)
.MODEL FLAT, STDCALL
.STACK
.DATA
.CODE
END
Coding the processor directive before the .MODEL statement causes the assembler to assume 32-bit addressing. The entry STDCALL tells the assembler to use standard conventions for names and procedure calls. The processor operates more efficiently because it does not have to convert segment:offset addresses to actual addresses.
The use of DWORD to align segments on a doubleword address speeds up accessing memory for 32-bit data buses.
The .386 directive tells the assembler to accept instructions that are unique to these processors; the USE32 use type tells the assembler to generate code appropriate to 32-bit protected mode:
.386
segname SEGMENT DWORD USE32
Note: On these processors the DS register is still 16 bits in size, so only the low-order 16 bits are loaded into it.
MOV EAX,DATASEG ;get address of data segment
MOV DS,AX ;load 16-bit portion
Simplified Segment Directives
Both Microsoft and Borland assemblers provide some shortcuts in defining segments. To use them, initialize the memory model before defining any segment.
The general format is
.MODEL memory-model
The memory model may be TINY, SMALL, MEDIUM, COMPACT, or LARGE.
| MODEL | NUMBER OF CODE SEGMENTS | NUMBER OF DATA SEGMENTS |
| TINY | * | * |
| SMALL | 1 | 1 |
| MEDIUM | MORE THAN 1 | 1 |
| COMPACT | 1 | MORE THAN 1 |
| LARGE | MORE THAN 1 | MORE THAN 1 |
You may use any of these models for a stand-alone program (one not linked to another program). The TINY model is intended for the exclusive use of .COM programs. The SMALL model requires that code fit within one 64K segment and data fit within another 64K segment.
The general formats for the directives that define the stack, data, and code segments are:
.STACK [size]
.DATA
.CODE [name]
Each of these directives causes the assembler to generate the required SEGMENT statement and its matching ENDS. The default segment names are STACK, _DATA, and _TEXT. The default stack size is 1024 bytes, which you may override.
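A small program coded with the simplified segment directives might look like the following. It is a sketch assuming a MASM-style assembler and a 16-bit DOS .EXE program; the names MAIN and COUNT and the stack size of 64 bytes are illustrative choices.

```asm
        .MODEL  SMALL
        .STACK  64              ;override the default stack size
        .DATA
COUNT   DB      0               ;sample data item
        .CODE
MAIN    PROC    FAR
        MOV     AX,@data        ;get address of data segment
        MOV     DS,AX           ;  and load it into DS
        MOV     COUNT,10        ;store 10 in the data item
        MOV     AX,4C00H        ;request end of execution
        INT     21H
MAIN    ENDP
        END     MAIN
```

No SEGMENT, ENDS, or ASSUME statements are coded here; the assembler generates them from .STACK, .DATA, and .CODE. The program still has to load DS itself, using the predefined name @data.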
Data Definition
A data item may contain an undefined value, or a constant, or a character string, or a numeric value.
Syntax:
[name] Dn expression
Name. A program that references a data item does so by means of a name.
Directive. The directives that define data items are DB, DW, DD, DF, DQ, and DT.
Expression. The expression in an operand may contain a question mark to indicate an uninitialized item. It may contain a constant. It may contain multiple constant values separated by commas. It also allows for repeated duplication of the same value:
Syntax: [name] Dn repeat-count DUP(expression)...
Example: DW 10 DUP(?) ;ten uninitialized words
DB 3 DUP(4 DUP(8)) ;twelve bytes, each containing 8
It may be a character string.
Example: DB 'Character string'
Note that either single or double quotes are allowed.
Numeric constants can be in decimal, hexadecimal, binary, or real format.
DB defines a byte.
DW defines a word, 2 bytes (do not use it to define character strings, because the assembler stores the bytes of a word in reverse sequence).
DD defines a doubleword, 4 bytes.
DF defines a farword, 6 bytes.
DQ defines a quadword, 8 bytes.
DT defines 10 bytes (used for packed BCD, which stores a number as decimal digits instead of as a hexadecimal value).
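The following data segment shows several of these definitions side by side. It is a sketch assuming a MASM-style assembler; the segment name DATASG and the item names (FLDB1 and so on) are illustrative.

```asm
DATASG  SEGMENT PARA 'Data'
FLDB1   DB      ?               ;uninitialized byte
FLDB2   DB      25              ;decimal constant
FLDB3   DB      19H             ;hexadecimal constant
FLDB4   DB      01011001B       ;binary constant
FLDB5   DB      10 DUP(0)       ;ten bytes, each containing zero
FLDB6   DB      'Personal Computer' ;character string
FLDW1   DW      0FFF0H          ;hex constant beginning with a letter needs a leading 0
FLDW2   DW      3,4,5           ;three consecutive words
FLDD1   DD      ?               ;uninitialized doubleword
FLDQ1   DQ      0               ;quadword
FLDT1   DT      1234567         ;ten bytes, stored as packed BCD
DATASG  ENDS
        END
```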
Equate Directives
Equal-Sign Directive
The equal-sign directive enables you to assign the value of an expression to a name, and you may do so any number of times in a program. Example:
VALUE_OF_PI = 3.1416
RIGHT_COL = 79
SCREEN_POSITION = 80*25
The EQU Directive
The EQU directive defines a value that the assembler can substitute in other instructions. It may assign a value to an item only once in a program.
Example: TIMES EQU 10
FIELDA DB TIMES DUP(?)
COUNTR EQU 05
MOV CX,COUNTR
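The difference between the two directives shows up when a name is assigned more than once. The sketch below assumes a MASM-style assembler; all names (ROWS, COLS, SCREEN, LEVEL, FIRST, SECOND, DATASG) are illustrative.

```asm
DATASG  SEGMENT PARA 'Data'
ROWS    EQU     25              ;EQU: value is fixed for the whole program
COLS    EQU     80
SCREEN  DB      ROWS*COLS DUP(' ') ;2000 bytes of blanks
LEVEL   =       1               ;equal sign: value may be reassigned
FIRST   DB      LEVEL           ;assembles as DB 1
LEVEL   =       LEVEL+1         ;reassign the same name
SECOND  DB      LEVEL           ;assembles as DB 2
DATASG  ENDS
        END
```

Neither ROWS, COLS, nor LEVEL occupies storage; the assembler simply substitutes the current value wherever the name appears.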
The TEXTEQU Directive (MASM 6.0)
The TEXTEQU directive defines a name for text data. It has the format:
name TEXTEQU <text>
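As a brief illustration, a text equate can stand in for a quoted string in a data definition. This sketch assumes MASM 6.0 or later; the names PROMPT, MESSAGE, and DATASG are illustrative.

```asm
PROMPT  TEXTEQU <'Add or delete?'>  ;text to substitute for PROMPT
DATASG  SEGMENT PARA 'Data'
MESSAGE DB      PROMPT              ;assembles as DB 'Add or delete?'
DATASG  ENDS
        END
```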