Chapter 4

Requirements for Coding in Assembly Language


This chapter explains the basic requirements for developing an assembly program:

1. Use of comments

2. General coding format

3. The directives for printing a program listing

4. The directives for defining segments and procedures


Assemblers and Compilers


High level versus low level


Advantages of coding in assembly language:

1. Provides more control over handling particular hardware requirements

2. Generates smaller, more compact executable modules

3. Is more likely to result in faster execution


A common practice is to combine the benefits of both programming levels:

Code the bulk of a project in a high-level language, and code critical modules in assembly language.


Assembly Language Comments


A comment begins with a semicolon (;) and continues to the end of the line.


In this book, assembly instructions are in uppercase letters and comments are in lowercase, only as a convention and to make the programs more readable. Technically, you can freely use uppercase or lowercase characters for instructions and comments.
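
As a brief illustration, a comment may occupy a whole line or follow a statement on the same line. The instructions below are arbitrary and serve only to carry the comments:

```asm
; Add the tax to the price (a comment that occupies a full line)
        ADD     AX,BX           ;a comment on the same line as an instruction
        MOV     CX,10           ;the assembler ignores everything after the semicolon
```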


Reserved Words


Certain words in assembly language are reserved for its own purposes.

Examples include MOV, ADD, END, SEGMENT, FAR, SIZE, @Data, and @Model.


Using a reserved word for a wrong purpose causes the assembler to generate an error message. Appendix D lists the reserved words.


Identifiers


An identifier is a name that you apply to items in your program. The two types of identifier are name, which refers to the address of a data item, and label, which refers to the address of an instruction. An identifier can use the following characters:

Alphabetic letters: A-Z and a-z

Digits: 0-9 (may not be the first character)

Special characters: question mark (?)

                    underline (_)

                    dollar ($)

                    at (@)

                    period (.) (may not be the first character)


The first character of an identifier must be an alphabetic letter or a special character, except for the period. Since the assembler uses some special words that begin with the @ symbol, you should avoid using it for your own definitions. The assembler is not case sensitive. The maximum length of an identifier is 31 characters (247 since MASM 6.0).
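
The naming rules above can be sketched with a few data definitions. All of the names are hypothetical, and the fragment is meant to sit inside a data segment:

```asm
; Valid identifiers
TOTAL           DB      0       ;alphabetic letters only
PAGE_COUNT2     DB      0       ;underline, and a digit after the first character
?FLAG           DB      0       ;question mark as the first character
$AMOUNT         DB      0       ;dollar sign as the first character

; Invalid identifiers (kept as comments so that the fragment still assembles)
;   2TOTAL      begins with a digit
;   PAGE COUNT  contains a blank
```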

Statements


An assembly language program consists of a set of statements. The two types of statements are:

 

1. instructions such as MOV and ADD, which the assembler translates to object code; and

2. directives, which tell the assembler to perform a specific action, such as defining a data item.

 


General format for a statement:

 

[identifier] operation [operand(s)] [;comment]


An identifier (if any), operation, and operand (if any) are separated by at least one blank or tab character. There is a maximum of 132 characters on a line (512 since MASM 6.0).


Example:


Directive:      COUNT   DB      1       ;name, operation, operand


Instruction:            MOV     AX,0    ;operation, two operands


Identifiers


The term name applies to the name of a defined item or directive, and the term label applies to the name of an instruction.


Operation


The operation, which must be coded, is most commonly used for defining data areas and coding instructions. For a data item, an operation such as DB or DW defines a field, work area, or constant. For an instruction, an operation such as MOV or ADD indicates an action to perform.


Operand


The operand (if any) provides information for the operation to act on.

For a data item, the operand defines its initial value.

Example:

            COUNTER     DB      0


For an instruction, an operand indicates where to perform the action. An instruction may have no operands, or one, two, or three operands.

Examples:

No operands:	RET
One operand:	MUL BX
Two operands:	MOV CX,10
Three operands:	SHRD ECX,EBX,CL

 


Directives


Assembly language supports a number of statements that enable you to control the way in which a program assembles and lists. These statements act during the assembling of the program and generate no executable code.

Chapter 25 covers all of the directives in detail.


Listing Directives: PAGE and TITLE


PAGE. The PAGE directive at the start of a program designates the maximum number of lines to list on a page and the maximum number of characters on a line. Syntax:

PAGE [length] [,width]          (the default is PAGE 50,80)


TITLE. The TITLE directive causes a title for a program to print on line 2 of each page of the program listing. You may code it once at the start of the program. Syntax:

TITLE text
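
A minimal sketch of the two listing directives at the top of a source file. The program name SKELETON and the title text are placeholders chosen for illustration:

```asm
        PAGE    60,132                  ;60 lines per page, 132 characters per line
        TITLE   SKELETON Skeleton of an .EXE program
```

Everything that follows TITLE on the line becomes part of the title, so the TITLE line carries no trailing comment here.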


Segment Directive


An assembly program in .EXE format consists of one or more segments. A stack segment defines stack storage, a data segment defines data items, and a code segment provides for executable code. The directives for defining a segment, SEGMENT and ENDS, have the following format:


name               SEGMENT     [options]                     ;begin segment



name               ENDS                                                 ;end segment


The SEGMENT statement defines the start of a segment. The segment name must be present, must be unique, and must follow the naming conventions of the language. The ENDS statement indicates the end of the segment and contains the same name as the SEGMENT statement. In real mode, the maximum size of a segment is 64K.


The operand of a SEGMENT statement may contain three types of options:

alignment, combine, and class

Format:

 segment-name SEGMENT [align] [combine] ['class']

 

                        ... ;segment code goes here

 

segment-name ENDS

 


Alignment type. The align entry indicates the boundary on which the segment is to begin. PARA is typically used and is the default.

Others are BYTE, WORD, DWORD, and PAGE (divisible by 256).


Combine type. The combine entry indicates whether to combine the segment with other segments when they are linked after assembly. Combine types are STACK, COMMON, PUBLIC, and AT expression. (Chapter 5 explains these in more detail.)

Example

                        name   SEGMENT     PARA STACK


You may use PUBLIC and COMMON where you intend to combine separately assembled programs when linking. When a program is not to be linked with others, you may omit this option or code NONE.


Class type. The class entry, enclosed in apostrophes, is used to group related segments when linking. This book uses 'code' for the code segment, 'data' for the data segment, and 'stack' for the stack segment.

Example

                        name   SEGMENT     PARA STACK 'Stack'
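
Putting the three options together, a stack segment and a data segment might be defined as follows. The segment names STACKSG and DATASG, the item COUNT, and the stack size of 32 words are assumptions made for this sketch:

```asm
STACKSG SEGMENT PARA STACK 'Stack'      ;paragraph aligned, combine type STACK
        DW      32 DUP(0)               ;reserve 32 words for the stack
STACKSG ENDS

DATASG  SEGMENT PARA 'Data'             ;no combine type coded, class 'Data'
COUNT   DB      1                       ;one data item
DATASG  ENDS
```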

 

 

PROC Directive


The code segment contains the executable code for a program. It also contains one or more procedures, defined with the PROC directive.

Example

                        segname          SEGMENT     PARA

                        procname        PROC             FAR




                        procname        ENDP

                        segname          ENDS


The procedure name must be present, must be unique, and must follow the naming conventions of the language. The FAR operand indicates the entry point for program execution.


The ENDP statement indicates the end of a procedure and contains the same name as the PROC statement, which enables the assembler to relate the two.


The code segment may contain any number of procedures used as subroutines, each with its own set of PROC and ENDP statements. Each of these is usually coded with the NEAR operand (this is the default).
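
The following sketch shows a code segment with a FAR main procedure and one NEAR subroutine. The names CODESG, MAIN, and CLEAR_AX are hypothetical. The ASSUME and END directives and the INT 21H request that ends execution are covered in the sections that follow:

```asm
CODESG  SEGMENT PARA 'Code'
MAIN    PROC    FAR                     ;entry point for program execution
        ASSUME  CS:CODESG
        CALL    CLEAR_AX                ;near call to a subroutine in this segment
        MOV     AX,4C00H                ;request to end execution
        INT     21H
MAIN    ENDP

CLEAR_AX PROC   NEAR                    ;NEAR is the default and could be omitted
        XOR     AX,AX                   ;set AX to zero
        RET                             ;return to the caller
CLEAR_AX ENDP
CODESG  ENDS
        END     MAIN
```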


ASSUME Directive

This statement associates the name of a segment with a segment register.

Syntax:

            ASSUME       SS:stackname,DS:datasegname,CS:codesegname...


Note that ASSUME may also contain a reference for ES. If the program does not use ES, you can code ES:NOTHING or simply leave it out.

Like other directives, ASSUME is just a message to help the assembler convert symbolic code to machine code.

Note that you may still have to code instructions that load addresses into the segment registers during execution.
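
A typical case is the DS register. The fragment below is a sketch that assumes segments named STACKSG, DATASG, and CODESG have been defined elsewhere in the program:

```asm
        ASSUME  SS:STACKSG,DS:DATASG,CS:CODESG
        MOV     AX,DATASG               ;get the address of the data segment
        MOV     DS,AX                   ;a segment register cannot be loaded
                                        ;  directly with an immediate value
```

The two MOV instructions are needed because ASSUME generates no executable code of its own.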


END Directive

ENDS ends a segment, ENDP ends a procedure. An END directive ends an entire program.

Syntax:

                        END   [procname]


The operand may be blank if the program is not to execute.

In most programs, the operand contains the name of the first or only PROC designated as FAR, where program execution is to begin.



Ending Program Execution


INT 21H is a common DOS interrupt operation that uses a function code in the AH register to specify an action to be performed. Function code 4CH is a request to end program execution.
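
The directives described so far can be pulled together into a skeleton .EXE program. This is a sketch only: the segment names, the item COUNT, and the stack size are illustrative choices. Loading AX with 4C00H places function code 4CH in AH and a return code of 00 in AL:

```asm
STACKSG SEGMENT PARA STACK 'Stack'
        DW      32 DUP(0)               ;stack storage
STACKSG ENDS

DATASG  SEGMENT PARA 'Data'
COUNT   DB      0                       ;one data item
DATASG  ENDS

CODESG  SEGMENT PARA 'Code'
MAIN    PROC    FAR
        ASSUME  SS:STACKSG,DS:DATASG,CS:CODESG
        MOV     AX,DATASG               ;load the address of the data
        MOV     DS,AX                   ;  segment into DS
        MOV     COUNT,5                 ;store a value in the data item
        MOV     AX,4C00H                ;AH = 4CH (end execution), AL = return code
        INT     21H
MAIN    ENDP
CODESG  ENDS
        END     MAIN                    ;execution begins at MAIN
```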


Initializing For Protected Mode


In protected mode under the 80386 and later processors, a program may address up to 4 gigabytes of memory. A skeleton model of a program is:

 

.386 or .486 ;processor directive first

.MODEL FLAT, STDCALL

.STACK

.DATA

.CODE

END

 

 

Coding the processor directive before the .MODEL statement causes the assembler to assume 32-bit addressing. The entry STDCALL tells the assembler to use standard conventions for names and procedure calls. The processor operates more efficiently because it does not have to convert segment:offset addresses to actual addresses.

 

The use of DWORD to align segments on a doubleword address speeds up accessing memory for 32-bit data buses.

The .386 directive tells the assembler to accept instructions that are unique to these processors; the USE32 use type tells the assembler to generate code appropriate to 32-bit protected mode:

            .386

            segname          SEGMENT DWORD USE32


Note: On these processors the DS register is still 16 bits in size.

            MOV  EAX,DATASEG        ;get address of data segment

            MOV  DS,AX              ;load 16-bit portion



Simplified Segment Directives


Both Microsoft and Borland assemblers provide some shortcuts in defining segments. You can initialize the memory model before defining any segment.

The general format is

                        .MODEL        memory-model


The memory model may be TINY, SMALL, MEDIUM, COMPACT, or LARGE.


MODEL       NUMBER OF CODE SEGMENTS     NUMBER OF DATA SEGMENTS

TINY        *                           *

SMALL       1                           1

MEDIUM      MORE THAN 1                 1

COMPACT     1                           MORE THAN 1

LARGE       MORE THAN 1                 MORE THAN 1



You may use any of these models for a stand-alone program (one not linked to another program). The TINY model is intended for the exclusive use of .COM programs. The SMALL model requires that code fit within one 64K segment and data fit within another 64K segment.



The general formats for the directives that define the stack, data, and code segments are:

                        .STACK         [size]

                        .DATA

                        .CODE            [name]


Each of these directives causes the assembler to generate the required SEGMENT statement and its matching ENDS. The default segment names are STACK, _DATA, and _TEXT. The default stack size is 1024 bytes, which you may override.
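
A sketch of a small program written with the simplified directives. The item COUNT and the stack size of 64 bytes are arbitrary choices for illustration. The reserved word @data supplies the address of the data segment:

```asm
        .MODEL  SMALL
        .STACK  64                      ;request a 64-byte stack
        .DATA
COUNT   DB      0                       ;one data item
        .CODE
MAIN    PROC    FAR
        MOV     AX,@data                ;address of the data segment
        MOV     DS,AX
        MOV     COUNT,5                 ;store a value in the data item
        MOV     AX,4C00H                ;request to end execution
        INT     21H
MAIN    ENDP
        END     MAIN
```

No SEGMENT, ENDS, or ASSUME statements are coded, since the assembler generates them from the simplified directives.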


Data Definition


A data item may contain an undefined value, or a constant, or a character string, or a numeric value.


Syntax:

                                    [name]            Dn       expression

Name. A program that references a data item does so by means of a name.

Directive. The directives that define data items are DB, DW, DD, DF, DQ, and DT.

Expression. The expression in an operand may contain a question mark to indicate an uninitialized item. It may contain a constant. It may contain multiple constant values separated by commas. It also allows for repeated duplication of the same value:

Syntax:            [name]            Dn       repeat-count    DUP(expression)...

Example:                                DW      10 DUP(?)

                                        DB      3 DUP(4 DUP(8))


It may be a character string.

Example:                                DB      'Character string'

Note that either single or double quotes are allowed.


Numeric constants can be decimal, hexadecimal, binary, or real.

DB defines a byte.

DW defines a word, 2 bytes (do not use it to define character strings, because the assembler stores the bytes of a word in reverse order).

DD defines a doubleword, 4 bytes.

DF defines a farword, 6 bytes.

DQ defines a quadword, 8 bytes.

DT defines a tenbyte, 10 bytes (used for packed BCD data, which stores numbers as decimal digits instead of hexadecimal values).
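
The following definitions sketch the directives and expression forms described above. All field names and values are hypothetical, and the fragment is meant to sit inside a data segment:

```asm
FLD1    DB      ?                       ;uninitialized byte
FLD2    DB      48                      ;decimal constant
FLD3    DB      30H                     ;hexadecimal constant
FLD4    DB      01111010B               ;binary constant
FLD5    DB      01,02,03                ;three constants, one byte each
FLD6    DB      'Character string'      ;character string
FLD7    DB      5 DUP(0)                ;five bytes of zeros
FLD8    DW      0FFF0H                  ;a hex constant that begins with a
                                        ;  letter needs a leading 0
FLD9    DD      14327                   ;doubleword
FLD10   DQ      0                       ;quadword
FLD11   DT      123456                  ;tenbyte, stored as packed BCD
```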


EQUATE DIRECTIVES

 

Equal-Sign Directive

 

The equal-sign directive enables you to assign the value of an expression to a name, and may do so any number of times in a program. Example:

 

VALUE_OF_PI = 3.1416

RIGHT_COL = 79

SCREEN_POSITION = 80*25
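
Because a name defined with the equal sign may be assigned a new value, the same name can stand for different values at different points in a program. A short sketch with hypothetical names, intended for a data segment:

```asm
ROW_COUNT = 10
TABLE_A DB      ROW_COUNT DUP(0)        ;reserves 10 bytes
ROW_COUNT = ROW_COUNT * 2               ;the name is assigned a new value
TABLE_B DB      ROW_COUNT DUP(0)        ;reserves 20 bytes
```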

 

 

 

The EQU Directive


The EQU directive defines a value that the assembler can use to substitute in other instructions. It may assign a value to an item only once in a program.

Example:                    TIMES           EQU     10

                            FIELDA          DB      TIMES DUP(?)

                            COUNTR          EQU     05


                                            MOV     CX,COUNTR


 

 

The TEXTEQU Directive (MASM 6.0)

 

The TEXTEQU directive defines text data, with the format:

name TEXTEQU <text>
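
A brief sketch of how a text equate might be used. The names WPTR, PROMPT, and MESSAGE are hypothetical. The assembler substitutes the text between the angle brackets wherever the name appears:

```asm
WPTR    TEXTEQU <WORD PTR>              ;text for an operator
PROMPT  TEXTEQU <'Enter a name: '>      ;text for a string constant

MESSAGE DB      PROMPT                  ;expands to  DB 'Enter a name: '
        MOV     WPTR [BX],0             ;expands to  MOV WORD PTR [BX],0
```

The DB line belongs in a data segment and the MOV line in a code segment. They are shown together here only to keep the sketch short.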