celarier@reed.bitnet (Stuart Celarier,<None>,2369490,2369386) (10/13/90)
I am looking for a reference manual for the 386 assembly language as recognized by as(1) on AT&T's System V, Release 4. AT&T does not use the same notation as ASM386, which is presented in the Intel literature, but I cannot find a document that specifies the AT&T notation.

The man page for as(1) documents how to use that command, not the language it processes. That man page references asm386.sed(1) as a method for translating ASM386 code to AT&T code, but that is not good enough for the task at hand. In addition to the syntax of the instruction set, there are (should be?) various assembler directives: where to place things, whether a symbol is public, etc., etc., etc. I expected to find this information in the programmer's guide, right alongside the C reference manual (silly me), since the volume I have is explicitly for the 386 processor (a Prentice-Hall book published for AT&T).

Thanks in advance for any help.

Stuart Celarier
celarier@reed.edu
Voice: 503/236-9490
Fax: 503/236-9491
marc@dumbcat.sf.ca.us (Marco S Hyman) (10/15/90)
In article <15563@reed.UUCP> celarier@reed.bitnet () writes:
> I am looking for a reference manual for the 386 assembly language as recognized by as(1) on AT&T's System V, release 4. AT&T does not use the same notation as ASM386, presented in the Intel literature, but I cannot find a document which specifies the AT&T notation.

Assuming that SysV Rel 4 uses the same syntax as SysV release 3.x, here it is. I get enough requests for this thing that I think the post is worthwhile. (But I'll listen to flames that disagree -- to the mailbox please, no need to clutter the rest of the net.) BTW: This doc is very similar to the doc in the SunOS doc box that comes with a 386i.

// marc

dumbcat% zcat ~src/as386.doc.Z
8<------------------------------ Cut Here ------------------------------>8

Preliminary 386 Assembler Definition
Prepared by INTERACTIVE Systems Corp. - 2/5/86

1. Purpose of this Document

This document provides the third draft of the assembler language definition for the 5.3/386 CCS. The goal of this effort is to take the current 286 assembler and upgrade it to a 386 assembler in the minimum possible time. This document describes the resulting product.

1.1 INVOKING THE ASSEMBLER

The assembler is invoked by the command:

  as [-o outfile] [-n] [-R] [-V] [-u] [-x] infile

The flags have the following meaning:

  -o outfile   Use outfile as the output file. If -o is not given, the output file name is generated by the algorithm at the end of this section.
  -n           No address optimization.
  -R           Remove (unlink) the input file after assembly is completed.
  -V           Write the version number of the assembler on the standard error output. Assembly still proceeds normally with this option.
  -u           Remove unreferenced debugging symbols from the symbol table.
  -x           Extended addressing (48-bit pointers) will be used.

The input assembly language program is read from infile and the output object module is written to outfile. The assembler only accepts one infile on a command line.
If outfile is not specified, the name is created from infile by the following algorithm:

  + If the name infile ends with the two characters .s, the name outfile is created by replacing these last two characters with .o.

  + If the name infile does not end with the two characters .s and is no more than 12 characters long, the name outfile is created by appending .o to the name infile.

  + If the name infile does not end with the two characters .s and is greater than 12 characters long, the name outfile is created by appending .o to the first 12 characters of infile. This satisfies the UNIX system requirement that a file name be no more than 14 characters long.

1.2 INPUT FORMAT

The input to the assembler is a text file. This file must consist of a sequence of lines ending with a newline character (ASCII LF). Each line can contain one or more statements. If several statements appear on a line, they must be separated by semicolons (;). Each statement must be one of the following:

  + An empty statement is one that contains nothing other than spaces, tabs, and form-feed characters. Empty statements have no meaning to the assembler. They can be inserted freely to improve the appearance of a listing.

  + An assignment statement is one that gives a value to a symbol. It consists of a symbol, followed by an equal sign (=), followed by an expression. The expression is evaluated and the result is assigned to the symbol. Assignment statements do not generate any code. They are used only to assign assembly time values to symbols.

  + A pseudo operation statement is a directive to the assembler that does not necessarily generate any code. It consists of a pseudo operation code, followed by zero or more operands. Every pseudo operation code begins with a period (.).

  + A machine operation statement is a mnemonic representation of an executable machine instruction that is translated by the assembler. It consists of an operation code, followed by zero or more operands.
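The statement rules above can be illustrated with a small C sketch that splits a source line at semicolons and sorts each statement into one of the four kinds. It is a simplification, not the assembler's own parser: the names stmt_kind, classify, and split_line are hypothetical, labels of the form sym: (described next) are stripped before classifying, and comments and quoted strings are ignored, so a semicolon inside a .string operand would be mis-split.

```c
#include <ctype.h>

/* The four statement kinds listed in section 1.2. */
enum stmt_kind { STMT_EMPTY, STMT_ASSIGN, STMT_PSEUDO, STMT_MACHINE };

/* Advance past spaces, tabs and form feeds, never beyond end. */
static const char *skip_blanks(const char *s, const char *end) {
    while (s < end && (*s == ' ' || *s == '\t' || *s == '\f'))
        s++;
    return s;
}

/* Advance past a run of symbol characters (letters, digits, '_', '.'). */
static const char *skip_symbol(const char *s, const char *end) {
    while (s < end && (isalnum((unsigned char)*s) || *s == '_' || *s == '.'))
        s++;
    return s;
}

/* Classify the statement text in [s, end), after stripping any labels. */
static enum stmt_kind classify(const char *s, const char *end) {
    for (;;) {
        s = skip_blanks(s, end);
        const char *p = skip_symbol(s, end);
        if (p > s && p < end && *p == ':') {  /* "sym:" is a label */
            s = p + 1;
            continue;
        }
        if (s == end)
            return STMT_EMPTY;                /* only blanks remain */
        const char *q = skip_blanks(p, end);
        if (p > s && q < end && *q == '=')
            return STMT_ASSIGN;               /* symbol = expr */
        return *s == '.' ? STMT_PSEUDO : STMT_MACHINE;
    }
}

/* Split one source line at semicolons and classify each statement.
   Returns the number of statements stored in kinds; statements beyond
   max are skipped. */
static int split_line(const char *line, enum stmt_kind *kinds, int max) {
    int n = 0;
    const char *start = line;
    for (const char *p = line;; p++) {
        if (*p == ';' || *p == '\0') {
            if (n < max)
                kinds[n++] = classify(start, p);
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
    return n;
}
```

For example, split_line("start: movl %eax, %ebx; x = 4; .text", kinds, 8) reports a machine operation, an assignment, and a pseudo operation. Checking for the equal sign before the leading period lets ". = . + 4" count as an assignment to dot rather than a pseudo op.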
In addition, each statement can be modified by one or more of the following:

  + A label can be placed at the beginning of any statement. This consists of a symbol followed by a colon (:). When a label is encountered by the assembler, the value of the location counter is assigned to the label.

  + A comment can be inserted at the end of any statement by preceding the comment with a slash (/). The slash causes the assembler to ignore any characters in the line after the slash. This facility is provided to allow insertion of internal program documentation into the source file for a program.

1.3 OUTPUT FORMAT

The output of the assembler is an object file. The object file produced by the assembler contains at least the following three sections:

  .text   This is an initialized section; normally it is read-only and contains the code from a program. It may also contain read-only tables.
  .data   This is an initialized section; normally it is readable and writable. It contains initialized data. These can be scalars or tables.
  .bss    This is an uninitialized section. Space is not allocated for this segment in the COFF file.

An optional section, .comment, may also be produced (see the section "Pseudo Operations"). Every statement in the input assembly language program that generates code or data generates it into one of these three sections. The section into which the generated bytes are to be written starts out as .text, and can be switched using section control pseudo operations.

The assembler can produce object modules with any one of four (4) different magic numbers. Each magic number indicates that a different (incompatible) function linkage has been used. The -x option must be specified to get an object file with 48-bit pointers. The default object file type (with no -x option) is a 32-bit pointer object file. The -x option does not change the output code; the handling of 48-bit addresses must be done in the assembly code, by the programmer.
The -x option tells the assembler what type of magic number to put into the COFF file.

2. SYMBOLS AND EXPRESSIONS

2.1 Values

Values are represented in the assembler by 32-bit two's complement values. All arithmetic is performed using 32 bits of precision. Note that the values used in a 386 instruction may use 8, 16, or 32 bits.

2.1.1 Types

Every value is an instance of one of the following types:

  Undefined   An undefined symbol is one whose value has not yet been defined. Examples of undefined symbols are forward references and externals.
  Absolute    An absolute type is one whose value does not change with relocation. Examples of absolute symbols are numeric constants and expressions whose operands are only numeric constants.
  Text        A text type symbol is one whose value is relative to the text segment.
  Data        A data type symbol is one whose value is relative to the data segment.
  Bss         A bss type symbol is one whose value is relative to the bss segment.

Any of the above symbol types can be given the attribute EXTERNAL.

2.2 Symbols

A symbol has a value and a type, each of which is specified either explicitly by an assignment statement or implicitly by its context. Refer to section 2.3 (Expressions) for the regular expression definition of a symbol.

2.2.1 Reserved Symbols

The following symbols are reserved by the assembler.

  .       Commonly referred to as dot. This is the location counter while assembling a program. It takes on the current location in the text, data, or bss section.
  .text   This symbol is of type text. It is used to label the beginning of a text section in the program being assembled.
  .data   This symbol is of type data. It is used to label the beginning of a data section in the program being assembled.
  .bss    This symbol is of type bss. It is used to label the beginning of a bss section in the program being assembled.
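Section 2.1's rule that all arithmetic is carried out in 32-bit two's complement, while a single 386 instruction may encode only 8 or 16 bits of a value, can be made concrete with the C sketch below. The function names add32 and fits_signed are hypothetical, and treating "fits" as "survives a sign-extension round trip" is an assumption; the document does not say how the assembler itself performs that check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two assembler values with 32-bit two's complement wrap-around.
   The sum is formed in unsigned arithmetic, where overflow is well defined;
   converting back to int32_t assumes a two's complement target. */
static int32_t add32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* True when v survives being stored in a bits-wide signed field
   (bits = 8, 16 or 32) and sign-extended back to 32 bits. */
static bool fits_signed(int32_t v, int bits) {
    if (bits >= 32)
        return true;
    int32_t lo = -(INT32_C(1) << (bits - 1));     /* e.g. -128 for 8 bits */
    int32_t hi = (INT32_C(1) << (bits - 1)) - 1;  /* e.g.  127 for 8 bits */
    return v >= lo && v <= hi;
}
```

For instance, add32(INT32_MAX, 1) wraps around to INT32_MIN, and fits_signed(128, 8) is false even though fits_signed(127, 8) is true.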
2.3 Expressions

2.3.1 General

The expressions accepted by the UNIX 386 assembler can be described by their semantic and syntactic rules. The following are the operators supported by the assembler:

  OPERATOR   ACTION
  ---------------------------------
  +          addition
  -          subtraction
  \*         multiplication
  \/         division
  &          bitwise logical and
  |          bitwise logical or
  >          right shift
  <          left shift
  \%         remainder operator
  !          bitwise logical and not

The backslash in \/ and \% is part of the operator: an unescaped slash begins a comment, and a percent sign introduces a register name.

In the following syntactic rules the non-terminals are represented by lower case letters. The terminal symbols are represented by upper case letters, and the symbols enclosed in double quotes ("") are also terminal symbols. There is no precedence among the operators; square brackets must be used to establish precedence.

SYNTACTIC RULES FOR THE ASSEMBLER

  expr   : term
         | expr "+" term
         | expr "\*" term
         | expr "\/" term
         | expr "&" term
         | expr "|" term
         | expr ">" term
         | expr "<" term
         | expr "\%" term
         | expr "!" term
         | expr "-" term
         ;

  term   : id
         | number
         | "-" term
         | "[" expr "]"
         | "<o>" term
         | "<s>" term
         ;

  id     : LABEL
         ;

  number : DEC_VAL
         | HEX_VAL
         | OCT_VAL
         | BIN_VAL
         ;

The terminal nodes can be described by the following regular expressions.

  LABEL   = [a-zA-Z_][a-zA-Z0-9_]*:
  DEC_VAL = [1-9][0-9]*
  HEX_VAL = 0[Xx][0-9a-fA-F][0-9a-fA-F]*
  OCT_VAL = 0[0-7]*
  BIN_VAL = 0[Bb][0-1][0-1]*

In the above regular expressions, choices are enclosed in square brackets, a range of letters or numbers is indicated by a dash (-), and the star (*) indicates zero (0) or more instances of the previous character.

Semantically the expressions fall into two groups: absolute and relocatable. The following table shows the legal combinations of absolute and relocatable operands for the addition and subtraction operators. All other operations are only legal on absolute-valued expressions. All numbers have the absolute attribute. Symbols used to reference storage, text or data, are relocatable.
In an assignment statement, symbols on the left hand side inherit their relocation attributes from the right hand side. In the table, "a" is an absolute-valued expression and "r" is a relocatable-valued expression. The resulting type of the operation is given to the right of the equal sign.

  a + a = a
  r + a = r
  a - a = a
  r - a = r
  r - r = a

In the last example, the relocatable expressions must be declared before their difference can be taken.

Following are some examples of valid expressions:

  1. label
  2. $label
  3. [label + 0x100]
  4. [label1 - label2]
  5. $[label1 - label2]

Following are some examples of invalid expressions:

  1. [$label - $label]
  2. [label1 * 5]
  3. (label + 0x20)

PSEUDO OPERATIONS

  .align val
        The align pseudo op causes the next data generated to be aligned modulo val. Val must be a positive integer value.

  .bcd val
        The bcd pseudo op generates a packed decimal (80-bit) value into the current section. This is not valid for the .bss section. Val is a non-floating point constant.

  .bss
        The bss pseudo op changes the current section to .bss.

  .bss tag, bytes
        Define symbol tag in the .bss section and add bytes to the value of dot for .bss. This does not change the current section to .bss. Tag is a symbol name. Bytes must be a positive integer value.

  .byte val [,val]
        The byte pseudo op generates initialized bytes into the current section. This is not valid for .bss. Each val must be an 8-bit value.

  .comm name, expr
        The comm pseudo op allocates storage in the .data section. The storage is referenced by name, and has a size in bytes of expr. Name is a symbol. Expr must be a positive integer. The name cannot be predefined.

  .data
        The data pseudo op changes the current section to .data.

  .double val
        The double pseudo op generates an 80287 long real (64-bit) into the current section. This is not valid in the .bss section. Val is a floating point constant.

  .even
        The even pseudo op aligns the current program counter (.) to an even boundary.
  .float val
        The float pseudo op generates an 80287 short real (32-bit) into the current section. This is not valid in the .bss section. Val is a floating point constant.

  .globl name
        This pseudo op makes the variable, name, accessible to other programs.

  .ident string
        The ident pseudo op creates an entry in the comment section containing string. String is any sequence of characters, not including the double quote (").

  .lcomm name, expr
        The lcomm pseudo op allocates storage in the .bss section. The storage is referenced by name, and has a size of expr. Name is a symbol. Expr must be a positive integer. Name cannot be predefined.

  .long val
        The long pseudo op generates a long integer (32-bit two's complement value) into the current section. This pseudo op is not valid for the .bss section. Val is a non-floating point constant.

  .noopt
        The noopt pseudo op

  .optim
        The optim pseudo op

  .set name, expr
        The set pseudo op sets the value of symbol name to expr. This is equivalent to an assignment.

  .string str
        This pseudo op places the characters in str into the object module at the current location and terminates the string with a null. The string must be enclosed in double quotes (""). This pseudo op is not valid for the .bss section.

  .text
        The text pseudo op defines the current section as .text.

  .value expr [,expr]
        The value pseudo op is used to generate an initialized word (16-bit two's complement value) into the current section. This pseudo op is not valid in the .bss section. Each expr must be a 16-bit value.

  .version string
        The version pseudo op puts the C compiler version into the comment section.

SDB PSEUDO OPS

  .type expr
        The type pseudo op is used within a .def-.endef pair. It gives name the C compiler type representation expr.

  .val expr
        The val pseudo op is used with a .def-.endef pair. It gives name the value of expr. The type of expr determines the section for name.
  .tag str
        The tag pseudo op is used in relation with a previously defined .def pseudo op. If the name of a .def is a structure or a union, str should be the name of that structure or union tag defined in a previous .def-.endef pair.

  .size expr
        The size pseudo op is used with the .def pseudo op. If the name of the .def is an object such as a structure or an array, this gives it a total size of expr. Expr must be a positive integer.

  .scl expr
        The scl pseudo op is used with the .def pseudo op. Within the .def it gives name the storage class of expr. The value of expr should be positive.

  .line expr
        The line pseudo op is used with the .def pseudo op. It defines the source line number of the definition of symbol name in the .def. Expr should yield a positive value.

  .ln line [,addr]
        This pseudo op provides the source line number relative to the beginning of a function. It is used to pass information through to sdb.

  .file name
        The file pseudo op gives the source file name. Only one is allowed per source file. Name must be between 1 and 14 characters. This must be the first line of an assembly file.

  .endef
        The endef pseudo op is the ending bracket for a .def.

  .def name
        The def pseudo op starts a symbolic description for symbol name. See .endef. Name is a symbol name.

  .dim expr [,expr]
        The dim pseudo op is used with the .def pseudo op. If the name of a .def is an array, the expressions give the dimensions. Up to 4 dimensions are accepted. The value of each expression should be positive.

3. MACHINE INSTRUCTIONS

3.1 Differences between the UNIX 386 and the Intel 386 assemblers

This section describes the instructions that the assembler accepts. The detailed specification of how the particular instructions operate is not included; the operation of particular instructions is described in the Intel documentation. The following describes the differences between the Unix 386 and Intel 386 assembly languages.
This explanation covers all aspects of translation from Intel assembler to Unix 386 assembler. This is a list of the differences between the Unix 386 assembly language and Intel's.

  1. All register names use the percent sign (%) as a prefix to distinguish them from symbol names.

  2. Instructions with two (2) operands use the left as the source and the right as the destination. This follows the UNIX system's assembler convention, and it is the reverse of Intel's notation.

  3. Most instructions that can operate on a byte, word, or long may have "b", "w", or "l" appended to them. In general, when an opcode is specified with no type suffix, it defaults to long. In general the UNIX 386 assembler derives its type information from the opcode, whereas the Intel 386 assembler can derive its type information from the operand types. This difference in where the type information comes from motivates the b, w, and l suffixes used in the Unix 386 assembler.

3.2 Operands

Three kinds of operands are generally available to the instructions: register, memory, and immediate operands. Full descriptions of each type appear below. Indirect operands are available to jump and call instructions, but NO other instructions can use memory indirect operands.

The assembler always assumes it is generating code for a 32-bit segment. So when 16-bit data is called for (e.g. movw %ax, %bx), it will automatically generate the 16-bit data prefix byte.

Byte, word, and long registers are available on the 80386 processor. The code segment (%cs), instruction pointer (%eip), and the flag register are not available as explicit operands to the instructions. The names of the byte, word, and long registers available as operands and a brief description appear below:

  1. 8-bit (byte) general registers
        %al   low byte of %ax register
        %ah   high byte of %ax register
        %cl   low byte of %cx register
        %ch   high byte of %cx register
        %dl   low byte of %dx register
        %dh   high byte of %dx register
        %bl   low byte of %bx register
        %bh   high byte of %bx register

  2. 16-bit general registers
        %ax   low 16 bits of %eax register
        %cx   low 16 bits of %ecx register
        %dx   low 16 bits of %edx register
        %bx   low 16 bits of %ebx register
        %sp   low 16 bits of the stack pointer (%esp)
        %bp   low 16 bits of the frame pointer (%ebp)
        %si   low 16 bits of the source index register (%esi)
        %di   low 16 bits of the destination index register (%edi)

  3. 32-bit general registers
        %eax  32-bit accumulator
        %ecx  32-bit general register
        %edx  32-bit general register
        %ebx  32-bit general register
        %esp  32-bit stack pointer
        %ebp  32-bit frame pointer
        %esi  32-bit source index register
        %edi  32-bit destination index register

  4. Segment registers
        %cs   Code segment register; all references to the instruction space use this register.
        %ds   Data segment register, the default segment register for most references to memory operands.
        %ss   Stack segment register, the default segment register for memory operands in the stack (i.e. the default segment register for %bp, %sp, %esp, and %ebp).
        %es   General purpose segment register. Some string instructions use this extra segment as their default segment.
        %fs   General purpose segment register.
        %gs   General purpose segment register.

3.3 Instruction Descriptions

This section describes the Unix 5.3/386 instruction syntax. Refer to section 3.1 for the differences between the UNIX 386 and the Intel 386 assemblers. Since the assembler assumes it is always generating code for a 32-bit segment, it always assumes a 32-bit address, and it automatically precedes word operations with a 16-bit data prefix byte.

In this section the following notation is used:

  1. The mnemonics are expressed in a regular expression type syntax.
     Alternatives separated by a vertical bar (|) and enclosed within square brackets, "[]", denote that one of them must be chosen. Alternatives enclosed within curly braces, "{}", denote that one or none of them may be used. The vertical bar (|) separates different suffixes for operators or operands. As an example, when an 8-, 16-, or 32-bit immediate value is permitted in an instruction we would write: imm[8|16|32].

  2. imm[8|16|32|48] - any immediate value, as defined above. Immediate values are defined using the regular expression syntax previously defined. When there is a choice between operand sizes, the assembler will choose the smallest representation.

  3. reg[8|16|32] - any general purpose register, where each number indicates one of the following:
        32: %eax, %ecx, %edx, %ebx, %esi, %edi, %ebp, %esp.
        16: %ax, %cx, %dx, %bx, %si, %di, %bp, %sp.
        8:  %al, %ah, %cl, %ch, %dl, %dh, %bl, %bh.

  4. mem[8|16|32|48] - any memory operand. The 8, 16, 32, and 48 suffixes represent byte, word, dword, and inter-segment memory address quantities, respectively.

  5. r/m[8|16|32] - any general purpose register or memory operand. The operand type is determined from the suffix: 8 = byte, 16 = word, and 32 = dword. The registers for each operand size are the same as reg[8|16|32] above.

  6. creg - any control register. The control registers are: %cr0, %cr2, and %cr3.

  7. dreg - any debug register. The debug registers are: %db0, %db1, %db2, %db3, %db6, and %db7.

  8. sreg - any segment register. The segment registers are: %cs, %ds, %ss, %es, %fs, and %gs.

  9. treg - any test register. The test registers are: %tr6 and %tr7.

  10. cc - condition codes. The condition codes are:
        1. a - above
        2. ae - above or equal
        3. b - below
        4. be - below or equal
        5. c - carry
        6. e - equal
        7. g - greater
        8. ge - greater than or equal to
        9. l - less than
        10. le - less than or equal to
        11. na - not above
        12. nae - not above or equal to
        13. nb - not below
        14. nbe - not below or equal to
        15. nc - no carry
        16. ne - not equal
        17. ng - not greater than
        18. nge - not greater than or equal to
        19. nl - not less than
        20. nle - not less than or equal to
        21. no - not overflow
        22. np - not parity
        23. ns - not sign
        24. nz - not zero
        25. o - overflow
        26. p - parity
        27. pe - parity even
        28. po - parity odd
        29. s - sign
        30. z - zero

  11. disp[8|32] - the number of bits used to define the distance of a relative jump. Since the assembler only supports a 32-bit address space, only 8-bit sign-extended and 32-bit displacements are supported.

  12. immPtr - When the immediate form of a long call or a long jump is used, the selector and offset are encoded as an immediate pointer (immPtr).

Addressing modes

Represented by: [sreg:][offset][([base][,index][,scale])]

All the items in square brackets are optional, but at least one is necessary. If any of the items inside the parentheses are used, the parentheses are mandatory.

Sreg is a segment register override prefix. It may be any segment register. If a segment override prefix is present, it must be followed by a colon (:) before the offset component of the address. Sreg does not represent an address by itself. An address must contain an offset component.

Offset is a displacement from a segment base. It may be absolute or relocatable. A label is an example of a relocatable offset. A number is an example of an absolute offset.

Base and index can be any 32-bit register. Scale is a multiplication factor for the index register field. Please refer to the Intel documentation for more details on the 80386 addressing modes.

Following are some examples of addresses:

  movl var, %eax
        Move the contents of memory location var into %eax.

  movl %cs:var, %eax
        Move the contents of the memory location var in the code segment into %eax.

  movl $var, %eax
        Move the address of var into %eax.

  movl array_base(%esi), %eax
        Add the address of memory location array_base to the content of %esi to get an address in memory.
        Move the content of this address into %eax.

  movl (%ebx, %esi, 4), %eax
        Multiply the content of %esi by 4 and add this to the content of %ebx to produce a memory reference. Move the content of this memory location into %eax.

  movl struct_base(%ebx, %esi, 4), %eax
        Multiply the content of %esi by 4, add this to the content of %ebx, and add this to the address of struct_base to produce an address. Move the content of this address into %eax.

A note about expressions and immediate values: an immediate value is an expression preceded by a dollar sign.

  immediate: "$" expr

Immediate values carry the absolute or relocatable attributes of their expression component. Immediate values cannot be used in an expression. Immediate values should be considered as another form of address: the immediate form of address.

3.3.1 Processor Extension Instructions

Please refer to the chapter on floating point support.

3.3.1.1 Control and Test Register Instructions

  1. mov{l} creg, reg32
  2. mov{l} dreg, reg32
  3. mov{l} reg32, creg
  4. mov{l} reg32, dreg
  5. mov{l} treg, reg32
  6. mov{l} reg32, treg

NOTE: The Unix assembler accepts "mov" or "movl" as exactly the same instruction for the control and test register group.

3.3.1.2 New Condition Code Instructions

  1. jcc disp32
  2. setcc r/m8

3.3.1.3 Bit Instructions

All the new bit instructions are only defined for word and long register or memory operands.

  1. bt{wl} reg[16|32], r/m[16|32]
  2. bt{wl} imm8, r/m[16|32]
  3. bts{wl} imm8, r/m[16|32]
  4. bts{wl} reg[16|32], r/m[16|32]
  5. btr{wl} imm8, r/m[16|32]
  6. btr{wl} reg[16|32], r/m[16|32]
  7. btc{wl} imm8, r/m[16|32]
  8. btc{wl} reg[16|32], r/m[16|32]
  9. bsf{wl} reg[16|32], r/m[16|32]
  10. bsr{wl} reg[16|32], r/m[16|32]
  11. shld{wl} imm8, reg[16|32], r/m[16|32]
  12. shld{wl} reg[16|32], r/m[16|32]
  13. shrd{wl} imm8, reg[16|32], r/m[16|32]
  14. shrd{wl} reg[16|32], r/m[16|32]

NOTE: All the bit operation mnemonics without a type suffix default to long.

3.3.1.4 New Arithmetic Instruction

  1. imul r/m[16|32], reg[16|32]

NOTE: This is the truncating multiply. It has a 16- or 32-bit product, as opposed to a 32- or 64-bit product.

3.3.1.5 New Move with Zero or Sign Extension Instructions

  1. movzbw r/m8, reg16
  2. movzbl r/m8, reg32
  3. movzwl r/m16, reg32
  4. movsbw r/m8, reg16
  5. movsbl r/m8, reg32
  6. movswl r/m16, reg32

3.3.2 Data Movement Instructions

  1. clr{bwl} r/m[8|16|32]
  2. lea{wl} mem32, reg[16|32]
  3. mov{bwl} r/m[8|16|32], reg[8|16|32]
  4. mov{bwl} reg[8|16|32], r/m[8|16|32]
  5. mov{bwl} imm[8|16|32], r/m[8|16|32]
  6. pop{wl} r/m[16|32]
  7. popa{wl}
  8. push{bwl} imm[8|16|32]
  9. push{wl} r/m[16|32]
  10. pusha{wl}
  11. xchg{bwl} reg[8|16|32], r/m[8|16|32]

NOTE1: pushb sign extends the immediate byte to a long, and pushes a long (4 bytes) onto the stack.

NOTE2: When a type suffix is not used with a data movement mnemonic, the type defaults to long. The Unix assembler does not derive the type of the operands from the operands themselves.

3.3.3 Segment Register Instructions

  1. lds{wl} mem[32|48], reg[16|32]
  2. les{wl} mem[32|48], reg[16|32]
  3. lfs{wl} mem[32|48], reg[16|32]
  4. lgs{wl} mem[32|48], reg[16|32]
  5. lss{wl} mem[32|48], reg[16|32]
  6. movw sreg[cs|ds|ss|es], r/m16
  7. movw r/m16, sreg[cs|ds|ss|es]
  8. popw sreg[ds|ss|es|fs|gs]
  9. pushw sreg[cs|ds|ss|es|fs|gs]

NOTE1: The pushw and popw instructions push and pop 16-bit quantities. This is done by using an operand size override prefix (OSP) byte.

NOTE2: When the type suffix is not used with the lds, les, lfs, lgs, and lss instructions, a 48-bit pointer is assumed.

NOTE3: Since the assembler assumes no type suffix means a type of long, the type suffix "w" is mandatory when working with the segment registers.

3.3.4 I/O Instructions

  1. in{bwl} imm8
  2. in{bwl} %dx
  3. ins{bwl} %dx
  4. out{bwl} imm8
  5. out{bwl} %dx
  6. outs{bwl} %dx

NOTE1: When the type suffix is left off the I/O instructions, they default to long. So in = inl, out = outl, ins = insl, and outs = outsl.

3.3.5 Flag Instructions

  1. lahf
  2. sahf
  3. popf{wl}
  4. pushf{wl}
  5. cmc
  6. clc
  7. stc
  8. cli
  9. sti
  10. cld
  11. std

NOTE: When the type suffix is not used, the pushf and popf instructions default to long: pushf = pushfl and popf = popfl. A pushfw or popfw will push or pop a 16-bit quantity. This is done by using the OSP prefix byte.

3.3.6 Arithmetic/Logical Instructions

  1. add{bwl} reg[8|16|32], r/m[8|16|32]
  2. add{bwl} r/m[8|16|32], reg[8|16|32]
  3. add{bwl} imm[8|16|32], r/m[8|16|32]
  4. adc{bwl} reg[8|16|32], r/m[8|16|32]
  5. adc{bwl} r/m[8|16|32], reg[8|16|32]
  6. adc{bwl} imm[8|16|32], r/m[8|16|32]
  7. sub{bwl} reg[8|16|32], r/m[8|16|32]
  8. sub{bwl} r/m[8|16|32], reg[8|16|32]
  9. sub{bwl} imm[8|16|32], r/m[8|16|32]
  10. sbb{bwl} reg[8|16|32], r/m[8|16|32]
  11. sbb{bwl} r/m[8|16|32], reg[8|16|32]
  12. sbb{bwl} imm[8|16|32], r/m[8|16|32]
  13. cmp{bwl} reg[8|16|32], r/m[8|16|32]
  14. cmp{bwl} r/m[8|16|32], reg[8|16|32]
  15. cmp{bwl} imm[8|16|32], r/m[8|16|32]
  16. inc{bwl} r/m[8|16|32]
  17. dec{bwl} r/m[8|16|32]
  18. test{bwl} reg[8|16|32], r/m[8|16|32]
  19. test{bwl} r/m[8|16|32], reg[8|16|32]
  20. test{bwl} imm[8|16|32], r/m[8|16|32]
  21. sal{bwl} imm8, r/m[8|16|32]
  22. sal{bwl} %cl, r/m[8|16|32]
  23. shl{bwl} imm8, r/m[8|16|32]
  24. shl{bwl} %cl, r/m[8|16|32]
  25. sar{bwl} imm8, r/m[8|16|32]
  26. sar{bwl} %cl, r/m[8|16|32]
  27. shr{bwl} imm8, r/m[8|16|32]
  28. shr{bwl} %cl, r/m[8|16|32]
  29. not{bwl} r/m[8|16|32]
  30. neg{bwl} r/m[8|16|32]
  31. bound{wl} reg[16|32], r/m[16|32]
  32. and{bwl} reg[8|16|32], r/m[8|16|32]
  33. and{bwl} r/m[8|16|32], reg[8|16|32]
  34. and{bwl} imm[8|16|32], r/m[8|16|32]
  35. or{bwl} reg[8|16|32], r/m[8|16|32]
  36. or{bwl} r/m[8|16|32], reg[8|16|32]
  37. or{bwl} imm[8|16|32], r/m[8|16|32]
  38. xor{bwl} reg[8|16|32], r/m[8|16|32]
  39. xor{bwl} r/m[8|16|32], reg[8|16|32]
  40. xor{bwl} imm[8|16|32], r/m[8|16|32]

NOTE: When the type suffix is not included in an arithmetic or logical instruction, it defaults to a long.

3.3.7 Multiply and Divide

  1. imul{wl} imm[16|32], r/m[16|32], reg[16|32]
  2. mul{bwl} r/m[8|16|32]
  3. div{bwl} r/m[8|16|32]
  4. idiv{bwl} r/m[8|16|32]

NOTE: When the type suffix is not included in a multiply or divide instruction, it defaults to a long.

3.3.8 Conversion Instructions

  1. cbtw
  2. cwtd
  3. cwtl
  4. cltd

NOTE:
  cbtw   convert byte to word:     %al  -> %ax
  cwtd   convert word to double:   %ax  -> %dx:%ax
  cwtl   convert word to long:     %ax  -> %eax
  cltd   convert long to double:   %eax -> %edx:%eax

3.3.9 Decimal Arithmetic Instructions

  1. daa
  2. das
  3. aaa
  4. aas
  5. aam
  6. aad

3.3.10 Coprocessor Instructions

  1. wait
  2. esc

3.3.11 String Instructions

  1. movs[bwl]
  2. movs - same as movsl
  3. smov[bwl] - same as movs[bwl]
  4. smov - same as smovl
  5. cmps[bwl]
  6. cmps - same as cmpsl
  7. scmp[bwl] - same as cmps[bwl]
  8. scmp - same as scmpl
  9. stos[bwl]
  10. stos - same as stosl
  11. ssto[bwl] - same as stos[bwl]
  12. ssto - same as sstol
  13. lods[bwl]
  14. lods - same as lodsl
  15. slod[bwl] - same as lods[bwl]
  16. slod - same as slodl
  17. scas[bwl]
  18. scas - same as scasl
  19. ssca[bwl] - same as scas[bwl]
  20. ssca - same as sscal
  21. xlat
  22. rep
  23. repnz
  24. repz

NOTE: All Intel string op mnemonics default to longs.

3.3.12 Procedure Call and Return

  1. lcall immPtr
  2. lcall r/m48 (indirect)
  3. lret
  4. lret imm16
  5. call disp32
  6. call r/m32 (indirect)
  7. ret
  8. ret imm16
  9. enter imm16, imm8
  10. leave

3.3.13 Jump Instructions

  1. jcc disp[8|32]
  2. jcxz disp[8|32]
  3. loop disp[8|32]
  4. loopnz disp[8|32]
  5. loopz disp[8|32]
  6. jmp disp[8|32]
  7. ljmp immPtr
  8. jmp r/m32 (indirect)
  9. ljmp r/m48 (indirect)

NOTE: The UNIX 386 assembler optimizes SDIs (span-dependent instructions), so intra-segment jumps are optimized to their short forms when possible.

3.3.14 Interrupt Instructions

  1. int 3
  2. int imm8
  3. into
  4. iret

3.3.15 Protection Model Instructions

  1. sldt r/m16
  2. str r/m16
  3. lldt r/m16
  4. ltr r/m16
  5. verr r/m16
  6. verw r/m16
  7. sgdt r/m32
  8. sidt r/m32
  9. lgdt r/m32
  10. lidt r/m32
  11. smsw r/m32
  12. lmsw r/m32
  13. lar r/m32, reg32
  14. lsl r/m32, reg32
  15. clts

3.3.16 Miscellaneous Instructions

  1. lock
  2. nop
  3. hlt
  4. addr16
  5. data16

TRANSLATION TABLES FOR UNIX TO INTEL FLOAT MNEMONICS

The following tables show the relationship between the Unix and Intel mnemonics. The mnemonics are organized into the same functional categories as the Intel mnemonics. The Intel mnemonics appear in section two of the 80287 numeric supplement.

The notational conventions used in the tables are: When letters appear within square brackets, "[]", exactly one of the letters is required. If letters appear within curly braces, "{}", then either one or none of the letters is required. When a group of letters is separated from other letters by a bar, "|", within square brackets or curly braces, then the group of letters between the bars, or between a bar and a closing bracket or brace, is considered an atomic unit. As an example, "fld[lst]" means: fldl, flds, or fldt; "fst{ls}" means: fst, fstl, or fsts; and "fild{l|ll}" means: fild, fildl, or fildll.

The Unix operators are built from the Intel operators by adding suffixes to them. The 80287 deals with three data types: integer, packed decimal, and reals. The Unix assembler is not typed, so the operator has to carry with it the type of data item it is operating on. If the operation is on an integer, the following suffixes apply: l for Intel's short (32 bits), and ll for Intel's long (64 bits). If the operator applies to reals, then: s is short (32 bits), l is long (64 bits), and t is temporary real (80 bits).
Real Transfers

UNIX            | INTEL           Operation
=================================================
fld[lst]        | fld             load real
fst{ls}         | fst             store real
fstp{lst}       | fstp            store real and pop
fxch            | fxch            exchange registers

Integer Transfers

UNIX            | INTEL           Operation
=================================================
fild{l|ll}      | fild            integer load
fist{l}         | fist            integer store
fistp{l|ll}     | fistp           integer store and pop

Packed Decimal Transfers

UNIX            | INTEL           Operation
=================================================
fbld            | fbld            packed decimal (BCD) load
fbstp           | fbstp           packed decimal (BCD) store and pop

Addition

UNIX            | INTEL           Operation
=================================================
fadd{ls}        | fadd            real add
faddp           | faddp           real add and pop
fiadd{l}        | fiadd           integer add

Subtraction

UNIX            | INTEL           Operation
=================================================
fsub{ls}        | fsub            subtract real
fsubp           | fsubp           subtract real and pop
fsubr{ls}       | fsubr           subtract real reversed
fsubrp          | fsubrp          subtract real reversed and pop
fisub{l}        | fisub           integer subtract
fisubr{l}       | fisubr          integer subtract reversed

Multiplication

UNIX            | INTEL           Operation
=================================================
fmul{ls}        | fmul            multiply real
fmulp           | fmulp           multiply real and pop
fimul{l}        | fimul           integer multiply

Division

UNIX            | INTEL           Operation
=================================================
fdiv{ls}        | fdiv            divide real
fdivp           | fdivp           divide real and pop
fdivr{ls}       | fdivr           divide real reversed
fdivrp          | fdivrp          divide real reversed and pop
fidiv{l}        | fidiv           integer divide
fidivr{l}       | fidivr          integer divide reversed

Other Arithmetic Operations

UNIX            | INTEL           Operation
=================================================
fsqrt           | fsqrt           square root
fscale          | fscale          scale
fprem           | fprem           partial remainder
frndint         | frndint         round to integer
fxtract         | fxtract         extract exponent and significand
fabs            | fabs            absolute value
fchs            | fchs            change sign

Comparison Instructions

UNIX            | INTEL           Operation
=================================================
fcom{ls}        | fcom            compare real
fcomp{ls}       | fcomp           compare real and pop
fcompp          | fcompp          compare real and pop twice
ficom{l}        | ficom           integer compare
ficomp{l}       | ficomp          integer compare and pop
ftst            | ftst            test
fxam            | fxam            examine

Transcendental Instructions

UNIX            | INTEL           Operation
=================================================
fptan           | fptan           partial tangent
fpatan          | fpatan          partial arctangent
f2xm1           | f2xm1           2^X - 1
fyl2x           | fyl2x           Y * log2(X)
fyl2xp1         | fyl2xp1         Y * log2(X+1)

Constant Instructions

UNIX            | INTEL           Operation
=================================================
fldl2e          | fldl2e          load log2(e)
fldl2t          | fldl2t          load log2(10)
fldlg2          | fldlg2          load log10(2)
fldln2          | fldln2          load loge(2)
fldpi           | fldpi           load pi
fldz            | fldz            load +0.0

Processor Control Instructions

UNIX            | INTEL           Operation
=================================================
finit/fninit    | finit/fninit    initialize processor
fnop            | fnop            no operation
fsave/fnsave    | fsave/fnsave    save state
fstcw/fnstcw    | fstcw/fnstcw    store control word
fstenv/fnstenv  | fstenv/fnstenv  store environment
fstsw/fnstsw    | fstsw/fnstsw    store status word
frstor          | frstor          restore state
fsetpm          | fsetpm          set protected mode
fwait           | fwait           CPU wait
fclex/fnclex    | fclex/fnclex    clear exceptions
fdecstp         | fdecstp         decrement stack pointer
ffree           | ffree           free register
fincstp         | fincstp         increment stack pointer

--
// marc@dumbcat.sf.ca.us
// {ames,decwrl,sun}!pacbell!dumbcat!marc