lotto@wjh12.UUCP (04/05/87)
The following code fragment was assembled using MASM 4.0, linked with the linker provided in that package, and then exe2bin'ed to a .COM file with the DOS 3.2 incarnation of that utility.

            Title   FOO
    Cseg    segment para 'CODE'
            org     100H
            assume  CS:Cseg, DS:Cseg, ES:Cseg
    Start:  jmp     Begin
    Label   db      'Post no bills'
    Begin:  lea     DX, Cseg:Label
            .
            .
            .
    Cseg    ends
            end     Start

When I looked at the resulting code, the line at label Begin had become:

            LEA     DX, [0003]

which is 100H before the actual location of Label. The same (incorrect) result was obtained with the source line:

    Begin:  lea     DX, CS:Label

Removing the explicit segment reference to make:

    Begin:  lea     DX, Label

gave the correct result:

            LEA     DX, [0103]

My understanding of the ASSUME directive is that labels located in Cseg should assume the use of CS: as the segment register. Nothing changes when the segment declaration, assume directive, and org assignment are put in any other order! Why is this a problem?

Thanks.
doug@edge.UUCP (Doug Pardee) (04/14/87)
> My understanding of the ASSUME directive is that labels located
> in Cseg should assume the use of CS: as the segment register.

It's been a few months since I've had MASM available, so I can't check into the original question. But I can provide some basic info:

Caution: This area is extremely complicated. I don't pretend to understand /all/ of it.

Basic premise: any memory location in a PC (or other iAPX86 CPU) can be addressed by up to 4096 different combinations of segment:offset addresses. Whenever you try to reference "the address of a memory byte", you are forcing the assembler to select one of those 4096 possible addresses. Which one it chooses is of great importance if your code treats the segment and offset values separately. The prime example is the nearly-every-instruction practice of keeping a segment value in a segment register and then referencing the data item by offset.

Fortunately, things aren't quite as bleak as this looks. There are usually only 2 of the 4096 "aliases" that the assembler will pick. Each symbol which is associated with a memory item has a default Segment:Offset based on the segment in which it was defined. [There are two variations, discussed later.] And, if the default segment is declared as part of a GROUP, then there is also the associated Group:Offset combination which could be used as an address.

The method used by the assembler to determine whether to use Segment:Offset, Group:Offset, or some other oddball thing is based on the type of instruction.

a) Branching instructions cannot have segment prefix bytes, so they always use CS. Consequently, the offsets are computed based upon whatever was ASSUMEd for CS, *even if this is unusable*. Make *dang* sure that any branch instruction has the appropriate ASSUME CS: ahead of it.
b) The LEA is a special instruction. Its operand is always an offset, and that offset is computed based upon whatever was ASSUMEd for DS, even if this is unusable. This can be overridden.
c) Normal instructions just referencing memory -- Take the first usable offset from this list: ASSUME DS:Group, ASSUME ES:Group, ASSUME SS:Group, ASSUME CS:Group, ASSUME DS:Segment, ASSUME ES:Segment, ASSUME SS:Segment, ASSUME CS:Segment. (Or is SS before ES? I forget). This can be overridden.
d) Instructions referencing SEG x or OFFSET x, like MOV AX,OFFSET memloc -- Use Segment:Offset. This can be overridden. And you had darned well better override it if the data item is in a GROUP, because then it's almost certain you wanted Group:Offset, not Segment:Offset.
e) Data definitions (address constants), same as (d) above. Same warnings.
f) Operands with segment overrides are taken as written. If a segment register is specified, then that forces the Segment Prefix byte as well as causing the address to be computed based on whatever was ASSUMEd for that segment register, even if this is unusable.

When I say that a computed offset is "usable" or "unusable", I refer to the fact that the assembler actually leaves much of this work up to the linker. It basically tells the linker, "This instruction references the memory location which can be called Segment:Offset, but it uses XXXX instead of Segment, so figure out what the offset for XXXX:Offset is and use that value instead." This can cause bizarre happenings, because the linker is perfectly happy referencing something in segment Y but using the totally unrelated segment X as a base, as long as Y is after X and the referenced location is within 64K of the beginning of X. One day you change something, and there's more than 64K, and you get a linker error message and can't figure out where it came from :-)

How can this happen? Well, from an explicit segment override, for one thing. But a sneakier way is from one of those 2 variations I mentioned some paragraphs back, about how each memory-type symbol has a default segment associated with it.

External symbols in MASM come in 2 types.
Regular externals do *not* have a segment associated with them; the assembler cannot compute any offsets and must leave the whole thing to the linker. These are fine for procedure labels, but trying to reference them as data items is a b**ch, because you gotta deal with both the segment and offset addresses whenever you want to reference one. ASSUME statements can't help you here at all.

For most data items, you the programmer know which segment they're in. In fact, for smaller programs, you might well have all of your data accessible off of DGROUP, including some or all externals. So by placing the "EXTRN" directive *inside* the confines of the appropriate "SEGMENT/ENDS" pair, you tell the assembler that it can presume it knows which segment the symbol is in. Now you can reference the symbol directly (if DS is ASSUMEd to that segment or group) with no fuss. But if you *lied* to the assembler about which segment the memory was in, you can have linker trouble.

The other variation? Labels defined with a colon (jump-type labels) are not exactly defined to be in the current segment as defined by "SEGMENT/ENDS". They're defined to be in the segment which is ASSUMEd for CS. Usually, you've assumed the current segment in CS, so you don't notice the difference.

If you followed all this, you're better than I am...
--
Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona
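
The aliasing and the "offset relative to some other base" computation described in the post above come down to simple arithmetic, which the following Python sketch tries to model. It is only an illustration: the function names and the sample paragraph numbers are invented here, and it says nothing about how MASM or LINK are actually implemented. It assumes real-mode addressing (physical address = segment * 16 + offset) and ignores wraparound past 1 MB.

```python
# Real-mode 8086 address arithmetic.
# All names and sample values are hypothetical, for illustration only.

def linear(segment, offset):
    """Physical address of segment:offset (no wraparound handling)."""
    return (segment << 4) + offset


def aliases(address):
    """Every (segment, offset) pair that reaches `address`."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


def offset_from_frame(segment, offset, frame_segment):
    """Offset of segment:offset when addressed from a different base.

    Models the request "the item is at segment:offset, but the code
    uses frame_segment, so work out the offset from there". Fails if
    the item is not within 64K above the start of the frame.
    """
    delta = linear(segment, offset) - (frame_segment << 4)
    if not 0 <= delta <= 0xFFFF:
        raise ValueError("target is not within 64K of the frame base")
    return delta


if __name__ == "__main__":
    # A location well above the first 64K has the full set of aliases.
    print(len(aliases(0x12345)))
    # A location near the bottom of memory has far fewer.
    print(len(aliases(0x00100)))
    # Item at Y:0020 with Y at paragraph 1800h, addressed via X at 1000h.
    print(hex(offset_from_frame(0x1800, 0x0020, 0x1000)))
```

Moving the hypothetical segment Y up to paragraph 2000h makes the same call fail, which corresponds to the "one day you change something, and there's more than 64K" situation.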
bill@hpcvlo.UUCP (04/16/87)
I just tried assembling your code using Microsoft MASM 4.0 and LINK 3.51, and it worked just fine. Here's the listing that masm generated; I also verified that the resulting .COM file, after going through EXE2BIN, was correct ...

    Microsoft (R) Macro Assembler  Version 4.00             4/16/87 07:57:53
    foo                                                      Page     1-1

                                            title   foo
     0000                           cseg    segment para 'code'
     0100                                   org     100h
                                            assume  cs:cseg,ds:cseg,es:cseg
     0100  EB 0E 90                 start:  jmp     begin
     0103  50 6F 73 74 20 6E 6F     label   db      'Post no bills'
           20 62 69 6C 6C 73
     0110  8D 16 0103 R             begin:  lea     dx,cseg:label
     0114                           cseg    ends
                                            end     start

    Microsoft (R) Macro Assembler  Version 4.00             4/16/87 07:57:53
    foo                                                      Symbols-1

    Segments and Groups:

                    N a m e                 Size    Align   Combine Class

    CSEG . . . . . . . . . . . . . .        0114    PARA    NONE    'CODE'

    Symbols:

                    N a m e                 Type    Value   Attr

    BEGIN  . . . . . . . . . . . . .        L NEAR  0110    CSEG
    LABEL  . . . . . . . . . . . . .        L BYTE  0103    CSEG
    START  . . . . . . . . . . . . .        L NEAR  0100    CSEG

        12 Source  Lines
        12 Total   Lines
        26 Symbols

     42226 Bytes symbol space free

         0 Warning Errors
         0 Severe  Errors

Bill Frolik
hp-pcd!bill
Hewlett-Packard Portable Computer Division
Corvallis, Oregon
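
The object bytes in the listing above can be checked by hand, and the following Python sketch does that decoding. The helper names are made up for this example, and it handles only the two encodings that appear in the listing (the short jump EB rel8, and LEA with a direct 16-bit address). One assumption to note: the listing prints the 16-bit operand as the word 0103, whereas in the file it is stored low byte first, so the sketch feeds it in as 03 01.

```python
# Hand-decoding the object bytes from the MASM listing.
# Helper names are hypothetical; only the encodings seen in the
# listing are handled.

REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def jmp_short_target(address, code):
    """Target of a 2-byte `EB rel8` jump located at `address`."""
    if code[0] != 0xEB:
        raise ValueError("not a short jump")
    rel8 = code[1] - 0x100 if code[1] >= 0x80 else code[1]
    # The displacement is relative to the end of the 2-byte instruction.
    return (address + 2 + rel8) & 0xFFFF


def lea_direct(code):
    """Decode `8D modrm lo hi` where modrm selects a direct address."""
    if code[0] != 0x8D:
        raise ValueError("not LEA")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, rm) != (0, 6):
        raise ValueError("not a direct 16-bit address operand")
    return REG16[reg], code[2] | (code[3] << 8)


if __name__ == "__main__":
    # 0100  EB 0E 90  (jump, then a NOP filling the reserved third byte)
    print(hex(jmp_short_target(0x0100, bytes([0xEB, 0x0E]))))
    # 0103  the 13 data bytes
    text = bytes.fromhex("506F7374206E6F2062696C6C73")
    print(text.decode("ascii"), hex(0x0103 + len(text)))
    # 0110  8D 16 0103, stored in the file as 8D 16 03 01
    print(lea_direct(bytes.fromhex("8D160301")))
```

The jump target and the end of the string both come out at 0110h, the address of begin, and the LEA decodes to DX with offset 0103h, in agreement with the symbol table in the listing.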