cliffhanger@cup.portal.com (Cliff C Heyer) (01/12/90)
I wish Microsoft and MSDOS book writers would get their terms straight. For example, let's take "segment:offset". The sensible thing to do would be to use these words the way they are usually used in the English language, rather than invent "new" meanings. Specifically, segment means a 64K "segment" and offset means "offset within a 64K segment." But I guess this was "too easy" for whoever made the word decision. So I'll take this opportunity now to set the matter straight.

First of all, "segment" in segment:offset is NOT a segment. In fact, the number is meaningless in itself. The same goes for "offset". Let me explain.

To get the "real" segment, you must multiply (base 10 in this example) the so-called segment by 16 and add the offset. Then you must subtract 65,536 from this total repeatedly and stop just before you get a negative number. The number of times you subtract is the "real" segment number, and your remainder is the "real" offset address.

In hex, you multiply the so-called segment by 10h and add the offset. The first hex digit of your result is the "real" segment, and the "real" offset is the remaining four hex digits. To be true to standard "segment:offset" word use, 39D3:0ECE should be written 3:ABFE, or 3:44030 with the offset in base 10.

If people are smart enough to invent computers, they ought to be smart enough to get word use straight. I don't know what to call segment:offset, but certainly NOT segment:offset.

I realize that it is more efficient to convert "segment:offset" in its present form to a 20-bit real address than to store the "real" segment number 1-16 in the segment register. This is because currently only a bit-shift need be done to multiply the segment by 10h. With the real segment number in there, you'd have to multiply it by 65536 each time to get the segment address and consume more CPU cycles. However, this does not justify continued use of the same words.

Perhaps an astute observer could offer an explanation that would more easily allow conceptualization of what the current "segment:offset" really represents.
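
The arithmetic in the post above can be sketched in a few lines of Python. This is only an illustration and not part of the original thread: the function names `to_linear` and `split_64k` are made up for the example, and the 20-bit mask assumes an 8086 with 20 address lines.

```python
def to_linear(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment shifted left 4 bits, plus offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # 20 address lines, wraps at 1 MB


def split_64k(linear: int) -> tuple[int, int]:
    """Split a 20-bit address into a 64K block number and an offset inside it."""
    return linear >> 16, linear & 0xFFFF


linear = to_linear(0x39D3, 0x0ECE)
block, within = split_64k(linear)
print(f"{linear:05X} -> {block:X}:{within:04X} (offset {within} decimal)")
# prints: 3ABFE -> 3:ABFE (offset 44030 decimal)
```

The shift and mask in `split_64k` do the same job as the repeated subtraction of 65,536 described in the post.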
Ralf.Brown@B.GP.CS.CMU.EDU (01/12/90)
In article <25821@cup.portal.com>, cliffhanger@cup.portal.com (Cliff C Heyer) wrote:
}I realize that it is more efficient to convert
}"segment:offset" in its present form to a 20-bit real
}address than to store the "real" segment number 1-16 in
}the segment register. This is because currently only a
}bit-shift need be done to multiply the segment by 10h.
}With the real segment number in there, you'd have to
}multiply it by 65536 each time to get the segment
}address and consume more CPU cycles. However, this does
}not justify continued use of the same words.

Multiply by 65536 is also a bit-shift (16 bits).

}Perhaps an astute observer could offer an explanation
}that would more easily allow conceptualization of what
}the current "segment:offset" really represents.

Many mainframes have what is known as "base registers". To specify a memory address, you give a base register and an offset from that base register. This is precisely what the Intel 80x86 processors do. The segment register is loaded with a base address, and the offset specifies the distance (up to 64K) from that base address. That is also why programs don't use a 20-bit linear address, because they would simply have to split it up again every time it is needed.

A major reason for having base registers (and one reason for the Intel segment registers) is to support an address space which takes more bits than are present in a register. If Intel had defined the segment to represent "address / 256" (that is, a segment value multiplied by 256 rather than 16 to form the base), the 8086 would have been able to support 16M of address space at the cost of more wasted RAM (due to the graininess of only being able to start a segment every 256 bytes).

Another major reason for having base registers is to allow easy relocation of code.
If programs always used linear addresses, every memory reference in a program would have to be patched by the program loader to allow the program to execute at a different position in memory (on machines with virtual memory, you can play with the memory mapping instead, but the Intel family in real mode does not have virtual memory). When you have base registers, the program loader can simply set the base register to the starting location of the program in memory, and the program automatically references the proper real memory locations by specifying the offset from the base register. This is how tiny model (.COM) programs work under MSDOS. Other memory models (.EXE) need to be patched by the loader, but only need the segment register loads patched, not every single memory reference.

As for naming, "segment" implies a portion. From the American Heritage Dictionary:

ENTRY segment (SEG'muhnt) n.
MEANING 1. Any of the parts into which something can be divided.
2. Math. A

As used by Intel for the 8086, "segment" means any portion of memory up to 64K in size, starting at any address which is a multiple of 16 (under protected mode on the 80286, a segment has a descriptor which specifies its 24-bit linear starting address [it can start on any byte] and its length [1 byte to 64K bytes], and trying to access beyond the defined length causes a protection violation error).

--
UUCP: {ucbvax,harvard}!cs.cmu.edu!ralf -=- 412-268-3053 (school) -=- FAX: ask
ARPA: ralf@cs.cmu.edu BIT: ralf%cs.cmu.edu@CMUCCVMA FIDO: Ralf Brown 1:129/46
"How to Prove It" by Dana Angluin Disclaimer? I claimed something?
14. proof by importance: A large body of useful consequences all follow from the proposition in question.
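
The relocation argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not DOS's actual loader: `MEMORY`, `load_com_image`, `read_byte`, and the seven-byte image are hypothetical. The only fact relied on is that .COM programs are placed at offset 100h of their segment.

```python
MEMORY = bytearray(1 << 20)  # the 8086's 1 MB real-mode address space


def linear(segment: int, offset: int) -> int:
    """Base (segment * 16) plus offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def load_com_image(image: bytes, segment: int) -> None:
    """Copy an image to segment:0100h, where DOS places .COM programs."""
    start = linear(segment, 0x100)
    MEMORY[start:start + len(image)] = image


def read_byte(segment: int, offset: int) -> int:
    """Fetch one byte through a segment:offset pair."""
    return MEMORY[linear(segment, offset)]


# Hypothetical image: two NOP bytes followed by a data string.
IMAGE = b"\x90\x90HELLO"
DATA_OFFSET = 0x100 + 2  # offset fixed inside the program; never patched

for load_segment in (0x1000, 0x2ABC):
    load_com_image(IMAGE, load_segment)
    value = read_byte(load_segment, DATA_OFFSET)
    print(f"loaded at {load_segment:04X}: offset {DATA_OFFSET:04X} holds {chr(value)!r}")
```

The same offset reaches the same data wherever the image lands, because only the base in the segment register changes.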
johnl@esegue.segue.boston.ma.us (John R. Levine) (01/12/90)
In article <25821@cup.portal.com> cliffhanger@cup.portal.com (Cliff C Heyer) writes:
>To get the "real" segment, you must multiply (base 10
>this example) the so-called segment by 16 and add the
>offset. Then you must subtract 65,536 from this total
>repeatedly and stop just before you get a negative
>number. The number of times you subtract is the "real"
>segment number, and your remainder is the "real" offset
>address.

I suppose that's one way to look at it, but I find it more convenient to write my programs so that the segments I use are disjoint, keeping the knowledge of how segments map to linear addresses restricted to the memory allocator. Intel in its programming manuals has always urged people to do that.

If you look at the 286 and 386, you'll find that in protected mode the segments really are segments, and there is no architectural relationship between segment N and segment N+1 (well, actually N+8, but that's a separate argument.) If your programs are written believing that segments are segments, they are relatively straightforward to port to a protected environment. If you wire in knowledge of the 8086's 16-byte paragraphs, you're in trouble.

I am no fan of 16-bit segments and offsets, but if you have to deal with them, you might as well make the best of it.

--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."
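
The "N+8" aside follows from how a protected-mode selector is laid out: the low three bits hold the requested privilege level and the table indicator, and the remaining thirteen bits index a descriptor table. A minimal sketch of that decoding (the function name `decode_selector` and the dictionary layout are illustrative, not from the thread):

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected-mode selector into its three fields."""
    return {
        "index": selector >> 3,                        # descriptor table entry number
        "table": "LDT" if selector & 0x4 else "GDT",   # table indicator bit
        "rpl": selector & 0x3,                         # requested privilege level
    }


for selector in (0x0008, 0x0010, 0x000F):
    print(f"{selector:04X} -> {decode_selector(selector)}")
```

Selectors 0008h and 0010h name adjacent descriptors, yet nothing requires the two segments they describe to be adjacent in memory, or related in any way.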
pipkins@qmsseq.imagen.com (Jeff Pipkins) (01/13/90)
What you are saying is true, for the 8086. I suppose a better term would be base:displacement. But any way you slice it, the terminology is the least of the things they *SCREWED*UP*, imho.

With the advent of protected mode on the '286 and V86 mode on the '386, the value in the segment register really is a segment number, but a segment is no longer (necessarily) 64k bytes. It is a segment selector. This also presents real problems (no pun intended).

On the 8086, before there was an 80286, it was perfectly legitimate to normalize the address so that the segment was between 0 and F. Programs that use this technique will not run under '286 protected mode or V86 mode on the '386. This is the main reason that the term "DOS INcompatibility box" was coined.

[My employer may not share my opinions. Insert your favorite disclaimer here.]
jdudeck@polyslo.CalPoly.EDU (John R. Dudeck) (01/13/90)
In article <25821@cup.portal.com> cliffhanger@cup.portal.com (Cliff C Heyer) writes:
>Perhaps an astute observer could offer an explanation
>that would more easily allow conceptualization of what
>the current "segment:offset" really represents.

I share your frustration with trying to understand the 80x86 architecture. A lot of why it is this way has to do with the wonders of evolution...

Mentally I think of the offset as being exactly what it sounds like, the displacement into the segment starting from some base address. And a segment can be visualized as a 64k "window" into the address space of the memory. The starting address of that window is the value in the segment register being used. Since segment registers are only 16 bits long, and since the address space is 20 bits wide (in "real" mode), the segment register just contains the 16 most significant bits of the base address of the "window", and the 4 lsb's are always 0, resulting in the situation that segments are aligned on 16-byte boundaries.

Of course when you go into "protected" mode, or to the 386, the picture changes again...

Really, I don't see too much point in bashing the design decisions made by the designers. Every cpu ever designed is a combination of tradeoff decisions. I do feel it was too bad that IBM chose the 8088 for the PC. The National Semiconductor 16016 would have been a much better choice... or the 32032 even better yet!

--
John Dudeck                           "You want to read the code closely..."
jdudeck@Polyslo.CalPoly.Edu             -- C. Staley, in OS course, teaching
ESL: 62013975 Tel: 805-545-9549          Tanenbaum's MINIX operating system.
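
One consequence of the "window" picture is that windows starting 16 bytes apart overlap, so a single memory location has many segment:offset names. The following sketch counts them. The helper names `window` and `aliases` are invented for the example, and wraparound past 1 MB is deliberately ignored.

```python
def window(segment: int) -> tuple[int, int]:
    """First and last linear address visible through a segment's 64K window."""
    base = segment << 4  # the 4 least significant bits of the base are always 0
    return base, base + 0xFFFF


def aliases(linear_addr: int) -> list[tuple[int, int]]:
    """Every segment:offset pair naming one linear address (no wraparound)."""
    pairs = []
    segment = linear_addr >> 4
    offset = linear_addr & 0xF
    # Each step moves the window down one paragraph and the offset up 16 bytes.
    while segment >= 0 and offset <= 0xFFFF:
        pairs.append((segment, offset))
        segment -= 1
        offset += 16
    return pairs


low, high = window(0x39D3)
names = aliases(0x3ABFE)
print(f"window of 39D3: {low:05X}..{high:05X}")
print(f"3ABFE has {len(names)} names, including 39D3:0ECE: {(0x39D3, 0x0ECE) in names}")
```

For an address far enough from the bottom of memory, as here, 4096 different pairs name the same byte.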
rob@prism.TMC.COM (01/16/90)
pipkins@qmsseq.UUCP writes:
>With the advent of protected mode on the '286 and V86 mode on the '386,
>the value in the segment register really is a segment number, but a segment
>is no longer (necessarily) 64k bytes. It is a segment selector. This also
>presents real problems (no pun intended).
>On the 8086, before there was an 80286, it was perfectly legitimate to
>normalize the address so that the segment was between 0 and F. Programs
>that use this technique will not run under '286 protected mode or V86
>mode on the '386.

Perhaps a nit, but in V86 mode on the 386/486, the segment registers aren't treated as protected-mode style selectors, but as real-mode style pointers to memory paragraphs (though the physical memory they point to can be altered via the paging tables). That's the 'magic' of V86 mode; it lets real mode programs run under protected mode by using real-mode segment translation.

As a result, the normalization you mentioned (which is used in accessing 'huge' arrays) is acceptable under V86 mode, though not under standard 286/386/486 protected mode. Ideally, address normalization should be unnecessary under 386/486 protected mode anyway, since it allows segments to be arbitrarily large.

A type of normalization can be performed under 286 protected mode if the operating system sets up descriptor tables the right way (OS/2 does this with the various DosGetHuge...() functions). It's slow and cumbersome, but it works.
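
For readers unfamiliar with the normalization used for 'huge' arrays, here is a rough sketch of the usual real-mode form, in which the offset is reduced to 0..F and the rest of the address is folded into the segment. It illustrates the general technique rather than any particular compiler's runtime; `normalize` and `huge_add` are hypothetical names.

```python
def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite a real-mode pointer so that its offset lies in 0..F."""
    linear = ((segment << 4) + offset) & 0xFFFFF
    return linear >> 4, linear & 0xF


def huge_add(segment: int, offset: int, delta: int) -> tuple[int, int]:
    """Advance a pointer by delta bytes, carrying into the segment as needed."""
    linear = ((segment << 4) + offset + delta) & 0xFFFFF
    return linear >> 4, linear & 0xF


seg, off = normalize(0x39D3, 0x0ECE)
print(f"39D3:0ECE normalizes to {seg:04X}:{off:04X}")

seg, off = huge_add(0x1000, 0xFFFF, 1)  # step across a 64K boundary
print(f"1000:FFFF + 1 = {seg:04X}:{off:04X}")
```

Both helpers do arithmetic on the segment value itself. That works when the value is a paragraph number (real mode and V86 mode) and fails when it is a protected-mode selector.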
cliffhanger@cup.portal.com (Cliff C Heyer) (01/17/90)
Well, I guess I was blowin' off some steam when I posted that article. Gotta be more careful. I was the victim of vague books (Howard Sams Co.) which discuss a "segment" as a 64KB chunk only. Nowhere do they say that a segment refers to 16-byte chunks and that "offset" is NOT the offset within the "current segment" but is really just a pointer to any location above the base "address" of the current segment (segment x 10h).

Thanks for your assistance.

Cliff