efeustel@prime.com (Ed Feustel) (03/22/91)
As a vendor of segmented architectures, Prime has both an understanding of and an appreciation for the benefits and drawbacks of segments. Given a segment whose size can range from 1 byte to 4GB, and given enough of them on a per-process basis, one can construct a very good (efficient, secure, etc.) operating system.

The major difficulty with a segmented architecture in today's marketplace arises from the use of the language C and C's notion of a pointer as the total address. On a segmented architecture of the Multics variety, any address consists of a compound <S,N> (and probably <P,S,N>), where S is the segment number, N is an offset in bytes (or words), and P is protection information which might include the process number. C assumes that pointers are linear and monotonically increasing, so for C, I must add offsets to take me from one segment to another. Other languages do not have such a limited notion of what an address is, and can deal with the fact that an address is a structure rather than an elementary object.

If you are saddled with small segments, with N < 2**18 or so, you will soon come to hate segments, because you have to continually map C addresses onto multiple segments in order to support the linear model. With N ~ 2**32 this is much less of a problem, since the need for individual objects of this size is much reduced. One still has a problem with S < 4096 if one adopts the notion of one "object" per segment; this is true even if the address space has a range of unique segment numbers for each process as well as a range of segment numbers which are global. If S > 256K, this problem is much reduced. Even with Intel's scheme, the problem is not severe if the granularity of objects is large enough. Note that the size of Intel's address space is 2**13 * 2**32 bytes shared among all processes, plus n * 2**13 * 2**32 bytes, where n is the number of processes. This logical address space is mapped onto the currently implemented 2**32-byte paged address space, which is in turn mapped onto the 16MB to 64MB maximum imposed by the physical architecture of your PC.

Assuming that protection, sharing, and process address spaces are structured on the basis of segments, each of which has its own independent page table (which allows the segment/object to expand and contract independently of all other objects), highly reliable and efficient operating systems can be constructed which have the property that page-table overhead is minimized. It should be noted that Intel chose to implement a page table per process rather than a page table per segment on the 386/486; they did do "the right thing" on the 960XA used by BiiN and the military. By using the same page table for every process, sharing of the operating system, the code of shared libraries, etc. is enabled.

Another difficulty often cited is the requirement of loading segment registers and the cost of doing so. This is an artifact of Intel's 386/486 architecture/implementation, which was designed when silicon area was at a premium. An intelligent implementation in which attention is given to the use of segments need not exact such a high penalty for changing segments (as IBM and HP can attest); this is a trade-off between registers used for pointing and TLBs which contain the segment information. Thus the intelligent designer should try to discern why a feature is desired, what it costs, and whether its use can be exploited pervasively before discarding it based on experience with "an existing implementation".
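To make the compound-address idea concrete, such a pointer can be pictured in C as a structure rather than a bare integer (a sketch only; the field layout is illustrative, not Prime's or Multics' actual format):

    struct seg_ptr {
        unsigned p;   /* protection info, perhaps a process number */
        unsigned s;   /* segment number                            */
        unsigned n;   /* byte (or word) offset within the segment  */
    };
    /* "pointer + 1" must increment n, and only n; nothing in the C
       model says what happens when n runs off the end of segment s. */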
Of course I speak for myself and not for Prime when I advocate a re-examination of the benefits of segments.

Ed Feustel
Prime Computer
firth@sei.cmu.edu (Robert Firth) (03/26/91)
In article <efeustel.669650766@tiger1> efeustel@prime.com (Ed Feustel) writes:

>The major difficulty with a segmented architecture in today's marketplace
>arises from the use of the language C and C's notion of a pointer as the
>total address.

No. The major difficulty with a segmented architecture is that it's wrong, and the von Neumann model is right. This is not a language issue.

One of the most fundamental, and most pervasive, idioms in practical computing is the mapping function whose domain is a subset of the natural numbers, in other words

	array (0..max) of Object

This has been true of every language since before Formula Translation I, and will remain true for as long as we have integers and like to count. Yes, the integers are contiguous and monotonically increasing, and hence so are the array indices, and hence, on the natural memory model, so are the object addresses. Don't blame C - as Kronecker said, God made the natural numbers.

>If you are saddled with small segments, with N < 2**18 or so, you will soon
>come to hate segments, because you have to continually map C addresses onto
>multiple segments in order to support the linear model. With N ~ 2**32
>this is much less of a problem, since the need for individual objects of this
>size is much reduced.

The size of the segment is not the point. The point is that the physical memory is capable of holding an array of a certain size, but the addressing scheme won't let you index it. You have only to hit this problem once in a lifetime to vow never again to buy a machine with a segmented address structure.
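To see the complaint concretely on the 8086: with the "far" pointers of the common 16-bit C compilers (a vendor extension, not part of the language), pointer arithmetic acts only on the 16-bit offset, so a far pointer cannot index an array larger than 64K even when the memory to hold one is installed. A sketch:

	void clear_big(char far *p)
	{
	    long i;
	    for (i = 0; i < 100000L; i++)
	        p[i] = 0;   /* the offset wraps mod 2**16 after 65536
	                       iterations; bytes past 64K are never touched */
	}

("Huge" pointers fix this by normalizing after every operation, at a large cost in speed.)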
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/27/91)
In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

| The size of the segment is not the point. The point is that the
| physical memory is capable of holding an array of a certain size,
| but the addressing scheme won't let you index it. You have only
| to hit this problem once in a lifetime, to vow never again to buy
| a machine with a segmented address structure.

If your domain ever changes from edu to com, you will buy what's cost-effective, be it segmented, CISC, or three's-complement metric tetradecimal. And in some course or other you will probably find that there's a use for a nonlinear addressing scheme sometimes, and that if the segment size is at least as big as the maximum addressable physical memory, your argument above is pretty hard to make.

The 8086 is not the generic model of segmented addressing, and faults in one implementation are poor starting points for making a case against any idea or method.
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Most of the VAX instructions are in microcode, but halt and no-op
are in hardware for efficiency"
tbray@watsol.waterloo.edu (Tim Bray) (03/27/91)
firth@sei.cmu.edu (Robert Firth) writes:
>The major difficulty with a segmented architecture is that it's
>wrong, and the von Neumann model is right.
>...
>You have only
>to hit this problem once in a lifetime to vow never again to buy
>a machine with a segmented address structure.
This point is so important that I'm going to waste network bandwidth, occupy
the attention of hundreds, etc., with a posting whose meat is a mere:
RIGHT ON!
Tim Bray, Open Text Systems
koll@NECAM.tdd.sj.nec.com (Michael Goldman) (03/28/91)
Why Segmented Architectures Are Wrong

I could recite a litany of horror stories about how my segmented life was made intolerable by operating system bugs, application bugs, and compiler bugs ("- The Huge Model doesn't work in your version, but for $75 we'll send you version 4.1.3 Rev G, where it does. - Oh, you have a deadline? Well, for $25 more we can express it to you in just 2 weeks! - Really, Sir! Such Language!!"). Instead, I will simply point out that segments add complexity to programming, which results in bugs, which take time to find and to fix, which delays time-to-market, which costs money.

One can make theoretical arguments and claim that Intel's implementation was limited by the technology of its day, but in practice, such limits are what we will always be facing. The "Keep It Simple" vs. "Hey guys, let's put it in hardware!" battle will never end, and I'm not about to argue with all those Intel CPUs out there, but most programmers prefer simple architectures.

Of course, if you have a gazillion-customer market requiring a $1 solution, then the above yields to the virtues of an 80188.
renglish@cello.hpl.hp.com (Bob English) (03/28/91)
I suppose that I will get religion someday, but until then...

firth@sei.cmu.edu (Robert Firth) writes:

> No. The major difficulty with a segmented architecture is that it's
> wrong, and the von Neumann model is right. This is not a language
> issue. One of the most fundamental, and most pervasive, idioms in
> practical computing is the mapping function whose domain is a
> subset of the natural numbers, in other words
>	array (0..max) of Object

If by this you mean that the image presented to the programmer should allow large objects, I'd have to agree with you. I differ, however, with the equation of the data space a programmer perceives (that which the compiler provides) and the native architecture of the memory system. The two need not be the same.

"What about performance?" you scream in disgust (I can hear such screams from around the country even as I type). "What about performance?" I ask rhetorically, with a bemused look.

The segment sizes this forum has rejected out of hand address 4GB of memory. For all objects less than that segment size, a load of a segment register to access the object should take exactly one cycle per access to the object - less if the compiler/architecture team is wily enough to know how to avoid such things. In programs with less than 4GB of data (and there are a few of them available in the world), this segment register has to be loaded once per context switch, hardly significant in these days of large CPU contexts.

"But there are objects greater than 4GB," you cry, and move your fingers to the 'n' key, unable to bear this stupidity any longer.

"Of course there are," I answer. I would characterize such objects as belonging to three general types. The first is a large object accessed in a regular way, a large array or matrix, for example. Segment loading and unloading in such an object will be rare, because the compiler will know the segment boundaries and be able to optimize them out of the code. The second is a large object accessed unpredictably with no locality. While the compiler will not be able to predict the segmentation register in such cases, neither will the cache be able to hold the working set, so that miss penalties dominate the additional segment register loads. The third is a large object accessed unpredictably, but with a high degree of locality. In such cases, loads and stores take up to one additional instruction. Only in this case do segments make any difference in the performance of the machine, and even in this case the difference is small. I don't claim to be an expert in such matters, but I suspect the number of applications fitting this last category is small.

All of this analysis assumes, of course, that a multi-op implementation of a segmented architecture wouldn't have the ability to parallelize segment loads. Without that assumption, it's very difficult to characterize the types of applications where segmentation presents performance problems.

As far as I can see, the only time that a move to a non-segmented architecture is justified from a performance and functionality perspective is when the size of the segments is comparable to the size of the system's cache memory. With 4GB segments, that won't happen in the next few years.

There are other justifications, however. First, it could just be cheap to make segments larger (excuse me, eliminate them entirely). Second, it could be cheaper to eliminate segments than to fix the compiler to handle them correctly. Third, it could be that the current cost is not too high, and projections over the life of the architecture lead the designers to believe that 4GB caches will become important before the next architecture revision. Fourth, address/register size could be seen as a differentiator in the marketplace, leading designers to match the current "standard" in order to keep the customers listening.

--bob--
renglish@hplabs.hp.com
Not an official position of the Hewlett-Packard Company.
guy@auspex.auspex.com (Guy Harris) (03/28/91)
>C assumes that pointers are linear and monotonically increasing.

Well, many C programs do so. The ANSI C standard makes an effort not to demand that pointers refer to a linear address space with monotonically-increasing addresses; pointers may be compared only for equality, unless both pointers point into the same array, pointer+integer is defined mainly in terms of array indexing, and pointer-pointer is defined only if both pointers point into the same array.

It may be that, at least at present, programs of that sort are sufficiently common that you really *do* have to pretend the address space is one single huge array, even on machines where that model isn't natural. Maybe that'll change in the future; it'd certainly be nice if it did.

>If you are saddled with small segments, with N < 2**18 or so, you will soon
>come to hate segments, because you have to continually map C addresses onto
>multiple segments in order to support the linear model.

And also have to deal with "near" and "far" pointers, and multiple memory models, in programs that require more than a segment's worth of code or data - at least in C. What do other programming languages that support pointer-style data types do? Do they also have to deal with "near" and "far" pointers? Or, in the memory models with more than a segment's worth of data, do they just make pointers large?

>By using the same page table for every process, sharing of the operating
>system, the code of shared libraries, etc. is enabled.

I assume you mean "simplified" rather than "enabled". Sharing of the operating system, code of shared libraries, etc. is certainly possible on systems that have per-process page tables....
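The rules summarized above can be made concrete with a short fragment (the variables are hypothetical; the legality comments follow the ANSI standard, not any particular compiler):

	#include <stddef.h>

	int a[10], b[10];

	void example(void)
	{
	    int *p = &a[2], *q = &a[7], *r = &b[0];
	    int       eq = (p == r);  /* defined: equality may always be tested */
	    ptrdiff_t d  = q - p;     /* defined: same array; d == 5            */

	    /* p < r   is undefined: relational compare of pointers into
	               different objects                                 */
	    /* r - p   is undefined: subtraction across different arrays */
	    (void)eq; (void)d;
	}

On a segmented machine, a compiler is free to compare or subtract only the offset parts in the undefined cases - which is precisely the latitude the standard grants.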
<DXB132@psuvm.psu.edu> (03/29/91)
In article <6862@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) says:

>>C assumes that pointers are linear and monotonically increasing.

>Well, many C programs do so. The ANSI C standard makes an effort not to

I'm a little curious about this segmented stuff. What about, on a machine with 64-bit addresses, using the lower 32 bits of a pointer as the segment offset and the upper 32 bits as a segment number? Has this been done before? Can it be done efficiently on a "normal" MMU arrangement? Thanks for any answers...
-- Dan Babcock
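In C, Dan's split could be expressed with a few macros - a sketch only, assuming some 64-bit integer type is available ("long long" is a common compiler extension, not ANSI C):

	typedef unsigned long long vaddr;   /* 64-bit virtual address */

	#define SEG(a)      ((unsigned long)((a) >> 32))          /* upper 32: segment */
	#define OFF(a)      ((unsigned long)((a) & 0xFFFFFFFFUL)) /* lower 32: offset  */
	#define MKADDR(s,o) (((vaddr)(s) << 32) | (vaddr)(o))

A "normal" MMU would simply see the whole 64-bit number as a flat address; the segment structure would then be a software convention, which seems to be the heart of the question.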
rminnich@super.ORG (Ronald G Minnich) (03/29/91)
In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

>No. The major difficulty with a segmented architecture is that it's
>wrong, and the von Neumann model is right.

What does segmentation have to do with von-Neumann-ness or the lack of it?

Just curious, do you use Suns? If so, then since SunOS 4.0 you have been using a segmented architecture. Of course, the segmentation is provided by the SYSTEM, not the architecture, but... Are your comments based on experiences with segmentation done wrong (a la Intel, Burroughs)? Just wondering.

ron
-- 
"Socialism is the road from capitalism to communism, but we never promised to feed you on the way!" -- old Russian saying
"Socialism is the torturous road from capitalism to capitalism" -- new Russian saying (Wash. Post 9/16)
dafuller@sequent.UUCP (David Fuller) (03/29/91)
In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

>In article <efeustel.669650766@tiger1> efeustel@prime.com (Ed Feustel) writes:
>
>>The major difficulty with a segmented architecture in today's marketplace
>>arises from the use of the language C and C's notion of a pointer as the
>>total address.
>
>No. The major difficulty with a segmented architecture is that it's
>wrong, and the von Neumann model is right. This is not a language
>issue. One of the most fundamental, and most pervasive, idioms in
>practical computing is the mapping function whose domain is a
>subset of the natural numbers, in other words
>
>	array (0..max) of Object

I would tend to ally with Ed Feustel here; if you look at the 8086 scheme, it fits really well for Pascal: four segments, one each for code, data, heap and stack. The Pascal runtime system insulates you from the details of the machine, but there's no reason you can't have arbitrary-sized arrays as long as you use the runtime's idioms. If you insist on manipulating machine-level structures as pointers (an idea that still makes me queasy), then you get what you deserve (recalling the DG port I did where the high bit indicated a "char" pointer *shiver*).

C is a lousy fit on segmented architectures. I hope never to code another FAR pointer as long as I live. I hope also that previous work done on 286 Xenix remains unbuggy forever.

I would also question whether "unsegmented" is a necessary feature of von Neumann architectures.

Respectfully,
Dave
-- 
Dave Fuller                    Think of this as the hyper-signature.
Sequent Computer Systems       It means all things to all people.
(708) 318-0050 (humans)
dafuller@sequent.com
elg@elgamy.RAIDERNET.COM (Eric Lee Green) (03/29/91)
From article <1991Mar27.172325.10800@sj.nec.com>, by koll@NECAM.tdd.sj.nec.com (Michael Goldman):

> Why Segmented Architectures Are Wrong
>
> I could recite a litany of horror stories about how my segmented life was
> made intolerable by operating system bugs, application bugs, and compiler
> bugs ("- The Huge Model doesn't work in your version, but for $75 we'll

You're confusing Intel segments with "real" segments. Intel's basic problem is that their segment size is tiny. A "C" programmer (and "C" compiler) should not have to worry about segment size in most instances.

1) Shared libraries and other shared code. A shared library is a strange object. In a "flat" address space, you must either make it reside at a fixed address in EACH AND EVERY PROCESS, or you must specially write it with no, absolutely no, absolute references, so that it can reside at different locations in different processes. The latter may require a lot of special "smarts" on the part of the compiler and library writer. On a segmented machine, call the OS's "ObtainLibPtr" service to map the library segment into your "object space". The first <n> words of the segment might be the jump table for library routines (see the sketch following this article). Code can thus reside in different places in different processes, and no relocation or special non-absolute referencing modes need be employed.

2) Maintaining large objects that grow and shrink. In a sequential address space, often you can't "grow" an object because something else has been allocated in the addresses immediately afterwards. And thus you may end up re-allocating it somewhere else and possibly copying a whole lot of data. You could re-write your code to use some other data structure, true, but in many cases there's a decided speed disadvantage to doing that. Or you could decide to plunk your object into a part of the address space where you HOPE the rest of the program's data won't go, but at best that's a kludge; at worst you'll guess wrong. Segments represent an elegant and logical solution to this set of problems.

3) Mapping shared objects into the address space. This can be done on a machine with a "flat" address space, of course. But if you want to do 2) above - have a large shared object that shrinks and grows (let's say, perhaps, you want to share an editor buffer between the editor and compiler) - you have problems. If the shared object contains embedded addresses, e.g. it's a linked list or B-tree or other such data structure, you have even worse problems... basically, it can't be done, not without mapping the object into the same addresses in each address space (which has collision potential... what if some other desired object is also mapped at that same address?). The "solution", for flat address space people, is simply not to do it: to use shared memory only as an IPC mechanism rather than as a method of truly sharing objects.

The biggest obstacles confronting segmented architectures:

1) Intel gave segments a bad name.

2) "C" is set up to compile "flat" PDP-11-like code, and is not the sort of object-oriented language that would map naturally onto a segmented architecture.

3) Operating systems. Current "merchant" operating systems such as Unix are tied to "lowest common denominator" hardware, and do not have provisions for segmentation. Proprietary operating systems are expensive to design and build, and have the problem of attracting sufficient software to make them commercially viable (unless you're DEC and you just came out with VMS as the primary OS for the world's first "super-mini").

4) Complexity. Segment tables add an additional level of complexity to an MMU. The RISC folks, after stripping all the cruft out of their CPUs, aren't likely to consider putting object-oriented cruft back into their MMUs. After all, their business is Unix and "C", neither of which has any provision for handling segments.

Given the current predominance of Unix and "C", I don't see how segmentation could become an identifying characteristic of any new general-purpose computer architecture. This doesn't mean that segments are a bad idea, though. It just means that segmentation is not commercially viable at this time, under current conditions.
--
Eric Lee Green   (318) 984-1820   P.O. Box 92191   Lafayette, LA 70509
elg@elgamy.RAIDERNET.COM          uunet!mjbtn!raider!elgamy!elg
Looking for a job... tips, leads appreciated... inquire within...
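Following up point 1 above, here is a sketch in C of the jump-table convention (ObtainLibPtr is the hypothetical OS call named in the article; the table layout is illustrative, not any real ABI):

	typedef void (*libfn)(void);

	struct lib_seg {
	    libfn jump[32];    /* the first <n> words: one slot per routine */
	    /* library code and data follow, addressed segment-relative    */
	};

	extern struct lib_seg *ObtainLibPtr(const char *name);

	void use_library(void)
	{
	    struct lib_seg *lib = ObtainLibPtr("mathlib");
	    lib->jump[3]();    /* call routine 3: works wherever the segment
	                          is mapped, since every reference is
	                          segment-relative */
	}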
dana@locus.com (Dana H. Myers) (03/29/91)
In article <1991Mar27.172325.10800@sj.nec.com> koll@NECAM.tdd.sj.nec.com (Michael Goldman) writes:

> Why Segmented Architectures Are Wrong

[edited for brevity]

>I will simply point out that segments add
>complexity to programming, which results in bugs, which take time to find
>and to fix, which delays time-to-market, which costs money.
>
>One can make theoretical arguments and claim that Intel's implementation
>was limited by the technology of its day, but in practice, such limits are
>what we will always be facing. The "Keep It Simple" vs. "Hey guys, let's
>put it in hardware!" battle will never end, and I'm not about to argue
>with all those Intel CPUs out there, but most programmers prefer simple
>architectures.
>
>Of course, if you have a gazillion-customer market requiring a $1 solution,
>then the above yields to the virtues of an 80188.

The segmented vs. linear addressing architecture argument is moot. Changes in the 80386 allow one to effectively ignore the segments and use linear addresses. System V/386 does this, AIX-PS/2 does this, etc.

Further overzealous condemnation of Intel CPUs is pointless and rhetorical, especially given that Intel has left the segmented architecture behind in the 1980's. The 80860 and 80960 are, functionally speaking, not segmented machines.
-- 
* Dana H. Myers KK6JQ  | Views expressed here are     *
* (213) 337-5136       | mine and do not necessarily  *
* dana@locus.com       | reflect those of my employer *
ig@caliban.uucp (Iain Bason) (03/29/91)
This whole discussion on segmented architectures is getting a little confusing. The problem is that most posters seem to be drawing conclusions about segmentation in general based upon their knowledge of particular segmented architectures. Now, there's nothing wrong with basing one's opinions on one's experience. However, I for one am not very familiar with any segmented architectures, and I'm having trouble trying to discern what these various architectures look like.

So, why don't we try to debate several specific issues separately? For instance,

(a) Should high-level languages try to hide the nature of machine addressing from the programmer? (Of course, that can bring on a debate over whether C is a high-level language, and we can waste some more bandwidth.)

(b) Should the segment number be manipulated separately from the offset (i.e., should we have segment registers)?

(c) What should happen when a pointer in one segment is subtracted from a pointer in another segment?

(d) What should happen when the addition of an integer to a pointer results in the overflow of the pointer's offset part?

(e) Should segment sizes be fixed or variable? That is, should the number of bits devoted to the offset in a pointer be fixed or variable?

(f) What impact will the answers to the above questions have on cache design, MMU design, or world peace?

My best guesses right now:

(a) No. (And maybe :->.)

(b) It depends on how many segment registers you allow. With a sufficient number the compiler can avoid swapping segment registers. However, the only benefit I can think of for keeping them separate is to reduce the amount of memory a pointer consumes, which doesn't really seem that important these days. The big problem (as far as I can tell) is that a pointer aliases a number of different objects. That is, pointers don't uniquely identify objects.

(c) I don't know. It seems simplest just to concatenate the segment and offset and treat the combination as an integer.

(d) Anything but wrap the offset around without changing the segment. I don't see how that can make sense in any reasonable model.

(e) Fixed and large seems usable and implementable.

(f) I don't know. Nothing obvious here that I can see.

By the way, I am not convinced that segmentation is a good thing, regardless of the answers to these questions. I hope that by considering various aspects of segmentation we can decide what benefits it can bring, and what costs it bears.
-- 
Iain Bason
..uunet!caliban!ig
efeustel@prime.com (Ed Feustel) (03/29/91)
The BiiN architecture used 32 bits for segments + protection info and 32 bits as a byte offset.
efeustel@prime.com (Ed Feustel) (03/29/91)
I think this article has the best suggestion for followup that I have seen on Comp.Arch in some time.
mash@mips.com (John Mashey) (03/31/91)
In article <1991Mar29.011956.2801393@locus.com> dana@locus.com (Dana H. Myers) writes:
...
> The segmented vs. linear addressing architecture argument is moot.
>Changes in the 80386 allow one to effectively ignore the segments and
>use linear addresses.
>
> System V/386 does this, AIX-PS/2 does this, etc.
>
> Further overzealous condemnation of Intel CPUs is pointless and
>rhetorical, especially given that Intel has left the segmented architecture
>behind in the 1980's. The 80860 and 80960 are, functionally speaking, not
>segmented machines.

Well, not quite. People shifted to 32-bit flat as soon as they could on the 386/486, but the chips clearly include a 48-bit (16+32) segmented address scheme as well. Here are a few interesting questions:

1) Does any software in common use actually make use of the segmentation to get significantly more than 32-bit addresses? (I.e., I mean more than, perhaps, dedicating a segment to code and one to data, and maybe one to stack?) [I hope to get answers to this one.]

2) The 586 is reputed to be a 64-bit architecture. Does this mean that the 16+32 scheme is abandoned, or that it is included along with >32-bit flat addressing? [I don't expect an answer on that; it is an interesting question.]
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.com (John Mashey) (03/31/91)
In article <00670208556@elgamy.RAIDERNET.COM> elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:

>2) Maintaining large objects that grow and shrink. In a sequential address
>space, often you can't "grow" an object because something else has been
>allocated in the addresses immediately afterwards. And thus you may end up

Actually, I don't think this is quite right. Consider the difference between a scheme that has X-bit segment numbers and Y-bit byte addresses within the segment, and compare it with one that has an X+Y-bit flat address space.

In the first case, using typical designs, you get 2**X segments of size 2**Y, which usually means that objects are CONVENIENTLY 2**Y maximum size. The X+Y-bit flat address machine can simulate the same thing rather conveniently... On the other hand, the X+Y-bit flat machine can provide 2**(X-1) segments of size 2**(Y+1), 2**(X+1) segments of size 2**(Y-1), etc.

In both cases, if things get larger than the space reserved, you have to work harder, but in general, the flat-addressing machine may have the convenience of variable granularity. The segmented design may, or may not.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
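The variable-granularity point can be shown in a few lines of C (a sketch, assuming a 64-bit "long long" extension; the macros and the 48-bit example split are illustrative):

	typedef unsigned long long flat;   /* an X+Y-bit flat address, say 48 bits */

	/* A flat machine can choose the segment/offset split per use: */
	#define SEG(a, ybits)  ((a) >> (ybits))
	#define OFF(a, ybits)  ((a) & (((flat)1 << (ybits)) - 1))

	/* With X+Y = 48: ybits = 32 gives 2**16 segments of 4GB (the fixed
	   hardware case); ybits = 33 gives 2**15 segments of 8GB; ybits = 31
	   gives 2**17 segments of 2GB. Fixed hardware segmentation offers
	   only the first carving. */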
mash@mips.com (John Mashey) (03/31/91)
Note: people interested in this topic should especially consider attending the ASPLOS panel run by Dave Patterson, which includes a panel and audience discussion of several topics, including segmentation for >32 bits.

In article <1991Mar27.193512.12417@cello.hpl.hp.com> renglish@cello.hpl.hp.com (Bob English) writes:
...
>I would characterize such objects as belonging to three general types.
>The first is a large object accessed in a regular way, a large array or
>matrix, for example. Segment loading and unloading in such an object
>will be rare, because the compiler will know the segment boundaries and
>be able to optimize them out of the code.

I don't quite understand this, but I could be convinced. In fact, this could lead to an interesting discussion. Let me suggest the simplest conceivable comparison, which is to take the inner loop of the rolled DAXPY routine from linpack - code included later, but whose salient feature is:

	do 30 i = 1,n
	  dy(i) = dy(i) + da*dx(i)
   30	continue

where dy, dx, da, and n all arrive to the code as arguments. Maybe someone would post the likely code, for the loop above, for an architecture with segmentation (HP PA would be interesting, as the scheme seems generally well-thought-out, and HP's compilers are good), for the following cases:

1) Standard HP-UX, i.e., what do you get if you assume flat addressing?

2) What you would get, if dy and dx can be in separate segments, and neither is >4GB? (Easy case: just load up 2 segment regs, once.)

3) What you need to do in the general case, which is that either dx or dy, or both, could be >4GB, or (enough to cause the problem) that either or both cross segment boundaries? (I think this code either takes the easy way out, and does 2 segment manipulations per iteration, or else gets compiled into something much more complex, but I can be convinced.)

Recall that the likely situation to be faced is that some FORTRAN programmer is told they can have bigger arrays, and they simply set the sizes of the arrays up, recompile, and want it to work. Note also that FORTRAN storage allocation has certain implications for what you can and can't do regarding rearrangement of where data is.

(Also, a question: I assume on HP PA implementations that Move-to-Space-Register instructions are 1-cycle operations, with no additional latency needed before a load/store? Hmm. Another question, since PA has 4 Space Registers that user code can play with (I think): are there conventions for their use, i.e., like callee-save / caller-save conventions for the regular registers? Or are they all caller-save? I ask because the code for

	do 30 i = 1,n
	  dy(i) = dy(i) + da*dx(i)
   30	continue

AND

	do 30 i = 1,n
	  dy(i) = dy(i) + da*dx(i)
	  call function(da)
   30	continue

could look rather different in their ability to just set the Space registers and be done with it.)

>The second is a large object accessed unpredictably with no locality.
>While the compiler will not be able to predict the segmentation register
>in such cases, neither will the cache be able to hold the working set,
>so that miss penalties dominate the additional segment register loads.

Agreed. If there is no locality, cache and TLB missing eats the machine.

>The third is a large object accessed unpredictably, but with a high
>degree of locality. In such cases, loads and stores take up to one
>additional instruction. Only in this case do segments make any
>difference in the performance of the machine, and even in this case the
>difference is small. I don't claim to be an expert in such matters, but
>I suspect the number of applications fitting this last category is small.

DBMS, and other things that follow pointer chains around. Conventional wisdom says that loads+stores are 30% of the code, and so some subset of these incur at least 1 extra cycle. However, I suspect that in the general case, you have to keep track of the segment numbers, and pass them around, just like you do on X86 with far pointers, and hence there are more instructions; in addition, you need to keep the space numbers around in integer registers for speed in some cases. (Note that every pointer reference is conceptually 64 bits, and hence every pointer argument needs 2 32-bit quantities, and probably close to 2X more instructions to set up.)

Also, consider the code on a 32-bit machine for:

	*p = *q;

where both p and q are pointers to pointers, and both start in memory. This would typically look like (on a typical 32-bit RISC):

	load	r1,q
	load	r2,p
	load	r3,0(r1)
	store	r3,0(r2)

I think this turns into, on something like HP PA (but correct me if I'm wrong), and assuming that C pointers turn into 64-bit things:

	load	r1,q
	load	r4,q+4		get SPACE ID
	movetospaceregister	r4,somewhere1
	load	r2,p
	load	r5,p+4		get SPACE ID
	movetospaceregister	r5,somewhere2
	load	r3,0(r1)	and do whatever you have to to get somewhere1
	load	r6,4(r1)	get SPACE ID
	store	r3,0(r2)	save the pointer; do what you must to get somewhere2
	store	r6,4(r2)	save the SPACE ID

In this case, 4 instructions have turned into 10. I wouldn't pretend this example is typical or not, and I'd expect compilers to do better, but it is illustrative of what could happen.

Anyway, to get some serious analysis of this, I think one has to look at code sequences under various assumptions, and see
a) What speed is obtainable by perfect hand-code?
b) How likely are compilers to get there?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (03/31/91)
In article <00670208556@elgamy.RAIDERNET.COM> elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:

>This doesn't mean that segments are
>a bad idea, though.

Architectural support for segmentation is a bad idea. On a flat-addressed architecture, segments can be done easily in software.

>3) Mapping shared objects into the address space.
> This can be done on a machine with a "flat" address space, of course.
>But if you want to do 2) above - have a large shared object that shrinks and
>grows (let's say, perhaps, you want to share an editor buffer between the
>editor and compiler) - you have problems. If the shared object contains
>embedded addresses, e.g. it's a linked list or B-tree or other such data
>structure, you have even worse problems... basically, it can't be done, not
>without mapping the object into the same addresses in each address space
>(which has collision potential... what if some other desired object is also
>mapped at that same address?). The "solution", for flat address space
>people, is simply not to do it: to use shared memory only as an IPC
>mechanism rather than as a method of truly sharing objects.

With a segmented architecture, to have embedded addresses, you have the problem of how to specify the segment number.

If the segment number is also embedded (which is unavoidable for an inter-segment pointer), the number must have the same meaning in all related processes. If that is possible, it is also possible, on a flat-addressed machine, to assign the same addresses to simulated segments. BTW, not assigning the same virtual address to the shared memory is often impossible anyway, because on some architectures an aliasing problem related to the cache and virtual-to-physical address translation exists.

If the segment number is implicitly specified, it can be simulated, on a flat-addressed machine, with embedded address offsets and a base register. Adding a base register in the address calculation is only as slow as adding a segment register.

						Masataka Ohta
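The base-register simulation fits in a couple of lines of C (a sketch; the table size and names are arbitrary, and a real system would fill seg_base from its memory mapper):

	#define NSEG 4096
	static char *seg_base[NSEG];        /* one base per simulated segment */

	#define SEG_ADDR(s, off)  (seg_base[(s)] + (off))
	/* Every access costs one add - as the article says, no slower
	   than adding a hardware segment register. */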
dmocsny@minerva.che.uc.edu (Daniel Mocsny) (03/31/91)
In article <1991Mar27.172325.10800@sj.nec.com> koll@NECAM.tdd.sj.nec.com (Michael Goldman) writes:

>Instead, I will simply point out that segments add
>complexity to programming, which results in bugs, which take time to find
>and to fix, which delays time-to-market, which costs money.

Incidentally, so does having more than one architecture to support. If the solution to a segmented architecture is a segmented market, one has to wonder whether that is a step forward or backward.

Almost everybody probably can agree that segments do lousy things to the programmer. (Even though I do my best to hide behind compilers, I've done brilliant things like linking in the wrong memory model of a function library, which generated an incomprehensible linker error, for which the reference manual explanation was completely misleading, and cost me about a day to figure out where I screwed up.) However, what is the best way to fix the segment problem? By having 50 dozen superior new architectures to grapple with?

So here is a question. Which would you rather program, 1 lousy architecture, or N nice but mutually incompatible architectures? For what minimum value of N does the 80x86 turn out to be simpler? My guess is that today the 80x86 is by far the simplest architecture to program *per customer*.

All the more reason to develop completely portable language and user-interface standards, and then let the hardware vendors compete to see how well they can run generic programs. Instead of having hardware vendors compete to see how many programmers they can capture.
-- 
Dan Mocsny
Internet: dmocsny@minerva.che.uc.edu
amos@SHUM.HUJI.AC.IL (amos shapir) (03/31/91)
[Quoted from the referenced article by renglish@cello.hpl.hp.com (Bob English)]

>The segment sizes this forum has rejected out of hand address 4GB of
>memory. For all objects less than that segment size, a load of a
>segment register to access the object should take exactly one cycle per
>access to the object.
>In programs with less than 4GB
>of data (and there are a few of them available in the world), this
>segment register has to be loaded once per context switch, hardly
>significant in these days of large CPU contexts.

One case you forgot is that of many small segments, which together amount to more than one segment size. You could end up thrashing between different segments even if no single object is big enough to overflow a segment; all the arguments about big objects do not hold in this case.
-- 
Amos Shapir    amos@shum.huji.ac.il
The Hebrew Univ. of Jerusalem, Dept. of Comp. Science.
Tel. +972 2 585257  GEO: 35 14 E / 31 46 N
jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) (04/01/91)
In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

>The size of the segment is not the point. The point is that the
>physical memory is capable of holding an array of a certain size,
>but the addressing scheme won't let you index it. You have only
>to hit this problem once in a lifetime, to vow never again to buy
>a machine with a segmented address structure.

	for (i=100; --i>=0; ) {
		repeat_after_me("All the world is not a VAX.\n");
	}

The ordering of bytes in a word, or the numbering of bits in a byte, is not ordained by Natural Law. If you don't assume that pointers and integers should be interchangeable as a matter of Natural Law, all things are possible. How do you address memory space greater than the size of a machine register?
-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.med.uth.tmc.edu  | adequately be explained by stupidity.
"You can even run GNUemacs under X-windows without paging if you allow
about 32MB per user." -- Bill Davidsen   "Oink!" -- me
DXB132@psuvm.psu.edu (04/01/91)
In article <7920@uceng.UC.EDU>, dmocsny@minerva.che.uc.edu (Daniel Mocsny) says:

>Almost everybody probably can agree that segments do lousy
>things to the programmer. (Even though I do my best to hide behind

What segmentation scheme are you talking about? Let me expound a little on a segmentation scheme mentioned earlier. You have 64-bit addresses, with 32 bits of offset and 32 bits of segment number. There are no programmer-visible segment registers, no "memory models" or such crap.

This kind of scheme solves some sticky problems. For example, it offers a solution to memory fragmentation. Each allocated memory region is assigned a unique number (the segment number), and the application manipulates only the offset. The OS can move memory regions around in physical memory to eliminate fragmentation. Also, we can make these segments an exact length, not necessarily a multiple of 4K as in paging schemes. That may sound a little inefficient compared with paging, but your Unix system crashing after a few weeks due to memory fragmentation has to be inefficient too.

What do you think; am I too idealistic? :-)
-- Dan Babcock
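A sketch of the compaction this scheme permits (the structures are illustrative; a real OS would do the move beneath its segment tables, not with memmove):

	#include <string.h>

	struct seg {
	    char          *phys;   /* where the segment currently lives    */
	    unsigned long  len;    /* exact byte length, not a 4K multiple */
	};

	static struct seg segtab[65536];

	/* Applications hold only (segment, offset) pairs, so the OS may
	   slide a segment to squeeze fragmentation out: */
	void relocate(struct seg *s, char *newplace)
	{
	    memmove(newplace, s->phys, s->len);
	    s->phys = newplace;    /* no application "pointer" ever changes */
	}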
ckp@grebyn.com (Checkpoint Technologies) (04/01/91)
Suppose you took a machine with a very large pointer; 32 bits will do for argument's sake, but you could imagine this with 48 or 64 if you like. Then let's say the operating system permits an application to have a sparse virtual address space. Then applications could choose some number of upper address bits and designate those as "segment numbers", and the rest of the bits as "offset".

Now, what significant differences exist between this and a "real" segmented machine? I can't think of any offhand...
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / / 
ckp@grebyn.com \\ / / Then, the disclaimer: All expressed opinions are, indeed, opinions. \ / o Now for the witty part: I'm pink, therefore, I'm spam! \/
peter@ficc.ferranti.com (Peter da Silva) (04/01/91)
In article <7920@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:

> So here is a question. Which would you rather program, 1 lousy
> architecture, or N nice but mutually incompatible architectures?

N nice and effectively compatible ones. Outside of the 80x86 family, all my big portability problems are caused by differences in *software* architectures or buggy code. The 80x86 is the only one where irreconcilable hardware differences show up. This includes multiple operating systems and compilers.

My biggest headache right now is moving stuff from one compiler to another on the same revision of the same computer... one of the compiler vendors implemented ANSI prototypes in a really lax manner, and added a bunch of extra keywords. Gol darn caniglyoon razzafrazzing Lattice C.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"
kittlitz@granite.ma30.bull.com (Edward N. Kittlitz) (04/01/91)
In article <1991Apr1.045051.3220@grebyn.com> ckp@grebyn.com (Checkpoint Technologies) writes:

>Suppose you took a machine with a very large pointer; 32 bits will do
>for argument's sake, but you could imagine this with 48 or 64 if you
>like. Then let's say the operating system permits an application to
>have a sparse virtual address space. Then applications could choose
>some number of upper address bits and designate those as "segment
>numbers", and the rest of the bits as "offset".
>
>Now, what significant differences exist between this and a "real"
>segmented machine? I can't think of any offhand...

Segments give you access control. The 386 will let you put multiple segments within one page, each with differing access rights. I believe that such an architecture may provide a convenient way of implementing protected object-oriented systems. It would be better if they had a TLB instead of the one-per-segment-register 'shadow registers'/descriptor cache. (I must admit I don't know if there is a TLB in the 486.)
----------
E. N. Kittlitz   kittlitz@world.std.com / kittlitz@granite.ma30.bull.com
Contracting at Bull, but not alleging any representation of their philosophy.
efeustel@prime.com (Ed Feustel) (04/02/91)
One of the better uses for segments is when the segment is of variable size, with the size tailored to the object that is represented by the segment. If each segment has its own page table, then the segment can grow or contract independently of all other segments, as was suggested in a previous article on the subject.

One should not be forced to have a segment which is 2**y bytes long. One should have a segment that is n bytes, where n is the size of the object. One can compromise this to segment sizes which are multiples of words or pages in order to improve performance. A stack can use this feature in that a segment fault for length should result when one attempts to step off the segment.
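In outline, and only as a sketch (a real machine would make this check part of every load and store, not a subroutine), an exact-length segment descriptor looks like:

	extern void segment_fault(void);    /* trap to the OS */

	struct seg_desc {
	    char          *base;
	    unsigned long  length;          /* exact object size in bytes */
	};

	char fetch(struct seg_desc *d, unsigned long off)
	{
	    if (off >= d->length)
	        segment_fault();            /* e.g. a stack stepping off
	                                       the end of its segment */
	    return d->base[off];
	}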
barmar@think.com (Barry Margolin) (04/02/91)
In article <1991Apr1.045051.3220@grebyn.com> ckp@grebyn.com (Checkpoint Technologies) writes:

>Suppose you took a machine with a very large pointer; 32 bits will do
>for argument's sake, but you could imagine this with 48 or 64 if you
>like. Then let's say the operating system permits an application to
>have a sparse virtual address space. Then applications could choose
>some number of upper address bits and designate those as "segment
>numbers", and the rest of the bits as "offset".
>
>Now, what significant differences exist between this and a "real"
>segmented machine? I can't think of any offhand...

My experience with "real" segmented machines is limited to Multics on Honeywell Level 68 and DPS8 hardware. In this architecture and OS, segments are used to manage memory sharing and file mapping.

In the case of shared memory, the entry in the segment table for each process using a shared segment points to the same segment descriptor in the kernel, and the segment descriptor contains the page table entries. This way, when the segment grows or shrinks, all the processes see the change; if this were done using per-process page tables, there would have to be a routine that goes around updating all the processes' page tables (and what happens if one of the processes didn't leave enough room after the shared memory?). The use of real segments in memory management serves the same purpose as inodes in the file system: processes are like directories, segment descriptors are like inode numbers, and page tables are like inodes.

The relevance to file mapping is that protection modes are implemented at the segment level, rather than at the page level. A process either has read-write, read-only, or no access to all of a segment. Since segment tables tend to be smaller than page tables, this probably reduces the amount of silicon needed to implement memory protection. Since Multics has a fairly elaborate memory protection system (in addition to the aforementioned read-only vs. read-write, there are also protection rings), this was probably an important simplification. Since it's likely that the necessary protection of all of a segment will be the same, the lost flexibility can be negligible (although Multics did need to special-case gate segments, which an outer-ring caller could only execute by transferring to certain offsets, in order to guarantee that the appropriate entry sequence was executed).
-- 
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
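A rough C picture of the sharing structure described above (all type and field names here are invented for illustration, not Multics'):

	struct pte { unsigned frame, flags; };

	struct seg_descriptor {          /* one per segment, system-wide  */
	    unsigned long  length;
	    int            access;       /* read-write, read-only, none... */
	    struct pte    *pages;        /* the single, shared page table */
	};

	struct process {
	    struct seg_descriptor *segtab[1024]; /* per-process table: all
	                         sharers point at the SAME descriptor, so
	                         growth is seen by every process at once */
	};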
huck@aspen.IAG.HP.COM (Jerry Huck) (04/02/91)
Let me try to explain some of the ways PA-RISC is used by HP-UX, and its relationship to segmentation. But first, a couple of notes on PA-RISC segmentation.

PA-RISC uses segmentation to extend the addressability of the normal general register file. It is not a partition of those registers into pieces. Segments are 2^32 bytes in size and give capability in several areas. At the point when register sizes increase (such as the R4000 path), one expects the segment size to increase. The crucial trade-offs are in silicon area for register files, datapaths, and ALUs - that is, the pieces of the CPU that must be widened to accommodate larger flat addressing. So for HP, segmentation was not a trade-off against flat addressing, but rather: is it useful to extend beyond the maximum flat addressing you can support in your general register file? At the time, 1982-1983, 32-bit general registers gave at least a ten-year horizon. Wider registers would have resulted in non-competitive machines in the existing technology.

I think most of the arguments against segmentation assume you give up some flat addressing to get it. That's not necessary. The inclusion of segmentation offered an efficient scheme to extend addressability with little hardware cost. All the hardware support for this extended addressing is well partitioned in the TLB control, with no worse cycle-time cost than the process ID extensions found in per-process TLBs (we assumed flushing the TLB on context switch is to be avoided).

The primary benefactors are the OS and database subsystems. The presence of segmentation (what we call long addresses) is not exposed to the programs (not to mention that languages have no way to talk about segmentation). We find many situations where objects remain <2^32 in size yet the aggregate space greatly exceeds 2^32. Larger objects can be managed if some additional structure exists. For example, a large database can span multiple segments when all database accesses deal with page-size buckets (not uncommon). There are many ways to solve all these problems; we found segmentation in PA-RISC to be very effective in dealing with these applications.

In comp.arch, mash@mips.com (John Mashey) writes:

> In article <1991Mar27.193512.12417@cello.hpl.hp.com> renglish@cello.hpl.hp.com (Bob English) writes:
> ...
> >I would characterize such objects as belonging to three general types.
> >The first is a large object accessed in a regular way, a large array or
> >matrix, for example. Segment loading and unloading in such an object
> >will be rare, because the compiler will know the segment boundaries and
> >be able to optimize them out of the code.

> I don't quite understand this, but I could be convinced. In fact, this
> could lead to an interesting discussion. Let me suggest the simplest
> conceivable comparison, which is to take the inner loop of the rolled
> DAXPY routine from linpack - code included later, but whose salient
> feature is:
>	do 30 i = 1,n
>	  dy(i) = dy(i) + da*dx(i)
>  30	continue
> where dy, dx, da, and n all arrive to the code as arguments.
> Maybe someone would post the likely code, for the loop above, for an
> architecture with segmentation (HP PA would be interesting, as the scheme
> seems generally well-thought-out, and HP's compilers are good), for the
> following cases:

In general, you would not attempt to let objects (especially Fortran arrays) span segment (what we call space) boundaries, and you would not generate run-time checks for crossing.
As suggested above, we generally confine normal objects to a single flat space of 32 bits.

> 1) Standard HP-UX, i.e., what do you get if you assume flat
> addressing?

Nothing unusual. The normal loads and stores one normally expects. HP-UX only presents the short (roughly flat) addressing mode to the user. There's a little complication with short addressing that might create short-pointer-to-long-pointer conversions (2 instructions) when the compiler is not sure if zero-based array addressing would wrap into another short pointer quadrant.

> 2) What you would get, if dy and dx can be in separate segments,
> and neither is >4GB? (Easy case: just load up 2 segment regs,
> once.)

On HP-UX this is speculation, since we don't support it. But if we did, then the sequence would be something like:

	<load up the long pointers>
	<move the segment number of dy into one of four segment registers>
	<move the segment number of dx into one of four segment registers>
	<any other loop setup stuff: trip counts, indexes...>
   loop:
	fldws,ma 8(segmentdxreg,dxbasereg),dxreg  ;get value and skip to next
	fldws	 (segmentdyreg,dybasereg),dyreg   ;get value
	fmul,dbl dareg,dxreg,mulreg
	fadd,dbl mulreg,dyreg,dyreg
	addib,<  1,tripcount,loop
	fstws,ma dyreg,8(segmentdyreg,dybasereg)

> 3) What you need to do in the general case, which is that either
> dx or dy, or both, could be >4GB, or (enough to cause the problem)
> that either or both cross segment boundaries?
> (I think this code either takes the easy way out, and does
> 2 segment manipulations per iteration, or else gets compiled into
> something much more complex, but I can be convinced.)

As suggested earlier, this is not what we use segmentation for. If you need >32-bit indexes, you probably need >32-bit registers. If common objects are bigger than 2^32 bytes, then you would want >32-bit flat addressing. At least simulating this on PA-RISC would be faster than on any other shipping RISC microprocessor :-). (Well, at least SPARC, MIPS, 88K, and RS6000.) Of course that doesn't matter; if it's important, you'll want flat addressing that does it more simply.

> Recall that the likely situation to be faced is that some FORTRAN
> programmer is told they can have bigger arrays, and they simply set the
> sizes of the arrays up, recompile, and want it to work. Note also that
> FORTRAN storage allocation has certain implications for what you can and
> can't do regarding rearrangement of where data is. (Also,
> a question: I assume on HP PA implementations that Move-to-Space-Register
> instructions are 1-cycle operations, with no additional latency needed
> before a load/store? Hmm.

I'm not sure on that. I would not spend much silicon making that super-fast, given the typical use.

> Another question, since PA has 4 Space Registers
> that user code can play with (I think): are there conventions for their
> use, i.e., like callee-save / caller-save conventions for the regular
> registers? Or are they all caller-save?

sr0, sr1, sr2 are caller-saves; sr3, sr4 are callee-saves; and sr5, sr6, sr7 are managed by the OS and not writable by the user.

> >The second is a large object accessed unpredictably with no locality.
> >While the compiler will not be able to predict the segmentation register
> >in such cases, neither will the cache be able to hold the working set,
> >so that miss penalties dominate the additional segment register loads.

> Agreed. If there is no locality, cache and TLB missing eats the machine.
> >The third is a large object accessed unpredictably, but with a high
> >degree of locality. In such cases, loads and stores take up to one
> >additional instruction. Only in this case do segments make any
> >difference in the performance of the machine, and even in this case the
> >difference is small. I don't claim to be an expert in such matters, but
> >I suspect the number of applications fitting this last category is small.

> DBMS, and other things that follow pointer chains around.
> Conventional wisdom says that loads+stores are 30% of the code,
> and so some subset of these incur at least 1 extra cycle.
> However, I suspect that in the general case, you have to keep track
> of the segment numbers, and pass them around, just like you do
> on X86 with far pointers, and hence there are more instructions;
> in addition, you need to keep the space numbers around in integer
> registers for speed in some cases. (Note that every pointer reference
> is conceptually 64 bits, and hence every pointer argument needs 2
> 32-bit quantities, and probably close to 2X more instructions to set up.)
> Also, consider the code on a 32-bit machine for:
>	*p = *q;
> where both p and q are pointers to pointers, and both start in memory.
> This would typically look like (on a typical 32-bit RISC):
>	load	r1,q
>	load	r2,p
>	load	r3,0(r1)
>	store	r3,0(r2)
> I think this turns into, on something like HP PA (but correct me if I'm
> wrong), and assuming that C pointers turn into 64-bit things:
>	load	r1,q
>	load	r4,q+4		get SPACE ID
>	movetospaceregister	r4,somewhere1
>	load	r2,p
>	load	r5,p+4		get SPACE ID
>	movetospaceregister	r5,somewhere2
>	load	r3,0(r1)	and do whatever you have to to get somewhere1
>	load	r6,4(r1)	get SPACE ID
>	store	r3,0(r2)	save the pointer; do what you must to get somewhere2
>	store	r6,4(r2)	save the SPACE ID
> In this case, 4 instructions have turned into 10. I wouldn't pretend this
> example is typical or not, and I'd expect compilers to do better,
> but it is illustrative of what could happen.

Alternatively, any reuse of the pointer avoids the move-to-space operations when dealing with 32-bit objects. Any looping or database-like access to records would also avoid the overhead.

> Anyway, to get some serious analysis of this, I think one has to
> look at code sequences under various assumptions, and see
> a) What speed is obtainable by perfect hand-code?
> b) How likely are compilers to get there?

I'm not sure what "this" is, but one would certainly not propose segmentation as the mechanism for addressing common array objects that exceed the flat addressability of the machine. Nor would you use 32-bit load instructions when the primary pointer size was >32 bits (not that John was). It would be similar to an architecture that only allowed loading 32-bit floating-point variables :-).

HP-UX and the proprietary MPE/XL operating systems make use of long pointers, as do some of our database vendors. It is very convenient to be able to directly access >2^32 bytes without operating system involvement. Just don't get carried away with it.

Jerry Huck
Hewlett-Packard
davecb@yunexus.YorkU.CA (David Collier-Brown) (04/02/91)
In article <00670208556@elgamy.RAIDERNET.COM> elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:
| 2) Maintaining large objects that grow and shrink. In a sequential address
| space, often you can't "grow" an object because something else has been
| allocated in the addresses immediately afterwards. And thus you may end up

mash@mips.com (John Mashey) writes:
| Actually, I Don't think this quite right. Consider the difference
| between a scheme that has X-bit segment numbers and Y-bit byte addresses
| within the segment, and compare with one that has an X+Y-bit flat address
| space. In the first case, using typical designs, you get 2**X segments
| of size 2**Y, which usually means that objects are CONVENIENTLY
| 2**Y maximum size. the X+Y-bit flat address machine can simulate the same
| thing rather conveniently...

Er, I'm going to attack this whole thread... I think the use of
segments to describe any fixed-size construct is horribly wrong.

A segment, in its youth, was a name. Your pre-Multics 7090-clone
assembler program had one or more code segments, an initialized data
segment and an uninitialized (``bss'') data segment. Multics tried to
generalize these into a thing which could either have its existence in
core, pointed to by a descriptor, or on disk, pointed to by a
pathname. Alas, those segments had fixed maximum sizes.

Unix returned us to the first model, and lost the elegant mapping to
files. Intel returned us to too-small fixed-size segments, possibly
due to a too-literal translation of what they found in a Honeybun [did
you notice the rings and gates, by the bye?]. Bah, humbug (:-)).

I think we need to avoid the term segment, unless we're really talking
about laying assembly code out in memory. Do consider paging in files,
with the understanding that they may have to be relocated in order to
grow and shrink, but avoid segments like the plague: the word has
stopped meaning anything, save when talking about pie-shaped chunks of
disk.

--dave
--
David Collier-Brown,    | davecb@Nexus.YorkU.CA | lethe!dave
72 Abitibi Ave.,        |
Willowdale, Ontario,    | Even cannibals don't usually eat their
CANADA. 416-223-8968    | friends.
firth@sei.cmu.edu (Robert Firth) (04/02/91)
In article <56399@sequent.UUCP> dafuller@sequent.UUCP (David Fuller) writes:
>I would tend to ally with Ed Fuestel here; if you look at the 8086
>scheme it fits really well for Pascal:
>
>4 segments, one each for code, data, heap and stack.

Then you have solved a problem that stumped me, back when I was faced
with exactly this problem - design a Pascal compiler for the 8086. I
would be most interested in your answer.

Consider this typical Pascal procedure

	procedure P(var V : T);

This takes a formal of some type T, passed by reference. Within the
body of P, any operation upon V is an operation upon the corresponding
actual. Now consider what that actual might be, when P is called. It
could be any of

	outermost-level variable, allocated statically
	local variable, allocated on the stack
	object created by New(), allocated from the heap
	by-value parameter, passed on an inner call
	by-reference parameter, likewise
	a component of any of the above, selected or indexed

My question is this: what strategies did you adopt, for address space
representation, variable allocation, and by-reference parameter
passing, that were sane, efficient, and made use of the hardware
segmentation? The answer matters to me, since my failure to solve the
problem still annoys me.

(I'd be interested to hear what anyone else has to suggest, too. Just
to nail things down, take the language to be ISO Pascal Level 1, and
the machine to be that defined in the 8086 Family Users Manual of
October 1979.)
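The same trap is easy to restate in C (my analogue, not Firth's code):
one routine must accept addresses from the static area, the stack, and
the heap, so with one segment per storage class its parameter cannot
be a plain within-segment offset:

	#include <stdlib.h>

	static int global_t;            /* lives in the data segment */

	void p(int *v)                  /* Pascal: procedure P(var V : T) */
	{
	    *v += 1;                    /* must work whichever segment v
	                                   points into, so v has to carry
	                                   segment as well as offset */
	}

	int main(void)
	{
	    int local_t = 0;            /* lives on the stack */
	    int *heap_t = malloc(sizeof *heap_t);   /* lives in the heap */
	    if (heap_t == NULL)
	        return 1;
	    *heap_t = 0;
	    p(&global_t);
	    p(&local_t);
	    p(heap_t);
	    free(heap_t);
	    return 0;
	}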
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/02/91)
In article <1044@shum.huji.ac.il> amos@shum.huji.ac.il writes:
| One case you forgot is that of many small segments, which together
| amount to more than one segment size. You could end up thrashing
| between different segments even if no single object is big enough
| to overflow a segment; all the arguments about big objects do not
| hold in this case.

Huh? He said that a segment is as large as max addressable memory, and
you say if the sum of all segments is larger than physical memory it
will thrash. I see thrashing all the time without benefit of segment,
whenever the virtual address space used is larger than the physical
memory. What do segments cost? Not a flame, I just miss the point. If
you don't have enough addressable physical memory you thrash, in my
experience.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    "Most of the VAX instructions are in microcode, but halt and no-op
    are in hardware for efficiency"
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/02/91)
In article <VWEAP86@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
| N nice and effectively compatible ones. Outside of the 80x86 family, all
| my big portability problems are caused by differences in *software*
| architectures or buggy code.

All that big-endian vs. little-endian stuff is just a bad dream,
right? And the problems we've had porting between 32 and 64 bit
computers didn't happen?

You've been around long enough to know better. The cause of
portability problems is code which makes assumptions about the
hardware. Period. It is possible to write code which will run on any
32 bit or larger machine, but don't look for it in net source code.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    "Most of the VAX instructions are in microcode, but halt and no-op
    are in hardware for efficiency"
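A minimal example of the kind of hardware assumption davidsen means
(mine, not his):

	/* Non-portable: assumes the low-order byte of an int comes
	   first in memory (little-endian). */
	int low_byte_bad(int x)
	{
	    return *(char *)&x;     /* a different byte on a big-endian box */
	}

	/* Portable: says what is meant arithmetically. */
	int low_byte_good(int x)
	{
	    return x & 0xff;        /* low 8 bits on any machine */
	}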
peter@ficc.ferranti.com (Peter da Silva) (04/02/91)
In article <91090.131157DXB132@psuvm.psu.edu> DXB132@psuvm.psu.edu writes:
> For example, it offers a solution to memory fragmentation. Each allocated
> memory region is assigned a unique number (the segment number), and the
> application manipulates only the offset. The OS can move memory regions
> around in physical memory to eliminate fragmentation. Also, we can make
> these segments an exact length, not necessarily always a multiple of
> 4K like paging schemes.

Sounds like a 32-bit PDP-11.

> but your Unix system crashing after a few weeks due to
> memory fragmentation has to be inefficient too.

Say what? I don't recall ever having my UNIX system crash from memory
fragmentation.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
                 'U`    "Have you hugged your wolf today?"
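A sketch of the scheme DXB132 describes (the interface is entirely
hypothetical): the program holds (segment, offset) handles, all access
goes through a table, and the runtime is free to slide regions around
to defeat fragmentation:

	#include <stdlib.h>

	#define MAXSEG 256

	static void *segbase[MAXSEG];   /* owned by the runtime; entries
	                                   may be moved by a compactor */

	/* Allocate a region of exactly len bytes; return its segment
	   number, or -1 on failure. */
	int seg_alloc(size_t len)
	{
	    int s;
	    for (s = 0; s < MAXSEG; s++)
	        if (segbase[s] == NULL) {
	            segbase[s] = malloc(len);
	            return segbase[s] != NULL ? s : -1;
	        }
	    return -1;
	}

	/* The application addresses memory only as (segment, offset),
	   so a compactor can memmove() a region, update segbase[s],
	   and every outstanding handle stays valid. */
	char *seg_deref(int s, size_t off)
	{
	    return (char *)segbase[s] + off;
	}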
glew@pdx007.intel.com (Andy Glew) (04/02/91)
    Segments give you access control. The 386 will let you put
    multiple segments within one page, each with differing access
    rights. I believe that such an architecture may provide a
    convenient way for implementing protected object-oriented systems.
    It would be better if they had a TLB instead of the one per
    segment-register 'shadow registers'/descriptor cache. (I must
    admit I don't know if there is a TLB in the 486.)

To avoid confusion: the i486 processor has a TLB. 4 way set
associative, 8 sets. For that matter, so does the i386.

The TLB, however, stores page-oriented protection information.
Another, additional, mechanism is used for segments.
--
---
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway,
Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions
of Intel Corp.
ckp@grebyn.com (Checkpoint Technologies) (04/03/91)
In article <1991Apr1.154918.8342@granite.ma30.bull.com> kittlitz@granite.ma30.bull.com (Edward N. Kittlitz) writes:
>Segments give you access control. The 386 will
>let you put multiple segments within one page, each with differing
>access rights. I believe that such an architecture may
>provide a convenient way for implementing protected object-oriented systems.
>It would be better if they had a TLB instead of the one per segment-register
>'shadow registers'/descriptor cache. (I must admit I don't know if
>there is a TLB in the 486.)

I don't think that "segments give access control" is a general
statement about segments; I think Intel chose to use segments as the
mechanism which provides access control. Other systems use access bits
in the page tables to provide the same thing. Systems with
conventional page tables (not inverted) can permit the same physical
memory to appear in multiple separate virtual addresses to the same
process, or to separate processes, with different access rights in
each case. I understand that inverted page tables make this more
difficult but not impossible.

And no, I don't believe the 486 has a TLB for the segment descriptor
cache. It has a TLB for the page tables though, as the 386 does.
--
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S
                      / /  ckp@grebyn.com  \ \
Then, the disclaimer: All expressed opinions are, indeed, opinions.
Now for the witty part: I'm pink, therefore, I'm spam!
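The double-mapping Checkpoint describes can be demonstrated with the
(much later) POSIX mmap() interface; a sketch, assuming a conventional
page-table system:

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
	    int fd = open("/tmp/map.demo", O_RDWR | O_CREAT, 0600);
	    if (fd < 0 || ftruncate(fd, 4096) < 0)
	        return 1;

	    /* The same physical page, at two virtual addresses, with
	       different access rights in each mapping. */
	    char *rw = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	    char *ro = mmap(0, 4096, PROT_READ, MAP_SHARED, fd, 0);
	    if (rw == MAP_FAILED || ro == MAP_FAILED)
	        return 1;

	    rw[0] = 'x';                    /* store via the writable mapping */
	    return ro[0] == 'x' ? 0 : 1;    /* visible via the read-only one;
	                                       writing ro[0] would fault */
	}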
peter@ficc.ferranti.com (Peter da Silva) (04/03/91)
In article <3305@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> In article <VWEAP86@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> | N nice and effectively compatible ones. Outside of the 80x86 family, all
> | my big portability problems are caused by differences in *software*
> | architectures or buggy code.

> All that big-endian vs. little-endian stuff is just a bad dream, right?

No, it's just not a big problem.

> And the problems we've had porting between 32 and 64 bit computers
> didn't happen?

You mean between 16 and 32 bit computers? No, it's not a big problem.

Let me explain this point a bit more: the places where endianness, and
the size of pointers not being the same size as ints, and that sort of
thing cause problems are relatively small, and can generally be easily
fixed. 90% of these problems are the result of someone trying to use
an internal data structure for external storage. The remaining 10% are
caused by buggy code. Or buggy compilers.

Really. Anyone who writes ``execl("/bin/sh", "sh", "-i", 0)'' has just
written buggy code. And it's easy enough to fix. A pass through lint,
and I'm a happy camper. My big problems in porting code are where
people assume things like:

	malloc will not fail.
	arrays can be grown indefinitely.
	pointers in different objects can be compared.

All of these things are reasonable assumptions on an 80386, a 68000, a
VAX, a SPARC, etc... They die horribly on the 8086 family of
processors, and fixing the code tends to require a major rewrite.

> You've been around long enough to know better. The cause of
> portability problems is code which makes assumptions about the hardware.

This is true, but apart from the 8086 family of processors portability
problems are easy to fix. Toss in a few casts or go to ANSI compilers
and everything's right as rain. But there's nothing I can do with an
attempt to malloc(100000) other than cripple the program or redesign
it.

Nope... I'll stand on my claim that after working with the 8086 and
its derivatives any other hardware portability concerns are cake.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
                 'U`    "Have you hugged your wolf today?"
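The execl() bug deserves spelling out, since it is exactly the
pointers-wider-than-int trap: the argument list must end with a null
*pointer*, and a bare 0 is passed as an int, leaving the varargs
callee to read garbage where pointers are wider:

	#include <unistd.h>

	/* buggy: 0 is an int, possibly narrower than a pointer      */
	/*   execl("/bin/sh", "sh", "-i", 0);                        */

	/* correct: the terminator is an explicit null pointer */
	int run_shell(void)
	{
	    return execl("/bin/sh", "sh", "-i", (char *)0);
	}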
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/04/91)
In article <Y-FA4Y4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
| In article <3305@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
|
| > And the problems we've had porting between 32 and 64 bit computers
| > didn't happen?
|
| You mean between 16 and 32 bit computers? No, it's not a big problem.

Not unless someone just added PC emulation to the Cray2... lots of net
code assumes 32 bits, assumes int {same as} long, assumes 2's
complement arithmetic, and assumes you can get exactly four chars in
an int.

I stand by my first thought: the problem is in code which assumes
things about the hardware. Languages have variables of known minimum
size, integer*4 in FORTRAN, long in C, etc. And languages which have
pointers have portable ways to manipulate them, although you wouldn't
know it from code posted from time to time. I have seen code which
turned the address of a long into an int, added seven, then cast it to
pointer to char to get a byte out of the next word. Other than
assuming the size of int, size of pointer, and byte order of the
hardware, this was portable.

If you say the "memory models" are a bad idea I would agree
completely, and I told Microsoft so when they were writing C 3.0.
Intel should have paid them to generate a version with 32 bit ints and
linear addressing (from the user viewpoint) just to sell faster chips.
But that's a feature of the design decisions of the C compiler, not an
inherent feature of segments or Intel.

Ask the person who ported unzip to the Cray about 32 vs. 64 problems.
I don't remember what it was now, I looked at the problem for an hour
or two and dropped it, but it was reasonably subtle, and I believe
it's a warning of things to come. Perhaps MIPS will speak on porting
stuff to their 64 bit box for testing.

It's possible to do tricky stuff in a portable way, and if you think
about it when writing the code it's even easy. When you try to port
someone else's code it gets to be a nightmare.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    "Most of the VAX instructions are in microcode, but halt and no-op
    are in hardware for efficiency"
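Reconstructing the add-seven trick davidsen describes (my guess at the
shape of it, not his actual code):

	long pair[2];   /* two adjacent words */

	/* Assumes sizeof(long) == sizeof(int) == sizeof(char *) == 4
	   and a known byte order: squeeze the address into an int, add
	   seven, and cast back to reach a byte of the *next* word. */
	char sneaky_byte(void)
	{
	    int a = (int)&pair[0];      /* pointer stuffed into an int */
	    return *(char *)(a + 7);    /* last byte of pair[1] on a 32-bit
	                                   little-endian machine; something
	                                   else everywhere else */
	}

	/* The portable spelling of (probably) the same intent: */
	char plain_byte(void)
	{
	    return (char)((pair[1] >> 24) & 0xff);
	}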
peter@ficc.ferranti.com (Peter da Silva) (04/04/91)
In article <1360009@aspen.IAG.HP.COM>, huck@aspen.IAG.HP.COM (Jerry Huck) writes:
> So for HP, segmentation was not a trade-off against flat addressing,
> but rather: is it useful to extend beyond the maximum flat addressing
> you can support in your general register file?

This is the exact same trade-off that Intel made in the 8086, just 10
years or so down the road. It gives you a short-term paper advantage,
but once things get to the point where you really need those
addressing bits people will be using your name in vain.

> I think most of the arguments against segmentation assume you give up
> some flat addressing to get it. That's not necessary.

But that's what you just described: you only have 32 bits of flat
address space in a 48 bit machine.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
                 'U`    "Have you hugged your wolf today?"
clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/04/91)
In article <1991Mar29.044033.222@caliban.uucp> ig@caliban.uucp (Iain Bason) writes:
>This whole discussion on segmented architectures is getting a little
>confusing. The problem is that most posters seem to be drawing
>conclusions about segmentation in general based upon their knowledge
>of particular segmented architectures. Now, there's nothing wrong
>with basing one's opinions on one's experience.

Iain was being charitable, which is not one of my virtues,
unfortunately. There is nothing wrong with basing your opinion on your
experience, even if your experience is limited to one example, as long
as you don't have any pretensions that you are a scientist. Most
"computer programmers" are mere "coding bums" who call themselves
"Computer Scientists" because it sounds good on their resumes. A
scientist does not make an extrapolation from a single data point and
announce to the world that the final word has now been spoken on the
subject, as we have seen on this thread.

Not that they were not given the education that a scientist should
have: they were taught general principles of computers for several
years. Most of them slept through it all and then complained for years
that "they don't teach you anything useful in college --- just a lot
of theory." Later in life, they resurface in the ACM Forum column of
the Communications of the ACM, advocating the use of GOTO statements
and criticizing the teachings of Dijkstra, Wirth, et al., ad nauseam.

Now that the lecture is over, please return to the postings that
assume (without saying so, or seeming to realize it) that all
segmented machines have segments fixed at 64KB in size, with only a
couple available for data and one for code, etc.
-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore,
be regarded as a criminal offence."  E.W. Dijkstra, 18th June 1975.
||| clc5q@virginia.edu (Clark L. Coleman)
renglish@cello.hpl.hp.com (Bob English) (04/04/91)
I want to make something clear up front. I am not trying to convince
the world at large that segmentation is a better way of providing a
large address space to a single program than a linear address space
with register size equalling the address size. Neither am I trying to
take a position on the best use of current silicon space or the
minimum usable address space.

What I take issue with is the opinion, expressed many times in
comp.arch, that segmentation is inherently wrong, violates all
principles of good design, and implies severe brain damage on the part
of the designers. The point that I'm trying to make is that
segmentation at the hardware level, or the lack thereof, is not an
issue of architectural principle, but a design choice with a set of
costs and benefits. Elevating it to a principle implies that the only
acceptable address space is infinite, because no programmer should
ever have to worry about addressability. At any point, it's a choice
between the costs of extending the address space (register size, etc.)
and the benefits derived from doing so, as well as a choice of system
level to provide the service.

mash@mips.com (John Mashey) writes:
> ...take the inner loop of the rolled DAXPY routine from linpack...:
> 3) What you need to do in the general case, which is that either
> dx or dy, or both could be >4GB, or (enough to cause the problem)
> that either or both cross segment boundaries?

Well, this is a bit longer than the code Jerry sent out for the
current case, but it isn't too complicated. It's 30 instructions, two
or three times that of the unsegmented code posted earlier (after
initialization is added to the earlier code), but the inner loop is
unchanged. In a machine where the compilers dealt effectively with
segments, this would be a normal form for striding through arrays, and
would be highly optimized (at least as good as this).

Evaluating the performance impact is a bit trickier. The inner loop is
unchanged, but the set up costs are higher. For long loops, this is
inconsequential. For short loops, it adds about 14 cycles to the loop,
or about 12% for a vector length of 20 (there are probably ways to
reduce those costs for short vectors without appreciably increasing
the overhead for long vectors, but that's not important).

How important is this increased overhead? It seems counterintuitive
that programs demanding objects greater than 32 bits would have their
performance dominated by small vectors, but it could be true. With one
DAXPY to a 2^^32 array, there would have to be 200 million DAXPYs to
twenty element arrays before the 12% difference in short loop
performance became a 6% increase in actual performance. If those
accesses were themselves in a loop, and global optimizations were
performed, the overhead would drop way down.

The code:

	mtsr	dysegshadow,segmentdyreg
	mtsr	dxsegshadow,segmentdxreg

; This section eliminates long (> 2^^30) internal runs to simplify
; the later tests. "ocnt" gets the projected run size for the
; inner loop.

	zdepi	3,1,2,maxrun			; set up max run
oloop0:	combt,<< gcnt,maxrun,lessmax		; nullifies on gcnt << maxrun
	copy,tr	maxrun				; always nullifies
lessmax: copy	gcnt,ocnt			; nullified if dropped in

; This section checks for segmentation wraps, so that the inner loop
; won't have to. "icnt" gets the maximum base register, and then
; the actual inner loop count.

oloop1:	comclr,<<= dxbasereg,dybasereg,r0	; which base is higher?
	or,tr	dxbasereg,r0,icnt		; or,tr always nullifies
	or	dybasereg,r0,icnt		; this instruction
	sh3add	ocnt,icnt,tmp1			; will the higher base wrap?
	combf,<<,n tmp1,icnt,iloopstart		;
	subi	7,icnt,icnt			; reduce the inner loop cnt
	extrs,tr icnt,1C,1D,icnt		; to the wrap point
iloopstart:
	or	ocnt,r0,icnt
	subi	0,icnt,tripcount

; This is the inner loop, same as without segments.

iloop:	fldws,ma 8(segmentdxreg,dxbasereg),dxreg ;get value and skip to next
	fldws	(segmentdyreg,dybasereg),dyreg	;get value
	fmul,dbl dareg,dxreg,mulreg
	fadd,dbl mulreg,dyreg,dyreg
	addib,<	1,tripcount,iloop
	fstws,ma dyreg,8(segmentdyreg,dybasereg)

; Check for completion, and bump segment registers if appropriate.

	sub	gcnt,icnt,gcnt			; decrement global count
	combt,<= gcnt,r0,done			; check for completion
	comclr,= dxbasereg,r0,r0		; increment space register that
	addi	1,dxsegshadow,dxsegshadow	; wrapped
	mtsr	dxsegshadow,segmentdxreg
	comclr,= dybasereg,r0,r0		; increment space register that
	addi	1,dysegshadow,dysegshadow	; wrapped
	b	oloop0
	mtsr	dysegshadow,segmentdyreg
done:

> DBMS, and other things that follow pointer chains around.
> Conventional wisdom says that loads+stores are 30% of the code,
> and so some subset of these incur at least 1 extra cycle.

If every one of these loads and stores required twice as many cycles
(the case you mentioned is pretty much a worst case for a segmented
architecture), then the machine's performance would be reduced by 30%
in code that made heavy use of large objects. What little intuition I
have in the matter suggests, however, that the actual overhead will be
significantly less than 30%, as this overhead would not be incurred on
every load or store. Access to the stack, for example, would not incur
this overhead, nor would access to a small object after that object
has been located (in most cases objects less than the segment size can
be constrained to lie completely within a segment).

As a data point, HP's proprietary OS uses spaces (the term HP uses for
large segments) to support databases and file systems. The
segmentation overhead they've incurred has not been large enough to
warrant making space register ops 1 cycle.

--bob--
renglish@hplabs
If I were speaking, I'd be speaking for myself. Since I'm typing, I'm
typing for myself.
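In C terms, the outer/inner structure above amounts to the following
(my reconstruction, not Bob's; inner_daxpy stands for the unchanged
inner loop, SEGSIZE for 2^^32, 64-bit unsigned long is assumed for the
arithmetic, and offsets are assumed double-aligned):

	#define SEGSIZE (1UL << 32)     /* assumes 64-bit unsigned long */

	extern void inner_daxpy(unsigned long n, double da,
	                        unsigned long dxseg, unsigned long dxoff,
	                        unsigned long dyseg, unsigned long dyoff);

	void daxpy_seg(unsigned long n, double da,
	               unsigned long dxseg, unsigned long dxoff,
	               unsigned long dyseg, unsigned long dyoff)
	{
	    while (n > 0) {
	        /* largest run before either operand wraps its segment */
	        unsigned long run = n;
	        unsigned long room;
	        room = (SEGSIZE - dxoff) / sizeof(double);
	        if (run > room) run = room;
	        room = (SEGSIZE - dyoff) / sizeof(double);
	        if (run > room) run = room;

	        /* inner loop: no per-iteration segment checks needed */
	        inner_daxpy(run, da, dxseg, dxoff, dyseg, dyoff);

	        n -= run;
	        dxoff += run * sizeof(double);
	        dyoff += run * sizeof(double);
	        if (dxoff == SEGSIZE) { dxseg++; dxoff = 0; }  /* bump space reg */
	        if (dyoff == SEGSIZE) { dyseg++; dyoff = 0; }
	    }
	}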
peter@ficc.ferranti.com (Peter da Silva) (04/04/91)
In article <3310@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> Not unless someone just added PC emulation to the Cray2... lots of net
> code assumes 32 bits, assumes int {same as} long, assumes 2's complement
> arithmetic, and assumes you can get exactly four chars in an int.

Yes, and porting this code to an 80286 is unlikely to be any easier
than porting it to a 64-bit machine... and probably harder since *you*
can at least fit 32-bit values into a 64-bit integer. And then on top
of all that we have all the segmentation woes.

> It's possible to do tricky stuff in a portable way, and if you think
> about it when writing the code it's even easy. When you try to port
> someone else's code it gets to be a nightmare.

Compared to Xenix 286 it's a mere melodrama.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
                 'U`    "Have you hugged your wolf today?"
sef@kithrup.COM (Sean Eric Fagan) (04/04/91)
In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>And then on top of all
>that we have all the segmentation woes.

Are you objecting to segments, or to *intel* segments? You keep saying
"segments are bad," without regard to what type of segments. Consider,
for example, a cpu which has two types of registers: data and address.
Data registers are 32-bits, and address registers are 64-bits.
*However*: the address registers are actually

	<32-bit segment tag> <32-bit offset>

I defy you to come up with a PROPERLY WRITTEN program that will break.

Now, for initial implementations, you probably want to use only one
segment (i.e., limited to 4Gbytes), and have your compiler spit out
lots of warnings for things like passing pointers to functions without
a prototype, conversion from pointer to integer, etc. (You should
probably make that segment be tag #0, incidentally, although there is
no real need.) Note that you would also probably need a 'long long'
type, since I seem to recall ANSI C requiring *some* integral type
that can hold a pointer.

That could actually be quite useful. Have each malloc() return a
separate segment, which is the size you requested and no larger...

Intel goofed (imho) by having separate segment registers. If the
segment tag/number were part of the address registers, I don't think
there would have been as much pain involved.
--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
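A sketch of those address registers in portable C (names mine; 'long
long' as he suggests):

	typedef unsigned long long addr64;  /* <32-bit tag><32-bit offset> */

	#define SEG(a)       ((unsigned long)((a) >> 32))
	#define OFF(a)       ((unsigned long)((a) & 0xffffffffUL))
	#define MKADDR(s, o) (((addr64)(s) << 32) | (o))

	/* malloc() returning one fresh segment per object: offsets
	   start at 0, and since no object exceeds 4GB, offset
	   arithmetic can never carry into the tag. */
	addr64 seg_malloc(unsigned long len)
	{
	    static unsigned long next_tag = 1;  /* tag 0 reserved, as
	                                           he suggests */
	    (void)len;  /* a real runtime would ask the OS for a segment
	                   of exactly len bytes and record the limit;
	                   omitted here */
	    return MKADDR(next_tag++, 0);
	}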
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (04/04/91)
In article <1360009@aspen.IAG.HP.COM> huck@aspen.IAG.HP.COM (Jerry Huck) writes:
>PA-RISC uses segmentation to extend the addressability of the
>normal general register file. It is not a partition of these
>registers into pieces. Segments are 2^32 in size and give
>capability in several areas.

But through sr4-7 (the segment registers used by short addressing) we
can address only 1GB of a segment. So, when we want >4GB, we must do
general data access either within <1GB segments or with sr1-3 only
(sr0 is unusable).

Masataka Ohta
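Concretely (my sketch of the short-pointer rule Ohta is appealing to):
the top two bits of a 32-bit offset select one of sr4-sr7, which is
why each of those registers only ever sees a 1GB quadrant:

	/* Short-pointer addressing: space register = sr4 + top 2 bits
	   of the 32-bit offset, leaving 2^30 bytes (1GB) per quadrant. */
	#define SHORT_SR(va)  (4 + (((unsigned long)(va) >> 30) & 3)) /* sr4..7 */
	#define QUAD_OFF(va)  ((unsigned long)(va) & 0x3fffffffUL)    /* 30 bits */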
firth@sei.cmu.edu (Robert Firth) (04/04/91)
In article <1PGAOP7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>But that's what you just described: you only have 32 bits of flat address
>space in a 48 bit machine.

Sigh. I have seen this decade after decade, generation after
generation. It seems to be a working rule among the builders of
segmented machines that the most flat space anyone will ever need is 4
bits more than the current market leader. That's how we went from 12
bits to 16; the same arguments I heard from the builders of the 20-bit
machine in 1974 are lying in my mailbox explaining why 32 bits (after
all, 2 more than the VAX!) is enough.

Here's an analogy. You live in a three-bedroom house. To get to two
bedrooms, you climb the interior staircase. To get to the third, you
go outside and climb a ladder on the North wall.

It's time to trade up. You look at a five-bedroom house. Three
bedrooms open off the interior staircase; the other two are reached by
a ladder on the North wall. The builder says "Look, you have three
directly accessible bedrooms, which is 50% more than your current
home, what more could you ever need?"

You explain that what matters is not the absolute number of bedrooms,
it is rather that, however many there are, they all be directly
accessible by a simple and uniform route.

He shakes his head in bewilderment. As do I.
firth@sei.cmu.edu (Robert Firth) (04/04/91)
In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
> Data
>registers are 32-bits, and address registers are 64-bits. *However*: the
>address registers are actually
>
> <32-bit segment tag> <32-bit offset>
>
>I defy you to come up with a PROPERLY WRITTEN program that will break.

My pleasure, sir.

	DIMENSION BIGMAT(50000,50000)
	DOUBLE PRECISION BIGMAT

I have a perfectly legal Fortran declaration; I will never use an
index value bigger than seventeen (signed) bits; there is enough
virtual memory to hold it; and your bozo machine will not permit me to
address it.
peter@ficc.ferranti.com (Peter da Silva) (04/05/91)
In article <1991Apr04.023845.3501@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >And then on top of all
> >that we have all the segmentation woes.
> Are you objecting to segments, or to *intel* segments?

Intel segments.

> You keep saying "segments are bad," without regard to what type of segments.

No, I keep ragging on the 80x86. I explicitly mentioned the chip by
name in the paragraph you quoted from.

[32-bit address+32-bit segment number, stored in the address registers]

> I defy you to come up with a PROPERLY WRITTEN program that will break.

If wrapping around the end of the segment isn't a problem, I can't. If
it is, I'll just operate on a >4 GB object.

Of course, if wrapping around the end of the segment isn't a problem
(as it wouldn't be on the 80x86 if intel hadn't screwed up) then I
would say you don't have a segmented machine: you just have a 64-bit
machine with a possibly limited address space... like the 68000, where
you can look at the address space as a 24-bit offset and an (initially
ignored) 8-bit segment number. That's how Microsoft treated the poor
little chip for their Basic interpreters on the Mac and Amiga, which
is why my Amiga 3000 doesn't have Basic available.

> segment be tag #0, incidentally, although there is no real need.) Note that
> you would also probably need a 'long long' type, since I seem to recall ANSI
> C requiring *some* integral type that can hold a pointer.

Nah, just make int=32 bits, long=64 bits.

> That could actually be quite useful. Have each malloc() return a separate
> segment, which is the size you requested and no larger...

You can do the same on a "flat" address space machine if your address
space is large enough. DEC does this on the VAX under VMS: 31 bit
offset and two segments: user and system.

> Intel goofed (imho) by having separate segment registers.

No, intel goofed by putting tag bits at the wrong end of the segment
register. Whether the segment part is explicitly loaded into a segment
register or the top half of an address register is purely a code
generation problem.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
                 'U`    "Have you hugged your wolf today?"
ckp@grebyn.com (Checkpoint Technologies) (04/05/91)
In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>>And then on top of all
>>that we have all the segmentation woes.
>
>Are you objecting to segments, or to *intel* segments?

Well, Intel segments are *soooo* bad.... Here are what I think are the
unforgivably bad features of Intel x86 segments:

 - Huge pointers require normalization
 - There are fewer segment registers than address registers (I include
   the program counter and stack pointer as address registers)
 - They are context-chosen (code space, data space, stack space)
 - The instruction set encourages programmers to economize segment usage

> You keep saying
>"segments are bad," without regard to what type of segments. Consider, for
>example, a cpu which has two types of registers: data and address. Data
>registers are 32-bits, and address registers are 64-bits. *However*: the
>address registers are actually

Perhaps we can get some subjective comments from programmers of other
"segmented" machines? I can think of two. The Western Design 65816 in
16 bit mode, and the Zilog Z8000 both are "segmented" machines. How
about some comments on these implementations? (I don't mean to solicit
"the 65816 is *way* better than the 6502" comments...)
--
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S
                      / /  ckp@grebyn.com  \ \
Then, the disclaimer: All expressed opinions are, indeed, opinions.
Now for the witty part: I'm pink, therefore, I'm spam!
jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) (04/05/91)
In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>> <32-bit segment tag> <32-bit offset>
>>I defy you to come up with a PROPERLY WRITTEN program that will break.
>My pleasure, sir.
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT
>I have a perfectly legal Fortran declaration; I will never use an
>index value bigger than seventeen (signed) bits; there is enough
>virtual memory to hold it; and your bozo machine will not permit
>me to address it.

Survey says: Bzzt!

There's nothing that says that array elements in FORTRAN - or, for
that matter, C - have to be contiguous. Thinking that that must be
true as a matter of Natural Law is purest VAXocentrism.

It's the compiler's job to hide those details from the programmer.
It's a real tragedy that there are VAXocentric C programmers out there
that think that the whole world should work the way their specific
environment does, and write software with lots of hard-to-find
nonportabilities lurking to trap the unsuspecting soul who tries to
run it on non-VAXen. It's bad enough that I gave serious consideration
to buying an 11/750 that's for sale around here just so I could see
why people get that attached to it.
--
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.med.uth.tmc.edu  | adequately be explained by stupidity.
"You can even run GNUemacs under X-windows without paging if you allow
about 32MB per user." -- Bill Davidsen            "Oink!" -- me
sef@kithrup.COM (Sean Eric Fagan) (04/05/91)
In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT

Gee, this works on current 32-bit machines? The FORTRAN standard
allows one to declare arrays of any size, and guarantees that they
will work? I guess it's more braindamaged than I thought. I mean, I
remember having problems with arrays *much* smaller on both Crays and
Cybers...

You're giving a knee-jerk response. If the compiler manual says that
no object may be larger than <x>, and you try to create an object of
<x*2>, *you're* the one who screwed up.

And if it bothers you that much, fine: for the FORTRAN compiler, it,
also, will use just one segment tag, just like the initial C port I
hypothesized about. There, now you've only got 4Gb of virtual memory
for any fortran program. Happy?
--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (04/05/91)
In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.023845.3501@kithrup.COM>
>sef@kithrup.COM (Sean Eric Fagan) writes:
>
>>I defy you to come up with a PROPERLY WRITTEN program that will break.
>
>My pleasure, sir.
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT

People forget history so quickly these days! The Burroughs 5000 and
descendants all used segmented architectures, and they routinely
handled two dimensional arrays as an array of pointers to segments.
That is precisely how Burroughs FORTRAN would have handled the above
case, and if 50000 doubles was too big for one segment, it would have
automatically made the array into a 3 or 4 dimensional array,
completely hiding the problem from the programmer without any need for
the programmer to specify some kind of "large memory model" or other
such hokum that people are forced to use on the 8086 family.

I remember a statistic from Burroughs that the average segment on
their machines was less than 64 words long (48 bits per word). The
code of each procedure was in a different segment, each array was a
different segment, and so on.

I never heard a Burroughs programmer complain about segments the way
8086 programmers do, because the Burroughs architectures did it right!
I've had a number of students who were Burroughs programmers (Quaker
Oats in Cedar Rapids had a high-end machine with something like 6 CPUs
in the early 80's, and they may still be a Unisys customer).

Doug Jones
jones@cs.uiowa.edu
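The Burroughs representation is easy to sketch in C (my sketch; on the
B5000 the compiler did all of this invisibly, which C's pointer
arithmetic rules make harder, as the next posting notes):

	#include <stdlib.h>

	/* A "2-D array" as a vector of row descriptors: no single
	   allocation (and on a segmented machine, no single segment)
	   need hold the whole 50000 x 50000 matrix. */
	double **make_bigmat(size_t rows, size_t cols)
	{
	    size_t i;
	    double **m = malloc(rows * sizeof *m);  /* descriptor vector */
	    if (m == NULL)
	        return NULL;
	    for (i = 0; i < rows; i++)
	        if ((m[i] = malloc(cols * sizeof **m)) == NULL)
	            return NULL;                    /* (cleanup omitted) */
	    return m;                               /* element: m[i][j] */
	}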
cgy@cs.brown.edu (Curtis Yarvin) (04/05/91)
In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
>In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>
>Survey says: Bzzt!
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous. Thinking that that must be true as a
>matter of Natural Law is purest VAXocentrism.

But, in C, you have to be able to move through the array with pointer
arithmetic; which means that it is much harder for the compiler to
hide, and hence much slower.

>It's the compiler's job to hide those details from the programmer. It's
>a real tragedy that there are VAXocentric C programmers out there that
>think that the whole world should work the way their specific environment
>does, and write software with lots of hard-to-find nonportabilities lurking
>to trap the unsuspecting soul who tries to run it on non-VAXen.

There are two solutions to this problem: for everyone to write
portable code, or for everyone to build flat-addressed machines. I
think everyone should be able to see the direction the market is
moving in: the latter. This is not necessarily a bad thing, unless you
have an unnecessarily Calvinist approach toward the world.

Curtis
firth@sei.cmu.edu (Robert Firth) (04/05/91)
In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
>Survey says: Bzzt!
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous. Thinking that that must be true as a
>matter of Natural Law is purest VAXocentrism.

I suggest you check ANSI X3.9-1978, especially sections 5.2.5 and
5.4.3. The standard requires that an array be associated with a
"storage sequence" in column-major form, and that there be a
one-to-one mapping between the "subscript values" and the elements of
this storage sequence.

Naturally, these elements need not be contiguous in physical storage,
which isn't what I asked for, since I explicitly referred to virtual
memory. But they do have to be contiguous in the virtual memory model
of the Fortran language.

I await your suggested implementation.
firth@sei.cmu.edu (Robert Firth) (04/05/91)
In article <1991Apr04.202446.13595@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
[	DIMENSION BIGMAT(50000,50000)
[	DOUBLE PRECISION BIGMAT
>Gee, this works on current 32-bit machines?

No. But you claimed to have a machine with 32-bit integers and 48-bit
addressing, and challenged us to produce code that ought to work on
such a machine but won't on yours. The above was my response.

> And if it bothers you that much, fine:
>for the FORTRAN compiler, it, also, will use just one segment tag, just like
>the initial C port I hypothesized about. There, now you've only got 4Gb of
>virtual memory for any fortran program. Happy?

Yes, for you have just conceded my point: the code ought to work; it
won't work on your machine because your addressing scheme forbids it;
you are not competent to solve the problem in the compiler; so you've
given up and thrown the mess you designed back in the user's lap.

Pathetic.
sef@kithrup.COM (Sean Eric Fagan) (04/05/91)
In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>If wrapping around the end of the segment isn't a problem, I can't. If it
>is, I'll just operate on a >4 GB object.

No, you missed the point. I set things up such that no single object
can be larger than 4Gbytes. And a correct program can't tell that it
has anything larger than a 4Gbyte address space, unless it starts
mallocing up all the memory it can and keeps track.

On an 8086, the natural limit for the size of any given object is 64K.
On the hardware I described, it would be 4G.

Now, please show me a correct program that will fail. Answer: you
can't, because The Standard (ANSI, in this case, since I'm more
concerned about C) has enough limits in it that you can't get around
it without being non-conformant. Please read ANSI, and if you can find
a statement in there that says that the system must provide for an
object >4GB, then I will send you a case of beer.

>Of course, if wrapping around the end of the segment isn't a problem (as
>it wouldn't be on the 80x86 if intel hadn't screwed up) then I would say you
>don't have a segmented machine: you just have a 64-bit machine with a
>possibly limited address space...

It *is* a segmented machine; you cannot wrap around segments, because
the largest size of any single object is 4Gbytes. Now, if you wanted
to provide for a pseudo-64-bit address space, you have the system a)
allocate segments sequentially, and b) when a segment-overrun trap
occurs, increment the segment tag index appropriately, and continue.
But it's still a segmented machine.

And, again, note that *you* never see the segments. It's possible to
set up the machine such that it looks like a normal 32-bit-address
machine; however, for correct programs (correct by ANSI, not correct
as in specially written), you can use as much memory as the system
will allow.

>> segment be tag #0, incidentally, although there is no real need.) Note that
>> you would also probably need a 'long long' type, since I seem to recall ANSI
>> C requiring *some* integral type that can hold a pointer.
>Nah, just make int=32 bits, long=64 bits.

That would be inefficient; too many people use 'long' when they don't
need to, because they assume they can use that type to hold an address
(which, I guess, would be true, but then they pass int and long around
freely). ANSI does require an integral type to hold a pointer, but it
does not specify which type. So, either 'long long', or, if you want a
fully-ANSI-compliant mode, '_longlong'.
--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
johnl@iecc.cambridge.ma.us (John R. Levine) (04/05/91)
In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
>> DOUBLE PRECISION BIGMAT(50000,50000)
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous.

Well, there is the small matter of ANSI X3.9-1978. In sections 5.2.5
and 17.1.1 it makes it pretty clear that all arrays have to be
contiguous with the first subscript varying fastest. The F90 standard
gives you a little wiggle room by saying that arrays that are
mentioned in EQUIVALENCE or COMMON statements have to be contiguous;
other arrays can be implemented any way the compiler wants. Real
programs tend to put their large arrays in common, in which case the
array above really does need a 20 gigabyte flat address space.

The argument might be made that any program that does that is "wrong"
but many numeric codes can easily expand to fill all available memory,
particularly those that cut up a 2- or 3-dimensional space into a mesh
and do something on each element in the mesh, since the finer the
mesh, the more accurate the results. I have little sympathy for
arguments that a 50000 x 50000 array is somehow different from a
1000 x 1000 array just because it's bigger.

A segmented address space need not be a disaster for large arrays,
though the much reviled Intel implementation is, for two reasons:

 -- Segment arithmetic is very complicated due to Intel's inexplicable
    decision to make the low three bits of the segment number magic.
 -- Loading a segment register is so slow on existing implementations
    (on a 486, a segment load takes 6 cycles, a regular load takes 1)
    that you have to handle intrasegment addresses differently from
    intersegment in order to get reasonable performance.

The RT PC and RS/6000 have a segmented address space, but the segment
number is merely the high four bits of the address. If you have an
array or file that is bigger than a segment (256MB in this case) you
can map it into several contiguous segments without having to do
anything special in your object code. Segmentation like that can be
quite useful both for sharing and for protection.
--
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
Cheap oil is an oxymoron.
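Since the RT/RS6000 segment number is just the top four bits, the
decomposition is one shift and one mask (my sketch, from the
description above):

	/* RT / RS6000 style: 16 segments of 256MB, selected by the
	   high 4 bits of a 32-bit effective address; ordinary pointer
	   arithmetic simply carries from one segment into the next. */
	#define SEGNO(ea)   (((unsigned long)(ea) >> 28) & 0xfUL)  /* 0..15 */
	#define SEGOFF(ea)  ((unsigned long)(ea) & 0x0fffffffUL)   /* 28 bits */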
sef@kithrup.COM (Sean Eric Fagan) (04/05/91)
In article <23660@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.202446.13595@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>>Gee, this works on current 32-bit machines?
>No. But you claimed to have a machine with 32-bit integers and 48-bit
>addressing, and challenged us to produce code that ought to work on
>such a machine but won't on yours.

No, actually, I described a machine with 32-bit segments, and 32-bits
worth of possible segments. A difference.

>The above was my response.

Which was incorrect. I impose the limitation, as I said in my first
article, that no single object (such as your array) be larger than
4Gbytes. You broke that restraint. No correct code will break, as
there is *no* requirement in any language standard (c, pascal,
fortran, ada, etc.) that the sum total of all objects' size be less
than or equal to the size limit for a single object. *You*, by being
flip, decided to come up with a program that would break. BFD. I can
come up with programs that will break for any given language/hardware
combination.

Now, please show me how my proposed segmented machine a) breaks
existing *correct* code, and/or b) makes things difficult or
impossible? Yes, dealing with a single object larger than 4Gbytes is
difficult or impossible, but damned few people are doing that (and,
even then, I can make it work, if you let me play with the OS and
compilers/linkers a bit; it just won't be as efficient as it could be
in a flat address space).

>> And if it bothers you that much, fine:
>>for the FORTRAN compiler, it, also, will use just one segment tag, just like
>>the initial C port I hypothesized about. There, now you've only got 4Gb of
>>virtual memory for any fortran program. Happy?
>
>Yes, for you have just conceded my point: the code ought to work;

The code you gave *oughtn't* work. Period. Check out the
Implementation Defined Details section of the FORTRAN compiler manual
for the machine (hint: it doesn't exist, either, but that makes things
easier 8-)). In it, it says that, "pursuant to section
<mumbledymumble> of the <mumbledymumble> FORTRAN standard, the size of
any single array must not exceed 4Gbytes." Now, if you are going to
try to claim that *any* standard mandates that a system must allow
arrays larger than 4Gbytes, well, tough. No system *I* play with (and
that includes some rather decent mainframes) is going to allow that,
either, so damned few people are going to be coding for it.

>you are
>not competent to solve the problem in the compiler;

Bullshit. I *am*, and in a followup article, I described what one can
do to implement flat-style addressing on the machine I described. But
I guess reading all of the articles in a thread is beyond you, isn't
it? Why don't you try a) reading some standards, b) playing with real
systems, and c) trying to figure out just what someone is capable of
before insulting their abilities?

*You* are the one who's pathetic, buddy.
--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (04/05/91)
Sean Eric Fagan writes:
#but damned few people are doing that
Isn't that part of the problem? While damn few people are doing it
*TODAY*, in three or four years, *EVERYONE* will wish to do it, but
can't.
* Slight exaggeration for effect.
--
-Greg Hennessy, University of Virginia
USPS Mail: Astronomy Department, Charlottesville, VA 22903-2475 USA
Internet: gsh7w@virginia.edu
UUCP: ...!uunet!virginia!gsh7w
ckp@grebyn.com (Checkpoint Technologies) (04/05/91)
In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>possibly limited address space... like the 68000, where you can look at
>the address space as a 24-bit offset and an (initially ignored) 8-bit
>segment number. That's how Microsoft treated the poor little chip for
>their Basic interpreters on the Mac and Amiga, which is why my Amiga 3000
>doesn't have Basic available.

I had wondered...

The 68K line, at least through the 68030, has 8 possible address
spaces as coded by the CPU's FC lines. One is user program, one is
user data, one is supervisor program, one is supervisor data, one is
"CPU space" and is used to address coprocessors, generate interrupt
acknowledgements, and signal breakpoints, and the other three are
undefined. You can program the 68851 PMMU and the 68030's MMU to
choose from 8 different page tables based on the FC code, and there's
the MOVES instruction for choosing your FC directly when performing a
move. The 680[23]0 manual tells how to generate cycles to program
space for data accesses if it's important.

Does this make the 68K a segmented machine, with 32 bits offset and 3
bits segment number? (Expiring minds want to know...)
--
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S
                      / /  ckp@grebyn.com  \ \
Then, the disclaimer: All expressed opinions are, indeed, opinions.
Now for the witty part: I'm pink, therefore, I'm spam!
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/05/91)
In article <4919@lib.tmc.edu>, jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
> In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
> >In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
.....................
> There's nothing that says that array elements in FORTRAN - or, for that
> matter, C - have to be contiguous. Thinking that that must be true as a
> matter of Natural Law is purest VAXocentrism.
>
> It's the compiler's job to hide those details from the programmer. It's
> a real tragedy that there are VAXocentric C programmers out there that
> think that the whole world should work the way their specific environment
> does, and write software with lots of hard-to-find nonportabilities lurking
> to trap the unsuspecting soul who tries to run it on non-VAXen.

Any machine with a big enough storage of any kind can emulate any
other. If it is necessary to do this type of manipulation, a compiler
will make a slow mess of it; any decent programmer should be able to
use the idiosyncrasies of the natural structure of the data to do a
better job. It is the hardware designer's job to make it unnecessary
to have any more kludges than can be avoided.

If the array elements are not contiguous, and there may be very good
reasons for the programmer to set things up that way, even the best of
the present compilers will cause things to slow down. It is the
compiler's job to help the user get the most out of the machine, and
hiding things from the user is definitely not the way to do it.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin (UUCP)
przemek@rrdstrad.nist.gov (Przemek Klosowski) (04/06/91)
>>>>> On 5 Apr 91 01:03:43 GMT, sef@kithrup.COM (Sean Eric Fagan) said:

Bob> In <23660@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
	... 50000x50000 array of doubles...

Sean> Gee, this works on current 32-bit machines?

Bob> No. But you claimed to have a machine with 32-bit integer and 48-bit
Bob> addressing, and challenged us to produce code that ought to work on
Bob> such a machine but won't on yours. The above was my response.

Sean> Which was incorrect. I impose the limitation, as I said in my first
Sean> article that no single object (such as your array) be larger than 4GB.
Sean> Now, please show me how my proposed segmented machine a) breaks existing
Sean> *correct* code, and/or b) makes things difficult or impossible? Yes,
Sean> *You* are the one who's pathetic, buddy.

Hey, hey, a little bit worked up, aren't we? Sean seems to believe
that since all code (most of it, anyway) has a 4GB limitation
currently, all code addressing anything above that is broken. So what
is the point of providing 48 bits of address then? I would think that
since it is there, it should be used. Bob gave a valid example of a
program that uses the capability that _would_ be provided by 48-bit
flat addressing.

If it worries Sean that this is unportable to a VAX, please consider
that in reality one would have to parametrize the size of the table
anyway, since even though e.g. the VAX has a 32 bit address space, the
different operating systems put a practical limit on the working set
sizes etc., forcing one to limit the size of a problem to smaller
values.

And of course it isn't just a fancy to try to squeeze in a bigger
array. In the area I am somewhat familiar with, physical modelling, 4
GB of memory allows one to model a very modest 640x640x640 system of
Heisenberg (vector) spins, since each spin is a pair of
double-precision values (640^3 spins times 16 bytes each is
4,194,304,000 bytes, just under the 2^32 limit).

>>>We need those address bits!<<<

Let me just say that I wish that people would just take a good
counter-argument and not cover the confusion with panache.
--
przemek klosowski (przemek@ndcvx.cc.nd.edu)
Physics Department
University of Notre Dame
IN 46556
oasis@gary.watson.ibm.com (GA.Hoffman) (04/06/91)
I've worked extensively on the IBM RT and RS/6000... we consider both
to be segmented machines. Native support by compilers only uses a few
segments -- mapping text, bss, etc onto segments 0,1,2,3. Thru
shmat(), segment registers may be loaded for use of the entire 32-bit
effective-address space. These segments, supported by hardware
segment-registers, provide one-cycle loads and stores with hardware
protections and capabilities.

Protections and capabilities are very useful and difficult to
implement without something simple like segments... our segments are
selected by the high-order 4 bits of an effective-address. Having
protections and capabilities that could start on arbitrary pages and
go to arbitrary sizes would require something like a CAM; this is very
expensive in silicon.

The only serious complaint I've ever had about how we do segments is
that the segments are too small and there aren't enough of them active
concurrently. Our segments are 256M-bytes, and there are only about 12
segment-registers available. These numbers are too small for all the
objects that Mach and other programs would like to have active
simultaneously. So there is overhead for changing segment-registers,
like shmat() and shmdt(), but the overhead has not proven unbearable.
--
g
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/06/91)
In article <5277@ns-mx.uiowa.edu> jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes:
| People forget history so quickly these days! The Burroughs 5000 and
| descendants all used segmented architectures, and they routinely handled
| two dimensional arrays as an array of pointers to segments. That is
| precisely how Burroughs FORTRAN would have handled the above case, and
| if 50000 doubles was too big for one segment, it would have automatically
| made the array into a 3 or 4 dimensional array, completely hiding the
| problem from the programmer without any need for the programmer to specify
| some kind of "large memory model" or other such hokum that people are
| forced to use on the 8086 family.

This is a limitation of the compilers used on the Intel 286 chips,
rather than a characteristic of the chips themselves. The compiler
vendors could have provided a model (which the user would see only on
the compiler command line) with 32 bit ints, and the exact hiding of
detail you mention. I suggested this to several vendors while beta
testing their compilers.

It's a little harder to fault the 386 chips, since their limitations
are the same as other 32 bit machines and segmentation is not visible.
There is the ability to handle more than 4GB by using the segments,
but I don't see either the capability or the commercially viable
demand right now.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    "Most of the VAX instructions are in microcode, but halt and no-op
    are in hardware for efficiency"
peter@ficc.ferranti.com (Peter da Silva) (04/06/91)
In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >If wrapping around the end of the segment isn't a problem, I can't. If it
> >is, I'll just operate on a >4 GB object.
> No, you missed the point. I set things up such that no single object can be
> larger than 4Gbytes.

No, I didn't miss the point. If I have a 48 bit wide VM address and
can't operate on any object larger than a 32 bit wide pointer can
address, then it's a problem.

> On an 8086, the natural limit for the size of any given object is 64K. On
> the hardware I described, it would be 4G.

OK.

> Now, please show me a correct program that will fail.

Any program that operates on an object larger than 4 GB.

> Answer: you can't, because The Standard (ANSI, in
> this case, since I'm more concerned about C) has enough limits in it that
> you can't get around it without being non-conformant.

I see. I'm talking quality of implementation, and you're talking
language legalese. In a minute, I'm going to ask an important
question... in the meantime, I'll play your game... ignoring for the
moment every other programming language in existence.

> Please read ANSI, and if you can find a statement in there that says that
> the system must provide for an object >4GB, then I will send you a case of
> beer.

Is there a statement in it that the system must provide for an object
greater than 64KB? Not that I can see... for the very good reason that
it would otherwise be extremely difficult to implement an ANSI C
compiler on the most common commodity personal computer in existence.

Now, here's the important question: why is the 64K object size
limitation in the IBM-PC a problem? After all, you cannot write a
correct program that will fail on it. You cannot legally determine
that the maximum object size is >64K.

Ah, you say, that's different. Nobody would ever need a single object
larger than 4GB. After all, there are hardly any computers that let
you address more than that, and they're all mainframes and supers. Of
course the same arguments were given about the 64KB limitation in the
8088 back in the late '70s when *it* was under design. (And no, it's
not 20-20 hindsight: I was appalled at the choice of the 8088 in the
IBM-PC when it first came out... and that was before they screwed up
the 80286 segment registers when they had a chance at fixing things.)

> It *is* a segmented machine; you cannot wrap around segments, because the
> largest size of any single object is 4Gbytes. Now, if you wanted to provide
> for a pseudo-64-bit address space, you have the system a) allocate segments
> sequentially, and b) when a segment-overrun trap occurs, increment the
> segment tag index appropriately, and continue. But it's still a segmented
> machine.

If it quacks like a duck...

> And, again, note that *you* never see the segments. It's possible to set up
> the machine such that it looks like a normal 32-bit-address machine;

Right, but then why bother with the extra address space?

> >Nah, just make int=32 bits, long=64 bits.
> That would be inefficient; too many people use 'long' when they don't need
> to,

Too many people use "short" when they don't need to, also. Correctly
written programs (correct in terms of being intentionally written
portably, not by some legalistic measure) don't have that problem.
ANSI C has to cater to too much old, broken code. I choose not to.
--
Peter da Silva.  `-_-'  peter@ferranti.com +1 713 274 5180.
'U` "Have you hugged your wolf today?"
jallen@csserv1.ic.sunysb.edu (Joseph Allen) (04/06/91)
In article <1991Apr04.234928.8637@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: >A segmented address space need not be a disaster for large arrays, though >the much reviled Intel implementation is for two reasons: > -- Segment arithmetic is very complicated due to Intel's inexplicable > decision to make the low three bits of the segment number magic. > -- Loading a segment register is so slow on existing implementations (on > a 486, a segment load takes 6 cycles, a regular load takes 1) that > you have to handle intrasegment addresses differently from Though this doesn't have as much to do with large arrays, here's another intel (386) segment gripe: -- The segment bounds register is only 20 bits. This means you're limited to byte-granular segments of at most 1MB, or 4K-page-granular segments of up to 4GB. This is a big problem when you want to implement mapped files with intel segments: you can't make the file grow a byte at a time (automatically) unless they're less than 1MB. (You could kludge it by switching between 1MB and 4GB modes and by changing the base address, but that's stupid and inefficient). -- #define h 23 /* Height */ /* jallen@ic.sunysb.edu (129.49.12.74) */ #define w 79 /* Width */ /* Amazing */ int i,r,b[]={-w,w,1,-1},d,a[w*h];m(p){a[p]=1;while(d=(p>2*w?!a[p-w-w]?1:0:0)|( p<w*(h-2)?!a[p+w+w]?2:0:0)|(p%w!=w-2?!a[p+2]?4:0:0)|(p%w!=1?!a[p-2]?8:0:0)){do i=3&(r=(r*57+1))/d;while(!(d&(1<<i)));a[p+b[i]]=1;m(p+2*b[i]);}}main(){r=time( 0L);m(w+1);for(i=0;i%w?0:printf("\n"),i!=w*h;i++)printf(a[i]?" ":"#");}
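For the curious, the 20-bit limit field Joseph is griping about works like this; a sketch of the 80386 descriptor arithmetic as Intel documents it (not any particular OS's code):

    /* Effective segment limit on the 80386: a 20-bit field plus the
       G (granularity) bit.  G=0 counts bytes (max 1MB, 1-byte steps);
       G=1 counts 4K pages (max 4GB, 4K steps) - hence the mapped-file
       problem above: past 1MB the limit can only move 4K at a time. */
    unsigned long effective_limit(unsigned long limit20, int g_bit)
    {
        limit20 &= 0xFFFFFUL;                   /* only 20 bits are stored */
        return g_bit ? (limit20 << 12) | 0xFFFUL : limit20;
    }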
tbray@watsol.waterloo.edu (Tim Bray) (04/06/91)
In article <1991Apr05.161615.16869@watson.ibm.com> oasis@watson.ibm.com writes: >The only serious complaint I've ever had about how we do segments, is that the >segments are too small ... >Our segments are 256M-bytes The complaints are serious and they are correct. 256M is too small. Not too small sometime, nor pretty soon, nor tomorrow, but today. In fact, I suspect the recent brouhaha in this group about segmentation might be described as converging on a consensus, despite the intemperate language: If a computer has a natural N-bit word size, segmentation is OK and can make life easier for the OS and compilers, but is more trouble than it's worth if the segments are noticeably smaller than 2^N. Tim Bray, Open Text Systems
paul@taniwha.UUCP (Paul Campbell) (04/06/91)
In article <5277@ns-mx.uiowa.edu> jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes: >In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes: >>In article <1991Apr04.023845.3501@kithrup.COM> >>sef@kithrup.COM (Sean Eric Fagan) writes: >> >>>I defy you to come up with a PROPERLY WRITTEN program that will break. >> >>My pleasure, sir. >> DIMENSION BIGMAT(50000,50000) >> DOUBLE PRECISION BIGMAT > >People forget history so quickly these days! The Burroughs 5000 and >descendants all used segmented architectures, and they routinely handled >two dimensional arrays as an array of pointers to segments. That is >precisely how Burroughs FORTRAN would have handled the above case, and >if 50000 double's was too big for one segment, it would have automatically >made the array into a 3 or 4 dimensional array, completely hiding the I worked for a computing center for a university that had a 6700 for many years. Back in those days there weren't many languages you could port code around in (for the 6700 we had fortran, cobol, pl/1 and to a lesser extent pascal). Of these, most of the code people tried to port was in fortran ... the biggest pain was fortran arrays FOR EXACTLY THIS REASON - people write in fortran and pass slices of arrays around all the time; if your fortran arrays aren't stored in memory 'just so' then all sorts of code breaks. Every time someone brought in another matrix math package that they couldn't get ported I always knew exactly what was wrong. Something else that also often broke fortran programs on the 6700 was the stack - fortran back in those days didn't really have one - parameters were static (global), and recursive calls really screwed this up because the 6700 fortran got too smart for its own good (or programmers did weird stuff they could get away with on their 360 or whatever). For what it's worth cobol was much more portable :-( There was a bcpl port; it modeled memory simply as one giant array and pretended it was running on a different machine (the pascal heap was done the same way). No one ever ported C as far as I know - you would probably have to do things the same as for bcpl or leave it solely as a systems programming language. (Contrary to popular belief there was (is) an assembler for the machine - several of them in fact - but for obvious reasons they weren't available for common usage). >I remember a statistic from Burroughs that the average segment on their >machines was less than 64 words long (48 bits per word). The code of >each procedure was in a different segment, each array was a different >segment, and so on. This was only if you really wanted to; often many functions would end up in the same segment (depends on the compiler). >I never heard a Burroughs programmer complain about segments the way 8086 >programmers do because the Burroughs architectures did it right! I've Well at least they did it better - big arrays were still a pain (plus the fact that indirect pointers and stacks (the equivalents to page tables in this environment) could not be paged/swapped; this also limited how big arrays could actually get). Paul Campbell -- Paul Campbell UUCP: ..!mtxinu!taniwha!paul AppleLink: CAMPBELL.P "But don't we all deserve. More than a kinder and gentler fuck" - Two Nice Girls, "For the Inauguration"
sef@kithrup.COM (Sean Eric Fagan) (04/06/91)
In article <VFIA832@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: >In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes: >No, I didn't miss the point. If I have a 48 bit wide VM address and can't >operate on any object larger than a 32 bit wide pointer can address, then >it's a problem. Why is everybody harping on this 48 bits wide? I actually never said anything about how wide the address space is, except that I implied that it was at least 32 bits. In fact, given the machine I proposed, it would still work quite well with only a 32-bit address space. And that's the difference between the flat and the segmented: using the segmented version (which, as I pointed out two articles ago, can imitate, if slowly, a flat-address-space machine), I can hide the fact that I only have 32 address bits, virtual or otherwise. >> Now, please show me a correct program that will fail. >Any program that operates on an object larger than 4 GB. Please read that again, Peter. You assumed, incorrectly, that I had more than 32 bits for addressing. You assumed that, if I cannot allocate a 5Gbyte object, things are broken. Guess what: they're not. >Is there a statement in it that the system must provide for an object >greater than 64KB? Nope, and it's not broken in that respect. (I believe ANSI C has a requirement that an implementation must support a single object of at least 32k characters.) >Now, here's the important question: why is the 64K object size limitation >in the IBM-PC a problem? Peter: not all people who run into memory problems need more than 4Gbytes for a single object. Some people do; that's why I organized things in my machine the way I did. For people who don't (i.e., people who just keep doing malloc's to get more memory dynamically), they will never know that the machine can access more than 4Gbytes, assuming, of course, it can. Since I purposefully kept int's and long's at 32 bits, there is no way to specify the size of an object larger than 4Gbytes. How are you going to know about it? >After all, you cannot write a correct program >that will fail on it. You cannot legally determine that the maximum >object size is >64K. Actually, for ANSI C, yes, you can. You can use size_t for that purpose. And it's all perfectly legal. >Ah, you say, that's different. Nobody would ever need a single object >larger than 4GB. No, I never said that at all. I know for a fact that there are people who want to be able to have single objects larger than 4Gbytes. They are by far in the minority, however, largely because no system most of them are using today allows them to. I admit freely that 32 bits is a limit. But my question still stands: please show me a Correct (by K&R II, ANSI, or POSIX standards) C program that will fail on the system described by:

    struct pointer {
        unsigned long segment;
        unsigned long offset;
    };
    typedef unsigned long size_t;
    typedef long ptrdiff_t;

    ASSERT (sizeof(void*) == 8);
    ASSERT (sizeof(__longlong) == 8);
    ASSERT (sizeof(long) == 4);

You cannot even *write* a C program that tries to declare a single object with more than 4Gbytes (except by trying to pass something of type __longlong into malloc, which will then only look at half of it or fail, as the system wishes). Since size_t is a 32-bit number, that precludes trying to do

    double foo[50000][50000];

Now, once again: please show me a Correct C program (defined above) that will fail on the system as I have defined it.
Please note that, although a pointer is 64 bits, I have said nothing about how large the address space is. I refuse to, since, the way I've organized the machine, IT DOESN'T MATTER. (For example: a 68k has 32-bit pointers, but its address space is only 16Mbytes. I guess Peter and company consider that a broken machine, huh? Because it's not possible to use up all of the address space implied by the size of the pointers, that is?) For people who want to have objects larger than 4Gbytes, if they pay me enough money, I will give them a special compiler and library that will allow it. (As I said, that requires a little bit of work.) >Right, but then why bother with the extra address space? Actually, I want it because having every object in its own segment is an incredibly useful thing. I've made compilers and runtime libraries do that under xenix for the '286, and it has made debugging broken programs easier. (Found routines that ran off the end of areas they malloc'd, for example, because I got a SIGSEGV as soon as it happened.) -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
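Sean's size_t point can be made concrete. A minimal sketch, assuming nothing beyond ANSI C: since size_t is unsigned, converting -1 to it yields the largest value it can represent, which is a ceiling on the size of any single object.

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        size_t max = (size_t)-1;  /* unsigned conversion: largest size_t */
        printf("no single object can exceed %lu bytes\n",
               (unsigned long)max);
        return 0;
    }

On Sean's proposed machine this prints 4294967295; on a PC compiler with a 16-bit size_t it prints 65535, which is Peter's point in miniature.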
jfc@athena.mit.edu (John F Carr) (04/07/91)
In article <VFIA832@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: >No, I didn't miss the point. If I have a 48 bit wide VM address and can't >operate on any object larger than a 32 bit wide pointer can address, then >it's a problem. Why do you care what the address size is? A programmer's concern should be: how many objects can I have, how big can each be, and how fast does the code run? Let the system designers decide whether to have a flat address space or segments. If you have code which requires 2^40 byte objects, put this in your requirements when you buy a system. The cost of 2^40 bytes of memory can finance the OS and compiler changes needed to support such objects on a segmented MMU. -- John Carr (jfc@athena.mit.edu)
beal@paladin.owego.ny.us (Alan Beal) (04/07/91)
jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes: >People forget history so quickly these days! The Burroughs 5000 and >descendants all used segmented architectures, and they routinely handled >two dimensional arrays as an array of pointers to segments. I say amen to that. Being a former Burroughs programmer, I know what a nice experience it was to program on these systems. Invalid indexes and seg array errors (due to REPLACEs or SCANs) were all caught by the hardware, and a meaningful error message was returned by the MCP - imagine that. >I never heard a Burroughs programmer complain about segments Because you were usually unaware segments were even being used. I guess this was due to the reliance on compilers to do the job - never had to look at machine language, and there was no assembler. It is a shame Burroughs Large Systems never really caught on because they were nice systems to program on. -- Alan Beal Internet: beal@paladin.Owego.NY.US USENET: {uunet,uunet!bywater!scifi}!paladin!beal
jallen@libserv1.ic.sunysb.edu (Joseph Allen) (04/08/91)
In article <1991Apr6.211320.18594@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes: >In article <VFIA832@xds13.ferranti.com> > peter@ficc.ferranti.com (Peter da Silva) writes: >>No, I didn't miss the point. If I have a 48 bit wide VM address and can't >>operate on any object larger than a 32 bit wide pointer can address, then >>it's a problem. >Why do you care what the address size is? A programmer's concern should be: >how many objects can I have, how big can each be, and how fast does the code >run? Let the system designers decide whether to have a flat address space >or segments. If you have code which requires 2^40 byte objects, put this in >your requirements when you buy a system. The cost of 2^40 bytes of memory >can finance the OS and compiler changes needed to support such objects on a >segmented MMU. This closed-system view is something I disagree with very strongly. One of the great things about UNIX is that instead of using the system manufacturer's compilers, you have a great range of third party software as well (everything from the WATCOM fortran and C compilers to GNU C). I guess what I'm trying to say is that system programmers are programmers too and shouldn't have to deal with badly implemented segments either. Making your system difficult for 3rd party developers is not a good marketing strategy (even IBM is switching to UNIX these days). I'm not trying to say that UNIX is perfect either. It isn't. But if there's to be a new standard which includes segments, it should be done right. Probably it should be a 64 bit data/address machine with the top 16 bits of the address being the segment number (although this is probably too small for dynamic linking with segments, it would be ideal for huge databases and mapped files). Actually, if you have a flat 64-bit address, it's so huge that you probably don't need segments at all: the paging system would detect lower and upper bound "segment violations". You probably also want to add a mechanism to indicate how full the last page of a segment is (with byte granularity) so that memory mapped files could grow automatically a byte at a time. This is a much more dynamic approach to segmenting: the actual segment size is just whatever the maximum file size is. Plus you wouldn't have to divide the memory map up equally (or in powers of two). Read-only libraries and files wouldn't need space to grow, so they could be loaded adjacently. I guess it comes down to whether you prefer segmented addresses, .EXE library files (i.e., libraries which are relocated when loaded) or address independent code (the 6809 was a truly great uP: OS9 had dynamically linked libraries without even a memory manager). Note that the last two options are not incompatible with each other and the first option is gross: it may have far pointers but it definitely would have to have MK_FP(segment, offset), FP_SEG(addr) and FP_OFF(addr). Sorry about the length of this article. I've decided for myself now: I definitely don't want segments. There are too many other, easier ways to get the same effect. -- #define h 23 /* Height */ /* jallen@ic.sunysb.edu (129.49.12.74) */ #define w 79 /* Width */ /* Amazing */ int i,r,b[]={-w,w,1,-1},d,a[w*h];m(p){a[p]=2;while(d=(p>2*w?!a[p-w-w]?1:0:0)|( p<w*(h-2)?!a[p+w+w]?2:0:0)|(p%w!=w-2?!a[p+2]?4:0:0)|(p%w!=1?!a[p-2]?8:0:0)){do i=3&(r=(r*57+1))/d;while(!(d&(1<<i)));a[p+b[i]]=2;m(p+2*b[i]);}}main(){r=time( 0L);m(w+1);for(i=0;i%w?0:printf("\n"),i!=w*h;i++)printf(a[i]+"#\0 ");}
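For reference, the pointer-cracking macros Joseph mentions would look something like this under his 16-bit-segment / 48-bit-offset split; the 64-bit unsigned type is an assumption (spelled here as a typedef), since no such machine is being named:

    /* Hypothetical 64-bit segmented pointer: 16 bits of segment on
       top, 48 bits of offset below, per Joseph's suggested split. */
    typedef unsigned long long addr64;   /* assumed 64-bit unsigned type */

    #define MK_FP(seg, off) (((addr64)(seg) << 48) | ((addr64)(off) & 0xFFFFFFFFFFFFULL))
    #define FP_SEG(addr)    ((unsigned)((addr64)(addr) >> 48))
    #define FP_OFF(addr)    ((addr64)(addr) & 0xFFFFFFFFFFFFULL)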
bellman@lysator.liu.se (Thomas Bellman) (04/08/91)
[ This is a comment on this whole thread, not aimed directly at the articles in the References line. ] This post is intended to make people look at segmentation from a slightly different angle, hopefully calming down this discussion just a little bit. (Not very much hope, though...) Take a typical file system. Say you have a 4 Gbyte disk. You can have 4'294'967'296 files of 1 byte each (modulo such things as directory information, a minimum physical file size due to sector sizes, and other things), or you can have 1 file containing 4 Gbyte data, or anything in between. How do you specify what data you want? Normally, you first open the file you want, receiving a file descriptor from the OS. Then you seek in the file, handing the file descriptor and an offset to the OS. Hmm, doesn't this look familiar? Substitute "create/attach a segment" for "open file" and substitute "index in the segment" for "seek in the file", and you have a segmented memory model. In the file system, you have (say) 8 bits of file descriptor, and 32 bits of offset, but even though you have 40 bits of "pointer", you can't address more than 32 bits. People don't seem to have any problem with doing this in a file system, so why the dislike for doing this with the memory too? Now, for memory, you probably want more than 256 segments, and in a modern machine (i.e. one that hits the market in '93) you might want 64 bit offsets, but the principle remains the same. Sometimes you don't want the segmentation. Sometimes you want the flat address model. This is equivalent to accessing the physical disk in a file system. The file system itself wants to do this, but will probably not want to let the user do it himself. Same for memory: the OS wants to address the memory as a flat space, but might not want the user programs to do this. This post seems to imply that segments are great. But actually, I haven't really made up my mind yet. I can see advantages to both segmented and non-segmented memory. A flat address space is a simple model that is easy to understand and use. I just wanted to point out that on other levels of the computer, people don't object to exactly the same system. There might be some advantages to segments, since they are so popular in file systems. It's just that they are called files instead of segments. Perhaps the best would be to let the programmer choose for himself. Have two types of instructions for accessing memory -- one type that uses pointers that consist of a segment number and an offset, and one type that has a flat view of the address space, both usable from user mode. Say SEGSTORE and SEGLOAD that take a segmented address, and FLATSTORE and FLATLOAD that take a non-segmented address. And then some way of converting between the two types of pointers. Then those that like segments can take advantage of them, and those that like a flat address space can take advantage of that. -- Thomas Bellman, Lysator Computer Club ! "Make Love - Nicht Wahr" Linkoping University, Sweden ! "Too much of a good thing is e-mail: Bellman@Lysator.LiU.Se ! WONDERFUL." -- Mae West
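Thomas's analogy can be written down directly: a file descriptor plus an offset already is a two-part pointer that nobody objects to dereferencing. A sketch with ordinary POSIX-style calls (error handling kept minimal):

    #include <sys/types.h>
    #include <unistd.h>

    /* "Dereference" an <fd, offset> pair: the seek is the segment
       index, the read is the load. */
    long fetch(int fd, unsigned long offset, char *buf, unsigned long len)
    {
        if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1)
            return -1;
        return (long)read(fd, buf, (size_t)len);
    }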
fargo@iear.arts.rpi.edu (Irwin M. Fargo) (04/08/91)
In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes: > > [a few paragraphs removed] > >Sometimes you don't want the segmentation. Sometimes you want the >flat address model. This is equivalent to accessing the physical >disk in a file system. The file system itself wants to do this, but >will probably not want to let the user do it himself. Same for >memory: the OS wants to address the memory as a flat space, but might >not want the user programs to do this. > With what I know of OSs, wouldn't segmentation be what the OS wants? In most of today's computer systems, virtual memory is the Big Thing (tm). The idea behind virtual memory (correct me if I'm wrong) is that a program can read/write to memory as if memory were directly connected, but it is actually re-mapped to a previously specified location in physical memory. Obviously, virtual memory mappers of today use pages to allow more flexible ways of memory mapping. Couldn't a virtual memory page be considered the same as a segment? (a la the Intel 80386 in protected mode) If the OS (or any other program) really wants, you can tell the MMU you want one page that takes up all of memory, or lots of little pages. My whole point is, if we consider virtual memory pages to be equivalent to segments, then it would seem that quite a few systems do use segmentation and that it really is not that outdated an idea. -- Thank you and happy hunting! Actually: Ethan M. Young Internet: fargo@iear.arts.rpi.edu Please press 1 on your touch tone Bitnet (??): userfp9m@rpitsmts.bitnet phone to speak to God... Disclaimer: Who said what?
peter@ficc.ferranti.com (peter da silva) (04/08/91)
In article <1991Apr06.030330.1533@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes: > In article <VFIA832@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: > >In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes: > >No, I didn't miss the point. If I have a 48 bit wide VM address and can't > >operate on any object larger than a 32 bit wide pointer can address, then > >it's a problem. > Why is everybody harping on this 48 bits wide? Subject: Re: Segmented Architectures ( formerly Re: 48-bit computers) This subject started with the idea of using segments to expand the address space of 32-bit computers. Many of us are still thinking along those terms. > I actually never said > anything about how wide the address space is, except that I implied that it was > at least 32 bits. In fact, given the machine I proposed, it would still > work quite well with only a 32-bit address space. In fact, it would only work well with a 32-bit address space (or maybe 33 or 34 bits). Once you get much more address space available to a program than you can stick into a single object you will run into problems: that's the lesson of the 8086. The reverse situation is not a big deal: that's the lesson of the 68000. > And that's the difference > between the flat and the segmented: using the segmented version (which, as > I pointed out two articles ago, can imitate, if slowly, a flat-address-space > machine), I can hide the fact that I only have 32 address bits, virtual or > otherwise. My Amigas both have a flat address space, but one has 32 address bits and the other 24. Apart from some trash software written by Microsoft, there is no difference that the program has to deal with. > Please read that again, Peter. You assumed, incorrectly, that I had more > than 32 bits for addressing. You assumed that, if I cannot allocate a 5Gbyte > object, things are broken. So what's the advantage to segments? > >Now, here's the important question: why is the 64K object size limitation > >in the IBM-PC a problem? > Peter: not all people who run into memory problems need more than 4Gbytes > for a single object. And not all people who run into memory problems need more than 64K for a single object: that's one reason why ints in most IBM-PC C compilers are only 16 bits wide. For people who don't, they will never know that the machine can access more than 64K, assuming, of course, it can. Since ints are only 16 bits, there is no way to specify the size of an object longer than 64K (that's why I was talking about 64 bit longs: size_t can easily be an int). How are you going to know about it? > >After all, you cannot write a correct program > >that will fail on it. You cannot legally determine that the maximum > >object size is >64K. > Actually, for ANSI C, yes, you can. You can use size_t for that purpose. > And it's all perfectly legal. size_t is 16 bits wide on most PC compilers. > I admit freely that 32 bits is a limit. But my question still stands: > please show me a Correct (by K&R II, ANSI, or POSIX standards) C program > that will fail [this system]. Can't. Can't show one that will fail on an IBM-PC either. > You cannot even *write* a C program that tries to declare a single object > with more than 4Gbytes (except by trying to pass something of type > __longlong into malloc, which will then only look at half of it or fail, as > the system wishes). Can't write a C program that tries to declare a single object with more than 64K on an IBM-PC, since size_t is a 16 bit number.
> Now, once again: please show me a Correct C program (defined above) that > will fail on the system as I have defined it. No, I'm not going to play with your straw man. > Please note that, although a pointer is 64 bits, I have said nothing about > how large the address space is. I refuse to, since, the way I've > organized the machine, IT DOESN'T MATTER. That has nothing to do with segments, either. The 68000, which I've already brought up, is a counterexample. > For people who want to have objects larger than 4Gbytes, if they pay me > enough money, I will give them a special compiler and library that will > allow it. (As I said, that requires a little bit of work.) So why not just use 64 bit registers in the first place, but only use the low 32 bits in the first versions... like the 68000 does. What do the segments buy you? > Actually, I want it because having every object in its own segment is an > incredibly useful thing. Just use the MMU and build your program with a sparse address space. You can do anything you do with segments this way and you're not crippling the machine at the starting gate. Think of it as dynamically resizable segments, if you like. malloc can quite easily make the top 'n' bits of any pointer unique, and the effective result is exactly the same. Except you're not building the magic 2^32 into the architecture. -- Peter da Silva. `-_-' peter@ferranti.com +1 713 274 5180. 'U` "Have you hugged your wolf today?"
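What Peter describes can be sketched in a few lines. The mmap()-style placement, MAP_ANON, and the 24-bit spacing are illustrative assumptions, not part of his proposal; the point is that unique top bits plus unmapped gaps between objects give the same overrun-trapping Sean wanted from per-object segments:

    #include <stddef.h>
    #include <sys/mman.h>

    static unsigned long next_region = 16;   /* skip low addresses */

    /* Each object lands in its own region; the hole after it is never
       mapped, so running off the end faults immediately. */
    void *sparse_malloc(size_t size)
    {
        void *want = (void *)(next_region++ << 24);  /* unique top bits */
        void *p = mmap(want, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
        return (p == MAP_FAILED) ? NULL : p;
    }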
peter@ficc.ferranti.com (peter da silva) (04/08/91)
In article <1991Apr6.211320.18594@athena.mit.edu>, jfc@athena.mit.edu (John F Carr) writes: > Why do you care what the address size is? A programmer's concern should be: > how many objects can I have, how big can each be, and how fast does the code > run? That's right. > Let the system designers decide whether to have a flat address space > or segments. No, because that immediately limits me to "how big an object can be". And the cost of RAM is continually dropping (see below). > If you have code which requires 2^40 byte objects, put this in > your requirements when you buy a system. I might not, now. But some people are already using more than 2^32 bytes, and single objects larger than that are already around the corner. You have to consider your next system, and the system after that. Are you going to be able to just buy the next larger version, change a few constants, and deal with bigger problems with the same software? > The cost of 2^40 bytes of memory > can finance the OS and compiler changes needed to support such objects on a > segmented MMU. Let's pretend it's 1978 and we're looking to design a system. So the cost of 2^20 bytes of memory should finance the OS and compiler changes needed to support such objects on a segmented MMU, so we'll build a segmented system. And at late-'70s prices, when the 8086 was being designed, that was probably true. By the time it came out, memory was cheap enough that the original 64K was too small. But we're still stuck with the design decision that software could cover for the segments on the off chance anyone would ever need to go beyond 64K objects. -- Peter da Silva. `-_-' peter@ferranti.com +1 713 274 5180. 'U` "Have you hugged your wolf today?"
mlord@bwdls58.bnr.ca (Mark Lord) (04/10/91)
In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes:
<
<Substitute "create/attach a segment" for "open file" and substitute
<"index in the segment" for "seek in the file", and you have a
<segmented memory model. In the file system, you have (say) 8 bits of
<file descriptor, and 32 bits of offset, but even though you have 40
<bits of "pointer", you can't address more than 32 bits. People
<don't seem to have any problem with doing this in a file system, so
<why the dislike for doing this with the memory too?
Not at all the same thing. Files are read in large chunks to negate the
performance impact somewhat. Accessing largish data items in main memory
requires segment prefixing (or whatever one calls it) on *each* access,
barring loop optimisations. Quite the performance hit, in addition to being
a kludge to overcome limited addressing capability.
For that matter, even the analogy is not correct. How about:
Substitute "access variable" for "open file", and from then on the file itself
looks like one huge flatly addressed table. No segment boundaries to worry about.
I can seek directly (on most OS's) to exactly the part I want (read "indexing"
the table). In memory, this *could* be equally clean on a segmented system,
provided we can fit each large data item completely within one segment, and
provided there are enough segment registers to accommodate *all* of the large
data items simultaneously. Anything else requires extra logic to maintain
segment registers. The closest counterexample in file systems is having to
put up with multiple disk volumes, where we treat each drive as a "segment".
And yes, we hate that, which is why drives keep getting MUCH larger and larger
(bigger flatter addressing segments).
Computers have plenty of die space to provide enough addressing bits to send
segments back to the dark ages NOW, so why kludge around?
--
MLORD@BNR.CA Ottawa, Ontario *** Personal views only ***
bellman@lysator.liu.se (Thomas Bellman) (04/11/91)
mlord@bwdls58.bnr.ca (Mark Lord) writes: > In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes: > < [Me describing an analogy between memory segments and disk files.] > Not at all the same thing. Files are read in large chunks to negate the > performance impact somewhat. Accessing largish data items in main memory > requires segment prefixing (or whatever one calls it) on *each* access, > barring loop optimisations. Quite the performance hit, in addition to being > a kludge to overcome limited addressing capability. I don't really see why it should hinder loop optimisation. At least not if segmentation is done at a low enough level of the hardware. I think of segmentation as part of the MMU. Sort of selecting which page table to use. From a restricted set of tables, though. (Restricted by the OS, i.e. the OS decides the contents of the page tables.) I am *not* saying that you should have to do a "select segment" and then index in that segment. Rather, I would have the segment as part of the pointer. And I am definitely *not* wanting segments to overcome limited addressing. I want them to be able to know what object I'm using. Consider mapping a file into memory. When using the file normally, you can extend the file by just writing at the end of it. How do you do that when the file is mapped? You might have something just after the mapped file. Or take a stack. How does the OS know if you're extending the stack or indexing outside your allocated memory by mistake? It doesn't. It just guesses. The *programmer* decides the sizes of the segments, and how many he wants. You should be able to fit the entire address space in one segment. Just like you can have one single file taking up all of your disk. The programmer should also be allowed to specify the attributes of each segment (read, write, execute permission, auto-extending on writes past the end, ...), but that is up to the OS to deal with, and not a hardware question. -- Thomas Bellman, Lysator Computer Club ! "Make Love - Nicht Wahr" Linkoping University, Sweden ! "Too much of a good thing is e-mail: Bellman@Lysator.LiU.Se ! WONDERFUL." -- Mae West
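Thomas's attribute list, written out as the flags word a segment-creation call might take. The names and the call itself are invented here purely for illustration; he proposes no specific interface:

    /* Hypothetical per-segment attribute flags (illustrative only). */
    #define SEG_READ    0x1
    #define SEG_WRITE   0x2
    #define SEG_EXEC    0x4
    #define SEG_AUTOEXT 0x8   /* a write just past the end grows the segment */

    /* e.g. a mapped, growable data file:
       seg = seg_create(initial_size, SEG_READ | SEG_WRITE | SEG_AUTOEXT); */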
meissner@osf.org (Michael Meissner) (04/11/91)
In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes: | Perhaps the best would be to let the programmer choose for himself. | Have two types of instructions for accessing memory -- one type that | uses pointers that consist of a segment number and an offset, and one | type that has a flat view of the address space, both usable from user | mode. Say SEGSTORE and SEGLOAD that take a segmented address, and | FLATSTORE and FLATLOAD that take a non-segmented address. And then | some way of converting between the two types of pointers. Then those | that like segments can take advantage of them, and those that like a | flat address space can take advantage of that. No, No, No. If a segmented pointer needs different instructions to load up, you need to provide two or more versions of the library, one that expects pointers to be segmented and one that doesn't. This is the primary problem with x86 segments -- you have to have different models and libraries, and then you wind up gunking up 'portable' code to annotate whether particular pointers are far, near, or huge. -- Michael Meissner email: meissner@osf.org phone: 617-621-8861 Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142 Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
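For readers who never suffered it, the annotation Michael means looked like this in PC C of the day (Microsoft-style keywords and the M_I86 predefined macro; a sketch, not a complete program):

    /* One routine, two declarations: memory-model qualifiers that
       mean nothing on any other architecture. */
    #ifdef M_I86                    /* segmented x86 compiler */
    void copyrow(char far *dst, char far *src, unsigned n);
    #else                           /* everyone else */
    void copyrow(char *dst, char *src, unsigned n);
    #endif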
jfc@athena.mit.edu (John F Carr) (04/12/91)
I don't think the tradeoffs between segments and a flat address space are the same now for >32 bit machines as they were for >16 bit machines. In the past decade, memory cost has dropped by about 2^8. The 32 bit address space that some find too small costs 2^8 times as much to fill as the 16 bit address space did 12 years ago. -- John Carr (jfc@athena.mit.edu)
dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber) (04/12/91)
In article <1991Apr12.021609.5340@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
:I don't think the tradeoffs between segments and a flat address space are
:the same now for >32 bit machines as they were for >16 bit machines.
:
:In the past decade, memory cost has dropped by about 2^8. The 32 bit
:address space that some find too small costs 2^8 times as much to fill as
:the 16 bit address space did 12 years ago.
:
Oh come on! No one here has been seriously requesting 4GB of real
physical memory! (Well, not many anyway :)) The point that most of
the anti-segmentation folks, including myself, have been trying to
make is that internal segmentation (visible only to the OS) is fine;
external segmentation, defined as ANY type of segmentation which prevents
my application from playing with a flat address space, isn't. Intel's
brain-damaged 64K segments were admittedly the worst, but so what?
All of the new machines which supposedly offer a >32 bit virtual address
space are an optical illusion, because the application is now responsible
for using the actual 32-bit virtual address space as a cache, reloading
some segment register or other when it needs to play with object X (can
you say overlays? I knew you could!) And I don't really care if IBM
has made loading a segment register on the RS/6000 so fast I can do it
in 30 instructions. My point is that my application has to know to do
this B.S. Can you say non-portable?
:--
: John Carr (jfc@athena.mit.edu)
--
Dan S.
firth@sei.cmu.edu (Robert Firth) (04/13/91)
In article <1991Apr12.021609.5340@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes: >I don't think the tradeoffs between segments and a flat address space are >the same now for >32 bit machines than they were for >16 bit machines. > >In the past decade, memory cost has dropped by about 2^8. The 32 bit >address space that some find too small costs 2^8 times as much to fill as >the 16 bit address space did 12 years ago. Your figures are approximate, but let's take them as a starting point. As I recall (having lived through it) by about 1975 we were bashing into the 16-bit limit often enough to leave major bruises. At least, that's when the place I worked for started looking seriously at segmented machines, flat 20-bit machines, software tricks with separate I and D spaces, and so on. That's the vintage of the Interdata 7/32 and the Algol-68C compiler with separate I and D segments. So, if memory cost drops at 2^8 per decade, it will be as cheap to fill 32 bits in 1995 as it was to fill 16 bits in 1975. Now, cheapness isn't everything, but the figure does suggest that, by 1995, we'll be hitting the 32-bit limit in the same way that we were hitting the 16-bit limit in 1975. So, if you are designing a machine today (April 1991), to be shipped in, say, 1994Q1 - not an unreasonable lead time - then, if you hardwire a 32-bit object limit, your machine will be constraining an appreciable fraction of potential users within 18 months of first release. Not, one feels, a prudent business strategy.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/13/91)
In article <24004@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes: | So, if memory cost drops at 2^8 per decade, it will be as cheap | to fill 32 bits in 1995 as it was to fill 16 bits in 1975. Now, | cheapness isn't everything, but the figure does suggest that, by | 1995, we'll be hitting the 32-bit limit in the same way that we | were hitting the 16-bit limit in 1975. I believe you're basing that on a false assumption that problems grow to fit the available computer, while the truth is that larger computers attract larger problems, which isn't the same thing at all. The subtle difference is that the little problems don't go away. We still have a need for the bc or hand calculator size solution. We still do email and spreadsheets, text editing, and compilations. What this means is that a computer twice as big won't solve twice as many problems. Electronic mail on a SPARC doesn't take more resources than it did on a PDP-11. You can't even put N times as many people doing mail on a machine N times faster, because the i/o hasn't grown N times. What you see is that additional resources don't proportionally solve additional problems, so the cost of solving is larger on a "per problem" basis. Most problems solved on workstations today probably will benefit from faster i/o, and from faster CPU, but not from more memory, because the typical workstation will already handle today's problems. Most workstations are capable of holding more physical memory than they have, say 64MB max, 16MB typical. Given this, if you were a vendor, would you put your R&D into faster CPU or adding memory? And given that a larger word size is inherently slower, would you go to a huge word size for which there was a very limited market? I predict that the growth in the next decade will be in faster CPU, bigger and faster disk, and that the slope of the growth curve in actual memory will be 4x lower than the CPU. Money will be spent on the most marketable solutions, and volume sales will bring price down ... feedback between financial and technical. I think the jump to 64 bit will be limited to the top end of the market, while lots of vendors take advantage of 32 bit being smaller, cheaper, lower power, and faster. And most structs and arrays will be twice as big in 64 bit, driving the cost of 64 bit systems up relative to 32 bit. People who need 64 bit will jump. People who always need the latest may or may not, depending on the speed difference between the 32 and 64 bit machines. The average user would like one, but doesn't have a single problem which needs it. I will guess 64 bit will get less than half the market (by unit) through the end of the decade. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "Most of the VAX instructions are in microcode, but halt and no-op are in hardware for efficiency"
henry@zoo.toronto.edu (Henry Spencer) (04/14/91)
In article <3336@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: >... Electronic mail on a SPARC doesn't take more resources than >it did on a PDP-11... Oh, but it does. The pdp11 had the immense good fortune of being too small to run sendmail...! -- And the bean-counter replied, | Henry Spencer @ U of Toronto Zoology "beans are more important". | henry@zoo.toronto.edu utzoo!henry
jesup@cbmvax.commodore.com (Randell Jesup) (04/15/91)
In article <1991Apr14.014401.1297@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes: >In article <3336@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: >>... Electronic mail on a SPARC doesn't take more resources than >>it did on a PDP-11... > >Oh, but it does. The pdp11 had the immense good fortune of being too small >to run sendmail...! Quite right. Never underestimate the ability of software people to use all available resources, and then 10% (or 100%) more. If mail has become "small", someone will recode it using OO, or in a functional language, or... Then they'll add all sorts of frills, say automatic AI junk-mail filters, voicemail, a friendly voice that says "Some important mail from your buddy fred has arrived, and I knew you would want to know about it immediately; shall I read it for you?", or some such silliness. (Please excuse my intentionally semi-serious predictions.) The only proof I need is X/OpenLook/Unix. When we have 1000-Spec machines on our desktops, they'll probably _still_ have >1sec response times. ;-| -- Randell Jesup, Keeper of AmigaDos, Commodore Engineering. {uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com BIX: rjesup Disclaimer: Nothing I say is anything other than my personal opinion. Thus spake the Master Ninjei: "To program a million-line operating system is easy, to change a man's temperament is more difficult." (From "The Zen of Programming") ;-)
rminnich@super.ORG (Ronald G Minnich) (04/16/91)
In article <p-bgl5n@rpi.edu> fargo@iear.arts.rpi.edu (Irwin M. Fargo) writes: >With what I know of OSs, wouldn't segmentation be what the OS wants? Not want, but have, and use, even before VM. The question is, given that 8086-style segmentation support is Bad, and maybe even B5500-style support is taken to be Bad, and maybe even HP PA or RS6000 support is Bad, is there anything Good that computer architectures can do to support segmentation? Besides ignore it completely, as most do now? ron
halkoD@batman.moravian.EDU (David Halko) (04/23/91)
In article <p-bgl5n@rpi.edu>, fargo@iear.arts.rpi.edu (Irwin M. Fargo) writes: > In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes: > > > > [a few paragraphs removed] > > > >Sometimes you don't want the segmentation. Sometimes you want the > >flat address model. This is equivalent to accessing the physical > >disk in a file system. The file system itself wants to do this, but > >will probably not want to let the user do it himself. Same for > >memory: the OS wants to address the memory as a flat space, but might > >not want the user programs to do this. > > With what I know of OSs, wouldn't segmentation be what the OS wants? > > In most of today's computer systems, virtual memory is the Big Thing (tm). > The idea behind virtual memory (correct me if I'm wrong) is that a program > can read/write to memory as if memory were directly connected, but it is > actually re-mapped to a previously specified location in physical memory. > > Obviously, virtual memory mappers of today use pages to allow more flexible > ways of memory mapping. Couldn't a virtual memory page be considered the > same as a segment? (a la the Intel 80386 in protected mode) > > If the OS (or any other program) really wants, you can tell the MMU you want > one page that takes up all of memory, or lots of little pages. > > My whole point is, if we consider virtual memory pages to be equivalent to > segments, then it would seem that quite a few systems do use segmentation > and that it really is not that outdated an idea. > From what little I have read on virtual memory and segmentation, segmentation seems to be an abstraction of virtual memory. Virtual memory is one dimensional because virtual addresses go from 0 to some maximum address, one address after another. This tends to cause problems. If a compiler is building several tables (symbol table, parse tree, call stack, numeric constants table, etc.) and one table grows faster than it is supposed to, it can collide with another table - and that is a real problem! Segmentation was designed to solve this. Instead of there being one linear address space, there are several, allowing in this example the symbol table, constants, parse tree, and call stack to take up separate address spaces, each starting at 0 and growing until the maximum of its segment is hit (which then causes a problem). But the point is that each space can grow dynamically, not taking up any extra address space, thus leaving lots of room for other processes taking up other segments, until memory eventually runs out (but this is what we have virtual memory built underneath segmentation for! Oh Baby!) Besides that, segments allow separate procedures or data to be distinguished from one another and protected, and sharing procedures in segments between users is facilitated. The programmer needs to be aware of segmentation, however (at a lower level, the compiler designer, for example), to make full use of it. May I add, however, that smart OS's took advantage of the theories behind segmentation before it existed in hardware practice (OS-9 used memory modules, which distinguished data modules from executable modules and allowed them to be shared between processes... I still can't figure out why MS-Dos ever took a decent foothold in the market!)
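Halko's compiler example fits in a few lines. A sketch, assuming one flat 1MB arena carved into guessed quarters; the segmented alternative would give each table its own address space starting at 0:

    /* Flat version: four compiler tables share one linear arena. */
    #define ARENA (1UL << 20)

    static char arena[ARENA];
    static char *symtab = arena;                    /* guessed quarter each */
    static char *ptree  = arena + ARENA / 4;
    static char *consts = arena + ARENA / 2;
    static char *stack  = arena + 3 * (ARENA / 4);

    /* ptree[n] with n >= ARENA/4 quietly tramples consts[] - exactly
       the collision a per-table segment would turn into a clean trap. */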
-- _______________________________________________________________________ / \ / "The use of COBOL cripples the mind; its teaching should, therefore, be \ / regarded as a criminal offence." E.W.Dijkstra, 18th June 1975. \ +-----------------------------------------------------------------------------+ \ "Have you purchased a multi- halkoD@moravian.edu Have you booted your / \ media machine from IMS yet?" David J. Halko OS-9 computer today?/ \_______________________________________________________________________/