[comp.arch] Segmented Architectures

efeustel@prime.com (Ed Feustel) (03/22/91)

As a vendor of segmented architectures, Prime has both an understanding of
and appreciation for the benefits and drawbacks of segments.  Given a segment
whose size can range from 1 byte to 4GB, and given enough of them on a per-process
basis, one can construct a very good (efficient, secure, etc.) operating system.

The major difficulty with a segmented architecture in today's marketplace
arises from the use of the language C and C's notion of a pointer as the
total address.  On a segmented architecture of the Multics variety, any
address consists of a compound <S,N> (and probably <P,S,N>), where S is the
segment number, N is an offset in bytes (or words), and P is protection
information, which might include the process number.

C assumes that pointers are linear and monotonically increasing.  So for C,
adding an offset must be able to carry me from one segment to another.  Other
languages do not have such a limited notion of what an address is and can
deal with the fact that an address is a structure rather than an elementary 
object.
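
To make the contrast concrete, here is a sketch in C itself (the field
widths are illustrative, not any particular machine's layout):

    #include <stdint.h>

    struct seg_ptr {            /* compound address <S,N> */
        uint32_t s;             /* segment number */
        uint32_t n;             /* byte offset within the segment */
    };

    /* Advancing a compound pointer moves the offset only; on the
     * hardware being modeled, overflowing n would raise a fault
     * rather than carry into s. */
    struct seg_ptr seg_advance(struct seg_ptr p, uint32_t bytes)
    {
        p.n += bytes;
        return p;
    }

C's pointer arithmetic, by contrast, is free to march straight through
memory as if it were one array.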

If you are saddled with small segments with N < 2**18 or so, you will soon
come to hate segments because you have to continually map C addresses onto
multiple segments in order to support the linear model.  With N ~ 2**32
this is much less of a problem since the need for individual objects of this
size is much reduced.

One still has a problem with S < 4096 if one adopts the notion of one "object"
per segment.  This is true even if the address space has a range of unique
segment numbers for each process as well as a range of segment numbers which
are global. If S > 256K, this problem is much reduced.  Even with Intel's
scheme, the problem is not severe if the granularity of objects is large
enough. Note the size of Intel's address space is 2**13 * 2**32 shared bytes for
all processes + n * 2**13 * 2**32 bytes, where n is the number of processes.
This logical address space is mapped to the currently implemented 2**32 paged
address space, which is mapped to a maximum of 16MB to 64MB by the physical
architecture of your PC.

Assuming that protection, sharing, and process address spaces are structured
on the basis of segments, each of which has its own independent page table
(which allows the segment/object to expand and contract independent of all
other objects), highly reliable and efficient operating systems can be constructed
which have the property that page table overhead is minimized.  It should be
noted that Intel has chosen to implement page table per process rather than
page table per segment on the 386/486.  They did do "the right thing" on
the 960XA used by BiiN and the military. By using the same page table for
every process, sharing of the operating system, code of shared libraries, etc.
is enabled.

Another difficulty often cited is the requirement of loading segment
registers and the cost of doing this.  This is an artifact of Intel's
386/486 architecture/implementation, which was designed when silicon area
was at a premium.  An intelligent implementation in which attention is
given to the use of segments need not exact such a high penalty for changing
segments (as IBM and HP can attest).  This is obviously a trade-off between
registers used for pointing and TLBs which contain the segment information.

Thus the intelligent designer should try to discern why the feature is desired,
what it costs, and whether its use can be exploited pervasively before
discarding it based on experience with "an existing implementation".

Of course I speak for myself and not for Prime when I advocate a re-examination
of the benefits of segments.

Ed Feustel
Prime Computer

firth@sei.cmu.edu (Robert Firth) (03/26/91)

In article <efeustel.669650766@tiger1> efeustel@prime.com (Ed Feustel) writes:

>The major difficulty with a segmented architecture in today's marketplace
>arises from the use of the language C and C's notion of a pointer as the
>total address.

No.  The major difficulty with a segmented architecture is that it's
wrong, and the von-Neumann model is right.  This is not a language
issue.  One of the most fundamental, and most pervasive, idioms in
practical computing is the mapping function whose domain is a
subset of the natural numbers, in other words

	array (0..max) of Object

This has been true of every language since before Formula Translation I,
and will remain true for as long as we have integers and like to count.

Yes, the set of integers is ordered and monotonically increasing, and hence
so will be the set of array indices, and hence, on the natural memory model,
so will be the set of object addresses.  Don't blame C - as Kronecker said,
God made the natural numbers.

>If you are saddled with small segments with N < 2**18 or so, you will soon
>come to hate segments because you have to continually map C addresses onto
>multiple segments in order to support the linear model.  With N ~ 2**32
>this is much less of a problem since the need for individual objects of this
>size is much reduced.

The size of the segment is not the point.  The point is that the
physical memory is capable of holding an array of a certain size, 
but the addressing scheme won't let you index it.  You have only
to hit this problem once in a lifetime, to vow never again to buy
a machine with a segmented address structure.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/27/91)

In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

| The size of the segment is not the point.  The point is that the
| physical memory is capable of holding an array of a certain size, 
| but the addressing scheme won't let you index it.  You have only
| to hit this problem once in a lifetime, to vow never again to buy
| a machine with a segmented address structure.

If your domain ever changes from edu to com you will buy what's cost
effective, be it segmented, CISC, or three's-complement metric
tetradecimal. And in some course or other you will probably find that
there's a use for a nonlinear addressing scheme sometimes, and that if
the segment size is at least as big as the maximum addressable physical
memory, your argument above is pretty hard to make.

The 8086 is not the generic model of segmented addressing, and faults in
one implementation are poor starting points to make a case against any
idea or method.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

tbray@watsol.waterloo.edu (Tim Bray) (03/27/91)

firth@sei.cmu.edu (Robert Firth) writes:
 The major difficulty with a segmented architecture is that it's
 wrong, and the von-Neumann model is right.
 ...
 You have only
 to hit this problem once in a lifetime, to vow never again to buy
 a machine with a segmented address structure.

This point is so important that I'm going to waste network bandwidth, occupy
the attention of hundreds, etc., with a posting whose meat is a mere:

RIGHT ON!

Tim Bray, Open Text Systems

koll@NECAM.tdd.sj.nec.com (Michael Goldman) (03/28/91)

	Why Segmented Architectures Are Wrong

I could recite a litany of horror stories about how my segmented life was
made intolerable by operating system bugs, application bugs, and compiler
bugs ("- The Huge Model doesn't work in your version, but for $75 we'll
send you version 4.1.3 Rev G where it does. - Oh, you have a deadline?
Well, for $25 more we can express it to you in just 2 weeks! - Really Sir!
Such Language!!") .  Instead, I will simply point out that segments add
complexity to programming, which results in bugs, which take time to find
and to fix, which delays time-to-market, which costs money.

One can make theoretical arguments and claim that Intel's implementation
was limited by current technology, but in practice, these limits are what
we will always be facing.  The "Keep It Simple" vs. "Hey guys, let's put it in
hardware!" battle will never end, and I'm not about to argue with all those
Intel CPUs out there, but most programmers prefer simple architectures.

Of course, if you have a gazillion-customer market requiring a $1 solution,
then the above yields to the virtues of an 80188.

renglish@cello.hpl.hp.com (Bob English) (03/28/91)

I suppose that I will get religion someday, but until then...

firth@sei.cmu.edu (Robert Firth) writes:
> No.  The major difficulty with a segmented architecture is that it's
> wrong, and the von-Neumann model is right.  This is not a language
> issue.  One of the most fundamental, and most pervasive, idioms in
> practical computing is the mapping function whose domain is a
> subset of the natural numbers, in other words
> 	array (0..max) of Object

If by this you mean that the image presented to the programmer should
allow large objects, I'd have to agree with you.  I differ, however,
with the equation of the data space a programmer perceives (that which
the compiler provides) and the native architecture of the memory system.
The two need not be the same.

"What about performance?" you scream in disgust (I can hear such screams
from around the country even as I type).

"What about performance?" I ask rhetorically, with a bemused look.

The segment sizes this forum has rejected out of hand address 4GB of
memory.  For all objects less than that segment size, a load of a
segment register to access the object should take exactly one cycle per
access to the object.  Less if the compiler/architecture team is wily
enough to know how to avoid such things.  In programs with less than 4GB
of data (and there are a few of them available in the world), this
segment register has to be loaded once per context switch, hardly
significant in these days of large CPU contexts.

"But there are objects greater than 4GB," you cry and move your fingers
to the 'n' key, unable to bear this stupidity any longer.

"Of course there are," I answer.

I would characterize such objects as belonging to three general types.

The first is a large object accessed in a regular way, a large array or
matrix, for example.  Segment loading and unloading in such an object
will be rare, because the compiler will know the segment boundaries and
be able to optimize them out of the code.

The second is a large object accessed unpredictably with no locality. 
While the compiler will not be able to predict the segmentation register
in such cases, neither will the cache be able to hold the working set,
so that miss penalties dominate the additional segment register loads.

The third is a large object accessed unpredictably, but with a high
degree of locality.  In such cases, loads and stores take up to one
additional instruction.  Only in this case do segments make any
difference in the performance of the machine, and even in this case the
difference is small.  I don't claim to be an expert in such matters, but
I suspect the number of applications fitting this last category is small.

All of this analysis assumes, of course, that a multi-op implementation
of a segmented architecture wouldn't have the ability to parallelize
segment loads.  Without that assumption, it's very difficult to
characterize the types of applications where segmentation presents
performance problems.

As far as I can see, the only time that a move to a non-segmented
architecture is justified from a performance and functionality
perspective is when the size of the segments is comparable to the size
of the system's cache memory.  With 4GB segments, that won't happen in
the next few years.

There are other justifications, however.  First, it could just be cheap
to make segments larger (excuse me, eliminate them entirely).  Second,
it could be cheaper to eliminate segments than to fix the compiler to
handle them correctly.  Third, it could be that the current cost is not
too high, and projections over the life of the architecture lead the
designers to believe that 4GB caches will become important before the
next architecture revision.  Fourth, address/register size could be seen
as a differentiator in the marketplace, leading designers to match the
current "standard" in order to keep the customers listening.

--bob--
renglish@hplabs.hp.com
Not an official position of the Hewlett-Packard Company.

guy@auspex.auspex.com (Guy Harris) (03/28/91)

>C assumes that pointers are linear and monotonically increasing.

Well, many C programs do so.  The ANSI C standard makes an effort not to
demand that pointers refer to a linear address space with
monotonically-increasing addresses: pointers may be compared for order
only if both point into the same array (equality comparison is always
allowed), pointer+integer is defined mainly in terms of array indexing,
and pointer-pointer is defined only if both pointers point into the same
array.
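
To make the rules concrete, a sketch (each comment follows the
standard's rules as summarized above):

    void example(void)
    {
        int a[10], b[10];
        int *p = &a[2], *q = &a[7], *r = &b[0];

        if (p == r)     { }  /* defined: equality always works */
        if (p < q)      { }  /* defined: both point into the array a */
        if (q - p == 5) { }  /* defined: difference within one array */
        if (p < r)      { }  /* undefined: p and r point into different
                                objects, so a segmented implementation
                                may answer however its addresses fall */
    }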

It may be that, at least at present, programs of that sort are
sufficiently common that you really *do* have to pretend the address
space is one single huge array even on machines where that model isn't
natural.  Maybe that'll change in the future; it'd certainly be nice if
it did.

>If you are saddled with small segments with N < 2**18 or so, you will soon
>come to hate segments because you have to continually map C addresses onto
>multiple segments in order to support the linear model.

And also have to deal with "near" and "far" pointers, and multiple
memory models, in programs that require more than a segment's worth of
code or data - at least in C.  What do other programming languages that
support pointer-style data types do? Do they also have to deal with
"near" and "far" pointers?  Or, in the memory models with more than a
segment's worth of data, do they just make pointers large?
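
For reference, the C dialect in question looks something like this
(near, far, and huge are non-ANSI keywords from the 16-bit x86
compilers such as Microsoft C and Turbo C):

    char near *np;      /* 16-bit offset into the default data segment */
    char far  *fp;      /* 32-bit segment:offset pair; arithmetic wraps
                           within the 64KB segment */
    char huge *hp;      /* like far, but normalized so that arithmetic
                           can cross 64KB boundaries, at a run-time cost */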

>By using the same page table for every process, sharing of the operating
>system, code of shared libraries, etc. is enabled.

I assume you mean "simplified" rather than "enabled".  Sharing of the
operating system, code of shared libraries, etc. is certainly possible
on systems that have per-process page tables....

<DXB132@psuvm.psu.edu> (03/29/91)

In article <6862@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) says:

>>C assumes that pointers are linear and monotonically increasing.

>Well, many C programs do so.  The ANSI C standard makes an effort not to

I'm a little curious about this segmented stuff:
What about, on a machine with 64-bit addresses, using the lower 32 bits
of a pointer as the segment offset and the upper 32 bits as a segment
number?  Has this been done before?  Can it be done efficiently on a
"normal" MMU arrangement?  Thanks for any answers...

-- Dan Babcock

rminnich@super.ORG (Ronald G Minnich) (03/29/91)

In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>No.  The major difficulty with a segmented architecture is that it's
>wrong, and the von-Neumann model is right.  
What does segmentation have to do with von-Neumann-ness and the lack of it?

Just curious, do you use Suns?
If so, then since SunOS 4.0 you have been using a segmented architecture. 
Of course, the segmentation is provided by the SYSTEM, not the architecture, 
but ...
Are your comments based on experiences with segmentation done wrong (a la 
Intel, Burroughs)? Just wondering.
ron
-- 
"Socialism is the road from capitalism to communism, but we never promised to 
                 feed you on the way!"-- old Russian saying
"Socialism is the torturous road from capitalism to 
                  capitalism" -- new Russian saying (Wash. Post 9/16)

dafuller@sequent.UUCP (David Fuller) (03/29/91)

In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <efeustel.669650766@tiger1> efeustel@prime.com (Ed Feustel) writes:
>
>>The major difficulty with a segmented architecture in today's marketplace
>>arises from the use of the language C and C's notion of a pointer as the
>>total address.
>
>No.  The major difficulty with a segmented architecture is that it's
>wrong, and the von-Neumann model is right.  This is not a language
>issue.  One of the most fundamental, and most pervasive, idioms in
>practical computing is the mapping function whose domain is a
>subset of the natural numbers, in other words
>
>	array (0..max) of Object
>

I would tend to ally with Ed Feustel here; if you look at the 8086
scheme it fits really well for Pascal:

4 segments, one each for code, data, heap and stack.

The Pascal runtime system insulates you from the details of the machine
but there's no reason you can't have arbitrary-sized arrays as long as
you use the runtime's idioms.  If you insist on manipulating machine-level
structures as pointers (an idea that still makes me queasy) then you get
what you deserve (recalling the DG port I did where the
high bit indicated a "char" pointer *shiver*).

C is a lousy fit on segmented architectures.  I hope never to code another
FAR pointer as long as I live.  I hope also that previous work done on
286 Xenix remains unbuggy forever.

I would also question whether "unsegmented" is a necessary feature of
Von Neumann architectures.

Respectfully,

Dave
-- 
Dave Fuller				   
Sequent Computer Systems		  Think of this as the hyper-signature.
(708) 318-0050 (humans)			  It means all things to all people.
dafuller@sequent.com

elg@elgamy.RAIDERNET.COM (Eric Lee Green) (03/29/91)

From article <1991Mar27.172325.10800@sj.nec.com>, by koll@NECAM.tdd.sj.nec.com (Michael Goldman):
>       Why Segmented Architectures Are Wrong
>
> I could recite a litany of horror stories about how my segmented life was
> made intolerable by operating system bugs, application bugs, and compiler
> bugs ("- The Huge Model doesn't work in your version, but for $75 we'll

You're confusing Intel segments with "real" segments. Intel's basic problem
is that their segment size is tiny. A "C" programmer (and "C" compiler) should
not have to worry about segment size in most instances.


1) Shared libraries and other shared code. A shared library is a strange
   object. In a "flat" address space, you must either make it reside at a
   fixed address in EACH AND EVERY PROCESS, or you must specially write it
   with no, absolutely no, absolute references, so that it can reside at
   different locations in different processes. The latter may require a
   lot of special "smarts" on the part of the compiler and library writer.

    On a segmented machine, you make an OS call (call it "ObtainLibPtr") to
    map the library segment into your "object space".  The first <n> words of the
    segment might be the jump table for library routines. Code can thus
    reside in different places in different processes, and no relocating or
    special non-absolute referencing modes need be employed.
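
    A sketch of that convention, with hypothetical names (ObtainLibPtr
    and the table layout are made up for illustration):

    typedef void (*libfn_t)(void);

    extern void *ObtainLibPtr(const char *libname);  /* hypothetical OS call */

    void call_entry(int which)
    {
        /* The mapped address may differ from process to process; since
         * we always indirect through the table at the segment's base,
         * the library itself needs no absolute references. */
        libfn_t *jump_table = (libfn_t *) ObtainLibPtr("mathlib");
        jump_table[which]();
    }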

2) Maintaining large objects that grow and shrink. In a sequential address
space, often you can't "grow" an object because something else has been
allocated in the addresses immediately afterwards. And thus you may end up
re-allocating it somewhere else and possibly copying a whole lot of data.
You could re-write your code to use some other data structure, true, but
in many cases there's a decided speed disadvantage to doing that. Or you
could then decide to plunk your object into a part of the address space
where you HOPE the rest of the program's data won't go, but at best that's
a kludge, at worst you'll guess wrong.
    Segments represent an elegant and logical solution to this set of
problems.
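
The flat-address cost being described, in plain C (realloc may well do
exactly this move behind your back):

    #include <stdlib.h>
    #include <string.h>

    char *grow(char *buf, size_t oldsize, size_t newsize)
    {
        char *p = malloc(newsize);      /* find a bigger home... */
        if (p == NULL)
            return NULL;
        memcpy(p, buf, oldsize);        /* ...and copy every byte, the
                                           cost a growable segment with
                                           its own page table avoids */
        free(buf);
        return p;
    }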

3) Mapping shared objects into the address space.
    This can be done on a machine with a "flat" address space, of course.
But if you want to do 2) above, have a large shared object that shrinks and
grows (let's say, perhaps, you want to share an editor buffer between the
editor and compiler), you have problems. If the shared object contains
embedded addresses, e.g. it's a linked list or B-Tree or other such data
structure, you have even worse problems... basically, it can't be done, not
without mapping the object into the same addresses in each address space
(which has collision potential... what if some other desired object is also
mapped at that same address?). The "solution", for flat address space
people, is simply not to do it, to use shared memory only as an IPC
mechanism rather than as a method of truly sharing objects.

The biggest obstacles confronting segmented architectures:
    1) Intel gave segments a bad name.
    2) "C" is set up to compile "flat" PDP-11-like code, and is not
the sort of object-oriented language that would map naturally onto a
segmented architecture.
    3) Operating systems. Current "merchant" operating systems such as Unix
are tied to "lowest common denominator" hardware, and do not have
provisions for segmentation. Proprietary operating systems are expensive
to design and build, and have the problem of attracting sufficient software
to make them commercially viable (unless you're DEC and you just came out
with VMS as the primary OS for the world's first "super-mini").
    4) Complexity. Segment tables add an additional level of complexity to
an MMU. The RISC folks, after stripping all the cruft out of their CPUs,
aren't likely to consider putting object-oriented cruft back into their
MMUs. After all, their business is Unix and "C", neither of which has any
provisions for handling segments.

Given the current predominance of Unix and "C", I don't see how
segmentation could become an identifying characteristic of any new
general-purpose computer architecture. This doesn't mean that segments are
a bad idea, though. It just means that segmentation is not commercially
viable, at this time, under current conditions.

--
Eric Lee Green   (318) 984-1820  P.O. Box 92191  Lafayette, LA 70509
elg@elgamy.RAIDERNET.COM               uunet!mjbtn!raider!elgamy!elg
 Looking for a job... tips, leads appreciated... inquire within...

dana@locus.com (Dana H. Myers) (03/29/91)

In article <1991Mar27.172325.10800@sj.nec.com> koll@NECAM.tdd.sj.nec.com (Michael Goldman) writes:
>	Why Segmented Architectures Are Wrong

[edited for brevity]

>I will simply point out that segments add
>complexity to programming, which results in bugs, which take time to find
>and to fix, which delays time-to-market, which costs money.
>
>One can make theoretical arguments and claim that Intel's implementation
>was limited by current technology, but in practice, these limits are what
>we will always be facing.  The "Keep It Simple" vs. "Hey guys, let's put it in
>hardware!" battle will never end, and I'm not about to argue with all those
>Intel CPUs out there, but most programmers prefer simple architectures.
>
>Of course if you have a gazillion customer market, requiring a $1 solution,
>then the above yields to the virtues of a 80188.

   The segmented vs. linear addressing architecture argument is moot.
Changes in the 80386 allow one to effectively ignore the segments and
use linear addresses.

     System V/386 does this, AIX-PS/2 does this, etc.

  Further overzealous condemnation of Intel CPUs is pointless and
rhetorical, especially given that Intel has left the segmented architecture
behind in the 1980's. The 80860 and 80960 are, functionally speaking, not
segmented machines.

-- 
 * Dana H. Myers KK6JQ 		| Views expressed here are	*
 * (213) 337-5136 		| mine and do not necessarily	*
 * dana@locus.com		| reflect those of my employer	*

ig@caliban.uucp (Iain Bason) (03/29/91)

This whole discussion on segmented architectures is getting a little
confusing.  The problem is that most posters seem to be drawing
conclusions about segmentation in general based upon their knowledge
of particular segmented architectures.  Now, there's nothing wrong
with basing one's opinions on one's experience.  However, I for one am
not very familiar with any segmented architectures, and I'm having
trouble trying to discern what these various architectures look like.

So, why don't we try to debate several specific issues separately?
For instance, 

(a) Should high-level languages try to hide the nature of
machine addressing from the programmer?  (Of course, that can bring on
a debate over whether C is a high level language, and we can waste
some more bandwidth.)  

(b) Should the segment number be manipulated
separately from the offset (i.e., should we have segment registers)?

(c) What should happen when a pointer in one segment is subtracted
from a pointer in another segment?

(d) What should happen when the addition of an integer to a pointer
results in the overflow of the pointer's offset part?

(e) Should segment sizes be fixed or variable?  That is, should the
number of bits devoted to the offset in a pointer be fixed or variable?

(f) What impact will the answers to the above questions have on cache
design, MMU design, or world peace?

My best guesses right now:

(a) No. (And maybe :->.)

(b) It depends on how many segment registers you allow.  With a
sufficient number the compiler can avoid swapping segment registers.
However, the only benefit I can think of for keeping them separate is
to reduce the amount of memory a pointer consumes, which doesn't
really seem that important these days.  The big problem (as far as I
can tell) is that a pointer aliases a number of different objects.
That is, pointers don't uniquely identify objects.

(c) I don't know.  It seems simplest just to concatenate the segment
and offset and treat the combination as an integer.

(d) Anything but wrap the offset around without changing the segment.
I don't see how that can make sense in any reasonable model.

(e) Fixed and large seems usable and implementable.

(f) I don't know.  Nothing obvious here that I can see.

By the way, I am not convinced that segmentation is a good thing,
regardless of the answers to these questions.  I hope that by
considering various aspects of segmentation we can decide what
benefits it can bring, and what costs it bears.

-- 

			Iain Bason
			..uunet!caliban!ig

efeustel@prime.com (Ed Feustel) (03/29/91)

The BiiN architecture used 32 bits for segments + protection info and
32 bits as a byte offset.

efeustel@prime.com (Ed Feustel) (03/29/91)

I think this article has the best suggestion for followup that I have seen on
comp.arch in some time.

mash@mips.com (John Mashey) (03/31/91)

In article <1991Mar29.011956.2801393@locus.com> dana@locus.com (Dana H. Myers) writes:
...
>   The segmented vs. linear addressing architecture argument is moot.
>Changes in the 80386 allow one to effectively ignore the segments and
>use linear addresses.
>
>     System V/386 does this, AIX-PS/2 does this, etc.
>
>  Further overzealous condemnation of Intel CPUs is pointless and
>rhetorical, especially given that Intel has left the segmented architecture
>behind in the 1980's. The 80860 and 80960 are, functionally speaking, not
>segmented machines.

Well, not quite.  People shifted to 32-bit flat as soon as they could on the
386/486, but the chips clearly include a 48-bit (16+32) segmented
address scheme as well.  Here are a few interesting questions:
	1) Does any software in common use actually make use of the
	segmentation to get significantly more than 32-bit addresses?
	(i.e., I mean more than, perhaps, dedicating a segment to code
	and one to data, and maybe one to stack?)
	[I hope to get answers to this one.]
	2) The 586 is reputed to be a 64-bit architecture.  Does this
	mean that the 16+32 scheme is abandoned, or that it is included along
	with >32-bit flat addressing?
	[I don't expect an answer on that; it is an interesting question.]
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.com (John Mashey) (03/31/91)

In article <00670208556@elgamy.RAIDERNET.COM> elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:

>2) Maintaining large objects that grow and shrink. In a sequential address
>space, often you can't "grow" an object because something else has been
>allocated in the addresses immediately afterwards. And thus you may end up

Actually, I don't think this is quite right.  Consider the difference
between a scheme that has X-bit segment numbers and Y-bit byte addresses
within the segment, and compare with one that has an X+Y-bit flat address
space.  In the first case, using typical designs, you get 2**X segments
of size 2**Y, which usually means that objects are CONVENIENTLY
2**Y maximum size.  The X+Y-bit flat address machine can simulate the same
thing rather conveniently...
On the other hand, the X+Y-bit flat machine can provide 2**(X-1) segments
of size 2**(Y+1), 2**(X+1) segments of size 2**(Y-1), etc.  In both cases,
if things get larger than the space reserved, you have to work harder,
but in general, the flat-addressing machine may have the convenience of
variable granularity.  The segmented design may, or may not.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.com (John Mashey) (03/31/91)

Note: people interested in this topic should especially consider attending
the ASPLOS panel run by Dave Patterson, which includes a panel and
audience discussion of several topics, including segmentation for >32-bits.

In article <1991Mar27.193512.12417@cello.hpl.hp.com> renglish@cello.hpl.hp.com (Bob English) writes:
...
>I would characterize such objects as belonging to three general types.

>The first is a large object accessed in a regular way, a large array or
>matrix, for example.  Segment loading and unloading in such an object
>will be rare, because the compiler will know the segment boundaries and
>be able to optimize them out of the code.
I don't quite understand this, but I could be convinced.  In fact, this
could lead to an interesting discussion.  Let me suggest the simplest
conceivable comparison, which is to take the inner loop of the rolled
DAXPY routine from linpack - code included later, but whose salient feature
is:
      do 30 i = 1,n
        dy(i) = dy(i) + da*dx(i)
   30 continue
where dy,dx,da, and n all arrive to the code as arguments.
Maybe someone would post the likely code, for the loop above, for an
architecture with
segmentation (HP PA would be interesting, as the scheme seems generally
well-thought-out, and HP's compilers are good), for the following cases:
	1) Standard HP-UX, i.e., what do you get if you assume flat
	addressing? 
	2) What you would get, if dy and dx can be in separate segments,
	and neither is >4GB?  (easy case: just load up 2 segment regs,
	once).
	3) What you need to do in the general case, which is that either
	dx or dy, or both could be >4GB, or (enough to cause the problem)
	that either or both cross segment boundaries?
	(I think this code either takes the easy way out, and does
	2 segment manipulations per iteration, or else gets compiled into
	something much more complex, but I can be convinced.)
Recall that the likely situation to be faced is that some FORTRAN
programmer is told they can have bigger arrays, and they simply set the
sizes of the arrays up, recompile, and want it to work.  Note also, that
FORTRAN storage allocation has certain implications for what you can and
can't do regarding rearrangement of where data is.  (Also,
a question: I assume on HP PA implementations that Move-to-Space Register
instructions are 1-cycle operations, with no additional latency needed
before a load/store?)  Hmm. Another question, since PA has 4 Space Registers
that user code can play with (I think), are there conventions for their
use, i.e., like callee-save - caller-save conventions for the regular
registers?  or are they all caller-save?  (I ask because the code for
      do 30 i = 1,n
        dy(i) = dy(i) + da*dx(i)
   30 continue
AND
      do 30 i = 1,n
        dy(i) = dy(i) + da*dx(i)
	call function(da)
   30 continue
could look rather different in their ability to just set the Space registers
and be done with it.)

>The second is a large object accessed unpredictably with no locality. 
>While the compiler will not be able to predict the segmentation register
>in such cases, neither will the cache be able to hold the working set,
>so that miss penalties dominate the additional segment register loads.
Agreed.  If there is no locality, cache and TLB missing eats the machines.

>The third is a large object accessed unpredictably, but with a high
>degree of locality.  In such cases, loads and stores take up to one
>additional instruction.  Only in this case do segments make any
>difference in the performance of the machine, and even in this case the
>difference is small.  I don't claim to be an expert in such matters, but
>I suspect the number of applications fitting this last category is small.
DBMS, and other things that follow pointer chains around.

Conventional wisdom says that loads+stores are 30% of the code,
and so some subset of these incur at least 1 extra cycle.
However, I suspect that in the general case, you have to keep track
of the segment numbers, and pass them around, just like you do
on X86 with far pointers, and hence there are more instructions,
and in addition, need to keep the space numbers around in integer
registers for speed in some cases.  (Note that every pointer reference
is conceptually 64 bits, and hence every pointer argument needs two
32-bit quantities, and probably close to 2X more instructions to set up.)
Also, consider the code on a 32-bit machine for:
	*p = *q;
	where both p and q are pointers to pointers, and both start in memory:
	this would typically look like (on typical 32-bit RISC):
	load r1,q
	load r2,p
	load r3,0(r1)
	store r3,0(r2)
I think this turns into, on something like HP PA (but correct me if I'm wrong),
and assuming that c pointers turn into 64-bit things:

	load r1,q
	load r4,q+4	get SPACE ID
	movetospaceregister  r4,somewhere1
	load r2,p
	load r5,p+4	get SPACE ID
	movetospaceregister  r5,somewhere2
	load r3,0(r1)		and do whatever you have to to get somewhere1
	load r6,4(r1)	get SPACE ID
	store r3,0(r2)	save the pointer; do what you must to get somewhere2
	store r6,4(r2)	save the SPACE ID

In this case, 4 instructions have turned into 10.  I wouldn't pretend this
example is typical, and I'd expect compilers would do better,
but it is illustrative of what could happen.

Anyway, to get some serious analysis of this, I think one has to
look at code sequences under various assumptions, and see
	a) What speed is obtainable by perfect hand-code?
	b) How likely are compilers to get there?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (03/31/91)

In article <00670208556@elgamy.RAIDERNET.COM>
	elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:

>This doesn't mean that segments are
>a bad idea, though.

Architectural support for segmentation is a bad idea. On a flat-addressed
architecture, segments can be done easily in software.

>3) Mapping shared objects into the address space.
>    This can be done on a machine with a "flat" address space, of course.
>But if you want to do 2) above, have a large shared object that shrinks and
>grows (let's say, perhaps, you want to share an editor buffer between the
>editor and compiler), you have problems. If the shared object contains
>embedded addresses, e.g. it's a linked list or B-Tree or other such data
>structure, you have even worse problems... basically, it can't be done, not
>without mapping the object into the same addresses in each address space
>(which has collision potential... what if some other desired object is also
>mapped at that same address space?). The "solution", for flat address space
>people, is simply not to do it, to use shared memory only as an IPC
>mechanism rather than as a method of truly sharing objects.

With a segmented architecture, to have embedded addresses, you have the
problem of how to specify the segment number.

If the segment number is also embedded (which is unavoidable for an inter-
segment pointer), the number must have the same meaning in all related
processes. If that is possible, it is also possible, on a flat-addressed
machine, to assign the same addresses to simulated segments.

BTW, not assigning the same virtual address to shared memory is often
impossible, because some architectures have aliasing problems related to
caches and virtual-to-physical address translation.

If the segment number is implicitly specified, it can be simulated, on a
flat-addressed machine, with embedded address offsets and a base register.
Adding a base register in the address calculation is only as slow as adding
a segment register.
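
A sketch of the simulation (seg_base stands for the per-process table a
loader or OS would fill in; names are illustrative):

    extern char *seg_base[];    /* one base per simulated segment */

    /* "Dereference" an embedded <seg, offset> pair; the add of
     * seg_base[seg] is the base-register add referred to above. */
    char fetch_byte(unsigned seg, unsigned long off)
    {
        return seg_base[seg][off];
    }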

						Masataka Ohta

dmocsny@minerva.che.uc.edu (Daniel Mocsny) (03/31/91)

In article <1991Mar27.172325.10800@sj.nec.com> koll@NECAM.tdd.sj.nec.com (Michael Goldman) writes:
>Instead, I will simply point out that segments add
>complexity to programming, which results in bugs, which take time to find
>and to fix, which delays time-to-market, which costs money.

Incidentally, so does having more than one architecture to support.
If the solution to a segmented architecture is a segmented market,
one has to wonder whether that is a step forward or backward.

Almost everybody probably can agree that segments do lousy
things to the programmer. (Even though I do my best to hide behind
compilers, I've done brilliant things like linking in the wrong
memory model of a function library, which generated an incomprehensible
linker error, for which the reference manual explanation was 
completely misleading, and cost me about a day to figure out where 
I screwed up).

However, what is the best way to fix the segment problem? By having
50 dozen superior new architectures to grapple with?

So here is a question. Which would you rather program, 1 lousy
architecture, or N nice but mutually incompatible architectures?
For what minimum value of N does the 80x86 turn out to be simpler?

My guess is that today the 80x86 is by far the simplest architecture 
to program *per customer*. All the more reason to develop completely
portable language and user-interface standards, and then let the
hardware vendors compete to see how well they can run generic
programs. Instead of having hardware vendors compete to see how
many programmers they can capture.


--
Dan Mocsny				
Internet: dmocsny@minerva.che.uc.edu

amos@SHUM.HUJI.AC.IL (amos shapir) (03/31/91)

[Quoted from the referenced article by renglish@cello.hpl.hp.com (Bob English)]
>
>The segment sizes this forum has rejected out of hand address 4GB of
>memory.  For all objects less than that segment size, a load of a
>segment register to access the object should take exactly one cycle per
>access to the object.

>In programs with less than 4GB
>of data (and there are a few of them available in the world), this
>segment register has to be loaded once per context switch, hardly
>significant in these days of large CPU contexts.
>

One case you forgot is that of many small segments, which together
amount to more than one segment size.  You could end up thrashing
between different segments even if no single object is big enough
to overflow a segment; all the arguments about big objects do not
hold in this case.


-- 
	Amos Shapir		amos@shum.huji.ac.il
The Hebrew Univ. of Jerusalem, Dept. of Comp. Science.
Tel. +972 2 585257 GEO: 35 14 E / 31 46 N

jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) (04/01/91)

In article <23189@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>The size of the segment is not the point.  The point is that the
>physical memory is capable of holding an array of a certain size,
>but the addressing scheme won't let you index it.  You have only
>to hit this problem once in a lifetime, to vow never again to buy
>a machine with a segmented address structure.

for (i=100; --i>=0; ) {
    repeat_after_me("All the world is not a VAX.\n");
}

The ordering of bytes in a word, or the numbering of bits in a byte, is not
ordained by Natural Law. If you don't assume that pointers and integers
should be interchangeable as a matter of Natural Law, all things are
possible. How do you address memory space greater than the size of a machine
register?

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.med.uth.tmc.edu  | adequately be explained by stupidity.
  "You can even run GNUemacs under X-windows without paging if you allow
          about 32MB per user." -- Bill Davidsen  "Oink!" -- me

DXB132@psuvm.psu.edu (04/01/91)

In article <7920@uceng.UC.EDU>, dmocsny@minerva.che.uc.edu (Daniel Mocsny) says:

>Almost everybody probably can agree that segments do lousy
>things to the programmer. (Even though I do my best to hide behind

What segmentation scheme are you talking about?

Let me expound a little on a segmentation scheme mentioned earlier. You
have 64 bit addresses, with 32 bits of offset and 32 bits of segment
number. There are no programmer-visible segment registers, no "memory
models" or such crap. This kind of scheme solves some sticky problems.
For example, it offers a solution to memory fragmentation. Each allocated
memory region is assigned a unique number (the segment number), and the
application manipulates only the offset. The OS can move memory regions
around in physical memory to eliminate fragmentation. Also, we can make
these segments an exact length, not necessarily always a multiple of
4K like paging schemes. That may sound a little inefficient compared
with paging, but your Unix system crashing after a few weeks due to
memory fragmentation has to be inefficient too.
What do you think; am I too idealistic? :-)
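
Concretely, the bookkeeping might look like this toy model (table size
and names are illustrative only):

    #include <stdint.h>
    #include <string.h>

    static char     *seg_table[1024];   /* segment number -> current base */
    static uint32_t  seg_len[1024];     /* exact lengths: no 4K rounding */

    /* What the MMU would do in hardware: split the 64-bit address and
     * bounds-check the offset against the segment's exact length. */
    char *resolve(uint64_t addr)
    {
        uint32_t seg = (uint32_t)(addr >> 32);
        uint32_t off = (uint32_t)addr;
        if (seg >= 1024 || off >= seg_len[seg])
            return NULL;                /* segment fault */
        return seg_table[seg] + off;
    }

    /* The OS defragments by moving a region and updating one table
     * entry; no address the application holds ever changes. */
    void move_segment(uint32_t seg, char *newhome)
    {
        memcpy(newhome, seg_table[seg], seg_len[seg]);
        seg_table[seg] = newhome;
    }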

-- Dan Babcock

ckp@grebyn.com (Checkpoint Technologies) (04/01/91)

Suppose you took a machine with a very large pointer; 32 bits will do
for argument's sake, but you could imagine this with 48 or 64 if you
like.  Then let's say the operating system permits an application to
have a sparse virtual address space.  Then applications could choose
some number of upper address bits and designate those as "segment
numbers", and the rest of the bits as "offset".

Now, what significant differences exist between this and a "real"
segmented machine?  I can't think of any offhand...
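
For concreteness, the carving might look like this (the offset width is
whatever the application picks):

    #include <stdint.h>

    #define OFFSET_BITS 32
    #define SEG_OF(a)  ((uint64_t)(a) >> OFFSET_BITS)
    #define OFF_OF(a)  ((uint64_t)(a) & ((UINT64_C(1) << OFFSET_BITS) - 1))

    /* One candidate difference: nothing here stops address arithmetic
     * from quietly carrying out of the offset field into the "segment
     * number" -- this hardware neither knows nor enforces the split. */
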
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                ckp@grebyn.com      \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

peter@ficc.ferranti.com (Peter da Silva) (04/01/91)

In article <7920@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
> So here is a question. Which would you rather program, 1 lousy
> architecture, or N nice but mutually incompatible architectures?

N nice and effectively compatible ones. Outside of the 80x86 family, all
my big portability problems are caused by differences in *software*
architectures or buggy code. The 80x86 is the only one where irreconcilable
hardware differences show up. This includes multiple operating systems and
compilers. My biggest headache right now is moving stuff from one compiler
to another on the same revision of the same computer... one of the compiler
vendors implemented ANSI prototypes in a really lax manner, and added a
bunch of extra keywords. Gol darn caniglyoon razzafrazzing Lattice C.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

kittlitz@granite.ma30.bull.com (Edward N. Kittlitz) (04/01/91)

In article <1991Apr1.045051.3220@grebyn.com> ckp@grebyn.com (Checkpoint Technologies) writes:
>Suppose you took a machine with a very large pointer; 32 bits will do
>for argument's sake, but you could imagine this with 48 or 64 if you
>like.  Then let's say the operating system permits an application to
>have a sparse virtual address space.  Then applications could choose
>some number of upper address bits and designate those as "segment
>numbers", and the rest of the bits as "offset".
>
>Now, what significant differences exist between this and a "real"
>segmented machine?  I can't think of any offhand...

Segments give you access control. The 386 will
let you put multiple segments within one page, each with differing
access rights. I believe that such an architecture may
provide a convenient way of implementing protected object-oriented systems.
It would be better if they had a TLB instead of the one-per-segment-register
'shadow registers'/descriptor cache. (I must admit I don't know if
there is a TLB in the 486.)
----------
E. N. Kittlitz	kittlitz@world.std.com / kittlitz@granite.ma30.bull.com
Contracting at Bull, but not alleging any representation of their philosophy.

efeustel@prime.com (Ed Feustel) (04/02/91)

One of the better uses for segments is when the segment is variable size.
The size is tailored to the object that is represented by the segment.

If each segment has its own page table, then the segment can grow or contract
independent of all other segments as was suggested in a previous article on
the subject.  One should not be forced to have a segment which is 2**y bytes
long.  One should have a segment that is n-bytes where n is the size of the
object.  One can compromise this to have segment sizes which are multiples
of words or pages in order to improve performance.  A stack can use this
feature: a segment length fault should result when one attempts
to step off the end of the segment.
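
A sketch of such a descriptor (field names are illustrative):

    #include <stdint.h>

    struct segment {
        uint32_t  length;       /* exact object size in bytes, not 2**y */
        uint32_t *page_table;   /* this segment's own page table, so it
                                   can grow or shrink independently */
    };

    /* Conceptually every access is checked against the exact length,
     * so stepping off the end of a stack segment faults instead of
     * scribbling on a neighbor. */
    int in_bounds(const struct segment *s, uint32_t offset, uint32_t width)
    {
        return offset <= s->length && width <= s->length - offset;
    }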

barmar@think.com (Barry Margolin) (04/02/91)

In article <1991Apr1.045051.3220@grebyn.com> ckp@grebyn.com (Checkpoint Technologies) writes:
>Suppose you took a machine with a very large pointer; 32 bits will do
>for argument's sake, but you could imagine this with 48 or 64 if you
>like.  Then let's say the operating system permits an application to
>have a sparse virtual address space.  Then applications could choose
>some number of upper address bits and designate those as "segment
>numbers", and the rest of the bits as "offset".
>
>Now, what significant differences exist between this and a "real"
>segmented machine?  I can't think of any offhand...

My experience with "real" segmented machines is limited to Multics on
Honeywell Level 68 and DPS8 hardware.  In this architecture and OS,
segments are used to manage memory sharing and file mapping.

In the case of shared memory, the entry in the segment table
for each process using a shared segment would point to the same segment
descriptor in the kernel, and the segment descriptor contains the page
table entries.  This way, when the segment grows or shrinks, all the
processes see the change; if this were done using per-process page tables,
there would have to be a routine that goes around updating all the
processes' page tables (and what happens if one of the processes didn't
leave enough room after the shared memory?).  The use of real segments in
memory management serves the same purpose as inodes in the file system:
processes are like directories, segment descriptors are like inode numbers,
and page tables are like inodes.
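
Roughly, in C (names illustrative):

    struct seg_descriptor {          /* shared, kernel-owned: the "inode" */
        unsigned  length;
        unsigned *page_table;        /* grows and shrinks here, so every
                                        sharer sees the change at once */
    };

    struct seg_table_entry {         /* per-process: the "directory entry" */
        struct seg_descriptor *desc; /* sharers point at the same one */
        unsigned access;             /* read-write, read-only, or none;
                                        protection lives at segment level */
    };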

The relevance to file mapping is that protection modes are implemented at
the segment level, rather than at the page level.  A process either has
read-write, read-only, or no access to all of a segment.  Since segment
tables tend to be smaller than page tables, this probably reduces the
amount of silicon needed to implement memory protection.  Since Multics has
a fairly elaborate memory protection system (in addition to the
aforementioned read-only vs read-write, there are also protection rings),
this was probably an important simplification.  Since it's likely that the
necessary protection of all of a segment will be the same, the lost
flexibility can be negligible (although Multics did need to special-case
gate segments, which an outer-ring caller could only execute by transferring
to certain offsets, in order to guarantee that the appropriate entry
sequence was executed).

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

huck@aspen.IAG.HP.COM (Jerry Huck) (04/02/91)

Let me try to explain some of the ways PA-RISC is used by HP-UX and its
relationship to segmentation.  But first, a couple of notes on PA-RISC
segmentation.

PA-RISC uses segmentation to extend the addressability of the
normal general register file.  It is not a partition of these
registers into pieces.  Segments are 2^32 bytes in size and give
capability in several areas.  At the point when register sizes
increase (such as the R4000 path) one expects the segment size
to increase.  The crucial tradeoffs are in silicon area for register
files, datapaths, and ALUs, that is, the pieces of the CPU that must be
increased to accommodate larger flat addressing.

So for HP, segmentation was not a trade-off against flat addressing,
but rather: is it useful to extend beyond the maximum flat addressing
you can support in your general register file?  At the time,
1982-1983, 32-bit general registers gave at least a ten year horizon.
Wider registers would have resulted in non-competitive machines in the
existing technology.

I think most of the arguments against segmentation assume you give up
some flat addressing to get it.  That's not necessary.

The inclusion of segmentation offered an efficient scheme to extend
addressability with little hardware cost.  All the hardware support
for this extended addressing is well partitioned in the TLB control
with no worse cycle-time cost than process ID extensions found in per
process TLBs (we assumed flushing the TLB on context switch is to be
avoided).

The primary benefactors are the OS and database subsystems.  The
presence of segmentation (what we call long addresses) is not exposed
to the programs (not to mention that languages have no way to talk
about segmentation).  We find many situations where objects remain
<2^32 in size yet the aggregate space greatly exceeds 2^32.  Larger
objects can be managed if some additional structure exists.  For
example, a large database can span multiple segments when all database
accesses deal with page size buckets (not uncommon).  There are many
ways to solve all these problems; we found segmentation in PA-RISC to be
very effective in dealing with these applications.


>In comp.arch, mash@mips.com (John Mashey) writes:
>  In article <1991Mar27.193512.12417@cello.hpl.hp.com> renglish@cello.hpl.hp.com (Bob English) writes:
>  ...
>  >I would characterize such objects as belonging to three general types.

>  >The first is a large object accessed in a regular way, a large array or
>  >matrix, for example.  Segment loading and unloading in such an object
>  >will be rare, because the compiler will know the segment boundaries and
>  >be able to optimize them out of the code.
>  I don't quite understand this, but I could be convinced.  In fact, this
>  could lead to an interesting discussion.  Let me suggest the simplest
>  conceivable comparison, which is to take the inner loop of the rolled
>  DAXPY routine from linpack - code included later, but whose salient feature
>  is:
>        do 30 i = 1,n
>          dy(i) = dy(i) + da*dx(i)
>     30 continue
>  where dy,dx,da, and n all arrive to the code as arguments.
>  Maybe someone would post the likely code, for the loop above, for an
>  architecture with
>  segmentation (HP PA would be interesting, as the scheme seems generally
>  well-thought-out, and HP's compilers are good), for the following cases:

In general, you would not attempt to let objects (especially fortran arrays)
span segment (what we call space) boundaries and generate run-time checks for
crossing.  As suggested above, we generally confine normal objects to a single
flat space of 32 bits.

>  	1) Standard HP-UX, i.e., what do you get if you assume flat
>  	addressing? 

Nothing unusual.  The loads and stores one normally expects.  HP-UX
only presents the short (roughly flat) addressing mode to the user.  There's
a little complication with short addressing that might create short pointer
to long pointer conversions (2 instructions) when the compiler is not sure
whether zero-based array addressing would wrap into another short-pointer quadrant.

>  	2) What you would get, if dy and dx can be in separate segments,
>  	and neither is >4GB?  (easy case: just load up 2 segment regs,
>  	once).

On HP-UX this is speculation, since we don't support it.  But if we did,
then the sequence would be something like:
           <load up the long pointers>
           <move the segment number of dy in one of four segment registers>
           <move the segment number of dx in one of four segment registers>
           <any other loop setup stuff: trip counts, indexes...>
       loop:
           fldws,ma  8(segmentdxreg,dxbasereg),dxreg  ;get value and skip to next
           fldws     (segmentdyreg,dybasereg),dyreg   ;get value
	   fmul,dbl  dareg,dxreg,mulreg   
           fadd,dbl  mulreg,dyreg,dyreg
           addib,<   1,tripcount,loop
           fstws,ma  dyreg,8(segmentdyreg,dybasereg)


>  	3) What you need to do in the general case, which is that either
>  	dx or dy, or both could be >4GB, or (enough to cause the problem)
>  	that either or both cross segment boundaries?
>  	(I think this code either takes the easy way out, and does
>  	2 segment manipulations per iteration, or else gets compiled into
>  	something much more complex, but I can be convinced.)

As suggested earlier, this is not what we use segmentation for.  If
you need > 32 bit indexes you probably need > 32 bit registers.  If
common objects are bigger than 2^32 bytes, then you would want > 32
bit flat addressing.  At least simulating this on PA-RISC would be
faster than on any other shipping RISC microprocessor :-).  (Well, at
least SPARC, MIPS, 88K, and RS6000.)  Of course that doesn't matter;
if it's important, you'll want flat addressing that does it more
simply.

>  Recall that the likely situation to be faced is that some FORTRAN
>  programmer is told they can have bigger arrays, and they simply set the
>  sizes of the arrays up, recompile, and want it to work.  Note also, that
>  FORTRAN storage allocation has certain implications for what you can and
>  can't do regarding rearrangement of where data is.  (Also,
>  a question: I assume on HP PA implementations that Move-to-Space Register
>  instructions are 1-cycle operations, with no additional latency needed
>  before a load/store?)  Hmm. 

I'm not sure on that.  I would not spend much silicon making that superfast
given the typical use.

>                              Another question, since PA has 4 Space Registers
>  that user code can play with (I think), are there conventions for their
>  use, i.e., like callee-save - caller-save conventions for the regular
>  registers?  or are they all caller-save?  (I ask because the code for

sr0,sr1,sr2 are caller saves,
sr3,sr4 are callee saves, and
sr5, sr6, sr7 are managed by the OS and not writable by the user.

>  >The second is a large object accessed unpredictably with no locality. 
>  >While the compiler will not be able to predict the segmentation register
>  >in such cases, neither will the cache be able to hold the working set,
>  >so that miss penalties dominate the additional segment register loads.
>  Agreed.  If there is no locality, cache and TLB missing eats the machines.

>  >The third is a large object accessed unpredictably, but with a high
>  >degree of locality.  In such cases, loads and stores take up to one
>  >additional instruction.  Only in this case do segments make any
>  >difference in the performance of the machine, and even in this case the
>  >difference is small.  I don't claim to be an expert in such matters, but
>  >I suspect the number of applications fitting this last category is small.
>  DBMS, and other things that follow pointer chains around.

>  Conventional wisdom says that loads+stores are 30% of the code,
>  and so some subset of these incur at least 1 extra cycle.
>  However, I suspect that in the general case, you have to keep track
>  of the segment numbers, and pass them around, just like you do
>  on X86 with far pointers, and hence there are more instructions,
>  and in addition, need to keep the space numbers around in integer
>  registers for speed in some cases.  (Note that every pointer reference
>  is conceptually 64 bits, and hence every pointer argument needs two
>  32-bit quantities, and probably close to 2X more instructions to set up.)
>  Also, consider the code on a 32-bit machine for:
>  	*p = *q;
>  	where both p and q are pointers to pointers, and both start in memory:
>  	this would typically look like (on typical 32-bit RISC):
>  	load r1,q
>  	load r2,p
>  	load r3,0(r1)
>  	store r3,0(r2)
>  I think this turns into, on something like HP PA (but correct me if I'm wrong),
>  and assuming that c pointers turn into 64-bit things:

>  	load r1,q
>  	load r4,q+4	get SPACE ID
>  	movetospaceregister  r4,somewhere1
>  	load r2,p
>  	load r5,p+4	get SPACE ID
>  	movetospaceregister  r5,somewhere2
>  	load r3,0(r1)		and do whatever you have to to get somewhere1
>  	load r6,4(r1)	get SPACE ID
>  	store r3,0(r2)	save the pointer; do what you must to get somewhere2
>  	store r6,4(r2)	save the SPACE ID

>  In this case, 4 instructions have turned into 10.  I wouldn't pretend this
>  example is typical or not, and I'd expect compilers would do better,
>  but it is illustrative of what could happen.

Alternatively, any reuse of the pointer avoids the movetospace
operations when dealing with 32-bit objects.  Any looping or
database-like access to records would also avoid the overhead.

>  Anyway, to get some serious analysis of this, I think one has to
>  look at code sequences under various assumptions, and see
>  	a) What speed is obtainable by perfect hand-code?
>  	b) How likely are compilers to get there?

I'm not sure what "this" is, but one would certainly not propose
segmentation as the mechanism to address common array objects that
exceed the flat addressability of the machine.  Nor would you use
32-bit load instructions when the primary pointer size was > 32 bits
(not that John was).  It would be similar to an architecture that
only allowed loading 32-bit floating-point variables :-).  HP-UX and the
proprietary MPE/XL operating systems make use of long pointers, as do
some of our database vendors.  It is very convenient to be able to
directly access > 2^32 bytes without operating system involvement.
Just don't get carried away with it.

Jerry Huck
Hewlett Packard

davecb@yunexus.YorkU.CA (David Collier-Brown) (04/02/91)

In article <00670208556@elgamy.RAIDERNET.COM> elg@elgamy.RAIDERNET.COM (Eric Lee Green) writes:
| 2) Maintaining large objects that grow and shrink. In a sequential address
| space, often you can't "grow" an object because something else has been
| allocated in the addresses immediately afterwards. And thus you may end up

mash@mips.com (John Mashey) writes:
| Actually, I Don't think this quite right.  Consider the difference
| between a scheme that has X-bit segment numbers and Y-bit byte addresses
| within the segment, and compare with one that has an X+Y-bit flat address
| space.  In the first case, using typical designs, you get 2**X segments
| of size 2**Y, which usually means that objects are CONVENIENTLY
| 2**Y maximum size. the X+Y-bit flat address machine can simulate the same
| thing rather conveniently...

   Er, I'm going to attack this whole thread...

   I think the use of segments to describe any fixed size construct is
horribly wrong.  A segment, in its youth, was a name.  Your pre-multics
7090-clone assembler program had one or more code segments, an initialized
data segment and an uninitialized (``bss'') data segment.
   Multics tried to generalize these into a thing which could either have
its existence in core, pointed to by a descriptor, or on disk, pointed to by
a pathname.  Alas, those segments had fixed maximum sizes.
   Unix returned us to the first model, and lost the elegant mapping to
files.
   Intel returned us to too-small fixed-size segments, possibly due to a too
literal translation of what they found in a Honeybun [did you notice the
rings and gates, by the bye?]

   Bah, humbug (:-)).

   I think we need to avoid the term segment, unless we're really talking
about laying assembly code out in memory.  Do consider paging in files, with
the understanding that they may have to be relocated in order to grow and
shrink, but avoid segments like the plague: the word has stopped meaning
anything, save when talking about pie-shaped chunks of disk.

--dave
   
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA | lethe!dave
72 Abitibi Ave.,      | 
Willowdale, Ontario,  | Even cannibals don't usually eat their
CANADA. 416-223-8968  | friends. 

firth@sei.cmu.edu (Robert Firth) (04/02/91)

In article <56399@sequent.UUCP> dafuller@sequent.UUCP (David Fuller) writes:

>I would tend to ally with Ed Feustel here; if you look at the 8086
>scheme it fits really well for Pascal:
>
>4 segments, one each for code, data, heap and stack.

Then you have solved a problem that stumped me, back when I was faced
with exactly this problem - design a Pascal compiler for the 8086.  I
would be most interested in your answer.

Consider this typical Pascal procedure

	procedure P(var V : T);

This takes a formal of some type T, passed by reference.  Within the
body of P, any operation upon V is an operation upon the corresponding
actual.

Now consider what that actual might be, when P is called.  It could be
any of

	outermost-level variable, allocated statically
	local variable, allocated on the stack
	object created by New(), allocated from the heap
	by-value parameter, passed on an inner call
	by-reference parameter, likewise
	a component of any of the above, selected or indexed

My question is this: what strategies did you adopt, for address
space representation, variable allocation, and by-reference parameter
passing, that were sane, efficient, and made use of the hardware
segmentation?  The answer matters to me, since my failure to solve
the problem still annoys me.

(I'd be interested to hear what anyone else has to suggest, too.
Just to nail things down, take the language to be ISO Pascal Level
1, and the machine to that defined in the 8086 Family Users Manual
of October 1979)
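
To make the crux concrete in C-ish terms -- a sketch of mine using the
non-standard `far' keyword of DOS-era compilers, all names
illustrative -- the by-reference formal must carry a full
segment:offset pair, because the actual can arrive from any of the
storage classes above:

        /* Sketch of the problem, not a solution.  'far' is the
           common DOS-compiler extension: a segment:offset pointer. */
        void p(int far *v)      /* formal must be far: the actual   */
        {                       /* may live in any segment          */
            *v = *v + 1;        /* may reload a segment register    */
        }

        int g;                  /* static data segment */

        void caller(void)
        {
            int local;          /* stack segment */
            p(&g);              /* compiler widens each address to  */
            p(&local);          /* segment:offset at the call site  */
        }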

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/02/91)

In article <1044@shum.huji.ac.il> amos@shum.huji.ac.il writes:

| One case you forgot is that of many small segments, which together
| amount to more than one segment size.  You could end up thrashing
| between different segments even if no single object is big enough
| to overflow a segment; all the arguments about big objects do not
| hold in this case.

  Huh? He said that a segment is as large as max addressable memory, and
you say if the sum of all segments is larger than physical memory it
will thrash. I see thrashing all the time without benefit of segments,
whenever the virtual address space used is larger than the physical
memory. What do segments cost?

  Not a flame, I just miss the point. If you don't have enough
addressable physical memory you thrash, in my experience.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/02/91)

In article <VWEAP86@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

| N nice and effectively compatible ones. Outside of the 80x86 family, all
| my big portability problems are caused by differences in *software*
| architectures or buggy code. 

  All that big endian vs. little endian stuff is just a bad dream, right?
And the problems we've had porting between 32 and 64 bit computers
didn't happen?

  You've been around long enough to know better. The cause of
portability problems is code which makes assumptions about the hardware.
Period. It is possible to write code which will run on any 32 bit or
larger machine, but don't look for it in net source code.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (Peter da Silva) (04/02/91)

In article <91090.131157DXB132@psuvm.psu.edu> DXB132@psuvm.psu.edu writes:
> For example, it offers a solution to memory fragmentation. Each allocated
> memory region is assigned a unique number (the segment number), and the
> application manipulates only the offset. The OS can move memory regions
> around in physical memory to eliminate fragmentation. Also, we can make
> these segments an exact length, not necessarily always a multiple of
> 4K like paging schemes.

Sounds like a 32-bit PDP-11.

> but your Unix system crashing after a few weeks due to
> memory fragmentation has to be inefficient too.

Say what? I don't recall ever having my UNIX system crash from memory
fragmentation.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

glew@pdx007.intel.com (Andy Glew) (04/02/91)

    Segments give you access control. The 386 will let you put multiple
    segments within one page, each with differing access rights.
    I believe that such an architecture may provide a convenient way for
    implementing protected object-oriented systems.  It would be better if
    they had a TLB instead of the one per segment-register 'shadow
    registers'/descriptor cache. (I must admit I don't know if there is a
    TLB in the 486.)

To avoid confusion: the i486 processor has a TLB: 4-way set
associative, 8 sets (32 entries).  For that matter, so does the i386.

The TLB, however, stores page-oriented protection information.
Another, additional, mechanism is used for segments.
--
---

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

ckp@grebyn.com (Checkpoint Technologies) (04/03/91)

In article <1991Apr1.154918.8342@granite.ma30.bull.com> kittlitz@granite.ma30.bull.com (Edward N. Kittlitz) writes:
>Segments give you access control. The 386 will
>let you put multiple segments within one page, each with differing
>access rights. I believe that such an architecture may
>provide a convenient way for implementing protected object-oriented systems.
>It would be better if they had a TLB instead of the one per segment-register
>'shadow registers'/descriptor cache. (I must admit I don't know if
>there is a TLB in the 486.)

I don't think that "segments give access control" is a general statement
about segments; I think Intel chose to use segments as the mechanism
which provides access control.  Other systems use access bits in the
page tables to provide the same thing.

Systems with conventional page tables (not inverted) can permit the same
physical memory to appear in multiple separate virtual addresses to the
same process, or to separate processes, with different access rights in
each case.  I understand that inverted page tables make this more
difficult but not impossible.
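
For instance, on a paged Unix that has mmap() (SunOS-style), the
effect can be had by mapping the same object twice with different
protections.  A sketch, with illustrative names:

        /* Sketch: same physical pages at two virtual addresses
           with different access rights.  "shared.dat" is made up. */
        #include <sys/mman.h>
        #include <fcntl.h>

        void demo(void)
        {
            int   fd = open("shared.dat", O_RDWR);
            char *rw = mmap(0, 4096, PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
            char *ro = mmap(0, 4096, PROT_READ,
                            MAP_SHARED, fd, 0);
            rw[0] = 'x';   /* visible via ro[0]; a store via ro faults */
        }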

And no, I don't believe the 486 has a TLB for the segment descriptor
cache.  It has a TLB for the page tables though, as the 386 does.
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                ckp@grebyn.com      \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

peter@ficc.ferranti.com (Peter da Silva) (04/03/91)

In article <3305@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> In article <VWEAP86@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> | N nice and effectively compatible ones. Outside of the 80x86 family, all
> | my big portability problems are caused by differences in *software*
> | architectures or buggy code. 

>   All that big endian vs. little endian stuff is just a bad dream, right?

No, it's just not a big problem.

> And the problems we've had porting between 32 and 64 bit computers
> didn't happen?

You mean between 16 and 32 bit computers? No, it's not a big problem.

Let me explain this point a bit more: the places where endianness, and
the size of pointers not being the same size as ints, and that sort of
thing cause problems are relatively small, and can generally be easily
fixed. 90% of these problems are the result of someone trying to use
an internal data structure for external storage. The remaining 10% are
caused by buggy code. Or buggy compilers.

Really. Anyone who writes ``execl("/bin/sh", "sh", "-i", 0)'' has just
written buggy code.
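
For the record, the portable form (execl() is variadic, so the
terminating argument must be an explicit null pointer, not a bare
int 0, which loses wherever int and char * differ in size or
representation):

        execl("/bin/sh", "sh", "-i", (char *)0);  /* cast the terminator */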

And it's easy enough to fix. A pass through lint, and I'm a happy camper.
My big problems in porting code are where people assume things like:

	malloc will not fail.
	arrays can be grown indefinitely.
	pointers in different objects can be compared.

All of these things are reasonable assumptions on an 80386, a 68000, a
VAX, a SPARC, etc... They die horribly on the 8086 family of processors,
and fixing the code tends to require a major rewrite.

>   You've been around long enough to know better. The cause of
> portability problems is code which makes assumptions about the hardware.

This is true, but apart from the 8086 family of processors portability
problems are easy to fix. Toss in a few casts or go to ANSI compilers and
everything's right as rain. But there's nothing I can do with an attempt
to malloc(100000) other than cripple the program or redesign it.

Nope... I'll stand on my claim that after working with the 8086 and its
derivatives any other hardware portability concerns are cake.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/04/91)

In article <Y-FA4Y4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
| In article <3305@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
|
| > And the problems we've had porting between 32 and 64 bit computers
| > didn't happen?
| 
| You mean between 16 and 32 bit computers? No, it's not a big problem.

  Not unless someone just added PC emulation to the Cray2... lots of net
code assumes 32 bits, assumes int {same as} long, assumes 2's complement
arithmetic, and assumes you can get exactly four chars in an int.

  I stand by my first thought: the problem is in code which assumes
things about the hardware. Languages have variables of known minimum
size, integer*4 in FORTRAN, long in C, etc. And languages which have
pointers have portable ways to manipulate them, although you wouldn't
know it from code posted from time to time. 

  I have seen code which turned the address of a long into int, added
seven, then cast it to pointer to char to get a byte out of the next
word. Other than assuming the size of int, size of pointer, and byte
order of the hardware, this was portable.
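
A reconstruction, for the curious (my sketch, not the original code),
with a portable spelling alongside:

        void example(void)
        {
            long v[2];

            /* the trick: assumes sizeof(int) == sizeof(char *),
               4-byte longs, and the machine's byte order */
            char bad  = *(char *)((int)&v[0] + 7);

            /* portable: a char pointer may walk any object byte
               by byte */
            char good = ((char *)v)[7];
        }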

  If you say the "memory models" are a bad idea I would agree
completely, and I told Microsoft so when they were writing C 3.0. Intel
should have paid them to generate a version with 32 bit ints and linear
addressing (from the user viewpoint) just to sell faster chips. But
that's a feature of the design decisions of the C compiler, not an
inherent feature of segments or Intel.

  Ask the person who ported unzip to the Cray about 32 vs. 64 problems.
I don't remember what it was now, I looked at the problem for an hour or
two and dropped it, but it was reasonably subtle, and I believe it's a
warning of things to come. Perhaps MIPS will speak on porting stuff to
their 64 bit box for testing.

  It's possible to do tricky stuff in a portable way, and if you think
about it when writing the code it's even easy. When you try to port
someone else's code it gets to be a nightmare.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (Peter da Silva) (04/04/91)

In article <1360009@aspen.IAG.HP.COM>, huck@aspen.IAG.HP.COM (Jerry Huck) writes:
> So for HP, segmentation was not a trade-off against flat addressing,
> but rather: is it useful to extend beyond the maximum flat addressing
> you can support in your general register file?

This is the exact same trade-off that Intel made in the 8086, just 10 years
or so down the road. It gives you a short-term paper advantage, but once
things get to the point where you really need those addressing bits people
will be using your name in vain.

> I think most of the arguments against segmentation assume you give up
> some flat addressing to get it.  That's not necessary.

But that's what you just described: you only have 32 bits of flat address
space in a 48 bit machine. 
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/04/91)

In article <1991Mar29.044033.222@caliban.uucp> ig@caliban.uucp (Iain Bason) writes:
>This whole discussion on segmented architectures is getting a little
>confusing.  The problem is that most posters seem to be drawing
>conclusions about segmentation in general based upon their knowledge
>of particular segmented architectures.  Now, there's nothing wrong
>with basing one's opinions on one's experience.

Iain was being charitable, which is not one of my virtues, unfortunately.

There is nothing wrong with basing your opinion on your experience, even
if your experience is limited to one example, as long as you don't have
any pretensions that you are a scientist.  Most "computer programmers"
are mere "coding bums" who call themselves "Computer Scientists" because
it sounds good on their resumes.  A scientist does not make an extrapolation
from a single data point and announce to the world that the final word
has now been spoken on the subject, as we have seen on this thread.

Not that they were not given the education that a scientist should have.
They were taught general principles of computers for several years. Most
of them slept through it all and then complained for years that "they
don't teach you anything useful in college  --- just a lot of theory."
Later in life, they resurface in the ACM Forum column of the Communications
of the ACM, advocating the use of GOTO statements and criticizing the
teachings of Dijkstra, Wirth, et al., ad nauseam.

Now that the lecture is over, please return to the postings that assume
(without saying so, or seeming to realize it) that all segmented machines
have segments fixed at 64KB in size, with only a couple available for data
and one for code, etc.
-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be 
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)

renglish@cello.hpl.hp.com (Bob English) (04/04/91)

I want to make something clear up front.  I am not trying to convince
the world at large that segmentation is a better way of providing a
large address space to a single program than a linear address space with
register size equalling the address size.  Neither am I trying to take a
position on the best use of current silicon space or the minimum usable
address space.  What I take issue with is the opinion, expressed many
times in comp.arch, that segmentation is inherently wrong, violates all
principles of good design, and implies severe brain damage on the part
of the designers.

The point that I'm trying to make is that segmentation at the hardware
level, or the lack thereof, is not an issue of architectural principle,
but a design choice with a set of costs and benefits.  Elevating it to a
principle implies that the only acceptable address space is infinite,
because no programmer should ever have to worry about addressability.
At any point, it's a choice between the costs of extending the address
space (register size, etc.) and the benefits derived from doing so, as
well as a choice of system level to provide the service.

mash@mips.com (John Mashey) writes:
> ...take the inner loop of the rolled DAXPY routine from linpack...:
> 	3) What you need to do in the general case, which is that either
> 	dx or dy, or both could be >4GB, or (enough to cause the problem)
> 	that either or both cross segment boundaries?

Well, this is a bit longer than the code Jerry sent out for the current
case, but it isn't too complicated.  It's 30 instructions, two or three
times that of the unsegmented code posted earlier (after initialization
is added to the earlier code), but the inner loop is unchanged.  In a
machine where the compilers dealt effectively with segments, this would
be a normal form for striding through arrays, and would be highly
optimized (at least as good as this).

Evaluating the performance impact is a bit trickier.  The inner loop is
unchanged, but the set up costs are higher.  For long loops, this is
inconsequential.  For short loops, it adds about 14 cycles to the loop,
or about 12% for a vector length of 20 (there are probably ways to
reduce those costs for short vectors without appreciably increasing the
overhead for long vectors, but that's not important).

How important is this increased overhead?  It seems counterintuitive
that programs demanding objects greater than 32 bits would have their
performance dominated by small vectors, but it could be true.  With one
DAXPY to a 2^^32 array, there would have to be 200 million DAXPYs to
twenty element arrays before the 12% difference in short loop
performance became a 6% increase in actual performance.  If those
accesses were themselves in a loop, and global optimizations were
performed, the overhead would drop way down.

The code:

	mtsr	dysegshadow,segmentdyreg
	mtsr	dxsegshadow,segmentdxreg
	; This section eliminates long (> 2^^30) internal runs to simplify
	; the later tests.  "ocnt" gets the projected run size for the
	; inner loop.
	zdepi   3,1,2,maxrun			  ; set up max run
oloop0:	
	combt,<<       gcnt,maxrun,lessmax	  ; nullifies on gcnt << maxrun
	copy,tr	       maxrun			  ; always nullifies
lessmax:
	copy	gcnt,ocnt			  ; nullified if dropped in

	; This section checks for segmentation wraps, so that the inner loop
	; won't have to. "icnt" gets the maximum base register, and then
	; the actual inner loop count.
oloop1:
	comclr,<<=	dxbasereg,dybasereg,r0	  ; which base is higher?
	or,tr	dxbasereg,r0,icnt		  ; or,tr always nullifies
	or	dybasereg,r0,icnt		  ; this instruction
	sh3add	ocnt,icnt,tmp1			  ; will the higher base wrap?
	combf,<<,n	tmp1,icnt,iloopstart	  ;
	subi	7,icnt,icnt			  ; reduce the inner loop cnt
	extrs,tr	icnt,1C,1D,icnt		  ; to the wrap point
iloopstart:
	or	ocnt,r0,icnt
	subi	0,icnt,tripcount

	; This is the inner loop, same as without segments.
iloop:
	fldws,ma  8(segmentdxreg,dxbasereg),dxreg  ;get value and skip to next
	fldws     (segmentdyreg,dybasereg),dyreg   ;get value
	fmul,dbl  dareg,dxreg,mulreg   
	fadd,dbl  mulreg,dyreg,dyreg
	addib,<   1,tripcount,iloop
	fstws,ma  dyreg,8(segmentdyreg,dybasereg)

	; Check for completion, and bump segment registers if appropriate.
	sub	gcnt,icnt,gcnt			; decrement global count
	combt,<= gcnt,r0,done			; check for completion
	comclr,=	dxbasereg,r0,r0		; increment space register that
	addi	1,dxsegshadow,dxsegshadow	; wrapped
	mtsr	dxsegshadow,segmentdxreg
	comclr,=	dybasereg,r0,r0		; increment space register that
	addi	1,dysegshadow,dysegshadow	; wrapped
	b	oloop0
	mtsr	dysegshadow,segmentdyreg
done:

> DBMS, and other things that follow pointer chains around.

> Conventional wisdom says that loads+stores are 30% of the code,
> and so some subset of these incur at least 1 extra cycle.

If every one of these loads and stores required twice as many cycles
(the case you mentioned is pretty much a worst case for a segmented
architecture), then the machine would spend about 30% more cycles
in code that made heavy use of large objects.  What little intuition I
have in the matter suggests, however, that the actual overhead will be
significantly less than 30%, as this overhead would not be incurred on
every load or store.  Access to the stack, for example, would not incur
this overhead, nor would access to a small object after that object has
been located (in most cases objects less than the segment size can be
constrained to lie completely within a segment).

As a data point, HP's proprietary OS uses spaces (the term HP uses for
large segments) to support databases and file systems.  The segmentation
overhead they've incurred has not been large enough to warrant making
space register ops 1 cycle.

--bob--
renglish@hplabs
If I were speaking, I'd be speaking for myself.  Since I'm typing, I'm
typing for myself.

peter@ficc.ferranti.com (Peter da Silva) (04/04/91)

In article <3310@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>   Not unless someone just added PC emulation to the Cray2... lots of net
> code assumes 32 bits, assumes int {same as} long, assumes 2's complement
> arithmetic, and assumes you can get exactly four chars in an int.

Yes, and porting this code to an 80286 is unlikely to be any easier than
porting it to a 64-bit machine... and probably harder since *you* can at
least fit 32-bit values into a 64-bit integer. And then on top of all
that we have all the segmentation woes.

>   It's possible to do tricky stuff in a portable way, and if you think
> about it when writing the code it's even easy. When you try to port
> someone else's code it gets to be a nightmare.

Compared to Xenix 286 it's a mere melodrama.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

sef@kithrup.COM (Sean Eric Fagan) (04/04/91)

In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>And then on top of all
>that we have all the segmentation woes.

Are you objecting to segments, or to *intel* segments?  You keep saying
"segments are bad," without regard to what type of segments.  Consider, for
example, a cpu which has two types of registers:  data and address.  Data
registers are 32-bits, and address registers are 64-bits.  *However*:  the
address registers are actually

	<32-bit segment tag> <32-bit offset>

I defy you to come up with a PROPERLY WRITTEN program that will break. Now,
for initial implementations, you probably want to use only one segment
(i.e., limited to 4Gbytes), and have your compiler spit out lots of warnings
for things like passing pointers to functions without a prototype,
conversion from pointer to integer, etc.  (You should probably make that
segment be tag #0, incidently, although there is no real need.)  Note that
you would also probably need a 'long long' type, since I seem to recall ANSI
C requiring *some* integral type that can hold a pointer.

That could actually be quite useful.  Have each malloc() return a separate
segment, which is the size you requested and no larger...
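
For concreteness, a sketch of what such an address register would
hold, in C (the names and the 64-bit 'long long' are my assumptions,
per the above):

        /* Sketch of the proposed address: <32-bit tag><32-bit
           offset>.  Assumes a 64-bit 'long long'; names are mine. */
        typedef unsigned long long segptr;

        #define MKPTR(tag, off) \
            (((segptr)(tag) << 32) | ((segptr)(off) & 0xFFFFFFFFUL))
        #define PTAG(p)  ((unsigned long)((p) >> 32))
        #define POFF(p)  ((unsigned long)((p) & 0xFFFFFFFFUL))

        /* such a malloc() hands back MKPTR(fresh_tag, 0); offset
           arithmetic never carries into the tag, so no object can
           quietly outgrow its segment */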

Intel goofed (imho) by having separate segment registers.  If the segment
tag/number were part of the address registers, I don't think there would
have been as much pain involved.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (04/04/91)

In article <1360009@aspen.IAG.HP.COM>
	huck@aspen.IAG.HP.COM (Jerry Huck) writes:

>PA-RISC uses segmentation to extend the addressability of the
>normal general register file.  It is not a partition of these
>registers into pieces.  Segments are 2^32 in size and give
>capability in several areas.

But through sr4-sr7 (the segment registers selected implicitly by the
top bits of the offset), we can address only 1GB of each segment.

So, when we want >4GB, we either do general data access within <1GB
segments or use sr1-sr3 explicitly (sr0 is unusable).

					Masataka Ohta

firth@sei.cmu.edu (Robert Firth) (04/04/91)

In article <1PGAOP7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

>But that's what you just described: you only have 32 bits of flat address
>space in a 48 bit machine. 

Sigh.  I have seen this decade after decade, generation after generation.
It seems to be a working rule among the builders of segmented machines
that the most flat space anyone will ever need is 4 bits more than the
current market leader.  That's how we went from 12 bits to 16; the same
arguments I heard from the builders of the 20-bit machine in 1974 are
lying in my mailbox explaining why 32 bits (after all, 2 more than the
VAX!) is enough.

Here's an analogy.  You live in a three-bedroom house.  To get to two
bedrooms, you climb the interior staircase.  To get to the third, you
go outside and climb a ladder on the North wall.

It's time to trade up.  You look at a five-bedroom house.  Three
bedrooms open off the interior staircase; the other two are reached
by a ladder on the North wall.  The builder says "Look, you have three
directly accessible bedrooms, which is 50% more than your current
home, what more could you ever need?"  You explain that what matters
is not the absolute number of bedrooms, it is rather that, however
many there are, they all be directly accessible by a simple and uniform
route.  He shakes his head in bewilderment.  As do I.

firth@sei.cmu.edu (Robert Firth) (04/04/91)

In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

> Data
>registers are 32-bits, and address registers are 64-bits.  *However*:  the
>address registers are actually
>
>	<32-bit segment tag> <32-bit offset>
>
>I defy you to come up with a PROPERLY WRITTEN program that will break.

My pleasure, sir.

	DIMENSION BIGMAT(50000,50000)
	DOUBLE PRECISION BIGMAT

I have a perfectly legal Fortran declaration; I will never use an
index value bigger than seventeen (signed) bits; there is enough
virtual memory to hold it; and your bozo machine will not permit
me to address it.

peter@ficc.ferranti.com (Peter da Silva) (04/05/91)

In article <1991Apr04.023845.3501@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >And then on top of all
> >that we have all the segmentation woes.

> Are you objecting to segments, or to *intel* segments?

Intel segments.

> You keep saying "segments are bad," without regard to what type of segments.

No, I keep ragging on the 80x86. I explicitly mentioned the chip by name
in the paragraph you quoted from.

[32-bit address+32-bit segment number, stored in the address registers]
> I defy you to come up with a PROPERLY WRITTEN program that will break.

If wrapping around the end of the segment isn't a problem, I can't. If it
is, I'll just operate on a >4 GB object.

Of course, if wrapping around the end of the segment isn't a problem (as
it wouldn't be on the 80x86 if intel hadn't screwed up) then I would say you
don't have a segmented machine: you just have a 64-bit machine with a
possibly limited address space... like the 68000, where you can look at
the address space as a 24-bit offset and an (initially ignored) 8-bit
segment number. That's how Microsoft treated the poor little chip for
their Basic interpreters on the Mac and Amiga, which is why my Amiga 3000
doesn't have Basic available.

> segment be tag #0, incidently, although there is no real need.)  Note that
> you would also probably need a 'long long' type, since I seem to recall ANSI
> C requiring *some* integral type that can hold a pointer.

Nah, just make int=32 bits, long=64 bits.

> That could actually be quite useful.  Have each malloc() return a seperate
> segment, which is the size you requested and no larger...

You can do the same on a "flat" address space machine if your address
space is large enough. DEC does this on the VAX under VMS: 31 bit offset
and two segments: user and system.

> Intel goofed (imho) by having seperate segment registers.

No, intel goofed by putting tag bits at the wrong end of the segment
register. Whether the segment part is explicitly loaded into a segment
register or the top half of an address register is purely a code
generation problem.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

ckp@grebyn.com (Checkpoint Technologies) (04/05/91)

In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>In article <ZTGAK5E@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>>And then on top of all
>>that we have all the segmentation woes.
>
>Are you objecting to segments, or to *intel* segments?

Well, Intel segments are *soooo* bad....

Here are (what I think) are the unforgivably bad features of Intel
x86 segments:

- Huge pointers require normalization
- There are fewer segment registers than address registers (I include
  the program counter and stack pointer as address registers)
- They are context-chosen (code space, data space, stack space)
- The instruction set encourages programmers to economize segment usage

> You keep saying
>"segments are bad," without regard to what type of segments.  Consider, for
>example, a cpu which has two type of registers:  data and address.  Data
>registers are 32-bits, and address registers are 64-bits.  *However*:  the
>address registers are actually

Perhaps we can get some subjective comments from programmers of
other "segmented" machines?  I can think of two: the Western Design
65816 in 16-bit mode and the Zilog Z8000 are both "segmented" machines.
How about some comments on these implementations?  (I don't mean to
solicit "the 65816 is *way* better than the 6502" comments...)
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                ckp@grebyn.com      \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) (04/05/91)

In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>>	<32-bit segment tag> <32-bit offset>
>>I defy you to come up with a PROPERLY WRITTEN program that will break.
>My pleasure, sir.
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT
>I have a perfectly legal Fortran declaration; I will never use an
>index value bigger than seventeen (signed) bits; there is enough
>virtual memory to hold it; and your bozo machine will not permit
>me to address it.

Survey says: Bzzt!

There's nothing that says that array elements in FORTRAN - or, for that
matter, C - have to be contiguous. Thinking that that must be true as a
matter of Natural Law is purest VAXocentrism.

It's the compiler's job to hide those details from the programmer. It's
a real tragedy that there are VAXocentric C programmers out there that
think that the whole world should work the way their specific environment
does, and write software with lots of hard-to-find nonportabilities lurking
to trap the unsuspecting soul who tries to run it on non-VAXen.

It's bad enough that I gave serious consideration to buying an 11/750 that's
for sale around here just so I could see why people get that attached to it.
-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.med.uth.tmc.edu  | adequately be explained by stupidity.
  "You can even run GNUemacs under X-windows without paging if you allow
          about 32MB per user." -- Bill Davidsen  "Oink!" -- me

sef@kithrup.COM (Sean Eric Fagan) (04/05/91)

In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT

Gee, this works on current 32-bit machines?  The FORTRAN standard allows one
to declare arrays of any size, and guarantees that they will work?  I guess
it's more braindamaged than I thought.  I mean, I remember having problems
with arrays *much* smaller on both Crays and Cybers...

You're giving a knee-jerk response.  If the compiler manual says that no
object may be larger than <x>, and you try to create an object of <x*2>,
*you're* the one who screwed up.  And if it bothers you that much, fine:
for the FORTRAN compiler, it, also, will use just one segment tag, just like
the initial C port I hypothesized about.  There, now you've only got 4Gb of
virtual memory for any fortran program.  Happy?

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (04/05/91)

In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.023845.3501@kithrup.COM>
>sef@kithrup.COM (Sean Eric Fagan) writes:
>
>>I defy you to come up with a PROPERLY WRITTEN program that will break.
>
>My pleasure, sir.
>	DIMENSION BIGMAT(50000,50000)
>	DOUBLE PRECISION BIGMAT

People forget history so quickly these days!  The Burroughs 5000 and
descendants all used segmented architectures, and they routinely handled
two dimensional arrays as an array of pointers to segments.  That is
precisely how Burroughs FORTRAN would have handled the above case, and
if 50000 doubles were too big for one segment, it would have automatically
made the array into a 3 or 4 dimensional array, completely hiding the
problem from the programmer without any need for the programmer to specify
some kind of "large memory model" or other such hokum that people are
forced to do on the 8086 family.
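
In C terms the Burroughs layout is just an array of row descriptors.
A sketch (mine, not Burroughs code) of how a compiler or allocator
can hide it:

        /* Sketch: the matrix held as one small segment of row
           pointers plus one segment per row, so no single
           allocation need exceed a row. */
        #include <stdlib.h>

        double **alloc_matrix(size_t rows, size_t cols)
        {
            double **m = malloc(rows * sizeof *m);
            size_t i;
            if (m == NULL) return NULL;
            for (i = 0; i < rows; i++)
                if ((m[i] = malloc(cols * sizeof **m)) == NULL)
                    return NULL;   /* cleanup omitted in sketch */
            return m;              /* access is spelled m[i][j] */
        }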

I remember a statistic from Burroughs that the average segment on their
machines was less than 64 words long (48 bits per word).  The code of
each procedure was in a different segment, each array was a different
segment, and so on.

I never heard a Burroughs programmer complain about segments the way 8086
programmers do because the Burroughs architectures did it right!  I've
had a number of students who were Burroughs programmers (Quaker Oats in
Cedar Rapids had a high-end machine with something like 6 CPU's in the
early 80's, and they may still be a Unisys customer).

				Doug Jones
				jones@cs.uiowa.edu

cgy@cs.brown.edu (Curtis Yarvin) (04/05/91)

In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
>In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>
>Survey says: Bzzt!
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous. Thinking that that must be true as a
>matter of Natural Law is purest VAXocentrism.

But in C you have to be able to move through the array with pointer
arithmetic, which means that it is much harder for the compiler to hide, and
hence much slower.
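
For concreteness, the idiom at issue (a sketch): it leans on the
array being one contiguous object, which is exactly what a
row-per-segment representation gives up.

        #define M 1000
        #define N 1000
        double a[M][N];

        double sum_all(void)
        {
            double s = 0.0;
            double *p;
            /* strides straight across row boundaries: only works
               if the whole array is one contiguous block */
            for (p = &a[0][0]; p < &a[0][0] + M*N; p++)
                s += *p;
            return s;
        }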

>It's the compiler's job to hide those details from the programmer. It's
>a real tragedy that there ate VAXocentric C programmers out there that
>think that the whole world should work the way their specific environment
>does, and write software with lots of hard-to-find nonportabilities lurking
>to trap the unsuspecting soul who tries to run it on non-VAXen.

There are two solutions to this problem: for everyone to write portable code,
or for everyone to build flat-addressed machines. I think everyone should be
able to see the direction the market is moving in: the latter.

This is not necessarily a bad thing, unless you have an unnecessarily
Calvinist approach toward the world.

Curtis

firth@sei.cmu.edu (Robert Firth) (04/05/91)

In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:

>Survey says: Bzzt!
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous. Thinking that that must be true as a
>matter of Natural Law is purest VAXocentrism.

I suggest you check ANSI X3.9-1978, especially sections 5.2.5 and 5.4.3.
The standard requires that an array be associated with a "storage
sequence" in column-major form, and that there be a one to one mapping
between the "subscript values" and the elements of this storage sequence.

Naturally, these elements need not be contiguous in physical storage,
which isn't what I asked for, since I explicitly referred to virtual
memory.  But they do have to be contiguous in the virtual memory
model of the Fortran language.  I await your suggested implementation.
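
Spelled out as code rather than legalese (a sketch, names mine):

        /* The storage sequence X3.9-1978 requires for
           DOUBLE PRECISION A(M,N): 1-based element (i,j) is number
           (i-1) + (j-1)*M in one linear, column-major sequence,
           first subscript varying fastest. */
        #define ELEM(a, i, j, M) \
            ((a)[((i) - 1) + ((j) - 1) * (unsigned long)(M)])

        /* With M = N = 50000, the last element's sequence number is
           2,499,999,999 -- at 8 bytes apiece, about 20 GB of
           contiguous virtual storage. */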

firth@sei.cmu.edu (Robert Firth) (04/05/91)

In article <1991Apr04.202446.13595@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

[	DIMENSION BIGMAT(50000,50000) 
[	DOUBLE PRECISION BIGMAT

>Gee, this works on current 32-bit machines?

No.  But you claimed to have a machine with 32-bit integer and 48-bit
addressing, and challenged us to produce code that ought to work on
such a machine but won't on yours.  The above was my response.

>  And if it bothers you that much, fine:
>for the FORTRAN compiler, it, also, will use just one segment tag, just like
>the intial C port I hypothesized about.  There, now you've only got 4Gb of
>virtual memory for any fortran program.  Happy?

Yes, for you have just conceded my point: the code ought to work; it won't
work on your machine because your addressing scheme forbids it; you are
not competent to solve the problem in the compiler; so you've given up
and thrown the mess you designed back in the user's lap.  Pathetic.

sef@kithrup.COM (Sean Eric Fagan) (04/05/91)

In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>If wrapping around the end of the segment isn't a problem, I can't. If it
>is, I'll just operate on a >4 GB object.

No, you missed the point.  I set things up such that no single object can be
larger than 4Gbytes.  And a correct program can't tell that it has anything
larger than a 4Gbyte address space, unless it starts mallocing up all the
memory it can and keeps track.

On an 8086, the natural limit for the size of any given object is 64K.  On
the hardware I described, it would be 4G.  Now, please show me a correct
program that will fail.  Answer: you can't, because The Standard (ANSI, in
this case, since I'm more concerned about C) has enough limits in it that
you can't get around it without being non-conformant.

Please read ANSI, and if you can find a statement in there that says that
the system must provide for an object >4GB, then I will send you a case of
beer.

>Of course, if wrapping around the end of the segment isn't a problem (as
>it wouldn't be on the 80x86 is intel hadn't screwed up) then I would say you
>don't have a segmented machine: you just have a 64-bit machine with a
>possibly limited address space... 

It *is* a segmented machine; you cannot wrap around segments, because the
largest size of any single object is 4Gbytes.  Now, if you wanted to provide
for a pseudo-64-bit address space, you have the system a) allocate segments
sequentially, and b) when a segment-overrun trap occurs, increment the
segment tag index appropriately, and continue.  But it's still a segmented
machine.

And, again, note that *you* never see the segments.  It's possible to set up
the machine such that it looks like a normal 32-bit-address machine;
however, for correct programs (correct by ansi, not correct as in
specially written), you can use as much memory as the system will allow.

>> segment be tag #0, incidently, although there is no real need.)  Note that
>> you would also probably need a 'long long' type, since I seem to recall ANSI
>> C requiring *some* integral type that can hold a pointer.
>Nah, just make int=32 bits, long=64 bits.

That would be inefficient; too many people use 'long' when they don't need
to, because they assume they can use that type to hold an address (which, I
guess, would be true, but then they pass int and long around freely).  ANSI
does require an integral type to hold a pointer, but it does not specify
which type.  So, either 'long long', or, if you want a fully-ansi-compliant
mode, '_longlong'.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

johnl@iecc.cambridge.ma.us (John R. Levine) (04/05/91)

In article <4919@lib.tmc.edu> jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
>>	DOUBLE PRECISION BIGMAT(50000,50000)
>
>There's nothing that says that array elements in FORTRAN - or, for that
>matter, C - have to be contiguous.

Well, there is the small matter of ANSI X3.9-1978.  In sections 5.2.5 and
17.1.1 it makes it pretty clear that all arrays have to be contiguous with
the first subscript varying fastest.  The F90 standard gives you a little
wiggle room by saying that arrays that mentioned in EQUIVALENCE or COMMON
statements have to be contiguous, other arrays can be implemented any way
the compiler wants.  Real programs tend to put their large arrays in
common in which case the array above really does need a 20 gigabyte flat
address space.

The argument might be made that any program that does that is "wrong" but
many numeric codes can easily expand to fill all available memory,
particularly those that cut up a 2- or 3- dimensional space into a mesh
and do something on each element in the mesh, since the finer the mesh,
the more accurate the results.  I have little sympathy for arguments that
a 50000 x 50000 array is somehow different from a 1000 x 1000 array just
because it's bigger.

A segmented address space need not be a disaster for large arrays, though
the much reviled Intel implementation is for two reasons:

  -- Segment arithmetic is very complicated due to Intel's inexplicable
     decision to make the low three bits of the segment number magic.

  -- Loading a segment register is so slow on existing implementations (on
     a 486, a segment load takes 6 cycles, a regular load takes 1) that
     you have to handle intrasegment addresses differently from
     intersegment in order to get reasonable performance.

The RT PC and RS/6000 have a segmented address space, but the segment number
is merely the high four bits of the address.  If you have an array or file
that is bigger than a segment (256MB in this case) you can map it into
several contiguous segments without having to do anything special in your
object code.  Segmentation like that can be quite useful both for sharing
and for protection.
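
In code terms (a sketch, names mine):

        /* Sketch of the RT/RS6000-style split described above: the
           top 4 bits of a 32-bit effective address pick a segment
           register, the low 28 bits are the offset within a 256MB
           segment. */
        typedef unsigned long ea_t;

        #define SEGNO(ea)  ((unsigned)((ea_t)(ea) >> 28))
        #define SEGOFF(ea) ((ea_t)(ea) & 0x0FFFFFFFUL)

        /* adding 1 at a segment's end simply carries into the
           segment number -- nothing special in the object code */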

-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
Cheap oil is an oxymoron.

sef@kithrup.COM (Sean Eric Fagan) (04/05/91)

In article <23660@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <1991Apr04.202446.13595@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>>Gee, this works on current 32-bit machines?
>No.  But you claimed to have a machine with 32-bit integer and 48-bit
>addressing, and challenged us to produce code that ought to work on
>such a machine but won't on yours.  

No, actually, I described a machine with 32-bit segments, and 32-bits worth
of possible segments.  A difference.

>The above was my response.

Which was incorrect.  I impose the limitation, as I said in my first
article, that no single object (such as your array) be larger than 4Gbytes.
You broke that constraint.  No correct code will break, as there is *no*
requirement in any language standard (c, pascal, fortran, ada, etc.) that
the sum total of all objects' size be less than or equal to the size limit
for a single object.

*You*, by being flip, decided to come up with a program that would break.
BFD.  I can come up with programs that will break for any given
language/hardware combination.

Now, please show me how my proposed segmented machine a) breaks existing
*correct* code, and/or b) makes things difficult or impossible?  Yes,
dealing with a single object larger than 4Gbytes is difficult or impossible,
but damned few people are doing that (and, even then, I can make it work, if
you let me play with the OS and compilers/linkers a bit; it just won't be as
efficient as it could be in a flat address space).

>>  And if it bothers you that much, fine:
>>for the FORTRAN compiler, it, also, will use just one segment tag, just like
>>the intial C port I hypothesized about.  There, now you've only got 4Gb of
>>virtual memory for any fortran program.  Happy?
>
>Yes, for you have just conceded my point: the code ought to work; 

The code you gave *oughtn't* work.  Period.  Check out the Implementation
Defined Details section of the FORTRAN compiler manual for the machine
(hint:  it doesn't exist, either, but that makes things easier 8-)).  In it,
it says that, "pursuant to section <mumbledymumble> of the <mumbledymumble>
FORTRAN standard, the size of any single array must not exceed 4Gbytes."

Now, if you are going to try to claim that *any* standard mandates that a
system must allow arrays larger than 4Gbytes, well, tough.  No system *I*
play with (and that includes some rather decent mainframes) is going to
allow that, either, so damned few people are going to be coding for it.

>you are
>not competent to solve the problem in the compiler; 

Bullshit.  I *am*, and in a followup article, I described what one can do to
implement flat-style addressing on the machine I described.  But I guess
reading all of the articles in a thread is beyond you, isn't it?

Why don't you try a) reading some standards, b) playing with real systems,
and c) trying to figure out just what someone is capable of before insulting
their abilities?

*You* are the one who's pathetic, buddy.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (04/05/91)

Sean Eric Fagan writes:
#but damned few people are doing that

Isn't that part of the problem? While damn few people are doing it
*TODAY*, in three or four years, *EVERYONE* will wish to do it, but
can't. 

* Slight exaggeration for effect.

--
-Greg Hennessy, University of Virginia
 USPS Mail:     Astronomy Department, Charlottesville, VA 22903-2475 USA
 Internet:      gsh7w@virginia.edu  
 UUCP:		...!uunet!virginia!gsh7w

ckp@grebyn.com (Checkpoint Technologies) (04/05/91)

In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>possibly limited address space... like the 68000, where you can look at
>the address space as a 24-bit offset and an (initially ignored) 8-bit
>segment number. That's how Microsoft treated the poor little chip for
>their Basic interpreters on the Mac and Amiga, which is why my Amiga 3000
>doesn't have Basic available.

I had wondered... The 68K line, at least through the 68030, has 8
possible address spaces as coded by the CPU's FC lines.  One is user
program, one is user data, one is supervisor program, one is supervisor
data, one is "CPU space" and is used to address coprocessors, generate
interrupt acknowledgements, and signal breakpoints, and the other three
are undefined.  You can program the 68851 PMMU and the 68030's MMU to
choose from 8 different page tables based on the FC code, and there's
the MOVES instruction for choosing your FC directly when performing a
move.  The 680[23]0 manual tells how to generate cycles to program space
for data accesses if it's important.

Does this make the 68K a segmented machine, with 32 bits offset and 3
bits segment number? (Expiring minds want to know...)
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                ckp@grebyn.com      \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

hrubin@pop.stat.purdue.edu (Herman Rubin) (04/05/91)

In article <4919@lib.tmc.edu>, jmaynard@thesis1.med.uth.tmc.edu (Jay Maynard) writes:
> In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
> >In article <1991Apr04.023845.3501@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

			.....................

> There's nothing that says that array elements in FORTRAN - or, for that
> matter, C - have to be contiguous. Thinking that that must be true as a
> matter of Natural Law is purest VAXocentrism.
> 
> It's the compiler's job to hide those details from the programmer. It's
> a real tragedy that there ate VAXocentric C programmers out there that
> think that the whole world should work the way their specific environment
> does, and write software with lots of hard-to-find nonportabilities lurking
> to trap the unsuspecting soul who tries to run it on non-VAXen.

Any machine with a big enough storage of any kind can emulate any other.
If it is necessary to do this type of manipulation, a compiler will make
a slow mess of it; any decent programmer should be able to use the 
idiosyncrasies of the natural structure of the data to do a better
job.

It is the hardware designer's job to make it unnecessary to have any more
kludges than can be avoided.  If the array elements are not contiguous,
and there may be very good reasons for the programmer to set things up
that way, even the best of the present compilers will cause things to
slow down.  It is the compiler's job to help the user get the most out
of the machine, and hiding things from the user is definitely not the
way to do it.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

przemek@rrdstrad.nist.gov (Przemek Klosowski) (04/06/91)

>>>>> On 5 Apr 91 01:03:43 GMT, sef@kithrup.COM (Sean Eric Fagan) said:
Bob> In  <23660@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
	... 50000x50000 array of doubles...
Sean>Gee, this works on current 32-bit machines?
Bob> No.  But you claimed to have a machine with 32-bit integer and 48-bit
Bob> addressing, and challenged us to produce code that ought to work on
Bob> such a machine but won't on yours.  The above was my response.

Sean> Which was incorrect. I impose the limitation, as I said in my first
Sean> article that no single object (such as your array) be larger than 4GB.

Sean> Now, please show me how my proposed segmented machine a) breaks existing
Sean> *correct* code, and/or b) makes things difficult or impossible?  Yes,

Sean> *You* are the one who's pathetic, buddy.

Hey, hey, a little bit worked up, aren't we? 

Sean seems to believe that since all code (most of it, anyway) has a
4GB limitation currently, all code addressing anything above that is
broken.  So what is the point of providing 48 bits of address then?  I
would think that since it is there, it should be used.
 Bob gave a valid example of a program that uses the capability that
_would_ be provided by 48-bit flat addressing.  If it worries Sean that
this is unportable to VAX, please consider that in reality one would have
to parametrize the size of the table anyway, since even though e.g. VAX
has the 32 bit address space, the different operating systems put a practical
limit on the working set sizes etc. forcing one to limit the size of a problem
to smaller values.
 And it of course isn't just a fancy to try to squeeze a bigger array.
In the area I am somewhat familiar with, physical modelling, 4 GB of
memory allows one to model only a very modest 640x640x640 system of
Heisenberg (vector) spins, since each spin is a pair of double-precision
values (640^3 sites at 16 bytes each is about 4.2 GB).
>>>We need those address bits!<<<

 Let me just say that I wish that people would just take a good 
counter-argument and not cover the confusion with panache.
	
--
			przemek klosowski (przemek@ndcvx.cc.nd.edu)
			Physics Department
			University of Notre Dame IN 46556

oasis@gary.watson.ibm.com (GA.Hoffman) (04/06/91)

I've worked extensively on the IBM RT and RS/6000 .. we consider both to
be segmented machines.  Native support by compilers only uses a few
segments -- mapping text, bss, etc onto segments 0,1,2,3.  Thru shmat(),
segment registers may be loaded for use of the entire 32-bit effective-
address space.  These segments, supported by hardware segment-registers,
provide one-cycle loads and stores with hardware protections and
capabilities.  Protections and capabilities are very useful and difficult
to implement without something simple like segments ... our segments are
selected by the high-order 4-bits of an effective-address.  Having
protections and capabilities that could start on arbitrary pages and go
to arbitrary sizes would require something like a CAM; this is very
expensive in silicon.

The only serious complaint I've ever had about how we do segments is that
the segments are too small and there aren't enough of them active
concurrently.  Our segments are 256M-bytes, and there are only about 12
segment-registers available.  These numbers are too small for all the
objects that Mach and other programs would like to have active
simultaneously.  So there is overhead for changing segment-registers,
like shmat() and shmdt(), but the overhead has not proven unbearable.
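
In other words (the macro names are mine, not from any IBM header), the
effective address simply splits as:

	/* Top 4 bits of a 32-bit effective address select one of 16
	 * segment registers (about 12 usable, per the above); the rest
	 * is an offset within a 2^28 = 256MB segment. */
	#define SEG_REG(ea)  ((unsigned long)(ea) >> 28)
	#define SEG_OFF(ea)  ((unsigned long)(ea) & 0x0FFFFFFFUL)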
 
-- 
g

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/06/91)

In article <5277@ns-mx.uiowa.edu> jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes:

| People forget history so quickly these days!  The Burroughs 5000 and
| descendants all used segmented architectures, and they routinely handled
| two dimensional arrays as an array of pointers to segments.  That is
| precisely how Burroughs FORTRAN would have handled the above case, and
| if 50000 double's was too big for one segment, it would have automatically
| made the array into a 3 or 4 dimensional array, completely hiding the
| problem from the programmer without any need for the programmer to specify
| some kind of "large memory model" or other such hokum that people are
| forced to do on the 8086 family.

  This is a limitation of the compilers used on the Intel 286 chips,
rather than a characteristic of the chips themselves. The compiler
vendors could have provided a model (which the user would see only on
the compiler command line) with 32 bit ints, and the exact hiding of
detail you mention. I suggested this to several vendors while beta
testing their compilers.

  It's a little harder to fault the 386 chips, since their limitations
are the same as other 32 bit machines and segmentation is not visible.
There is the ability to handle more than 4GB by using the segments, but
I don't see either the capability or the commercially viable demand
right now.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (Peter da Silva) (04/06/91)

In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <CIHATJ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >If wrapping around the end of the segment isn't a problem, I can't. If it
> >is, I'll just operate on a >4 GB object.

> No, you missed the point.  I set things up such that no single object can be
> larger than 4Gbytes.

No, I didn't miss the point. If I have a 48 bit wide VM address and can't
operate on any object larger than a 32 bit wide pointer can address, then
it's a problem.

> On an 8086, the natural limit for the size of any given object is 64K.  On
> the hardware I described, it would be 4G.

OK.

> Now, please show me a correct program that will fail.

Any program that operates on an object larger than 4 GB.

> Answer: you can't, because The Standard (ANSI, in
> this case, since I'm more concerned about C) has enough limits in it that
> you can't get around it without being non-conformant.

I see. I'm talking quality of implementation, and you're talking language
legalese. In a minute, I'm going to ask an important question... in the
meantime, I'll play your game... ignoring for the moment every other
programming language in existence.

> Please read ANSI, and if you can find a statement in there that says that
> the system must provide for an object >4GB, then I will send you a case of
> beer.

Is there a statement in it that the system must provide for an object
greater than 64KB? Not that I can see... for the very good reason that
it would otherwise be extremely difficult to implement an ANSI C compiler
on the most common commodity personal computer in existence.

Now, here's the important question: why is the 64K object size limitation
in the IBM-PC a problem? After all, you cannot write a correct program
that will fail on it. You cannot legally determine that the maximum
object size is >64K.

Ah, you say, that's different. Nobody would ever need a single object
larger than 4GB. After all, there are hardly any computers that let you
address more than that, and they're all mainframes and supers. Of course
the same arguments were given about the 64KB limitation in the 8088 back
in the late '70s when *it* was under design.

(and no, it's not 20-20 hindsight: I was appalled at the choice of the 8088
in the IBM-PC when it first came out... and that was before they screwed
up the 80286 segment registers when they had a chance at fixing things)

> It *is* a segmented machine; you cannot wrap around segments, because the
> largest size of any single object is 4Gbytes.  Now, if you wanted to provide
> for a pseudo-64-bit address space, you have the system a) allocate segments
> sequentially, and b) when a segment-overrun trap occurs, increment the
> segment tag index appropriately, and continue.  But it's still a segmented
> machine.

If it quacks like a duck...

> And, again, note that *you* never see the segments.  It's possible to set up
> the machine such that it looks like a normal 32-bit-address machine;

Right, but then why bother with the extra address space?

> >Nah, just make int=32 bits, long=64 bits.

> That would be inefficient; too many people use 'long' when they don't need
> to,

Too many people use "short" when they don't need to, also. Correctly written
programs (correct in terms of being intentionally written portably, not by
some legalistic measure) don't have that problem. ANSI C has to cater to too
much old, broken code. I choose not to.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jallen@csserv1.ic.sunysb.edu (Joseph Allen) (04/06/91)

In article <1991Apr04.234928.8637@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>A segmented address space need not be a disaster for large arrays, though
>the much reviled Intel implementation is for two reasons:

>  -- Segment arithmetic is very complicated due to Intel's inexplicable
>     decision to make the low three bits of the segment number magic.

>  -- Loading a segment register is so slow on existing implementations (on
>     a 486, a segment load takes 6 cycles, a regular load takes 1) that
>     you have to handle intrasegment addresses differently from

Though this doesn't have as much to do with large arrays, here's another Intel
(386) segment gripe:

  -- The segment limit field is only 20 bits.  This means you're limited
     to 1MB segments, or 4GB segments with 4K pages.  This is a big problem
     when you want to implement mapped files with Intel segments:  you can't
     make a file grow a byte at a time (automatically) unless it's less
     than 1MB.  (You could kludge it by switching between the 1MB and 4GB
     modes and by changing the base address - but that's stupid and
     inefficient.)
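
For readers without the databook handy, the descriptor layout behind this
gripe (the struct is just a sketch of the 386 format):

	/* 386 segment descriptor: only 20 bits of limit, and a single
	 * granularity bit G that scales it in bytes (max 1MB) or in
	 * 4K pages (max 4GB) - nothing in between. */
	struct descriptor386 {
	    unsigned short limit_15_0;   /* limit bits 0-15             */
	    unsigned short base_15_0;    /* base bits 0-15              */
	    unsigned char  base_23_16;   /* base bits 16-23             */
	    unsigned char  access;       /* type, DPL, present bit      */
	    unsigned char  gran_limit;   /* G, D, AVL, limit bits 16-19 */
	    unsigned char  base_31_24;   /* base bits 24-31             */
	};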

--
#define h 23 /* Height */         /* jallen@ic.sunysb.edu (129.49.12.74) */
#define w 79 /* Width */                       /* Amazing */
int i,r,b[]={-w,w,1,-1},d,a[w*h];m(p){a[p]=1;while(d=(p>2*w?!a[p-w-w]?1:0:0)|(
p<w*(h-2)?!a[p+w+w]?2:0:0)|(p%w!=w-2?!a[p+2]?4:0:0)|(p%w!=1?!a[p-2]?8:0:0)){do
i=3&(r=(r*57+1))/d;while(!(d&(1<<i)));a[p+b[i]]=1;m(p+2*b[i]);}}main(){r=time(
0L);m(w+1);for(i=0;i%w?0:printf("\n"),i!=w*h;i++)printf(a[i]?" ":"#");}

tbray@watsol.waterloo.edu (Tim Bray) (04/06/91)

In article <1991Apr05.161615.16869@watson.ibm.com> oasis@watson.ibm.com writes:
>The only serious complaint I've ever had about how we do segments, is that the 
>segments are too small ...
>Our segments are 256M-bytes

The complaints are serious and they are correct.  256M is too small.  Not
too small sometime, nor pretty soon, nor tomorrow, but today.  In fact, I
suspect the recent brouhaha in this group about segmentation might be
described as converging on a consensus, despite the intemperate language:

 If a computer has a natural N-bit word size, segmentation is OK and
 can make life easier for the OS and compilers, but is more trouble than
 it's worth if the segments are noticeably smaller than 2^N.

Tim Bray, Open Text Systems

paul@taniwha.UUCP (Paul Campbell) (04/06/91)

In article <5277@ns-mx.uiowa.edu> jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes:
>In article <23615@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>>In article <1991Apr04.023845.3501@kithrup.COM>
>>sef@kithrup.COM (Sean Eric Fagan) writes:
>>
>>>I defy you to come up with a PROPERLY WRITTEN program that will break.
>>
>>My pleasure, sir.
>>	DIMENSION BIGMAT(50000,50000)
>>	DOUBLE PRECISION BIGMAT
>
>People forget history so quickly these days!  The Burroughs 5000 and
>descendants all used segmented architectures, and they routinely handled
>two dimensional arrays as an array of pointers to segments.  That is
>precisely how Burroughs FORTRAN would have handled the above case, and
>if 50000 double's was too big for one segment, it would have automatically
>made the array into a 3 or 4 dimensional array, completely hiding the

I worked for a university computing center that had a 6700 for
many years.  Back in those days there weren't many languages you
could port code around in (for the 6700 we had fortran, cobol, pl/1 and
to a lesser extent pascal); of these, most of the code people tried to
port was in fortran ... the biggest pain was fortran arrays FOR EXACTLY
THIS REASON - people write in fortran and pass slices of arrays around
all the time, and if your fortran arrays aren't stored in memory 'just so'
then all sorts of code breaks.  Every time someone brought in another
matrix math package that they couldn't get ported, I always knew
exactly what was wrong.  Something else that also often broke fortran
programs on the 6700 was the stack - fortran back in those days didn't
really have one - parameters were static (global), and recursive calls
really screwed this up because the 6700 fortran got too smart for its
own good (or programmers did weird stuff they could get away with on their
360 or whatever).

For what it's worth, cobol was much more portable :-(  There was a bcpl
port; it modeled memory simply as one giant array and pretended it was
running on a different machine (the pascal heap was done the same way).
No one ever ported c as far as I know - you would probably have to do
things the same as for bcpl or leave it solely as a systems programming
language.  (Contrary to popular belief there was (is) an assembler for
the machine - several of them in fact - but for obvious reasons they
weren't available for common usage.)

>I remember a statistic from Burroughs that the average segment on their
>machines was less than 64 words long (48 bits per word).  The code of
>each procedure was in a different segment, each array was a different
>segment, and so on.

This was only if you really wanted it; often many functions would end up
in the same segment (it depended on the compiler).

>I never heard a Burroughs programmer complain about segments the way 8086
>programmers do because the Burroughs architectures did it right!  I've

Well, at least they did it better - big arrays were still a pain (plus,
indirect pointers and stacks (the equivalents of page tables in this
environment) could not be paged/swapped, which also limited how big
arrays could actually get).

	Paul Campbell

-- 
Paul Campbell    UUCP: ..!mtxinu!taniwha!paul     AppleLink: CAMPBELL.P

"But don't we all deserve.
 More than a kinder and gentler fuck" - Two Nice Girls, "For the Inauguration"

sef@kithrup.COM (Sean Eric Fagan) (04/06/91)

In article <VFIA832@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
>No, I didn't miss the point. If I have a 48 bit wide VM address and can't
>operate on any object larger than a 32 bit wide pointer can address, then
>it's a problem.

Why is everybody harping on this 48 bits wide?  I actually never said
anything about how wide the address space is, except that I implied it was
at least 32 bits.  In fact, given the machine I proposed, it would still
work quite well with only a 32-bit address space.  And that's the difference
between the flat and the segmented:  using the segmented version (which, as
I pointed out two articles ago, can imitate, if slowly, a flat-address-space
machine), I can hide the fact that I only have 32 address bits, virtual or
otherwise.

>> Now, please show me a correct program that will fail.
>Any program that operates on an object larger than 4 GB.

Please read that again, Peter.  You assumed, incorrectly, that I had more
than 32 bits for addressing.  You assumed that, if I cannot allocate a 5Gbyte
object, things are broken.

Guess what:  they're not.

>Is there a statement in it that the system must provide for an object
>greater than 64KB? 

Nope, and it's not broken in that respect.  (I believe ANSI C has a
requirement that an implementation must support a single object of at
least 32k characters.)

>Now, here's the important question: why is the 64K object size limitation
>in the IBM-PC a problem? 

Peter:  not all people who run into memory problems need more than 4Gbytes
for a single object.  Some people do; that's why I organized things in my
machine the way I did.  For people who don't (i.e., people who just keep
doing malloc's to get more memory dynamically), they will never know that
the machine can access more than 4Gbytes, assuming, of course, it can.

Since I purposefully kept int's and long's at 32 bits, there is no way to
specify the size of an object larger than 4Gbytes.  How are you going to
know about it?

>After all, you cannot write a correct program
>that will fail on it. You cannot legally determine that the maximum
>object size is >64K.

Actually, for ANSI C, yes, you can.  You can use size_t for that purpose.
And it's all perfectly legal.

>Ah, you say, that's different. Nobody would ever need a single object
>larger than 4GB. 

No, I never said that at all.  I know for a fact that there are people who
want to be able to have single objects larger than 4Gbytes.  They are by far
in the minority, however, largely because no system most of them are using
today allows them to.

I admit freely that 32-bits is a limit.  But my question still stands:
please show me a Correct (by K&R II, ANSI, or POSIX standards) C program
that will fail on the system described by:

	struct pointer {
		unsigned long segment;
		unsigned long offset;
	};
	typedef unsigned long size_t;
	typedef long ptrdiff_t;
	ASSERT (sizeof(void*) == 8);
	ASSERT (sizeof(__longlong) == 8);
	ASSERT (sizeof(long) == 4);


You cannot even *write* a C program that tries to declare a single object
with more than 4Gbytes (except by trying to pass something of type
__longlong into malloc, which will then only look at half of it or fail, as
the system wishes).  Since size_t is a 32-bit number, that precludes trying
to do

	double foo[50000][50000];

Now, once again:  please show me a Correct C program (defined above) that
will fail on the system as I have defined it.

Please note that, although a pointer is 64-bits, I have said nothing about
how large the address space is.  I refuse to, since, the way I've
organized the machine, IT DOESN'T MATTER.  (For example:  a 68k has 32-bit
pointers, but its address space is only 16Mbytes.  I guess Peter and
company consider that a broken machine, huh?  Because it's not possible to
use up all of the address space implied by the size of the pointers, that
is?)

For people who want to have objects larger than 4Gbytes, if they pay me
enough money, I will give them a special compiler and library that will
allow it.  (As I said, that requires a little bit of work.)

>Right, but then why bother with the extra address space?

Actually, I want it because having every object in its own segment is an
incredibly useful thing.  I've made compilers and runtime libraries do that
under xenix for the '286, and it has made debugging broken programs easier.
(Found routines that ran off the end of areas they malloc'd, for example,
because I got a SIGSEGV as soon as it happened.)

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

jfc@athena.mit.edu (John F Carr) (04/07/91)

In article <VFIA832@xds13.ferranti.com>
	peter@ficc.ferranti.com (Peter da Silva) writes:

>No, I didn't miss the point. If I have a 48 bit wide VM address and can't
>operate on any object larger than a 32 bit wide pointer can address, then
>it's a problem.

Why do you care what the address size is?  A programmer's concern should be:
how many objects can I have, how big can each be, and how fast does the code
run?  Let the system designers decide whether to have a flat address space
or segments.  If you have code which requires 2^40 byte objects, put this in
your requirements when you buy a system.  The cost of 2^40 bytes of memory
can finance the OS and compiler changes needed to support such objects on a
segmented MMU.

--
    John Carr (jfc@athena.mit.edu)

beal@paladin.owego.ny.us (Alan Beal) (04/07/91)

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) writes:
>People forget history so quickly these days!  The Burroughs 5000 and
>descendants all used segmented architectures, and they routinely handled
>two dimensional arrays as an array of pointers to segments. 

I say amen to that.  Being a former Burroughs programmer, I know what a nice
experience it was to program on these systems.  Invalid indexes and seg array
errors (due to REPLACEs or SCANs) were all caught by the hardware, and a
meaningful error message was returned by the MCP - imagine that.

>I never heard a Burroughs programmer complain about segments

Because you were usually unaware segments were even being used.  I guess this
was due to the reliance on compilers to do the job - you never had to look at
machine language, and there was no assembler.  It is a shame Burroughs
Large Systems never really caught on because they were nice systems to
program on.
-- 
Alan Beal
Internet: beal@paladin.Owego.NY.US
USENET:   {uunet,uunet!bywater!scifi}!paladin!beal

jallen@libserv1.ic.sunysb.edu (Joseph Allen) (04/08/91)

In article <1991Apr6.211320.18594@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>In article <VFIA832@xds13.ferranti.com>
>	peter@ficc.ferranti.com (Peter da Silva) writes:

>>No, I didn't miss the point. If I have a 48 bit wide VM address and can't
>>operate on any object larger than a 32 bit wide pointer can address, then
>>it's a problem.

>Why do you care what the address size is?  A programmer's concern should be:
>how many objects can I have, how big can each be, and how fast does the code
>run?  Let the system designers decide whether to have a flat address space
>or segments.  If you have code which requires 2^40 byte objects, put this in
>your requirements when you buy a system.  The cost of 2^40 bytes of memory
>can finance the OS and compiler changes needed to support such objects on a
>segmented MMU.

This closed-system view is something I disagree with very strongly.  One of
the great things about UNIX is that instead of using only the system
manufacturer's compilers, you have a great range of third-party software as
well (everything from the WATCOM fortran and C compilers to GNU C).

I guess what I'm trying to say is that system programmers are programmers too
and shouldn't have to deal with badly implemented segments either.  Making
your system difficult for 3rd-party developers is not a good marketing
strategy (even IBM is switching to UNIX these days).

I'm not trying to say that UNIX is perfect either.  It isn't.  But if there's
to be a new standard which includes segments, it should be done right. Probably
it should be a 64 bit data/address machine with the top 16 bits of the address
being the segment number (although this is probably too small for dynamic
linking with segments, it would be ideal for huge databases and mapped files).

Actually, if you have a flat 64-bit address, it's so huge that you probably
don't need segments at all:  The paging system would detect lower and upper
bound "segment violations".  You probably also want to add a mechanism to
indicate how full the last page of a segment is (with byte granularity) so
that memory mapped files could grow automatically a byte at a time.  This is
a much more dynamic approach to segmenting: the actual segment size is just
whatever the maximum file size is.  Plus you wouldn't have to divide the
memory map up equally (or in powers of two).  Read-only libraries and files
wouldn't need space to grow- so they could be loaded adjacently.

I guess it comes down to whether you prefer segmented addresses,
.EXE library files (i.e., libraries which are relocated when loaded) or
address-independent code (the 6809 was a truly great uP - OS9 had dynamically
linked libraries without even a memory manager).  Note that the last two
options are not incompatible with each other, and the first option is gross -
it may have far pointers but it definitely would have to have MK_FP(segment,
offset), FP_SEG(addr) and FP_OFF(addr).
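
For the curious, those macros look roughly like this in the PC compilers'
<dos.h> (paraphrased from memory, so treat this as a sketch):

	/* Build and take apart a segmented "far" pointer. */
	#define MK_FP(seg, off) \
	    ((void far *)(((unsigned long)(seg) << 16) | (unsigned)(off)))
	#define FP_SEG(p)  ((unsigned)((unsigned long)(void far *)(p) >> 16))
	#define FP_OFF(p)  ((unsigned)(unsigned long)(void far *)(p))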

Sorry about the length of this article.  I've decided for myself now:  I
definitely don't want segments.  There are too many other, easier ways to
get the same effect.

--
#define h 23 /* Height */         /* jallen@ic.sunysb.edu (129.49.12.74) */
#define w 79 /* Width */                       /* Amazing */
int i,r,b[]={-w,w,1,-1},d,a[w*h];m(p){a[p]=2;while(d=(p>2*w?!a[p-w-w]?1:0:0)|(
p<w*(h-2)?!a[p+w+w]?2:0:0)|(p%w!=w-2?!a[p+2]?4:0:0)|(p%w!=1?!a[p-2]?8:0:0)){do
i=3&(r=(r*57+1))/d;while(!(d&(1<<i)));a[p+b[i]]=2;m(p+2*b[i]);}}main(){r=time(
0L);m(w+1);for(i=0;i%w?0:printf("\n"),i!=w*h;i++)printf(a[i]+"#\0 ");}

bellman@lysator.liu.se (Thomas Bellman) (04/08/91)

[ This is a comment on this whole thread, not aimed directly at the
  articles in the References line. ]

This post is intended to make people look at segmentation from a
slightly different angle, hopefully calming down this discussion just
a little bit.  (Not very much hope, though...)

Take a typical file system.  Say you have a 4 Gbyte disk.  You can
have 4'294'967'296 files of 1 byte each (modulo such things as
directory information, a minimum physical file size due to sector
sizes, and other things), or you can have 1 file containing 4 Gbyte
data, or anything in between.  How do you specify what data you want?
Normally, you first open the file you want, receiving a file
descriptor from the OS.  Then you seek in the file, handing the file
descriptor and an offset to the OS.  Hmm, doesn't this look familiar?

Substitute "create/attach a segment" for "open file" and substitute
"index in the segment" for "seek in the file", and you have a
segmented memory model.  In the file system, you have (say) 8 bits of
file descriptor, and 32 bits of offset, but even though you have 40
bits of "pointer", you can't address more than 32 bits.  People
doesn't seem to have any problem with doing this in a file system, so
why the dislike for doing this with the memory too?

Now, for memory, you probably want more than 256 segments, and in a
modern machine (i.e. one that hits the market in '93) you might want 64
bit offsets, but the principle remains the same.
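
To make the analogy concrete (seg_base() here is hypothetical, standing in
for whatever "attach a segment" primitive the OS would provide):

	#include <unistd.h>

	/* Byte n of open file fd: descriptor + offset. */
	char file_byte(int fd, long n)
	{
	    char c;
	    lseek(fd, n, SEEK_SET);
	    read(fd, &c, 1);
	    return c;
	}

	extern char *seg_base(unsigned s);  /* hypothetical: attach segment s */

	/* Byte n of segment s: segment number + offset - the same shape. */
	char seg_byte(unsigned s, unsigned long n)
	{
	    return seg_base(s)[n];
	}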

Sometimes you don't want the segmentation.  Sometimes you want the
flat address model.  This is equivalent to accessing the physical
disk in a file system.  The file system itself wants to do this, but
will probably not want to let the user do it himself.  Same for
memory: the OS wants to address the memory as a flat space, but might
not want the user programs to do this.

This post seems to imply that segments are great.  But actually, I
haven't really made up my mind yet.  I can see advantages for both
segmented and non-segmented memory.  A flat address space is a simple
model that is easy to understand and use.  I just wanted to point out
that on other levels of the computer, people don't object to exactly
the same system.  There might be some advantages to segments, since
they are so popular in the file systems.  It's just that they are
called files instead of segments.

Perhaps the best would be to let the programmer choose for himself.
Have two types of instructions for accessing memory -- one type that
uses pointers that consist of a segment number and an offset, and one
type that has a flat view of the address space, both usable from user
mode.  Say SEGSTORE and SEGLOAD that take a segmented address, and
FLATSTORE and FLATLOAD that take a non-segmented address.  And then
some way of converting between the two types of pointers.  Then those
that like segments can take advantage of them, and those that like a
flat address space can take advantage of that.
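
Sketched as C functions instead of instructions (seg_table[] is a made-up
stand-in for the MMU's per-segment state):

	extern char *seg_table[];  /* hypothetical per-segment base addresses */

	/* SEGLOAD: the pointer is (segment number, offset). */
	long segload(unsigned seg, unsigned long off)
	{
	    return *(long *)(seg_table[seg] + off);
	}

	/* FLATLOAD: the pointer is a plain address. */
	long flatload(long *addr)
	{
	    return *addr;
	}

	/* Converting from the segmented view to the flat view. */
	long *seg_to_flat(unsigned seg, unsigned long off)
	{
	    return (long *)(seg_table[seg] + off);
	}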


--
Thomas Bellman,  Lysator Computer Club   !  "Make Love - Nicht Wahr"
          Linkoping University, Sweden   !  "Too much of a good thing is
e-mail:         Bellman@Lysator.LiU.Se   !   WONDERFUL."     -- Mae West

fargo@iear.arts.rpi.edu (Irwin M. Fargo) (04/08/91)

In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes:
>
> [a few paragraphs removed]
>
>Sometimes you don't want the segmentation.  Sometimes you want the
>flat address modell.  This is equivalent to accessing the physical
>disk in a file system.  The file system it self want to do this, but
>will probably not want to let the user do that himself.  Same for
>memory, the OS wants to address the memory as a flat space, but might
>not want the user programs to do this.
>

With what I know of OSs, wouldn't segmentation be what the OS wants?

In most of today's computer systems, virtual memory is the Big Thing (tm).
The idea behind virtual memory (correct me if I'm wrong), is that a program
can read/write to memory as if memory were directly connected, but it is
actually re-mapped to a previously specified location in physical memory.

Obviously, virtual memory mappers of today use pages to allow more flexible
ways of memory mapping.  Couldn't a virtual memory page be considered the
same as a segment? (a la the Intel 80386 in protected mode)

If the OS (or any other program) really wants, you can tell the MMU you want
one page that takes up all of memory, or lots of little pages.

My whole point is, if we consider virtual memory pages to be equivalent to
segments, then it would seem that quite a few systems do use segmentation
and that it really is not that outdated an idea.

-- 
Thank you and happy hunting!		Actually: Ethan M. Young
					Internet: fargo@iear.arts.rpi.edu
Please press 1 on your touch tone	Bitnet (??): userfp9m@rpitsmts.bitnet
phone to speak to God...		Disclaimer: Who said what?

peter@ficc.ferranti.com (peter da silva) (04/08/91)

In article <1991Apr06.030330.1533@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <VFIA832@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >In article <1991Apr04.230953.15294@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> >No, I didn't miss the point. If I have a 48 bit wide VM address and can't
> >operate on any object larger than a 32 bit wide pointer can address, then
> >it's a problem.

> Why is everybody harping on this 48 bits wide?

Subject: Re: Segmented Architectures ( formerly Re: 48-bit computers)

This subject started with the idea of using segments to expand the address
space of 32-bit computers. Many of us are still thinking along those terms.

> I actually never said
> anything about how wide the address space, except that I implied that it was
> at least 32 bits.  In fact, given the machine I proposed, it would still
> work quite well with only a 32-bit address space.

In fact, it would only work well with a 32-bit address space (or maybe 33 or
34 bits). Once you get much more address space available to a program than
you can stick into a single object you will run into problems: that's the
lesson of the 8086. The reverse situation is not a big deal: that's the
lesson of the 68000.

> And that's the difference
> between the flat and the segmented:  using the segmented version (which, as
> I pointed out two articles ago, can imitate, if slowly, a flat-address-space
> machine), I can hide the fact that I only have 32-address bits, virtual or
> otherwise.

My Amigas both have a flat address space, but one has 32 address bits and
the other 24. Apart from some trash software written by Microsoft, there
is no difference that the program has to deal with.

> Please read that again, Peter.  You assumed, incorrectly, that I had more
> than 32-bit for addressing.  You assumed that, if I cannot allocate a 5Gbyte
> object, things are broken.

So what's the advantage to segments?

> >Now, here's the important question: why is the 64K object size limitation
> >in the IBM-PC a problem? 

> Peter:  not all people who run into memory problems need more than 4Gbytes
> for a single object.

And not all people who run into memory problems need more than 64K for a
single object: that's one reason why ints in most IBM-PC C compilers are only
16 bits wide. For people who don't, they will never know that the machine can
access more than 64K, assuming, of course, it can.

Since ints are only 16 bits, there is no way to specify the size of an
object longer than 64K (that's why I was talking about 64 bit longs: size_t
can easily be an int).  How are you going to know about it?

> >After all, you cannot write a correct program
> >that will fail on it. You cannot legally determine that the maximum
> >object size is >64K.

> Actually, for ANSI C, yes, you can.  You can use size_t for that purpose.
> And it's all perfectly legal.

size_t is 16 bits wide on most PC compilers.

> I admit freely that 32-bits is a limit.  But my question stands still:
> please show me a Correct (by K&R II, ANSI, or POSIX standards) C program
> that will fail [this system].

Can't. Can't show one that will fail on an IBM-PC either.

> You cannot even *write* a C program that tries to declare a single object
> with more than 4Gbytes (except by trying to pass something of type
> __longlong into malloc, which will then only look at half of it or fail, as
> the system wishes).

Can't write a C program that tries to declare a single object with more than
64K on an IBM-PC, since size_t is a 16 bit number.

> Now, once again:  please show me a Correct C program (defined above) that
> will fail on the system as I have defined it.

No, I'm not going to play with your straw man.

> Please note that, although a pointer is 64-bits, I have said nothing about
> how large the address space is.  I refuse to, as, since the way I've
> organized the machine, IT DOESN'T MATTER.

Has nothing to do with segments, either. The 68000, which I've already brought
up, is a counterexample.

> For people who want to have objects larger than 4Gbytes, if they pay me
> enough money, I will give them a special compiler and library that will
> allow it.  (As I said, that requires a little bit of work.)

So why not just use 64 bit registers in the first place, but only use the low
32 bits in the first versions... like the 68000 does. What do the segments
buy you?

> Actually, I want it because having every object in its own segment is an
> incredibly useful thing.

Just use the MMU and build your program with a sparse address space. You can
do anything you do with segments this way and you're not crippling the machine
at the starting gate.

Think of it as dynamically resizable segments, if you like. malloc can quite
easily make the top 'n' bits of any pointer unique, and the effective
result is exactly the same. Except you're not building the magic 2^32 into
the architecture.
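
A sketch of that, using mmap() to hand each object its own value in the
top bits of its address (the shift and the calling details are my
assumptions, not a worked-out allocator; it presumes 64-bit pointers):

	#include <sys/mman.h>
	#include <stddef.h>

	#define OBJ_SHIFT 32   /* top bits that name the object */

	static unsigned long next_obj = 1;

	/* Each allocation gets a fresh region whose address is unique in
	 * its top bits; running off the end faults instead of silently
	 * walking into a neighbouring object. */
	void *sparse_alloc(size_t len)
	{
	    void *base = (void *)(next_obj++ << OBJ_SHIFT);
	    return mmap(base, len, PROT_READ | PROT_WRITE,
	                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	}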
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (peter da silva) (04/08/91)

In article <1991Apr6.211320.18594@athena.mit.edu>, jfc@athena.mit.edu (John F Carr) writes:
> Why do you care what the address size is?  A programmer's concern should be:
> how many objects can I have, how big can each be, and how fast does the code
> run?

That's right.

> Let the system designers decide whether to have a flat address space
> or segments.

No, because that immediately limits me to "how big an object can be". And
the cost of RAM is continually dropping (see below).

> If you have code which requires 2^40 byte objects, put this in
> your requirements when you buy a system.

I might not, now. But some people are already using more than 2^32 bytes,
and single objects larger than that are already around the corner. You have
to consider your next system, and the system after that. Are you going to
be able to just buy the next larger version, change a few constants, and
deal with bigger problems with the same software?

> The cost of 2^40 bytes of memory
> can finance the OS and compiler changes needed to support such objects on a
> segmented MMU.

Let's pretend it's 1978 and we're looking to design a system.

So the cost of 2^20 bytes of memory should finance the OS and
compiler changes needed to support such objects on a segmented MMU,
so we'll build a segmented system.

And at late-'70s prices, when the 8086 was being designed, that was
probably true. By the time it came out, memory was cheap enough that
the original 64K was too small. But we're still stuck with the design
decision that software could cover for the segments in the off chance
anyone would ever need to go beyond 64K objects.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

mlord@bwdls58.bnr.ca (Mark Lord) (04/10/91)

In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes:
<
<Substitute "create/attach a segment" for "open file" and substitute
<"index in the segment" for "seek in the file", and you have a
<segmented memory model.  In the file system, you have (say) 8 bits of
<file descriptor, and 32 bits of offset, but even though you have 40
<bits of "pointer", you can't address more than 32 bits.  People
<doesn't seem to have any problem with doing this in a file system, so
<why the dislike for doing this with the memory too?

Not at all the same thing.  Files are read in large chunks to negate the
performance impact somewhat.  Accessing largish data items in main memory
requires segment prefixing (or whatever one calls it) on *each* access,
barring loop optimisations.  Quite the performance hit, in addition to being
a kludge to overcome limited addressing capability.

For that matter, even the analogy is not correct.  How about..
Substitute "access variable" for "open file", and from then on the file itself
looks like one huge flatly addressed table.  No segment boundaries to worry on.
I can seek directly (on most OS's) to exactly the part I want (read "indexing"
the table).  In memory, this *could* be equally clean on a segmented system,
provided we can fit each large data item completely within one segment, and
provided there are enough segment registers to accommodate *all* of the large
data items simultaneously.  Anything else requires extra logic to maintain
segment registers.  The closest counterexample in file systems is having to
put up with multiple disk volumes, where we treat each drive as a "segment".
And yes, we hate that, which is why drives keep getting MUCH larger and larger
(bigger flatter addressing segments).

Computers have plenty of die space to provide enough addressing bits to send
segments back to the dark ages NOW, so why kludge around?

-- 
MLORD@BNR.CA  Ottawa, Ontario *** Personal views only ***

bellman@lysator.liu.se (Thomas Bellman) (04/11/91)

mlord@bwdls58.bnr.ca (Mark Lord) writes:
> In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes:
> < [Me describing an analogy between memory segments and disk files.]

> Not at all the same thing.  Files are read in large chunks to negate the
> performance impact somewhat.  Accessing largish data items in main memory
> requires segment prefixing (or whatever one calls it) on *each* access,
> barring loop optimisations.  Quite the performance hit, in addition to being
> a kludge to overcome limited addressing capability.

I don't really see why it should hinder loop optimisation.  At least
not if segmentation is done on a low enough level of hardware.  I
think of segmentation as part of the MMU - a sort of selecting which
page table to use, from a restricted set of tables.
(Restricted by the OS, i.e. the OS decides the contents of the page
tables.)  I am *not* saying that you should have to do a "select
segment" and then index in that segment.  Rather, I would have the
segment be part of the pointer.

And I am definitely *not* wanting segments for overcoming limited
addressing.  I want them in order to know what object I'm using.
Consider mapping a file into memory.  When using the file normally,
you can extend the file by just writing at the end of it.  How do you
do that when the file is mapped?  You might have something just after
the mapped file.  Or take a stack.  How does the OS know if you're
extending the stack or indexing outside your allocated memory by
mistake?  It doesn't.  It just guesses.

The *programmer* decides the sizes of the segments, and how many he
wants.  You should be able to fit the entire address space in one
segment.  Just like you can have one single file taking up all of your
disk.

The programmer should also be allowed to specify the attributes of
each segment (read, write, execute permission, auto-extending on
writes after the end, ...), but that is up to the OS to deal with, and
not a hardware question.


--
Thomas Bellman,  Lysator Computer Club   !  "Make Love - Nicht Wahr"
          Linkoping University, Sweden   !  "Too much of a good thing is
e-mail:         Bellman@Lysator.LiU.Se   !   WONDERFUL."     -- Mae West

meissner@osf.org (Michael Meissner) (04/11/91)

In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas
Bellman) writes:

| Perhaps the best would be to let the programmer choose for himself.
| Have two types of instructions for accessing memory -- one type that
| uses pointers that consists of a segment number and an offset, and one
| type that has a flat view of the address space, both usable from user
| mode.  Say SEGSTORE and SEGLOAD that takes a segmented address, and
| FLATSTORE and FLATLOAD that takes a non-segmented address.  And then
| some way of converting between the two types of pointers.  Then those
| that like segments, can take advantage of them, and those that likes a
| flat address space, can take advantage of that.

No, No, No.

If a segmented pointer needs different instructions to load up, you
need to provide two or more versions of the library, one that expects
pointers to be segmented and one that doesn't.  This is the primary
problem with x86 segments -- you have to have different models and
libraries, and then you wind up gunking up 'portable' code to
annotate whether particular pointers are far, near, or huge.
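
Concretely, this is what the gunking-up looks like in real PC code (near,
far, and huge are the Microsoft/Borland extensions, not standard C):

	char near *scratch;    /* 16-bit offset into the default data segment */
	char far  *video = (char far *)0xB8000000L; /* full seg:off pointer   */
	char huge *big;        /* far pointer with normalized arithmetic, so
	                          it can index objects larger than 64K        */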
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

jfc@athena.mit.edu (John F Carr) (04/12/91)

I don't think the tradeoffs between segments and a flat address space are
the same now for >32 bit machines as they were for >16 bit machines.

In the past decade, memory cost has dropped by about 2^8.  The 32 bit
address space that some find too small costs 2^8 times as much to fill as
the 16 bit address space did 12 years ago.

--
    John Carr (jfc@athena.mit.edu)

dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber) (04/12/91)

In article <1991Apr12.021609.5340@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
:I don't think the tradeoffs between segments and a flat address space are
:the same now for >32 bit machines as they were for >16 bit machines.
:
:In the past decade, memory cost has dropped by about 2^8.  The 32 bit
:address space that some find too small costs 2^8 times as much to fill as
:the 16 bit address space did 12 years ago.
:

Oh come on!  No one here has been seriously requesting 4GB of real
physical memory! (Well, not many anyway :))  The point that most of
the anti-segmentation folks, including myself have been trying to
make is that internal segmentation (visible only to the OS) is fine;
external segmentation, defined as ANY type of segmentation which prevents
my application from playing with a flat address space, isn't.  Intel's
brain-damaged 64K segments were admittedly the worst, but so what?
All of the new machines which supposedly offer >32 bit virtual address
space are an optical illusion, because the application is now responsible
for using the actual 32-bit virtual address space as a cache, reloading
some segment register or other when it needs to play with object X (can
you say overlays?  I knew you could!)  And I don't really care if IBM
has made loading a segment register on the RS/6000 so fast I can do it
in 30 instructions.  My point is that my application has to know to do
this B.S.  Can you say non-portable?

:--
:    John Carr (jfc@athena.mit.edu)


--

Dan S.

firth@sei.cmu.edu (Robert Firth) (04/13/91)

In article <1991Apr12.021609.5340@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>I don't think the tradeoffs between segments and a flat address space are
>the same now for >32 bit machines as they were for >16 bit machines.
>
>In the past decade, memory cost has dropped by about 2^8.  The 32 bit
>address space that some find too small costs 2^8 times as much to fill as
>the 16 bit address space did 12 years ago.

Your figures are approximate, but let's take them as a starting point.
As I recall (having lived through it) by about 1975 we were bashing
into the 16-bit limit often enough to leave major bruises.  At least,
that's when the place I worked for started looking seriously at
segmented machines, flat 20-bit machines, software tricks with
separate I and D spaces, and so on.  That's the vintage of the
Interdata 7/32 and the Algol-68C compiler with separate I and D
segments.

So, if memory cost drops at 2^8 per decade, it will be as cheap
to fill 32 bits in 1995 as it was to fill 16 bits in 1975: two
decades at 2^8 give a factor of 2^16, exactly the ratio between
the two address spaces.  Now, cheapness isn't everything, but the
figure does suggest that, by 1995, we'll be hitting the 32-bit
limit in the same way that we were hitting the 16-bit limit in 1975.

So, if you are designing a machine today (April 1991), to be
shipped in, say, 1994Q1 - not an unreasonable lead time - then,
if you hardwire a 32-bit object limit, your machine will be
constraining an appreciable fraction of potential users within
18 months of first release.

Not, one feels, a prudent business strategy.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/13/91)

In article <24004@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

| So, if memory cost drops at 2^8 per decade, it will be as cheap
| to fill 32 bits in 1995 as it was to fill 16 bits in 1975.  Now,
| cheapness isn't everything, but the figure does suggest that, by
| 1995, we'll be hitting the 32-bit limit in the same way that we
| were hitting the 16-bit limit in 1975.

  I believe you're basing that on a false assumption that problems grow
to fit the available computer, while the truth is that larger computers
attract larger problems, which isn't the same thing at all.

  The subtle difference is that the little problems don't go away. We
still have a need for the bc or hand calculator size solution. We still
do email and spreadsheets, text editing, and compilations. What this
means is that a computer twice as big won't solve twice as many
problems. Electronic mail on a SPARC doesn't take more resources than
it did on a PDP-11. You can't even put N times as many people doing
mail on a machine N times faster, because the i/o hasn't grown N times.

  What you see is that additional resources don't proportionally
solve additional problems, so the cost of solving is larger on a "per
problem" basis.

  Most problems solved on workstations today probably will benefit from
faster i/o, and from faster CPU, but not from more memory, because the
typical workstation will already handle today's problems. Most
workstations are capable of holding more physical memory than they have,
say 64MB max, 16MB typical.

  Given this, if you were a vendor, would you put your R&D into faster
CPU or into adding memory?  And given that a larger word size is inherently
slower, would you go to a huge word size for which there was a very
limited market?

  I predict that the growth in the next decade will be in faster CPU,
bigger and faster disk, and that the slope of the growth curve in
actual memory will be 4x lower than the CPU. Money will be spent on the
most marketable solutions, and volume sales will bring price down ...
feedback between financial and technical.

  I think the jump to 64 bit will be limited to the top end of the
market, while lots of vendors take advantage of 32 bit being smaller,
cheaper, lower power, and faster. And most structs and arrays will be
twice as big in 64 bit, driving the cost of 64 bit systems up relative
to 32 bit.

  People who need 64 bit will jump. People who always need the latest
may or may not, depending on the speed difference between the 32 and 64
bit machines. The average user would like one, but doesn't have a
single problem which needs it. I will guess 64 bit will get less than
half the market (by unit) through the end of the decade.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

henry@zoo.toronto.edu (Henry Spencer) (04/14/91)

In article <3336@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>... Electronic mail on a SPARC doesn't take more resources than
>it did on a PDP-11...

Oh, but it does.  The pdp11 had the immense good fortune of being too small
to run sendmail...!
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

jesup@cbmvax.commodore.com (Randell Jesup) (04/15/91)

In article <1991Apr14.014401.1297@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <3336@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>>... Electronic mail on a SPARC doesn't take more resources than
>>it did on a PDP-11...
>
>Oh, but it does.  The pdp11 had the immense good fortune of being too small
>to run sendmail...!

	Quite right.  Never underestimate the ability of software people to
use all available resources, and then 10% (or 100%) more.  If mail has become
"small", someone will recode it using OO, or in a functional language, or...
Then they'll add all sorts of frills, say automatic AI junk-mail filters,
voicemail, a friendly voice that says "Some important mail from your buddy
fred has arrived, and I knew you would want to know about it immediately;
shall I read it for you?", or some such silliness.

	(Please excuse my intentionally semi-serious predictions.)

	The only proof I need is X/OpenLook/Unix.  When we have 1000-Spec
machines on our desktops, they'll probably _still_ have >1sec response times.
;-|

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
Disclaimer: Nothing I say is anything other than my personal opinion.
Thus spake the Master Ninjei: "To program a million-line operating system
is easy, to change a man's temperament is more difficult."
(From "The Zen of Programming")  ;-)

rminnich@super.ORG (Ronald G Minnich) (04/16/91)

In article <p-bgl5n@rpi.edu> fargo@iear.arts.rpi.edu (Irwin M. Fargo) writes:
>With what I know of OSs, wouldn't segmentation be what the OS wants?
Not want, but have, and use, even before VM. The question is, given that
8086-style segmentation support is Bad, and maybe even B5500-style support
is taken to be Bad, and maybe even HP PA or RS6000 support is Bad, is there 
anything Good that computer architectures can do to support segmentation?
Besides ignore it completely, as most do now?

ron

halkoD@batman.moravian.EDU (David Halko) (04/23/91)

In article <p-bgl5n@rpi.edu>, fargo@iear.arts.rpi.edu (Irwin M. Fargo) writes:
> In article <572@lysator.liu.se> bellman@lysator.liu.se (Thomas Bellman) writes:
> >
> > [a few paragraphs removed]
> >
> >Sometimes you don't want the segmentation.  Sometimes you want the
> >flat address modell.  This is equivalent to accessing the physical
> >disk in a file system.  The file system it self want to do this, but
> >will probably not want to let the user do that himself.  Same for
> >memory, the OS wants to address the memory as a flat space, but might
> >not want the user programs to do this.
> 
> With what I know of OSs, wouldn't segmentation be what the OS wants?
> 
> In most of today's computer systems, virtual memory is the Big Thing (tm).
> The idea behind virtual memory (correct me if I'm wrong), is that a program
> can read/write to memory as if memory were directly connected, but it is
> actually re-mapped to a previously specified location in physical memory.
> 
> Obviously, virtual memory mappers of today use pages to allow more flexible
> ways of memory mapping.  Couldn't a virtual memory page be considered the
> same as a segment? (a la the Intel 80386 in protected mode)
> 
> If the OS (or any other program really wants, you can tell the MMU you want
> one page that takes up all of memory of lots of little pages.
> 
> My whole point is, if we consider virtual memory pages to be equivalent to
> segments, then it would seem that quite a few systems do use segmentation
> and that it really is not that outdated an idea.
> 
From what little I have read on virtual memory and segmentation, segmentation
seems to be an abstraction of Virtual Memory.

Virtual memory is one dimensional because virtual addresses go from 0 to some
maximum address, one address after another. This has a tendency to cause
problems. If a compiler is building several tables (symbol table, parse tree,
call stack, numeric constants table, etc.) and one table grows at a faster
rate than it is supposed to, it can collide with another table - and that
is a real problem!

Segmentation was designed to solve this problem. Instead of there being one
linear address space, there are multiple, allowing, in this example, the
symbol table, constants, parse tree, and call stack to take up separate
address spaces, each starting at 0 and growing until the maximum of its
segment is hit (which then causes a problem); but the point is that each
space can grow dynamically, without taking up any extra address space, thus
leaving lots of room for other processes taking up other segments, until
memory eventually runs out (but that is what we have virtual memory built
underneath segmentation for!  Oh Baby!)
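
A sketch of the compiler example under such a scheme (seg_create() and
seg_grow() are hypothetical OS calls, named here only for illustration):

	typedef struct {
	    unsigned seg;         /* segment handle: its own address space */
	    unsigned long used;   /* table occupies offsets 0..used-1      */
	} table;

	extern unsigned seg_create(void);
	extern void seg_grow(unsigned seg, unsigned long new_size);

	/* Growing one table can never collide with another, because each
	 * grows within its own segment, starting from offset 0. */
	void table_append(table *t, unsigned long nbytes)
	{
	    seg_grow(t->seg, t->used + nbytes);
	    t->used += nbytes;
	}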

Besides that, segments allow separate procedures or data to be distinguished
from one another and protected. Sharing procedures in segments
between users is facilitated. The programmer needs to be aware of segmentation
to make full use of it, however (at a lower level - compiler designers, for
example).

May I add, however, that smart OS's took advantage of the theories behind
segmentation before it existed in hardware practice (OS-9 used memory
modules, which could be distinguished as data or executable modules, and
these modules could be shared between processes...
I still can't figure out why MS-DOS ever got a decent foothold in the market!)

-- 
     _______________________________________________________________________
    /                                                                       \
   / "The use of COBOL cripples the mind; its teaching should, therefore, be \
  /  regarded as a criminal offence."           E.W.Dijkstra, 18th June 1975. \
 +-----------------------------------------------------------------------------+
  \  "Have you purchased a multi-   halkoD@moravian.edu  Have you booted your /
   \ media machine from IMS yet?"   David J. Halko       OS-9 computer today?/
    \_______________________________________________________________________/