[comp.arch] Architectural Requirements for Unix

fouts@bozeman.ingr.com (Martin Fouts) (05/30/90)

In article <1990May19.230618.16090@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

   From: henry@utzoo.uucp (Henry Spencer)

   In article <30016@cup.portal.com> mmm@cup.portal.com (Mark Robert Thorson) writes:
   >When I asked about the difficulty of running Unix on the 376, which only
   >has segmentation, not paging, but is in other ways similar to the 386,
   >I meant running full-blown demand-paged virtual memory...

   Who needs paging?  Keep your programs down to a sane size and segmentation
   will amply suffice, especially on a single-user machine.  Of course, very
   few of today's Unix programmers know how to make programs small and fast,
   so this won't work too well in practice.

There are at least four wrong assumptions which the naive might read
into this paragraph:

1) Segmented memory machines have smaller memory/process than virtual
   memory machines.  This is not always true.  The Cray 2, for example,
   is a segmented memory machine and I've run 0.99 gigabyte
   processes on it.  (Others have since run larger...)

2) Virtual memory implies larger processes.  Not true.  Trading memory
   against performance in a virtual memory system may mean larger
   process images.  However, if they have good locality of reference,
   it might mean smaller memory resident set sizes.

3) Smaller is "better".  If I have to solve a 1000^3 grid, I have to
   solve it.  I've done it with segmented memory in the small (Y/MP)
   and segmented memory in the large (Cray 2) and virtual memory
   (ETA/10.)  For my purposes, the Cray 2 was best.  Your mileage would
   vary.

4) The main reason for using virtual memory is to allow a small
   physical memory to support a large process image size.  This is the
   worst reason for using virtual memory.  The main reason for using
   virtual memory is to make programs easier to write.

The advantages of text/data sharing, small resident sizes, and
implicit memory management are enough to justify the cost of TLBs,
MMUs and slower memory accesses in a lot of cases.

I need paging.  I need it to keep the total amount of memory I am
using small and make my programs more efficient by (a rough sketch of
a couple of these follows the list):

1) Using implicit sharing of text segments
2) Using copy on write sharing of forked images
3) Using explicit sharing of library code
4) Using explicit sharing of multithreaded applications
5) Using good locality of reference to minimize resident sets
6) Using copy on read to implement lazy evaluation
7) Using remapping to implement data transfer where possible
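
To make (1) and (2) concrete, here is a rough sketch of what a user
program sees on a paging Unix whose fork() is copy-on-write.  Whether
fork() really works that way is of course implementation dependent,
and the 8 Mbyte figure is just an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NBYTES (8L * 1024 * 1024)      /* example size only */

int main(void)
{
    char *big = malloc(NBYTES);
    pid_t pid;

    if (big == NULL)
        return 1;
    memset(big, 1, NBYTES);            /* parent touches every page once */

    pid = fork();                      /* text is shared implicitly (1);
                                          data pages are shared COW (2)   */
    if (pid == 0) {
        big[0] = 2;                    /* only the page actually written
                                          gets copied; the rest stay shared */
        _exit(0);
    }
    wait((int *) 0);
    printf("parent still sees %d\n", big[0]);   /* prints 1 */
    return 0;
}

The child pays only for the pages it actually writes, not for a copy
of the whole image.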

I do it with paging rather than segments because there ain't never
enough segments on a segmented system, and many segmented systems
don't have the architectural support to ease implementations of some
of the features.

I *never* use virtual memory to make a small memory look large,
because I can't afford the performance hit from the paging activity,
and feel sorry for those who must take it.

So, the answer (in part) to the question "Who needs it?" is I do.
--
Martin Fouts

 UUCP:  ...!pyramid!garth!fouts  ARPA:  apd!fouts@ingr.com
PHONE:  (415) 852-2310            FAX:  (415) 856-9224
 MAIL:  2400 Geng Road, Palo Alto, CA, 94303

If you can find an opinion in my posting, please let me know.
I don't have opinions, only misconceptions.

peter@ficc.ferranti.com (Peter da Silva) (06/01/90)

In article <383@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
> 3) Smaller is "better".  If I have to solve a 1000^3 grid, I have to
>    solve it.  I've done it with segmented memory in the small (Y/MP)
>    and segmented memory in the large (Cray 2) and virtual memory
>    (ETA/10.)  For my purposes, the Cray 2 was best.  Your mileage would
>    vary.

In turn, you're making a false assumption: that there's something inherent
in these very large programs (VLPs) that people are complaining about
that requires them to be large. Nobody is denying that some problems
require big iron. Text processing, text editing, window systems, and so
on... the majority of VLPs that come under fire... have no inherent reason
to be very large. Even machines as small as a 128K Mac can run excellent
windowing systems, and editors and text processors are even smaller.

> I need paging.  I need it to keep the total amount of memory I am
> using small and make my programs more efficient by:

> 1) Using implicit sharing of text segments

This does not require paging. Look at good old PDP-11 UNIX.

> 2) Using copy on write sharing of forked images

This requires paging. Fork() is a poor match for a non-demand-paged
architecture. My reaction is that we don't need fork().
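
For reference, the usual sketch of life without fork() is the
vfork()/exec idiom below.  vfork() is a BSD-ism, /bin/ls is just an
example command, and error handling is minimal:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int run_ls(void)
{
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                   /* child borrows the parent's image */
        execl("/bin/ls", "ls", "-l", (char *) 0);
        _exit(127);                   /* exec failed; must not return */
    }
    wait((int *) 0);                  /* wait for the child to finish */
    return 0;
}

No pages are copied and nothing has to be faulted copy-on-write,
because the child does nothing but exec.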

> 3) Using explicit sharing of library code

This does not require paging. Look at good old PDP-11 RSX.

> 4) Using explicit sharing of multithreaded applications

This does not require paging. Look at the Transputer.

> 5) Using good locality of reference to minimize resident sets

This is the big win for paging systems. You don't need to pull any more
reasons out of the hat. However, when you have a small program anyway
this becomes much less important.

The question shouldn't be "is paging good" or "is paging bad" or "who
needs it". The question is "do you need it".

Also, paging and bloated programs are not synonymous. People can write
bloated programs with overlays: look at the OS/360 kernel as described in
The Mythical Man-Month. The problem isn't paging. The problem is people
who think that because memory is cheap they can act like it's free.
-- 
`-_-' Peter da Silva. +1 713 274 5180.  <peter@ficc.ferranti.com>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/01/90)

In article <383@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:

| I need paging.  I need it to keep the total amount of memory I am
| using small and make my programs more efficient by:
| 
| 1) Using implicit sharing of text segments

  This can be done with segments, too. As long as the code is pure it
can be shared.

| 2) Using copy on write sharing of forked images

  This can be done with segments, too, at least for the segment as a
whole. It's marked read only, then the fault is used to force the copy.
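
  Roughly, the mechanism is the toy sketch below.  All the names and
structure fields are invented for illustration (a real kernel's data
structures differ), but the shape is the same: the segment stays shared
and read-only until somebody writes it, and the first write fault
copies the whole thing:

#include <stdlib.h>
#include <string.h>

struct seg {
    char *base;       /* start of the segment's memory */
    long  len;
    int   refcnt;     /* number of processes sharing it */
    int   writable;   /* cleared while the segment is shared */
};

/* Called when a sharer takes a protection fault writing segment 's'. */
struct seg *segment_write_fault(struct seg *s)
{
    if (s->refcnt > 1) {                      /* still shared: copy it all */
        struct seg *mine = malloc(sizeof *mine);

        mine->base = malloc(s->len);
        memcpy(mine->base, s->base, s->len);  /* whole segment, not a page */
        mine->len = s->len;
        mine->refcnt = 1;
        s->refcnt--;
        s = mine;
    }
    s->writable = 1;                          /* remap writable, retry store */
    return s;
}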

| 3) Using explicit sharing of library code

  This could be done with segments, although I don't know of any machine
which does it. I looked at doing it on a 286, with all the library
routines called with far calls to a library segment. I am sure it can be
done, but I don't have an example of doing it.

| 6) Using copy on read to implement lazy evaluation

  I believe you can do that with segments, too. You certainly can have
separate read and write bits for a segment and turn off both read and
write, then copy on fault.

| 7) Using remapping to implement data transfer where possible

  ??? I think you can do this, but I am not totally sure what you have
in mind. Obviously shared memory works in segmented systems, and the
data does not need to be at the same segment number in each process.
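
  At any rate, shared memory itself is easy to show; here is a rough
sketch using the System V shared memory calls (assuming a system which
has them; the key value is arbitrary and error checking is omitted):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <string.h>

#define MYKEY ((key_t) 0x4242)       /* arbitrary key both sides agree on */

int producer(void)
{
    int   id  = shmget(MYKEY, 4096, IPC_CREAT | 0666);
    char *buf = (char *) shmat(id, (char *) 0, 0);

    strcpy(buf, "mapped, not copied");
    return shmdt(buf);
}

int consumer(char *out, int len)
{
    int   id  = shmget(MYKEY, 4096, 0666);
    char *buf = (char *) shmat(id, (char *) 0, SHM_RDONLY);

    strncpy(out, buf, len);          /* both sides see the same pages */
    return shmdt(buf);
}
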
| 
| I do it with paging rather than segments because there ain't never
| enough segments on a segmented system, and many segmented systems
| don't have the architectural support to ease implementations of some
| of the featues.

  Most of the segmented CPUs, Intel for sure, have 32k or 64k segments.
That certainly covers a lot of ground.
| 
| I *never* use virtual memory to make a small memory look large,
| because I can't afford the performance hit from the paging activity,
| and feel sorry for those who must take it.

  Even running a job which *can* fit in physical memory, I often see
that the working set is smaller than the maximum. Often there is some
startup or wrapup code which doesn't stay in memory, some strings, like
error messages, etc.

  In many cases, where the order of data access is not easily
determined, virtual memory will be faster than using I/O to bring the
data in from a file. There will *always* be a program which uses a data
set larger than physical memory, and applications which could
legitimately use more address space than they have.
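
  On a system with mmap(2) (SunOS 4, say), the "let the pager do the
I/O" version looks roughly like this.  The file name and record layout
are invented and error checks are left out:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Fetch record 'recno' from a big file of doubles, letting the pager
   bring pages in only as they are touched. */
double lookup(long recno)
{
    static double *table;
    static long    nrecs;

    if (table == (double *) 0) {
        struct stat st;
        int fd = open("/data/grid.values", O_RDONLY);    /* invented name */

        fstat(fd, &st);
        nrecs = st.st_size / (long) sizeof(double);
        table = (double *) mmap((void *) 0, (size_t) st.st_size,
                                PROT_READ, MAP_SHARED, fd, (off_t) 0);
        close(fd);                  /* the mapping survives the close */
    }
    /* Only the pages this index lands on are ever read from the file. */
    return table[recno % nrecs];
}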

  In spite of that, I believe that 95% of all programs on all computers
(by number rather than CPU cycles) will run in 4MB of address space.
And probably 95% in 16MB. There are not a lot of applications which
need the huge memory, and that means a small market, few vendors, high
prices, etc. Adam Smith walks here, too.

-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

mmm@cup.portal.com (Mark Robert Thorson) (06/02/90)

In some previous postings, I gave incorrect prices for chips which are
candidates for "world's cheapest virtual memory demand paged Unix engine".

Correct prices are (in 1000-unit quantity):

Intel 376 (not really a candidate)    $43 @ 16 MHz
Intel 386SX                           $66.50 @ 16 MHz
                                      $125 @ 25 MHz

Someone suggested the Acorn RISC machine would be a possibility.

VLSI Tech. VL86C010                   $31.25 @ 12 MHz
           VL86C020                   $92.80 @ 25 MHz

Note that an external MMU is required (I don't have pricing for that).

Motorola is about to release the 68020 in a new package with much reduced
pricing.  I don't know if the information has been formally announced yet,
so I can't tell you what it is, but I would say that this chip will
certainly be the winner as far as low-end Unix machines are concerned.
If you don't care to run MS-DOS, its price/performance will be
significantly better than the 386SX's.

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (06/02/90)

In article <30418@cup.portal.com> mmm@cup.portal.com (Mark Robert Thorson) writes:
>In some previous postings, I gave incorrect prices for chips which are
>candidates for "world's cheapest virtual memory demand paged Unix engine".
>
>Correct prices are (in 1000-unit quantity):
>
>Intel 376 (not really a candidate)    $43 @ 16 MHz
>Intel 386SX                           $66.50 @ 16 MHz
>                                      $125 @ 25 MHz
>
You have included only half the CPU: you must also add in the price of
the floating-point chip.

Doug McDonald

csimmons@jewel.oracle.com (Charles Simmons) (06/02/90)

In article <2286@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E
Davidsen Jr) writes:
> | 3) Using explicit sharing of library code
> 
>   This could be done with segments, although I don't know of any machine
> which does it. I looked at doing it on a 286, with all the library
> routines called with far calls to a library segment. I am sure it can be
> done, but I don't have an example of doing it.
> 

Ah!  I can't pass this one up.  Turns out that one of your favorite
machines, the GE 6something (well, one of its descendants), does this
sort of thing.  The Dartmouth College Time Sharing system started
putting shared libraries in segments around 1985 or so.  Worked
great.

-- Chuck

atk@boulder.Colorado.EDU (Alan T. Krantz) (06/03/90)

In article <1990Jun2.134157.14516@oracle.com> csimmons@oracle.com writes:
>In article <2286@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E
>Davidsen Jr) writes:
>> | 3) Using explicit sharing of library code
>> 
>>   This could be done with segments, although I don't know of any machine
>> which does it. I looked at doing it on a 286, with all the library
>> routines called with far calls to a library segment. I am sure it can be
>> done, but I don't have an example of doing it.
>> 
>
>Ah!  I can't pass this one up.  Turns out that one of your favorite
>machines, the GE 6something (well, one of its descendents), does this
>sort of thing.  The Dartmouth College Time Sharing system started
>putting shared libraries in segments in around 1985 or so.  Worked
>great.
>
>-- Chuck

I'm not sure what you mean by explicit sharing of library code - but TOPS-10
did this sort of thing with segments. Every program had two segments - a
"high" segment for code (usually writing was disabled) and a low segment for
data. What the Fortran compiler did was put the user's code in the low segment
and then, when the program ran, it specified the name of the "shared" high
segment which contained the Fortran library (a user could force the linker to
generate his own copy of the Fortran library). Of course - one day our machine
had a parity error - it just so happened to be in the middle of the shared
segment - so all the Fortran programs died...

One neat part of using the shared segments was that the run-time library could
be updated without rebuilding the user's programs ...

Oh - I wish someone would donate a PDP-10 to me - and a house with a powerplant
to run it ...


 
------------------------------------------------------------------
|  Mail:    1830 22nd street      Email: atk@boulder.colorado.edu|
|           Apt 16                Vmail: Home:   (303) 939-8256  |
|           Boulder, Co 80302            Office: (303) 492-8115  |
------------------------------------------------------------------

jesup@cbmvax.commodore.com (Randell Jesup) (06/03/90)

In article <30418@cup.portal.com> mmm@cup.portal.com (Mark Robert Thorson) writes:
>In some previous postings, I gave incorrect prices for chips which are
>candidates for "world's cheapest virtual memory demand paged Unix engine".
>
>Correct prices are (in 1000-unit quantity):
...
>Note that an external MMU is required (I don't have pricing for that).
>
>Motorola is about to release the 68020 in a new package with much reduced
>pricing.  I don't know if the information has been formally announced yet,
>so I can't tell you what it is, but I would say that this chip will be
>certainly be the winner as far as low-end Unix machines are concerned.
>If you don't care to run MS-DOS, it's price-performance product will be
>significantly better than the 386SX.

	Check out '030 prices.  They're now available in a solderable
plastic chip carrier (winged leads) in 16 and 25 MHz (at least), and they
of course have the MMU built in.  Commodore is using them in our new A3000.
I don't know the price, but you could ask Motorola (I'm a software person).
They're supposed to be pretty cheap.  The '030 is considerably faster than
the '020, due to integrated MMU and data cache, and some improvements to
internal execution speeds for some instructions.

	Last I checked they're cheaper than an '020 and '851 pair.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Common phrase heard at Amiga Devcon '89: "It's in there!"

fouts@bozeman.ingr.com (Martin Fouts) (06/09/90)

In article <D:T30:5@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

   In article <383@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
   > 3) Smaller is "better".  If I have to solve a 1000^3 grid, I have to
   >    solve it.  I've done it with segmented memory in the small (Y/MP)
   >    and segmented memory in the large (Cray 2) and virtual memory
   >    (ETA/10.)  For my purposes, the Cray 2 was best.  Your mileage would
   >    vary.

   In turn, you're making a false assumption: that there's something inherent
   in these very large programs (VLPs) that people are complaining about
   that requires them to be large. Nobody is denying that some problems
   require big iron. Text processing, text editing, window systems, and so
   on... the majority of VLPs that come under fire... have no inherent reason
   to be very large. Even machines as small as a 128K Mac can run excellent
   windowing systems, and editors and text processors are even smaller.

Say What!?  There are *no* assumptions in my comments, which were based
on measurements.  I said for a particular problem (and hinted at its
nature) that a particular machine was *measured* to perform better.  I
then qualified the statement by noting that it wouldn't hold in all
cases:  "Your mileage would vary."

   > I need paging.  I need it to keep the total amount of memory I am
   > using small and make my programs more efficient by:

   > 1) Using implicit sharing of text segments

   This does not require paging. Look at good old PDP-11 UNIX.

Doesn't require, but can use.

   > 2) Using copy on write sharing of forked images

   This requires paging. Fork() is a poor match for a non-demand-paged
   architecture. My reaction is that we don't need fork().


   > 3) Using explicit sharing of library code

   This does not require paging. Look at good old PDP-11 RSX.

Doesn't require, but can be much easier to do.  It is often harder to
manage the usually smaller number of segments than the usually larger
number of pages.

   > 4) Using explicit sharing of multithreaded applications

   This does not require paging. Look at the Transputer.

Look at the Cray 2.  But it's easier to do if you have them.

   > 5) Using good locality of reference to minimize resident sets

   This is the big win for paging systems. You don't need to pull any more
   reasons out of the hat. However, when you have a small program anyway
   this becomes much less important.

True.

   The question shouldn't be "is paging good" or "is paging bad" or "who
   needs it". The question is "do you need it".

I wasn't dealing with good/bad.  I was asked "who needs it?"  and I
answered me.  I'm not sure that the difference between "who needs it"
and "do you need it" is worth using two statements for.

   Also, paging and bloated programs are not synonymous. People can write
   bloated program with overlays: look at the OS/360 kernel as described in
   the Mythical Man-Month. The problem isn't paging. The problem is people
   who think that because memory is cheap they can act like it's free.

That is what I was trying to say.  I won't be subtle this time:

1) Paging can be abused.
2) Paging can be used for many things.
3) Some of these things can be done other ways.
4) Taken as a whole, they make paging worth the effort.

Marty
--
Martin Fouts

 UUCP:  ...!pyramid!garth!fouts  ARPA:  apd!fouts@ingr.com
PHONE:  (415) 852-2310            FAX:  (415) 856-9224
 MAIL:  2400 Geng Road, Palo Alto, CA, 94303

If you can find an opinion in my posting, please let me know.
I don't have opinions, only misconceptions.

peter@ficc.ferranti.com (Peter da Silva) (06/09/90)

In article <446@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
> In article <D:T30:5@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> Say What!?  There are *no* assumptions in my comments, which were based
> on measurements.

I didn't intend to imply that you were making assumptions about the
software. My point is that people are not complaining about software that
has to be big... they're complaining about bloated software that really has
no reason to be using that sort of resources. Emacs. GNU CC. X. Nobody's
saying paging is a bad thing.
-- 
`-_-' Peter da Silva. +1 713 274 5180.  <peter@ficc.ferranti.com>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.

richard@aiai.ed.ac.uk (Richard Tobin) (06/12/90)

In article <78-3JC2@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> bloated software that really has no reason to be using that sort
> of resources. Emacs. GNU CC. X. 

While all these programs are fairly large, gcc (as I have mentioned
before) is rather better than most of the competition.  Compiling a
single large function (a virtual machine interpreter), gcc (SPARC) grew
to a size of 2 Mbytes.  Sun's (SPARC) C compiler compiling the same
code grew to a size of 44 Mbytes.  MIPS's C compiler used at least 6
Mbytes, maybe more.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin