[comp.os.minix] Why the 8086 architecture is wonderful :-)

preston@felix.UUCP (Preston Bannister) (02/22/89)

Before we get carried away trashing the Intel 8086 architecture, let's not
forget its advantages...

1.  If you can restrict your program to 16-bit pointers (i.e. the "small"(?)
memory model), as does Minix, then:

- Your code will tend to be both smaller and faster.  (The code for "small"
8086 programs tends to be smaller than that for "small" 68000 programs).

- You can treat each segment as a separate address space.  Segments can be
moved around in memory as needed.  This is the rather nifty trick Minix uses
to efficiently implement fork() without a real MMU.  Since the 68000 has
a flat address space, the following program under ST Minix is incredibly
inefficient:

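	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
	  /* Parent and child both loop forever, forcing constant process switches. */
	  if (fork())
	    for (;;) printf("Heave");
	  else
	    for (;;) printf("Ho");
	}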

On each process switch ST Minix _must_ swap the parent and child data so
that the running process is in the same place in the address space.  In PC
Minix the process switch only has to change a couple of segment registers.
(This is _not_ a criticism of ST Minix, as this approach is a necessary
function of the 68000 architecture).

Guess which one is faster :-)
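
As a rough illustration of the point above, here is a minimal C model of 8086 real-mode addressing. It is a sketch, not Minix source: the struct `proc_segs` and the functions `phys_addr` and `data_addr` are names made up for this example. The only architectural fact it relies on is that the physical address is the segment value times 16 plus the 16-bit offset, wrapped to 20 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a process slot: only the segment values differ
 * between parent and child; every 16-bit offset in the program is shared. */
struct proc_segs {
    uint16_t cs;   /* code segment, in 16-byte paragraphs */
    uint16_t ds;   /* data/stack segment, in 16-byte paragraphs */
};

/* 8086 real-mode translation: physical = segment * 16 + offset,
 * truncated to the 20 address lines of the 8086. */
uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Where a data offset lands for a given process. A "process switch" in
 * this model is nothing more than using a different proc_segs. */
uint32_t data_addr(const struct proc_segs *p, uint16_t off)
{
    assert(p != 0);
    return phys_addr(p->ds, off);
}
```

With a parent whose `ds` is 0x2000 and a child whose `ds` is 0x3000, the same offset 0x0042 resolves to 0x20042 and 0x30042 respectively, so neither data image has to move when the other process runs.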


2.  If you want a real virtual memory system, using the Intel 286 (i.e. an
AT-clone) in "protected" mode is the least expensive solution.  The 286
effectively has an MMU on chip.  True, it is segment-based (no paging), but
you can't beat the price.  The MMU on-chip also means fewer delays between
the CPU and memory, i.e. for a given clock rate you can use slower (less
expensive) RAM.
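
To make the "MMU on chip" remark concrete, the following C sketch models segment translation in 286 protected mode. It is deliberately simplified and assumes a lot: `struct descriptor`, `enum xlate` and `translate` are invented names, and the real descriptor also carries access-rights and privilege bits that are left out here. What it keeps is the base, the 16-bit limit, and the present bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a 286 segment descriptor (the real one also
 * carries access rights and privilege bits, omitted here). */
struct descriptor {
    uint32_t base;     /* 24-bit physical base address */
    uint16_t limit;    /* highest valid offset in the segment */
    bool     present;  /* false: segment is not in memory */
};

enum xlate { XLATE_OK, XLATE_NOT_PRESENT, XLATE_LIMIT };

/* Translate (descriptor, offset) to a physical address. The two failure
 * results stand for the faults an OS would field: a not-present fault is
 * the hook for bringing a segment back in, a limit fault is a
 * protection violation. */
enum xlate translate(const struct descriptor *d, uint16_t off, uint32_t *phys)
{
    assert(d != 0 && phys != 0);
    if (!d->present)
        return XLATE_NOT_PRESENT;
    if (off > d->limit)
        return XLATE_LIMIT;
    *phys = (d->base + off) & 0xFFFFFFu;   /* 24 address lines on the 286 */
    return XLATE_OK;
}
```

The not-present case is what makes segment-level virtual memory possible: the OS can write a whole segment out to disk, clear the present bit, and reload the segment when a reference faults.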


All this is not to say that I think the Intel 8086 architecture is truly
wonderful, just that it does have _some_ advantages.

When I read about the 8086 architecture I was working on a UCSD Pascal
system.  If you are used to thinking in terms of shared libraries, a 64kb
limit on the size of one code segment is not bad.  In fact good programming
practice would tend to break up such large modules.  Given a good set of
shared libraries, large monolithic programs are much less likely.

None of which is much help when trying to port GNU Emacs...
--
Preston L. Bannister
USENET	   :	hplabs!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister

allbery@ncoast.ORG (Brandon S. Allbery) (03/01/89)

As quoted from <84154@felix.UUCP> by preston@felix.UUCP (Preston Bannister):
+---------------
| Before we get carried away trashing the Intel 8086 architecture, let's not
| forget its advantages...
| 
| 1.  If you can restrict your program to 16-bit pointers (i.e. the "small"(?)
| memory model), as does Minix, then:
| 
| - Your code will tend to be both smaller and faster.  (The code for "small"
| 8086 programs tends to be smaller than that for "small" 68000 programs).
| 
| - You can treat each segment as a separate address space.  Segments can be
| moved around in memory as needed.  This is the rather nifty trick Minix uses
| to efficiently implement fork() without a real MMU.  Since the 68000 has
| a flat address space, the following program under ST Minix is incredibly
| inefficient:
> (...deleted...)
| On each process switch ST Minix _must_ swap the parent and child data so
| that the running process is in the same place in the address space.  In PC
| Minix the process switch only has to change a couple of segment registers.
| (This is _not_ a criticism of ST Minix, as this approach is a necessary
| function of the 68000 architecture).
+---------------

It is NOT necessary.  For example, the Macintosh uses relative addressing
almost exclusively.  This results in 32K relocatable segments, as the 68000
uses signed 16-bit displacements.  ST Minix could use the same trick (and
with some hacking, it could use 64K segments if it wanted to deal with 32K
of "negative" addresses), in which case your example reduces to exchanging
the chosen base register, which is done when the process's quantum starts
anyway.
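
The arithmetic behind the 32K and 64K figures can be sketched in C as below. This models only the 68000 address-register-plus-16-bit-displacement mode; the helper names `ea_d16_an`, `window_base` and `window_disp` are invented for the example, and nothing here is taken from Macintosh or ST Minix code.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of the 68000 "d16(An)" mode: a 32-bit address
 * register plus a sign-extended 16-bit displacement. */
uint32_t ea_d16_an(uint32_t an, int16_t d16)
{
    return an + (uint32_t)(int32_t)d16;
}

/* Hypothetical per-process data window of up to 64K. Pointing the base
 * register at the middle lets displacements -32768..32767 cover it all;
 * this is the "negative addresses" hack. */
uint32_t window_base(uint32_t start)
{
    return start + 0x8000u;
}

/* Map a 0..65535 index within the window to the displacement used in code. */
int16_t window_disp(uint32_t index)
{
    assert(index <= 0xFFFFu);
    return (int16_t)((int32_t)index - 0x8000);
}
```

Two processes whose windows start at different addresses use identical displacements; only the value loaded into the base register differs, which is the property the paragraph above relies on.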

The main reason that this is not normally done is that it introduces the
joys [ ;-) ] of large-model programming to the 68000.  An ST-Minix program
can be much larger than a PC-Minix program (witness gcc), because accessing
an address outside the current segment is possible without munging the
segment registers.  (Note that this can still be done on the 68000, even if
you use relative addressing:  it's not a processor mode, it's the choice of
instruction.)  The Mac uses it so it can rearrange memory as necessary, as
noted above for the "fork" trick -- otherwise, certain combinations of
actions could generate lots of holes in memory (open DA, open application,
close DA -- suddenly there's a sizeable chunk of "dead space" between the
system heap and the application, where the DA used to be).

An-relative addressing also makes for smaller and faster code, since only
two bytes need be read from memory to fetch an address, vs. 4 for absolute
addressing.  (Counterpoint:  protected-mode 386 programs can also use 4-byte
addresses, so they are also potentially slower.  I don't know whether any
existing 386 programs actually use 4-byte addressing, though; with the 386es
at Telotech came sdb, so I didn't have to learn the processor the way I had
to with the adb-only 68000 in ncoast.)

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

preston@felix.UUCP (Preston Bannister) (03/03/89)

From article <13428@ncoast.ORG>, by allbery@ncoast.ORG (Brandon S. Allbery):
> As quoted from <84154@felix.UUCP> by preston@felix.UUCP (Preston Bannister):
> +---------------
> | - You can treat each segment as a separate address space.  Segments can be
> | moved around in memory as needed.  This is the rather nifty trick Minix uses
> | to efficiently implement fork() without a real MMU.  Since the 68000 has
> | a flat address space, the following program under ST Minix is incredibly
> | inefficient:
>> (...deleted...)
> | On each process switch ST Minix _must_ swap the parent and child data so
> | that the running process is in the same place in the address space.  In PC
> | Minix the process switch only has to change a couple of segment registers.
> | (This is _not_ a criticism of ST Minix, as this approach is a necessary
> | function of the 68000 architecture).
> +---------------
> 
> It is NOT necessary.  For example, the Macintosh uses relative addressing
> almost exclusively.  This results in 32K relocatable segments, as the 68000
> uses signed 16-bit displacements.  ST Minix could use the same trick (and
> with some hacking, it could use 64K segments if it wanted to deal with 32K
> of "negative" addresses), in which case your example reduces to exchanging
> the chosen base register, which is done when the process's quantum starts
> anyway.

Almost, but not quite.  The problem is with pointers and the stack
on the 68000.  There are absolute memory addresses embedded in the
stack.  Absolute pointers to within the stack (frame pointers at
least) make the stack not movable.  Return addresses (and function
pointers) in the stack make the pointed to code not movable.

On the other hand, the frame pointers on the 8086 are all offsets
within the stack segment.  If you use only the 8086's "short" calls
then the return addresses are all offsets within the same code
segment.  
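
A small C model may help show why offset-based frame links survive relocation. It is a sketch under stated assumptions: the stack is just a byte array, `put_link`, `get_link` and `chain_length` are made-up helpers, and offset 0 is used as the end-of-chain marker purely for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Store a 16-bit "saved frame pointer" at a given offset within a stack
 * image. Links are offsets from the start of the image, as BP values
 * are offsets within SS on the 8086. */
void put_link(uint8_t *stack, uint16_t at, uint16_t link)
{
    assert(at + sizeof link <= STACK_BYTES);
    memcpy(stack + at, &link, sizeof link);
}

/* Load the saved frame pointer stored at offset `at`. */
uint16_t get_link(const uint8_t *stack, uint16_t at)
{
    uint16_t link;
    assert(at + sizeof link <= STACK_BYTES);
    memcpy(&link, stack + at, sizeof link);
    return link;
}

/* Walk the frame chain from offset `fp` until the 0 terminator and
 * return the number of frames visited. */
int chain_length(const uint8_t *stack, uint16_t fp)
{
    int n = 0;
    while (fp != 0) {
        fp = get_link(stack, fp);
        n++;
    }
    return n;
}
```

If the image is copied to a new buffer and the old one is overwritten, `chain_length` on the copy still walks the same frames, because no link ever named an absolute address. A chain of absolute 68000 frame pointers would point back into the old location instead.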

You _do_ bring up a good point, as the 68000's relative addressing
mode could be very useful in implementing shared libraries, as done
(surprise, surprise) on the Macintosh.  (Anyone listening?? :-)

(deleted...)
> The Mac uses it so it can rearrange memory as necessary, as
> noted above for the "fork" trick -- otherwise, certain combinations of
> actions could generate lots of holes in memory (open DA, open application,
> close DA -- suddenly there's a sizeable chunk of "dead space" between the
> system heap and the application, where the DA used to be).

Er, I should point out that the Macintosh does NOT have a fork()...

I've never been particularly fond of fork() as a primitive operation,
because it assumes more than is necessary about the architecture of
the machine.  To do fork() efficiently, you pretty much have to have
a virtual memory machine.  In the "real" world, most fork() calls
are immediately followed with an exec() call.  The copy of the
process that fork() creates, exec() discards, so why bother making
the copy...
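
For reference, the fork()-then-exec() pattern being discussed looks roughly like this in C on a POSIX-style system. The function name `run_program` is invented for the example; `fork`, `execv`, `waitpid` and `_exit` are the standard calls.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with the argument vector `argv` and return its exit status
 * (127 if the exec failed), or -1 if the fork or wait failed or the
 * child was killed by a signal. The name run_program is ours, not a
 * standard call. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    assert(path != 0 && argv != 0);
    pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {
        /* Child: the copy of the parent made by fork() is thrown away
         * as soon as exec succeeds. */
        execv(path, argv);
        _exit(127);                /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Nothing the child does between `fork()` and `execv()` touches the duplicated data, which is why copying it eagerly is wasted work in this common case.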
--
Preston L. Bannister
USENET	   :	hplabs!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister

ugkamins@sunybcs.uucp (John Kaminski) (03/08/89)

>I've never been particularly fond of fork() as a primitive operation,
>because it assumes more than is necessary about the architecture of
>the machine.  To do fork() efficiently, you pretty much have to have
>a virtual memory machine.  In the "real" world, most fork() calls
>are immediately followed with an exec() call.  The copy of the
>process that fork() creates, exec() discards, so why bother making
>the copy...

It is my opinion that fork() assumes nothing about anything.  It is merely a
standard system call that has standard semantics -- i.e., you will get another
independently running process with the same program and the same variables, the
same state of just about everything...and the parent gets a pid (the returned
value).  The manual for the UNIX system in question usually clearly tells you
the semantics (in the case of MINIX, I guess that would be the description
for V7 UNIX).

Also, I agree that an exec() of some kind usually follows a fork().  However,
it is not always desired.  Take, for example, my attempt at a real-time
communications program for OS9 (which is also
supposed to mimic UNIX, at least in its system calls).  What I wished to happen
was that the program immediately split into two parts -- one for taking
keyboard input and sending it out, and another to receive characters and
display them.  Without getting OS9-specific by going on calls to see if there
was anything in the buffer (either keyboard or serial line -- and later from
another user logged on, i.e., not a terminal program), it seemed to me that
the only way to prevent the program from blocking on read() calls was to have
two independent processes, each of which could block all it wanted while
waiting for characters, because the other process would still be running or
ready to run.  A fork() as defined under UNIX would have been great, but
os9fork() requires a string argument which is the module name (executable files
are roughly the equivalent of modules) to be "exec()ed" after the new process
is created.  I had given up on the project due to lack of time, but what I was
attempting is something like os9fork(argv[0]) and have the program somehow
determine if it had already been started and if not, signal the started one
that it was the copy....as you can see, it got complicated REAL fast.
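
With a UNIX-style fork(), the two-process design described above might be sketched as follows. This is an illustration only, not the OS9 program: `relay` and `start_terminal` are hypothetical names, the four descriptors are assumed to be already open, and signal interruption (EINTR) is not handled.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy bytes from `from` to `to` until end-of-file, blocking in read()
 * as long as it likes. Returns 0 on EOF, -1 on error. */
int relay(int from, int to)
{
    char buf[256];
    ssize_t n;

    assert(from >= 0 && to >= 0);
    while ((n = read(from, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {                    /* handle short writes */
            ssize_t w = write(to, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Split into two processes: the child relays keyboard -> line, the
 * parent relays line -> screen. Returns the child's pid to the parent
 * once the incoming line reaches EOF (or -1 if fork failed); never
 * returns in the child. */
pid_t start_terminal(int keyboard, int line_out, int line_in, int screen)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(relay(keyboard, line_out) == 0 ? 0 : 1);
    if (pid > 0)
        relay(line_in, screen);
    return pid;
}
```

Each process can sit in a blocking `read()` without stalling the other direction, and no module name or "am I the copy?" handshake is needed because the child simply continues from the `fork()` call.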

In short, if you are given the choice of fork() then exec() or forkexec(),
I'll take the control offered by separate functions any day.

steve@basser.oz (Stephen Russell) (03/14/89)

In article <4566@cs.Buffalo.EDU> ugkamins@sunybcs.UUCP (John Kaminski) writes:

>
>It is my opinion that fork() assumes nothing about anything.

Your own example shows that fork() _DOES_ assume certain properties of the
underlying architecture.

> [Example of problems with an OS9 application due to lack of separate
>  fork()/exec() calls deleted]

Ask yourself why the OS9 designers left out the UNIX-style fork(). The
answer, of course, is that fork() is difficult/expensive to perform without
appropriate MMU hardware. The 6809/68000 machines either lack any MMU, or
suffer from the lack of a standard (at least until the very late introduction
of the 68851 PMMU for the 68020 series).

Why is the MMU important? After a fork() the child's data segment will
obviously occupy different physical memory addresses than the parent's.
However, the program text contains addresses fixed by the linker
for static/extern data, and the data contain the addresses of auto or
malloc'd variables. Without an MMU to translate these addresses to the
child's new data region, all memory refs from the child will still refer
to the parent data. This breaks the UNIX fork() semantics.
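
The failure mode can be reproduced in miniature in C. The sketch below assumes a toy "data segment" (`struct data_image`) holding one variable and a pointer to it; `image_init` and `naive_fork` are invented names, and the byte copy stands in for a fork() that relocates the child without any address translation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a process data segment: one variable plus a
 * pointer to it, as a linker or a malloc() caller would leave behind. */
struct data_image {
    int  counter;
    int *counter_ptr;   /* absolute address of `counter` */
};

/* Set up an image whose pointer refers to its own variable. */
void image_init(struct data_image *img, int value)
{
    assert(img != 0);
    img->counter = value;
    img->counter_ptr = &img->counter;
}

/* "Fork" without address translation: a plain byte copy. The pointer
 * inside the copy is duplicated verbatim, so it still names the
 * parent's variable. */
void naive_fork(struct data_image *child, const struct data_image *parent)
{
    memcpy(child, parent, sizeof *child);
}
```

After `naive_fork`, a store through the child's `counter_ptr` changes the parent's `counter` and leaves the child's own copy untouched, which is exactly the break in fork() semantics described above.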

Of course, we could postulate restrictions on the semantics of fork()
so that this is not a problem. For example,

	- the child cannot access any data that existed before the fork(),
	or if it does, it may incur the wrath of its parent.

	- the child _can_ access data that it allocates itself, using malloc
	or by calling a function, thus growing its stack.

	- the child cannot return from the function that called fork()

	- etc.

These restrictions don't exist in UNIX, so they should not exist in Minix.
Fortunately, the 8x86 series allow separate data segments, thanks to the
DS register. While the architecture may be ugly in other ways, it does
allow efficient fork() implementation.

henry@utzoo.uucp (Henry Spencer) (03/17/89)

In article <1845@basser.oz> steve@basser.oz (Stephen Russell) writes:
>... fork() is difficult/expensive to perform without
>appropriate MMU hardware...

The Atari ST has no MMU, and Minix runs fine on it.  fork() is not as
cheap as it would be with an MMU, but the cost is manageable.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

preston@felix.UUCP (Preston Bannister) (03/21/89)

From article <1989Mar16.184945.23152@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <1845@basser.oz> steve@basser.oz (Stephen Russell) writes:
>>... fork() is difficult/expensive to perform without
>>appropriate MMU hardware...
> 
> The Atari ST has no MMU, and Minix runs fine on it.  fork() is not as
> cheap as it would be with an MMU, but the cost is manageable.

Er, Henry, are you saying that fork() without an MMU is NOT hundreds
or thousands of times as expensive as with an MMU? :-) :-)

"Cheap" is relative.

If you are forking the shell to exec() a program in response to a
user-typed command, making a copy of the shell's address space as part of
the fork() is no big deal.  Forking GNU Emacs (or some other large
program) to exec() _is_ likely to be painful.  

The previous poster's example of a communications program that uses
fork() to generate a clone of itself is _enormously_ inefficient if
the child and parent have to be swapped around in memory on each
process switch.
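
To put the cost in concrete terms, here is a toy C model of swapping two process images on every switch. It is not ST Minix code; `swap_images` and `bytes_moved_per_switch` are hypothetical names, and the model assumes both images are the same size and that a plain byte-by-byte exchange is used.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange two equally sized memory images byte by byte. In this model
 * `running` is the region the program's absolute addresses refer to and
 * `parked` is where the other process's copy waits. */
void swap_images(unsigned char *running, unsigned char *parked, size_t size)
{
    size_t i;

    assert(running != 0 && parked != 0);
    for (i = 0; i < size; i++) {
        unsigned char tmp = running[i];
        running[i] = parked[i];
        parked[i] = tmp;
    }
}

/* Bytes written per process switch under this scheme: every byte of
 * both images is written once. Changing a segment register writes none. */
size_t bytes_moved_per_switch(size_t size)
{
    return 2 * size;
}
```

For a 64K image that is 128K of memory traffic per switch in this model, and the cost grows linearly with the size of the process, which is why the size of the forking program dominates the argument.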

BTW, my objection to fork() is not just esthetic.  In daily
development I use a symbolic debugger.  The process size of the
debugger with all symbols loaded can easily get up to a megabyte.
Since we don't have copy-on-write, running (fork() and exec()) a
program from within the debugger is at best slow.  When memory is
low it can be fatal (we have a version 7 variant, i.e. swapping, no
paging).
--
Preston L. Bannister
USENET	   :	hplabs!felix!preston
BIX	   :	plb
CompuServe :	71350,3505
GEnie      :	p.bannister

henry@utzoo.uucp (Henry Spencer) (03/23/89)

In article <88236@felix.UUCP> preston@felix.UUCP (Preston Bannister) writes:
>Er, Henry, are you saying that fork() without an MMU is NOT hundreds
>or thousands of times as expensive as with an MMU? :-) :-)

If the programs are small, yes, that's exactly what I'm saying.  Maybe
two or three times more expensive, but NOT hundreds or thousands.

>If you are forking the shell to exec() a program in response to a
>user-typed command, making a copy of the shell's address space as part of
>the fork() is no big deal.  Forking GNU Emacs (or some other large
>program) to exec() _is_ likely to be painful.  

Well, serves you right for running such elephantine software! :-)
The GNU software is quite explicitly built for tomorrow's machines,
or maybe the next millennium but one's, not today's.

>The previous poster's example of a communications program that uses
>fork() to generate a clone of itself is _enormously_ inefficient if
>the child and parent have to be swapped around in memory on each
>process switch.

Yes, that particular case -- where an exec() is not imminent, which
it *is* in almost all forks -- needs special handling on such systems.

Note that I didn't say it was wonderful in general; I said it was not
a big problem under Minix (which tends to avoid elephantine programs).
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu