[comp.arch] Why fork?

albaugh@dms.UUCP (Mike Albaugh) (01/17/90)

From article <610@ssp11.idca.tds.philips.nl>, by willy@idca.tds.PHILIPS.nl (Willy Konijnenberg):
[quoting a lot of people arguing about MMU's and the needs of *nix, then
getting to the heart of the matter:]

> I don't think you should try to think of relocating a program once
> it has been running for a while. You have no way of knowing what it is
> doing with pointers.

	Precisely why systems like the MAC (and minix on the PC) use
only segment-relative addressing. Well, actually, it gets a little more
complicated on the MAC, but believe me you don't want to know :-)

> When you run a unix-like system, there is one additional point where
> this scheme slows the system down, in addition to the relocation work
> during program load.
> As Craig noted, when the program fork()s, you have two programs that need
> to be located at the same virtual (== physical with no MMU) address space
> to run, so for every context switch, you must check whether the program is
> at the proper place and if not, swap things around (in memory, not necessarily
> to disk), which dramatically increases context switch overhead.

	Although not mentioned here, someone else on this thread (mis)stated
that the MMU was also needed for protection, which is not strictly true.
A scheme like that implemented on the early mid-range 360's can provide
protection without relocation, and can do it in _parallel_ with the fetch,
so there is no performance penalty. There will always be a performance
penalty for relocation, but it may be masked by other, still slower, parts
of the memory system. I just wanted to get that point out of the way early.

> Fortunately, this is normally not much of a problem, since usually a program
> does an exec() shortly after the fork() and this exec() can fix the problem.
> 
> This scheme is not very elegant, but it allows one to run a unix system
> on hardware like ST, Mac and Amiga.

	SO--- Why do we _still_ use fork() for all these near-trivial
cases? I have been mucking around with computers for over 20 years but am
not really familiar with *nix. I would like a reality check on this.
I'm also not asking on comp.unix... because I'm afraid that would be
like asking pointed questions about the trinity in a seminary :-) Since
comp.arch folk have to deal with _implementing_ this stuff, I thought I'd
get a more reasoned response ( 1/2 :-). Anyway, I can see a few reasons
to use fork:

1) It can be used for part of spawn _and_ for actual task-splitting
	(problem subdivision). Why have two calls when one will do?
2) On machines that are only (or mainly) swapping anyway, there is no
	penalty, so what the heck.
3) By just (effectively) copying the entire memory space, we don't
	need to keep track of just which parts actually _need_ to be
	passed to the new task (laziness as a virtue :-).
4) "We have always done it this way".

(my personal feelings are that 1 & 2 were the original reasons while
3 & 4 are the reason we are stuck with it now)

Against this we have the problems mentioned above with handling *nix
programs on machines without dynamic relocation. Also, even machines
that _can_ do relocation don't get fork for free:

1) Machines with base/bounds registers may need to copy the whole
	memory image to a new area. If they have two sets (e.g. KA10)
	they might get away with "only" copying the data segment.
2) Paging machines still need to at least mark all data pages
	"copy on write", which may involve traversing the segment
	and page tables in software. For a large image this can
	be time consuming. Also, I'd imagine it's a real judgement
	call whether to deal with a page at a time or just punt and
	do the copy as soon as "enough" of the image has changed to
	make write-trap handling a nuisance.

	And all this hassle so that three or four instructions later the
program can overlay itself (most of the time). I must be missing something
major here. Can someone tell me what?

				Mike

> 	Willy Konijnenberg		<willy@idca.tds.philips.nl>

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!pyramid!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035		voice: (408)434-1709
| The opinions expressed are my own (Boy, are they ever)

andrew@frip.WV.TEK.COM (Andrew Klossner) (01/17/90)

Yep, most uses of fork() are quickly followed by exec().  But the
exec() doesn't happen within a couple of instructions.  Typically this
sort of thing happens:

	fork()
	child does:
		close files that child won't need;
		open (or otherwise establish) standard input, output,
			and error files;
		exec(argc, argv, environment_p);
		print_error("exec failure"); exit();

Wrapping up all the interesting child manipulation of files into
options to one huge spawn() call would be cumbersome.
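
A minimal C sketch of the pattern above, assuming a POSIX system.  The
name run_redirected(), the out_path parameter and the 126/127 exit codes
(borrowed from shell convention) are illustrative assumptions, not part
of any real interface:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with standard output redirected to out_path (hypothetical
 * helper).  Returns the child's exit status, or -1 on failure. */
int run_redirected(char *const argv[], const char *out_path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                      /* cannot fork */

    if (pid == 0) {
        /* Child: reshape its own environment with ordinary system calls. */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) == -1)
                _exit(126);
            close(fd);                  /* the child no longer needs it */
        }
        execvp(argv[0], argv);
        _exit(127);                     /* reached only on exec failure */
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything between fork() and execvp() is ordinary code running in the
child, which is why it does not need to be encoded as spawn() options.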

Berkeley 4.1BSD Unix addressed the problem by inventing the vfork()
syscall, which creates a new child thread but runs it in the parent's
environment.  This eliminates the need to clone writable memory, or to
set up a new set of page table entries with "copy-on-write" set.  The
parent thread is suspended until the exec(), and any changes that the
child makes to memory before the exec() are seen by the parent.
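
A rough sketch of the intended usage, assuming a system that still
provides vfork().  The helper name spawn_with_vfork() is made up for
illustration; the child deliberately does nothing except exec or _exit,
since it is running in the parent's memory:

```c
#define _DEFAULT_SOURCE                 /* assumption: asks glibc to declare vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start the program at path with vfork() and wait for it (hypothetical
 * helper).  Returns the child's exit status, or -1 on failure. */
int spawn_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: borrows the parent's address space, so touch nothing. */
        execv(path, argv);
        _exit(127);                     /* exec failed; never return from here */
    }

    /* Parent: resumes only after the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that it does not flush or
tear down stdio state that still belongs to the parent.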

vfork() is marked for extinction in a future BSD release.  We
extinguished it on our BSD-derived 68k workstation (for reasons that
were sound at the time).  Yes, you have to build a duplicate set of
page tables in order to implement copy-on-write, but for processes that
aren't huge, this isn't a large part of fork() handling.  The kernel
has to do a lot of other work besides just building the PTEs.  For huge
processes, though, we've observed that PTE creation time dominates
fork() time.

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]

pete@sun1102.UUCP (Peter R. Carpenter) (01/17/90)

In article <952@dms.UUCP> albaugh@dms.UUCP (Mike Albaugh) writes:
>
>	SO--- Why do we _still_ use fork() for all these near-trivial
>cases.  [stuff deleted]

>1) It can be used for part of spawn _and_ for actual task-splitting
>	(problem subdivision). Why have two calls when one will do?
>

Actually, the Berkeley folks have a system call vfork(), which does the
fork/exec in one operation. But of course, it is not compatible with Sys V.

---
Pete Carpenter, Cirrus Logic Inc, 1463 Centre Pointe Dr, Milpitas, CA 95035
{amdahl,ames,apple,bunker,pyramid}!oliveb!tymix!cirrusl!pete   408-945-8300
---------------------------------------------------------------------------

chris@mimsy.umd.edu (Chris Torek) (01/17/90)

In article <5875@orca.wv.tek.com> andrew@frip.WV.TEK.COM
(Andrew Klossner) writes:
>... you have to build a duplicate set of page tables in order to
>implement copy-on-write, but for processes that aren't huge, this isn't
>a large part of fork() handling.  The kernel has to do a lot of other
>work besides just building the PTEs.  For huge processes, though, we've
>observed that PTE creation time dominates fork() time.

This is useful information.  It does, however, make some assumptions:

1. PTEs.  Not all machines with virtual memory require PTEs (e.g.,
   all MIPS-based machines: page translation is done in software).

2. PTEs (if present) must be copied.  One valid approach, possible
   on some (particularly two level page table) machines, is to mark
   the primary pages invalid or unwritable.  Since most uses of fork
   are of the form:

	switch (pid = fork()) {
	case 0:
		diddle this, that, and the other thing;
		execv(name, argv);
		error(notfound);
		_exit(1);
		/* NOTREACHED */

	case -1:
		error(cannotfork);
		break;

	default:
		record(pid);
		break;
	}

   the kernel could:
	a. suspend execution of the parent
	b. hand PTEs to the child, mark them `must not be written'
	   or `invalid'
	c. start a timer to allow resumption of the parent
	d. let the child run.  When it gets page faults, make copies
	   of exactly those PTEs and pages needed to let it continue.
	   When it exits or execs, cancel the timer, and move the PTEs
	   back, replacing those that were copied with the originals
	e. if the timer expires, suspend the child, copy the PTEs back
	   to the parent (replacing modified PTEs and pages with
	   originals), and then resume both processes.

   This is a fair amount of work (particularly in data structures for
   tracking original and copied PTEs) and may well not be worth the
   effort---but it might turn out to help tremendously.  The kernel
   could choose whether to use this trick based on process size.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

tony@cairo.UUCP (Tony Anzelmo) (01/17/90)

In article <1239@cirrusl.UUCP> pete@cirrusl (Pete Carpenter) writes:
>In article <952@dms.UUCP> albaugh@dms.UUCP (Mike Albaugh) writes:
>>
>>	SO--- Why do we _still_ use fork() for all these near-trivial
>>cases.  [stuff deleted]
>
>>1) It can be used for part of spawn _and_ for actual task-splitting
>>	(problem subdivision). Why have two calls when one will do?
>>
>
>Actually, the Berkeley folks have a system call vfork(), which does the
>fork/exec in one operation. But of course, it is not compatible with Sys V.
              ^^^

Vfork does not do fork/exec in "one" operation. It simply avoids copying
the parent's address space to the child on the assumption that the child
will invoke exec (and therefore replace that address space). The parent
loans its address space to the child until an execve (or exit) is performed
by the child. During this time, the parent is suspended while the child
uses its address space.

There are also some strange anomalies with vfork (as compared to fork). One
of the more interesting ones is the ability of the child to modify the
parent's address space such that the parent sees those changes upon resumption.
Since the original intent of vfork was to provide an "efficient" fork, some
vendors have eliminated it by using "copy-on-write" in their fork
implementations. This technique reduces the address space copying to only
portions that are written and amortizes those costs across the child (and
parent) execution. However, the subtle semantic of vfork with regard to
the anomaly mentioned above is lost.
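
The fork() side of that difference can be checked with a small sketch,
assuming a POSIX system.  The function name is made up for illustration.
It uses fork() only: deliberately writing to the parent's memory from a
vfork() child is not something to rely on, and later standards leave the
result undefined:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if a write made by a fork() child is NOT seen by the parent,
 * 1 if it is seen, and -1 on error (hypothetical helper). */
int child_write_visible_to_parent(void)
{
    volatile int value = 1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        value = 2;                      /* lands in the child's own copy */
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return value != 1;                  /* parent's copy should still be 1 */
}
```

With a copy-on-write fork() the write triggers a private copy of that one
page for the child, so the parent's value is untouched.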

Tony Anzelmo

dwc@cbnewsh.ATT.COM (Malaclypse the Elder) (01/18/90)

In article <952@dms.UUCP>, albaugh@dms.UUCP (Mike Albaugh) writes:
> 
> > Fortunately, this is normally not much of a problem, since usually a program
> > does an exec() shortly after the fork() and this exec() can fix the problem.
> > 
> > This scheme is not very elegant, but it allows one to run a unix system
> > on hardware like ST, Mac and Amiga.
> 
> 	SO--- Why do we _still_ use fork() for all these near-trivial
> cases. I have been mucking around with computers for over 20 years but am
> not really familiar with *nix. I would like a reality check on this.
> I'm also not asking on comp.unix... because I'm afraid that would be
> like asking pointed questions about the trinity in a seminary :-) Since
> comp.arch folk have to deal with _implementing_ this stuff, I thought I'd
> get a more reasoned response ( 1/2 :-). Anyway, I can see a few reasons
> to use fork:
> 
> 1) It can be used for part of spawn _and_ for actual task-splitting
> 	(problem subdivision). Why have two calls when one will do?
> 2) On machines that are only (or mainly) swapping anyway, there is no
> 	penalty, so what the heck.
> 3) By just (effectively) copying the entire memory space, we don't
> 	need to keep track of just which parts actually _need_ to be
> 	passed to the new task (laziness as a virtue :-).
> 4) "We have always done it this way".
> 
> (my personal feelings are that 1 & 2 were the original reasons while
> 3 & 4 are the reason we are stuck with it now)
> 
actually, we indirectly address this type of question in a paper we
are presenting at the winter usenix in washington next week.  the paper
is titled "insuring improved vm performance: some no-fault policies"
and discusses some things we did to improve vm performance in unix system v
release 4.  in it, we argue that the idea that all forks are followed
IMMEDIATELY by execs is misunderstood.  it all depends on what you mean
by IMMEDIATELY.

to fully understand the scale of IMMEDIATELY, you have to think about
how forks and execs are used.  on MOST systems, the majority of forks
are executed by programs that are classified as "shells".  these shells
are command interpreters whose function is to provide a command level
interface to users and often also provide a programming language for
executing "shell scripts".  all major shells (bourne shell, ksh, csh)
provide for such things as i/o redirection, argument expansion, etc.
in other words, the shell must allow for the child to have an environment
that may be substantially different than the parent.  and some shells
provide such things as history files, command recall, etc.  the bottom
line is that for these shells, more than just "a couple of instructions"
are executed between fork and exec.

thus, it really was a stroke of genius to have a fork call that duplicates
the parent and allow the child to craft its environment the way that it
wants to since you have to provide primitives to allow a process to change
its environment anyway.  the alternative is to have a single call with
a HUGE argument list to specify all the options.  and maintaining
compatibility with new releases would be a major hassle under this scheme.
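
for comparison, one later POSIX interface, posix_spawn(), takes roughly
the "single call" shape: each change to the child's environment has to
be described ahead of time as a "file action" or attribute rather than
simply performed by the child.  a minimal sketch, assuming a system that
provides <spawn.h>; the helper name spawn_redirected() and its out_path
parameter are made up for illustration:

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Start path with stdout redirected to out_path, using one spawn call
 * (hypothetical helper).  Returns the exit status, or -1 on failure. */
int spawn_redirected(const char *path, char *const argv[],
                     const char *out_path)
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* Describe the redirection instead of performing it in the child. */
    int err = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO,
                  out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (err == 0)
        err = posix_spawn(&pid, path, &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

anything the interface has no action or attribute for cannot be
expressed at all, which is the compatibility concern raised above.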

> Against this we have the problems mentioned above with handling *nix
> programs on machines without dynamic relocation. Also, even machines
> that _can_ do relocation don't get fork for free:
> 
> 1) Machines with base/bounds registers may need to copy the whole
> 	memory image to a new area. If they have two sets (e.g. KA10)
> 	they might get away with "only" copying the data segment.
> 2) Paging machines still need to at least mark all data pages
> 	"copy on write", which may involve traversing the segment
> 	and page tables in software. For a large image this can
> 	be time consuming. Also, I'd imagine it's a real judgement
> 	call whether to deal with a page at a time or just punt and
> 	do the copy as soon as "enough" of the image has changed to
> 	make write-trap handling a nuisance.
> 
> 	And all this hassle so that three or four instructions later the
> program can overlay itself (most of the time). I must be missing something
> major here. Can someone tell me what?
> 
on one hand it's a matter of your point of view.  do you design the
operating system to fit the machine or do you design it to provide
a nice interface to programmers/users (and put minimal requirements
on the machine to support your operating system design)?  we can
argue religiously about it but it all comes down to costs and market
niches.  for some, increasing the system cost for an mmu more than
offsets the cost of having programmers write code to "do it with mirrors".
for others (depending on market niche), it may not pay.

to summarize, i believe that people argue about fork/exec without
really thinking about who uses it and how it is used.  if you accept
my assertion that shells are the almost exclusive users of the fork/exec
interface, you will realize that more than just 3 or 4 instructions are
executed between forks and execs.  and since shells tend to be reasonably
sized processes, the concern about doing forks of large processes is currently
unjustified.  but nothing stays constant and these large new graphical user
interfaces may change things.

as a final plug, read the paper in the conference proceedings.  i even
present a nice new way of avoiding copy on write faults after forks which
is pretty clever (a biased opinion).

danny chen
att!hocus!dwc

henry@utzoo.uucp (Henry Spencer) (01/18/90)

In article <1239@cirrusl.UUCP> pete@cirrusl (Pete Carpenter) writes:
>Actually, the Berkeley folks have a system call vfork(), which does the
>fork/exec in one operation....

Nope, sorry, wrong, vfork is just a variant of fork that runs faster at
the cost of truly horrible semantics.  It was basically a kludge to get
around some hardware/firmware problems that made it impossible to do a
proper copy-on-write optimized fork on the early VAXen.  It's already
greatly outlived its real usefulness.
-- 
1972: Saturn V #15 flight-ready|     Henry Spencer at U of Toronto Zoology
1990: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

peter@ficc.uu.net (Peter da Silva) (01/19/90)

> Actually, the Berkeley folks have a system call vfork(), which does the
> fork/exec in one operation. But of course, it is not compatible with Sys V.

I thought it sort of deferred the actual fork until the exec occurred, just
duplicating the file table and other "cheap" resources.

And of course once you have a VM system fork() is just fine and vfork() is
a meaningless optimisation. Just mark all the data pages copy-on-write. So
why didn't they put vfork() in 2BSD?
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

andrew@frip.WV.TEK.COM (Andrew Klossner) (01/19/90)

> From: dwc@cbnewsh.ATT.COM (Malaclypse the Elder)
> Organization: The Legion of Dynamic Discord
> ... we indirectly address this type of question in a paper we
> are presenting at the winter usenix ...
> danny chen
> att!hocus!dwc

Folks, when you refer to something interesting from yourself and your
organization, would you please undo the schmaltz in your "name" and
"organization" fields?  It would be nice in this case to have a better
attribution than just "ATT."  (I don't believe the bit about the
Legion ... :-) )

> since shells tend to be reasonably
> sized processes, the concern about doing forks of large processes is
> currently unjustified.

Think of shell escapes in emacs.  Or vi.  In my favorite BASIC
programming environment (no cracks please) it takes 30 seconds to do
"!ls".

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]

ccplumb@lion.waterloo.edu (Colin Plumb) (01/19/90)

The other way to avoid fork() is to recognise that the only thing that
survives the eventual exec() is kernel-maintained state, so a call
(which takes the place of fork()) to ask the kernel to create a new
process state, and an additional argument to all state-changing system
calls to specify which state (since now more than one can be associated
with a single thread of execution) to use will achieve the same
effect.  exec() is replaced by something which gives one of the states
(it doesn't really matter which) a different load image and starts it
executing.  This is kind of like vfork() except it doesn't even try to
fake two threads.

The essential idea stolen from Unix is that the system calls used to
manipulate the child's environment are the same ones used to manipulate the
parent's.

P.S. It actually may be useful to have two states maintained for a long time.
It gives you two ownerships, so you can avoid the effective uid kludge to a
large degree, and more signal bits if you use them for IPC, etc.  The
only problem is, which process gets signalled for bus error, etc.?
-- 
	-Colin

pete@sun1102.UUCP (Peter R. Carpenter) (01/19/90)

In article <1990Jan17.204433.18006@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Nope, sorry, wrong, 


The tone of your posting sucks.

Read the preceding articles, the original poster asked a simple question:
Why can't fork/exec be done together?

I simply replied that there was such a beast, and pointed out that it is 
not portable. 

---
Pete Carpenter, Cirrus Logic Inc, 1463 Centre Pointe Dr, Milpitas, CA 95035
{amdahl,ames,apple,bunker,pyramid}!oliveb!tymix!cirrusl!pete   408-945-8300
---------------------------------------------------------------------------

jdarcy@pinocchio.encore.com (Jeff d'Arcy) (01/19/90)

peter@ficc.uu.net (Peter da Silva):
. And of course once you have a VM system fork() is just fine and vfork() is
. a meaningless optimisation. Just mark all the data pages copy-on-write. So
. why didn't they put vfork() in 2BSD?

Not all VM systems have copy-on-write.

Jeff d'Arcy     OS/Network Software Engineer     jdarcy@encore.com
  Encore has provided the medium, but the message remains my own

sms@WLV.IMSD.CONTEL.COM (Steven M. Schultz) (01/19/90)

In article <IC51D77xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>> Actually, the Berkeley folks have a system call vfork(), which does the
>> fork/exec in one operation. But of course, it is not compatible with Sys V.
>
>And of course once you have a VM system fork() is just fine and vfork() is
>a meaningless optimisation... So why didn't they put vfork() in 2BSD?

	They did.  Put vfork() in 2BSD that is.  Works very nicely too. 'vmstat'
	even reports how many forks() and vforks() were done along with the
	number (and averages) and amount of clicks of memory copied for each 
	type of fork.

	Steven M. Schultz
	sms@wlv.imsd.contel.com