chris@mimsy.umd.edu (Chris Torek) (08/06/90)
Since there seems to be a great deal of confusion and ignorance
surrounding the `vfork issue', it might help if I describe how these
things work.
The basic idea behind `fork' is quite simple: make an exact duplicate
of the running process---copy all the user information and all the
kernel information, except of course the process ID and special
internal-only implementation data---and then return from the fork in
the old process giving the new process's ID, and return from the fork
in the new process giving a 0 (so that it can tell it is the clone
rather than the original).
This operation tends to be expensive since it involves copying data,
potentially a great deal of data. There are a number of solutions:
a. Avoid fork entirely (the spawn issue): `If the copy is merely
being made so that the clone can throw away its user data and run
another program, why bother copying?' This is a sensible idea, but
every time someone tries to do it, the spawn call winds up being
terribly complex. Perhaps this is because the wrong people do it;
but perhaps it is because the spawn approach is inherently flawed.
I am not going to argue one way or the other here.
b. Try not to copy. Anything that can never be changed (e.g., the
program's code or `text' section) obviously need not be physically
copied. (Unix fork() calls have always shared text.) In addition,
if the hardware supports it (e.g., via copy-on-write or copy-on-
reference), the data and stack segments also need not be copied.
This is not always the best policy; sometimes the expense of
copying on write or reference outweighs the savings from never
copying some pages. There are some tricks that can be applied
here as well (e.g., always copy the current stack page, leave the
rest copy-on-write).
c. Cheat. This is the `vfork' approach.
Even if fork can be done via copy-on-write, on some machines it remains
expensive for large processes, because the page tables must be copied
anyway and these can also be large. (Some of the pages of PTEs can be
shared, but this rapidly gets tricky.) And of course, there are
machines on which copy-on-write cannot be done, either because the
hardware never supported it (e.g., where writes cannot be restarted) or
because of a bug in the microcode (e.g., the infamous VAX-11/750 bug);
here copy-on-reference can be used, but it tends to be more expensive
(most processes read many more locations than they write). Thus, the
`virtual fork' trick.
Suppose that, instead of making a copy of the data and stack segments
*and* the kernel per-process information, the kernel were to make a
copy of just the kernel per-process information, then suspend the
parent process entirely, `handing over' the entire virtual memory image
to the clone (by giving the original's PTE pages as well as its data
and stack segments to the clone)? The clone would then be running in
the very same virtual (and physical) memory that the original had
before it `virtually forked'. Nothing has been copied except the
kernel data. The main issue here is that, if the original process
were allowed to run, both the parent and the child would be competing
for the same data and stack memory---much like threads, but disastrous
since the stacks would rapidly collide---so the parent must be marked
as having no memory at all, and cannot run while the child uses the
space. The child, too, must be marked specially, so that when it is
ready to give up its virtual space (via exec() or exit()), the kernel
knows instead to `hand back' the entire virtual memory to the parent.
This is how vfork works. Only the kernel data (the u. area) are copied.
The old virtual memory is not so much `shared' as `handed over' to the
child: it uses the exact same resources the parent process had. When
the child process is done with them, they are `handed back' (using the
same kernel routine, in fact; it is called `vpassvm') and the parent is
allowed to continue. There are several problems with this scheme:
- Since the child process (the clone) has already returned from the
vfork() system call, the stack frame needed for this return has been
wiped out. This is handled by avoiding the return in the first
place: The vfork call is written in assembly, and pops the stack
frame before doing the system call at all. On return from the
system call the assembly code jumps to the place a return would have
gone, having pre-set the registers as appropriate.
- Since the old VM is handed over to the child, any changes the child
makes show up in the parent as well. Occasionally some software
winds up depending on this (e.g., csh).
- Since the parent is suspended, the order of execution (child runs to
exec() or exit()) is defined. Occasionally some software winds up
depending on this as well (again, csh is the prime example).
- Although the kernel data are copied (so that changes made in the
child to, e.g., the signal dispositions are not reflected in the
parent), some of these describe the virtual memory resources. When
the child hands the virtual memory back to the parent, the parent
process must account for any changes the child has made to the
virtual memory itself---for instance, if the child has made the heap
or stack larger, the parent must update its size information. This
is a good place for bugs to hide.
Buggy user programs aside, it is always possible for a kernel to
implement vfork as a true fork, if this is somehow cheaper. Thus there
can never be an efficiency argument in favor of fork; vfork never costs
more. Moreover, even given a copy-on-write fork, vfork may still be
cheaper, because vfork does not have to copy PTEs. On a machine like
the Butterfly, a vfork helps because the parent process is suspended:
the child can run in the parent's space until it exits or, more likely,
gets another processor to do an exec. In essence, vfork is `fork with
a promise to finish soon' or even `spawn, but let me do a few things
before I go'.
When you come right down to it, the real disadvantage to vfork is that,
conceptually, it is downright ugly. Viewing it as `spawn, but let me
run a few instructions first' helps a lot here. The meaning of
pid = vfork();
if (pid == 0) {
(void) dup2(fd, 0);
(void) close(fd);
(void) execve(path, av, env);
_exit(1);
}
...
is not `make a copy; if this is the child, fiddle with descriptors
and run something else' but rather `run something else, but first
fiddle with descriptors using the following code'.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
(New campus phone system, active sometime soon: +1 301 405 2750)
tbray@watsol.waterloo.edu (Tim Bray) (08/06/90)
chris@mimsy.umd.edu (Chris Torek) writes an informative and very helpful backgrounder on the subject of fork() and vfork(). But there is a piece of information missing. How much does fork() cost? In an environment with a lot of fork()ing going on, (e.g. a heavily-timeshared system with a lot of shell scripts running?), how much of the system resources are burned up forking? Quantitatively, how much does vfork() save? Is the picture significantly different on a workstation which spends much of its time running a window system and a few technical applications? Somebody must have measured this at some time. Or did somebody in the bsd gang just say "That looks inefficient. Let's fix it" and create vfork()? I.e., can anyone prove whether the ugliness of vfork() really buys anything? Cheers, Tim Bray, Open Text Systems
jkenton@pinocchio.encore.com (Jeff Kenton) (08/06/90)
From article <25904@mimsy.umd.edu>, by chris@mimsy.umd.edu (Chris Torek): > Since there seems to be a great deal of confusion and ignorance > surrounding the `vfork issue', it might help if I describe how these > things work. > > [Good survey of the issues] > > On a machine like > the Butterfly, a vfork helps because the parent process is suspended: > the child can run in the parent's space until it exits or, more likely, > gets another processor to do an exec. In essence, vfork is `fork with > a promise to finish soon' or even `spawn, but let me do a few things > before I go'. The one issue you haven't dealt with is the case where a program wants to spawn many copies of itself but never execs after the fork. In this case you need a fork() which is biased toward moving to another processor and has no way to guarantee the "child runs first" semantics (which the man page warns against relying upon). On the Butterfly, with >100 CPUs and non-uniform memory, this is a reasonable scenario. Traditional vfork() buys very little, and it's ugly. Beauty is only skin deep, but ugly goes right to the bone. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - jeff kenton --- temporarily at jkenton@pinocchio.encore.com --- always at (617) 894-4508 --- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chris@mimsy.umd.edu (Chris Torek) (08/06/90)
In article <1990Aug6.010044.8638@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) asks the very reasonable question: >How much does fork() cost? ... Quantitatively, how much does vfork() >save? Is the picture significantly different on a workstation which spends >much of its time running a window system and a few technical applications? > >Somebody must have measured this at some time. Or did somebody in the bsd >gang just say "That looks inefficient. Let's fix it" and create vfork()? I have only two data points on this myself (the vfork thing was done before my time---I only got into this Unix business in late 1981...). I know that the Franz Lisp group at Berkeley drove some of the changes to the VM system between various 4BSD releases (including, I suspect, vadvise); and I know that the VM group at Sun who did the SunOS 4.x virtual memory took vfork out (made it a simple copy-on-write fork) and then put it back. Whether the latter was only temporary, I cannot say, but surely someone from Sun should remember: Was this for performance or just to accomodate broken programs like csh? Anyway, old Berkeley papers should describe the vfork savings (which for 6+ MB Lisp jobs on a VAx-11/780 without a copy-on-write fork will no doubt be quite impressive); but keep in mind that these will compare `copy everything' against `copy nothing' without mentioning a middle ground (`copy PTEs and written pages'). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris (New campus phone system, active sometime soon: +1 301 405 2750)
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (08/06/90)
In article <12382@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes: >Beauty is only skin deep, but ugly goes right to the bone. True. The problem here is that the semantics of fork-exec is really ugly, because it is virtual memory ineffecient. First, by simple implementation, it requires unnecessary copying of virtual memory. By COW workaround, copying is not necessary. But the fundamental ineffeciency on virtual memory is still there. So, a large process can't fork. By not-pre-allocating-swap-space workaround, a large process can now fork. But the fundamental ineffeciency on virtual memory is only hidden from the sight and still exists. So, now, there is possibility of deadlock. So, the ugliness of virtual memory ineffeciency is really goes right to the bone. The solution is to get rid of the ugly virtual memory ineffeciency. Thus vfork - spawn new process in a virtual memory efficient way To appreciate beauty of vfork, you should first think of general purpose segment sharing mechanism. Then, by using the mechanism, you can construct a virtual memory effecient way to spawn a process. BTW, for the complete beauty, vfork should have been designed so that changes to registers in a child context are also reflected to the parent process. That is important for a highly optimized compiler. Masataka Ohta
peter@ficc.ferranti.com (Peter da Silva) (08/06/90)
I agree that vfork is a better mechanism. It also has one further advantage: - It can be implemented on machines with no MMU. This is what makes this such a frustrating subject. I'm not attacking fork() and vfork(). I think they're the best pair of ideas since sliced bread. But I have to deal with real-time operating systems where fork() can't be implemented at all, and vfork() can't be implemented cheaply outside the kernel. I also believe that the changes needed to make spawn() usable without excessive complexity are valuable in their own right. Come now, wouldn't you rather save your current directory so you can chdir() back instead of having to burn cycles calling getcwd()? How about having the nicer semantics of sigset() universally available via signal()? -- Peter da Silva. `-_-' +1 713 274 5180. 'U` <peter@ficc.ferranti.com>
peter@ficc.ferranti.com (Peter da Silva) (08/06/90)
In article <12382@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes: > The one issue you haven't dealt with is the case where a program wants > to spawn many copies of itself but never execs after the fork. In this case vfork() isn't even usable. Doesn't matter if you're on a Butterfly or not. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` <peter@ficc.ferranti.com>
jack@cwi.nl (Jack Jansen) (08/08/90)
A reasonably clean solution to the vfork() hack would be to split fork in two: one call that creates a new kernel environment (u area, etc), and another one that starts a new process in that environment. You could then program like pre_fork(); /* ** From now on, all system calls we do refer to the kernel ** state of the *new* process. */ close files; open other files; change signal; etc; pid = spawn("a.out", ....); /* ** Here we're back in our own kernel environment again. */ The only problem I see is that there are some calls for which it is unclear what the semantics should be between the pre_fork() and the spawn() call: sbrk() comes to mind. Also, if you crash between the pre_fork() and the spawn it is unclear which of the two u areas you would want to store in the coredump. -- -- Een volk dat voor tirannen zwicht | Oral: Jack Jansen zal meer dan lijf en goed verliezen | Internet: jack@cwi.nl dan dooft het licht | Uucp: hp4nl!cwi.nl!jack
mcdonald@aries.scs.uiuc.edu (Doug McDonald) (08/08/90)
In article <1940@charon.cwi.nl> jack@cwi.nl (Jack Jansen) writes: >A reasonably clean solution to the vfork() PLEASE folks, this is not a computer architecture issue. It is a question that has to do with the software of one very particular operating system. Let's move it appropriate os-specific group? OK. Most OS's don;t seem to have any such weird difficulties. Doug MCDonald
peter@ficc.ferranti.com (Peter da Silva) (08/08/90)
In article <1940@charon.cwi.nl> jack@cwi.nl (Jack Jansen) writes: > A reasonably clean solution to the vfork() hack would be to split > fork in two: one call that creates a new kernel environment (u area, > etc), and another one that starts a new process in that environment. You just re-invented vfork! The only difference is that the pid is returned in a different place. > pre_fork(); This is equivalent to "if((pid=vfork()) == 0) {". > pid = spawn("a.out", ....); And this can be replaced by "exec(...); exit(ERROR_CODE); }" -- Peter da Silva. `-_-' +1 713 274 5180. 'U` <peter@ficc.ferranti.com>
peter@ficc.ferranti.com (Peter da Silva) (08/09/90)
In article <1990Aug8.133546.27806@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes: > PLEASE folks, this is not a computer architecture issue. Depends on how you define "architecture". Software versus hardware versus industrial design of cabinets... -- Peter da Silva. `-_-' +1 713 274 5180. 'U` <peter@ficc.ferranti.com>