[comp.arch] vfork

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/11/90)

In article <920@dgis.dtic.dla.mil>
	jkrueger@dgis.dtic.dla.mil (Jon) writes:

>>With a copy-on-write scheme, a page needs swap space
>>when the page is written to.

>But not until.  Page and swap file space allocation is as postponeable
>as the memory copy.

NO.

>>If you think virtual memory is free and allow forking without reserving
>>actual swap space, when  swap space is required, it is often the case
>>that, there is no free swap space, anymore.

>That way lies MVS, friends;

OK, if you love MVS, use it.

As for UNIX, there are three alternatives to treat forking:

1) An utterly broken implementation where some important system
process (such as inetd, ypbind or sendmail) may be killed if there
is not enough swap space.

2) A crippled implementation where a large process cannot fork-exec.

3) A healthy, efficient and easy implementation with vfork.

You, as a lover of MVS, seem to like 2), but I, as a UNIX user,
like 3).
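For readers who have not seen it, the vfork approach defended here is the classic BSD create-then-exec idiom. A minimal sketch (the wrapper name and the use of /bin/true are ours, not from the thread): the child borrows the parent's address space, so nothing is copied and no swap need be reserved, and the parent sleeps until the child execs or exits.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program via the vfork()/exec pattern.  Returns the child's
 * exit status, or -1 on error. */
int run_with_vfork(const char *path)
{
    pid_t pid = vfork();        /* child shares the parent's memory */
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: touch nothing, just exec.  On failure use _exit(),
         * never exit(), since the stdio buffers are the parent's. */
        execl(path, path, (char *)NULL);
        _exit(127);
    }
    /* Parent resumes only after the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only rule the caller must obey is the one the thread keeps circling: between vfork() and the exec, the child must not modify data the parent will later rely on.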

						Masataka Ohta

peter@ficc.ferranti.com (Peter da Silva) (07/11/90)

In article <5830@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> 1) An utterly broken implementation where some important system
> process (such as inetd, ypbind or sendmail) may be killed if there
> is not enough swap space.

Alternatively, put the program in a wait state until swap space is available.
Deadlocks are possible, but unlikely. Indefinite deferment is more likely,
and that can be handled by queueing input.

> 3) A healthy, efficient and easy implementation with vfork.

4) A really efficient implementation with spawn.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

quiroz@cs.rochester.edu (Cesar Quiroz) (07/12/90)

In <5830@titcce.cc.titech.ac.jp>,
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) wrote:
| In article <920@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
| >>With a copy-on-write scheme, a page needs swap space
| >>when the page is written to.
| 
| >But not until.  Page and swap file space allocation is as postponeable
| >as the memory copy.
| 
| NO.

Care to enlighten us, or are the uppercase letters supposed to be
enough?  And where did you get the idea that anybody around comp.arch
likes MVS?  Not from Jon's article, I hope.

While I hold your attention, what is the point of this discussion?
Are you saying that the semantics of vfork are so useful that they
should be kept in spite of their utility being a thing of the past?
If so, can you say it without getting excited?


-- 
                                      Cesar Augusto Quiroz Gonzalez
                                      Department of Computer Science
                                      University of Rochester
                                      Rochester,  NY 14627

baxter@zola.ics.uci.edu (Ira Baxter) (07/12/90)

>In <5830@titcce.cc.titech.ac.jp>,
>mohta@necom830.cc.titech.ac.jp (Masataka Ohta) wrote:
>| In article <920@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>| >>With a copy-on-write scheme, a page needs swap space
>| >>when the page is written to.
>|
>| >But not until.  Page and swap file space allocation is as postponeable
>| >as the memory copy.
>|
>| NO.

I don't understand the problem here.  It appears that objections to
the MVS scheme stem from the notion that one must both allocate AND
ASSIGN the swap space when the potential space requirements appear
(i.e., COW + new virtual space --> allocate and assign swap space for
entire new virtual space because it might be entirely COW'd over
time), at potentially high costs to actually effect the assignment
(find the available swap space, build tables, etc.).  The
UNIX-wait-till-I-need-it scheme suffers from the potential of deadlock
over swap requirements when they finally appear.  But one can walk a
middle road: allocate the space, but only assign it incrementally when
COW actually happens.  Space allocation given a fixed supply is a
simple matter of adjusting a semaphore count.  If one wanted to mix
the policies, all you need is a hint to vfork (or whatever) that says
"allocate now" or "delay allocation".  Then programmers can pick their
poison depending on application requirements; system processes can be
run atomically with respect to their space requirements.
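Baxter's middle road fits in a few lines of C (all names here are hypothetical, a sketch of the idea rather than any real kernel's code): fork() reserves the worst case against a single counter, the COW fault handler later binds real slots out of that reservation, and exec or exit gives it back.

```c
#include <assert.h>
#include <stdbool.h>

long swap_free_pages;      /* pages not yet reserved by anyone       */
long swap_unassigned;      /* reserved at fork, not yet given slots  */

void swap_init(long total_pages)
{
    swap_free_pages = total_pages;
    swap_unassigned = 0;
}

/* At fork(): reserve the whole child image up front, or fail the
 * fork cleanly instead of deadlocking later. */
bool swap_reserve(long pages)
{
    if (swap_free_pages < pages)
        return false;               /* fork() returns an error here */
    swap_free_pages -= pages;
    swap_unassigned += pages;
    return true;
}

/* At a COW fault: bind one reserved page to a concrete slot.
 * Cannot fail, because the space was set aside at fork() time. */
void swap_assign_page(void)
{
    assert(swap_unassigned > 0);
    swap_unassigned--;
    /* ...find a free slot, update the swap map, etc. */
}

/* At exec() or exit(): return the child's reservation (toy model:
 * assigned and unassigned pages are returned together). */
void swap_release(long pages)
{
    swap_free_pages += pages;
}
```

Note that with 100MB of swap and an existing 80MB reservation, a second swap_reserve() of 80MB fails up front — which is precisely the "large process cannot fork" objection raised against this scheme elsewhere in the thread.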



--
Ira Baxter

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/12/90)

In article <269B8E4F.27941@ics.uci.edu>
	baxter@zola.ics.uci.edu (Ira Baxter) writes:

>The
>UNIX-wait-till-I-need-it scheme suffers from the potential of deadlock
>over swap requirements when they finally appear.

OK, you understand the problem here.

>But one can walk a
>middle road: allocate the space, but only assign it incrementally when
>COW actually happens.  Space allocation given a fixed supply is a
>simple matter of adjusting a semaphore count.

As I already mentioned, the problem with such an approach is that
a large process cannot fork.

If there is only 100MB of swap space, and an 80MB process wants to fork
just to exec a small program such as a shell, 160MB of swap space must
be temporarily allocated. It is impossible.

>If one wanted to mix
>the policies, all you need is a hint to vfork (or whatever) that says
                                         ^^^^^
>"allocate now" or "delay allocation".

A hint to vfork???

You must have misunderstood something. If we use vfork, there is
no problem.

				Masataka the-protector-of-vfork Ohta

akira@atson.asahi-np.co.jp (Akira Takiguchi) (07/12/90)

In article <269B8E4F.27941@ics.uci.edu> baxter@zola.ics.uci.edu (Ira Baxter) writes:
>But one can walk a
>middle road: allocate the space, but only assign it incrementally when
>COW actually happens.  Space allocation given a fixed supply is a
>simple matter of adjusting a semaphore count.  If one wanted to mix
>the policies, all you need is a hint to vfork (or whatever) that says
                                         ^^^^^
                                         never vfork, but fork
>"allocate now" or "delay allocation".

       This scheme works fine for the `big process cannot fork' problem,
  but you cannot get rid of the long-established vfork() for that reason
  alone, since introducing a new argument to fork() cannot be considered
  a good idea.

       I'm not saying your scheme is useless;  there is another motivation
  to introduce a delayed swap-space-allocation option.  If you want a huge
  array to be used very sparsely but you cannot afford sufficient swap space,
  this idea can be useful [I can't think of such an application now, but
  there could be one].
       This time you don't need a new argument for fork() - a new call,
  something like swapadvise(), will do.  If you know you are going to
  allocate huge space and are willing to risk the process being killed by
  the OS, you can make this call at program startup.
-- 
                                       Akira Takiguchi @ ATSON, Inc.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/12/90)

In article <5DL4SPD@xds13.ferranti.com>
	peter@ficc.ferranti.com (Peter da Silva) writes:

>> 1) An utterly broken implementation where some important system
>> process (such as inetd, ypbind or sendmail) may be killed if there
>> is not enough swap space.

>Alternatively, put the program in a wait state until swap space is available.
>Deadlocks are possible, but unlikely. Indefinite deferment is more likely,

No.

Once a swap space shortage occurs, it will tend to recur until some
large process exits. So, if all such processes are put in wait states
(which is very likely, because active processes often require new
pages), the situation is a deadlock.

>and that can be handled by queueing input.

What do you mean by "can be handled by queueing input."?

>4) A really efficient implementation with spawn.

As far as I know, it is impossible to implement spawn, because there
is no rational definition of spawn. To avoid repeating a fruitless
discussion, you should show a reasonable and complete (at least as
complete as a UNIX man page) definition of spawn if you want to claim
it is possible. Also, it should be more beautiful than vfork, of course.

						Masataka Ohta

jkenton@pinocchio.encore.com (Jeff Kenton) (07/13/90)

If we are going to have this ongoing discussion concerning the virtues
of vfork() vs. Copy-On-Write (COW), we need an accepted measure by which
to judge their relative merits.  COW FLOPS, anyone?


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com	 
		   ---  always at (617) 894-4508  ---
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

seanf@sco.COM (Sean Fagan) (07/13/90)

[Note the followup...]
In article <5845@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>If there is only 100MB of swap space, and an 80MB process wants to fork
>just to exec a small program such as a shell, 160MB of swap space must
>be temporarily allocated. It is impossible.

Since when?  SysV doesn't do that.  The swap space is allocated *on demand*,
not on fork.  Since the data pages are marked COW, only if they are written
is swap space needed.  If you can get away with not writing to anything,
including your stack, then no extra swap space is used *at all*.

So I guess it's not so impossible after all, is it?

-- 
-----------------+
Sean Eric Fagan  | "Just think, IBM and DEC in the same room, 
seanf@sco.COM    |      and we did it."
uunet!sco!seanf  |         -- Ken Thompson, quoted by Dennis Ritchie

chip@tct.uucp (Chip Salzenberg) (07/13/90)

According to mohta@necom830.cc.titech.ac.jp (Masataka Ohta):
>In article <2699E08D.117A@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>>This behavior is entirely consistent with other Unix resource
>>management behavior.
>
>I don't know what variation of UNIX you know. So please tell me
>what happens when there is not enough swap space with your
>favorite UNIX.

My favorite Unix is SCO Xenix/386 2.3.  I haven't experimented with
its behavior under memory and swap space exhaustion.  I'm sure,
however, that a SCO person will know how it works and can followup
with that information.  However, the behavior of my "favorite" Unix is
irrelevant to this discussion.

The Unix philosophy of resource usage is: "Pay as you go."  Files are
not pre-allocated, but dynamically extended.  The data segment is not
pre-allocated, but grows and shrinks on demand.  Given this
philosophy, the only consistent treatment of swap space is to allocate
it when it is needed, and not before.

Of course, dynamic allocation always introduces new failure modes.
All C programmers should deal with malloc() failure and disk full
conditions, but due to laziness it doesn't always happen.  Likewise, a
Unix kernel should deal gracefully with swap space exhaustion.  If it
doesn't, that indicates a bug in the kernel; it most certainly does
NOT indicate a problem with the "pay as you go" philosophy, which has
proven to be quite flexible and useful.
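The "deal with malloc() failure" discipline Chip mentions is usually packaged as a checked wrapper. A minimal sketch (the name xmalloc is a common convention, not from the thread; a real application might unwind and report rather than exit):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Checked allocation: dynamic allocation adds a failure mode, so
 * handle it at every call instead of ignoring it. */
void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", n);
        exit(EXIT_FAILURE);   /* or recover, as fits the application */
    }
    return p;
}
```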

>With vfork, such a situation never occurs (except for the stack segment),
>because fork is denied.

Note the "except" clause.  Mr. Masataka himself here points out that
vfork() is an incomplete solution, since a kernel with vfork() must be
prepared to deal gracefully with swap space exhaustion due to stack
modification.  So vfork() does not eliminate the swap space contention
inherent in new process creation.  A kernel could replace vfork() with
fork() and nothing of importance would be lost.

In summary:  I consider vfork() a botch.  It is a half-hearted attempt
to solve an unsolvable problem.  Unix would be a better place if it
were to disappear.
-- 
Chip Salzenberg at ComDev/TCT     <chip@tct.uucp>, <uunet!ateng!tct!chip>

peter@ficc.ferranti.com (Peter da Silva) (07/13/90)

In article <5855@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> As far as I know, it is impossible to implement spawn, because there
> is no rational definition of spawn.

Well, you've about convinced me that vfork is an adequate replacement for
spawn from an efficiency standpoint, and I agree it's a lot cleaner. I'm
not convinced that it is *as* efficient, particularly in a non-mapped
execution environment, but it should be adequate.

However, there are any number of "rational definitions of spawn", since
outside of UNIX it is about the only process creation primitive that
exists. It's also how just about every thread implementation that I
know of implements thread creation... including Mach. You can find as
many varieties of spawn as you wish by perusing other operating systems'
programming manuals. It would be redundant for me to duplicate any
particular definition here.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

barry@tredysvr.Tredydev.Unisys.COM (Barry Traylor) (07/14/90)

In article <Q=M4GG5@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>
>However, there are any number of "rational definitions of spawn", since
>outside of UNIX it is about the only process creation primitive that
>exists. It's also how just about every thread implementation that I
>know of implements thread creation... including Mach. You can find as
>many varieties of spawn as you wish by perusing other operating systems'
>programming manuals. It would be redundant for me to duplicate any
>particular definition here.

You point up one of the difficulties of defining a "standard" spawn for Posix
(tm IEEE), in that there are so many different forms.  Thread creation is a
different case, and I agree that a "spawn-like-function" is a better way to
go.  The primary difficulty of spawn w/r/t Unix (tm, AT&T) or Posix is that 
Unix has never had a "task object" that could be manipulated.  Instead, 
fork() was used and the manipulation was done directly (and implicitly) to 
the new process.

At this point, coming to any sort of agreement on how to define a task
object, and how to manipulate it in such a way as to emulate the
possibilities of fork()/exec[ve](), would be pretty close to impossible.  On
the other hand, a good implementation of vfork() can avoid most of the cost
of a fork() while losing very little of its flexibility.

Barry Traylor
Unisys Large A Series Engineering (read: Big Mainframes)
barry@tredydev.unisys.com

colin@array.UUCP (Colin Plumb) (07/14/90)

I'd just like to point out that Unix can already run out of swap space
at inconvenient times.  I haven't tested it, for obvious reasons, but:

#include <stdlib.h>	/* for malloc() */

int recurse(int i)
{
	return 1 + recurse(i+1);	/* Foil simple tail-recursion */
}

int main(void)
{
	while (malloc(65536))		/* first exhaust the heap... */
		;
	recurse(0);			/* ...then force the stack to grow */
	return 0;
}

Should blow up on anything that dynamically grows the stack.  So by allowing
a process to die if it hits a bad COW case we aren't breaking the semantics
too badly, although obviously the situation should be avoided if at all
possible.

Is my reasoning wrong?
-- 
	-Colin

ian@sibyl.eleceng.ua.OZ (Ian Dall) (07/14/90)

In article <5855@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>>> 1) An utterly broken implementation where some important system
>>> process (such as inetd, ypbind or sendmail) may be killed if there
>>> is not enough swap space.
>
>>Alternatively, put the program in a wait state until swap space is available.
>>Deadlocks are possible, but unlikely. Indefinite deferment is more likely,

>Once swap space shortage occurs, it will tend to occur continually
>until some large process exits. So, if all such processes put in
>wait states (which is very likely to occur, because active process
>often requires new pages) the situation is deadlock.

Masataka seems to live in a binary world where there are only the BSD
and SysV implementations. These are not the only possibilities! I
like the SysV method of not allocating swap space until it is
necessary. It allows the total virtual memory used to (almost) equal
the sum of the swap space and the physical memory, which seems to me
desirable. It is true that it is undesirable for the system to go into
deadlock due to lack of swap space.  However, there are ways around
it. A simple solution is to have a high water mark for virtual memory
use, and when that mark is exceeded, cause any process, except
those with certain effective uids, to block if it does anything which
increases its memory requirements (including COW). To the user the
system will appear to have deadlocked, but not to the super user who
can at least run ps and kill. Of course, if you are in that situation
often, you need more swap space or more physical memory. (With the BSD
scheme you need both.) Of course some systems (such as VMS) have per
user quotas on swap space. Whether that is good depends on your
environment, I suppose. (I personally disliked VMS quotas for everything,
but that might just have been because they were never big enough!)
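The scheme sketched above is easy to state as code. A hypothetical allocator check (all names and the uid-100 cutoff are from the post; the threshold constant is an assumption): once free virtual memory would drop below the reserve, ordinary processes block while privileged ones proceed, so the system daemons and root's ps/kill keep working.

```c
#include <assert.h>
#include <stdbool.h>

#define PRIV_UID_LIMIT  100   /* "uids less than 100", per the post   */
#define HIGH_WATER      1024  /* pages held back for privileged uids  */

long vm_free_pages;           /* physical memory + swap, uncommitted  */

enum grow_result { GROW_OK, GROW_BLOCK };

/* Called whenever a process grows (sbrk, stack growth, COW fault). */
enum grow_result vm_grow(unsigned uid, long pages)
{
    bool privileged = uid < PRIV_UID_LIMIT;

    if (!privileged && vm_free_pages - pages < HIGH_WATER)
        return GROW_BLOCK;    /* ordinary process sleeps for space */
    if (vm_free_pages < pages)
        return GROW_BLOCK;    /* truly out: even privileged callers wait */
    vm_free_pages -= pages;
    return GROW_OK;
}
```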

Others might think up more elegant solutions. The point is you
shouldn't throw out the baby with the bath water. Copy on write
is an elegant idea. Don't stomp on it just to keep vfork!

-- 
Ian Dall     life (n). A sexually transmitted disease which afflicts
                       some people more severely than others.       

limonce@pilot.njin.net (Tom Limoncelli) (07/15/90)

In article <5830@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

> As for UNIX, there are three alternatives to treat forking:

> 1) An utterly broken implementation where some important system
        ^^^^^^^ or maybe an "udderly"?
> process (such as inetd, ypbind or sendmail) may be killed if there
> is not enough swap space.

I won't say what side I'm on, but at the Jan '90 Usenix a paper was
presented on SVR4's implementation of COW.  They reached a good
solution if you ask me.

They implemented COW, but found that each COW is slower than the usual
page-in by some huge factor.  So, they did some more research (*) and
found that when you fork a new shell you almost always COW the same
couple of pages immediately.  So, on every fork they page in those
pages automatically.  That avoids (something like) 10-20 COWs.  They
call the algorithm "bovophobic" because it tries to avoid COWs.

The presentation was GREAT.  The paper is in the proceedings if
you want to read it.

-Tom
(*) -- Sorry for using that word.  It implies "science" and we all
know that using scientific analysis to solve a problem is something
that NO programmer would get caught doing.  That's for those "computer
science" types that never get anything done.  Right?
-- 
tlimonce@drew.edu      Tom Limoncelli
tlimonce@drew.uucp     +1 201 408 5389
tlimonce@drew.Bitnet  "You'd better move ovah
limonce@pilot.njin.net     ...here comes a supernova"  -The B-52's.

jgh@root.co.uk (Jeremy G Harris) (07/16/90)

With all this discussion of possible modifications to Unix semantics, I
just have to put my oar in.   I hope I won't annoy anyone too much.

Background:   I like the power of fork/manipulate/exec.   I believe that
	      vfork was a stupid idea which should have gone away in
	      BSD4.4 .

My understanding of Masataka's position:
              Preallocation of swap space (actually, virtual space, made
	      up of swap space plus real memory) upon fork is necessary to
	      avoid deadlock (or indefinite sleep for resources).   The
	      latter are bad and must be avoided.
	      Large processes exhibit the problem most obviously; the example
	      of an 80MB process wishing to fork/exec a subshell on a system
	      with a 100MB swap area (and, presumably, less than 60MB of
	      real memory) is given.

Proposal:     A segment type which is _lost_ by the child of a fork.

Discussion: 1) This is only a palliative, not a complete fix for the problem.

	    2) It muddles Unix fork semantics (the process is no longer
	      completely duplicated) but, IMHO, in a less objectionable way
	      than vfork.

	    3) I'm assuming that the typical 80MB process didn't start out
	      that size but grew to it by use of sbrk.

	    4) A whole raft of new system calls are needed, to obtain new
	      segments, grow them, modify the attributes, delete them,
	      share them.

	    5) Source modifications to existing programs are required.

	    6) mmap (or however you wish to spell it) does most of what is
	      needed anyway.

Comments, anyone?
-- 
Jeremy Harris			jgh@root.co.uk			+44 71 315 6600

peter@ficc.ferranti.com (Peter da Silva) (07/16/90)

In article <855@tredysvr.Tredydev.Unisys.COM> barry@tredysvr.Tredydev.Unisys.COM (Barry Traylor) writes:
> Thread creation is a different case, and I agree that a
> "spawn-like-function" is a better way to go.

For real-time operating systems where all processes are basically threads,
spawn() is equally important. I suspect this might be an eventual point
of conflict between 1003.1 and 1003.4.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

peter@ficc.ferranti.com (Peter da Silva) (07/16/90)

In article <269DBEFB.583C@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
> >With vfork, such a situation never occurs (except for the stack segment),
> >because fork is denied.

> Note the "except" clause.  Mr. Masataka himself here points out that
> vfork() is an incomplete solution, since a kernel with vfork() must be
> prepared to deal gracefully with swap space exhaustion due to stack
> modification.

Is it absolutely necessary to clone the stack segment in a vfork() call?
If so, then it's not a general replacement for spawn(), because a machine
without memory management hardware can't relocate a stack.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

lkaplan@bbn.com (Larry Kaplan) (07/17/90)

In article <2340@root44.co.uk> jgh@root.co.uk (Jeremy G Harris) writes:
>Proposal:     A segment type which is _lost_ by the child of a fork.
>

This segment type is already implemented in Mach derived systems.  It
consists of making a vm_inherit call on the region with an inheritance of
VM_INHERIT_NONE.  Child processes will not receive any knowledge of memory
segments marked in this manner.

I'm not sure about the utility of such a feature in relation to fork vs vfork.
While you could conceivably mark all the regions you KNOW you won't need
after the fork but before the exec, it appears too involved a job for the 
typical programmer.  I could be wrong, though; someone might think it
worth doing.
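The Mach call Kaplan describes is vm_inherit(task, address, size, VM_INHERIT_NONE). On Linux the nearest analogue is madvise(MADV_DONTFORK) (an assumption on our part: Linux 2.6.16 or later), which makes the same sketch runnable: the marked region is simply absent in the child, so touching it faults, while the parent's copy is untouched.

```c
#define _DEFAULT_SOURCE       /* for MADV_DONTFORK and MAP_ANONYMOUS */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child demonstrably "lost" the segment, 0 or -1
 * otherwise. */
int demo_lost_segment(void)
{
    size_t len = 4096;
    char *seg = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return -1;
    strcpy(seg, "parent-only");

    if (madvise(seg, len, MADV_DONTFORK) != 0)
        return -1;            /* mark the region "lost on fork" */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        seg[0] = 'x';         /* region not mapped here: SIGSEGV */
        _exit(0);             /* never reached */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (strcmp(seg, "parent-only") != 0)
        return -1;            /* parent's copy must survive */
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```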

#include <std_disclaimer>
_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238

renglish@hpcupt1.HP.COM (Robert English) (07/17/90)

> / barry@tredysvr.Tredydev.Unisys.COM (Barry Traylor) /  6:32 pm  Jul 13 1990 /

> At this point, coming to any sort of agreement on how to define a task
> object, and how to manipulate it in such a way as to emulate the
> possibilities of fork()/exec[ve](), would be pretty close to impossible.

On the other hand, supporting the operations that fork and exec
actually need for the important paths through the shells and some
selected library calls is not nearly so difficult.  Adding a run/spawn
system call to improve efficiency does not require eliminating all
existing fork-exec code, and it does not require that all possible
fork-exec semantics be supported by the new call.

--bob--
renglish@hplabs.hp.com

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/17/90)

In article <180@array.UUCP> colin@array.UUCP (Colin Plumb) writes:

>I'd just like to point out that Unix can already run out of swap space
>at inconvenient times.

Yes. I have already pointed out that in

	Message-ID: <5844@titcce.cc.titech.ac.jp>
	Date: 12 Jul 90 04:15:11 GMT

	:With vfork, such a situation never occur (except for stack segment),
						   ^^^^^^^^^^^^^^^^^^^^^^^^

and in the previous discussion on the same vfork topic about half a
year ago.

>Should blow up on anything that dynamically grows the stack.  So by allowing
>a process to die if it hits a bad COW case we aren't breaking the semantics
>too badly,

As I said about half a year ago, the growing-stack problem is not so
serious, because most programs (especially important system processes)
can make do with the initially allocated stack segment, and it is much
less likely to occur.

>although obviously the situation should be avoided if at all
>possible.

So, it is important to reduce the possibility.

						Masataka Ohta

peter@ficc.ferranti.com (Peter da Silva) (07/18/90)

In article <5876@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> As I said about half a year ago, the growing-stack problem is not so
> serious, because most programs (especially important system processes)
> can make do with the initially allocated stack segment, and it is much
> less likely to occur.

And others can't. Consider, if you will, GCC.

Again, is it possible to make use of a vfork() that doesn't clone the
stack? I can't think of any reason why it'd need to do that. It'd make
things convenient if you were to return in the child process (that is,
if vfork can't be inlined), but it's not essential. You could always save
and restore the top stack frame.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <Jul.14.14.12.32.1990.27493@pilot.njin.net>
	limonce@pilot.njin.net (Tom Limoncelli) writes:

>I won't say what side I'm on, but at the Jan '90 Usenix a paper was
>presented on SVR4's implementation of COW.  They reached a good
>solution if you ask me.

Solution for what?

>They implemented COW, but found that each COW is slower than the usual
>page-in by some huge factor.

Speed is not the point of this discussion.

>(*) -- Sorry for using that word.  It implies "science" and we all
>know that using scientific analysis to solve a problem is something
>that NO programmer would get caught doing.  That's for those "computer
>science" types that never get anything done.  Right?

To be scientific, you should at least know what the problem is and what
should be measured.

					Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <774@sibyl.eleceng.ua.OZ> ian@sibyl.OZ (Ian Dall) writes:

>Masataka seems to live in a binary world where there are only the BSD
>and SysV implementations.

No. I live in a unary world of BSD UNIX. But I know SysV and VMS
well enough.

>To the user the
>system will appear to have deadlocked, but not to the super user who
>can at least run ps and kill.

It is not a solution.

>Others might think up more elegant solutions.

The elegant solution is vfork.

>The point is you
>shouldn't throw out the baby with the bath water. Copy on write
>is an elegant idea. Don't stomp on it just to keep vfork!

I don't mind if COW exists, as long as it is not overly complex and
does not result in deadlock or other unpleasantness.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <269DBEFB.583C@tct.uucp>
	chip@tct.uucp (Chip Salzenberg) writes:

>My favorite Unix is SCO Xenix/386 2.3.  I haven't experimented with
>its behavior under memory and swap space exhaustion.

Then, what can you say about the subject?

>I'm sure,
>however, that a SCO person will know how it works and can followup
>with that information.

So far, no.

>The Unix philosophy of resource usage is: "Pay as you go." Files are
>not pre-allocated, but dynamically extended.  The data segment is not
>pre-allocated, but grows and shrinks on demand.  Given this
>philosophy, the only consistent treatment of swap space is to allocate
>it when it is needed, and not before.

The more important philosophy of resource usage of UNIX is: "Don't
pay at all if not necessary." For example, no blocks are allocated for
holes in a file (created by seeking past the end). Vfork is another example.

>>With vfork, such a situation never occurs (except for the stack segment),
>>because fork is denied.

>Note the "except" clause.  Mr. Masataka himself here points out that
>vfork() is an incomplete solution, since a kernel with vfork() must be
>prepared to deal gracefully with swap space exhaustion due to stack
>modification.

Vfork is a complete and elegant solution for the forking problem.

The stack segment is a separate problem which should be solved
separately. Moreover, a solution exists. I will explain it if you are
interested.

>So vfork() does not eliminate the swap space contention
>inherent in new process creation.

Vfork totally eliminates the swap space contention in process
creation up to the exec.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <Q=M4GG5@xds13.ferranti.com>
	peter@ficc.ferranti.com (Peter da Silva) writes:

>However, there are any number of "rational definitions of spawn", since
>outside of UNIX it is about the only process creation primitive that
>exists.

In spite of my request to actually show a "rational definition of spawn",
no one has ever shown it.

So, you lose.

The conclusion is:

	no rational definition of spawn exists.

>It would be redundant for me to duplicate any
>particular definition here.

No, thank you. I am tired of the many complex, ugly and wrong
definitions of spawn.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <2340@root44.co.uk> jgh@root.co.uk (Jeremy G Harris) writes:

>With all this discussion of possible modifications to Unix semantics, I
>just have to put my oar in.   I hope I won't annoy anyone too much.

>Proposal:     A segment type which is _lost_ by the child of a fork.

The proposal should be replaced with a more general one which can
specify various segment types such as LOST, READONLY_SHARED, COW,
READWRITE_SHARED and so on. So, there will be a new scntl (segment control)
system call just like fcntl.

Anyway, to replace vfork, just LOST is not sufficient.

So, supposing the scntl system call is introduced, its most typical use
will be as a replacement for vfork. Moreover, it is complex to write
forking code from scratch.

So, vfork will survive as a library function. Anyone remember the
arcane system call "creat"? It is still a handy function for creating
files. Certainly, you can do the same with open alone, but it is error prone.
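The creat analogy is concrete: creat(path, mode) is documented as exactly one particular open() call, so it survives as a convenience wrapper. A sketch (my_creat is our hypothetical name for the library version):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* creat() as a one-line library function over open(): create the
 * file if needed, truncate it, open it write-only. */
int my_creat(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}
```

The argument above is that vfork could likewise be kept as the convenient, hard-to-misuse packaging of a more general scntl-style mechanism.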

>	    5) Source modifications to existing programs are required.

No. Only the status of vfork will change.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <+FP4PP@xds13.ferranti.com>
	peter@ficc.ferranti.com (Peter da Silva) writes:

>> Note the "except" clause.  Mr. Masataka himself here points out that
>> vfork() is an incomplete solution, since a kernel with vfork() must be
>> prepared to deal gracefully with swap space exhaustion due to stack
>> modification.

>Is it absolutely necessary to clone the stack segment in a vfork() call?

It is absolutely unnecessary, and thus not done. The stack segment is shared.

							Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/19/90)

In article <22R45DG@xds13.ferranti.com>
	peter@ficc.ferranti.com (Peter da Silva) writes:

>> As I said about half a year ago, the growing-stack problem is not so
>> serious, because most programs (especially important system processes)
>> can make do with the initially allocated stack segment, and it is much
>> less likely to occur.

>And others can't. Consider, if you will, GCC.

I think the death of GCC is not so serious.

>Again, is it possible to make use of a vfork() that doesn't clone the
>stack? I can't think of any reason why it'd need to do that.

No, there is no reason.

>It'd make
>things convenient if you were to return in the child process (that is,
>if vfork can't be inlined), but it's not essential. You could always save
>and restore the top stack frame.

Correct.

						Masataka Ohta

ian@sibyl.eleceng.ua.OZ (Ian Dall) (07/20/90)

In article <5891@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>No. I live in a unary world of BSD UNIX.

Even worse. Use your imagination; there *are* other ways of doing things.
No, I am not bitching about BSD in general; on balance I would much prefer
it to SysV, but you have to look at each idea on its merits.

>>To the user the
>>system will appear to have deadlocked, but not to the super user who
>>can at least run ps and kill.
>
>It is not a solution.

Why not? My suggestion was that if total virtual memory requirements
exceeded some threshold then only "privileged" processes could do
anything which increased their virtual memory requirements. If we
define "privileged" as, say, uids less than 100 (belonging to some
group would probably be better) then all your "important system
processes" continue to run, which as I recall was your chief problem.

Depending on exactly what signal handling the user processes use, the
user might still be able to kill them with ^C. Also remember that this
"frozen to ordinary users" state doesn't occur until the virtual
memory requirements exceed the high-water fraction of (physical memory +
swap space), i.e. after a current preallocating BSD system would have
ceased to work anyway. I can't see what your objection is.

Saying it is not a solution does not make it so. If you have any
reasonable objection to the scheme, then detail it.


-- 
Ian Dall     life (n). A sexually transmitted disease which afflicts
                       some people more severely than others.       

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (07/20/90)

In article <781@sibyl.eleceng.ua.OZ> ian@sibyl.OZ (Ian Dall) writes:

>>No. I live in a unary world of BSD UNIX.

>Even worse.

I don't want to begin religious war between BSD and SysV. Good-bye.

						Masataka Ohta

peter@ficc.ferranti.com (Peter da Silva) (07/20/90)

In article <5893@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> In spite of my request to actually show a "rational definition of spawn",
> no one has ever shown it.

OK: which attributes of the child process need to be modified by the parent
process, but cannot be non-destructively changed (and later restored) by
the parent itself?

In UNIX the most important parts are current directory, user ID, open files,
and environment variables. Environment variables are already explicitly passed
by exec*e. User ID (and related attributes such as umask) can be set and
restored by the parent process in modern UNIX (setting user ID in a root
process is no longer a one-way street). That leaves current directory and
open files.

Open files can be handled by passing a vector of fds to be stuck into the
child's file descriptor table. Current directory is a bit more complex.

I would prefer that current directory become a restorable object (as in
AmigaDOS, where it's represented by a file token... so you can !oldcd =
CurrentDir(newcd); do_something; CurrentDir(oldcd);!), but if that's too
worrisome for some obscure security reason (if it had to be derived via
!open(filename, O_DIRECTORY);! that should be a non-problem) then you
can always pass a path name. That means that spawn*() should have two more
arguments than exec*:

	spawnl(dir, fd_vec, prog, arg0, arg1, ..., (char *)NULL);
	spawnle(dir, fd_vec, prog, arg0, arg1, ..., (char *)NULL, envp);
	spawnv(dir, fd_vec, prog, argv);
	spawnve(dir, fd_vec, prog, argv, envp);

You can even add spawnlp and friends.
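Peter's spawnv() is a proposal, not an existing call. A user-level emulation over fork/exec shows the intended semantics (the function name and dir/fd_vec arguments follow his prototypes; the explicit n_fds count is our assumption, since his sketch leaves the vector's length convention unstated):

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Emulated spawnv(): fd_vec[i] becomes the child's descriptor i
 * (n_fds of them), dir becomes its working directory.  Returns the
 * child's pid, or -1 on fork failure. */
pid_t my_spawnv(const char *dir, const int *fd_vec, int n_fds,
                const char *prog, char *const argv[])
{
    pid_t pid = fork();     /* a kernel spawn would skip this copy */
    if (pid != 0)
        return pid;         /* parent, or fork failure (-1) */

    /* Child: apply the requested attributes, then exec. */
    if (dir != NULL && chdir(dir) != 0)
        _exit(127);
    for (int i = 0; i < n_fds; i++)
        if (fd_vec[i] != i && dup2(fd_vec[i], i) == -1)
            _exit(127);
    execv(prog, argv);
    _exit(127);             /* exec failed */
}
```

The emulation makes the design point concrete: everything the parent would normally "manipulate" between fork and exec is reduced to two extra arguments.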
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
<peter@ficc.ferranti.com>

peter@ficc.ferranti.com (Peter da Silva) (07/23/90)

> >>No. I live in a unary world of BSD UNIX.

> >Even worse.

> I don't want to begin religious war between BSD and SysV. Good-bye.

You weren't paying attention. He wasn't commenting on BSD, but on the fact
that you have only had experience with a single variety of UNIX.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
<peter@ficc.ferranti.com>