[net.unix] Help Measuring Fork/Exec Overhead

juhlin@druak.UUCP (JuhlinB) (08/21/85)

I am trying to determine the cost/overhead involved in a fork/exec sequence
under Unix 5.2 on a 3B5 (I'll probably do other AT&T machines after this).

I've managed to track through the code and predict the overhead of an exec
given the process's size.  My measurements (particularly the I/Os) are
very consistent with what I predict.

Unfortunately, tracking/predicting fork overhead seems much more difficult.
CPU usage seems to be marginally affected by process size and much more
affected by the number of processes in the proc table.

Are there other factors I should be aware of in determining fork overhead?

Any good ideas on how to measure it? 

[I'm currently getting the CPU time for a parent that forks a child and
waits for its completion.  The parent does this 100 times.
I've done this for parents of varying sizes, and with the system in
run-states 1, 2, 3 (5, 18, and 71 processes in the table, respectively).]
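
A minimal sketch of the measurement loop described above, assuming a
POSIX-style system with times() and waitpid().  The function name
fork_wait_cpu_seconds is hypothetical, and the sketch also counts the CPU
time of the reaped children, which may or may not match what was measured
in the original runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/times.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run n fork/wait cycles and return the CPU seconds
 * (user + system, parent plus reaped children) they consumed.
 * Returns -1.0 on failure.
 */
double fork_wait_cpu_seconds(int n)
{
    struct tms before, after;
    long ticks = sysconf(_SC_CLK_TCK);   /* clock ticks per second */
    clock_t used;
    int i;

    assert(n > 0);
    if (ticks <= 0 || times(&before) == (clock_t)-1)
        return -1.0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);                    /* child does no work at all */
        if (waitpid(pid, NULL, 0) < 0)   /* parent waits for completion */
            return -1.0;
    }

    if (times(&after) == (clock_t)-1)
        return -1.0;

    used = (after.tms_utime  - before.tms_utime)
         + (after.tms_stime  - before.tms_stime)
         + (after.tms_cutime - before.tms_cutime)
         + (after.tms_cstime - before.tms_cstime);
    return (double)used / (double)ticks;
}
```

Calling fork_wait_cpu_seconds(100) from parents of different sizes, and at
different process-table loads, would reproduce the experiment in outline.
With only 100 cycles the result may be just a few clock ticks, so a larger
count gives a steadier figure.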

Do you know of any memos, etc. that have addressed this issue?

Any comments, suggestions, or guidance you can give will be greatly
appreciated.  Feel free to contact me via the net or mail.

Thanks,

Bruce Juhlin
druak!juhlin

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/23/85)

UNIX System V Release 2 fork() overhead is fairly small if you
don't modify much data in the child process, because it uses
"copy on write" to avoid unnecessary copying of data from parent
into the child process.  This is a big win in most forks, which
are almost immediately followed by exec().

lcc.niket@locus.ucla.edu (Niket K. Patwardhan) (08/23/85)

>From: juhlin@druak.UUCP (JuhlinB)
>Subject: Help Measuring Fork/Exec Overhead
>Date: 20 Aug 85 23:33:29 GMT
>To:       info-unix@brl-tgr.arpa

I did some of this at Intel. We decided to include 1 exit(), 1 wait() and 2
task switches as part of the fork overhead. Reason: if you do a fork() you
cannot escape the fact that these events are eventually going to happen at
some time! On our systems there was almost always enough memory that it was not
necessary to swap out the child to do the fork, so the results were consistent.

The task switches were important: on the implementation we had, it could take
up to 5 ms to do the task switch! (And you can't escape that order of magnitude
with any of the usual C compilers for the 286 for UNIX-style switching!)
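
One way to capture that whole bundle (fork, exit, wait, and the switches
to and from the child) is to time complete cycles by elapsed time rather
than CPU time.  The sketch below assumes a POSIX system with
clock_gettime(); the name fork_cycle_usec is hypothetical, and on a
multiprocessor the number of task switches per cycle may not be exactly two.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average elapsed microseconds for one
 * fork() + exit() + wait() cycle, taken over n cycles.
 * Returns -1.0 on failure.
 */
double fork_cycle_usec(int n)
{
    struct timespec start, end;
    double usec;
    int i;

    assert(n > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);                    /* the one exit() per cycle */
        if (waitpid(pid, NULL, 0) < 0)   /* the one wait() per cycle */
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    usec = (double)(end.tv_sec - start.tv_sec) * 1e6
         + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return usec / (double)n;
}
```

Because the parent blocks in waitpid() until the child has run and exited,
the elapsed figure includes scheduling cost that a CPU-time measurement of
the parent alone would miss.  It is only meaningful on an otherwise idle
machine.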

guy@sun.uucp (Guy Harris) (08/24/85)

> UNIX System V Release 2 fork() overhead is fairly small if you
> don't modify much data in the child process, because it uses
> "copy on write" to avoid unnecessary copying of data from parent
> into the child process.

UNIX System V Release 2 VAX Version 2 (3B20 Version X, iAPX286 Version Y,
M68K version Z, etc., for some X, Y, Z, etc.) fork() overhead, anyway.
S5R2V1 still uses traditional UNIX code for fork() (as might *any* UNIX on a
non-paged machine, since there's not much point in copy-on-write if the
smallest unit that can be mapped is an entire segment).

	Guy Harris

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/25/85)

> ... there's not much point in copy-on-write if the
> smallest unit that can be mapped is an entire segment).

I don't see that.  Most fork()s are almost immediately followed
by exec(), so that avoiding the useless copying of data segments
still seems like a big win.

guy@sun.uucp (Guy Harris) (08/28/85)

> > ... there's not much point in copy-on-write if the
> > smallest unit that can be mapped is an entire segment).
> 
> I don't see that.  Most fork()s are almost immediately followed
> by exec(), so that avoiding the useless copying of data segments
> still seems like a big win.

You'll still copy the stack segment, as a bare minimum.  If "exec" touches
the data space in any way, you copy it too.  (Yes, there are systems where
"exec" touches the data space - the PDP-11's system call sequence didn't
involve parameters pushed onto the stack, so they had to be put into a
static area instead.  Also, if you "exec" a lot of shell files and don't
have a system whose kernel can detect that and run a shell (i.e., the
4.xBSD/V8(?) #! stuff), the first "exec" will return an error code, stored
in "errno" - blammo, time to copy the whole data segment.)
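
The failed-exec case can be illustrated with a small sketch.  It assumes a
POSIX system; the function name errno_after_failed_exec and the path in the
usage note are hypothetical.  On many current systems errno is per-thread
storage rather than a plain static variable, but it is still a write into
the child's address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that tries to exec `path`.
 * If the exec fails, the child exits with its errno value as the status,
 * and that value is returned here.  If the exec succeeds, the return
 * value is simply the exit status of the program that ran.
 * Returns -1 if fork or wait fails, or if the child did not exit normally.
 */
int errno_after_failed_exec(const char *path)
{
    int status;
    pid_t pid;

    assert(path != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl(path, path, (char *)NULL);
        /* Reached only when exec failed: by now the error code has been
         * stored in errno, i.e. the child's data space has been written. */
        _exit(errno & 0xff);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, errno_after_failed_exec("/nonexistent/program") reports the
error the child saw for a path that does not exist.  Under segment-granular
copy-on-write, that single store is what would force the copy.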

	Guy Harris

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/29/85)

Aah, Guy was talking about systems where the entire data space is one
segment.  I thought he meant PDP-11 or Gould-like systems.

By the way, folks who think that demand paging is always better than
partial swapping should implement a demand paging UNIX on a PDP-11/70
and measure its typical performance.  (Yes, this is possible.)