[comp.unix.wizards] fork affecting ndbm requests?

madd@world.std.com (jim frost) (10/21/89)

I have an application which opens an ndbm database read-only and forks
several children to do a lot of queries on that database.  I get
periodic dbm_fetch() failures if I do this when I am certain the
records exist.  This does not happen if the application does all of
the fetches unparallelized, but there are performance losses.
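
The shape of the code is roughly this (a sketch, not the real program;
the database name, the key, and NCHILDREN are placeholders):

    #include <ndbm.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NCHILDREN 4                     /* placeholder */

    int
    main(void)
    {
        DBM *db;
        int i;

        /* database opened once, before any fork */
        db = dbm_open("mydb", O_RDONLY, 0);
        if (db == NULL) {
            perror("dbm_open");
            return 1;
        }

        for (i = 0; i < NCHILDREN; i++) {
            if (fork() == 0) {
                /* child: many fetches through the shared handle */
                datum key, val;

                key.dptr = "some key";      /* placeholder key */
                key.dsize = sizeof("some key") - 1;
                val = dbm_fetch(db, key);
                if (val.dptr == NULL)
                    fprintf(stderr, "child %d: fetch failed\n", i);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        dbm_close(db);
        return 0;
    }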

Is there some reason I should not fork after opening the database?

jim frost
software tool & die
madd@std.com

perry@ccssrv.UUCP (Perry Hutchison) (10/25/89)

In article <1989Oct20.174654.4143@world.std.com> madd@world.std.com (jim frost)
writes:

> I have an application which opens an ndbm database read-only and forks
> several children to do a lot of queries on that database.  I get
> periodic dbm_fetch() failures if I do this when I am certain the
> records exist.  This does not happen if the application does all of
> the fetches unparallelized, but there are performance losses.

> Is there some reason I should not fork after opening the database?

Forking _per se_ is not the problem.

From the man page for fork(2):

+ The child process has its own copy of the parent's descriptors.  These
+ descriptors reference the same underlying objects ... an lseek(2) on a
+ descriptor in the child process can affect a subsequent read or write
+ by the parent.

or another child, as in this case.
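
The seek pointer is part of what gets shared.  A small demonstration,
independent of ndbm (any readable file will do; /etc/motd is only an
example):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/etc/motd", O_RDONLY);   /* any readable file */

        if (fd < 0)
            return 1;
        if (fork() == 0) {
            /* child: move the shared seek pointer */
            lseek(fd, 1L, SEEK_SET);
            _exit(0);
        }
        wait(NULL);
        /* prints 1: the child's lseek moved the parent's offset too,
           because the offset belongs to the open file, not to the
           descriptor number in either process */
        printf("parent's offset is now %ld\n",
               (long) lseek(fd, 0L, SEEK_CUR));
        return 0;
    }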

You open the database and then, via fork, duplicate that one file descriptor
for each child.  Now some guesswork:  reading this database involves seeking
to an appropriate position in the file and then doing a read.  This leads to
a race condition in which child A seeks, then child B seeks, then child B
reads (and succeeds), then child A reads (and fails, getting the record
following child B's).
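
You can provoke that kind of interference without ndbm at all.  A toy
along these lines (it assumes a file named "testdata" whose first two
bytes are "AB") should complain every so often when the two children's
seeks and reads interleave badly:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Seek to our own offset and read one byte, many times.  With a
       shared descriptor the other child can move the offset between
       our lseek and our read, so the check fails now and then. */
    static void
    hammer(int fd, off_t off, char expect)
    {
        char c;
        int i;

        for (i = 0; i < 100000; i++) {
            lseek(fd, off, SEEK_SET);
            if (read(fd, &c, 1) == 1 && c != expect)
                fprintf(stderr, "want '%c' at %ld, got '%c'\n",
                        expect, (long) off, c);
        }
        _exit(0);
    }

    int
    main(void)
    {
        int fd = open("testdata", O_RDONLY);

        if (fd < 0)
            return 1;
        if (fork() == 0)
            hammer(fd, 0, 'A');
        if (fork() == 0)
            hammer(fd, 1, 'B');
        wait(NULL);
        wait(NULL);
        return 0;
    }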

Short of interlocking the children so that the seek-read sequence becomes
effectively indivisible, the cure is for each child to have its own
independent file descriptor instead of sharing one descriptor among all of
them.  As far as I know, the only way to accomplish this is by having each
child issue its own open (or, equivalently, the parent could close and re-
open the file after each fork).
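
With ndbm that comes down to calling dbm_open after the fork, in each
child.  A sketch (the database name and key are placeholders):

    #include <ndbm.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NCHILDREN 4                     /* placeholder */

    int
    main(void)
    {
        int i;

        for (i = 0; i < NCHILDREN; i++) {
            if (fork() == 0) {
                /* child: open a private handle, so the underlying
                   descriptors and their seek pointers are not shared
                   with the other children */
                DBM *db = dbm_open("mydb", O_RDONLY, 0);
                datum key, val;

                if (db == NULL)
                    _exit(1);
                key.dptr = "some key";      /* placeholder key */
                key.dsize = sizeof("some key") - 1;
                val = dbm_fetch(db, key);
                if (val.dptr == NULL)
                    fprintf(stderr, "child %d: fetch failed\n", i);
                dbm_close(db);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }

Each dbm_open gives the child its own descriptors on the database
files, so one child's lseek can no longer land underneath another
child's read.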