[comp.sys.hp] About wait

jdudeck@polyslo.CalPoly.EDU (John R. Dudeck) (04/26/91)

I am porting an application (the PP X.400 system) from bsd to HP/UX, on
a system running HP/UX ver.  6.5B.  The code includes three daemons.  I am
having problems that seem to be related to the wait(2) call.

The original code defines a wait structure:
  struct wait w;
then later on does a fork and a wait, etc, in the normal fashion.

Then there is a line such as 
  if (WIFEXITED(w)) return w.w_retcode;

This does not compile under HP/UX, with an "operands of cast have
incompatible types" error.  The sys/wait.h file defines WIFEXITED using
a cast of struct to int, as opposed to bsd which does not.

So I changed the above line to
  if (WIFEXITED(w.w_status)) return w.w_retcode;
which compiles ok.

This same scenario exists in 5 places in the package.  Now I am at a point
where different parts of the package crash with a Bus Error at certain
points in the code.  It doesn't crash in this code, but since this is
the part I changed, I tend to suspect a problem here.  I suspect that
if the code returns the wrong status here, it could provoke a crash
elsewhere in the system.

Furthermore, the daemon isn't cleaning up its zombies like it should.

There also are a couple of wait3() calls in the system, which I didn't
make any changes to.

In the man page for wait(2), there is a line which says:
  "The third parameter to wait3 is currently unused, and must always be
  a null pointer".
In one place this is not null in my code.

My questions are these:

1. Did I do something wrong in the changes I made?
2. Is there a difference in the way wait() works on HP/UX?
3. What happens if the third parameter to wait3 isn't a null pointer?


-- 
John Dudeck                                              "You can only push
jdudeck@Polyslo.CalPoly.Edu                             simplicity so far."
ESL: 62013975 Tel: 805-545-9549                -- AT&T promotional brochure

decot@hpcupt1.cup.hp.com (Dave Decot) (04/30/91)

See the "Notes" section on the HP-UX wait(2) man page (I hope it was
in 6.5, but it is certainly there in 7.0 and 8.0).

You may want to consider updating to HP-UX 7.0 and/or HP-UX 8.0.

> 2. Is there a difference in the way wait() works on HP/UX?

Yes, these are differences between POSIX and BSD.  In particular, the
value returned in the variable to which wait's argument points
can no longer be decoded using the WIF* macros as BSD defines them.
Unfortunately, POSIX chose to use those macro names for a different
interface for decoding the value, and HP-UX followed POSIX.

However, the BSD macros are still available by defining the _BSD
symbol (using the -D_BSD option on the cc command line).  I don't know
if this worked in 6.5; it's been quite a while since that release was
current.

> 1. Did I do something wrong in the changes I made?

Yes.  Either change the code back to the BSD way and compile the code
with the -D_BSD option (this is best if you want the code to still
port to BSD 4.3 or earlier), or convert the code to use the POSIX
version of these macros as described on HP-UX's wait(2) man page
(this will work with BSD 4.4 or later).
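
As a rough illustration of the POSIX route mentioned above, here is a
minimal sketch.  It assumes a system that provides waitpid() and the
POSIX macros WIFEXITED/WEXITSTATUS in <sys/wait.h>; the helper name
run_child_and_get_exit_code is hypothetical and not part of PP.  The
status is a plain int rather than a BSD wait structure, and the exit
code comes from WEXITSTATUS() instead of w.w_retcode.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with child_code, wait
 * for it, and return the exit code decoded with the POSIX macros.
 * Returns -1 on error or if the child did not exit normally.
 */
int run_child_and_get_exit_code(int child_code)
{
    int status;                 /* POSIX: a plain int, not a wait structure */
    pid_t got;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(child_code);      /* child: exit immediately */

    /* Parent: retry the wait if a signal interrupts it. */
    do {
        got = waitpid(pid, &status, 0);
    } while (got < 0 && errno == EINTR);

    if (got != pid)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* replaces w.w_retcode */
    return -1;                        /* killed by a signal, etc. */
}
```

In the original code, the line "if (WIFEXITED(w)) return w.w_retcode;"
would then become "if (WIFEXITED(status)) return WEXITSTATUS(status);".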

> 3. What happens if the third parameter to wait3 isn't a null pointer?

  "The third parameter to wait3 is currently unused and must always be
  a null pointer."

If it isn't, no warranty is expressed or implied, since you have violated
the requirements of the documentation.  Among the possible results are
a memory fault, or mysterious changes to unrelated variables.

Dave

rml@hpfcdc.HP.COM (Bob Lenk) (05/08/91)

> Then there is a line such as 
>   if (WIFEXITED(w)) return w.w_retcode;
> 
> This does not compile under HP/UX, with an "operands of cast have
> incompatible types" error.  The sys/wait.h file defines WIFEXITED using
> a cast of struct to int, as opposed to bsd which does not.
> 
> So I changed the above line to
>   if (WIFEXITED(w.w_status)) return w.w_retcode;
> which compiles ok.

This should work fine.  The HP-UX macros are compatible with POSIX rather
than BSD.  In newer versions of HP-UX, <sys/wait.h> has a BSD compatible
version of these macros within #ifdef _BSD.

> This same scenario exists in 5 places in the package.  Now I am at a point
> where different parts of the package crash with a Bus Error at certain
> points in the code.  It doesn't crash in this code, but since this is
> the part I changed, I tend to suspect a problem here.  I suspect that
> if the code returns the wrong status here, it could provoke a crash
> elsewhere in the system.

I don't think there's any relationship.

> Furthermore, the daemon isn't cleaning up its zombies like it should.
> 
> There also are a couple of wait3() calls in the system, which I didn't
> make any changes to.
> 
> In the man page for wait(2), there is a line which says:
>   "The third parameter to wait3 is currently unused, and must always be
>   a null pointer".
> In one place this is not null in my code.

I believe wait3() will return an EINVAL error.  (That's what the manual
says.  I don't have a 6.5 system or source handy to check - but you
should be able to verify with a small program if you like).  This could
easily be causing the unreaped zombies.  It's possible that some code
isn't detecting the error, is expecting some returned values to have
useful data, and is causing the core dumps.
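
A small test program along those lines might look like the sketch
below.  The function name probe_wait3_rusage is hypothetical, and the
code assumes the modern declaration of wait3() (int *status and a
struct rusage * third argument); on 6.5 the argument types may differ.
The result is platform dependent: systems that implement the rusage
argument should report success, while a system that rejects it is
expected to report an error such as EINVAL.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical probe: call wait3() with a non-null third argument and
 * report what happened.  Returns 0 if wait3() succeeded, otherwise the
 * errno value it set (or the errno from a failed fork).
 */
int probe_wait3_rusage(void)
{
    struct rusage ru;
    int status;
    int err = 0;
    pid_t pid = fork();

    if (pid < 0)
        return errno;           /* could not create a child */
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    if (wait3(&status, 0, &ru) < 0) {
        err = errno;            /* remember why wait3() refused */
        /* Reap the child anyway so the probe leaves no zombie behind. */
        (void)waitpid(pid, &status, 0);
    }
    return err;
}
```

Printing the return value (for example with strerror()) on the target
system would show whether the non-null pointer is accepted or rejected
there.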

There is no supported way to get the functionality of the third
parameter to wait3().  In order to port this, you need to check how the
code uses the rusage structure.  You can get the same information as in
the CPU time fields (ru_utime and ru_stime) with times(2): call it
before and after wait/wait3/waitpid, and the difference in the child
times (tms_cutime and tms_cstime) is the time for the newly reported
child.  The information in the other rusage fields is not available.
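
The times(2) approach could be sketched roughly as follows.  The
struct child_cpu and the function wait_with_cpu_time are hypothetical
names invented for this illustration; the values are in clock ticks
(see sysconf(_SC_CLK_TCK)), and the child times also include any
descendants the child itself waited for.

```c
#include <sys/times.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type: CPU time of one reaped child, in clock ticks. */
struct child_cpu {
    clock_t utime;   /* user CPU time, analogous to ru_utime */
    clock_t stime;   /* system CPU time, analogous to ru_stime */
};

/*
 * Wait for any child and estimate its CPU time as the change in the
 * accumulated child times reported by times() before and after the wait.
 * Returns the child's pid, or -1 on error.
 */
pid_t wait_with_cpu_time(int *status, struct child_cpu *cpu)
{
    struct tms before, after;
    pid_t pid;

    if (times(&before) == (clock_t)-1)
        return -1;

    pid = wait(status);
    if (pid < 0)
        return -1;

    if (times(&after) == (clock_t)-1)
        return -1;

    /* Child times only grow when a child is reaped, so the delta
     * belongs to the child that wait() just reported. */
    cpu->utime = after.tms_cutime - before.tms_cutime;
    cpu->stime = after.tms_cstime - before.tms_cstime;
    return pid;
}
```

Dividing the tick counts by sysconf(_SC_CLK_TCK) gives seconds, if the
caller needs something comparable to the timeval fields in rusage.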

> My questions are these:
> 
> 1. Did I do something wrong in the changes I made?

Only in supplying the non-NULL third parameter to wait3().

> 2. Is there a difference in the way wait() works on HP/UX?

Only in (a) not supporting the third parameter to wait3() and
(b) difference from BSD on type of status argument and thus on type of
argument to the WIF*() macros.

> 3. What happens if the third parameter to wait3 isn't a null pointer?
See above (EINVAL error, I think).

		Bob Lenk
		rml@fc.hp.com
		{uunet,hplabs}!fc.hp.com!rml

Normal disclaimer - not an official response from HP.

carllp@diku.dk (Carl-Lykke Pedersen) (05/10/91)

rml@hpfcdc.HP.COM (Bob Lenk) writes:
>> So I changed the above line to
>>   if (WIFEXITED(w.w_status)) return w.w_retcode;
>> which compiles ok.
>
>This should work fine.  The HP-UX macros are compatible with POSIX rather
>than BSD.  In newer versions of HP-UX, <sys/wait.h> has a BSD compatible
>version of these macros within #ifdef _BSD.

But WTERMSIG and WSTOPSIG still seem to be defined the POSIX way (in
HP-UX 7.0).

Regards
Carl-Lykke

--
Carl-Lykke Pedersen (System Administrator)     Email:  carllp@diku.dk
DIKU (Dept. Comp. Sci. Univ. Copenhagen)       Fax:   +45 31 39 02 21
Universitetsparken 1
DK-2100 Copenhagen, Denmark