carroll@m.cs.uiuc.edu (08/20/90)
As has been noted by others, Epoch is a little unstable when running
subprocesses under ISC 2.0.2. While part of this was errors on my part, I've
recently tracked at least some of the crashes down to the fact that wait(loc)
sometimes returns negative values in *loc, e.g. 0xFFFFxxyy where xx and yy
are the values you'd normally expect in the bottom 16 bits of the return value.
TFM specifies the contents of the bottom 16 bits of *loc, but says absolutely
nothing about the top 16 bits. The GNU-Emacs process internals assume that
the value is positive, in particular that the type and mark bits are 0
(generally the top 8 bits). A simple fix is to use
	w &= 0xFFFF;
immediately after the wait. Is this reasonable? Does Emacs make unwarranted
assumptions about wait(), or is this an ISC bug?

Alan M. Carroll                Barbara/Marilyn in '92 :
carroll@cs.uiuc.edu            + This time, why not choose the better halves?
Epoch Development Team
CS Grad / U of Ill @ Urbana    ...{ucbvax,pur-ee,convex}!cs.uiuc.edu!carroll
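The proposed workaround can be sketched as a small helper. This is only an illustration of the masking step: the name mask_wait_status is hypothetical, and nothing here is taken from the Emacs or Epoch sources.

```c
/*
 * Hypothetical helper: keep only the 16 bits of the wait() status that
 * the manual defines, discarding whatever ended up in the upper bits.
 * The cast to unsigned keeps the bit operation well defined for
 * negative inputs.
 */
int mask_wait_status(int w)
{
    return (int)((unsigned)w & 0xFFFFu);
}
```

A caller would apply it to the status word right after wait() returns, before any code that assumes the value is non-negative.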
dougm@ico.isc.com (Doug McCallum) (08/20/90)
In article <70400015@m.cs.uiuc.edu> carroll@m.cs.uiuc.edu writes:
...
>subprocesses under ISC 2.0.2. While part of this was errors on my part, I've
>recently tracked at least some of the crashes down to the fact that wait(loc)
>sometimes returns negative values in *loc, e.g. 0xFFFFxxyy where xx and yy
>are the values you'd normally expect in the bottom 16 bits of the return value.
>TFM specifies the contents of the bottom 16 bits of *loc, but says absolutely
>nothing about the top 16 bits. The GNU-Emacs process internals assume that
>the value is positive, in particular that the type and mark bits are 0
>(generally the top 8 bits). A simple fix is to use
>	w &= 0xFFFF;
>immediately after the wait. Is this reasonable? Does Emacs make unwarranted
>assumptions about wait(), or is this an ISC bug?

The wait call only specifies the bottom 16 bits. In the V.3 implementation,
only those 16 bits are ever set so the "w &= 0xFFFF;" is reasonable.

I checked the source and it looks like a bug in the AT&T code. The exit()
call has its argument shifted left 8 bits and is stored in a "short" in
the proc structure. When wait is called, that short is put into an "int"
for return to the caller. If the exit call had been made with a negative
number, or the value when shifted left set the sign bit, the sign will
get extended. This looks like a pretty generic AT&T behavior.

Doug McCallum
Interactive Systems Corp.
dougm@ico.isc.com
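The sign extension described above can be modelled in a few lines. This is a sketch of the mechanism only, not the AT&T code: the function name status_seen_by_wait is hypothetical, the termination byte is left as 0, and the conversion to a signed 16-bit value assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/*
 * Hypothetical model of the path described above: the exit() argument
 * is shifted left 8 bits, stored in a 16-bit signed field, and later
 * widened to int for the caller of wait().
 */
int status_seen_by_wait(int exit_arg)
{
    /* Shift as unsigned, then truncate to the 16 bits that fit. */
    uint16_t raw = (uint16_t)((unsigned)exit_arg << 8);

    /* Stored in a signed short (assumption: two's-complement wrap). */
    int16_t stored = (int16_t)raw;

    /* Widening to int copies the sign bit into the upper 16 bits. */
    return stored;
}
```

With this model an exit argument of 0x80 or above sets bit 15 of the stored value, so the widened result is negative, while masking with 0xFFFF recovers the documented bits.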
rvdp@cs.vu.nl (Ronald van der Pol) (08/20/90)
carroll@m.cs.uiuc.edu writes:
>sometimes returns negative values in *loc, e.g. 0xFFFFxxyy where xx and yy
>are the values you'd normally expect in the bottom 16 bits of the return value.
>TFM specifies the contents of the bottom 16 bits of *loc, but says absolutely
>nothing about the top 16 bits. The GNU-Emacs process internals assume that

	wait(loc)
	int *loc;

	loc = 0xHHHH.LLLL
	HHHH: low byte of child exit argument (0 on normal exit)
	LLLL: termination status of child process (0 on normal exit)

--
Ronald van der Pol <rvdp@cs.vu.nl>
carroll@m.cs.uiuc.edu (08/21/90)
/* Written 10:13 am Aug 20, 1990 by rvdp@cs.vu.nl in m.cs.uiuc.edu:comp.unix.i386 */
carroll@m.cs.uiuc.edu writes:
>sometimes returns negative values in *loc, e.g. 0xFFFFxxyy where xx and yy
>are the values you'd normally expect in the bottom 16 bits of the return value.

	wait(loc)
	int *loc;

	loc = 0xHHHH.LLLL
	HHHH: low byte of child exit argument (0 on normal exit)
	LLLL: termination status of child process (0 on normal exit)
/* End of text from m.cs.uiuc.edu:comp.unix.i386 */

I think that you are confused. You say "byte" but you use 4 hex digits,
which is _two_ bytes. TFM does not say "byte", it specifically says "8 bits",
which is 2 hex digits, so it would be 0xQQQQEETT with Q undefined, T
termination status, and E exit code.
rvdp@cs.vu.nl (Ronald van der Pol) (08/21/90)
carroll@m.cs.uiuc.edu writes:
|is 2 hex digits, so it would be 0xQQQQEETT with Q undefined, T termination
|status, and E exit code.

This is correct. When I wrote 'HHHH' I indeed meant 'HH'.

--
Ronald van der Pol <rvdp@cs.vu.nl>
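The 0xQQQQEETT layout agreed on here can be decoded by hand as follows. This is a minimal sketch: the function names are hypothetical, and the layout is the traditional one described in this thread rather than something every system guarantees.

```c
/*
 * Hypothetical decoders for a status word laid out as 0xQQQQEETT:
 * Q is undefined, EE is the low 8 bits of the exit argument, and TT is
 * the termination status. Working on an unsigned copy means the
 * undefined upper bits never affect the result.
 */
int wait_exit_code(int w)
{
    return (int)(((unsigned)w >> 8) & 0xFFu);
}

int wait_term_status(int w)
{
    return (int)((unsigned)w & 0xFFu);
}
```

Because both helpers mask after shifting, they return the same answer whether or not the upper 16 bits have been filled by sign extension.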
walter@mecky.UUCP (Walter Mecky) (08/25/90)
In article <70400015@m.cs.uiuc.edu> carroll@m.cs.uiuc.edu writes:
<
< I've
< recently tracked at least some of the crashes down to the fact that wait(loc)
< sometimes returns negative values in *loc, e.g. 0xFFFFxxyy where xx and yy
< are the values you'd normally expect in the bottom 16 bits of the return value.
< TFM specifies the contents of the bottom 16 bits of *loc, but says absolutely
< nothing about the top 16 bits.
As others noted, only the bottom 16 bits are defined, and it's possible
that the high-order bit is propagated. Some years ago I learned the hard
way that wait(2) really does store an int. I thought "It gives me 16
bits, so a short is enough", declared the "loc" variable as a short,
and had a hard time finding out why another variable kept changing
mysteriously ...

BTW: X/Open and POSIX define macros in <sys/wait.h> to access the parts
of "loc". SCO-UNIX has them too.
--
Walter Mecky [ walter@mecky.uucp or ...uunet!unido!mecky!walter ]
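For systems that provide the POSIX macros, a sketch of using them instead of hand-written masks might look like this. The helper name child_exit_status is hypothetical, and the example assumes a POSIX environment with fork(), waitpid(), WIFEXITED and WEXITSTATUS. Note that the status variable is a full int, not a short.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits immediately with `code`
 * and return the exit status the parent sees, or -1 on any failure or
 * if the child did not exit normally.
 */
int child_exit_status(int code)
{
    int status = 0;          /* must be an int, as wait() stores an int */
    pid_t pid = fork();

    if (pid < 0)
        return -1;           /* fork failed */
    if (pid == 0)
        _exit(code);         /* child: exit without flushing stdio */

    if (waitpid(pid, &status, 0) < 0)
        return -1;           /* wait failed */

    /* The macros look only at the defined bits of the status word. */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Only the low 8 bits of the exit argument reach the parent, so child_exit_status(0x80) reports 0x80 even on a system where the raw status word would come back sign-extended.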