[comp.unix.questions] Some questions

dannyb@kulcs.UUCP (Danny Backx) (01/20/88)

I have a few questions concerning XENIX.
We are running XENIX System V on a genuine IBM AT, equipped with two 30Mbyte
fixed disks, and 2.5Mbytes of RAM.

My first question concerns the second disk, which was only recently installed.
When XENIX boots, at some point the following is displayed:

| 		The IBM Personal Computer XENIX
| 			Version 2.00
| 		(c) Copyright IBM Corp. 1984, 1985
| 	(c) Copyright Microsoft Corp. 1983, 1984, 1985
| 
| Reserved Memory = 2K
| Kernel Memory = 176K
| Buffers = 100K
| User Memory = 2282K
| bad signature (B66D) on drive 1
| 
| Type Ctrl-d to proceed with normal startup,
| (or give root password for system maintenance):_
| 

My question is: what does the BAD SIGNATURE message really mean,
and how do I get it fixed?
I must say that the second disk is not currently in use for XENIX
(or anything else); we just put one large DOS partition on it for testing.
It seems to work just fine.
Also, the diagnostics program on the diagnostics diskette delivered with
the AT shows no errors.


My second question concerns the configuration of the XENIX kernel.
We are using the XENIX system as a development system for network drivers.
This means we are adding several device drivers to it (for PCnet, Ethernet,
and in the near future Token Ring), and we are currently adding gateway
software to the system.

Now we also use some XENIX System III (which is XENIX 1.) systems,
on which the same software is installed.

What I'd like to know more about is the kernel memory assignment.
In the configuration files, a lot of parameters are set, concerning
things such as kernel buffers for IPC, and for disk access (if I recall
correctly).

Does anybody know exactly what these parameters mean?
We don't use the IPC facilities such as messages or shared memory.
Do you know a way to get the buffers for these things out of our new
kernels?
Basically: is it safe to set some of these 'tunable parameters' to zero?

Please mail your answers directly to me.
I will summarize on the net.

Thanks everybody.

	Danny

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Danny Backx                            |  mail: Katholieke Universiteit Leuven 
 Tel: +32 16 200656 x 3537              |        Dept. Computer Science
 E-mail: dannyb@kulcs.UUCP              |        Celestijnenlaan 200 A
         ... mcvax!prlb2!kulcs!dannyb   |        B-3030 Leuven
         dannyb@kulcs.BITNET            |        Belgium     

ct@tcom.stc.co.uk (Clive Thomson) (06/28/90)

Hello,

I have recently started the long journey that will hopefully lead to UN*X
enlightenment, and have some questions I hope somebody will answer for me.

1) The document for the dup call says that it will return the lowest numbered
   file descriptor not used by the process. With the exception of one line
   in "The design and implementation of the BSD 4.3 UNIX operating system"
   (Leffler et al) I have seen no documentation to say open, creat and socket
   will do the same. Observation of open seems to suggest that the lowest fd
   is used, but I would like to be sure.

2) When I am doing socket programming (ULTRIX 3.0 and SunOS4), and I do a 
   bind, if the program terminates abnormally, I find that when I re-run the
   program the bind will fail with an "in use" error. Is there any way to
   convince the system that it is no longer "in use" (assuming, of course,
   that uid, gid, etc. are the same)?

3) I am a little confused by the "death of child signal". Is the following
   correct? If the parent ignores this signal, the kernel will release
   entries for zombie processes automatically. If the parent uses the default
   handler, it must wait() for the death of each child, or the child will
   become a zombie. If the parent invokes its own handler, in this handler
   a wait should be invoked, otherwise the child will become a zombie. If
   the parent dies before the children, all children are adopted by the init
   process, and the programmer need no longer worry about zombie processes.

Thanks for your time.

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+   Clive Thomson                                   ...!mcvax!ukc!stc!ct     +
+                                                      ct@tcom.stc.co.uk     +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

jik@athena.mit.edu (Jonathan I. Kamens) (06/29/90)

In article <1716@jura.tcom.stc.co.uk>, ct@tcom.stc.co.uk (Clive Thomson) writes:
|> 1) The document for the dup call says that it will return the lowest
|>    numbered file descriptor not used by the process. With the exception
|>    of one line in "The design and implementation of the BSD 4.3 UNIX
|>    operating system" (Leffler et al) I have seen no documentation to say
|>    open, creat and socket will do the same. Observation of open seems to
|>    suggest that the lowest fd is used, but I would like to be sure.

  All file descriptor allocation in the kernel works on the "use the
lowest available fd" principle.
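
  For example, here is a minimal test program (my own sketch, not from
any standard; it assumes /dev/null is present) that makes the behavior
easy to observe:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Demonstrate that open() returns the lowest free descriptor:
     * after closing fd 0, the next open() should come back as 0. */
    int main()
    {
        int fd;

        close(0);                          /* free the lowest descriptor */
        fd = open("/dev/null", O_RDONLY);  /* should be fd 0 */
        printf("open returned fd %d\n", fd);
        return 0;
    }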

|> 2) When I am doing socket programming (ULTRIX 3.0 and SunOS4), and I do a 
|>    bind, if the program terminates abnormally, I find that when I re-run the
|>    program the bind will fail with an "in use" error. Is there any way to
|>    convince the system that it is no longer "in use" (assuming of course 
|>    uid, gid etc are the same).

  I've noticed that the kernel sometimes gets confused about the state
of a socket which isn't being used anymore; a program exiting abnormally
is one way to cause this to happen (although it doesn't always
occur).  What ends up happening is that the socket stays around in
CLOSE_WAIT status so that no new connections can be made to it.

  Occasionally, the CLOSE_WAIT eventually goes away and it's once again
possible to bind the address.  However, if you don't want to wait and
see whether that will happen, and you don't want to have to reboot the
system in order to get the socket to go away, there is a way to force
the bind to succeed.

  What you need to do (at least in BSD; I don't know what happens with
things like this in SysV) is to use the setsockopt() call to set the
SO_REUSEADDR option on your new socket, before you attempt to bind the
address which is busy.

  Keep in mind that this option works for all socket connections, not
just the ones that are in CLOSE_WAIT, so if another program really is
using the address and you grab it again with SO_REUSEADDR set, you'll
get the address and the other program could very well lose.
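
  In case it helps, here is a minimal sketch of the idiom (my own
example, not from any particular program; the port number is made up):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <stdio.h>

    int main()
    {
        struct sockaddr_in addr;
        int s, on = 1;

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) { perror("socket"); return 1; }

        /* Set SO_REUSEADDR *before* the bind, not after. */
        if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
                       (char *)&on, sizeof(on)) < 0)
            perror("setsockopt");

        memset((char *)&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(7000);       /* hypothetical port */

        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }
        return 0;
    }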

|> 3) I am a little confused by the "death of child signal". Is the following
|>    correct? If the parent ignores this signal, the kernel will release
|>    entries for zombie processes automatically. If the parent uses the
|>    default handler, it must wait() for the death of each child, or the
|>    child will become a zombie. If the parent invokes its own handler, in
|>    this handler a wait should be invoked, otherwise the child will become
|>    a zombie. If the parent dies before the children, all children are
|>    adopted by the init process, and the programmer need no longer worry
|>    about zombie processes.

  Unfortunately, it's impossible to generalize about how the death of
child processes is handled, because the exact mechanism varies from one
flavor of Unix to another.  Perhaps someone who's "in the know" (or at
least more so than I am) about POSIX can tell us what the POSIX standard
behavior for this is, if there is any.

  First of all, by default, you have to do a wait() for child processes
under ALL flavors of Unix.  That is, there is no flavor of Unix that I
know of that will automatically flush child processes that exit unless
you explicitly tell it to do so.

  Second, allegedly, under some SysV-derived systems, if you do
"signal(SIGCHLD, SIG_IGN)", then child processes will be cleaned up
automatically, with no further effort on your part.  However, people
have told me that they've never seen this actually work; the best way to
find out whether it works at your site is to try it, although if you are
trying to write portable code, it's a bad idea to rely on this in any case.
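
  For what it's worth, here is one way to try it (my own sketch; it
assumes waitpid() and WNOHANG are available, which they are on any
POSIX system):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Test whether signal(SIGCHLD, SIG_IGN) auto-reaps children.
     * If it does, the child is already gone and waitpid() fails;
     * if not, the zombie is still there and waitpid() returns it. */
    int main()
    {
        int pid;

        signal(SIGCHLD, SIG_IGN);
        pid = fork();
        if (pid == 0)
            _exit(0);                /* child exits immediately */
        sleep(2);                    /* give the child time to die */
        if (waitpid(pid, (int *)0, WNOHANG) == pid)
            printf("child was a zombie: SIG_IGN does not reap here\n");
        else
            printf("no zombie found: SIG_IGN appears to reap here\n");
        return 0;
    }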

  If you can't use SIG_IGN to force automatic clean-up, then you've got
to write a signal handler to do it.  It isn't easy at all to write a
signal handler that does things right on all flavors of Unix, because of
the following inconsistencies:

  On some flavors of Unix, the SIGCHLD signal handler is called if one
*or more* children have died.  This means that if your signal handler
only does one wait() call, then it won't clean up all of the children.
Fortunately, I believe that all Unix flavors for which this is the case
make the wait3() call available to the programmer; wait3() supports the
WNOHANG option, which lets you check whether there are any children
waiting to be cleaned up.  Therefore, on any system that has wait3(),
your signal handler should call wait3() over and over again with the
WNOHANG option until there are no children left to clean up.

  On SysV-derived systems, SIGCHLD signals are regenerated if there are
child processes still waiting to be cleaned up after you exit the
SIGCHLD signal handler.  Therefore, on most SysV systems it's safe to
assume, when the signal handler gets called, that you only have to
clean up one child, and that the handler will get called again if there
are more to clean up after it exits.

  On older systems, signal handlers are automatically reset to SIG_DFL
when the signal handler gets called.  On such systems, you have to put
"signal(SIGCHLD, catcher_func)" (where "catcher_func" is the name of
the handler function) as the first thing in the signal handler, so that
the handler gets re-installed.  Unfortunately, there is a race condition
which may cause you to get a SIGCHLD signal and have it ignored between
the time your handler gets called and the time you reset the signal.
Fortunately, newer implementations of signal() don't reset the handler
to SIG_DFL when the handler function is called.

  The summary of all this is that on systems that have wait3(), you
should use that and your signal handler should loop, and on systems that
don't, you should have one call to wait() per invocation of the signal
handler.  Also, if you want to be 100% safe, the first thing your
handler should do is reset the handler for SIGCHLD, even though it isn't
necessary to do this on most systems nowadays.
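
  To make that concrete, here is a sketch of such a handler (my own
code, not from any particular system; HAVE_WAIT3 is a hypothetical
feature-test macro you would define on systems that provide wait3()):

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <signal.h>

    void reaper(int sig)
    {
        /* Re-install first, for old systems that reset the handler
         * to SIG_DFL on delivery (this only narrows the race
         * described above; it doesn't close it). */
        signal(SIGCHLD, reaper);

    #ifdef HAVE_WAIT3
        /* One signal may announce several dead children: loop with
         * WNOHANG until none are left. */
        while (wait3((int *)0, WNOHANG, (struct rusage *)0) > 0)
            ;
    #else
        /* SysV regenerates SIGCHLD while children remain, so one
         * wait() per invocation suffices. */
        wait((int *)0);
    #endif
    }

You would install it once, near the top of your program, with
"signal(SIGCHLD, reaper)".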

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

cpcahil@virtech.uucp (Conor P. Cahill) (06/29/90)

In article <1716@jura.tcom.stc.co.uk> ct@tcom.stc.co.uk (Clive Thomson) writes:
>
>1) The document for the dup call says that it will return the lowest numbered
>   file descriptor not used by the process. With the exception of one line
>   in "The design and implementation of the BSD 4.3 UNIX operating system"
>   (Leffler et al) I have seen no documentation to say open, creat and socket
>   will do the same. Observation of open seems to suggest that the lowest fd
>   is used, but I would like to be sure.

Zillions of unix commands would be broken if open/creat did not return the
lowest file descriptor available.  All of the std{in|out|err} redirection
code requires this behavior.
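
For instance, the classic shell redirection idiom only works because of
it.  A sketch (hypothetical helper, meant to run in the child after a
fork()):

    #include <fcntl.h>
    #include <unistd.h>

    /* Implements "prog > file": after closing fd 1, the open() is
     * guaranteed to return 1, so it becomes the new stdout. */
    void redirect_stdout_and_exec(char *file, char *prog, char **argv)
    {
        close(1);                                       /* free stdout */
        open(file, O_WRONLY | O_CREAT | O_TRUNC, 0666); /* becomes fd 1 */
        execv(prog, argv);                              /* writes to file */
    }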

>3) I am a little confused by the "death of child signal". Is the following
>   correct? If the parent ignores this signal, the kernel will release
>   entries for zombie processes automatically. If the parent uses the default
>   handler, it must wait() for the death of each child, or the child will
>   become a zombie. If the parent invokes its own handler, in this handler
>   a wait should be invoked, otherwise the child will become a zombie. If
>   the parent dies before the children, all children are adopted by the init
>   process, and the programmer need no longer worry about zombie processes.

If you ignore the signal, a child just goes away, never becoming a zombie.
If you default the signal, a child becomes a zombie until you wait for it.
If you catch the signal, a child becomes a zombie until you wait for it, and
you are sent a signal when the child exits (and then should wait on the 
child).

For most cases, a programmer does not need to worry about zombie processes 
unless she is writing a program that will spawn many children, or will be
a long-running program that will spawn children every once in a while.  The
key is the number of children that can be reasonably expected during a 
single iteration of the parent and the life of the parent itself.  

If you have low numbers of children (< 5 or so) and/or a short-lived
parent (< 1 hr or so), you normally do not have to worry about zombies.

The safest thing to do is to determine whether the child actually has
to run asynchronously, or whether the parent can just wait for it to
finish.  If the parent can't wait, use a signal handler; otherwise, use
a wait() following the fork and exec, or wherever it is appropriate.

In general, zombies are not a problem for a system: your system will
probably handle zillions of processes that exit over the next week or
so, and 99.99999% of them will become zombies (at least momentarily).

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

jik@pit-manager.mit.edu (Jonathan I. Kamens) (07/02/90)

  One final note about SIGCHLD signal handlers.... Don Libes has
informed me in E-mail that SunOS 4 is one of the operating systems
under which signal(SIGCHLD, SIG_IGN) will cause dying child processes
to be cleaned up automatically.

  The people with whom I've discussed this in the past have implied
that this feature is a "SysV-derived" feature, but since SunOS is
BSD-derived (or, at least, I *thought* it was), perhaps it's no longer
safe to make that generalization.  I guess the only way to generalize
is to say that vendors which have decided to put this feature in have
done so, and those which haven't, haven't -- check your manual for
more information, or write a program to test it :-).

  However, it's probably still not a good idea to rely on this if
you're trying to write portable code -- you should install a signal
handler of your own, using wait3 if it's available, or wait if it
isn't (or wait4, I guess :-).

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

libes@cme.nist.gov (Don Libes) (07/03/90)

In article <1990Jul1.213022.26393@athena.mit.edu> jik@pit-manager.mit.edu (Jonathan I. Kamens) writes:
>  One final note about SIGCHLD signal handlers.... Don Libes has
>informed me in E-mail that SunOS 4 is one of the operating systems
>under which signal(SIGCHLD, SIG_IGN) will cause dying child processes
>to be cleaned up automatically.
>
>  The people with whom I've discussed this in the past have implied
>that this feature is a "SysV-derived" feature, but since SunOS is
>BSD-derived (or, at least, I *thought* it was), perhaps it's no longer
>safe to make that generalization.  I guess the only way to generalize
>is to say that vendors which have decided to put this feature in have
>done so, and those which haven't, haven't -- check your manual for
>more information, or write a program to test it :-).

Guy Harris has informed me that I was wrong about signal(SIGCHLD,SIG_IGN)
reaping child processes automatically on SunOS.  He is correct.

I had the unfortunate luck to be calling a subroutine that someone
else wrote which did the old wait in a loop trick, making me believe
that this long-standing behavior had been changed on my system.

Jonathan was right, originally.  SunOS 4.1 works the way that BSD
systems have worked all along.  Sorry, Jonathan.

Don Libes          libes@cme.nist.gov      ...!uunet!cme-durer!libes

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/04/90)

In article <4919@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes:
>Guy Harris has informed me that I was wrong about signal(SIGCHLD,SIG_IGN)
>reaping child processes automatically on SunOS.  He is correct.
>I had the unfortunate luck to be calling a subroutine that someone
>else wrote which did the old wait in a loop trick, making me believe
>that this long-standing behavior had been changed on my system.

I can't tell from the sparse information, but if you were using the
System V environment it is possible you were getting an emulation of
the System V SIGCLD behavior.  I know it was present in my original
System V emulation for 4BSD, even though I personally think it is a
horrible design and that the originator of that kludge in UNIX System
V should be hanged by its thumbs.

guy@auspex.auspex.com (Guy Harris) (07/05/90)

>I can't tell from the sparse information, but if you were using the
>System V environment it is possible you were getting an emulation of
>the System V SIGCLD behavior.

He wasn't; we didn't put that into the S5 environment in SunOS.  The
problem lay elsewhere....