[comp.unix.wizards] Monitoring your nameserver

david@twg.com (David S. Herron) (08/17/90)

In article <9008141525.AA27754@sci.ccny.cuny.edu> dan@SCI.CCNY.CUNY.EDU (Dan Schlitt) writes:
>Subject: How do YOU tell if named has died?

>So how do folks arrange to get automatic notification in a timely way
>when their nameserver software dies?  Answers for diverse hardware
>running unix for me, but others may be interested in other cases.

A quick hack would be a cron job that runs every so often and either
checks for the existence of critical processes and starts up any that
are missing, or just starts them all and lets the processes fight over
how many of which kind are to be running.  Buuuut..
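
A minimal sketch of the first variant (check, then start if missing),
in Python for brevity.  It assumes the daemon writes its pid to a file;
the pidfile path and start command in the closing comment are
hypothetical, and the `spawn' parameter exists only so the logic can be
exercised without really launching anything.

```python
import os
import subprocess

def read_pid(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or garbage."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None
    return pid if pid > 0 else None

def is_alive(pid):
    """Signal 0 delivers nothing; it only checks that the pid exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def check_and_restart(pidfile, start_cmd, spawn=subprocess.Popen):
    """Start the daemon if it is not running; return True if a start was made."""
    pid = read_pid(pidfile)
    if pid is not None and is_alive(pid):
        return False
    spawn(start_cmd)
    return True

# Hypothetical use from a script that cron runs every few minutes:
#   check_and_restart("/etc/named.pid", ["/etc/named"])
```

Note the weakness: a stale pidfile whose pid has been reused by some
other process makes the daemon look alive, which is part of why this is
only a quick hack.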

There's a generic problem with the way daemons are done in Unix,
and the issue goes beyond `name service'.  The daemons are
processes spun off into the background and not watched after.
[So therefore I'm cross-posting to unix-wizards..]  If `cron' dies
the system is just as crippled, though in a different way.  And
random people are just as likely to notice cron dying as they
are to notice named dying now.

Something on my long and varied list of Things To Do (but haven't done
yet) is to write a program (name: respawn, or daemond) which watches
after generic processes.  As opposed to init which is suited to
watching after /etc/getty's.

This process will somehow take a list of processes to watch after.
It will be the parent of all those processes, so that it will be notified
of them dying ..  It will have a number of actions it can do when
the process dies, like wait awhile before starting a new copy, start
one immediately, start one under some condition, etc.
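
A rough sketch of such a watcher, in Python.  It is only an
illustration of the idea above, not an existing program: the names
`Entry' and `supervise' and the per-entry knobs (`delay', `restart_if',
`max_starts') are made up for the example.  The watcher is the parent of
every process it starts, so os.wait() tells it when one dies.

```python
import os
import time

class Entry:
    """One watched process plus its (hypothetical) restart policy."""
    def __init__(self, argv, delay=0.0, restart_if=lambda code: True,
                 max_starts=None):
        self.argv = argv              # program and arguments
        self.delay = delay            # seconds to wait before a restart
        self.restart_if = restart_if  # condition on the exit code
        self.max_starts = max_starts  # None means no limit
        self.starts = 0

def supervise(entries):
    """Start every entry and restart them per policy until none are left."""
    running = {}                      # pid -> Entry

    def start(entry):
        pid = os.spawnvp(os.P_NOWAIT, entry.argv[0], entry.argv)
        entry.starts += 1
        running[pid] = entry

    for entry in entries:
        start(entry)

    while running:
        pid, status = os.wait()       # blocks until some child dies
        entry = running.pop(pid, None)
        if entry is None:
            continue                  # a child we did not start
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)   # killed by a signal
        if entry.max_starts is not None and entry.starts >= entry.max_starts:
            continue
        if not entry.restart_if(code):
            continue
        time.sleep(entry.delay)       # simplistic: holds up other restarts
        start(entry)
```

A real version would not sleep in the main loop (it delays noticing
other deaths) and would want some protection against a daemon that dies
instantly every time it is started.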

This is different from init in that init is rather specific to
watching after getty's.  Even the SysV version of init .. though
the configurability of /etc/inittab gets close to what I have in mind.

This is different from inetd in that inetd is specific to network
services.  `cron' is not a network service, yet it also needs to
be watched over in this way.  Also inetd is suited to a situation
where it starts up a fresh process for each connection -- in the
particular case of named this is bad because named needs to be
running all the time.

At the moment we're relying on the hopeful assumption of a lack of
bugs in these background daemons.  (Where's some wood to knock on??)
-- 
<- David Herron, an MMDF weenie, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<- Sign me up for one "I survived Jaka's Story" T-shirt!

boyd@necisa.ho.necisa.oz (Boyd Roberts) (08/20/90)

In article <7769@gollum.twg.com> david@twg.com (David S. Herron) writes:
>
>This process will somehow take a list of processes to watch after.
>It will be the parent of all those processes, so that it will be notified
>of them dying ..  It will have a number of actions it can do when
>the process dies, like wait awhile before starting a new copy, start
>one immediately, start one under some condition, etc.
>

It's been done already.  Back in '83 or so Tim Long% at Sydney Uni
Comp Sci rewrote init so it was far more flexible as a general
purpose daemon controller.

He had a file /etc/procs with entries like this:

tty-console	/etc/login@ peb1200 /dev/console
netd-basser40	/usr/spool/ACSnet/_lib/NNdaemon -I basser40
skulk		/etc/skulk

The first field was a handle for the process and the other fields
were the program to run and its arguments.  All daemons were started
by init and a naming convention was used so that a group of related
processes could be controlled easily.
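
For illustration, a table in that shape could be read like this (a
Python sketch; the real /etc/procs syntax beyond "handle, then program
and arguments" is not known here, so quoting and comment handling are
left out, and `parse_procs' is a made-up name).  The `@' after
/etc/login in the listing above is a footnote marker and is not part of
the entry.

```python
def parse_procs(text):
    """Map each handle to its argv list: first field is the handle,
    the remaining whitespace-separated fields are program + arguments."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue                  # blank line or handle with no program
        handle, argv = fields[0], fields[1:]
        table[handle] = argv
    return table

SAMPLE = (
    "tty-console\t/etc/login peb1200 /dev/console\n"
    "netd-basser40\t/usr/spool/ACSnet/_lib/NNdaemon -I basser40\n"
    "skulk\t\t/etc/skulk\n"
)
```

With the naming convention, every handle starting with `tty-' or `netd-'
forms a group that can be picked out by prefix.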

There was no concept of init `state'.  But you could interrogate
init and ask it what was going on.  To interrogate it you used
a program called `toinit':

    toinit <command> <regular-expressions...>

The commands were (from what I can remember):

    start	- start it
    stop	- SIGTERM it and don't restart it
    kill	- SIGTERM then SIGKILL
    curtail	- don't restart it when it dies
    status	- tell me what the state of the world is
    scanprocs	- re-read /etc/procs and incorporate any changes
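
As a sketch of what `kill' (SIGTERM, then SIGKILL) might look like from
the parent's side, in Python: the grace period between the two signals
is an assumption, since how long the original waited is not recalled
here, and `term_then_kill' is a made-up name.  It only works on a child
of the caller, because it uses waitpid() to see whether the process has
gone.

```python
import os
import signal
import time

def term_then_kill(pid, grace=2.0, poll=0.05):
    """Ask a child to exit with SIGTERM; if it is still around after
    `grace' seconds, force it with SIGKILL.  Returns the name of the
    last signal that had to be sent."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return "SIGTERM"          # it went quietly
        time.sleep(poll)
    os.kill(pid, signal.SIGKILL)      # cannot be caught or ignored
    os.waitpid(pid, 0)                # reap it
    return "SIGKILL"
```

`stop' would be the same SIGTERM plus a note not to restart the entry,
and `curtail' only the note with no signal at all.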

The regular expressions were matched against the first /etc/procs field
(the handle for the process) and the appropriate action was taken
on any of the matches.
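
The selection step can be sketched like this in Python.  Whether the
original anchored its patterns to the whole handle is not stated; the
`tty-.*' example further down suggests it, so the sketch assumes
whole-handle matching.  `select' is a made-up name.

```python
import re

def select(handles, patterns):
    """Return the handles that fully match at least one pattern,
    in their original order."""
    compiled = [re.compile(p) for p in patterns]
    return [h for h in handles if any(c.fullmatch(h) for c in compiled)]

HANDLES = ["tty-console", "netd-basser40", "skulk"]
```

So something like `toinit stop tty-.*' would act on tty-console alone,
while `netd-.* skulk' would pick up the other two entries.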

There were special entries in /etc/procs for a single user shell on the
console for boot & shutdown.  Startup was just a script that had the
appropriate mounts and then a large `toinit start ...'.  Shutdown
was just a `toinit stop tty-.* ...' and then some magic (I forget)
to get a single user shell on the console (these machines were 32V
VAX 11/780's).

There were some bugs, but we fixed them and hacked in some more
magic for auto-reboots.  The `magic' was usually just an `rc'-like
script that did the right things and then told init to start
the appropriate stuff.

With this approach you could control a _single_ entry, unlike the
ghastly mess that is System V's /etc/inittab.  The IPC between
`toinit' and `init' was a bit messy, but with a mounted process
stream implementation (was this ever done, John?) it can be
done really cleanly.


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

-------
% Bruce Ellis, Piers Lauder, John Mackin, Chris Maltby and myself
  added mods and bug fixes over the years.

@ getty/login were re-written into /etc/login.  /bin/login was unlinked.