[comp.unix.wizards] Please don't redefine system calls!

roy@phri.UUCP (Roy Smith) (02/10/89)

	I sat down on my SunOS-3.5 system to do some rwhod hacking (*) today
and ran across something which I consider pretty gross.  Like many daemons,
rwhod puts itself in the background by forking and having the parent exit.
To make it easier (read: possible) to debug, if you #define DEBUG, the fork
code isn't compiled in and rwhod runs in the foreground.  Since I was
debugging it, I #defined DEBUG and started single-stepping the process in
dbx.  Much to my surprise, when it got up to

	sp = getservbyname("who", "udp");

I started getting, every 5 seconds:

	sendto 7f000001.111
	hostname    up     0:00
	load 0.02, 0.17, 0.00

	A bit of headscratching revealed what was going on.  Once a minute,
rwhod gathers up some system statistics and broadcasts them with a sendto()
call.  If you #define DEBUG, rwhod.c includes its own sendto() routine which,
instead of actually sending out a packet, just prints some stuff on your
terminal.  It would seem that the author of rwhod never realized that some
library routines might also call sendto() internally, and that the redefined
version would break them.  In this case, the culprit was a Yellow Pages-based
version of getservbyname().  That makes it kind of hard to actually debug the
program.  I'm surprised Sun never ran across this problem before (I was
working from the 3.2 rwhod.c, not having the 3.5 sources available).

	The moral of the story is that you shouldn't redefine system calls.
If you really need some other behavior from, say, sendto(), put a
"#define sendto my_sendto" at the beginning of your source file.  The macro
renames only the calls in your own code; library routines were compiled
separately, so they still get the regular sendto() they were expecting.
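A minimal sketch of the rename trick follows, assuming a present-day POSIX system with ANSI-prototyped headers. The debug routine's output format merely imitates the lines shown earlier (7f000001.111 decodes as 127.0.0.1, port 111, the portmapper's port), and broadcast_status() is a hypothetical stand-in for rwhod's broadcast code. One wrinkle that prototypes introduce: in the sketch the #define follows the system #includes, because placed before them it would also rename the header's own declaration of sendto(), which would then clash with a static my_sendto().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Debugging stand-in: print where a packet would have gone instead of
   sending it.  The name my_sendto is the one suggested above. */
static ssize_t my_sendto(int s, const void *buf, size_t len, int flags,
                         const struct sockaddr *to, socklen_t tolen)
{
    (void)s; (void)buf; (void)flags; (void)tolen;
    if (to != NULL && to->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)to;
        printf("sendto %08lx.%u\n",
               (unsigned long)ntohl(sin->sin_addr.s_addr),
               (unsigned)ntohs(sin->sin_port));
    }
    return (ssize_t)len;    /* pretend the whole packet went out */
}

/* In rwhod this would sit inside #ifdef DEBUG.  From here to the end of
   this file, sendto(...) is textually rewritten to my_sendto(...);
   already-compiled library code such as getservbyname() still calls the
   real sendto(). */
#define sendto my_sendto

/* Hypothetical caller standing in for rwhod's once-a-minute broadcast. */
static ssize_t broadcast_status(int s, const char *msg,
                                const struct sockaddr_in *to)
{
    assert(msg != NULL && to != NULL);
    return sendto(s, msg, strlen(msg), 0,
                  (const struct sockaddr *)to, sizeof *to);
}
```

Calling broadcast_status() on a bogus descriptor such as -1 with a 127.0.0.1, port 111 destination prints "sendto 7f000001.111" and reports success, where the real sendto() would have failed. Code in other files, including the C library, is untouched.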

----------------

(*) Just why was I hacking on rwhod, you ask?  As documented by Sun, rwho is a
real performance pig.  With N machines running rwhod on your net, every
machine hears every other machine's broadcast, so you get roughly N^2 packets
received each minute.  With lots of diskless clients, that means your NFS
servers spend all their time servicing the clients' requests to write their
/usr/spool/rwho files.  The result is that you know what machines are up, but you can't do
anything useful on any of them.  The N^2 effect isn't so bad when you've got
15 or 20 machines, but it kills you when you've got hundreds.  My idea was to
make rwhod write into /usr/lib/rwho instead of /usr/spool/rwho.  Each of the
diskless clients would run an rwhod that sent out status packets but didn't
listen for any.  Each file server would run a normal rwhod which sent out
stats, listened for status packets, and wrote them to /usr/lib/rwho.
The diskless clients would NFS-mount /usr/lib/rwho, and rwho and ruptime
would know to look there instead of /usr/spool/rwho for their data.  The end
result is that every machine would have a functioning rwho and ruptime, but
the load of listening for and writing out the rwho broadcast packets would be
reduced by an order of magnitude (roughly the ratio of machines to file
servers), not to mention the secondary savings of the servers not having to
constantly page rwhod in and out on each of the clients.
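The N^2 arithmetic above can be made concrete with a back-of-the-envelope model. It assumes (an assumption, not a measurement) that every machine broadcasts one status packet a minute and that each listening machine processes and writes out every packet except its own; packets_per_minute() is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical model: `machines` hosts each broadcast one packet per
   minute; each of `listeners` hosts processes (and writes out) every
   packet except its own. */
static unsigned long packets_per_minute(unsigned long machines,
                                        unsigned long listeners)
{
    assert(listeners <= machines);
    if (machines == 0)
        return 0;
    return listeners * (machines - 1);  /* N(N-1), i.e. about N^2, when all listen */
}
```

With an illustrative 200 machines all listening, that is 200 x 199 = 39,800 packets handled per minute; if only 10 file servers listen, it drops to 10 x 199 = 1,990, a factor of 20, which is the ratio of machines to listeners.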
-- 
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"