[comp.sys.sun] Sun YP brain-damage and cure

trinkle@cs.purdue.edu (02/16/90)

Ever since we converted to SunOS 4.0, we have had a bad problem rebooting
diskless clients.  For about the first 5-10 minutes after reboot, it is
almost impossible to log in.  Even localdisk machines exhibit the same
behavior, but usually a bit faster.  Incredibly enough, this problem is
due almost entirely to a big mistake made in the getservbyname() library
routine.

This problem actually started some time back, in SunOS 3.2 or 3.4 (maybe
earlier).  There used to be two YP maps for services entries, one
called services.byname and one called services.byport (or maybe bynumber,
it is not that important).  One was keyed by port/proto and the other was
supposed to be keyed by name/proto.  Unfortunately, the
/usr/etc/yp/Makefile target for building services.byname had a mistake in
it that caused the key to be incorrect.  It was either just the name or
port/proto, I don't remember exactly.  I corrected the Makefile and it was
just fine.  However, it was not Sun code that had trouble with the
incorrectly keyed YP map; it was a uVAX-II running YP.  It turns out
that the Sun getservbyname() routine was roughly

	if (ypmatch("name/proto", "services.byname")) {
		return success;
	} else {
		while ((se = getservent()) != NULL) {
			if (match())
				return success;
		}
	}

so the routine was quietly failing each time and resorting to a painfully
slow YPPROC_FIRST, YPPROC_NEXT loop of getting service entries.
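To make the cost difference concrete, here is a minimal sketch that counts
simulated round trips for the two strategies.  It is not Sun's code: the
table, fake_match(), fake_next(), and the lookup_* functions are all
hypothetical stand-ins, and each call to a fake_* function represents one
RPC to the YP server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the services data on the YP server. */
struct svc_entry {
	const char *name;
	const char *proto;
	int port;
};

static const struct svc_entry svc_table[] = {
	{ "ftp",    "tcp", 21 },
	{ "telnet", "tcp", 23 },
	{ "smtp",   "tcp", 25 },
	{ "time",   "udp", 37 },
	{ "login",  "tcp", 513 },
	{ "shell",  "tcp", 514 },
};
#define SVC_COUNT (sizeof svc_table / sizeof svc_table[0])

/* Number of simulated round trips to the server so far. */
int rpc_calls;

/* Stand-in for one YPPROC_FIRST/YPPROC_NEXT step: one call per entry. */
static const struct svc_entry *fake_next(size_t i)
{
	rpc_calls++;
	return i < SVC_COUNT ? &svc_table[i] : NULL;
}

/* Stand-in for YPPROC_MATCH: one call, the server does the searching. */
static const struct svc_entry *fake_match(const char *name, const char *proto)
{
	size_t i;

	rpc_calls++;
	for (i = 0; i < SVC_COUNT; i++) {
		if (strcmp(svc_table[i].name, name) == 0 &&
		    strcmp(svc_table[i].proto, proto) == 0)
			return &svc_table[i];
	}
	return NULL;
}

/* Enumeration strategy: walk the whole map until something matches. */
const struct svc_entry *lookup_by_scan(const char *name, const char *proto)
{
	const struct svc_entry *se;
	size_t i = 0;

	assert(name != NULL && proto != NULL);
	while ((se = fake_next(i++)) != NULL) {
		if (strcmp(se->name, name) == 0 && strcmp(se->proto, proto) == 0)
			return se;
	}
	return NULL;
}

/* Keyed strategy: a single match request against a name/proto map. */
const struct svc_entry *lookup_by_match(const char *name, const char *proto)
{
	assert(name != NULL && proto != NULL);
	return fake_match(name, proto);
}
```

With this toy table, finding the last entry by scanning costs one simulated
call per entry, while the keyed lookup costs one call no matter how long
the table is.  A real /etc/services has far more entries, and every one of
those calls is a network round trip.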

Well, under SunOS 4.0 they got rid of this problem in a clever way - they
removed the initial part of the if statement, just used the while loop,
and did away with the services.byname map.  Ah, you say, there is a
services.byname map.  Well, that is what they call it, but it is NOT
services.byname, it is services.byport.  The initial analysis of this
brilliant design decision is that no one at Sun could figure out how to
make the Makefile target generate a correct map!

Unfortunately, when a machine boots, most of its YP traffic is doing
getservbyname on every single service for which it starts a daemon or for
which inetd listens.  By the way, this same behavior also exists in
getrpcbyname(), but it was not worth building a new map for a file that
only has 30 entries.  getrpcbyname() is also not used nearly as much as
getservbyname(), which is called every time you do an rsh, rlogin,
telnet, ftp ... well, a lot of stuff.

I have a patch to /var/yp/Makefile and /usr/src/lib/libc/net/getservent.c
to actually make and use a services.byname map.  Note, however, two
gotchas.  First, the comment field of the /etc/services line must begin
with a "# " because the # must be a separate field to awk.  Second, note
that I call the new map services.byport instead of the correct
services.byname.  This is because other vendors have blindly adopted Sun's
code without looking at it very closely, and consequently they also use
the services.byname map to do the getservbyport() routine.  To prevent
them from looping trying to do getservbyport() calls, it is best to keep
the old map the same.
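The first gotcha is easier to see with a small sketch of whitespace field
splitting.  This is not the awk from my Makefile patch; services_key() is a
hypothetical helper, and it assumes the map builder treats every field
after port/proto as an alias until it sees a field that is exactly "#".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split one /etc/services line on whitespace (as
 * awk does by default), write the "name/proto" key into key, and return
 * the number of fields taken as aliases before a lone "#" field.
 * Returns -1 for comment lines and malformed lines.
 */
int services_key(const char *line, char *key, size_t keylen)
{
	char buf[256];
	char *name, *portproto, *proto, *tok;
	int aliases = 0;

	assert(line != NULL && key != NULL && keylen > 0);

	/* strtok() modifies its input, so work on a private copy. */
	strncpy(buf, line, sizeof buf - 1);
	buf[sizeof buf - 1] = '\0';

	name = strtok(buf, " \t\n");
	portproto = strtok(NULL, " \t\n");
	if (name == NULL || name[0] == '#' || portproto == NULL)
		return -1;

	proto = strchr(portproto, '/');
	if (proto == NULL)
		return -1;
	proto++;			/* skip the '/' */

	snprintf(key, keylen, "%s/%s", name, proto);

	/* Everything up to a field that is exactly "#" counts as an alias. */
	while ((tok = strtok(NULL, " \t\n")) != NULL) {
		if (strcmp(tok, "#") == 0)
			break;
		aliases++;
	}
	return aliases;
}
```

Under that assumption, a line whose comment starts with "# " yields only
the real aliases, while a comment written as "#no passwords used" is never
recognized as a comment and its words are counted as aliases too.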

If you are a source site and want the patches, please let me know.  I
would also be very happy to have the getservent.o installed in the
libc.pic on UUNET, but I don't know about the legalities.  I hope Sun will
take the patches, and also allow us to somehow or other distribute correct
binaries.  This fix makes rebooting every time processes get hung in
device wait a bit easier to tolerate (but not much).

For those that might be interested, I also made a hack patch to etherfind
so that it will print out the YP map name for YPPROC_{MATCH,FIRST,NEXT}
RPC packets (when using -r).  Again, if you are a source site and
interested in these patches, please let me know.

Daniel Trinkle			trinkle@cs.purdue.edu
Dept. of Computer Sciences	{backbone}!purdue!trinkle
Purdue University		317-494-7844
West Lafayette, IN 47907