trinkle@cs.purdue.edu (02/16/90)
Ever since we converted to SunOS 4.0, we have had a bad problem rebooting diskless clients. For the first 5-10 minutes after a reboot, it is almost impossible to log in. Even machines with local disks exhibit the same behavior, though they usually recover a bit faster. Incredibly enough, this problem is due almost entirely to a big mistake made in the getservbyname() library routine.

This problem actually started some time back, in SunOS 3.2 or 3.4 (maybe earlier). There used to be two YP maps for services entries, one called services.byname and one called services.byport (or maybe bynumber, it is not that important). One was keyed by port/proto and the other was supposed to be keyed by name/proto. Unfortunately, the /usr/etc/yp/Makefile target for building services.byname had a mistake in it that caused the key to be incorrect. It was either just the name or port/proto, I don't remember exactly. I corrected the Makefile and it was just fine. However, it was not Sun code that had trouble with the incorrectly keyed YP map; it was a uVAX-II running YP. It turns out that the Sun getservbyname routine was roughly

    if (ypmatch("name/proto", "services.byname")) {
        return success;
    } else {
        while (se = getservent()) {
            if (match())
                return success;
        }
    }

so the routine was quietly failing each time and resorting to a painfully slow YPPROC_FIRST, YPPROC_NEXT loop of getting service entries.

Well, under SunOS 4.0 they got rid of this problem in a clever way: they removed the initial part of the if statement, just used the while loop, and did away with the services.byname map. Ah, you say, there is a services.byname map. Well, that is what they call it, but it is NOT services.byname, it is services.byport. The initial analysis of this brilliant design decision is that no one at Sun could figure out how to make the Makefile target generate a correct map!
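To make the cost difference concrete, here is a minimal sketch of the two lookup strategies described above. It makes no real YP calls. The table, struct svc, lookup_by_scan() and lookup_by_key() are all hypothetical names invented for this illustration, and the "trips" counter only simulates one network round trip per YPPROC_MATCH, YPPROC_FIRST or YPPROC_NEXT request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the services data held by a YP server. */
struct svc {
    const char *name;
    const char *proto;
    int port;
};

static const struct svc table[] = {
    {"ftp",    "tcp", 21},
    {"telnet", "tcp", 23},
    {"smtp",   "tcp", 25},
    {"tftp",   "udp", 69},
    {"login",  "tcp", 513},
    {"shell",  "tcp", 514},
};
#define NSVC (sizeof table / sizeof table[0])

/* Enumeration, as in the getservent() loop: every entry examined
 * costs one simulated round trip (YPPROC_FIRST, then YPPROC_NEXT). */
const struct svc *lookup_by_scan(const char *name, const char *proto, int *trips)
{
    assert(trips != NULL);
    *trips = 0;
    for (size_t i = 0; i < NSVC; i++) {
        (*trips)++;
        if (strcmp(table[i].name, name) == 0 &&
            strcmp(table[i].proto, proto) == 0)
            return &table[i];
    }
    return NULL;
}

/* Keyed match: the client builds a "name/proto" key and sends a single
 * request (YPPROC_MATCH). The loop below only stands in for the server's
 * own indexed lookup, so it does not add to the trip count. */
const struct svc *lookup_by_key(const char *name, const char *proto, int *trips)
{
    char key[64], entry[64];
    int n;

    assert(trips != NULL);
    *trips = 1;
    n = snprintf(key, sizeof key, "%s/%s", name, proto);
    if (n < 0 || (size_t)n >= sizeof key)
        return NULL;                    /* key would not fit */
    for (size_t i = 0; i < NSVC; i++) {
        snprintf(entry, sizeof entry, "%s/%s", table[i].name, table[i].proto);
        if (strcmp(entry, key) == 0)
            return &table[i];
    }
    return NULL;
}
```

With this six-entry table, finding shell/tcp costs six simulated round trips by scan and one by key. A real services file is much longer, and a failed scan always walks the whole map.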
Unfortunately, when a machine boots, most of its YP traffic comes from calling getservbyname() on every single service for which it starts a daemon or for which inetd listens. getservbyname() is also used every time you do an rsh, rlogin, telnet, ftp ... well, a lot of stuff.

By the way, this same behavior also exists in getrpcbyname(), but it was not worth building a new map for a file that only has 30 entries. Also, it is not used as much as getservbyname().

I have a patch to /var/yp/Makefile and /usr/src/lib/libc/net/getservent.c to actually make and use a services.byname map. Note, however, two gotchas. First, the comment field of the /etc/services line must begin with a "# " because the # must be a separate field to awk. Second, note that I call the new map services.byport instead of the correct services.byname. This is because other vendors have blindly adopted Sun's code without looking at it very closely, and consequently they also use the services.byname map to implement the getservbyport() routine. To prevent them from looping while trying to do getservbyport() calls, it is best to keep the old map the same.

If you are a source site and want the patches, please let me know. I would also be very happy to have the getservent.o installed in the libc.pic on UUNET, but I don't know about the legalities. I hope Sun will take the patches, and also allow us to somehow or other distribute correct binaries. This fix makes rebooting every time processes get hung in device wait a bit easier to tolerate (but not much).

For those that might be interested, I also made a hack patch to etherfind so that it will print out the YP map name for YPPROC_{MATCH,FIRST,NEXT} RPC packets (when using -r). Again, if you are a source site and interested in these patches, please let me know.

Daniel Trinkle                  trinkle@cs.purdue.edu
Dept. of Computer Sciences      {backbone}!purdue!trinkle
Purdue University               317-494-7844
West Lafayette, IN 47907