Sun-Spots-Request@RICE.EDU (William LeFebvre) (03/11/88)
SUN-SPOTS DIGEST         Wednesday, 9 March 1988       Volume 6 : Issue 26

Today's Topics:
                             Administrivia
                         Re: Bug in unlink (2)
               Re: lpd connection to terminal server (2)
               Re: Mysterious ethernet misbehavior (4)
                 Re: Spurious level 3 interrupts (2)
                     Re: Stuck in caps lock (2)
             Re: Sun-4 bcopy warning ; fast 68020 bcopy
                        Re: adding new clients
                  Bug in rpc.yppasswdd (with fix)
                        New improved calentool
                               Monthtool

Send contributions to:  sun-spots@rice.edu
Send subscription add/delete requests to:  sun-spots-request@rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are stored on "titan.rice.edu".  For volume X, issue Y, "get sun-spots/vXnY".  They are also accessible through the archive server:  mail the word "help" to "archive-server@rice.edu".

----------------------------------------------------------------------

Date:    Wed, 09 Mar 88 14:25:55 CST
From:    William LeFebvre <phil@Rice.edu>
Subject: Administrivia

Greetings!

I currently have a tremendous backlog of messages still waiting to appear in a digest (179 kilobytes not including this digest---enough for 9 digests).  Rather than the backlog getting smaller, it seems to have increased!  Since I usually try to find the best in bad situations, I decided to be more topical than chronological.  So for this issue I grouped together replies to the same question, messages on identical topics, etc.  As a result, this digest asks few questions and answers many!

I am beginning to feel that more drastic action may be necessary to lessen the backlog, such as increasing the digest size from 20K to 30K.  If anyone has objections to this idea, please send me mail.  I am also open to other suggestions.

A note to all BITNET readers and potential BITNET readers:  remember that you are now getting your messages via a BITNET listserver.  You no longer need to solicit me for add and delete requests.  See the information immediately after "Today's Topics" to find out how to subscribe.
You can remove your BITNET address from the listserver with the following CMS command:  "TELL LISTSERV AT RICE SIGNOFF SUNSPOTS".

        William LeFebvre
        Department of Computer Science
        Rice University
        <phil@Rice.edu>

------------------------------

Date:    Tue, 1 Mar 88 16:16:04 MST
From:    dbd%benden@lanl.gov (Dan Davison)
Subject: Re: bug in unlink (1)

The bug reported by Steve Maurer (steve@ut-sally.UUCP) in unlink was described in one of the Sun Software Technical Bulletins (STBs) recently.  I don't have the issue handy but it was the issue that was the size of a small city's phone book.  The issue contained the Customer Distributed Buglist and makes for pleasant just-before-bedtime reading.

In the unlink bug discussion I recall there also being a workaround, but I don't recall the details.  Send me mail if you want it posted.

dan davison
dbd@benden.lanl.gov
theoretical biology, los alamos national laboratory

------------------------------

Date:    Thu, 3 Mar 88 16:49:12 EST
From:    Root Boy Jim <rbj@icst-cmr.arpa>
Subject: Re: Bug in unlink (2)

>Steve Maurer:
> I would appreciate any information or advice anyone might have on how to
> remove such links....

Try moving the data somewhere else, and point clri at the offending directory.  While not a general solution, it should fix things for you.

        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
        National Bureau of Standards
        Flamer's Hotline: (301) 975-5688

------------------------------

Date:    Wed, 2 Mar 88 20:12:16 EST
From:    Terry Slattery <tcs@usna.mil>
Subject: Re: lpd connection to terminal server (1)

Starting with the enclosed 'ttcp' program might be a useful way to get your connection between lpd and the terminal server.
The usage message says:

Usage: ttcp -t [-options] host <in
        -l##    length of bufs written to network (default 1024)
        -s      source a pattern to network
        -n##    number of bufs written to network (-s only, default 1024)
        -p##    port number to send to (default 2000)
        -u      use UDP instead of TCP
Usage: ttcp -r [-options] >out
        -l##    length of network read buf (default 1024)
        -s      sink (discard) all data from network
        -p##    port number to listen at (default 2000)
        -B      Only output full blocks, as specified in -l## (for TAR)
        -u      use UDP instead of TCP

The -r flag is for receive and -t is transmit.  It prints out transfer rate info which is useful for network testing (use default -l and -n plus -s to get 1Mby transfer).  I also use it on occasion to create a network pipe between untrusted hosts:

        On source machine:      tar cf - . | ttcp -t rem_host
        On dest machine:        ttcp -r | tar xvf -

I'm sure you can think of other uses.  Modifying for bi-directional operation shouldn't take long.

        -tcs

[[ The program has been placed in the archives as "sun-source/ttcp.c" and is 11924 bytes in length.  It can be retrieved via anonymous FTP from the host "titan.rice.edu" or via the archive server with the request "send sun-source ttcp.c".  For more information about the archive server, send a mail message containing the word "help" to the address "archive-server@rice.edu".  --wnl ]]

------------------------------

Date:    Thu, 3 Mar 88 14:28:25 EST
From:    Root Boy Jim <rbj@icst-cmr.arpa>
Subject: Re: lpd connection to terminal server (2)

I don't know what you are trying to do, but perhaps you are using the wrong tool for the job.  Presumably the `raw TCP datastream' has another end somewhere on another machine.  Why not mail to a fake account on the other machine which does what you want to do.
For example, we have in /usr/lib/aliases the following lines:

        laser: "|/usr/ucb/lpr -Plzr -p"
        prt: "|/usr/ucb/lpr -p"
        lpr: "|/usr/ucb/lpr"
        lzr: "|/usr/ucb/lpr -Plzr"

Since I don't know what you're doing I don't know whether this will solve your problem, but somebody might find this useful.  I can't remember whether this prints the mail header or not, but if it does, you might pipe the message through "sed '1,/^$/d'" ahead of lpr in the alias file.

        (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
        National Bureau of Standards
        Flamer's Hotline: (301) 975-5688

------------------------------

Date:    Fri, 26 Feb 88 19:06:40 CST
From:    kane%fang@gswd-vms.gould.com (Patrick E Kane)
Subject: Re: Mysterious ethernet misbehavior (1)

Try killing the "rwhod" processes that you really don't need.  The rwhod program likes to broadcast packets which cause nodes with "/usr/spool/rwho" on a remote system to send NFS packets to their server.

I modified our local rwhod (that runs on our diskless nodes) to not write out rwho info, but still send it.  I found that having 20 diskless Suns running the standard rwhod would effectively flush my root server's disk buffer cache every few minutes.

Pat Kane

P.S. My root server has a very small ( < 2 Megs) kernel address space.

------------------------------

Date:    Mon, 29 Feb 88 10:07:36 CST
From:    Jim Knutson <knutson%SW.MCC.COM@mcc.com>
Subject: Re: Mysterious ethernet misbehavior (2)

There are two things to watch for when running lots of diskless clients on a single ethernet.  One is running rwhod without the broadcast-only hack and two is things run from cron.

The best way to run rwhod is to modify it, or obtain a modified copy of rwhod, and run it broadcast-only on all clients.  The clients would then NFS mount the server's copy of /usr/spool/rwho.  This prevents all clients from trying to do simultaneous writes on the receipt of a single rwho broadcast packet.  I wish Sun would distribute this.  I sent the fix to them back with release 2.2.
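As a rough illustration of why the broadcast-only change matters, the following C sketch counts the NFS writes generated in one round of rwho broadcasts. It is a back-of-the-envelope model, not rwhod source: the function names are made up for this example, and it assumes that one server plus every diskless client each broadcast one packet per round, and that a client which records rwho data issues exactly one NFS write for each packet it receives from another host.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model (not taken from rwhod): NFS writes caused by one
 * round in which the server and each diskless client broadcast once.
 * clients_record != 0 means clients write received data to a spool
 * directory that lives on the server (stock behaviour in this model);
 * clients_record == 0 models the broadcast-only hack.
 */
long nfs_writes_per_round(long clients, int clients_record)
{
    if (!clients_record)
        return 0;   /* broadcast-only clients send but never write */

    /* The server's packet reaches every client; each client's packet
     * reaches every other client. Each reception is one NFS write. */
    return clients + clients * (clients - 1);
}

/* Prints the two cases for the 20-client network mentioned above. */
void rwho_model_demo(void)
{
    long stock = nfs_writes_per_round(20, 1);
    long hacked = nfs_writes_per_round(20, 0);

    assert(stock > hacked);
    printf("stock rwhod:    %ld NFS writes per round\n", stock);
    printf("broadcast-only: %ld NFS writes per round\n", hacked);
}
```

Under these assumptions the load on the server grows with the square of the number of clients, which is consistent with the buffer-cache flushing reported for 20 diskless machines. In the broadcast-only arrangement the server still records each packet, but on its own local disk rather than over NFS.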
The other problem to watch for is something run from cron.  If all your diskless clients are using the same crontab file, and the clocks are all in sync, then it is likely that they will all request a copy of the same thing at exactly the same time.  This can not only flood your server with disk requests but also the net as well.  At the University of Texas, we hit 95%+ saturation of the net on the quarter hour with about 100 suns due to atrun firing up from cron.  This can be resolved by staggering startup times in cron or by subnetting to keep the packets local to a server and its clients.

Jim Knutson
knutson@mcc.com
knutson@milano.uucp

------------------------------

Date:    Tue, 1 Mar 88 10:12:17 CST
From:    boyle%antares@anl-mcs.arpa
Subject: Re: Mysterious ethernet misbehavior (3)

I'm not exactly an old ethernet hand, but at Argonne we have a network of 4 servers (3/280s) and 40 clients (3/60s and 3/140s).  In addition, there are several other Unix machines on our net:  a VAX, Encore Multimax, Sequent B21000, Alliant FX/8, Intel Hypercubes, and AMT DAP 510 on a Sun host.

We sought Sun's advice about configuring the network, and they recommended a backbone for the 4 servers and other machines.  Each server has a second ethernet board, and that ethernet goes to its 10 clients.  This system works very well, and we have had none of the problems you describe.  (Incidentally, we run NFS among the Suns and some of the other machines.)

Since the second ethernet boards are relatively cheap (a few K$), I recommend you try this configuration.  40 clients on one net sounds like a lot to me.

Jim Boyle

------------------------------

Date:    4 Mar 88 02:14:25 GMT
From:    ksr!benson@uunet.uu.net (Benson Margulies)
Subject: Re: Mysterious ethernet misbehavior (4)

I reported periodic storms of ie: no carrier and ethernet jammed.  Contrary to those who believe that there are necessarily hardware-related problems, we were suffering from cron-ic load problems.
That is, every few minutes all of the workstations would hit the network to page in whatever cron told them to do.  When the clock sync was working particularly well, the results were collision storms.

This sure looks like a candidate for Sun documentation, or even a cron feature to randomize the times slightly.

Benson I. Margulies
Kendall Square Research Corp.
harvard!ksr!benson
ksr!benson@harvard.harvard.edu

------------------------------

Date:    Sat, 27 Feb 88 01:22:33 +0200
From:    leonid@TAURUS.BITNET
Subject: Re: Spurious level 3 Interrupts (1)

When we first got our 3/180 there was no problem, then suddenly we had a hardware problem, eventually traced to the backplane.  The symptom was that at some (early) point it would shout "Spurious level 3 Interrupt" and crash.

We replaced the backplane and all went fine.  Except that the replacement backplane was an old revision, older than our original, and since then we would get the spurious interrupt message once or twice a day with no apparent damage (no crashes etc.)

Anyhow, we asked our SUN rep to replace the backplane with a real new one.  So he did, and since then we have had no more of these messages.

My guess, folks, is that if you get such messages, it means that your backplane revision number is not compatible with your CPU board revision number.

Isn't it simple?

Leonid

------------------------------

Date:    7 Mar 88 15:27:23 GMT
From:    jc@piaget.uucp
Subject: Re: Spurious level 3 interrupts (2)

I recently encountered a site with a second Ethernet controller and they were getting spurious interrupt messages associated with it.  One knowledgeable person I talked to said that that's one of the things you get with a second Ethernet controller and that there is a) no known fix for it and b) no known cause.

Can anybody tell me more about the cause and (hopefully) a fix for this problem?
--jc
John Cornelius
(...!sdcsvax!piaget!jc)

------------------------------

Date:    Thu, 3 Mar 88 14:17:14 EST
From:    Michael Sykora <sykora@violin.ctr.columbia.edu>
Subject: Re: Stuck in caps lock (1)

>From: AARON KONSTAM <79343382@TRINITY.BITNET>
> Second, there is some combination of keys that one can hit on the keyboard
> that puts one irreversibly in upper case....

The key being hit is probably "F1".  Just hit it again to get out of [CAPS] mode.

[[ Mr. Konstam wasn't clear about what environment he was using in which he got stuck.  This solution works for SunView's shelltool.  The one that follows works (I assume) for X.  --wnl ]]

Mike Sykora
System Manager
Computer Communications Research Lab
Center for Telecommunications Research
Columbia University
e-mail:  sykora@ctr.columbia.edu

------------------------------

Date:    Tue, 08 Mar 88 15:53:24 PST
From:    Craig Leres <leres%lbl-helios@lbl-rtsg.arpa>
Subject: Re: Stuck in caps lock (2)

You didn't say, but I assume that you're running X10r4 on your Sun.  If this is the case, here's the solution.  The following comment is from the routine ConvertEvent() in libsun/events.c:

        /*
         * The static count is keeping track of how many
         * keys I have down for the given function.
         * Only need to do this for shift and meta.
         * On an up event I decrease the count.  If it is
         * not the last one up then I convert to a down event
         * which really won't do anything.  I should ignore
         * the event, but this works.
         * At odd times the sun keyboard gets confused and I
         * miss an UP event.  This may get you stuck in
         * shift mode.  I assume there is only 2 shift keys
         * and only two meta keys.  If count ever goes above
         * 2 I make it 2 again, assuming I have missed an up
         * event.  If you get stuck in shifted mode, just his
         * both shift keys and you should be fixed.
         */

So sometimes when you type too quickly (remember that you generate two key events for each key stroke) a shift down key event gets lost AND (I swear to God, it happened to me just now!)
your X server becomes hosed.  As we see from the above Enlightening Comment, pressing both shift keys at the same time resets things.

        Craig

------------------------------

Date:    Thu, 3 Mar 88 07:57:50 EST
From:    suneast!ozone!murph@sun.com (Joe Murphy, Manager ECD Hardware)
Subject: Re: Sun-4 bcopy warning ; fast 68020 bcopy

> Although the main loop does do eight movl
>instructions to move the data, the fastest possible version would be
>completely unrolled; in other words, no looping at all....--wnl

Sorry wnl, but I don't agree.  You are ignoring the instruction cache of the 68020.  Even with an external cache, it is better to fetch an instruction from the internal instruction cache than it is to fetch external to the chip (it frees up the bus for actually moving the data, for one thing).  However, an unwound loop is still a win because you don't have to break the pipe as often.  The tradeoff is the cost of caching in the larger unwound loop (plus the replacement of other instructions you might have had to kick out) vs. the increased speed of the loop you have cached in.

The optimal amount of loop "unwinding" is dependent on how much data you want to move.  I did an analysis a couple of years ago for the 3/110 color map update that showed for a total of 4.5 clocks on the read, and 4x4 clocks on the write (the color map is a byte-wide device), and for 256 x 3 bytes moved as long words, the optimal loop size was 31 movl's per iteration.  Experimental results correlated with this quite well.

For a generic routine where you don't know how much data you will be moving in advance, like bcopy, unwinding the loop a little bit is probably a good idea; 8, 16, or 32 would be my guess as to the "optimal" amount.

[[ You are quite correct.  I had forgotten about the instruction cache.  This brings up an interesting point regarding a 68010 (if anyone cares about them any more):  a bcopy whose main loop is small enough to get the '010 in loop mode might be faster than any amount of unrolling.
--wnl ]]

One thing to be wary of BTW on the Sun3 when considering user level "bcopy"'s is that the 3/2xx series has special bcopy hardware that the kernel takes advantage of to keep the large amount of non-repeating sequential accesses from trashing the cache.  The 3/xx and 3/1xx machines don't have external caches, and don't have any special hardware, so your best "bcopy" loop has as good or better a chance of being optimal as the one available via the bcopy system call.

For machines without internal instruction caches, you are correct, the optimal "loop" is the completely unwound one.

        -murph

[[ Of course none of this has helped to solve the original problem.... what is a near-optimal bcopy for the Sun4?  Is the one that Sun distributes in the library sufficient?  --wnl ]]

------------------------------

Date:    Fri, 26 Feb 88 09:27:17 EST
From:    uunet!dmnhack!phb@ut-sally.UUCP (Paul Breslin)
Subject: Re: adding new clients

The problems associated with adding diskless clients make me wonder why Sun doesn't build 3/50's and 3/60's with an optional mini-winchester built in.  (Something akin to a hard-card for the IBM-PC.)  A small 30 or 40 Mb disk with a generic root and swap partition would save on network bandwidth and eliminate the hassles of configuration.  You would boot it up, mount some NFS partitions, configure a custom kernel and be rolling within about an hour or two.  Such small disks only cost a few hundred dollars (not counting Sun's huge markup) and wouldn't increase the cost too much.

------------------------------

Date:    Fri, 26 Feb 88 14:11:38 PST
From:    dredge@cheshire.stanford.edu (Michael Eldredge)
Subject: Bug in rpc.yppasswdd (with fix)

Product: rpc.yppasswdd (versions through Sun OS3.5)
        This bug is in the 3.0, 3.4, and 3.5 versions.

Problem: Incorrectly handles updates when the name is a subset of another.

Example (more or less):

        /etc/passwd:
        ...
        jeffery:PASSWD1:100:10:Longer Name:/u/jeffery:/bin/csh
        jeff::101:10:Short Name:/u/jeff:/bin/csh
        ...
        % yppasswd jeff
        Old yppasswd:<cr>
        New yppasswd:passwd2
        Again:passwd2
        can't change passwd
        % yppasswd jeff
        Old yppasswd:PASSWD1<cr>        # jeffery's passwd
        New yppasswd:passwd2
        Again:passwd2
        %

        /etc/passwd:
        ...
        jeffery:PASSWD2:100:10:Longer Name:/u/jeffery:/bin/csh
        jeffery::100:10:Longer Name:/u/jeffery:/bin/csh
        ...

Note that entry for 'jeff' is gone!  Very bad!

Fix: In the source (which we took from 3.0 since that is the most recent that we have):

There is a rewrite of the function "getpwnam()".  When it compares the given name with each entry in the passwd file it just does a strncmp() with the length of the given name.  Thus if the given name is shorter than (a prefix of) the name in an entry, strncmp() will match.  The fix is to make sure that the lengths of both the given name and the name from the passwd file are the same and THEN do the strncmp().

diff rpc.yppasswdd.c rpc.yppasswdd.c-fix
265a266,267
>       char *e;
>       char *index() ;
273,274c275,279
<       while ((p = fgets(line, BUFSIZ, pwf)) && strncmp(name, line, cnt))
<               continue;
---
>       while ((p = fgets(line, BUFSIZ, pwf))) {
>               e = index(line, ':') ;
>               if (e && (e-line)==cnt && strncmp(line,name,cnt)==0)
>                       break ;
>       }

Michael Eldredge
Manager
Electrical Engineering Computer Facility
Stanford University
dredge@hitchrack.stanford.edu

[[ Thanks to Keith Vincent <keith%lccr.sfu.cdn@ean.ubc.ca> who also pointed out this bug.  -wnl ]]

------------------------------

Date:    Thu, 25 Feb 88 09:18:59 PST
From:    Bill Randle <billr@tekred.tek.com>
Subject: New improved calentool

Here is a copy of the new improved calentool, originally distributed on the Sun Users' Group tape (1987).  See README2 for a list of new features and additions.

        -Bill Randle
        Tektronix, Inc.
        billr@tekred.TEK.COM

[[ The source has been placed in the archives as two separate shar files: "sun-source/calentool.shar.1" and "sun-source/calentool.shar.2".  They are 50197 and 48343 bytes, respectively.
They can be retrieved via anonymous FTP from the host "titan.rice.edu" or via the archive server with the request "send sun-source calentool.shar.1 calentool.shar.2".  For more information about the archive server, send a mail message containing the word "help" to the address "archive-server@rice.edu".  --wnl ]]

------------------------------

Date:    8 Mar 88 18:44:53 GMT
From:    Sarah Metcalfe <sarahm@cognos.uucp>
Subject: Monthtool

A number of people sent me mail concerning Monthtool.  Unfortunately, my mailtool crashed and I lost a lot of messages.  Luckily the core file had all the headers in it, but I don't have paths for the following people:

        vixen!ronbo
        cfa247!joe
        esj@bikini.cis.ufl.edu
        emmy.umd.edu!dna@eneevax.umd.edu
        bdrc!jwc@mcnc.org
        hope.lanl.gov!dwf
        studguppy.lanl.gov!roberts
        esmond@msr.epm.ornl.gov
        allegra!dnelson

Can these people please resend their messages?  Thanks!

Sarah Metcalfe
decvax!utzoo!dciem!nrcaer!cognos!sarahm
Cognos Incorporated
P.O. Box 9707, 3755 Riverside Drive, Ottawa, Ontario, CANADA  K1G 3Z4
(613) 738-1440

------------------------------

End of SUN-Spots Digest
***********************
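
For readers following the rpc.yppasswdd report in this issue, the difference between the original comparison and the corrected one can be sketched in a few lines of C. This is an illustration only, not the Sun source: the function names prefix_match(), exact_match() and match_demo() are invented for the example, and strchr() stands in for the older index() used in the posted diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Original test: compares only the first strlen(name) bytes of the
 * passwd line, so "jeff" also matches a line starting with "jeffery". */
int prefix_match(const char *name, const char *line)
{
    return strncmp(name, line, strlen(name)) == 0;
}

/* Corrected test, mirroring the posted fix: the login field (everything
 * before the first ':') must be exactly as long as name before the
 * bytes are compared. */
int exact_match(const char *name, const char *line)
{
    size_t cnt = strlen(name);
    const char *e = strchr(line, ':');

    return e != NULL && (size_t)(e - line) == cnt
        && strncmp(line, name, cnt) == 0;
}

/* Runs both tests against the two entries from the bug report. */
void match_demo(void)
{
    const char *jeffery =
        "jeffery:PASSWD1:100:10:Longer Name:/u/jeffery:/bin/csh";
    const char *jeff = "jeff::101:10:Short Name:/u/jeff:/bin/csh";

    assert(exact_match("jeff", jeff));
    printf("prefix test, jeff vs jeffery line: %d\n",
           prefix_match("jeff", jeffery));
    printf("exact test,  jeff vs jeffery line: %d\n",
           exact_match("jeff", jeffery));
}
```

Because the passwd file is scanned from the top, the prefix test stops at the "jeffery" line when asked for "jeff", which is how the wrong entry came to be rewritten in the report. The exact test skips that line and goes on to the real "jeff" entry.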