Sun-Spots-Request@RICE.EDU (Vicky Riffle) (07/02/86)
SUN-SPOTS DIGEST         Wednesday, 2 July 1986        Volume 4 : Issue 20

Today's Topics:
        Compiling TeX on SUN2 running 3.0
        UNDUMP for TeX for SUN3
        Bug in 3.0 UDP checksums
        Beware The Profiler! (C bug)
        Mount deadlocks (really /etc/rc.boot)
        Sun as a timesharing box (2)
        Adding an Eagle-and-a-half to a Sun
        Panic: ifree
        SL on Sun3
        Sun-2 and 3.0 questions?
        3.0 strip run on 2.0 binaries?
        Finding out your hostname when you boot?
        Diskless SUNs with Gould server?
------------------------------------------------------------------------

Date: Thu 26 Jun 86 16:18:16-PDT
From: Pierre MacKay <MACKAY@WASHINGTON.ARPA>
Subject: compiling TeX on SUN2 running 3.0

I seem not to have been clear enough about the problems reported to me
with compilations of TeX. They do not stem from the attempt to run 68020
binaries on a 68010 system. In fact the result of that attempt is simple
and obvious. You get the message

        Cannot execute binary file
        Exec format incorrect.

or something like that. It is baffling the first time when you hit it by
accident, but not thereafter.

The TeX compilation problem is more serious. It involves source on a
SUN2, compiled on a SUN2, loaded on a SUN2 and run on a SUN2. If the
principal compilation or loading were not done on a SUN2, I suspect that
you wouldn't even get the appearance of the thing starting to run. So far
as I can discover, there is no obvious way for 68020 code to get into
this act, unless the libraries are all in 68020 based binaries. But the
general tendency is for people to face the problem of a SUN2 server
feeding a SUN3 node, rather than the other way around.

Anyway, I do not think that most of the queries I have had are from
people trying to run 68020 binaries on a 68010 machine. I haven't had the
opportunity to try the trick myself yet, but I hope to.

Incidentally, I suppose the possibility of a SUN2 compilation picking up
sun3 library routines is going to take some serious thought, once general
networking is assumed.
Pierre
-------

-----------------------------

Date: Fri, 27 Jun 86 12:49:37 CDT
From: William LeFebvre <phil@titan.rice.edu>
Subject: UNDUMP for TeX for SUN3

I got a copy of Barry's undump and added a few more things to it so that
it will work with more than just ZMAGIC files. I'll place a tar file with
the source on one of our machines here at Rice. By the time the Sun-spots
readers get this, it will be available via anonymous FTP (any password)
from host "dione.rice.edu" in the tar file "public/undump.tar". Since our
connection costs us money, please only FTP it between 6 p.m. and 7 a.m.
on weekdays, and expect the transfer to take longer than normal.

I also intend to send it in to unix-tex in the hopes that they will put
it in the official Unix TeX distribution and make it available on one of
their machines.

                        William LeFebvre
                        Department of Computer Science
                        Rice University
                        <phil@Rice.edu>

-----------------------------

Date: Fri, 27 Jun 86 12:04:43 EDT
From: Steve D. Miller <steve@gyre.umd.edu>
Subject: Bug in 3.0 UDP checksums

The UDP checksum code in Sun 3.0 is incorrect. As shipped, UDP checksums
are turned off (presumably for performance reasons); if checksums are
turned on (by setting udpcksum to 1 in the kernel using adb), however,
the Sun code cannot talk to itself or any other UDP implementation.

The problem is that the checksum code in udp_output() is confused about
the UDP/TCP IP overlay versus the true IP header; the code sets the
overlay up properly for checksumming, incorrectly sets the header length
field (changing a zero field in the overlay to something other than
zero), then checksums the now incorrect packet. Protocol-wise, the packet
looks fine, but the checksum will always be wrong.
Since the header length is set in udp_output() only so that the fast
loopback code (code that detects things destined for the loopback
interface and hands such packets directly to udp_input()) will work
correctly, the fix is to set the header length in the fast loopback code
(fix for vanilla Sun 3.0 sources):

*** bad udp_usrreq.c	Fri Jun 27 11:47:57 1986
--- udp_usrreq.c	Fri Jun 27 10:57:25 1986
***************
*** 196,202
  	ui->ui_sport = inp->inp_lport;
  	ui->ui_dport = inp->inp_fport;
  	ui->ui_ulen = (u_short)ui->ui_len;
- 	((struct ip *)ui)->ip_hl = sizeof (struct ip) >> 2;

  	/*
  	 * Stuff checksum and output datagram.
--- 196,201 -----
  	ui->ui_sport = inp->inp_lport;
  	ui->ui_dport = inp->inp_fport;
  	ui->ui_ulen = (u_short)ui->ui_len;

  	/*
  	 * Stuff checksum and output datagram.
***************
*** 221,226
  			ui->ui_src.s_addr = ui->ui_dst.s_addr;
  		}
  		((struct ip *)ui)->ip_len -= sizeof (struct ipovly);
  		udp_input(m);
  		return (0);
  	}
--- 220,226 -----
  			ui->ui_src.s_addr = ui->ui_dst.s_addr;
  		}
  		((struct ip *)ui)->ip_len -= sizeof (struct ipovly);
+ 		((struct ip *)ui)->ip_hl = sizeof (struct ip) >> 2;
  		udp_input(m);
  		return (0);
  	}

I'd guess that having UDP checksums turned off when running UDP across an
ethernet is probably OK, since (after all) most packets which arrive do
so correctly; I think that if I was running UDP over a serial line or
something, though, I'd much rather have the checksums on...

It should be noted that the ku_fastsend routine used to send NFS/RPC/UDP
packets quickly in the NFS implementation also avoids checksums. This
routine should (perhaps) be disabled in similar situations, depending on
one's level of paranoia.
	-Steve

Spoken: Steve Miller    ARPA:   steve@maryland    Phone: +1-301-454-4251
CSNet:  steve@umcp-cs   UUCP:   {seismo,allegra}!umcp-cs!steve
USPS:   Computer Science Dept., University of Maryland, College Park, MD 20742
***********************

-----------------------------

Date: Wed, 2 Jul 86 12:38:36 PDT
From: gerolima@Ford-wdl1.ARPA (Mark Gerolimatos)
Subject: BEWARE THE PROFILER! (C bug)

My partner, Ron Barack (give credit where due...), discovered the
following problem... Given the following program...

------------------------------------Rip Here------------------------------------
main()
{
	float func();
	float x;

	x = func();
	printf(" x = %f\n",x);
}

float func()
{
	float val;

	val = 1.e-20 * 1.e-21;
	return val;
}
------------------------------------Rip Here------------------------------------

Compiled with no options, the output looks like:

	 x = 0.000000		<----which is correct.

BUT, with the -p option, it comes out as:

	 x = 0.003403		<----oops.

With a little more investigation, I found out that the answer was correct
all the way up to the printf. "x = func()" sets x to the correct value.
Sew, vhatz going on, here?

The problem is with "libc_p.a". Fstod (Floating-point Single TO Double),
to be exact (we used the -fsoft, the default, option). When func()
returns, its value is Fdtos'd, and placed in x, and then x is Fstod'd,
and passed to printf (Gosh, how C can be inefficient when you don't use
-fsingle!).

I checked out libc.a's Fstod, and libc_p.a's, and couldn't find much
difference between the two (not including the mcount code, of course)
EXCEPT for "UNLK A6" just before an RTS (when the value == 0). To wit:

(libc.a)
	cmpl	#0,a0
	bges	Fstod+0x36
	orl	#0x80000000,d0
	rts

(libc_p.a, Fstod+0x40 - Fstod+0x4e)
	cmpl	#0,a0
	bges	Fstod+0x4e
	orl	#0x80000000,d0
	unlk	a6		<--this DOES get called
	rts

Could that be the problem?

-Mark                              "For almost a quarter of a century..."

                                   "...Change Baby, Don't Worry!...
Mark Gerolimatos                    ...Welcome! Welcome!...
ARPA: gerolima@ford-wdl1.arpa       ...Change Baby, Don't Worry!...
UUCP: {sun,fortune}!wdl1!gerolima   ...Box! Box! Box! Box!...
AT&T: (415) 852-4105                ...Now, We Say Good-Bye...
Mail: c/o Ford Aerospace            ...Welcome to the GALATT...
      3939 Fabian Way               ...G-A-L-A-T-T We're GALATT..."
      Palo Alto CA 94306              -English phrases from a Japanese song
      Mail Stop X20

-----------------------------

Date: Fri, 27 Jun 86 01:47:02 EDT
From: Broadway's Streetsweeper <dupuy%amsterdam@columbia.edu>
Subject: Mount deadlocks (really /etc/rc.boot)

>John Bruner <jdb@s1-c.arpa> writes:
>
>Another problem Sun UNIX has with mounts occurs in "/etc/rc.boot".
>This file disobeys the rule that no changes be made to a corrupted
>filesystem until it is cleaned. Specifically, zeroing "/etc/mtab",
>adding an entry for "/", and mounting "/pub" is a no-no if the root
>hasn't been salvaged yet.
>
>	{ hacked /etc/rc.boot included here}
>
>[I think a better solution would be a path to single-user which avoids
>"rc.boot" entirely.]

An undocumented feature of 2.x /etc/init is the "-b" flag which causes
the /etc/rc.boot file to be skipped. I don't know if this is also in 3.x,
but I expect it is. All you need to do if your mtab is busted is
"b vmunix -bs".

This is perhaps easier than patching rc.boot, especially under 2.x, which
lacks the System V style "sh" with builtin "test" that comes with 3.0. If
you want to patch rc.boot for a 2.x system anyhow, you can

	% set noglob
	% mount localhost:/ /mnt
	% cp /bin/test /mnt/pub/bin
	% ln /mnt/pub/bin/test /mnt/pub/bin/[
	% umount /mnt
	% unset noglob

and your tests will work, even before you mount /pub.

By the way, what is the point of moving all the hostname and ifconfig
stuff into rc.boot, instead of leaving it in rc.local where it belongs?
Do you *really* need this to make the "umount -at nfs" work (it sends a
clnt_broadcast to all rpc.mountd's to clear their rmtabs)? Maybe someone
could clarify this.
@alex

-----------------------------

Date: Sat, 28 Jun 86 14:01:13 EDT
From: Barry Shein <bzs%bu-cs.bu.edu@CSNET-RELAY.ARPA>
Subject: Sun as a timesharing box (1)

Since January BU-CS.BU.EDU has been two SUN3/180s running as time-sharing
systems (the other which doesn't much show to the outside world is
BUCSD.BU.EDU.) Not only is it possible, we are quite pleased. We replaced
a VAX/780 (8MB, RP07, RA81, FPA etc) with the two systems and the large
loads we used to suffer with on the 780 are gone, it's rare to see it go
above 2.00 or 3.00 and even then it doesn't seem to much matter.

The configuration is 8MB, 2 Eagles each. One has a 16-port ALM and the
other has one now but will soon have 2 X 16. We have a SUN 6250 drive,
etc. Until our 4MB expansions arrived we ran with 4MB each, that was
pretty mediocre, 8MB seems to work very well though. I think if I did it
again I would order 16MB but it doesn't seem very critical, we'll
probably expand one of these days.

All logical disks are cross-mounted through NFS such that it doesn't
matter which you log onto, your home directory will be transparently
accessible through your passwd entry. Fortunately our terminal switch
(Ungermann/Bass Net/1 broadband) randomizes which port it grabs so all
ports have the same system name, users just land on one or the other. I
found performance is better if most binaries are locally available so
each system has its own /usr/ucb, /usr/bin etc, there are a few
cross-links here and there. These are just tuning issues and can be
handled on-line. The mail system is all set up to help transparency.

We typically have 15-20 logins per system. I must say that many of our
users are *not* doing development, this is a faculty machine. A lot of
word processing, a lot of idle time. In contrast, I would expect a
student community to be different.
On the other hand, we have a lot of general systems work, grad students
etc, usually this huge lisp job running in the background on at least one
machine perpetually, BU-CS is the campus mail relay and USENET relay,
it's not totally idle...maybe it just does the job well?

The nice thing of course is that I'll probably soon add a third SUN3/180
to the 'cluster' to absorb the Math department here, maybe a fourth later
as users start to acquire diskless nodes, how to grow for a while is
obvious and should be painless. I am also looking ahead at the new
2.2(?)MB/s 451 SMD disk controllers and the rumored 25MHz SUN/? board as
relatively inexpensive performance boosters tho right now I wouldn't need
them really.

Yes, it works fine, I can't tell you what your upper limits will be but
certainly it handles 16 users very well.

If you're on the ARPAnet feel free to finger @bu-cs.bu.edu/bucsd.bu.edu
during the day, our finger shows load average, logged in users, idle
times and what people are running (TWENEX style.)

	-Barry Shein, Boston University

-----------------------------

Date: Tue, 1 Jul 86 10:26:13 edt
From: Ken Mandelberg <km%emory.csnet@CSNET-RELAY.ARPA>
Subject: Sun as a timesharing Box (2)

I want to thank the several people who responded to my note about using
Suns for timesharing. Most of the responses suggested other Unix machines
that are more traditional for timesharing (Pyramid, Sequent, etc).
However, I would like to both clarify the motivation behind my original
request, and ask a technical question.

Motivation:

We currently use two Vax 780's to handle classes, assigning half of each
class to each machine. Loads peak at about 20 students on each machine.
Often the loads are quite acceptable with lav's under 5. There are also
bad times when the lav soars to between 10 and 20. It of course depends
on the class, the assignment, and how close the due date is.
Timesharing on dumb terminals is a good match for this particular set of
classes, although we use (and will increase the use of) Sun (and other)
workstations for other class and research applications. Right now we are
looking at replacing the 780's since they cost a mint to maintain.

I was not suggesting that we run 40 users on one Sun. I was probably
thinking more of handling the 40 users equally well (or badly) with say 3
Sun 3's. Maybe one Sun as an NFS file server with student files and big
disks, and two others as CPU servers for logins with smaller (but fast)
local disks for /, /usr, /tmp, and paging. The fileserver could do some
other things like batch troff at low priority and handle some peripherals
(printers, tapes, etc).

This may not be a good solution, but I am wondering how it would compare
to two 780s doing the same thing. I have nothing against other (and
probably better) timesharing solutions. All things being equal I like Sun
because they are clearly a leader in the Unix community, and the task of
keeping current with applications software, compiler and OS
eccentricities, and system administration is easier with a single CPU
family. Unfortunately, I suspect the answer is that all things are not
equal.

** Technical Question:

I got several responses to the effect that Suns are workstations and just
not well designed for timesharing. No real reasons were given. I would
like comments on comparing a Sun well configured for timesharing with say
a Vax 780. My thoughts are:

1) The 68020 is a faster CPU than the 780 CPU, and seems to be as
   reasonable a Unix processor as the Vax CPU.

2) The Sun 3 VME bus is faster than the Unibus. 780s have a fast Massbus,
   but recent DEC configurations tend to emphasize Unibus peripherals.

3) The DEC terminal multiplexors are nothing special. The DZ's are
   terrible, the DH/DMF's do DMA but do little in the way of FEP
   processing to unburden the CPU. I don't know how good or bad the Sun
   multiplexors are.
   Presumably they do DMA and are similar to DH's.

4) The DEC disks are nothing special either. It seems that people that
   use third party controllers and Fujitsu disks on Vaxen are at least as
   happy as those that use the UDA/RA DEC disks. Figure the Sun can use a
   fast Eagle, though I don't know how good or bad the controller for it
   is.

5) The DEC ethernet boards sit on the slow Unibus and contend with other
   peripherals. The Sun ethernet controller is on the CPU board (I
   think), and doesn't contend for the VME bus. The DEC DEUNA is slow
   (though more recent replacements are faster).

On the other side, I recognize that 780's can be equipped with multiple
Unibus adaptors, and I guess (??) the Sun can't. It also appears that
most Sun peripherals end up using a Multibus adapter which presumably
slows them down.

Discussion is solicited.

Ken Mandelberg
Emory University
Dept of Math and CS
Atlanta, Ga 30322

{akgua,sb1,gatech,decvax}!emory!km   USENET
km@emory                             CSNET
km.emory@csnet-relay                 ARPANET

-----------------------------

From: telesoft!pilotti@sdcsvax.ucsd.edu
Subject: Re: Adding an Eagle-and-a-half to a Sun?

Roy,

I can verify that third-party Eagles can be successfully connected to a
Sun-3/160,180. Simply match the jumpers on the three Eagle PC cards to
the jumpers on the Eagle that came from Sun. You can use standard SMD
command & data cables to daisy-chain the disks together. It isn't as
pretty as getting the fancy connectors and modified VME-adapter faceplate
(if such a beast exists for two drives), but it works.

Note that this is for "standard" Eagles; we haven't tried mixing flavors
on one controller.

Diag> away!
+KP
------------------------------------------------------------------------
/+\ Keith P  \ <Pilotti@TeleSoft.COM> (Internet) / 10639 Roselle Street
\+/ TeleSoft \ 1+(619) 457-2700 x172  (Voice)    / San Diego, CA  92121
------------------------------------------------------------------------

-----------------------------

Date: Tue, 1 Jul 86 13:06:06 PDT
From: fluke!jeff@uw-beaver.arpa (Jeff Stearns)
Subject: panic: ifree

In SUN-Spots Digest v4n19, John Bruner writes:
> BTW, ever since we switched to 3.0FCS we've been having frequent crashes
> in "ifree". "fsck" refuses to recover these because the link count in
> some inode is too small (i.e. there are more links than the inode
> indicates). Usually one of the links is "/etc/mtab" or "/etc/rmtab".
> Without source code I've been unable to figure out what is happening.
> This does not happen in Sun 2.0 or on our NFS VAXes (where I do have
> source code). Has anyone else experienced this problem?
> --
> John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
> MILNET: jdb@mordor [jdb@s1-c.ARPA]    (415) 422-0758
> UUCP: ...!ucbvax!decwrl!mordor!jdb    ...!seismo!mordor!jdb

Yup. We've had this problem too. And it gets worse as the disks get
busier. It will also happen in release 2.0 and 2.2 if your fileservers
get busy enough.

For releases 2.0 and 2.2, the fix was to put each of our fileserver's
disks on a separate Xylogics 450 controller. I strongly believe that this
will prove to be the case with release 3.0.

I also suspect that Sun may not tell you this. They seem to have some
real blind spots with respect to this nasty bug.

Jeff Stearns
John Fluke Mfg. Co, Inc.
(206) 356-5064

-----------------------------

Date: Fri, 27 Jun 86 11:43:59 PDT
From: guy@SUN.COM (Guy Harris)
Subject: SL on Sun3?

> I am trying to bring SL (Serial Line IP) up on a Sun 3/52 running
> Sun Unix 3.0.
> When i make a kernel with the mods, the sun seems to behave
> alright, but when i bring up Suntools, and type a character into a window,
> i get this error : 'ws_read_indev length error 1', while everything else
> seems alright...

I suspect you installed the non-Sun version of SLIP. What line discipline
number does it use? Line disciplines 5 and 6 in Sun UNIX are not
available for user-supplied line disciplines such as SLIP; they are used
for mouse and keyboard ports, respectively. Sounds like you may have
installed SLIP as line discipline 6 (or maybe 5).

> The other end of the ttyline (which is what SL uses) is connected to a
> VAX 11/750 with 4.2 BSD which already has SL running. I JUST got SL from
> seismo.arpa, so it is definitely the 'latest' version.

Last time I looked, Rick supplied Sun versions of some of the SLIP source
files, including "tty_conf.c" (there are other differences in Sun UNIX
which require other modules to change, as well).

-----------------------------

Date: Thu, 26 Jun 86 18:43:08 mst
From: "Roger Hayes" <rogerh@arizona.edu>
Subject: Sun-2 and 3.0 questions

Should we convert our Sun-2's to 3.0? Is performance adequate? What
configuration is required? How much memory is useful?

Will the Lucasfilms/Pixar software recompile? (See Hawley, Portland
Usenix Proceedings, for description). Are they still using Suns? Is
anyone else using their software?

Thanks,
Roger Hayes
rogerh@arizona.edu
...!{ihnp4,ucbvax}!arizona!rogerh

-----------------------------

From kgk%brown.csnet@CSNET-RELAY.ARPA Tue Jul 1 11:31:05 1986
From: dbo%textset@eecs.umich.edu
Date: Tue, 1 Jul 86 19:18:27 EDT
Subject: 3.0 strip run on 2.0 binaries

Here's a good one. Take an unstripped 2.0 binary and strip it with 3.0
strip. Every case I try generates an unusable binary. Trying to execute
the resultant binary causes the famous

	pid xxx: killed due to swap problems in getxfile: i/o error mapping page.

error. Is this written down some place that I didn't notice?
-Doug

-----------------------------

Date: Tue, 1 Jul 86 22:16:42 edt
From: seismo!allegra!phri!roy@SALLY.UTEXAS.EDU (Roy Smith)
Subject: Finding out your hostname when you boot

There is something I don't understand about what happens when a diskless
client boots. When you turn the power on, the only thing a client knows
is its ethernet address. Then it does some arp stuff and finds out its
internet address. I don't quite understand how arp works, but I can deal
with the fact that it does.

What I don't understand is why in /etc/rc.boot, the hostname is
hard-wired into the file system. First thing it does is "hostname=foo"
and then later it runs "/bin/hostname $hostname". Since the machine has
managed to find out its internet number, isn't there some way for it to
find out its hostname over the network as well? I could envision
something like:

	hostname=`ypcat hosts.byaddr | egrep $inaddr | sed "s/[0-9 .]//g"`

being the first line of rc.boot, but I can't see any way to get the
internet address available to a user program. I suppose you could do some
poking around in /dev/kmem to dig it out, but that seems too disgusting
to think about.

-----------------------------

Date: Mon, 30 Jun 86 12:50:41 -0300
From: Leonid Rosenboim <leonid@taurus.BITNET>
Subject: Diskless SUNs with Gould server

Has anyone experienced using diskless SUNs with a GOULD PN/6000 as a
server? In particular, does GOULD run ND? Are there any problems with
NFS? Two binary copies are needed, what about manuals etc.? What is the
performance ratio between a Gould and a SUN-2/180S server?

Leonid Rosenboim
CS. Department, System group,
Tel-Aviv Univ.
UUCP: humus!taurus!leonid
ARPA: leonid%taurus@wiscvm.ARPA

-----------------------------

End of SUN-Spots Digest
***********************