Sun-Spots-Request@RICE.EDU (William LeFebvre) (11/07/87)
SUN-SPOTS DIGEST         Friday, 6 November 1987       Volume 5 : Issue 57

Today's Topics:
        Sun User Group Special Interest Coordinators Wanted
        Re: Hard/ECC errors (2)
        Re: Problems with modems and ALM's
        Re: NAG etc. on Suns
        SunOS and Sun 2's
        WORMs on Suns
        Help with spurious level N interrupt
        Problem with SunOS 3.4 and Adaptec controller
        SunOS 3.4 consuming too much swap space
        Sun sendmail problems
        Problems with second ethernet board on a 3/180
        Problem destroying panels in SunView 3.2
        /etc/ethers wierdness
        Documentation for dbx symbols?

----------------------------------------------------------------------

Date: Mon, 26 Oct 87 17:49:46 PST
From: weiser.pa@xerox.com
Subject: Sun User Group Special Interest Coordinators Wanted

Sun-Spots serves the overall Sun community (at least those with net access), but there are more specialized interests that it does not make sense to discuss in Sun-Spots, and there are many users who do not get a chance to see Sun-Spots. Examples of special interests include "Suns at Home", "Fortran", "Graphics", "Games", "Modula", "Common Lisp", "X Window System", "NeWS", "Device Drivers". There are many others.

In order to better coordinate and communicate among people with these interests, and in order to reach more people not necessarily accessible electronically, the Sun User Group (an organization independent from Sun Microsystems) is looking for enthusiastic, knowledgeable volunteers to be special interest group coordinators.

We'll offer you publicity, email (if you don't have it already), and space in the Sun User Group Newsletter ("README") for a column. We'll also put you in touch with a Sun Technical person in your Special Interest area who can help answer questions. What you get out of it is contact with other people with your interests, fame, and glory.

The best coordinators will often be people who are already too busy, but the coordinator's job need not be burdensome.
For instance, it could be as little as being a focus for email and snailmail on a given topic, and every quarter picking the 5 most interesting letters and sending them to the README newsletter editor for inclusion in your column. Other options are also possible--you tell us!

If you think you might be interested, send me email. I would like to have several coordinators in place for the Sun User Group conference in San Jose in December. At this conference there are always special interest group meetings between Sun users and Sun technical people, and that will be a good time to start into the coordination job.

-mark
weiser.pa@xerox.com
sun!weiser

------------------------------

Date: Mon, 26 Oct 87 13:20:55 PST
From: Doug Moran <moran@laguna.ai.sri.com>
Subject: Re: Hard/ECC errors (1)

At the end of September 1987, we had a problem with our disk that took us considerable time to diagnose. The problem turned out to be a badly soldered lead from the power supply to the backplane. I expect that there are a number of other Suns out there with similar defects, so I am passing on the symptoms to save some other poor soul what we went through. The problem and our diagnosis were reported to Sun, but I don't know how quickly this information propagates.

Our system is a Sun-3/180, which began life as a VME Sun-2 (serial number 628C1073). It has two Eagles and is serving 8 clients. We are running with the disks very near full all the time.

The symptoms were that we would get a bunch of bad blocks in the "active" disk partitions: root, swap and user areas. There was no apparent pattern to the distribution of bad blocks within the partitions. We had multiple failures over a 2 week period. We replaced the disk, disk controller and cables, to no effect. Running diagnostics never revealed any problems.
The only pattern to the failures was that they seemed to occur about the time a partition became completely full, when multiple clients were paging heavily, or when we did a full dump (arrrrgh). This led us to suspect a power problem, but our initial measurements on both the backplane and the wall power revealed nothing. Finally, we got lucky and caught a reading when the power was low. Two clues: (1) the cpu got into an "impossible" state, with all the status lights lit, and (2) the disk controller reported the disks as being off-line.

The problem was that the -5V lead was very badly soldered at the point where it entered the backplane. When we found the problem, the reading was -2V. Most of the time, contact was "good enough". We surmise that when the disks got very busy, the vibration would cause momentary disruptions that would result in bad data being written to the disk and in the controller reporting hard/ECC errors. In the absence of the vibration, the lead settled back to making an adequate connection. The disk diagnostics programs never tickled this problem because they didn't produce enough head movement (as a friend observed, "are we now seeing the evolution of diagnostic-resistant strains of computer bugs?")

Since resoldering the lead, we have had no problems.

Douglas B. Moran
AI Center, SRI International

------------------------------

Date: Fri, 30 Oct 87 10:31:44 EST
From: ajb%mwcamis@mitre.arpa
Subject: Re: Hard/ECC errors (2)

Greg Tarsa writes:
>I have 2 Fuji 2322 disks connected to a xy450 and I periodically get a
>slew of error #6 "ECC/Hard errors" which, when they appear in the swap
>area cause the system to panic.

We had similar problems on our 2 Fuji's connected to a xy450. If you are running SunOS 3.2 your problems may be due to the rewrite of the XY450 driver which assumes that your controller is capable of overlapped seeks.
If you have an old xy450 controller which is not capable of overlapped seeks, you have to disable them in the driver, otherwise disk errors will start happening apparently at random. See page 67 of the "Release 3.2 Manual". Also see the xy man page.

Let me know if this works.

------------------------------

Date: Wed, 28 Oct 87 11:22:46 PST
From: Brent Chapman <capmkt!brent@lll-tis.arpa>
Subject: Re: Problems with modems and ALM's

>I have a few dialup lines connected to an ALM. Everything works fine,
>however if a person does not log out properly, they remain logged into the
>machine even though the line has dropped. On the CPU serial ports, the
>login shell is terminated when the line drops. I have wired my modems
>according to the ALM manual, what else is necessary?

Is the "flags" field for the ALM in your kernel configuration file set properly? You have to twiddle the flags word to enable hardware carrier detect (which causes the tty to reset and associated processes to die and a new login to be set up and so on) on the ports which you have modems on.

This is all explained in the Sun "System Administration" Manual, in the chapter called "Adding Hardware to Your System". It explains the things that need to be done to UNIX to configure serial ports for dialin/dialout modems.

-Brent

Brent Chapman                       Senior Programmer/Analyst
ucbvax!lll-tis.arpa!capmkt!brent    Capital Market Technology, Inc.
capmkt!brent@lll-tis.arpa           1995 University Ave., Suite 390
Phone: 415/540-6400                 Berkeley, CA 94704

------------------------------

Date: Wed, 28 Oct 87 10:42 EDT
From: SYSRUTH@utorphys.bitnet
Subject: Re: NAG etc. on Suns

I am posting this to the list since I could not get through directly.

The Numerical Algorithms Group (authors of NAG) have a version for Sun 3's (at least). It is right up-to-date with Mark 12, and can be obtained from the same place as the VAX version:

Carolyn Smith
N.A.G.
1101 31st Street, Suite 100
Downers Grove, IL 60515-1263
(312) 971-2337

Cost is about $60 U.S./mo. (Beware, there are 4 libraries depending on which floating-point option you use, and even without the source they take up a lot of space). This is not public domain software, so you should not be able to get it from anyone else.

I don't know for sure about the others, but if the companies have any sense, they will have Sun versions available. An awful lot of people who depend on these routines are using Suns these days.

Ruth Milner
Systems Manager
University of Toronto Physics

P.S. Public-domain stuff you can probably get from netlib@anl-mcs.arpa. (Source code for individual routines only).

------------------------------

Date: 29 Oct 87 15:33:59 GMT
From: rti!shaddock@mcnc.org (Mike Shaddock)
Subject: SunOS and Sun 2's

We heard recently that SunOS 3.4 will be the last release of SunOS that will be supported on the Sun 2. Since we were considering renewing our maintenance agreements, we would like to know if this is true.

Mike Shaddock
{decvax,seismo,ihnp4,philabs}!mcnc!rti!shaddock
shaddock@rti.rti.org

------------------------------

Date: Wed, 28 Oct 87 17:54:05 EST
From: sundc!chessie!bwong@sun.com (Brian Wong - TSE Washington DC)
Subject: WORMs on Suns

Sun has been shipping drivers for the Optimem (1000?) optical disk drives for some time as a consulting special (which means that it is extra-price). It's pretty reasonable, and the performance is at least reasonable. Optimem, of course, supplies the drives. Both are listed in the Catalyst 3rd-party catalog, if you have that.

------------------------------

Date: Sun, 01 Nov 87 23:38:47 EST
From: Steve Dyer <dyer@athena.mit.edu>
Subject: Help with spurious level N interrupt

OK, here's the background. I've written a device driver for the Sun 3/160 for a Multibus magtape controller which is plugged into a Sun VME-to-Multibus converter. This board had heretofore only been plugged into Intel iRMX Multibus systems.
Most everything works fine. However, on two different Suns at different sites, after some variable length of time of operation, the system will lock up with the error message:

        Spurious level 3 interrupt
        Spurious level 3 interrupt
        .
        .
        .

ad infinitum. Only a reboot can clear it. This is correlated with tape activity, although even simple operations like a shell loop which performs "mt weof 100" repeatedly can cause it, as will more typical data transfer operations. Moving the Multibus interrupt jumper to another level is tracked by the interrupt level reported in the error message changing. The engineer reports that moving the magtape board to a position BEFORE the SCSI controller will increase the MTBF of this occurring (although he doesn't claim the problem goes away.)

Now, I am not an expert on Suns, although I have written many device drivers for VAX and 11/70 systems, and am familiar with all flavors of UNIX kernels. I know that (at least) the error message is directly generated by the trap() routine in response to the 68020 "spurious interrupt" vector. This is supposedly triggered by a device failing to provide an interrupt vector in response to the hardware interrupt acknowledge handshake.

Now, I'm telling these folks that this is "hardware"--their problem, and that there is no way that my software could cause this; that in fact, my driver can't possibly get called at this point to have any effect on the situation. Since I'm not bit-level familiar with the Sun 3 hardware, I'm wondering if there are any subtleties which I'm missing. As far as I can see, the IACK/interrupt vector interactions are handled completely by the CPU and converter hardware, and this occurs on at least two different Sun VME-to-Multibus converters as well as another manufacturer's. A logic analyzer attached to the VME-to-Multibus converter seems to indicate that the IACK isn't getting to it (if I can trust the hardware guy from this company.)
Yet, this problem has not been seen to occur when a Xylogics XT tape controller is plugged into the same converter on one of the systems. All quite confusing.

As an alternative way to handle this (by doing an end run around the entire vectoring issue), I was wondering if it's possible to configure a Sun 3 system to use the "autovector" polling routines for a particular interrupt level. This is alluded to in the device driver writer's manual, but exactly how to do this for a device which is listed in the config file as "vme16d16" isn't clear to me. It seems that the VME-to-Multibus board will always try to send an interrupt vector, and omitting the interrupt vector specification from the config file only guarantees that you'll crash upon receipt of a now-unknown interrupt. Can this method be used with Multibus boards which plug into a Sun 3 using the VME bus converter? If so, how do you do it?

Any words of wisdom would be welcome, either to point out what I might be doing wrong, whether the primary problem could at all be due to faulty software on my part, or whether I'm justified in throwing the problem back in their laps as being due solely to hardware. If it *is* hardware, any theories of what might be going wrong specific to the Multibus board would be welcome.

Steve Dyer
dyer@harvard.HARVARD.EDU
dyer@spdcc.COM aka {harvard,wanginst,ima,ihnp4,bbn,m2c}!spdcc!dyer

------------------------------

Date: Mon, 26 Oct 87 15:53:47 EST
From: Dan Trinkle <trinkle@purdue.edu>
Subject: Problem with SunOS 3.4 and Adaptec controller

We converted all our machines to SunOS 3.4 recently. I was not able to get the generic kernel to boot on one machine only. It is a Sun 3/50 with a Micropolis 1325 disk and an Adaptec controller. The kernel will boot as far as the disk probe, then do one of two things.
Either it will find a corrupted label on the disk and die, or it will pause (~10 seconds), correctly print the drive type (Micropolis 1325), probe the remaining devices, then die trying to mount the root partition (in the kernel) with a read error.

This machine is perfectly happy running SunOS 3.2 kernels (of many configurations) without any noticeable problems. It passes preliminary diagnostics. We have a Sun 3/50 with an Adaptec controller and a 1355 that boots with no problems. We have a Sun 3/50 with an Emulex controller and a 1325 that boots with no problem.

What I would like to know is if anyone else has had similar problems or if anyone can confirm that such a configuration will work with SunOS 3.4. I would gladly contact Sun support about this, but our maintenance contract agreement is being held up by local red tape (no fault of Sun).

Daniel Trinkle                 trinkle@cs.purdue.edu                  ARPA
Computer Science Department    trinkle%purdue.edu@relay.cs.net        CSNET
Purdue University              {ucbvax,decvax,ihnp4}!purdue!trinkle   UUCP
West Lafayette, IN 47907       (317) 494-7844                         PHONE

------------------------------

Date: Mon, 26 Oct 87 16:06:12 EST
From: Dan Trinkle <trinkle@purdue.edu>
Subject: SunOS 3.4 consuming too much swap space

Since converting to SunOS 3.4, many users have noticed they are running out of swap space. On one Sun 3/110 (with a Micropolis 1325 local disk and a standard 16MB swap partition) it was so bad that we booted the SunOS 3.2 kernel. After starting up suntools, the swap space usage was less than half for the 3.2 kernel (same user binaries, just different kernel). With 3.4, the swap used was ~9MB; with 3.2 it was ~4MB.

Has anyone else noticed this problem? The swap segments are also allocated in a different arrangement. I would ask Sun support, but due to local red tape, our maintenance agreement is not up to date (no fault of Sun).
Daniel Trinkle                 trinkle@cs.purdue.edu                  ARPA
Computer Science Department    trinkle%purdue.edu@relay.cs.net        CSNET
Purdue University              {ucbvax,decvax,ihnp4}!purdue!trinkle   UUCP
West Lafayette, IN 47907       (317) 494-7844                         PHONE

------------------------------

Date: 23 Oct 87 21:04:30 GMT
From: roberts@edsews.eds.com (Ted Roberts)
Subject: Sun sendmail problems

During an install of smail 2.5 on a Sun 3/280 running version 3.2 of the operating system, I ran into a problem with the domainname macro (normally $D). Even though the domainname is defined in sendmail.cf, Sun appears to redefine it to be the Yellow Pages domainname. Has anybody found a reasonable workaround for this problem short of redefining the YP domainname?

Ted Roberts
EDS TSD
roberts@edsews.EDS.COM
cbosgd!edstb!edsews!roberts

------------------------------

Date: 26 Oct 87 15:21:51 GMT
From: dimitri%cui.UUCP%cernvax.bitnet@jade.berkeley.edu (KONSTANTAS Dimitri)
Subject: Problems with second ethernet board on a 3/180

We recently purchased a 3/180 server in order to use it as a gateway machine between two ethernet networks. For that reason a second ethernet board was supplied with it. Trying to install the second board (always following the manuals!), the following problems appeared:

1. We did not know the ethernet address of the second board. (Is there any way that we can get it?)

2. When the second board was initialized with the call

        /etc/ifconfig ie1 <arbitrary ethernet address> <hostname> -trailers up

   everything was working until a request on the newly mounted network was issued. After that the first network became invisible. Turning off the second network (/etc/ifconfig ie1 down) and trying to give "rup", we got the reply "network unaccesible"; bringing ie1 up again and giving "rup", we could only see the second network.

Does anyone have any idea what we did wrong? Can it be a problem in the kernel, or is the second board just installed in the wrong slot?
Dimitri Konstantas
Centre Universitaire d' Informatique
University of Geneva

------------------------------

Date: Tue, 27 Oct 87 11:07:09 GMT
From: Eric Ole Barber <mcvax!nw.stl.stc.co.uk!sizex@uunet.uu.net>
Subject: Problem destroying panels in SunView 3.2

This problem has been reported to Sun without any fix (as yet).

I'm trying to write a tool over SunView 3.2, and I'm having problems destroying panels. Does anyone out there know of a fix, workaround, or explanation?

I have a popup containing a panel, and I want a button in that panel to say 'QUIT' and make the popup go away. The only way I can find to do that is to window_destroy() it from the button's handler (I want to destroy it, not just make WINDOW_SHOW FALSE, because I won't be needing it again).

It nearly always works fine, but - just sometimes - I get a core dump as the panel is destroyed. The problem occurs unpredictably, and I presume it to be time-related. The stack trace seems to say that it dies in the handler for the timer that flashes the text caret.

I *think* that the problem is that by the time the button handler has returned, the window is dead - but I wouldn't care to bet on it...

Another possible complication is that the popup also has text sub-windows and I've had to use a notify_interpose_destroy_func() to do a textsw_reset() when the popup is destroyed.

Any suggestions very gratefully received.

Chris Uppal
STL, Copthall House, Nelson Place, Newcastle-under-Lyme, ENGLAND ST5 1EZ
Chris Uppal <mcvax!nw.stl.stc.co.uk!cuppal@uunet.uu.net>

------------------------------

Date: Wed, 28 Oct 87 10:00:13 -0500
From: lrj@helios.tn.cornell.edu
Subject: /etc/ethers wierdness

Under SunOS 3.3 and 3.4 one cannot have any leading '0's in the hardware ethernet address field of /etc/ethers. If there ARE leading '0's, then the machine will not be recognized, and thus it will not boot if it's a diskless machine.
For example, with this entry, machine losna will not boot:

        8:00:20:00:07:b7 losna

Yet, this entry WILL work:

        8:0:20:0:7:b7 losna

I have just mailed to sun!hotline regarding this.

-- Lewis R. Jansen, LASSP Systems Grunt
lrj@helios.tn.cornell.edu

------------------------------

Date: 26 Oct 87 20:58:42 GMT
From: ucbvax!sun!megatest!djones@decwrl.dec.com (Dave Jones)
Subject: Documentation for dbx symbols?

Hello Sun people,

This is not specifically a Sun question, but I'm not having any luck wringing the info out of the unix wizards or compiler groups, so maybe someone here will help.

I am looking for documentation on the format of the ".stab" symbols that the -g option causes to be put into executables for dbx, etc. From time to time I have occasion to write a special purpose compiler, and I want the binaries to be dbx-able.

Where do I go to get this info? I have heard a rumor that there is a published standard, but I don't know the name.

Thanks,

Dave Jones
Megatest Corp.
880 Fox Lane
San Jose, CA. 95131
(408) 437-9700 Ext 3227
UUCP: ucbvax!sun!megatest!djones
ARPA: megatest!djones@riacs.ARPA

------------------------------

End of SUN-Spots Digest
***********************
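
The /etc/ethers workaround reported in this issue (dropping leading '0's from each field of the hardware address) could be applied mechanically with something like the sketch below. It is only an illustration: the function name strip_leading_zeros, the treatment of lines starting with '#' as comments, and the single space written between address and hostname are assumptions, not anything taken from SunOS or from the original report.

```python
def strip_leading_zeros(line):
    """Return one /etc/ethers line with leading zeros removed from each
    field of the hardware address. Lines that do not look like an entry
    are returned unchanged (minus any trailing newline)."""
    text = line.rstrip("\n")
    fields = text.split(None, 1)
    # Blank lines and '#' lines (assumed here to be comments) pass through.
    if not fields or fields[0].startswith("#"):
        return text
    octets = fields[0].split(":")
    if len(octets) != 6:
        return text
    try:
        # Parse each field as hex and print it back in its shortest form.
        address = ":".join(format(int(octet, 16), "x") for octet in octets)
    except ValueError:
        return text
    host = fields[1].strip() if len(fields) > 1 else ""
    return (address + " " + host).rstrip()


if __name__ == "__main__":
    # The entry quoted in the report above.
    print(strip_leading_zeros("8:00:20:00:07:b7 losna"))
```

Run on the failing entry quoted above, this yields the form the report says works. Whether the zero-stripped form is also accepted by other SunOS releases is not something the report establishes.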