[comp.sys.sun] Sun-Spots Digest, v5n57

Sun-Spots-Request@RICE.EDU (William LeFebvre) (11/07/87)

SUN-SPOTS DIGEST         Friday, 6 November 1987       Volume 5 : Issue 57

Today's Topics:
           Sun User Group Special Interest Coordinators Wanted
                         Re: Hard/ECC errors (2)
                    Re: Problems with modems and ALM's
                           Re: NAG etc. on Suns
                            SunOS and Sun 2's
                              WORMs on Suns
                   Help with spurious level N interrupt
              Problem with SunOS 3.4 and Adaptec controller
                 SunOS 3.4 consuming too much swap space
                          Sun sendmail problems
              Problems with second ethernet board on a 3/180
                 Problem destroying panels in SunView 3.2
                          /etc/ethers weirdness
                      Documentation for dbx symbols?

----------------------------------------------------------------------

Date:    Mon, 26 Oct 87 17:49:46 PST
From:    weiser.pa@xerox.com
Subject: Sun User Group Special Interest Coordinators Wanted

Sun-Spots serves the overall Sun community (at least those with net
access), but there are more specialized interests that it does not make
sense to discuss in Sun-Spots, and there are many users who do not get a
chance to see Sun-Spots.  Examples of special interests include "Suns at
Home", "Fortran", "Graphics", "Games", "Modula", "Common Lisp", "X Window
System", "NeWS", "Device Drivers".  There are many others.

In order to better coordinate and communicate among people with these
interests, and in order to reach more people not necessarily accessible
electronically, the Sun User Group (an organization independent from Sun
Microsystems) is looking for enthusiastic, knowledgeable volunteers to be
special interest group coordinators.  We'll offer you publicity, email (if
you don't have it already), and space in the Sun User Group Newsletter
("README") for a column.  We'll also put you in touch with a Sun Technical
person in your Special Interest area who can help answer questions. What
you get out of it is contact with other people with your interests, fame,
and glory.

The best coordinators will often be people who are already too busy, but
the coordinator's job need not be burdensome.  For instance, it could be
as little as being a focus for email and snailmail on a given topic, and
every quarter picking the 5 most interesting letters and sending them to
the README newsletter editor for inclusion in your column.  Other options
are also possible--you tell us!

If you think you might be interested, send me email.  I would like to have
several coordinators in place for the Sun User Group conference in San
Jose in December.  At this conference there are always special interest
group meetings between Sun users and Sun technical people, and that will
be a good time to start into the coordination job.

-mark

weiser.pa@xerox.com
 sun!weiser

------------------------------

Date:    Mon, 26 Oct 87 13:20:55 PST
From:    Doug Moran <moran@laguna.ai.sri.com>
Subject: Re: Hard/ECC errors (1)

At the end of September 1987, we had a problem with our disk that took us
considerable time to diagnose.  The problem turned out to be a badly
soldered lead from the power supply to the backplane.  I expect that there
are a number of other Suns out there with similar defects, so I am passing
on the symptoms to save some other poor soul what we went through.  The
problem and our diagnosis were reported to Sun, but I don't know how
quickly this information propagates.  

Our system is a Sun-3/180, which began life as a VME Sun-2 (serial number
628C1073).  It has two Eagles and is serving 8 clients.  We are running
with the disks very near full all the time.  

The symptoms were that we would get a bunch of bad blocks in the "active"
disk partitions: root, swap and user areas.  There was no apparent pattern
to the distribution of bad blocks within the partitions.  We had multiple
failures over a 2 week period.  We replaced the disk, disk controller and
cables, to no effect.  Running diagnostics never revealed any problems.
The only pattern to the failures was that they seemed to occur about the
time a partition became completely full, when multiple clients were paging
heavily, or when we did a full dump (arrrrgh).  This led us to suspect a
power problem, but our initial measurements on both the backplane and the
wall power revealed nothing.  Finally, we got lucky and caught a reading
when the power was low.  Two clues: (1) the cpu got into an "impossible"
state, with all the status lights lit, and (2) the disk controller
reported the disks as being off-line.  

The problem was that the -5V lead was very badly soldered at the point
where it entered the backplane.  When we found the problem, the reading
was -2V. Most of the time, contact was "good enough".  We surmise that
when the disks got very busy, the vibration would cause momentary
disruptions that would result in bad data being written to the disk and in
the controller reporting hard/ECC errors.  In the absence of the
vibration, the lead settled back to making an adequate connection.  The
disk diagnostics programs never tickled this problem because they didn't
produce enough head movement (as a friend observed, "are we now seeing the
evolution of diagnostic-resistant strains of computer bugs?") 

Since resoldering the lead, we have had no problems.

Douglas B. Moran
AI Center, SRI International

------------------------------

Date:    Fri, 30 Oct 87 10:31:44 EST
From:    ajb%mwcamis@mitre.arpa
Subject: Re: Hard/ECC errors (2)

Greg Tarsa writes:

>I have 2 Fuji 2322 disks connected to a xy450 and I periodically get a
>slew of error #6 "ECC/Hard errors" which, when they appear in the swap
>area cause the system to panic.

We had similar problems on our 2 Fuji's connected to a xy450. If you are
running SunOS 3.2 your problems may be due to the rewrite of the XY450
driver which assumes that your controller is capable of overlapped seeks.
If you have an old xy450 controller which is not capable of overlapped
seeks, you have to disable them in the driver, otherwise disk errors will
start happening apparently at random.  See page 67 of the "Release 3.2
Manual".  Also see the xy man page.  Let me know if this works.  

------------------------------

Date:    Wed, 28 Oct 87 11:22:46 PST
From:    Brent Chapman <capmkt!brent@lll-tis.arpa>
Subject: Re: Problems with modems and ALM's

>I have a few dialup lines connected to an ALM. Everything works fine,
>however if a person does not log out properly, they remain logged into the
>machine even though the line has dropped.  On the CPU serial ports, the
>login shell is terminated when the line drops. I have wired my modems
>according to the ALM manual, what else is necessary?

Is the "flags" field for the ALM in your kernel configuration file set
properly?  You have to twiddle the flags word to enable hardware carrier
detect (which causes the tty to reset and associated processes to die and
a new login to be set up and so on) on the ports which you have modems on.

This is all explained in the Sun "System Administration" Manual, in the
chapter called "Adding Hardware to Your System".  It explains the things
that need to be done to UNIX to configure serial ports for dialin/dialout
modems.

-Brent

Brent Chapman				Senior Programmer/Analyst
ucbvax!lll-tis.arpa!capmkt!brent	Capital Market Technology, Inc.
capmkt!brent@lll-tis.arpa 		1995 University Ave., Suite 390
Phone: 415/540-6400			Berkeley, CA  94704

------------------------------

Date:    Wed, 28 Oct 87 10:42 EDT
From:    SYSRUTH@utorphys.bitnet
Subject: Re: NAG etc. on Suns

I am posting this to the list since I could not get through directly.

The Numerical Algorithms Group (authors of NAG) have a version for Sun 3's
(at least). It is right up-to-date with Mark 12, and can be obtained from
the same place as the VAX version:

Carolyn Smith
N.A.G.
1101 31st Street, Suite 100
Downers Grove, IL 60515-1263
(312) 971-2337

Cost is about $60 U.S./mo. (Beware, there are 4 libraries depending on
which floating-point option you use, and even without the source they take
up a lot of space). This is not public domain software, so you should not
be able to get it from anyone else.

I don't know for sure about the others, but if the companies have any
sense, they will have Sun versions available. An awful lot of people who
depend on these routines are using Suns these days.

Ruth Milner
Systems Manager
University of Toronto Physics

P.S. Public-domain stuff you can probably get from netlib@anl-mcs.arpa.
(Source code for individual routines only).

------------------------------

Date:    29 Oct 87 15:33:59 GMT
From:    rti!shaddock@mcnc.org (Mike Shaddock)
Subject: SunOS and Sun 2's

We heard recently that SunOS 3.4 will be the last release of SunOS that
will be supported on the Sun 2.  Since we were considering renewing our
maintenance agreements, we would like to know if this is true.

Mike Shaddock	{decvax,seismo,ihnp4,philabs}!mcnc!rti!shaddock
		shaddock@rti.rti.org

------------------------------

Date:    Wed, 28 Oct 87 17:54:05 EST
From:    sundc!chessie!bwong@sun.com (Brian Wong - TSE Washington DC)
Subject: WORMs on Suns

Sun has been shipping drivers for the Optimem (1000?) optical disk drives
for some time as a consulting special (which means that it is extra-price).
It's pretty reasonable, and the performance is at least reasonable.
Optimem, of course, supplies the drives.  Both are listed in the Catalyst
3rd-party catalog, if you have that.

------------------------------

Date:    Sun, 01 Nov 87 23:38:47 EST
From:    Steve Dyer <dyer@athena.mit.edu>
Subject: Help with spurious level N interrupt

OK, here's the background.  I've written a device driver for the Sun 3/160
for a Multibus magtape controller which is plugged into a Sun
VME-to-Multibus converter.  This board had heretofore only been plugged
into Intel iRMX Multibus systems.

Most everything works fine.  However, on two different Suns at different
sites, after some variable length of time of operation, the system will
lock up with the error message:

Spurious level 3 interrupt
Spurious level 3 interrupt
.
.
.
ad infinitum.  Only a reboot can clear it.  This is correlated with tape
activity, although even simple operations like a shell loop which performs
"mt weof 100" repeatedly can cause it, as will more typical data transfer
operations.  Moving the Multibus interrupt jumper to another level is
tracked by the interrupt level reported in the error message changing.
The engineer reports that moving the magtape board to a position BEFORE
the SCSI controller will increase the MTBF of this occurring (although he
doesn't claim the problem goes away.)

Now, I am not an expert on Suns, although I have written many device
drivers for VAX and 11/70 systems, and am familiar with all flavors of
UNIX kernels.  I know that (at least) the error message is directly
generated by the trap() routine in response to the 68020 "spurious
interrupt" vector.  This is supposedly triggered by a device failing to
provide an interrupt vector in response to the hardware interrupt
acknowledge handshake.  Now, I'm telling these folks that this is
"hardware"--their problem, and that there is no way that my software could
cause this; that in fact, my driver can't possibly get called at this
point to have any effect on the situation.  Since I'm not bit-level
familiar with the Sun 3 hardware, I'm wondering if there are any
subtleties which I'm missing.  As far as I can see, the IACK/interrupt
vector interactions are handled completely by the CPU and converter
hardware, and this occurs on at least two different Sun VME-to-Multibus
converters as well as another manufacturer's.  A logic analyzer attached
to the VME-to-Multibus converter seems to indicate that the IACK isn't
getting to it (if I can trust the hardware guy from this company.)  Yet,
this problem has not been seen to occur when a Xylogics XT tape controller
is plugged into the same converter on one of the systems.  All quite
confusing.

As an alternative way to handle this (by doing an end run around the entire
vectoring issue), I was wondering if it's possible to configure a Sun 3
system to use the "autovector" polling routines for a particular interrupt
level.  This is alluded to in the device driver writer's manual, but
exactly how to do this for a device which is listed in the config file as
"vme16d16" isn't clear to me.  It seems that the VME-to-Multibus board
will always try to send an interrupt vector, and omitting the interrupt
vector specification from the config file only guarantees that you'll
crash upon receipt of a now-unknown interrupt.   Can this method be used
with Multibus boards which plug into a Sun 3 using the VME bus converter?
If so, how do you do it?

Any words of wisdom would be welcome, either to point out what I might be
doing wrong, whether the primary problem could at all be due to faulty
software on my part, or whether I'm justified in throwing the problem back
in their laps as being due solely to hardware.  If it *is* hardware, any
theories of what might be going wrong specific to the Multibus board would
be welcome.

Steve Dyer
dyer@harvard.HARVARD.EDU
dyer@spdcc.COM aka {harvard,wanginst,ima,ihnp4,bbn,m2c}!spdcc!dyer

------------------------------

Date:    Mon, 26 Oct 87 15:53:47 EST
From:    Dan Trinkle <trinkle@purdue.edu>
Subject: Problem with SunOS 3.4 and Adaptec controller

We converted all our machines to SunOS 3.4 recently.  I was not able to
get the generic kernel to boot on one machine only.  It is a Sun 3/50 with
a Micropolis 1325 disk and an Adaptec controller.  The kernel will boot as
far as the disk probe, then do one of two things.  Either it will find a
corrupted label on the disk and die, or it will pause (~10 seconds),
correctly print the drive type (Micropolis 1325), probe the remaining
devices, then die trying to mount the root partition (in the kernel) with
a read error.

This machine is perfectly happy running SunOS 3.2 kernels (of many
configurations) without any noticeable problems.  It passes preliminary
diagnostics.

We have a Sun 3/50 with an Adaptec controller and a 1355 that boots with
no problems.  We have a Sun 3/50 with an Emulex controller and a 1325 that
boots with no problem.

What I would like to know is if anyone else has had similar problems or if
anyone can confirm that such a configuration will work with SunOS 3.4.

I would gladly contact Sun support about this, but our maintenance
contract agreement is being held up by local red tape (no fault of Sun).

Daniel Trinkle			trinkle@cs.purdue.edu			ARPA
Computer Science Department	trinkle%purdue.edu@relay.cs.net		CSNET
Purdue University		{ucbvax,decvax,ihnp4}!purdue!trinkle	UUCP
West Lafayette, IN 47907	(317) 494-7844				PHONE

------------------------------

Date:    Mon, 26 Oct 87 16:06:12 EST
From:    Dan Trinkle <trinkle@purdue.edu>
Subject: SunOS 3.4 consuming too much swap space

Since converting to SunOS 3.4, many users have noticed they are running
out of swap space.  On one Sun 3/110 (with a Micropolis 1325 local disk
and a standard 16MB swap partition) it was so bad that we booted the SunOS
3.2 kernel.  After starting up suntools, the swap space usage was less
than half for the 3.2 kernel (same user binaries, just different kernel).
With 3.4, the swap used was ~9MB; with 3.2 it was ~4MB.  Has anyone else
noticed this problem?  The swap segments are also allocated in a different
arrangement.

I would ask Sun support, but due to local red tape, our maintenance
agreement is not up to date (no fault of Sun).

Daniel Trinkle			trinkle@cs.purdue.edu			ARPA
Computer Science Department	trinkle%purdue.edu@relay.cs.net		CSNET
Purdue University		{ucbvax,decvax,ihnp4}!purdue!trinkle	UUCP
West Lafayette, IN 47907	(317) 494-7844				PHONE

------------------------------

Date:    23 Oct 87 21:04:30 GMT
From:    roberts@edsews.eds.com (Ted Roberts)
Subject: Sun sendmail problems

During an install of smail 2.5 on a Sun 3/280 running version 3.2 of the
operating system, I ran into a problem with the domainname macro (normally
$D).  Even though the domainname is defined in sendmail.cf, Sun appears to
redefine it to be the Yellow Pages domainname.  Has anybody found a
reasonable workaround for this problem short of redefining the YP
domainname?

Ted Roberts
EDS TSD
roberts@edsews.EDS.COM cbosgd!edstb!edsews!roberts

------------------------------

Date:    26 Oct 87 15:21:51 GMT
From:    dimitri%cui.UUCP%cernvax.bitnet@jade.berkeley.edu (KONSTANTAS Dimitri)
Subject: Problems with second ethernet board on a 3/180

We recently purchased a 3/180 server in order to use it as a gateway
machine between two ethernet networks. For that reason a second ethernet
board was supplied with it.  Trying to install the second board (always
following the manuals!) the following problems appeared:
1. We did not know the ethernet address of the second board.
   (Is there any way that we can get it?)
2. When the second board was initialized with a call
    /etc/ifconfig ie1 <arbitrary ethernet address> <hostname> -trailers up
   everything was working until a request on the newly mounted network
   was issued. After that the first network became invisible.
   Turning off the second network (/etc/ifconfig ie1 down) and trying to
   give "rup" we got the reply "network unaccesible"; bringing ie1 up
   again and giving "rup" we could only see the second network.

Does anyone have any idea what we did wrong? Can it be a problem in the
kernel, or is the second board just installed in the wrong slot?

Dimitri Konstantas
Centre Universitaire d' Informatique
University of Geneva

------------------------------

Date:    Tue, 27 Oct 87 11:07:09 GMT
From:    Eric Ole Barber <mcvax!nw.stl.stc.co.uk!sizex@uunet.uu.net>
Subject: Problem destroying panels in SunView 3.2

This problem has been reported to Sun without any fix (as yet).

I'm trying to write a tool over SunView 3.2, and I'm having problems
destroying panels. Does anyone out there know of a fix, workaround, or
explanation ?

I have a popup containing a panel, and I want a button in that panel to
say 'QUIT' and make the popup go away. The only way I can find to do that
is to window_destroy() it from the button's handler (I want to destroy it,
not just make WINDOW_SHOW FALSE, because I won't be needing it again).  It
nearly always works fine, but - just sometimes - I get a core dump as the
panel is destroyed.

The problem occurs unpredictably, and I presume it to be time-related.
The stack trace seems to say that it dies in the handler for the timer
that flashes the text caret. I *think* that the problem is that by the
time the button handler has returned, the window is dead - but I wouldn't
care to bet on it...

Another possible complication is that the popup also has text sub-windows
and I've had to use a notify_interpose_destroy_func() to do a
textsw_reset() when the popup is destroyed.

Any suggestions very gratefully received.

Chris Uppal
STL, Copthall House, Nelson Place, Newcastle-under-Lyme, ENGLAND ST5 1EZ
Chris Uppal <mcvax!nw.stl.stc.co.uk!cuppal@uunet.uu.net>
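
The suspicion above (the window is already dead by the time the button
handler returns, and the caret timer then fires on it) can be modelled
with a toy event loop.  This is only a sketch of the general "defer the
destroy until the handler has returned" idea.  ToyNotifier and ToyPopup
are hypothetical names invented for illustration; none of this is the
SunView API, and it is not known whether SunView 3.2 offers an
equivalent hook.

```python
class ToyNotifier:
    """Hypothetical stand-in for an event loop; NOT the SunView notifier."""

    def __init__(self):
        self.timers = []    # (owner, callback) pairs
        self.deferred = []  # calls to run after the current handler returns

    def add_timer(self, owner, callback):
        self.timers.append((owner, callback))

    def cancel_timers(self, owner):
        self.timers = [(o, cb) for (o, cb) in self.timers if o is not owner]

    def defer(self, func):
        self.deferred.append(func)

    def dispatch(self, handler):
        handler()
        # Deferred work runs only once the handler has fully returned.
        while self.deferred:
            self.deferred.pop(0)()

    def tick(self):
        # Fire every timer that is still registered.
        for _owner, callback in list(self.timers):
            callback()


class ToyPopup:
    """Hypothetical popup with a blinking caret driven by a timer."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.alive = True
        self.blinks = 0
        notifier.add_timer(self, self.blink_caret)

    def blink_caret(self):
        if not self.alive:
            raise RuntimeError("caret timer fired on a destroyed popup")
        self.blinks += 1

    def destroy(self):
        # Cancel our timers first so nothing can call back into us.
        self.notifier.cancel_timers(self)
        self.alive = False

    def quit_pressed(self):
        # Do not destroy from inside the handler; schedule it instead.
        self.notifier.defer(self.destroy)
```

In this model, pressing QUIT via notifier.dispatch(popup.quit_pressed)
leaves the popup intact until the handler has returned, and a later
notifier.tick() finds no timer left to fire on the dead popup.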

------------------------------

Date:    Wed, 28 Oct 87 10:00:13 -0500
From:    lrj@helios.tn.cornell.edu
Subject: /etc/ethers weirdness

Under SunOS 3.3 and 3.4 one cannot have any leading '0's in the hardware
ethernet address field of /etc/ethers.  If there ARE leading '0's, then
the machine will not be recognized, and thus it will not boot if it's a
diskless machine.

For example, with this entry, machine losna will not boot:

	8:00:20:00:07:b7  losna

Yet, this entry WILL work:

	8:0:20:0:7:b7  losna

I have just mailed to sun!hotline regarding this.

-- Lewis R. Jansen, LASSP Systems Grunt
lrj@helios.tn.cornell.edu
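
Until this is fixed, one workaround is to rewrite the address field so
that no octet carries a leading zero, as in the second entry above.  A
minimal sketch follows; the function names are made up for illustration,
and it assumes each entry is "address hostname" with six colon-separated
hex octets.  Anything that does not look like that is passed through
unchanged.  Note that it also lowercases the hex digits.

```python
def strip_leading_zeros(ether_addr):
    """Return ether_addr with leading zeros removed from each octet."""
    octets = ether_addr.split(":")
    if len(octets) != 6:
        raise ValueError("expected six octets: %r" % ether_addr)
    # int(..., 16) rejects non-hex text; "%x" prints without padding.
    values = [int(octet, 16) for octet in octets]
    if any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("octet out of range: %r" % ether_addr)
    return ":".join("%x" % v for v in values)


def normalize_ethers_line(line):
    """Normalize one 'address hostname' line; pass anything else through."""
    fields = line.split()
    if len(fields) != 2 or fields[0].count(":") != 5:
        return line
    return "%s  %s" % (strip_leading_zeros(fields[0]), fields[1])
```

Applied to the failing entry, normalize_ethers_line("8:00:20:00:07:b7  losna")
yields the working form "8:0:20:0:7:b7  losna".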
					

------------------------------

Date:    26 Oct 87 20:58:42 GMT
From:    ucbvax!sun!megatest!djones@decwrl.dec.com (Dave Jones)
Subject: Documentation for dbx symbols?

Hello Sun people,

This is not specifically a Sun question, but I'm not having any luck
wringing the info out of the unix wizards or compiler groups, so maybe
someone here will help.

I am looking for documentation on the format of the ".stab" symbols that
the -g option causes to be put into executables for dbx, etc.

From time to time I have occasion to write a special purpose compiler, and
I want the binaries to be dbx-able.

Where do I go to get this info?  I have heard a rumor that there is a
published standard, but I don't know the name.

Thanks,

Dave Jones
Megatest Corp.
880 Fox Lane
San Jose, CA.  95131

(408) 437-9700 Ext 3227
UUCP: ucbvax!sun!megatest!djones     ARPA: megatest!djones@riacs.ARPA
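
As a starting point while looking for proper documentation: on
4.xBSD-derived systems the stab type codes are, as far as I know, listed
in the header <stab.h>, and the symbol-table entry itself is the struct
nlist from <a.out.h>.  The sketch below decodes one such entry.  The
12-byte big-endian layout and the type values are taken from the BSD
headers and are an assumption as far as Sun's compilers go; decode_entry
and the sample string "main:F1" are made up for illustration.

```python
import struct

# Assumed layout of one a.out symbol-table entry (struct nlist),
# big-endian as on the 68020: long n_strx; unsigned char n_type;
# char n_other; short n_desc; unsigned long n_value.
NLIST = struct.Struct(">iBbhI")

N_STAB = 0xE0  # if any of these bits is set in n_type, it is a dbx symbol

# A subset of the type codes from the BSD <stab.h>.
STAB_NAMES = {
    0x20: "N_GSYM",   # global variable
    0x24: "N_FUN",    # function
    0x26: "N_STSYM",  # static variable, data segment
    0x28: "N_LCSYM",  # static variable, bss segment
    0x40: "N_RSYM",   # register variable
    0x44: "N_SLINE",  # source line number
    0x64: "N_SO",     # source file name
    0x80: "N_LSYM",   # local variable or type definition
    0xA0: "N_PSYM",   # parameter
    0xC0: "N_LBRAC",  # start of lexical block
    0xE0: "N_RBRAC",  # end of lexical block
}


def decode_entry(raw, strtab):
    """Decode one 12-byte entry; return None for non-debugger symbols.

    strtab is the whole string table, including its leading 4-byte
    length word, so n_strx can be used as an offset directly.
    """
    n_strx, n_type, _n_other, n_desc, n_value = NLIST.unpack(raw)
    if not n_type & N_STAB:
        return None
    end = strtab.index(b"\0", n_strx)
    name = strtab[n_strx:end].decode("ascii")
    return STAB_NAMES.get(n_type, hex(n_type)), name, n_desc, n_value
```

The part that really needs documenting is the text after the colon in
each symbol string (type numbers and descriptors), which this sketch
does not attempt to parse.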

------------------------------

End of SUN-Spots Digest
***********************