[unix-pc.general] Dying machine?

kevin@kosman.UUCP (Kevin O'Gorman) (12/31/89)

I wonder if this machine is running on anything stronger than a wing and
a prayer, and I would like some advice from someone more knowledgeable than
me.

It's a fire-sale 3b1, running continuously since I bought it a couple years
ago.

A month ago, I heard noises that sounded like the drive going weird, but
the noises went away.  It did get me to back up in a big hurry, though.

For a while, not too clear in my mind, I've been having the system freeze
on me about once a week.  It's odd, too: typically, I haven't used the
system for a while, I sit down and break into the screen saver okay, and
the system seems to track the mouse.  The keyboard may or may not be
dead, depending, but I can generally point and click (yes, the UA is
usually around).  I get a little bit of disk activity, but that grinds
to a halt, and the system just stares at me until I reboot it.  I'm running
3.51 with the fixdisk kernel, and I know this thing used to stay up
nearly forever.

I didn't worry about it too much though; the crashes weren't getting
any more frequent.

Then, I was getting ready for Xmas, and decided that since this thing
sits with its back to the common areas in my house, that I would clean
up the cabling a bit.  I had the power off for several hours while I
did this.  It took me a large number of tries to get it running after
that interval.

This was not the stiction disk troubles that others have reported: I
could hear the disk spinning up.  At first, the screen showed a pattern
that looked like a rectangular spider web with a couple of random
mouse tracks running across at two places.  A reboot got to the normal
screen test-pattern and then a permanently blank screen.  Several more
reboots got further and further.  Another got to UNIX, with crashes
within the first few minutes.  Finally, after power had been on for
30 or more minutes, I got a reboot that seemed to be stable.  Then a
bit later, I got the first honest-to-goodness kernel panic I've seen
in years.  Eventually, it got stable, and has been running without
crash for about a week.

Sounds to me as if something's temperature sensitive, and likes to be
hot.

Anybody have any ideas about how I could go about finding out what it
really is?  I would like to just chuck the part, whatever it is that's
doing this.  I'm pretty handy with tools, though I don't have more than
a soldering kit, multimeter, and the usual pliers and things.

-- 
Kevin O'Gorman ( kevin@kosman.UUCP, {pyramid,csun}!srhqla!kosman!kevin )
voice: 805-984-8042 Vital Computer Systems, 5115 Beachcomber, Oxnard, CA  93035
Non-Disclaimer: my boss is me, and he stands behind everything I say.

msm@rayssdb.ray.com (Mark S. Mersereau) (01/12/90)

In article <1057@kosman.UUCP>, kevin@kosman.UUCP (Kevin O'Gorman) writes:
> ...
>
> Then, I was getting ready for Xmas, and decided that since this thing
> sits with its back to the common areas in my house, that I would clean
> up the cabling a bit.  I had the power off for several hours while I
> did this.  It took me a large number of tries to get it running after
> that interval.
>
> This was not the stiction disk troubles that others have reported: I
> could hear the disk spinning up. At first, the screen showed a pattern
> that looked like a rectangular spider web with a couple of random
> mouse tracks running across at two places.  A reboot got to the normal
> screen test-pattern and then a permanently blank screen.  Several more
> reboots got further and further.  Another got to UNIX, with crashes
> within the first few minutes.  Finally, after power had been on for
> 30 or more minutes, I got a reboot that seemed to be stable.  Then a
> bit later, I got the first honest-to-goodness kernel panic I've seen
> in years.  Eventually, it got stable, and has been running without
> crash for about a week.
>
> Sounds to me as if something's temperature sensitive, and likes to be
> hot.
>
> Anybody have any ideas about how I could go about finding out what it
> really is?  I would like to just chuck the part, whatever it is that's
> doing this.   I'm pretty handy with tools, though I don't have more than
> a soldering kit, multimeter, and the usual pliers and things.

     I'd also like to find out what's causing this problem, since my
machine seems to be similarly plagued.

     I bought my 3b1 new about 2 years ago and it has been running
more or less continuously ever since. Over the past year, I
periodically powered it down and opened it up for cleanings and
repairs (e.g., hard disk drive replacement, power supply replacement,
static eliminator spring adjustment, WD2010 installation, etc.).
Occasionally, after I put the 3b1 back together and switched on the
power, the screen would come up full of thin horizontal cells -- a
`rectangular spider web' -- instead of showing a normal boot load in
progress, and all four of the motherboard-mounted LEDs would be on.
Sometimes this was accompanied by a 5 pulse per second chirping sound,
regardless of whether the phone lines were connected.  Immediately
hitting reset caused no change; but once in a while, if the machine
was left running for several hours and then reset was hit, the screen
would clear and the system would boot.  I adopted the recovery method
of switching the power off and on until the machine came on without
the chirping sound, waiting for several hours, and pressing reset.  It
usually booted successfully once it got going, but a few times it
would fail with kernel panics.  Re-opening the case and re-seating any
connectors I had disconnected during the repair always corrected the
kernel panic problem.

     Before I replaced the original 40 Mbyte Miniscribe drive,
rebooting the system after it had been shut down for repairs or
cleaning was frequently met with the above failure (i.e., a
rectangular spider web on the screen and a reset button that had no
effect). After I replaced the drive with a Micropolis 1325, I was
pleased to find the problem had gone away ... until last week.

     Since opening up the 3b1 last week to adjust a vibrating static
eliminator spring and to install a WD2010 disk controller, I haven't
been able to get it to boot. The rectangular spider web is back.
On a tip from the originator of this discussion, Kevin O'Gorman, I
replaced the power supply with a brand spanking new one, but the
problem persisted.  What to try next?

-- {decuac,gatech,necntc,sun,uiucdes,ukma}!rayssd!msm
   msm@rayssd.ray.com

wtm@neoucom.UUCP (Bill Mayhew) (01/15/90)

All four LEDs on and some weird pattern on the screen should not
normally happen.  It sounds to me like the reset pulse generator is
failing to provide a satisfactory initial reset if that is
happening.

Here is a brief description of the boot ROM test and initialization
sequence:

There are four LEDs on the left side, visible through the vent slots
below the keyboard platform.  The LED nearest the back of the
machine is the most significant bit.  Off=0, On=1.

Test #1 ( 0 0 0 1 )
This pattern lit and no boot means failed to initialize telephone
hardware

Test #2 ( 0 0 1 0 )
Failed video RAM test.  A lace pattern may appear briefly on the
monitor while this is running if the CRT is warmed up.  The video
test uses an incrementing write to vid ram and reads back the
address.

Test #3 ( 0 0 1 1 )
Failed page map ram test.

Test #4 ( 0 1 0 0 )
Failed to set map ram to unity mapping.  The system tries to
initialize the virtual memory map to point at physical memory pages
with a state of page present, write enabled, not written yet.

Test #5 ( 0 1 0 1 )
Failed dynamic ram test

Test #6 ( 0 1 1 0 )
Failed system initialization

Test #7 ( 0 1 1 1 )
Could not find loader on disk.  This is the part where the small
squares appear starting at the upper left corner of the screen
until the system is able to find a loader on one of the disks.
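
The list above can be read off mechanically.  Here is a minimal sketch
in Python, assuming the four LEDs are noted down back-to-front as 0/1
values and treated as a 4-bit number.  The names BOOT_TESTS and
decode_leds are made up for illustration; nothing here comes from the
boot ROM itself.

```python
# Failure meanings for the boot ROM test codes listed above.
# Hypothetical helper table, keyed by the 4-bit LED value.
BOOT_TESTS = {
    1: "failed to initialize telephone hardware",
    2: "failed video RAM test",
    3: "failed page map RAM test",
    4: "failed to set map RAM to unity mapping",
    5: "failed dynamic RAM test",
    6: "failed system initialization",
    7: "could not find loader on disk",
}


def decode_leds(leds):
    """Decode four LED states, given back LED (MSB) first, 0=off, 1=on."""
    if len(leds) != 4 or any(bit not in (0, 1) for bit in leds):
        raise ValueError("expected four 0/1 values, back LED first")
    code = 0
    for bit in leds:
        # Shift in each LED, most significant bit first.
        code = (code << 1) | bit
    # Codes outside 1..7 are not described in the list above.
    return code, BOOT_TESTS.get(code, "not in the list above")


if __name__ == "__main__":
    print(decode_leds([0, 1, 0, 1]))
    print(decode_leds([1, 1, 1, 1]))
```

With all four LEDs on, the value is 15, which does not match any of
the seven tests.  That is consistent with the reset not having taken
hold rather than with any one test failing.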



Bill