[net.bugs.v7] Codata

sam@delftcc.UUCP (Sam Kendall) (12/10/85)

The problem: periodically (lately once or twice a day) the system
decides that there should be lots of memory faults.  Most commands you
type die horrible deaths; eventually your shell dies and you are gone.
Likewise for the other users.  The only solution is to reboot.

The system: Codata 3300 (a 68000 box) running Unisis 3.1.1 (Codata's
UniSoft V7).  Lots of serial I/O (netnews, 2 modems), disks (3 Atasis,
each with several ~10 MB filesystems; and we are frozen forever at 52
system disk buffers, no more), and load (netnews, some large interactive
programs).  All three of these factors have increased recently.  We have
1.3MB of RAM and rarely swap.

Has anyone seen something like this on a Codata, on a UniSoft V7 system,
or on any V7 system?  Maybe it is some inconsistency in the memory
management (?).  It doesn't seem to be anything simple, such as the
faults occurring iff processes start swapping.  Are there any solutions
for a binary-only machine, with no new system releases expected ever?

----
Sam Kendall			     allegra \
Delft Consulting Corp.		seismo!cmcl2  ! delftcc!sam
(212) 243-8700			       ihnp4 /
ARPA: delftcc!sam@nyu.ARPA or @nyu-cmcl2.ARPA

rfm@x.UUCP (Bob Mabee) (12/13/85)

In article <113@delftcc.UUCP> sam@delftcc.UUCP (Sam Kendall) writes:
>The problem: periodically (lately once or twice a day) the system
>decides that there should be lots of memory faults.
>The system: Codata 3300 (a 68000 box) running Unisis 3.1.1

Are you doing any programming which could cause you to get odd-address traps?
There is a little-known defect in the 68000 which can cause it to trash
memory pretty thoroughly, even when running in user mode.

When the 68000 makes a word reference to an odd address, it begins an
external cycle (asserting AS and DS, the address and data strobes) before
noticing the address is odd.  Some time later, it simply aborts the cycle
by removing AS and DS, and changes the address lines to start the next
cycle.  However, external logic may be in
the middle of a RAM read/writeback cycle and assume that the addresses are
stable.  Depending on timing, bus interface, and memory card design, you
could end up with bad parity (or just bad data) in all of one row of the
RAM, which probably maps into a bad word every 256 or 512 words.
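
For illustration, here is a rough sketch of how an odd word address
typically arises in C.  The record layout and both helper names are
hypothetical, not taken from any program mentioned in this thread, and
uintptr_t is a modern C facility that a V7-era compiler would not have.

```c
#include <stdint.h>   /* uintptr_t: modern C, assumed here for clarity */

/* Hypothetical helper: returns 1 if p is an odd address, which makes it
 * unsafe for a 16-bit or 32-bit reference on a 68000; returns 0 if even. */
static int is_odd_address(const void *p)
{
    return (int)((uintptr_t)p & 1u);
}

/* Hypothetical record layout: a 1-byte tag followed by a 16-bit length.
 * The length field sits at offset 1, so it lands on an odd address
 * whenever the record itself starts on an even one. */
static const char *length_field(const char *record)
{
    return record + 1;
}
```

Casting the result of length_field() to a short pointer and dereferencing
it is the kind of word reference described above; is_odd_address() would
flag it first.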

If you are running into this, then your crashes will generally come right
after you run the buggy program, perhaps even before you can get the message
that it had an odd-address trap.  To avoid this problem you will have to do
your debugging at off hours (so the crash won't hurt as much), and try to
find the odd address before it gets used (not easy!).
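
One possible way to hunt for the odd address before it gets used is
sketched below.  It is only a sketch under stated assumptions: the names
check_word_ptr, CHECK_WORD_PTR, and fetch_word_be are made up for this
example, and the code uses modern C (stdint.h) rather than V7 C.  The
check reports where a suspect pointer turned up; the byte-wise fetch
reads a big-endian 16-bit value without ever making a word reference.

```c
#include <stdio.h>
#include <stdint.h>   /* uintptr_t: modern C, assumed for this sketch */

/* Hypothetical counter of odd pointers seen so far. */
static int odd_pointer_count = 0;

/* Hypothetical check: returns 1 if p is safe for a word reference,
 * otherwise logs the call site and returns 0. */
static int check_word_ptr(const void *p, const char *file, int line)
{
    if ((uintptr_t)p & 1u) {
        odd_pointer_count++;
        fprintf(stderr, "%s:%d: odd word address %p\n",
                file, line, (void *)p);
        return 0;
    }
    return 1;
}

#define CHECK_WORD_PTR(p) check_word_ptr((p), __FILE__, __LINE__)

/* Reads a 16-bit big-endian value (the 68000's byte order) one byte at
 * a time, so it works at any address, odd or even. */
static unsigned fetch_word_be(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | (unsigned)p[1];
}
```

The idea would be to put CHECK_WORD_PTR(p) in front of each suspect word
dereference while debugging, and to fall back on fetch_word_be() wherever
the data really can sit at an odd address.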
-- 
				Bob Mabee @ Charles River Data Systems
				decvax!frog!rfm