sam@delftcc.UUCP (Sam Kendall) (12/10/85)
The problem: periodically (lately once or twice a day) the system decides that there should be lots of memory faults. Most commands you type die horrible deaths; eventually your shell dies and you are gone. Likewise for the other users. The only solution is to reboot.

The system: Codata 3300 (a 68000 box) running Unisis 3.1.1 (Codata's UniSoft V7). Lots of serial I/O (netnews, 2 modems), disks (3 Atasis, each with several ~10 MB filesystems; and we are frozen forever at 52 system disk buffers, no more), and load (netnews, some large interactive programs). All three of these factors have increased recently. We have 1.3 MB of RAM and rarely swap.

Has anyone seen something like this on a Codata, on a UniSoft V7 system, or on any V7 system? Maybe some inconsistency in the memory management (?). If only it were something simple, like "this happens iff processes start swapping"; but that doesn't seem to be it. Any solutions for a binary-only machine, with no new system releases expected ever?

----
Sam Kendall                    allegra \
Delft Consulting Corp.    seismo!cmcl2 ! delftcc!sam
(212) 243-8700                   ihnp4 /
ARPA: delftcc!sam@nyu.ARPA or @nyu-cmcl2.ARPA
rfm@x.UUCP (Bob Mabee) (12/13/85)
In article <113@delftcc.UUCP> sam@delftcc.UUCP (Sam Kendall) writes:
>The problem: periodically (lately once or twice a day) the system
>decides that there should be lots of memory faults.
>The system: Codata 3300 (a 68000 box) running Unisis 3.1.1

Are you doing any programming that could cause odd-address traps? There is a little-known defect in the 68000 that can cause it to trash memory pretty thoroughly, even when running in user mode.

When the 68000 makes a word reference to an odd address, it begins an external bus cycle (asserting AS and DS, the address and data strobes) before noticing that the address is odd. Some time later, it simply aborts the cycle by removing AS and DS, and changes the address lines to start the next cycle. However, external logic may be in the middle of a RAM read/writeback cycle and assume that the addresses are stable. Depending on timing, bus interface, and memory card design, you could end up with bad parity (or just bad data) in all of one row of the RAM, which probably maps into a bad word every 256 or 512 words.

If you are running into this, your crashes will generally come right after you run the buggy program, perhaps even before you get the message that it had an odd-address trap. To avoid this problem, you will have to do your debugging at off hours (so the crash won't hurt as much), and try to find the odd address before it gets used (not easy!).

--
Bob Mabee @ Charles River Data Systems
decvax!frog!rfm
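The advice to "find the odd address before it gets used" can be made a little more concrete. Below is a minimal sketch, not anything from Codata or UniSoft: it tests a pointer's low bit before a 16-bit read, so a debugging build reports the bad address instead of ever issuing the word access that starts the aborted bus cycle. It assumes modern C (C99 `uintptr_t`); a 1985 V7 compiler would need something like `((long)p & 1)` instead. The names `is_odd_address` and `checked_read_word` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Debugging guard for odd-address word accesses on a 68000-style CPU,
 * where a 16-bit access must use an even address.
 * Assumption: C99 <stdint.h>; on an old V7 compiler, cast to long instead.
 * is_odd_address and checked_read_word are hypothetical helpers.
 */

/* Nonzero if p does not fall on a 2-byte (word) boundary. */
int is_odd_address(const void *p)
{
    return ((uintptr_t)p & 1u) != 0;
}

/*
 * Read the 16-bit word at p only if p is even. For an odd address,
 * report it on stderr and return -1 without dereferencing p, so the
 * faulting word access is never issued. On success, store the word
 * in *out and return 0.
 */
int checked_read_word(const void *p, uint16_t *out)
{
    assert(out != NULL);
    if (is_odd_address(p)) {
        fprintf(stderr, "odd word address %p\n", (void *)p);
        return -1;
    }
    *out = *(const uint16_t *)p;
    return 0;
}
```

A typical way such an address arises is a `char *` advanced by an odd count and then cast to `short *`; routing suspect reads like that through a guard of this kind costs a test per access but catches the pointer while the program is still under your control.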