[net.unix-wizards] Collapsing cosmos

ted@usceast.UUCP (Ted Nolan) (01/22/85)

<really cosmic>

I've been having a few problems with our cosmos system, and wonder if
anyone out there has any helpful clues.

First, a brief description of the system: A Cosmos Starfield Lyra
running close to v7 Unix with a few bsdisms (CMS16/UNX 7.20M - CGIO v2.1,
to be exact).  The machine is a 68000 multibus system with standard 
Motorola mmu, 8 serial ports and a 20M hard disk. The software is from 
Unisoft and probably differs little from other Unisoft 68000 v7 ports.

Now, the problems.  The biggest one is that the system crashes a lot.
Usually the dying gasp message indicates a kernel memory management error
or a kernel bus error. The virtual address printed out at this time is
usually very high, often in the u. area and sometimes the sp seems to
be in the high area reserved for i/o (if I understand the setmmu's in
machdep.c correctly.) The system most often crashes during a uucp
to our vax, though no time is safe. (I am convinced that a clock
interrupt crashed it once).  Taking the post mortem saved pc and
adbing around the kernel shows that no particular routine is the
culprit, though fairly often some sort of stack access was going on.

Much less often, a "panic trap" will bring it down, usually after someone
has hit a key.

I have made a few kernel modifications, mostly in writing a device driver
for our image boards.  This necessitated adding an extra setmmu call in
machdep.c to map the strange address into the kernel space, but as far
as I can tell this doesn't step on anything important.  The crashes
don't correspond with display use and the display is unaffected by the
crashes.  
	setmmu(0,0xFF,btoc(0xe00000),btoc(0xe00000),btoc(0x40),RW,0);


The other main problem (though less urgent) is that I am unable to
bring the serial line we uucp through any higher than 1200 baud.
Anything faster brings "overrun errors" (and seemingly more frequent
crashes).  There seems to be some correspondence between line length
and the speed we can run it at: the console works at 9600, but I can't
bring my office terminal (about 50 feet away) higher than 4800. I don't
see why this would matter (if anything, I would think a longer line
would slow the data rather than overwhelming the cpu), but it seems to.
The comments in the serial port code indicate that they knew there were
problems with it, but I also wonder if each call to ttyinput from the
interrupt routine (klint) may be taking too long to get back. Since our people
are transmitting some large image files, a faster uucp line would be really
nice.
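
To put rough numbers on the overrun question, here is a minimal sketch of
the time available per received character.  It assumes 10 bits per
character on the wire (1 start, 8 data, 1 stop; the actual line settings
are not given above), and the function names are made up for illustration.
It says nothing about how the real klint or ttyinput behave.

```c
#include <assert.h>
#include <stdio.h>

/* Microseconds between successive characters on an async serial line.
 * bits_per_char includes the start and stop bits, e.g. 10 for
 * 8 data bits, no parity, 1 stop bit (an assumed line setting). */
long char_time_us(long baud, int bits_per_char)
{
    assert(baud > 0);
    return (1000000L * bits_per_char) / baud;
}

/* Print the per-character budget for the speeds mentioned in the post. */
void print_budgets(void)
{
    static const long rates[] = { 1200, 4800, 9600 };
    int i;

    for (i = 0; i < 3; i++)
        printf("%5ld baud: %ld us per character\n",
               rates[i], char_time_us(rates[i], 10));
}
```

Under these assumptions a character arrives roughly every 8333
microseconds at 1200 baud, 2083 at 4800 and 1041 at 9600.  If the receiver
can hold only one character at a time, then any stretch longer than that
with the interrupt unserviced would lose a character at that speed.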

If anyone has any suggestions, fixes, or speculations about either problem
(especially the first), I would be glad to hear them.

				Thanks Much,

				Ted Nolan	..usceast!ted

PS: if anyone knows how to dump core to a file during a crash that would
be nice too.
-- 
-------------------------------------------------------------------------------
Ted Nolan                   ...decvax!mcnc!ncsu!ncrcae!usceast!ted  (UUCP)
6536 Brookside Circle       ...akgua!usceast!ted
Columbia, SC 29206          allegra!akgua!usceast!ted@UCB-VAX.ARPA (ARPA, maybe)

      ("Deep space is my dwelling place, the stars my destination")
-------------------------------------------------------------------------------