epstein@trwacs.UUCP (Jeremy Epstein) (05/15/91)
Environment: Sun 4/260 running SunOS 4.1.1 and OpenWindows 2.0

We have a *very* strange problem...one of our users runs "dclock" (I
believe patchlevel 4, but I'm not positive), and that user's server
crashes every day precisely at 1:00pm.  If the user kills dclock
before that time, then the system doesn't crash.  I've checked, and
there's no cron jobs or anything else running.

If I look at the logs in /usr/adm/messages*, some of the days don't
have any crash indications.  Those which do all look the same:

    BAD TRAP
    pid xxxx, `xnews': Data fault
    kernel read fault at addr=0x0, pme=0x70000000
    Bus Error Reg 80<INVALID>

and so on with registers, tracebacks, etc.

We haven't tried this on any machines other than one 4/260 (we're
planning to), so I suppose it's possible that it could be hardware
related.  I haven't yet tried to figure out exactly where it's
crashing, but something is really wrong if xnews (the OpenWindows
server) is causing the system to crash.

Has anyone else seen this problem (or perhaps other programs which
cause OpenWindows to crash SunOS)?  Can anyone reproduce the problem
(if anyone is willing to try :-) ).  Is there any known solution?

We could stop running dclock (that's no big deal), but we're concerned
that there's a lurking bug which could manifest itself in other ways
once the system is out in the field.

Thanks for any help...in the meantime we'll be working on analyzing
the traceback.

--Jeremy
-- 
Jeremy Epstein                  UUCP: uunet!trwacs!epstein
Trusted X Research Group        Internet: epstein@trwacs.fp.trw.com
TRW Systems Division            Voice: +1 703/876-8776
Fairfax Virginia
mouse@lightning.mcrcim.mcgill.EDU (der Mouse) (05/16/91)
> Environment: Sun 4/260 running SunOS 4.1.1 and OpenWindows 2.0

> We have a *very* strange problem...one of our users runs "dclock" (I
> believe patchlevel 4, but I'm not positive), and that user's server
> crashes every day precisely at 1:00pm.  If the user kills dclock
> before that time, then the system doesn't crash.  I've checked, and
> there's no cron jobs or anything else running.

> If I look at the logs in /usr/adm/messages*, some of the days don't
> have any crash indications.  Those which do all look the same:

> BAD TRAP
> pid xxxx, `xnews': Data fault
> kernel read fault at addr=0x0, pme=0x70000000
> Bus Error Reg 80<INVALID>

This looks singularly like something we observe here.  We have two
SPARCserver 470s running 4.1 (not 4.1.1...yet) and my mterm will, about
one time in ten, produce a strikingly similar panic on startup.
(Always just at startup.  If it survives the first second after the
window comes up, it'll survive fine...until the next mterm starts up.)

Stack traces from adb -k seem to imply that the crash is occurring
somewhere deep inside the scheduler, which, coupled with Sun's choice
to release a binary-only system, isn't much help.

Once, some process running emacs (we have at least two executables
called emacs, though I suspect that in this case it was GNU emacs)
produced a similarly inexplicable crash.

I, too, would be most interested in any solutions or rumors thereof.

                                        der Mouse

                        old: mcgill-vision!mouse
                        new: mouse@larry.mcrcim.mcgill.edu
pete@iris49.biosym.COM (Pete Ware) (05/18/91)
Your problem with dclock crashing your X/NeWS server at 1:00 is quite
interesting.  Mostly because I'm running a beta version of SGI's X
server and at exactly 1:00pm every day I get a white band displayed
across the screen.

Looks like I'm going to have to look at dclock instead of just blaming
it on SGI.

--pete
dwig@b11.ingr.com (David Wiggins) (05/23/91)
pete@iris49.biosym.COM (Pete Ware) writes:

>Your problem with dclock crashing your X/NeWS server at 1:00 is quite
>interesting.  Mostly because I'm running a beta version of SGI's X
>server and at exactly 1:00pm every day I get a white band displayed
>across the screen.

>Looks like I'm going to have to look at dclock instead of just blaming
>it on SGI.

Our server also had this problem.  dclock sends a CopyArea with a
negative width (or very large positive, depending on your
interpretation), which apparently confuses a lot of vendors' servers.

>--pete

David P. Wiggins
Intergraph
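
[Editor's illustration of the "negative width (or very large positive)"
remark above.  In the core X protocol, the width and height fields of a
CopyArea request are 16-bit unsigned values (CARD16), so a negative
width computed on the client side reaches the server as a number close
to 65535.  The sketch below only reproduces that arithmetic; it is not
taken from dclock's source, and the helper names wire_card16 and
clamp_extent are hypothetical.  The clamp is one possible client-side
guard, not necessarily the fix that was applied to dclock.]

```python
def wire_card16(value: int) -> int:
    """Return `value` as it would appear in a 16-bit unsigned (CARD16) field."""
    return value & 0xFFFF


def clamp_extent(value: int) -> int:
    """Hypothetical client-side guard: never request a negative extent."""
    return max(0, value)


if __name__ == "__main__":
    # Compare what the server would see with and without the guard.
    for width in (10, 0, -1, -5):
        unguarded = wire_card16(width)
        guarded = wire_card16(clamp_extent(width))
        print(f"client width {width:>3}: on the wire {unguarded:>5}, "
              f"with clamp {guarded:>5}")
```

[A server that trusts the width it receives would then try to copy a
region tens of thousands of pixels wide, which fits the "very large
positive" reading in the post above.]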