stevel@rtech.rtech.com (Steve Langley) (02/01/90)
From article <9001301444.AA01685@Larry.McRCIM.McGill.EDU>, by mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse):
> I've been getting consistent crashes from the MIT R4 sample server, and
> have tracked the problem far enough to know I'm out of my depth and
> need to call someone who knows the server better.
>
> Environment: Sun-3, server built with
>
>     CDEBUGFLAGS=-O
>     CC=gcc -DNOSTDHDRS -fstrength-reduce -fpcc-struct-return
>     CCOPTIONS=-m68881 -pipe
>
> % gcc -version
> gcc version 1.36.93
>
> Typical stack trace from adb:
>
> core file = core -- program ``Xsun''
> SIGSEGV 11: segmentation violation
> _checksept() + 34
> _EnqueueEvent() + 106
> _sunKbdProcessEvent() + 22c
> _ProcessInputEvents() + e6
> _Dispatch() + b8
> _main(0x3,0xefff9e8,0xefff9f8) + 356
>

I have been seeing exactly the same type of crash. I am running a
Sun 3/60, and the server was built using the standard Sun cc rather
than gcc. It has happened 3 times now, and under the same circumstances
you have detailed. (Event queue corrupted as you describe, happens while
changing window focus, etc.)

I had a vague notion that it might have something to do with xman;
every time it has crashed I have had xman running. When xman is not
up the server doesn't seem to die. This is a totally wild hunch, drawn
from too few datapoints, but it's all I can add to your analysis of the
situation.

Anybody at MIT want to comment? Any other Sun (or non-Sun) users that
have seen this?

+--------------------------------------------------------------------------+
| Steve Langley                  | Phone: (415)748-3658                    |
| Ingres Corporation             | Internet: stevel@ws58s.rtech.com        |
| P.O. Box 4008                  |                                         |
| 1080 Marina Village Parkway    |                                         |
| Alameda, California 94501      |                                         |
+--------------------------------------------------------------------------+
rws@EXPO.LCS.MIT.EDU (Bob Scheifler) (02/02/90)
> Anybody at MIT want to comment?

I'll make the obvious comment, that this kind of thing is very hard to
fix without a reasonably deterministic method to repeat it. If you want
to see it get fixed, you'll have to try and narrow down the search space.
mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse) (02/03/90)
>> I've been getting consistent crashes from the MIT R4 sample server,
>> and have tracked the problem far enough to know I'm out of my depth
>> and need to call someone who knows the server better.

> I have been seeing exactly the same type of crash. I am running a
> Sun 3/60, and the server was built using the standard Sun cc rather
> than gcc.

Someone else recommended I remove the -fstrength-reduce; I did and it
didn't help. Your note makes it nearly certain that this has nothing
to do with it. (Did you use -O? What release's cc?)

> I had a vague notion that it might have something to do with xman;
> every time it has crashed I have had xman running. When xman is not
> up the server doesn't seem to die.

Curious. I don't think I've run xman even once under R4. I suppose it
must be something that both xman and one of my clients do....

> Anybody at MIT want to comment?

And someone at MIT did comment, saying something about we'd have to
narrow down the search space.

Yes, thanks, that's what I wanted to do. I didn't really expect "we'll
drop whatever we're doing and get on it right away". But I have no
idea where to look. I don't know which pieces of the server are getting
exercised during the crash sequence; I was hoping someone could tell me,
which is why I described the sequence in the detail I did. I could
insert a call to my checking routine every third line throughout the
server, but that's extremely tedious and overly drastic. I was hoping
rather for something like "look at foo() and bar() in dix/foobar.c, or
anything in dix/blee.c".

I guess I'll have to wing it. Steve, I don't suppose you have Internet
access by any chance?

                                        der Mouse

                        old: mcgill-vision!mouse
                        new: mouse@larry.mcrcim.mcgill.edu
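
The idea of sprinkling calls to a checking routine through the server can be sketched roughly as follows. This is only an illustration, not the routine from the post and not the R4 server's actual event queue: `struct event_queue`, `queue_is_sane()`, `checkpoint()` and `CHECK_QUEUE()` are all hypothetical names, and the invariants checked are assumptions about a simple ring buffer. The point it tries to show is that if each checkpoint remembers the last place the queue was still sane, the first failure brackets the corruption between two source locations, so a few dozen well-placed calls may be enough to bisect rather than one every third line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define QUEUE_SIZE 8            /* hypothetical capacity */

/* Hypothetical stand-in for a server event queue: a plain ring buffer. */
struct event_queue {
    int    events[QUEUE_SIZE];
    size_t head;                /* index of the oldest entry */
    size_t count;               /* number of entries in use  */
};

/* Return 1 if the assumed invariants hold, 0 otherwise. */
static int queue_is_sane(const struct event_queue *q)
{
    return q != NULL && q->head < QUEUE_SIZE && q->count <= QUEUE_SIZE;
}

/* Location of the most recent checkpoint that passed. */
static const char *last_good_file = "(none)";
static int         last_good_line = 0;
static int         failures       = 0;

/* Check the queue; on failure report where it was last known good. */
static int checkpoint(const struct event_queue *q, const char *file, int line)
{
    if (queue_is_sane(q)) {
        last_good_file = file;
        last_good_line = line;
        return 1;
    }
    failures++;
    fprintf(stderr, "queue corrupt at %s:%d (last good at %s:%d)\n",
            file, line, last_good_file, last_good_line);
    return 0;
}

/* Drop this into suspect code paths; it records its own location. */
#define CHECK_QUEUE(q) checkpoint((q), __FILE__, __LINE__)

/* Append one event; returns 0 if the queue is full. */
static int queue_push(struct event_queue *q, int ev)
{
    assert(q != NULL);
    if (q->count == QUEUE_SIZE)
        return 0;
    q->events[(q->head + q->count) % QUEUE_SIZE] = ev;
    q->count++;
    return 1;
}
```

In use, one would place `CHECK_QUEUE(&q)` at the entry and exit of a handful of routines on the crashing path, then move the calls inward between the last good and first bad locations reported on stderr.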
stevel@rtech.rtech.com (Steve Langley) (02/05/90)
From article <9002030634.AA26328@Larry.McRCIM.McGill.EDU>, by mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse):
>>> I've been getting consistent crashes from the MIT R4 sample server,
>>> and have tracked the problem far enough to know I'm out of my depth
>>> and need to call someone who knows the server better.
>
>> I have been seeing exactly the same type of crash. I am running a
>> Sun 3/60, and the server was built using the standard Sun cc rather
>> than gcc.
>
> Someone else recommended I remove the -fstrength-reduce; I did and it
> didn't help. Your note makes it nearly certain that this has nothing
> to do with it. (Did you use -O? What release's cc?)
>

My server is compiled with -O using the standard cc from SunOS 4.0.1.

>> I had a vague notion that it might have something to do with xman;
>> every time it has crashed I have had xman running. When xman is not
>> up the server doesn't seem to die.
>
> Curious. I don't think I've run xman even once under R4. I suppose it
> must be something that both xman and one of my clients do....
>

For what it's worth (which is not much), since I replied to your
original post about a week ago I have not been running xman and have
not seen the crash again. As far as I can recall, that's about the only
thing that has changed in the set of Things I Run On A Regular Basis.
Since you are *not* running xman (and still see the problem, I presume)
that means- uhhh, I'm not sure *what* it means.

>> Anybody at MIT want to comment?
>
> And someone at MIT did comment, saying something about we'd have to
> narrow down the search space.
>
> Yes, thanks, that's what I wanted to do. I didn't really expect "we'll
> drop whatever we're doing and get on it right away".

Neither did I.

> I guess I'll have to wing it. Steve, I don't suppose you have Internet
> access by any chance?
>
> der Mouse
>
> old: mcgill-vision!mouse
> new: mouse@larry.mcrcim.mcgill.edu

I'm afraid not. And I'm as stumped as you are.
Since the problem seems to have gone away for now and I don't know how
to make it happen, I probably won't spend a lot of time looking at it.
But I'd be glad to talk about it and compare notes if you'd like to
give me a call.

+--------------------------------------------------------------------------+
| Steve Langley                  | Phone: (415)748-3658                    |
| Ingres Corporation             | Internet: stevel@ws58s.rtech.com        |
| P.O. Box 4008                  |                                         |
| 1080 Marina Village Parkway    |                                         |
| Alameda, California 94501      |                                         |
+--------------------------------------------------------------------------+