[comp.windows.x] R4 MIT server coredumps

stevel@rtech.rtech.com (Steve Langley) (02/01/90)

From article <9001301444.AA01685@Larry.McRCIM.McGill.EDU>, by mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse):
> I've been getting consistent crashes from the MIT R4 sample server, and
> have tracked the problem far enough to know I'm out of my depth and
> need to call someone who knows the server better.
> 
> Environment: Sun-3, server built with
> 
> CDEBUGFLAGS=-O
> CC=gcc -DNOSTDHDRS -fstrength-reduce -fpcc-struct-return
> CCOPTIONS=-m68881 -pipe
> 
> % gcc -version
> gcc version 1.36.93
> 
> Typical stack trace from adb:
> 
> core file = core -- program ``Xsun''
> SIGSEGV	11: segmentation violation
> _checksept() + 34
> _EnqueueEvent() + 106
> _sunKbdProcessEvent() + 22c
> _ProcessInputEvents() + e6
> _Dispatch() + b8
> _main(0x3,0xefff9e8,0xefff9f8) + 356
> 


I have been seeing exactly the same type of crash. I am running a
Sun 3/60, and the server was built using the standard Sun cc rather than
gcc. It has happened 3 times now, and under the same circumstances you
have detailed. (Event queue corrupted as you describe, happens while changing
window focus, etc.)

I had a vague notion that it might have something to do with xman; every
time it has crashed I have had xman running. When xman is not up the server
doesn't seem to die. This is a totally wild hunch, drawn from too few
datapoints, but it's all I can add to your analysis of the situation.

Anybody at MIT want to comment? Any other Sun (or non-Sun) users that
have seen this?


+--------------------------------------------------------------------------+
| Steve Langley                        | Phone: (415)748-3658              |
| Ingres Corporation                   | Internet: stevel@ws58s.rtech.com  |
| P.O. Box 4008                        |                                   |
| 1080 Marina Village Parkway          |                                   |
| Alameda, California 94501            |                                   |
+--------------------------------------------------------------------------+

rws@EXPO.LCS.MIT.EDU (Bob Scheifler) (02/02/90)

    Anybody at MIT want to comment?

I'll make the obvious comment, that this kind of thing is very hard to
fix without a reasonably deterministic method to repeat it.  If you
want to see it get fixed, you'll have to try and narrow down the
search space.

mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse) (02/03/90)

>> I've been getting consistent crashes from the MIT R4 sample server,
>> and have tracked the problem far enough to know I'm out of my depth
>> and need to call someone who knows the server better.

> I have been seeing exactly the same type of crash.  I am running a
> Sun 3/60, and the server was built using the standard Sun cc rather
> than gcc.

Someone else recommended I remove the -fstrength-reduce; I did and it
didn't help.  Your note makes it nearly certain that this has nothing
to do with it.  (Did you use -O?  What release's cc?)

> I had a vague notion that it might have something to do with xman;
> every time it has crashed I have had xman running.  When xman is not
> up the server doesn't seem to die.

Curious.  I don't think I've run xman even once under R4.  I suppose it
must be something that both xman and one of my clients do....

> Anybody at MIT want to comment?

And someone at MIT did comment, saying something about we'd have to
narrow down the search space.

Yes, thanks, that's what I wanted to do.  I didn't really expect "we'll
drop whatever we're doing and get on it right away".  But I have no
idea where to look.  I don't know which pieces of the server are
getting exercised during the crash sequence; I was hoping someone could
tell me, which is why I described the sequence in the detail I did.  I
could insert a call to my checking routine every third line throughout
the server, but that's extremely tedious and overly drastic.  I was
hoping rather for something like "look at foo() and bar() in
dix/foobar.c, or anything in dix/blee.c".
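
[Illustration: a minimal sketch in C of the "call a checking routine in many places" approach described above. Everything here is hypothetical: struct event_queue, QUEUE_MAGIC, queue_check() and CHECK_QUEUE() are invented for the example and are not taken from the R4 server source or from the checking routine mentioned in this thread. The idea is only that each check records where it last passed, so the first failing check brackets the region where the corruption happened.]

```c
#include <assert.h>
#include <stdio.h>

#define QUEUE_SIZE  8          /* hypothetical queue capacity */
#define QUEUE_MAGIC 0x5eb7u    /* hypothetical guard value */

/* Hypothetical stand-in for an event queue; NOT the real server structure. */
struct event_queue {
    unsigned guard_lo;         /* must always equal QUEUE_MAGIC */
    int      events[QUEUE_SIZE];
    int      head;             /* index of the next event to read */
    int      tail;             /* index of the next free slot */
    unsigned guard_hi;         /* must always equal QUEUE_MAGIC */
};

/* Location of the most recent check that passed. */
static const char *last_ok_file = "(none)";
static int         last_ok_line = 0;

static void queue_init(struct event_queue *q)
{
    int i;

    assert(q != NULL);
    q->guard_lo = QUEUE_MAGIC;
    q->guard_hi = QUEUE_MAGIC;
    q->head = 0;
    q->tail = 0;
    for (i = 0; i < QUEUE_SIZE; i++)
        q->events[i] = 0;
}

/*
 * Return 1 if the queue looks sane, 0 otherwise.  On failure, report both
 * the failing call site and the last call site that passed, which narrows
 * the search to the code executed between the two.
 */
static int queue_check(const struct event_queue *q, const char *file, int line)
{
    int ok;

    assert(q != NULL);
    ok = q->guard_lo == QUEUE_MAGIC && q->guard_hi == QUEUE_MAGIC
      && q->head >= 0 && q->head < QUEUE_SIZE
      && q->tail >= 0 && q->tail < QUEUE_SIZE;

    if (ok) {
        last_ok_file = file;
        last_ok_line = line;
    } else {
        fprintf(stderr, "%s:%d: queue corrupt; last good check at %s:%d\n",
                file, line, last_ok_file, last_ok_line);
    }
    return ok;
}

/* Drop this into suspect code paths; it records the call site itself. */
#define CHECK_QUEUE(q) queue_check((q), __FILE__, __LINE__)
```

[Sprinkling CHECK_QUEUE(&q) at a handful of entry points first, then adding more between the last passing and first failing call sites, is a way to bisect the problem without touching every third line.]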

I guess I'll have to wing it.  Steve, I don't suppose you have Internet
access by any chance?

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

stevel@rtech.rtech.com (Steve Langley) (02/05/90)

From article <9002030634.AA26328@Larry.McRCIM.McGill.EDU>, by mouse@LARRY.MCRCIM.MCGILL.EDU (der Mouse):
>>> I've been getting consistent crashes from the MIT R4 sample server,
>>> and have tracked the problem far enough to know I'm out of my depth
>>> and need to call someone who knows the server better.
> 
>> I have been seeing exactly the same type of crash.  I am running a
>> Sun 3/60, and the server was built using the standard Sun cc rather
>> than gcc.
> 
> Someone else recommended I remove the -fstrength-reduce; I did and it
> didn't help.  Your note makes it nearly certain that this has nothing
> to do with it.  (Did you use -O?  What release's cc?)
> 

My server is compiled with -O using the standard cc from SunOS 4.0.1.

>> I had a vague notion that it might have something to do with xman;
>> every time it has crashed I have had xman running.  When xman is not
>> up the server doesn't seem to die.
> 
> Curious.  I don't think I've run xman even once under R4.  I suppose it
> must be something that both xman and one of my clients do....
> 

For what it's worth (which is not much), since I replied to your original
post about a week ago I have not been running xman and have not seen the
crash again. As far as I can recall that's about the only thing that has
changed in the set of Things I Run On A Regular Basis. Since you are
*not* running xman (and still see the problem, I presume) that means-
uhhh, I'm not sure *what* it means. 


>> Anybody at MIT want to comment?
> 
> And someone at MIT did comment, saying something about we'd have to
> narrow down the search space.
> 
> Yes, thanks, that's what I wanted to do.  I didn't really expect "we'll
> drop whatever we're doing and get on it right away". 

Neither did I.

> I guess I'll have to wing it.  Steve, I don't suppose you have Internet
> access by any chance?
> 
> 					der Mouse
> 
> 			old: mcgill-vision!mouse
> 			new: mouse@larry.mcrcim.mcgill.edu

I'm afraid not. And I'm as stumped as you are.  Since the problem seems to
have gone away for now and I don't know how to make it happen, I probably
won't spend a lot of time looking at it. But I'd be glad to talk about it
and compare notes if you'd like to give me a call.

