slevy@poincare.geom.umn.edu (Stuart Levy) (08/03/90)
We use a locally-written 3-d object viewer on our Irises (personal and GTX). For some aberrant objects, or possibly some xform matrices pushed on the stack, we find it causes the window server to crash -- with messages resembling "timeout: graphics FIFO still > 1/2 full" and/or "window server killed with signal 15". In extreme cases it can cause our GTX Iris to lock up such that we must reboot to recover the graphic display, though normally we're just kicked back to a login: prompt.

Does anyone know what kinds of geometric data can wedge the graphics subsystem this way? If we knew what to avoid we might be able to change our application to prevent crashes.

Stuart Levy, Geometry Group, University of Minnesota
slevy@geom.umn.edu
kurt@cashew.asd.sgi.com (Kurt Akeley) (08/03/90)
In article <1990Aug3.075057.11705@cs.umn.edu>, slevy@poincare.geom.umn.edu (Stuart Levy) writes:
|> We use a locally-written 3-d object viewer on our Irises (personal and GTX).
|> For some aberrant objects, or possibly some xform matrices pushed on the stack,
|> we find it causes the window server to crash -- with messages resembling
|> "timeout: graphics FIFO still > 1/2 full" and/or "window server killed with
|> signal 15". In extreme cases it can cause our GTX Iris to lock up such that
|> we must reboot to recover the graphic display, though normally we're just
|> kicked back to a login: prompt.
|>
|> Does anyone know what kinds of geometric data can wedge the graphics subsystem
|> this way? If we knew what to avoid we might be able to change our application
|> to prevent crashes.
|>
|> Stuart Levy, Geometry Group, University of Minnesota
|> slevy@geom.umn.edu

as you suggest, there may be some geometric data or transformation that causes the pipe to lock up. in my experience, however, it is more likely that you are calling GL routines in an order that is not supported, with the same result.

a common mistake is to include GL commands other than c(), color(), cpack(), lmbind(), lmcolor(), lmdef(), n(), RGBcolor(), t(), or v() between bgnpolygon() and endpolygon() calls (ditto for points, lines, closedlines, tmeshes, and qstrips). you might expect, for example, to be able to change the depthcue parameters within a line or polygon - sorry, not allowed (a new GL depthcue feature will correct this and other issues regarding the current depthcue).

of course the sequence theory makes more sense if the failure is associated with a viewing mode, rather than with particular data sets. if failures can be isolated to a subset of the viewing modes, it might be worthwhile to review the related code for unsupported GL sequences.

-- kurt
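One way to hunt for the unsupported sequences described above is to log each GL call by family name and flag anything outside the listed set while a primitive is open. The following is only a minimal sketch in plain C: `seq_checker`, `seq_begin`, `seq_end`, and `seq_note` are hypothetical helpers, not part of GL, and the allowed table is copied from the post rather than from SGI documentation. It does not call GL, so it builds without `gl.h`.

```c
#include <string.h>

/* Command families the post lists as legal between bgnpolygon() and
 * endpolygon(). Copied from the post; not an official SGI table. */
static const char *const allowed_in_primitive[] = {
    "c", "color", "cpack", "lmbind", "lmcolor", "lmdef",
    "n", "RGBcolor", "t", "v"
};

/* Hypothetical helper: 1 if `family` may be issued inside a bgn/end pair. */
static int allowed_inside_primitive(const char *family)
{
    size_t count = sizeof allowed_in_primitive / sizeof allowed_in_primitive[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(family, allowed_in_primitive[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical tracker: remembers whether a primitive is open and
 * counts calls that the post says are not supported there. */
struct seq_checker {
    int inside_primitive;
    int violations;
};

/* Call next to bgnpolygon(), bgnline(), bgntmesh(), and so on. */
static void seq_begin(struct seq_checker *s) { s->inside_primitive = 1; }

/* Call next to the matching end routine. */
static void seq_end(struct seq_checker *s) { s->inside_primitive = 0; }

/* Record one GL call by family name; returns 0 if it is suspect. */
static int seq_note(struct seq_checker *s, const char *family)
{
    if (s->inside_primitive && !allowed_inside_primitive(family)) {
        s->violations++;
        return 0;
    }
    return 1;
}
```

In use, an application would call `seq_note(&checker, "v")` next to each `v3f()` and so on, then inspect `violations` after drawing a frame in each viewing mode.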
blbates@AERO4.LARC.NASA.GOV ("Brent L. Bates AAD/TAB MS361 x42854") (08/04/90)
We had a similar problem with a demo Personal Iris. I was told it was a problem in the combination of the Personal Iris and the gl library. This problem is supposed to be fixed in 3.3 OS. In our case we weren't doing anything wrong or illegal (the programs were written for a 3130); it was just a problem in the system software.

--
Brent L. Bates
NASA-Langley Research Center
M.S. 361
Hampton, Virginia 23665-5225
(804) 864-2854
E-mail: blbates@aero4.larc.nasa.gov or blbates@aero2.larc.nasa.gov
jim@baroque.Stanford.EDU (James Helman) (08/04/90)
> it might be worthwhile to review the related code for unsupported GL
> sequences.

It would be more worthwhile for SGI to make GL safer to use, even at some expense in performance.

Having bad sequences lock up the pipe and bomb you back to login may have been acceptable back when IRISes had one window, i.e. the screen itself. But when an easily made error in the GL program you're debugging causes your entire "desktop" of networked windows, edits and remote jobs (including, of course, the window you were debugging in) to go south, it's just plain lousy. (Actually, when it happens I have a slightly stronger word for it.)

Count one vote for guardrails for the autobahn.

Jim Helman
Department of Applied Physics Durand 012
Stanford University FAX: (415) 725-3377
(jim@KAOS.stanford.edu) Voice: (415) 723-9127
drb@eecg.toronto.edu (David R. Blythe) (08/04/90)
In article <JIM.90Aug3152228@baroque.Stanford.EDU> jim@baroque.Stanford.EDU (James Helman) writes:
>
>> it might be worthwhile to review the related code for unsupported GL
>> sequences.
>
>It would be more worthwhile for SGI to make GL safer to use, even at
>some expense in performance.

I disagree. Having a separate checking version of the library (say the unshared version) would be better. Then once you have some confidence your code works you can link with the high performance version which doesn't do checking.

>
>Jim Helman
>Department of Applied Physics Durand 012
>Stanford University FAX: (415) 725-3377
>(jim@KAOS.stanford.edu) Voice: (415) 723-9127

-drb
drb@clsc.utoronto.ca
jim@baroque.Stanford.EDU (James Helman) (08/05/90)
>>It would be more worthwhile for SGI to make GL safer to use, even at
>>some expense in performance.

> I disagree. Having a separate checking version of the library (say the
> unshared version) would be better. Then once you have some confidence your
> code works you can link with the high performance version which doesn't
> do checking.

I can't object to giving the user the freedom to decide the safety/performance level. If you *know* your program has no bugs whatsoever, go for it. But I don't want it to be a choice between a no-checks-made "production" version GL which runs at full speed, and a test version that runs at half speed.

I would argue that safety features which slow performance down slightly (say less than 10%) should be included standard. From a marketing perspective, SGI outclasses other platforms by so much that 10% less performance would be well worth the improved reputation that would come from more robustness. And I'm sure a lot of it could be designed into the hardware with virtually no performance loss at all.

I've seen too many complex programs in which the "unsafe" sequence was not tickled until demo time. Presumably, the authors also had "some confidence" in their code before they put it on display. But when one sees a failure, which is indistinguishable from a hardware or system software problem, it doesn't matter who's to blame. It makes the machine look like an unstable platform. After a non-technical management type sees a machine apparently crash and burn during a demo, how much good will it do to explain:

... application bug ... bad sequence ... locked pipe ... may be bad hardware.... but it's 10% faster!

When a user program can easily and accidentally bring the entire windowing system down, in my book, it's a bug.

Speaking of which, SGI's X server is much improved in IRIX 3.3. Some bugs remain, but (knock on wood) I haven't experienced a single core dump. Hats off to SGI's X team. When's the next release?
Jim Helman
Department of Applied Physics Durand 012
Stanford University FAX: (415) 725-3377
(jim@KAOS.stanford.edu) Voice: (415) 723-9127
swed@aerospace.aero.org (Gregory D. Swedberg) (08/06/90)
I have run into exactly this problem when the wrong type is passed to a GL function. The window manager especially hates passing doubles to routines expecting floats. The same thing also happens if a float value is NaN.
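If the application keeps its geometry in doubles, one defensive habit is to convert explicitly to a float array and reject NaN or out-of-range values before the data ever reaches a float-taking GL routine. This is only a sketch under that assumption: `to_float3` is a hypothetical helper, not a GL function, and it relies on the C99 `isfinite` macro, which may not exist on older compilers.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: copy three doubles into a float array that is
 * safe to hand to a float-taking routine. Returns 1 on success, or 0
 * if any component is NaN, infinite, or too large for a float. */
static int to_float3(const double in[3], float out[3])
{
    int i;

    for (i = 0; i < 3; i++) {
        if (!isfinite(in[i]))
            return 0;                 /* NaN or +/- infinity */
        if (in[i] > FLT_MAX || in[i] < -FLT_MAX)
            return 0;                 /* would overflow a float */
        out[i] = (float)in[i];
    }
    return 1;
}
```

The caller would then pass `out`, never the original double array, to the float variant of the vertex or normal call, and skip or log the primitive when the helper returns 0.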
balaguer@disuns2.epfl.ch (Jean-Francis Balaguer) (08/29/90)
In article <1990Aug3.075057.11705@cs.umn.edu>, slevy@poincare.geom.umn.edu (Stuart Levy) writes:
> We use a locally-written 3-d object viewer on our Irises (personal and GTX).
> For some aberrant objects, or possibly some xform matrices pushed on the stack,
> we find it causes the window server to crash -- with messages resembling
> "timeout: graphics FIFO still > 1/2 full" and/or "window server killed with
> signal 15". In extreme cases it can cause our GTX Iris to lock up such that
> we must reboot to recover the graphic display, though normally we're just
> kicked back to a login: prompt.
>
> Does anyone know what kinds of geometric data can wedge the graphics subsystem
> this way? If we knew what to avoid we might be able to change our application
> to prevent crashes.
>
> Stuart Levy, Geometry Group, University of Minnesota
> slevy@geom.umn.edu

We had the same problem here, not specifically on the Personal Iris but on every kind of SGI machine. It came from an accumulation of wrong gl calls. The most dangerous one was n3f called with NaN coordinates.

We think SGI should provide a debug version of GL in which every inconsistent call to the library is trapped, as it took us more than 3 days to find the problem.

----------------------------------------------------------------------------
Francis Balaguer
Departement d'Informatique                  Tel : 41-21-6935244
Laboratoire d'Infographie                   FAX : 41-21-6933909
Ecole Polytechnique Federale de Lausanne
CH-1015 LAUSANNE
E-Mail : balaguer@ligsg2.epfl.CH
         balaguer@elma.epfl.CH
----------------------------------------------------------------------------
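Until such a debug library exists, an application can approximate the trap itself by routing normal and vertex calls through a checking wrapper. The sketch below is hypothetical throughout: `checked_vec3` and `CHECKED_N3F` are invented names, the `void f(float v[3])` shape is an assumption about how `n3f` is declared, and `stub_n3f` stands in for the real routine so the example builds without `gl.h`.

```c
#include <math.h>
#include <stdio.h>

/* Assumed shape of n3f()/v3f(): one pointer to three floats. */
typedef void (*vec3_fn)(float v[3]);

/* Stand-in for the real n3f so the sketch compiles without GL. */
static int forwarded;
static void stub_n3f(float v[3])
{
    (void)v;
    forwarded++;
}

/* Hypothetical wrapper: report the call site and refuse to forward
 * the call when any component is NaN. Returns 1 if forwarded. */
static int checked_vec3(vec3_fn real, const char *name, float v[3],
                        const char *file, int line)
{
    int i;

    for (i = 0; i < 3; i++) {
        if (isnan(v[i])) {
            fprintf(stderr, "%s:%d: %s called with NaN in component %d\n",
                    file, line, name, i);
            return 0;
        }
    }
    real(v);
    return 1;
}

/* Records the caller's file and line automatically. */
#define CHECKED_N3F(real, v) \
    checked_vec3((real), "n3f", (v), __FILE__, __LINE__)
```

Replacing `stub_n3f` with the real routine, and compiling the macro down to a direct call in production builds, would give roughly the checked/unchecked split discussed earlier in the thread.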