mike@BRL.MIL (Mike Muuss) (03/03/89)
My thanks to the folks from SGI who informed me that ps_open_PostScript() is not machine independent at present, and that the new release fixes this.

Since I am forced to abandon the portability topic until we get the fabled next release, I have moved on to some performance enhancements.

I have noticed that if I write the entire screen (on a GT) with a single call to lrectwrite(), it proceeds with blinding speed. If, however, I send the same amount of data using one call to lrectwrite() per scanline, it proceeds at the rate of about 1000 scanlines/second, which is very slow.

Running gr_osview during this time, I notice that total user time is about 3%, and system time averages 40-60% !!! I am calling lrectwrite() in a loop; there are no (intentional) system calls being made. What I suspect is that lrectwrite() may be doing something like a gsync() on every call, or perhaps notifying the window manager that this might be a good time for another process to have a chance to do some graphics, or some such.

I seek a way to stop this behavior. Can anybody help? (I find nothing about this in the online manuals. I know from writing SunView programs that acquiring a "window manager lock" is something that some systems permit. Is there a comparable SGI routine that will do as I wish?)

Suggestions to reformat the in-memory data into a form so that I can ALWAYS use a single lrectwrite() are not helpful. Sometimes I can do it all in one, sometimes not. It would cost 4 Mbytes of extra memory and a lot of data copies to shuffle things.

Thanks,
 -Mike
msc@ramoth.SGI.COM (Mark Callow) (03/04/89)
In article <8903030550.aa06677@SPARK.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
> I have noticed that if I write the entire screen (on a GT) with
> a single call to lrectwrite(), it proceeds with blinding speed.
> If, however, I send the same amount of data using one call to
> lrectwrite() per scanline, it proceeds at the rate of about 1000
> scanlines/second, which is very slow.
>
> Running gr_osview during this time, I notice that total user time
> is about 3%, and system time averages 40-60% !!! I am calling
> lrectwrite() in a loop, there are no (intentional) system calls
> being made. What I suspect is that lrectwrite() may be doing
> something like a gsync() on every call, or perhaps notifying
> the window manager that this might be a good time for another process
> to have a chance to do some graphics, or some such.

The system time is most likely due to the DMA setup. The pixels are transferred by DMA. On the GT, lrectwrite() chooses whether to use DMA or to push the pixels into the pipe, depending on the number of pixels being transferred. Pushing pixels is much slower than DMA, but obviously the DMA setup time is a concern. I don't know the exact changeover point.

No messages are sent to the window manager. No window or screen locks are acquired or freed.

--
-Mark
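A back-of-envelope reading of the figures quoted above, as a rough sketch only: at about 1000 lrectwrite() calls per second, each call has roughly 1 ms of wall-clock time, and a 40-60% system-time share would put 0.4 to 0.6 ms of that in the kernel. This assumes a single CPU and that all of the system time reported by gr_osview belongs to the lrectwrite() loop; neither is confirmed in the thread. The function names below are illustrative and are not part of any SGI library.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock milliseconds available per call at a given call rate. */
double per_call_ms(double calls_per_sec)
{
    assert(calls_per_sec > 0.0);
    return 1000.0 / calls_per_sec;
}

/*
 * Portion of the per-call budget spent in the kernel.
 * ASSUMPTION: sys_fraction (0..1) is the share of one CPU, and all of it
 * is caused by the calls being measured.
 */
double sys_ms_per_call(double calls_per_sec, double sys_fraction)
{
    assert(sys_fraction >= 0.0 && sys_fraction <= 1.0);
    return per_call_ms(calls_per_sec) * sys_fraction;
}

/* Prints the estimate for the figures reported in the thread. */
void print_overhead_estimate(void)
{
    const double rate = 1000.0;   /* scanlines (calls) per second */

    printf("per call: %.2f ms, system time: %.2f to %.2f ms\n",
           per_call_ms(rate),
           sys_ms_per_call(rate, 0.40),
           sys_ms_per_call(rate, 0.60));
}
```

If the estimate holds, a fixed setup cost of around half a millisecond per call would dominate any transfer as small as one scanline, which is consistent with the DMA-setup explanation.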
mike@BRL.MIL (Mike Muuss) (03/04/89)
Mark -

Thanks for your detailed and informative note.

From what you say, the 60% SYS time must be DMA setup, and the 40% IDLE time must be the actual DMA transmission time. I was seeing 1000 scanlines/second. If that translates to 1000 syscalls and interrupts per second, then I can understand the significant overhead that I was encountering.

I guess I would like the opportunity to vary the pipe_write / DMA crossover point in my application, to see if I can produce faster screen updates. The evaluation SGI used to set the threshold may not have taken the system overhead fully into account.

THE BIG PICTURE

Let me also take this opportunity to tell you what I need to do; perhaps you can suggest some different strategy that may achieve higher performance.

I have a shared memory segment that is organized as 1024 scanlines of 1280 pixels of 4 bytes each (SGI AlphaBGR format for lrectwrite). The arrangement of this data must be fixed, regardless of what sub-rectangle of it is presently of interest. If it would help any, I can change the internal organization any way I like, subject to the previous constraint.

When the application is using the full screen, this entire array is written with a single call to lrectwrite(), with delightfully good performance.

When the application is using a smaller window, it presently drops back to a loop which calls lrectwrite() once per scanline.
Here is the actual code fragment:

    /* Simplest case, nothing fancy */
    y = ybase;
    if( !sw_zoom && !sw_cmap )  {
        if( ifp->if_width == SGI(ifp)->mi_memwidth )  {
            /* This one is very fast */
            lrectwrite(
                SGI(ifp)->mi_xoff+0,
                SGI(ifp)->mi_yoff+y,
                SGI(ifp)->mi_xoff+0+ifp->if_width-1,
                SGI(ifp)->mi_yoff+y+nlines-1,
                &ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
                    sizeof(struct sgi_pixel)] );
            return;
        }
        for( n=nlines; n>0; n--, y++ )  {
            lrectwrite(
                SGI(ifp)->mi_xoff+0,
                SGI(ifp)->mi_yoff+y,
                SGI(ifp)->mi_xoff+0+ifp->if_width-1,
                SGI(ifp)->mi_yoff+y,
                &ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
                    sizeof(struct sgi_pixel)] );
            /* XXX big performance hit here.
             * GTX is limited to about 1000 lrectwrites/sec,
             * due to some library synchronization mechanism
             * that burns 60% of the CPU in sys-time. ?!?!
             */
        }
        return;
    }

So, what I really want to do is write a RECTANGLE from my buffer to a RECTANGLE on the screen, more in the style of rectcopy(). Does the 4D architecture offer me a way of doing this?

I can imagine several possibilities:

1) A subroutine, perhaps:

    lrectwriterect( x1, y1, x2, y2, pixel_p, mem_width, mem_skip )

which would use mem_width pixels, then skip mem_skip pixels, and repeat. This would be perfect.

2) A subroutine modeled on the Berkeley writev() call that would take an array of structures roughly like this (any reasonable layout is fine with me):

    struct fast_pixel_cmds {
        int xscr_base;
        int yscr_base;
        struct sgi_pixel *pixel_p;
        int count;
    } array[MAX_CMDS];

    fast_pixel_write_v( &array[0], cmd_count );

3) A "vector" version of lrectwrite() that looked something like this:

    struct lrectwrite_vector {
        int xscr_base, yscr_base;
        int xscr_max, yscr_max;
        struct sgi_pixel *pixel_p;
    } array[MAX_CMDS];

    lrectwrite_v( &array[0], cmd_count );

Any suggestion at all that you might have will be greatly appreciated!

Thanks,
 -Mike
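To pin down the addressing that proposal 1 asks for, here is a small portable sketch of the "use mem_width pixels, then skip mem_skip pixels, and repeat" walk. Everything in it is hypothetical: rectwrite_strided(), the row_writer callback type, the mock screen, and the field order of struct sgi_pixel are assumptions made for illustration and are not SGI routines. It still emits one row at a time through the callback, so it does not remove the per-call cost by itself; a real version would presumably have to live inside the library, where the transfer could be set up once for the whole rectangle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: a 4-byte pixel; the field order here is only illustrative. */
struct sgi_pixel { unsigned char alpha, blue, green, red; };

/* Hypothetical per-row output hook; with GL it could wrap lrectwrite(). */
typedef void (*row_writer)(int x1, int y, int x2,
                           const struct sgi_pixel *row, void *ctx);

/*
 * Walks a sub-rectangle stored inside a wider memory image: each screen
 * row takes mem_width pixels, then mem_skip pixels are skipped.
 * Returns the number of rows emitted, or -1 on inconsistent arguments.
 */
int rectwrite_strided(int x1, int y1, int x2, int y2,
                      const struct sgi_pixel *pixel_p,
                      int mem_width, int mem_skip,
                      row_writer emit, void *ctx)
{
    size_t stride;
    int y, rows = 0;

    if (mem_width <= 0 || mem_skip < 0 || y2 < y1 ||
        x2 - x1 + 1 != mem_width)
        return -1;

    stride = (size_t)mem_width + (size_t)mem_skip;
    for (y = y1; y <= y2; y++, rows++)
        emit(x1, y, x2, pixel_p + (size_t)rows * stride, ctx);
    return rows;
}

/* A tiny in-memory "screen" so the walk can be checked without GL. */
#define MOCK_W 8
#define MOCK_H 8
struct mock_screen { struct sgi_pixel pix[MOCK_H][MOCK_W]; };

/* Copies one row of pixels into the mock screen. */
void mock_row_writer(int x1, int y, int x2,
                     const struct sgi_pixel *row, void *ctx)
{
    struct mock_screen *scr = (struct mock_screen *)ctx;

    assert(x1 >= 0 && x1 <= x2 && x2 < MOCK_W && y >= 0 && y < MOCK_H);
    memcpy(&scr->pix[y][x1], row,
           (size_t)(x2 - x1 + 1) * sizeof(struct sgi_pixel));
}
```

For the buffer described above, a window of width w starting at the left edge of the memory image would use mem_width = w and mem_skip = 1280 - w, with pixel_p pointing at the first pixel of the first scanline to be shown.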