[comp.sys.sgi] lrectwrite & gsync?

mike@BRL.MIL (Mike Muuss) (03/03/89)

My thanks to the folks from SGI who informed me that ps_open_PostScript()
is not machine independent at present, and that the new release fixes
this.

Since I am forced to abandon the portability topic until we get the
fabled next release, I have moved on to some performance enhancements.

I have noticed that if I write the entire screen (on a GT) with
a single call to lrectwrite(), it proceeds with blinding speed.
If, however, I send the same amount of data using one call to
lrectwrite() per scanline, it proceeds at the rate of about 1000
scanlines/second, which is very slow.

Running gr_osview during this time, I notice that total user time
is about 3%, and system time averages 40-60% !!!  I am calling
lrectwrite() in a loop; there are no (intentional) system calls
being made.  What I suspect is that lrectwrite() may be doing
something like a gsync() on every call, or perhaps notifying
the window manager that this might be a good time for another process
to have a chance to do some graphics, or some such.

I seek a way to stop this behavior.  Can anybody help?

(I find nothing about this in the online manuals.  I know from writing
SunView programs that acquiring a "window manager lock" is something
that some systems permit.  Is there a comparable SGI routine that
will do as I wish?)

Suggestions to reformat the in-memory data into a form so that I can
ALWAYS use a single lrectwrite() are not helpful. Sometimes I can do
it all in one, sometimes not.  It would cost 4 Mbytes of extra memory
and a lot of data copies to shuffle things.

	Thanks,
	 -Mike

msc@ramoth.SGI.COM (Mark Callow) (03/04/89)

In article <8903030550.aa06677@SPARK.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
> I have noticed that if I write the entire screen (on a GT) with
> a single call to lrectwrite(), it proceeds with blinding speed.
> If, however, I send the same amount of data using one call to
> lrectwrite() per scanline, it proceeds at the rate of about 1000
> scanlines/second, which is very slow.
> 
> Running gr_osview during this time, I notice that total user time
> is about 3%, and system time averages 40-60% !!!  I am calling
> lrectwrite() in a loop; there are no (intentional) system calls
> being made.  What I suspect is that lrectwrite() may be doing
> something like a gsync() on every call, or perhaps notifying
> the window manager that this might be a good time for another process
> to have a chance to do some graphics, or some such.

The system time is most likely due to the DMA setup.  The pixels are
transferred by DMA.  On the GT, lrectwrite() chooses whether to use DMA or
push the pixels into the pipe depending on the number of pixels being
transferred.  Pushing pixels is much slower than DMA, but the DMA
setup time is obviously a concern.  I don't know the exact changeover point.

No messages are sent to the window manager.  No window or screen locks are
acquired or freed.
--
	-Mark

mike@BRL.MIL (Mike Muuss) (03/04/89)

Mark -

Thanks for your detailed and informative note.  From what you say,
the 60% SYS time must be DMA setup, and the 40% IDLE time must
be the actual DMA transmission time.  I was seeing 1000
scanlines/second.  If that translates to 1000 syscalls and interrupts per
second, then I can understand the significant overhead that I was
encountering.
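
As a rough check on those figures, the arithmetic can be sketched in C.
The inputs are the numbers quoted in this thread (1000 calls/second, 60%
system time, 1280 pixels of 4 bytes per scanline, 1024 scanlines).  The
function names are hypothetical, and the results are estimates, not
measurements.

```c
#include <assert.h>

/* System time charged to each call, in microseconds, given the
 * call rate and the percentage of one CPU spent in system time. */
long sys_usec_per_call(long calls_per_sec, long sys_percent)
{
	assert(calls_per_sec > 0);
	/* 1% of one second is 10000 microseconds. */
	return (sys_percent * 10000L) / calls_per_sec;
}

/* Pixel data moved per second when each call writes one scanline. */
long bytes_per_sec(long calls_per_sec, long pixels_per_line,
		   long bytes_per_pixel)
{
	return calls_per_sec * pixels_per_line * bytes_per_pixel;
}

/* Milliseconds to repaint nlines scanlines at one call per scanline. */
long repaint_msec(long nlines, long calls_per_sec)
{
	assert(calls_per_sec > 0);
	return (nlines * 1000L) / calls_per_sec;
}
```

With the thread's numbers, sys_usec_per_call(1000, 60) is 600,
bytes_per_sec(1000, 1280, 4) is 5120000, and repaint_msec(1024, 1000)
is 1024: about 0.6 ms of system time per scanline, and roughly one
second for a full-screen repaint done one scanline at a time.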

I guess I would like the opportunity to vary the pipe_write / DMA
crossover point in my application, to see if I can produce faster
screen updates.  The evaluation SGI used to set the threshold may not have
taken the system overhead fully into account.
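
To make the trade-off concrete, here is a minimal cost model in C.  It
assumes, without confirmation from SGI, that a DMA transfer costs a fixed
setup time plus a small per-pixel time, and that pushing pixels through
the pipe costs only a larger per-pixel time.  The function names and any
numbers fed to them are hypothetical; the real costs and the real
changeover point are not known from this thread.

```c
#include <assert.h>

/* Estimated time to push npixels through the pipe one at a time. */
double pipe_usec(double npixels, double pipe_usec_per_pixel)
{
	return npixels * pipe_usec_per_pixel;
}

/* Estimated time for one DMA transfer: fixed setup plus per-pixel cost. */
double dma_usec(double npixels, double setup_usec, double dma_usec_per_pixel)
{
	return setup_usec + npixels * dma_usec_per_pixel;
}

/* Pixel count at which both paths cost the same; above it, DMA wins. */
double crossover_pixels(double setup_usec, double pipe_usec_per_pixel,
			double dma_usec_per_pixel)
{
	/* The model only has a crossover if the pipe is slower per pixel. */
	assert(pipe_usec_per_pixel > dma_usec_per_pixel);
	return setup_usec / (pipe_usec_per_pixel - dma_usec_per_pixel);
}
```

In this model a larger setup cost moves the crossover toward larger
transfers, so a threshold chosen without counting the system overhead
could plausibly come out too low.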

THE BIG PICTURE

Let me also take this opportunity to tell you what I need to do;
perhaps you can suggest some different strategy that may achieve higher
performance.  I have a shared memory segment that is organized as
1024 scanlines of 1280 pixels of 4 bytes each (SGI AlphaBGR format
for lrectwrite).  The arrangement of this data must be fixed,
regardless of what sub-rectangle of it is presently of interest.
If it would help any, I can change the internal organization any
way I like, subject to the previous constraint.

When the application is using the full screen, then this entire array is
written with a single call to lrectwrite(), with delightfully good
performance.  When the application is using a smaller window, it
presently drops back to a loop which calls lrectwrite() once per
scanline.  Here is the actual code fragment:

	/* Simplest case, nothing fancy */
	y = ybase;
	if( !sw_zoom && !sw_cmap )  {
		if( ifp->if_width == SGI(ifp)->mi_memwidth )  {
			/* This one is very fast */
			lrectwrite(
				SGI(ifp)->mi_xoff+0,
				SGI(ifp)->mi_yoff+y,
				SGI(ifp)->mi_xoff+0+ifp->if_width-1,
				SGI(ifp)->mi_yoff+y+nlines-1,
				&ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
				    sizeof(struct sgi_pixel)] );
			return;
		}
		for( n=nlines; n>0; n--, y++ )  {
			lrectwrite(
				SGI(ifp)->mi_xoff+0,
				SGI(ifp)->mi_yoff+y,
				SGI(ifp)->mi_xoff+0+ifp->if_width-1,
				SGI(ifp)->mi_yoff+y,
				&ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
				    sizeof(struct sgi_pixel)] );
			/*  XXX big performance hit here.
			 *  GTX is limited to about 1000 lrectwrites/sec,
			 *  due to some library synchronization mechanism
			 *  that burns 60% of the CPU in sys-time. ?!?!
			 */
		}
		return;
	}

So, what I really want to do is write a RECTANGLE from my buffer
to a RECTANGLE on the screen, more in the style of rectcopy().
Does the 4D architecture offer me a way of doing this?
I can imagine several possibilities:

1)  a subroutine, perhaps: 
	lrectwriterect( x1,y1, x2, y2, pixel_p, mem_width, mem_skip )
which would use mem_width pixels, then skip mem_skip pixels, and repeat.
This would be perfect.

2)  A subroutine modeled on the Berkeley writev() call that would take
an array of structures roughly like this (any reasonable layout is fine
with me):
	struct fast_pixel_cmds {
		int		xscr_base;
		int		yscr_base;
		struct sgi_pixel *pixel_p;
		int		count;
	} array[MAX_CMDS];

	fast_pixel_write_v( &array[0], cmd_count );
		
3)  A "vector" version of lrectwrite() that looked something like this:
	struct lrectwrite_vector {
		int		xscr_base, yscr_base;
		int		xscr_max, yscr_max;
		struct sgi_pixel *pixel_p;
	} array[MAX_CMDS];

	lrectwrite_v( &array[0], cmd_count );
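
To pin down the semantics intended for the first option, here is a
reference sketch in C.  It is not an SGI routine: lrectwriterect_ref(),
row_writer, struct fake_screen and fake_write_row() are hypothetical
names, a 4-byte pixel is represented by uint32_t rather than the real
struct sgi_pixel, and the row writer is a callback so that the sketch
compiles without the GL library.  It still issues one write per scanline,
so it illustrates only the addressing, not the hoped-for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback with the same argument order as lrectwrite(), plus a context. */
typedef void (*row_writer)(int x1, int y1, int x2, int y2,
			   const uint32_t *pixels, void *ctx);

/* Write screen rectangle (x1,y1)-(x2,y2) from a wider memory buffer:
 * use mem_width pixels, skip mem_skip pixels, and repeat per scanline.
 * Returns the number of row writes issued. */
int lrectwriterect_ref(int x1, int y1, int x2, int y2,
		       const uint32_t *pixel_p, int mem_width, int mem_skip,
		       row_writer write_row, void *ctx)
{
	size_t stride;
	int y;

	assert(x2 >= x1 && y2 >= y1 && mem_skip >= 0);
	assert(mem_width == x2 - x1 + 1);
	stride = (size_t)mem_width + (size_t)mem_skip;
	for (y = y1; y <= y2; y++) {
		write_row(x1, y, x2, y, pixel_p, ctx);
		pixel_p += stride;	/* next scanline in memory */
	}
	return y2 - y1 + 1;
}

/* Tiny in-memory "screen" standing in for the frame buffer. */
#define FAKE_W 8
#define FAKE_H 8
struct fake_screen { uint32_t px[FAKE_H][FAKE_W]; };

/* Row writer that copies pixels into a struct fake_screen. */
void fake_write_row(int x1, int y1, int x2, int y2,
		    const uint32_t *pixels, void *ctx)
{
	struct fake_screen *s = ctx;
	int x, y;

	assert(x1 >= 0 && x2 < FAKE_W && y1 >= 0 && y2 < FAKE_H);
	for (y = y1; y <= y2; y++)
		for (x = x1; x <= x2; x++)
			s->px[y][x] = *pixels++;
}
```

With the real library, the callback would simply forward its first five
arguments to lrectwrite(); a native version could instead set up a
single transfer for the whole rectangle.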

Any suggestion at all that you might have will be greatly appreciated!
	Thanks,
	 -Mike