[comp.windows.x] XCopyArea pixmap to pixmap: why is it so slow ?

TRANLE@INTELLICORP.COM (Minh Tran-Le) (12/03/89)

I have done some performance testing of XCopyArea and I have found that on
most of the servers that we use (Vax GPX, hp, IBM Ps2) it is around
10 times slower to do an XCopyArea from pixmap to pixmap than to do an XCopyArea
from window to window.  The window and pixmap we were using were 500x500.
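
A minimal sketch of such a pixmap-to-pixmap timing test might look like the
following (illustrative only; the original benchmark was not posted, and the
iteration count and the use of XSync to wait for the server are assumptions):

```c
/* Hypothetical sketch of the timing test described above; not the
 * poster's actual benchmark.  Needs a running X server to do anything. */
#include <X11/Xlib.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    const int W = 500, H = 500, ITERS = 100;   /* area from the post; count assumed */

    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

    int scr = DefaultScreen(dpy);
    Window root = RootWindow(dpy, scr);
    unsigned depth = DefaultDepth(dpy, scr);

    Pixmap src = XCreatePixmap(dpy, root, W, H, depth);
    Pixmap dst = XCreatePixmap(dpy, root, W, H, depth);
    GC gc = XCreateGC(dpy, src, 0, NULL);

    clock_t t0 = clock();
    for (int i = 0; i < ITERS; i++)
        XCopyArea(dpy, src, dst, gc, 0, 0, W, H, 0, 0);
    XSync(dpy, False);                  /* wait for the server to finish */
    clock_t t1 = clock();

    printf("%d pixmap->pixmap copies: %.3f s\n",
           ITERS, (double)(t1 - t0) / CLOCKS_PER_SEC);

    XFreeGC(dpy, gc);
    XFreePixmap(dpy, src);
    XFreePixmap(dpy, dst);
    XCloseDisplay(dpy);
    return 0;
}
```

Substituting two windows for the two pixmaps (and raising them first, so the
copies are not clipped) gives the window-to-window half of the comparison.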

Does anybody know if it was a design decision not to optimize copyarea for
pixmaps or is it a general hardware limitation ?  So far I have found that only
the DecStation (mips) has a copyarea function that is as fast on pixmaps as
on windows.

Thanks, Minh Tran-Le.

Arpanet: tranle@intellicorp.com
Uucp:    ..sun!icmv!mtranle
-------

keith@EXPO.LCS.MIT.EDU (Keith Packard) (12/03/89)

> I have done some performance testing of XCopyArea and I have found that on
> most of the servers that we use (Vax GPX, hp, IBM Ps2) it is around
> 10 times slower to do an XCopyArea from pixmap to pixmap than to do an XCopyArea
> from window to window.

In each of the given examples, the display memory is controlled by special
"graphics decelerators", while off-screen memory is controlled by the cpu
alone.  Because most benchmarking programs only measure performance for
on-screen graphics, vendors typically optimize the code/hardware which draws
there, and leave the off-screen rendering to some first-year engineer.

This causes a large discrepancy between on-screen and off-screen graphics
performance, which can easily be rectified by writing more intelligent
off-screen graphics code.

>                                            So far I have found that only
> the DecStation (mips) has a copyarea function that is as fast on pixmaps as
> on windows.

This is because on-screen rendering is the same as off-screen rendering; the
cpu does all of the work in both cases and simply writes to either main memory
or the memory mapped frame buffer.  This means that tuning on-screen
performance effectively tunes the off-screen cases as well.

A more substantial advantage of this latter method is the performance you
will get in moving bits between the screen and off-screen pixmaps.  The
performance you see for either on-screen/on-screen or off-screen/off-screen
bitBlt will be the same as on-screen/off-screen.  And all with only one
copy of the bitblt code.

The disadvantage that this typically has is that special graphics hardware
is frequently connected to a display memory system that provides additional
bandwidth (interleaved memory, page-mode access, or wider-than-32-bit access).
This special memory system allows the on-screen/on-screen case to work faster
than would otherwise be possible.  Note that the magic graphics hardware
has little effect on this; the typical CPU can easily copy bits around
as fast as the memory system can take them; it just doesn't usually have
a fast enough path to the display memory.

Keith Packard
MIT X Consortium

harry@hpcvxhp.cv.hp.COM (Harry Phinney) (12/06/89)

> keith@expo.lcs.mit.edu (Keith Packard)
> Because most benchmarking programs only measure performance for
> on-screen graphics, vendors typically optimize the code/hardware which draws
> there, and leave the off-screen rendering to some first-year engineer.

Hmm, I always thought we optimized the on-screen rendering because it
had the greatest impact on the usability of the system.  Our current
releases will generally render to pixmaps as fast as to the screen,
except in the many cases where the display card hardware provides
assistance (e.g.  rops, blits).


> The disadvantage that this typically has is that special graphics hardware
> is frequently connected to a display memory system that provides additional
> bandwidth (interleaved memory, page-mode access, or wider-than-32-bit access).

One other advantage to a hardware approach is the parallelism that can
be gained by allowing the on-card hardware to run while the CPU starts
dispatching the next request.

Harry Phinney  harry@hp-pcd.cv.hp.com

klein@lupine.UUCP (12/08/89)

> keith@expo.lcs.mit.edu (Keith Packard)
> Because most benchmarking programs only measure performance for
> on-screen graphics, vendors typically optimize the code/hardware which draws
> there, and leave the off-screen rendering to some first-year engineer.

  You respond:
   Hmm, I always thought we optimized the on-screen rendering because it
   had the greatest impact on the usability of the system. 

From what I have seen over the past year here, you gain at least as
much, if not more, "usability" by looking at other parts of the code.
For example, bring up an xmh window with lots of folders (i.e., lots of
"buttons"), then resize it, move an opaque window over it, or do just about
anything to make it "re-calculate". Beyond a reasonable on-screen
performance level, you can make this *lots* faster by improving other
parts of the server!

Doug Klein
NCD
klein@ncd.com