prl@iis.UUCP (Peter Lamb) (06/02/89)
Shortly after Claus Gittinger published his Xbench program and posted
a collection of xstones ratings, he was criticised for not providing
enough information about the compiler/optimisation flags for the servers
he benchmarked. To fill in the gap a little, here are some timings
done on a Sun 3/60M (diskless, 8 MB memory), with server and client
on the same machine.
The server was compiled with SunOS4.0 cc -O, except for cfb and mfb,
which were compiled with the options shown with each set of results.
There are benchmarks for both the vanilla MIT X11R3 (patchlevel 9)
server and the same server with the Purdue2 B/W speedups applied.
The result sets below are sorted by increasing xStones.
The timings are the best of 3 runs for each benchmarked operation (the
Xbench default), with a timegoal of 10 sec.

SunOS4.0 cc -O; Vanilla MIT
    TOTAL   17849 lineStones
    TOTAL   14780 fillStones
    TOTAL   15850 blitStones
    TOTAL   27915 arcStones
    TOTAL   14917 textStones
    TOTAL   16013 complexStones
    TOTAL   15971 xStones

SunOS4.0 cc -O; Purdue2 speedups
    TOTAL   18841 lineStones
    TOTAL   13950 fillStones
    TOTAL   22691 blitStones
    TOTAL   27983 arcStones
    TOTAL   15391 textStones
    TOTAL   16143 complexStones
    TOTAL   17037 xStones

SunOS4.0 cc -O4; Purdue2 speedups
    TOTAL   21984 lineStones
    TOTAL   16575 fillStones
    TOTAL   19704 blitStones
    TOTAL   28059 arcStones
    TOTAL   21727 textStones
    TOTAL   17124 complexStones
    TOTAL   19872 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Vanilla MIT
    TOTAL   21483 lineStones
    TOTAL   19146 fillStones
    TOTAL   23191 blitStones
    TOTAL   28111 arcStones
    TOTAL   19457 textStones
    TOTAL   17464 complexStones
    TOTAL   20349 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Purdue2 speedups
    TOTAL   23036 lineStones
    TOTAL   17692 fillStones
    TOTAL   25457 blitStones
    TOTAL   28039 arcStones
    TOTAL   20106 textStones
    TOTAL   17588 complexStones
    TOTAL   20785 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Purdue2 speedups, no asm()'s
    TOTAL   23022 lineStones
    TOTAL   17720 fillStones
    TOTAL   24938 blitStones
    TOTAL   28085 arcStones
    TOTAL   20863 textStones
    TOTAL   17588 complexStones
    TOTAL   20955 xStones

Interestingly, there is almost no difference in performance between the
Purdue2 speedups built with the assembly language hacks (which use the
68020 bit-field instructions, bfins and bfextu/bfexts, for inserting and
extracting bit fields) and the same speedups built without them.
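
For readers who have not met the 68020 bit-field instructions, here is a
rough portable-C sketch of the kind of shift-and-mask work they replace.
This is not the Purdue code; the function names are made up for
illustration, it only handles a field inside one 32-bit word, and it
counts the offset from the most significant bit, as the 68020 does.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set; width must be 1..32. */
static uint32_t low_mask(unsigned width)
{
    return width == 32 ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
}

/* Hypothetical helper: extract `width` bits starting `offset` bits from
 * the most significant bit of `word`, zero-extended (roughly bfextu). */
uint32_t bitfield_extract(uint32_t word, unsigned offset, unsigned width)
{
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    unsigned shift = 32 - offset - width;
    return (word >> shift) & low_mask(width);
}

/* Hypothetical helper: return `word` with the same field replaced by the
 * low `width` bits of `value` (roughly bfins). */
uint32_t bitfield_insert(uint32_t word, unsigned offset, unsigned width,
                         uint32_t value)
{
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    unsigned shift = 32 - offset - width;
    uint32_t mask = low_mask(width) << shift;
    return (word & ~mask) | ((value << shift) & mask);
}
```

For example, bitfield_extract(0x12345678, 8, 8) picks out the second
byte from the top, 0x34. A compiler that turns these shifts and masks
into decent code leaves little for hand-written asm() to win, which
would be consistent with the two gcc result sets being so close.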
Not surprisingly, the biggest gain is if you have to use `cc -O';
here the fact that the Purdue2 speedups make more sensible use of
register variables than the sample server gives you a big advantage.
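
To put numbers on that, the relative gain can be worked out from the
xStones totals above. The helper below is just the usual percentage
formula; its name is mine, not anything from Xbench.

```c
#include <assert.h>

/* Percentage change going from `before` to `after`. */
double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}
```

With the totals posted here, percent_gain(15971, 17037) comes to about
6.7% for cc -O, against about 2.1% for percent_gain(20349, 20785) under
gcc.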
One surprise is that Purdue2 seems to *slow down* fill: looking
at the detailed results, Purdue2 is about the same or slightly better
in most of the area fill tests, but clearly slower on plain fill at
every size and on invert at the two larger sizes.

size                         10       100       400
Purdue2
  filled rectangles     7892.49   1642.91    252.12  rectangles/sec
  tiled rectangles      5482.12    734.05    103.33  rectangles/sec
  stippled rectangles   2050.97    231.11     53.82  rectangles/sec
  invert rectangles     5936.20   1403.98    192.54  rectangles/sec
Vanilla MIT
  filled rectangles     8170.75   2031.36    276.66  rectangles/sec
  tiled rectangles      5503.75    736.69    103.43  rectangles/sec
  stippled rectangles   2019.77    231.11     53.86  rectangles/sec
  filled polygon         143.53  fills/sec
  invert rectangles     5870.02   1661.68    207.27  rectangles/sec

I suspect that this is due to the use of Duff's device causing
Icache misses (the 68020's on-chip instruction cache is only 256 bytes).
The speedups will probably behave differently on machines with
different cache sizes and/or replacement strategies.
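
For anyone who has not seen Duff's device, here is a minimal sketch of
the idea applied to a word fill. It is not taken from the Purdue
sources; the function names and the 8-way unrolling are my own choices
for illustration. The switch jumps into the middle of the unrolled loop
body to dispose of the leftover count % 8 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain loop: tiny code footprint, one branch per word stored. */
void fill_plain(uint32_t *dst, size_t count, uint32_t pattern)
{
    assert(dst != NULL || count == 0);
    while (count-- > 0)
        *dst++ = pattern;
}

/* Duff's device: 8 stores per loop branch, at the cost of a larger
 * loop body. Illustrative only. */
void fill_duff(uint32_t *dst, size_t count, uint32_t pattern)
{
    assert(dst != NULL || count == 0);
    if (count == 0)
        return;
    size_t passes = (count + 7) / 8;   /* trips through the do-while */
    switch (count % 8) {
    case 0: do { *dst++ = pattern;
    case 7:      *dst++ = pattern;
    case 6:      *dst++ = pattern;
    case 5:      *dst++ = pattern;
    case 4:      *dst++ = pattern;
    case 3:      *dst++ = pattern;
    case 2:      *dst++ = pattern;
    case 1:      *dst++ = pattern;
            } while (--passes > 0);
    }
}
```

Both routines store the same words; the unrolled one saves branches but
takes more instruction space, and that is the trade-off I suspect goes
the wrong way on a very small instruction cache.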
--
Peter Lamb
uucp: uunet!mcvax!ethz!prl eunet: prl@ethz.uucp Tel: +411 256 5241
Integrated Systems Laboratory
ETH-Zentrum, 8092 Zurich