prl@iis.UUCP (Peter Lamb) (06/02/89)
Shortly after Claus Gittinger published his Xbench program and posted a
collection of xstones ratings, he was criticised for not providing enough
information about the compiler/optimisation flags for the servers he
benchmarked.

In order to fill in the gap a little, here are some timings done on a
Sun 3/60M, diskless, 8Mb memory, server and client on the same machine.
The server was compiled with SunOS4.0 cc -O, except for cfb and mfb, which
were compiled with the options shown with each set of results.

There are benchmarks for both the vanilla MIT X11R3 (patchlevel 9) server
and the same server with the Purdue2 B/W speedups applied. They are sorted
on increasing xstones. The timings are best-of-3 runs for each benchmarked
operation (Xbench default) and a timegoal of 10 sec.

SunOS4.0 cc -O; Vanilla MIT
TOTAL   17849 lineStones
TOTAL   14780 fillStones
TOTAL   15850 blitStones
TOTAL   27915 arcStones
TOTAL   14917 textStones
TOTAL   16013 complexStones
TOTAL   15971 xStones

SunOS4.0 cc -O; Purdue2 speedups
TOTAL   18841 lineStones
TOTAL   13950 fillStones
TOTAL   22691 blitStones
TOTAL   27983 arcStones
TOTAL   15391 textStones
TOTAL   16143 complexStones
TOTAL   17037 xStones

SunOS4.0 cc -O4; Purdue2 speedups
TOTAL   21984 lineStones
TOTAL   16575 fillStones
TOTAL   19704 blitStones
TOTAL   28059 arcStones
TOTAL   21727 textStones
TOTAL   17124 complexStones
TOTAL   19872 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Vanilla MIT
TOTAL   21483 lineStones
TOTAL   19146 fillStones
TOTAL   23191 blitStones
TOTAL   28111 arcStones
TOTAL   19457 textStones
TOTAL   17464 complexStones
TOTAL   20349 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Purdue2 speedups
TOTAL   23036 lineStones
TOTAL   17692 fillStones
TOTAL   25457 blitStones
TOTAL   28039 arcStones
TOTAL   20106 textStones
TOTAL   17588 complexStones
TOTAL   20785 xStones

GNU gcc -O -fpcc-struct-return -fstrength-reduce; Purdue2 speedups, no asm()'s
TOTAL   23022 lineStones
TOTAL   17720 fillStones
TOTAL   24938 blitStones
TOTAL   28085 arcStones
TOTAL   20863 textStones
TOTAL   17588 complexStones
TOTAL   20955 xStones

Interestingly, there is almost no difference between the Purdue2 speedups
using the assembly language hacks (using the 68020 bfins and bfext
instructions for inserting bit fields) and the performance without them.

Not surprisingly, the biggest gain is if you have to use `cc -O'; here the
fact that the Purdue2 speedups make more sensible use of register variables
than the sample server gives you a big advantage.

One surprise is that Purdue2 seems to *slow down* fill: looking at the
detailed results, Purdue2 is about the same or slightly better in most of
the area fill tests, except plain fill.

size                        10       100      400

Purdue2
filled rectangles       7892.49  1642.91   252.12  rectangles/sec
tiled rectangles        5482.12   734.05   103.33  rectangles/sec
stippled rectangles     2050.97   231.11    53.82  rectangles/sec
invert rectangles       5936.20  1403.98   192.54  rectangles/sec

Vanilla MIT
filled rectangles       8170.75  2031.36   276.66  rectangles/sec
tiled rectangles        5503.75   736.69   103.43  rectangles/sec
stippled rectangles     2019.77   231.11    53.86  rectangles/sec
filled polygon           143.53                    fills/sec
invert rectangles       5870.02  1661.68   207.27  rectangles/sec

I suspect that this is due to the use of Duff's device causing Icache
misses. Will probably do something else on machines with different cache
sizes and/or replacement strategies.

--
Peter Lamb
uucp: uunet!mcvax!ethz!prl    eunet: prl@ethz.uucp    Tel: +411 256 5241
Integrated Systems Laboratory
ETH-Zentrum, 8092 Zurich
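
For readers unfamiliar with the Duff's device mentioned above, here is a
minimal sketch of the technique in C. It is NOT the Purdue2 code; the names
duff_fill and plain_fill and the unroll factor of 8 are assumptions made
purely for illustration. The point is only that the unrolled body is several
times larger than the simple loop, which is the kind of code growth that
could plausibly cost instruction cache hits on a small Icache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference version: one store per loop iteration. */
void plain_fill(uint32_t *dst, uint32_t value, size_t count)
{
    while (count-- > 0)
        *dst++ = value;
}

/*
 * Hypothetical Duff's device version: the loop body is unrolled 8 times
 * and the switch jumps into the middle of the first pass so that the
 * leftover (count % 8) stores are handled without a separate loop.
 */
void duff_fill(uint32_t *dst, uint32_t value, size_t count)
{
    size_t passes;

    assert(dst != NULL || count == 0);
    if (count == 0)
        return;

    passes = (count + 7) / 8;   /* number of trips through the do-while */
    switch (count % 8) {
    case 0: do { *dst++ = value;
    case 7:      *dst++ = value;
    case 6:      *dst++ = value;
    case 5:      *dst++ = value;
    case 4:      *dst++ = value;
    case 3:      *dst++ = value;
    case 2:      *dst++ = value;
    case 1:      *dst++ = value;
            } while (--passes > 0);
    }
}
```

Both functions write exactly count words, so calling either one on the same
buffer gives the same result; only the shape and size of the generated code
differ. Whether the larger body actually causes the fill slowdown measured
above is the suspicion stated in the post, not something this sketch shows.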