[comp.unix.i386] Simple X windows benchmark

richard@pegasus.com (Richard Foulk) (07/28/90)

I'm trying to assess the performance of the various video boards and X
servers.  Here's a very simple benchmark that zeros in on one very important
aspect -- scrolling.  I'd appreciate it if those running X could try this
out and report their results.

To run the benchmark start up an xterm like this:

	xterm -geometry 80x24 -fn 8x13 +j &

then put the following awk script in a file called x-test:
-----------------------------cut here----------------------------
BEGIN {
	for (i = 0; i < 1000; i++) {
		printf("xxxxxxxxxxxxx %d\n", i);
	}
	exit;
}
-----------------------------cut here----------------------------
Run it, from the above mentioned xterm, like this:

	time awk -f x-test

and report the real time results.

One data point:

	Unix:		ISC 2.0.2 Unix
	X server:	ISC X11R3 1.0.0, Xhrc
	Resolution:	720x348x2
	Video card:	Hercules monochrome graphics clone
	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
	80387:		no

	Time:		1:43


If you're running X on a 386 or 486 box please try this and post your results.

Thanks.


-- 
Richard Foulk		richard@pegasus.com

caf@omen.UUCP (WA7KGX) (07/29/90)

:One data point:
:
:	Unix:		ISC 2.0.2 Unix
:	X server:	ISC X11R3 1.0.0, Xhrc
:	Resolution:	720x348x2
:	Video card:	Hercules monochrome graphics clone
:	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
:	80387:		no
:
:	Time:		1:43
:
	Unix: SCO 3.2 / ODT
	Cpu:		33 MHz Micronics, 64k cache, 16 MB, no 387

	Time:		$33 Herc clone:	1:36
			Microfield T8:	0:50

misko@abhg.UUCP (William Miskovetz) (07/30/90)

In article <48@omen.UUCP>, caf@omen.UUCP (WA7KGX) writes:
> :One data point:
> :
> :	Unix:		ISC 2.0.2 Unix
> :	X server:	ISC X11R3 1.0.0, Xhrc
> :	Resolution:	720x348x2
> :	Video card:	Hercules monochrome graphics clone
> :	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
> :	80387:		no
> :
> :	Time:		1:43
> :
> 	Unix: SCO 3.2 / ODT
> 	Cpu:		33 MHz Micronics, 64k cache, 16 MB, no 387
> 
> 	Time:		$33 Herc clone:	1:36
> 			Microfield T8:	0:50


	UNIX:		ISC 2.2
	X server:	ISC X11 V 1.1, Xgp
	Resolution:	1024x768x256
	Video card:	Paradise 8514/A
	CPU:		20 MHz Compaq 386, 9MB RAM
	80387:		Yes

	Time:		0:29


Bill Miskovetz
{uunet!lll-winken, apple!mathworks}!abhg!misko
misko@mathworks.com
abhg!misko@lll-winken.llnl.gov

rick@pcrat.uucp (Rick Richardson) (07/30/90)

In article <1990Jul28.014025.17578@pegasus.com> richard@pegasus.com (Richard Foulk) writes:
>One data point:
>
>	Unix:		ISC 2.0.2 Unix
>	X server:	ISC X11R3 1.0.0, Xhrc
>	Resolution:	720x348x2
>	Video card:	Hercules monochrome graphics clone
>	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
>	80387:		no
>
>	Time:		1:43

Here's another point, for the slowest 386 we've got:

	Unix:		ISC 2.0.2 Unix
	X server:	ISC X11R3 1.0.0, Xvga
	Resolution:	640x480x16
	Video card:	Paradise EGA-480
	Cpu:		16MHz 386 (Mylex), 64k cache, 4MB 32 bit + 4MB 16 bit
	80387:		no
	80287:		yes

	Time:		1:39

This is rather strange. The elapsed times are so close,
even though there is at least a factor of two difference
in the raw performance of the machines.

I'd guess one or both of the following are true:

	1) The herc server has had zero tuning
	2) The test saturates the bandwidth to the herc card

-Rick

-- 
Rick Richardson | Looking for FAX software for UNIX/386 ??? Ask About: |Mention
PC Research,Inc.| FaxiX - UNIX Facsimile System (tm)                   |FAX# for
uunet!pcrat!rick| FaxJet - HP LJ PCL to FAX (Send WP,Word,Pagemaker...)|Sample
(201) 389-8963  | JetRoff - troff postprocessor for HP LaserJet and FAX|Output

paul@dialogic.uucp (The Imaginative Moron aka Joey Pheromone) (07/31/90)

	Unix:		ISC 2.0.2 Unix
	X server:	ISC X11R3 1.1, Xwge
	Resolution:	1600x1024x2
	Video card:	Bell Tech Blit (Workstation Graphics Engine)
	Cpu:		25MHz 386 AST Premium 386/25, no cache, 8megs RAM
	80387:		no

	Time:		29.4 seconds
--
Paul Bennett	      |  			| "I give in, to sin, because
Dialogic Corp.	      |   paul@dialogic.UUCP	|  You have to make this life
300 Littleton Road    | ..!uunet!dialogic!paul	|  livable"
Parsippany, NJ 07054  |	 			|  Martin Gore

talvola@janus.Berkeley.EDU (Erik Talvola) (07/31/90)

In article <243@abhg.UUCP> misko@abhg.UUCP (William Miskovetz) writes:

>   In article <48@omen.UUCP>, caf@omen.UUCP (WA7KGX) writes:
>   > :One data point:
>   > :
>   > :	Unix:		ISC 2.0.2 Unix
>   > :	X server:	ISC X11R3 1.0.0, Xhrc
>   > :	Resolution:	720x348x2
>   > :	Video card:	Hercules monochrome graphics clone
>   > :	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
>   > :	80387:		no
>   > :
>   > :	Time:		1:43
>   > :
>   > 	Unix: SCO 3.2 / ODT
>   > 	Cpu:		33 MHz Micronics, 64k cache, 16 MB, no 387
>   > 
>   > 	Time:		$33 Herc clone:	1:36
>   > 			Microfield T8:	0:50
>
>
>	   UNIX:		ISC 2.2
>	   X server:	ISC X11 V 1.1, Xgp
>	   Resolution:	1024x768x256
>	   Video card:	Paradise 8514/A
>	   CPU:		20 MHz Compaq 386, 9MB RAM
>	   80387:		Yes
>
>	   Time:		0:29

Just for comparisons:

UNIX: 		SunOS 3.5
X server:	MIT X11R4
Resolution:	1152x900x2 (black and white)
CPU:		Sun 3/50 (16 MHz 68020 w/ 68881)

Time:		0:31

Looks like the Hercules is just overly slow - probably because nobody
has done any work on it.  The Sun should be significantly slower than
the 33 MHz 386 machines - no graphics hardware in a Sun 3/50 either.




--
+----------------------------+
! Erik Talvola               | "It's just what we need... a colossal negative 
! talvola@janus.berkeley.edu | space wedgie of great power coming right at us
! ...!ucbvax!janus!talvola   | at warp speed." -- Star Drek

johnl@esegue.segue.boston.ma.us (John R. Levine) (07/31/90)

In article <1990Jul30.020330.6291@pcrat.uucp> rick@pcrat.UUCP (Rick Richardson) writes:
>>	Cpu:		33MHz 386 (Mylex), 128k cache, 8megs RAM
>>	80387:		no
>>	Time:		1:43

>	Cpu:		16MHz 386 (Mylex), 64k cache, 4MB 32 bit + 4MB 16 bit
>	80287:		yes
>	Time:		1:39
>
>This is rather strange. The elapsed times are so close, ...

The X11R3 server does a lot of floating point arithmetic, so the 287 makes
a lot of difference.  I gather that the X11R4 sample server has been rewritten
to do as much as possible in integer.  It would be interesting to see some
relative figures on the R4 server.

-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Marlon Brando and Doris Day were born on the same day.

scottw@ico.isc.com (Scott Wiesner) (07/31/90)

>>   > :	X server:	ISC X11R3 1.0.0, Xhrc
> 
> Looks like the Hercules is just overly slow - probably because nobody
> has done any work on it.  The Sun should be significantly slower than
> the 33 MHz 386 machines - no graphics hardware in a Sun 3/50 either.

While it's true ISC's Hercules server probably hasn't gotten the
attention the VGA server has, I doubt it could be made too much faster.
There were some improvements made to text output since the 1.0 release,
but scrolling won't change much.  The basic problem here is that we're
dealing with a slow 8 bit DRAM device running over a slow bus.  I have 
heard that on a VGA with a 16 MHz 386, you're stuck with 20 or more
wait states.  I've found that accessing video memory can take more than
6 times as long as accessing normal memory.  Welcome to the wonderful
world of the IBM PC.

Scott Wiesner
Interactive Systems
X Development Group

scottw@ico.isc.com (Scott Wiesner) (07/31/90)

> The X11R3 server does a lot of floating point arithmetic, so the 287 makes
> a lot of difference.  I gather that the X11R4 sample server has been rewritten
> to do as much as possible in integer.  It would be interesting to see some
> relative figures on the R4 server.

This is a common misconception that I wish would die.  Arcs and wide lines
in X11R3 use a lot of floating point.  There's no floating point in text
or CopyArea, so that's not affecting the speed of this scrolling test.  
The bottleneck here is mainly the Hercules board.  Also, the comparison
given was between a version of ISC's X that's over a year old and SCO's
X, which is newer.  Newer releases of ISC's X are somewhat faster.  The
newest (1.2) release integrates the X11R4 arc and wide line code to 
speed up those operations substantially.

Scott Wiesner
Interactive Systems
X Development Group

richard@pegasus.com (Richard Foulk) (08/01/90)

Please keep those benchmark numbers coming in!

I'll post a summary soon.

Thanks.



-- 
Richard Foulk		richard@pegasus.com

wtm@uhura.neoucom.EDU (Bill Mayhew) (08/01/90)

I have a very old 16 MHz IBM model 80-071 with motherboard VGA.
I was quite surprised to discover that there are 25 (!!!) wait
states required to access the VGA RAM.  Not exactly the sort of
machine that makes one want to have an X windows display.

==Bill==
-- 
Bill Mayhew  Northeastern Ohio Universities College of Medicine
Rootstown, OH  44272-9995  USA    phone: 216-325-2511
wtm@uhura.neoucom.edu   ....!uunet!aablue!neoucom!wtm
via internet: (140.220.001.001)

gary@mic.UUCP (Gary Lewin) (08/02/90)

These figures may be of interest:

	Unix:		ISC 2.0.2 Unix
	X server:	ISC X11R3 1.2, Xvga
	Resolution:	1024x768x256
	Video card:	Micro-Labs, Ultimate VGA
	Cpu:		25MHz 386, no cache, 16 MB
	80387:		yes

	Time:		0:13.31s


Nothing like a little variety to perk things up.  The X11 is a beta
of 1.2 which supports a number of 1024x768x256 color cards.  The
Micro-Labs card is excellent and is under review by ISC for adding
to the approved list.

Gary Lewin
gary@mic.lonestar.org

brando@uicsl.csl.uiuc.edu (08/08/90)

Just for comparisons (again):

UNIX:           SunOS 4.1
X server:       MIT X11R4
Resolution:     1152x900x2 (black and white)
CPU:            SPARCstation 1 Generic

Time:           0:13.1

plocher@sally.Sun.COM (John Plocher) (08/08/90)

+-- richard@pegasus.com (Richard Foulk) writes:
| 	xterm -geometry 80x24 -fn 8x13 +j &
| 
| then put the following awk script in a file called x-test:
| -----------------------------cut here----------------------------
| BEGIN {
| 	for (i = 0; i < 1000; i++) {
| 		printf("xxxxxxxxxxxxx %d\n", i);
| 	}
| 	exit;
| }
| -----------------------------cut here----------------------------
| Run it, from the above mentioned xterm, like this:
| 
| 	time awk -f x-test
| 
| and report the real time results.
+--

These test times can be reduced by 50% or more by replacing the
	time awk -f x-test
with
	awk -f x-test > /tmp/x
	time cat /tmp/x

This implies that you are measuring as much "awk" time as you
are "scrolling".  In fact, awk is a known abuser of FP, as
reflected by other comments about this benchmark.

FYI, on a Sun SS1+GX (1152x900x256), the test takes about 13 seconds.

  -John

roell@lan.informatik.tu-muenchen.dbp.de (Thomas Roell) (08/09/90)

First some test results:

	UNIX:		ISC 2.0.2
	X server:	X386 (X11R4) internal test version
	Video card:	VGA GENOA 5400
	CPU:		33 MHz, 32k cache, 8MB (PizzaMan's Special)
	80387:		no

a)	Resolution:	864x606x2
	Time:		0:59

b)	Resolution:	800x600x256
	Time:		3:40

Now the interpretation of the results: generic AT386 boxes are generally
slow at I/O tasks. The normal I/O bus speed is between 8 MHz and 10 MHz, so
whether scrolling is fast does not depend on CPU speed. The only factor for
speed is the access time of the VGA card. Before making any assumptions about
the speed of a particular VGA, you should note that the CPU can access the
VGA's memory about every 5 (five) VGA cycles. One VGA cycle is dotclock/8.
In case a) I used a 44.9 MHz dotclock, which means I could achieve a
throughput of 1.06 MBytes/sec. The window we scrolled was 191360 pixels big
(80x8x23x13); we scrolled 1000-24 times. This means we got a throughput of
about 0.75 MBytes/sec. OK, let's work on this monochrome scrolling some
time. Now case b): here I used a 39 MHz dotclock. You should note that the
GENOA (i.e. the Tseng ET3000 chip) allows us to access the VGA's memory
wordwise, so our maximum throughput is now 1.86 MBytes/sec, and we got (via
this test) about 1.62 MBytes/sec. In summary, I could say: "good job done,
scrolling is very close to the maximum throughput of your graphics device".
(Other tests showed me that the scroll routine gets about 92% of the maximum
possible throughput in 256-color mode, if only scrolling is tested.)
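
The arithmetic behind these figures can be checked with a small shell
sketch. It assumes that MBytes means 2^20 bytes, that the monochrome mode
stores one bit per pixel and the 256-color mode one byte per pixel, and that
every scrolled byte is counted twice (one read plus one write), as the x2
factor in the note further down suggests. The function names max_mb and
achieved_mb are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: MBytes are taken as 2^20 bytes (an assumption).

# max_mb DOTCLOCK_MHZ BYTES_PER_ACCESS
# One VGA cycle is dotclock/8; the CPU gets one access every 5 VGA cycles.
max_mb() {
	awk -v clk="$1" -v width="$2" 'BEGIN {
		printf("%.2f\n", clk * 1000000 / 8 / 5 * width / 1048576)
	}'
}

# achieved_mb BITS_PER_PIXEL SECONDS
# 80x23 character cells of 8x13 pixels are moved on each of the 1000-24
# scrolls; every byte is counted twice (one read, one write).
achieved_mb() {
	awk -v bpp="$1" -v secs="$2" 'BEGIN {
		pixels = 80 * 8 * 23 * 13
		bytes = pixels * bpp / 8 * (1000 - 24) * 2
		printf("%.2f\n", bytes / secs / 1048576)
	}'
}

echo "a) max $(max_mb 44.9 1), achieved $(achieved_mb 1 59) MBytes/sec"
echo "b) max $(max_mb 39 2), achieved $(achieved_mb 8 220) MBytes/sec"
```

Under these assumptions the sketch reproduces 0.75, 1.86 and 1.62. With the
stated 44.9 MHz dotclock the case a) ceiling comes out at about 1.07 rather
than the quoted 1.06.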

Now let's interpret the test itself. (Since I didn't use prof(1) for this
test, all numbers below are estimates.) About 80% of the test is spent in the
scroll, 10% in a fill and 10% in a glyph painting routine. Was this intended??
All that is tested here is the speed of the graphics device. Nothing more!!
This test does NOT simulate the normal case (the option +j disables
'jump-scrolling'!!). That means what you tested here is NOT the time
necessary to scroll 1000 lines under normal conditions. My second main point
of criticism is that you tested only a very small (and in my opinion not so
absolutely important) facet of an X server (about 2000 bytes out of 560000
bytes of code!!). It depends upon your application which parts of the X
server are the most important for your job. For a simple text application
(which is what should be benchmarked here) the mixture should be 50% glyph
paint, 30% fill solid and 20% scrolling (those were the weights I got via
prof(1) when I did 'ls -RCs /' in an 80x44 window about 100 times).

Here are the results when you start up the xterm with jump-scroll enabled,
which some people seem to have done to get better numbers for their
graphics boards (like Gary Lewin, with the Micro-Labs Ultimate VGA **):

a)	Resolution:	864x606x2
	Time:		0:07.28s

b)	Resolution:	800x600x256
	Time:		0:12.19s

But no criticism without saying what could be done better. As I posted some
days ago, you should use a special benchmarking utility. x11perf is such a
utility. It can be found on the X11R4 tape (contrib/demos/x11perf). The port
to X11R3 should be simple. Running all the tests will take some hours (4 to
5), but then you will have results that say more than the above test does.


- Thomas

PS: If you use the above test, move your cursor out of the xterm in which you
    are running it. Guess why! The original author forgot to mention this,
    or didn't know the importance of this trick.

** (note)
Gary Lewin told us his VGA did the test in 13.31s. That means his VGA has a
throughput of 80x8x23x13x(1000-24)x2/13.31s = 26.76 MBytes/sec. (I did not
use the fact that the test spends only 80% of its time scrolling, so the more
realistic throughput has to be around 33 MBytes/sec.) If his graphics board
allows 16-bit (ISA bus) access, his CPU MUST have an I/O speed of 13.5 MHz
(assuming the VGA allows zero-wait-state access, and the CPU can access video
memory in parallel with the VGA's display unit; if you take 33 MB/sec it will
be 17 MHz I/O speed). This is faster than any board (CPU & VGA combination)
I have ever heard of!! Please, Gary, post realistic results, or tell us how
your VGA works (EISA bus, VRAMs, graphics processor and so on), and where to
get it.
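
A short shell sketch of the estimate in this note, assuming that MBytes
means 2^20 bytes, one byte per pixel in the 256-color mode, and two bytes
moved per 16-bit bus transfer. The names implied_mb and bus_mhz are
hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: reproduces the note's arithmetic under the stated assumptions.

# implied_mb SECONDS [SCROLL_FRACTION]
# Throughput implied by a run time, optionally assuming that only a
# fraction of that time (e.g. 0.8) was spent scrolling.
implied_mb() {
	awk -v secs="$1" -v frac="${2:-1}" 'BEGIN {
		# 80x23 cells of 8x13 pixels, 1000-24 scrolls, read plus write
		bytes = 80 * 8 * 23 * 13 * (1000 - 24) * 2
		printf("%.2f\n", bytes / (secs * frac) / 1048576)
	}'
}

# bus_mhz MBYTES_PER_SEC
# Bus rate needed when each zero-wait-state transfer moves 2 bytes.
bus_mhz() {
	awk -v mb="$1" 'BEGIN { printf("%.1f\n", mb / 2) }'
}

echo "all time scrolling: $(implied_mb 13.31) MBytes/sec"
echo "80% time scrolling: $(implied_mb 13.31 0.8) MBytes/sec"
echo "bus rate for 26.76 MBytes/sec: $(bus_mhz 26.76) MHz"
```

This gives 26.76 and about 33.5 MBytes/sec, and bus rates of roughly 13.4 MHz
(16.5 MHz for 33 MBytes/sec), which agree with the rounded 13.5 and 17 MHz
quoted above.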

--
_______________________________________________________________________________
Mail:                    Thomas Roell (c/o Daniel Hernandez)
                         Inst. f. Informatik / Technische Universitaet M"unchen
                         Arcisstr. 21 / 8000 Munich 2 / Fed.Rep. of Germany
E-Mail (domain):	 roell@lan.informatik.tu-muenchen.dbp.de
UUCP (when above fails): roell@tumult.{uucp | informatik.tu-muenchen.de}
-------------------------------------------------------------------------------

richard@pegasus.com (Richard Foulk) (08/11/90)

In article <3858@tuminfo1.lan.informatik.tu-muenchen.dbp.de> roell@lan.informatik.tu-muenchen.dbp.de (Thomas Roell) writes:
>First some test results:
>
>	UNIX:		ISC 2.0.2
>	X server:	X386 (X11R4) internal test version
>	Video card:	VGA GENOA 5400
>	CPU:		33 MHz, 32k cache, 8MB (PizzaMan's Special)
>	80387:		no
>
>a)	Resolution:	864x606x2
>	Time:		0:59
>
>b)	Resolution:	800x600x256
>	Time:		3:40
>

Thanks for the data.

> [...]
>Now let's interpret the test itself. (Since I didn't use prof(1) for this
>test, all numbers below are estimates.) About 80% of the test is spent in the
>scroll, 10% in a fill and 10% in a glyph painting routine. Was this intended??

Yes.

> [...]
>But no criticism without saying what could be done better. As I posted some
>days ago, you should use a special benchmarking utility. x11perf is such a
>utility. It can be found on the X11R4 tape (contrib/demos/x11perf). The port
>to X11R3 should be simple. Running all the tests will take some hours (4 to
>5), but then you will have results that say more than the above test does.

That tests things that I'm not concerned with.  But more importantly
it's not a test that you're likely to get many people to run -- it's
just too much trouble.  (I've only gotten 18 responses to my simple
benchmark so far.  Thanks very much to those who've taken the time to
run it and send their results!)

>
>Gary Lewin told us his VGA did the test in 13.31s. That means his VGA has a
>throughput of 80x8x23x13x(1000-24)x2/13.31s = 26.76 MBytes/sec. (I did not
>use the fact that the test spends only 80% of its time scrolling, so the more
>realistic throughput has to be around 33 MBytes/sec.) If his graphics board
>allows 16-bit (ISA bus) access, his CPU MUST have an I/O speed of 13.5 MHz
>(assuming the VGA allows zero-wait-state access, and the CPU can access video
>memory in parallel with the VGA's display unit; if you take 33 MB/sec it will
>be 17 MHz I/O speed). This is faster than any board (CPU & VGA combination)
>I have ever heard of!! Please, Gary, post realistic results, or tell us how
>your VGA works (EISA bus, VRAMs, graphics processor and so on), and where to
>get it.

A growing number of cards don't require the cpu to do the scrolling,
they have an on-board processor to do the work.  Cpu and bus speeds
should be mostly irrelevant for scrolling.

I'm interested in scrolling because it's the only performance aspect
that comes close to being unacceptable on some systems I've tried.
Yes, I use jump-scrolling, but too often it doesn't work very well.
Some programs send text just slow enough to circumvent jump-scrolling.

I'll post the benchmark summary in a day or two.  I was hoping to get
more responses, especially from people with faster cards, 8514's and
such, but I've only received a couple.

Someone complained that too much time is spent within awk, and that
this invalidated the benchmark results.  In my tests, on a machine with
no fpu, when I redirect the test to /dev/null it takes less than one
second to run.  This is insignificant compared to the average benchmark
run time of over a minute.  (When I was devising the benchmark, I
considered sending awk's output to a file and then timing a "cat" of
the file, but I decided it wasn't necessary.)
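
The control run described here can be sketched as a few shell functions.
The function names and the temporary file path are made up for this
illustration; the awk program is the benchmark's own. The idea is that the
difference between the two real times approximates what the display costs.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the benchmark.

# write_x_test FILE: save the benchmark's awk program to FILE.
write_x_test() {
	cat > "$1" <<'EOF'
BEGIN {
	for (i = 0; i < 1000; i++) {
		printf("xxxxxxxxxxxxx %d\n", i);
	}
	exit;
}
EOF
}

# control_run FILE: time awk alone; output is discarded, so nothing scrolls.
control_run() {
	time awk -f "$1" > /dev/null
}

# display_run FILE: the benchmark proper; output scrolls through the xterm.
display_run() {
	time awk -f "$1"
}
```

From the test xterm one would run, for example, `write_x_test /tmp/x-test`,
then `control_run /tmp/x-test` followed by `display_run /tmp/x-test`, and
compare the two real times.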


If you haven't run the benchmark yet, please do so and send me the
results.  It's short and fairly simple.  In case you missed it,
here it is again:

To run the benchmark start up an xterm like this:

	xterm -geometry 80x24 -fn 8x13 +j &

then put the following awk script in a file called x-test:
-----------------------------cut here----------------------------
BEGIN {
	for (i = 0; i < 1000; i++) {
		printf("xxxxxxxxxxxxx %d\n", i);
	}
	exit;
}
-----------------------------cut here----------------------------
Run it, from the above mentioned xterm, like this:

	time awk -f x-test

and report the real time results.


Thanks very much.

richard@pegasus.com (Richard Foulk) (08/11/90)

My .signature misfired.  Please send those benchmark results to:

	richard@pegasus.com

or post them if you prefer.


Thanks again.



Richard Foulk		richard@pegasus.com