[net.arch] IOCALL results and problems

stubbs@ncr-sd.UUCP (Jan Stubbs) (12/12/85)

	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
The results so far are below. Thanks everybody.
Send your results to me directly. The benchmark is a "C" program
which measures Unix kernel performance. 

time iocall     Send all 3 times (user, system, real)
                I am reporting the system time only.
         
"The opinions expressed herein are those of the author. Your mileage may vary".
 
Problems:

1) As Jeff Makey kindly pointed out, IOCALL unfortunately does cross a
buffer boundary if your buffer size is 512. Older versions of Unix
(Version 7, System III) and their progeny used 512-byte buffers. Berkeley
4.2, 4.3, System V and their progeny use 1024 bytes or bigger, so there is
no problem with those numbers. But all the numbers sent to me for the 512
byte buffer unixes are slower than they should be, because they did over
1000 disk writes, which uses lots of cpu cycles in the drivers. I don't
know about Version 8 or 2.9 BSD; can anyone help?

Jeff offered a solution, which adds a seek to keep everything in the
first 512 bytes. This makes the kernel do a little extra work, but it did
not change the timing on our Pyramid. The new source is below; if you
have a 512 byte buffer version of Unix, please rerun with this one.
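
To see the problem, trace the file offset through one pass of the loop
as originally written (without the added lseek):

	write(fd, buf, 500);	/* offset   0 -> 500 */
	lseek(fd, 0L, 0);	/* offset 500 -> 0   */
	/* four read()s of 100:    offset   0 -> 400 */

so every write after the first starts at offset 400 and covers bytes
400-899, straddling the 512 byte block boundary; that is what turns the
loop into 1000-odd real disk writes on a 512 byte buffer system. With
the added lseek, every write covers bytes 0-499 of block 0.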

2) Jeff and others also pointed out that the 2nd argument to lseek
should be a long, not an int. Shame on me! See what happens when you
don't lint your programs? The source below also fixes this. Reruns may
be required to get correct results on machines where longs aren't the
same size as ints (PDP's...).
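
The failure mode is worth spelling out. On a PDP-11 an int is 16 bits
and a long is 32, and pre-ANSI C has no prototypes to coerce arguments,
so the compiler pushes exactly what you wrote (a minimal illustration):

	lseek(fd, 0, 0);	/* wrong: pushes a 16 bit int where a
				   32 bit offset is expected, so the
				   offset and whence get scrambled */
	lseek(fd, 0L, 0);	/* right: the constant is explicitly long */

lint checks calls against the library definitions and catches this.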

3) I failed to mention that these timings should be run on an otherwise
idle machine. If you can, please run them that way; it does improve the
timings.

4) Since not everyone is a good sport about benchmarks, and since I
might be a biased source, and since I don't have access to the latest
NCR Unix stuff anyhow (the M68020 based Tower/32), I won't publish any
NCR numbers unless they are offered to me by NCR E&M Columbia, which is
where the Tower line comes from. I encourage someone else to do so,
however.


Jan Stubbs ..sdcsvax!ncr-sd!stubbs
              
IOCALL RESULTS:

SYSTEM				UNIX VERSION		SYSTEM TIME SECONDS
-----------			----------------	-------------------
DEC Rainbow100 w/NECV20 	Venix			18.4 *a
DEC Pro-300			Venix 1.1		18.1 *a
MicroVax I			Ultrix V1.1		18.0
Onyx C8002s Z8000		SIII			13.7 *a	
Onyx C8002 Z8000		v7			13.3 *a
TIL NS32016 9MHz No Wait states	Local Port		12.2
ATT 3b2/300			SV			10.3
VAX 11/750			4.2 BSD			10.0
PDP 11/44			ISR 2.9 BSD		9.5
VAX 11/750			SV.2			9.4
VAX 11/750			4.3 BSD			9.0
Sun-2 10MHz 68010		4.2 BSD Rel 2.0		9.0
Sun-2 10MHz 68010		4.2 BSD Rel 3.0 	8.7
PE 3220				V7 Workbench		8.5 *a
VAX 11/750			research version 8	8.1
VAX 11/750			4.1 BSD			7.2
Radio Shack 16A			Xenix (v7)		7.2 *a
PC/AT 				Venix 5.2		6.8
ATT7300 Unix PC 10MHz 68010	SV.2			6.4
Bullet286(PC/XT)		Venix 2.0		6.0 *a
Pyramid 90x w/cache		OSx2.3			5.8
VAX 11/780			4.2 BSD			5.7
Plessey Mantra 12.5Mhz 68000	Uniplus SV Release 0	5.5
MicroVax II			Ultrix 1.1		5.2
HP9000-550 3cpu's		HP-UX 5.01		5.1 *c
PC/AT 7.5 Mhz			Venix286 SV.2		5.1
Convex C-1			4.2 BSD			4.6
VAX 11/785			SV.2			4.4
VAX 11/785			4.3 BSD			3.6
Sun-3/75 16.67Mhz 68020		4.2 BSD			3.6
Sun-3/160M-4 16.67Mhz 68020	4.2 BSD Rel 3.0 Alpha	3.6
GEC 63/40			S 5.1			2.7
Gould PN9080			UTX 1.2			2.5
Sperry 7000/40 (aka CCI 6/32)	4.2 BSD			1.9 *b
VAX 8600			4.3 BSD			1.3
VAX 8600			Ultrix 1.2-1		1.1
IBM 3083			UTS SV			1.0 *b
Amdahl 470/V8			UTS/V (SV Rel 2,3) V1.1+	0.98 *b

Notes:

*a 
This result was obtained with the original version of IOCALL, which crosses
the 512 byte buffer boundary, and this version of Unix has buffers of 512
bytes. This is believed to be the case with all Version 7 and SIII derived
OS's. It results in 1001 writes being done, which uses significantly more
cpu time and makes these results comparable only to others with the same
problem. See discussion above. 2.9 BSD????

*b 
This result was obtained on a system which probably had other programs
running at the time. The submitter is requested to rerun, if possible,
when the system is idle; this will improve the result somewhat.

*c
Multi-cpu system. IOCALL was run single thread, which probably did not
utilize all cpu's. This system probably has considerably more power than
is reflected by the result.





-------cut----cut------cut-------------------------------

/* This benchmark tests the speed of the Unix system call interface
   and the speed of the cpu doing common Unix io system calls. */

char buf[512];
int fd, count, i, j;

main()
{
	fd = creat("/tmp/testfile", 0777);
	close(fd);
	fd = open("/tmp/testfile", 2);	/* reopen read/write */
	unlink("/tmp/testfile");	/* name goes; file lives until close */
	for (i = 0; i <= 1000; i++) {	/* 1001 passes; see note *a */
		lseek(fd, 0L, 0);	/* add this line! keeps every write in block 0 */
		count = write(fd, buf, 500);
		lseek(fd, 0L, 0);	/* second argument must be long */
		for (j = 0; j <= 3; j++)
			count = read(fd, buf, 100);
	}
}

dan@rna.UUCP (Dan Ts'o) (12/13/85)

In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (0000-Jan Stubbs) writes:
>
>	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
>The results so far are below. Thanks everybody.
>Send your results to me directly. The benchmark is a "C" program
>which measures Unix kernel performance. 
>
> [IOCALL source quoted in full; see the listing above]

	Well I don't want to flame too much. Just a few comments.

	Basically, I find it difficult to take this benchmark and the presented
results too seriously.

	- I have trouble understanding the point of the benchmark program. It
just seems bizarre. For 1000 times, it writes 500 bytes at the beginning of the
file and reads 400 of them back, 100 at a time. Because of the buffer cache,
this whole routine just does user/kernel buffer copies, back and forth. If the
performance of the system call interface and of user/kernel memory copies is
what is being measured, then the results may be okay, although strangely
obtained. I don't believe it measures much else in the way of kernel
performance, or system performance. It's not even something a normal user can
relate to, such as "copying files on an X is twice as fast as on a Y".

	- It is obviously a single point measurement. It can tell you very
little about how particular applications or the system in general will run.

	- The numbers are way too small to interpret with any substantial
significance (i.e. you should run the benchmark with, say, 10000 rather than
1000 in the loop; a sketch of that change follows these comments). The
differences between the various VAX 11/750 times are, for example, 7.2 to
9.4. I could be convinced there is significance there, but...

	- That a Radio Shack 16A performs 25% better than a VAX 11/750 is cute
but of little practical interest (read: ridiculous). A benchmark that tells
me that is probably not going to be very useful. Are we really to think that
an Amdahl 470/V8 is only 12% faster than a VAX 8600, or that a Pyramid is
slower than a VAX 11/780?
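
Making the count a command line argument is a trivial change (a sketch,
not the official benchmark; only the loop bound differs):

char buf[512];

main(argc, argv)
int argc;
char **argv;
{
	int fd, count, i, j, n;

	n = (argc > 1) ? atoi(argv[1]) : 1000;	/* e.g. 10000 on fast machines */
	fd = creat("/tmp/testfile", 0777);
	close(fd);
	fd = open("/tmp/testfile", 2);
	unlink("/tmp/testfile");
	for (i = 0; i <= n; i++) {
		lseek(fd, 0L, 0);
		count = write(fd, buf, 500);
		lseek(fd, 0L, 0);
		for (j = 0; j <= 3; j++)
			count = read(fd, buf, 100);
	}
}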

jbuck@epicen.UUCP (Joe Buck) (12/14/85)

> From: stubbs@ncr-sd.UUCP (Jan Stubbs)
> Date: 12 Dec 85 00:43:23 GMT
> 
> 	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
> The results so far are below. Thanks everybody.
> Send your results to me directly. The benchmark is a "C" program
> which measures Unix kernel performance.
>... 
> IOCALL RESULTS:
> 
> SYSTEM				UNIX VERSION		SYSTEM TIME SECONDS
> -----------			----------------	-------------------
>...
> Gould PN9080			UTX 1.2			2.5
> Sperry 7000/40 (aka CCI 6/32)	4.2 BSD			1.9 *b
> VAX 8600			4.3 BSD			1.3
> VAX 8600			Ultrix 1.2-1		1.1
> IBM 3083			UTS SV			1.0 *b
> Amdahl 470/V8			UTS/V (SV Rel 2,3)V1.1+ .98 *b

Eunice is a port of Berkeley 4.1bsd that runs on top of VMS. Thus running
something like IOCALL probably isn't valid. Many important Unix system calls
are hundreds of times slower on Eunice, particularly fork, exec, and filename
lookup. Single-character reads are also extremely slow. Keeping this in
mind, I expected an awful time; I wasn't certain whether the reads and writes
would go to disk or not.

So I ran it anyway. Eunice doesn't distinguish between system time and user
time because VMS doesn't. So I'm reporting system+user time and total time.
Are you ready?

System+User time: 1.9 to 2.1 seconds
Total time: 5.0 seconds

I'm running a Vax 750 with FPA using Eunice 3.2 on top of VMS 3.4.

-- 
Joe Buck				|  Entropic Processing, Inc.
UUCP: {ucbvax,ihnp4}!dual!epicen!jbuck  |  10011 N. Foothill Blvd.
ARPA: dual!epicen!jbuck@BERKELEY.ARPA   |  Cupertino, CA 95014

hammond@petrus.UUCP (Rich A. Hammond) (12/16/85)

> In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (0000-Jan Stubbs) writes:
> >
> >	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
> >... The benchmark is a "C" program which measures Unix kernel performance. 
> 
Dan Ts'o writes:
> 	Well I don't want to flame too much. Just a few comments.
> 
> 	Basically, I find it difficult to take this benchmark and the presented
> results too seriously.
> 
> 	- I have trouble understanding the point of the benchmark program.
> ...  It's not even something a normal user can relate to,
> such as "copying files on a X is twice as fast as Y".
> 
> 	- It is obviously a single point measurement. It can tell you very
> little about how particular applications or the system in general will run.
> 
> 	- The numbers are way too small to interpret with any substantial
> significance (i.e. you should run the benchmark with, say, 10000 rather
> than 1000 in the loop).  The differences between the various VAX 11/750
> times are, for example, 7.2 to 9.4.  I could be convinced there is
> significance there, but...
> 
> 	- That a Radio Shack 16A performs 25% better than a VAX 11/750 is cute
> but of little practical interest (read: ridiculous).  A benchmark that tells
> me that is probably not going to be very useful.  Are we really to think
> that an Amdahl 470/V8 is only 12% faster than a VAX 8600, or that a Pyramid
> is slower than a VAX 11/780?

a) I agree it doesn't measure everything, but it does check three important
aspects that affect overall system performance: context switch costs,
copying costs, and the cost of finding a buffer in the buffer cache.
b) You want to avoid using the disks, since, after all, an IBM PC with a
fast hard disk would probably outperform an 8600 with an RK05.
Thus the statement "system A copies files twice as fast as system B"
is only useful if you know the I/O configuration (were they massbus or
unibus disks on a Vax? what type of disks? ....).
c) I agree; run the benchmark with more passes through the loop on fast
machines.  1000 is probably enough on small machines.
d) The point about the benchmark results is not that they are ridiculous,
but that they might show up areas which need work.  For example, if
you simply port UNIX to a large machine and increase the number of buffers
without thinking about the way the buffer cache works, you are likely to
find that you have, say, 1024 buffers chained into 60 queues, whereas on a
pdp11 you had 60 buffers in 60 queues.  Which one will take less time to
find a buffer in?  Raw machine speed alone won't tell you the answer (see
the sketch after point e).  Further, let's suppose you built a machine
with lots of registers and a load/store architecture (i.e. RISC, Pyramid).
It turns out the cost of doing a context switch is higher (save all the
registers) and the load/store architecture is at its worst doing memory
to memory copies.  Thus, a Pyramid might very well do worse than a Vax
11/780.  I timed a long to long copy on a Pyramid in user mode; it was
only 1.15 * the 11/780.  Given that the Pyramid has a slow context
switch....
e) The variation among machines of the same model is real; we have two
780's and one is consistently about 5% faster on benchmarks.  We have
two Pyramids and again, one is consistently faster on the same benchmarks.
One should always allow +/- 10% when using benchmarks to compare machines.
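
To put rough numbers on point d): with B buffers hashed into Q queues, a
lookup scans about B/(2Q) headers on a hit and B/Q on a miss, so 60
buffers in 60 queues means chains of about one buffer, while 1024 buffers
in 60 queues means chains of about 17.  The cost is concentrated in a
loop like this (a sketch in the style of the classic buffer cache, not
actual kernel source; names are illustrative):

#define NQUEUE	60			/* hash queues, pdp11 vintage */

struct buf {
	struct buf *b_forw;		/* hash chain link */
	int	b_dev;
	long	b_blkno;
};
struct buf *hash[NQUEUE];

struct buf *
lookup(dev, blkno)
int dev;
long blkno;
{
	register struct buf *bp;

	for (bp = hash[blkno % NQUEUE]; bp != 0; bp = bp->b_forw)
		if (bp->b_blkno == blkno && bp->b_dev == dev)
			return (bp);	/* hit: cost grows with chain length */
	return ((struct buf *)0);	/* miss: allocate and read from disk */
}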

Rich Hammond, Bell Communications Research

larry@geowhiz.UUCP (Larry McVoy) (12/18/85)

In article <761@petrus.UUCP> hammond@petrus.UUCP (Rich A. Hammond) writes:
>> In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (0000-Jan Stubbs) writes:
>> >
>> >	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
>> >... The benchmark is a "C" program which measures Unix kernel performance. 
>> 
>Dan Ts'o writes:
>> 	Well I don't want to flame too much. Just a few comments.
>> 
>> 	Basically, I find it difficult to take this benchmark and the presented
>> results too seriously.

I tend to agree with Dan.  I think what people would like to see is a 
benchmark which measures how well Unix, running multiple users, performs
on each machine.  The benchmark would have to avoid measuring things that
vary widely (such as I/O devices), as those results would only reflect
how much one had spent on the bus & disk.  So, how about this:

The dhrystone benchmarks are considered good tests of the CPU (at least
by me), but they don't really test Unix at all (in fact some people run 
them in standalone mode).  How about a version (called forkstone?) which
runs the dhrystone as 1, 2, 8, and 64 concurrent processes?  This would
show 1) the speed of the CPU, 2) the first part of the curve, 8) a nice
single-user level, and 64) what happens when you have multiple users.  

It would not test I/O, which is a hard thing to test fairly.  It would get
rid of those Z80 dhrystones (flame, flame) as they're not multi-tasking...

I guess if there is any response and nobody wants to do it, I'll hack the
dhrystones myself (a sketch of a driver appears below).  I think it would
be better if the original author did it, as {s}he probably can understand
that bastardized {C}Ada source.
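
A driver is easy to sketch (work() here is a placeholder for one
dhrystone pass; the loop counts are illustrative only):

/* forkstone sketch: run N concurrent CPU-bound jobs and wait for all;
   time the whole batch externally with time(1). */

work()
{
	long i;

	for (i = 0; i < 1000000; i++)	/* stand-in for one dhrystone run */
		;
}

main(argc, argv)
int argc;
char **argv;
{
	int n, i;

	n = (argc > 1) ? atoi(argv[1]) : 8;	/* try 1, 2, 8, and 64 */
	for (i = 0; i < n; i++)
		if (fork() == 0) {
			work();
			exit(0);
		}
	while (wait((int *)0) > 0)		/* reap all the children */
		;
	exit(0);
}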

Please post your views to the net.  I don't want to discuss this via mail.

-- 
Larry McVoy
-----------
Arpa:  mcvoy@rsch.wisc.edu                              
Uucp:  {seismo, ihnp4}!uwvax!geowhiz!geophiz!larry      

"If you are undertaking anything substantial, C is the only reasonable 
 choice of programming language"   -  Brian W. Kernighan

gemini@homxb.UUCP (Rick Richardson) (12/19/85)

Larry McVoy writes:
>I tend to agree with Dan.  I think what people would like to see is a 
>benchmark which measures how well Unix, running multiple users, performs
>on each machine.  The benchmark would have to avoid measuring things that
>vary widely (such as I/O devices), as those results would only reflect
>how much one had spent on the bus & disk.  So, how about this:
>
>The dhrystone benchmarks are considered good tests of the CPU (at least
>by me), but they don't really test Unix at all (in fact some people run
>them in standalone mode).  How about a version (called forkstone?) which
>runs the dhrystone as 1, 2, 8, and 64 concurrent processes?  This would
>show 1) the speed of the CPU, 2) the first part of the curve, 8) a nice
>single-user level, and 64) what happens when you have multiple users.
>
>It would not test I/O, which is a hard thing to test fairly.  It would get
>rid of those Z80 dhrystones (flame, flame) as they're not multi-tasking...
>
>I guess if there is any response and nobody wants to do it, I'll hack the
>dhrystones myself.  I think it would be better if the original author did
>it, as {s}he probably can understand that bastardized {C}Ada source.

I don't think that running multiple dhrystones would measure anything more
than the cost of doing a context switch once every <scheduling granularity>.
Except on a multiple processor machine, the time will be N*1 dhrystone +
M context switches.  There are easier ways to measure the time to do a context
switch.  If you want to measure multi-user response, you've GOT to open the
IO can-of-worms, since they WILL be doing IO.
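
To put rough numbers on that (the figures here are invented for
illustration): with a 100 ms scheduling quantum and a 0.5 ms context
switch, 8 concurrent dhrystone runs needing 60 CPU seconds each take
about 480 seconds and incur roughly 480/0.1 = 4800 switches, which is
4800 * 0.0005 = 2.4 seconds of switch overhead, about half a percent of
the total.  That is why the multiple-dhrystone total mostly measures
N copies of dhrystone.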

Rick Richardson, PC Research, Inc. (201) 922-1134
..!ihnp4!houxm!castor!{rer,pcrat!rer} <--Replies to here, not to homxb!!!

P.S. Reinhold Weicker is the author of Dhrystone.  I apologize for
creating the bastardized {C}Ada source from his original Ada!

larry@geowhiz.UUCP (Larry McVoy) (12/21/85)

>I wrote:
>>I tend to agree with Dan.  I think what people would like to see is a 
>>benchmark which measures how well Unix, running multiple users, performs
>>on each machine.  The benchmark would have to avoid measuring things that
>>vary widely (such as I/O devices), as those results would only reflect
>
Rick Richardson writes:
>I don't think that running multiple dhrystones would measure anything more
>than the cost of doing a context switch once every <scheduling granularity>.
>Except on a multiple processor machine, the time will be N*1 dhrystone +
>M context switches.  There are easier ways to measure the time to do a context
>switch.  If you want to measure multi-user response, you've GOT to open the
>IO can-of-worms, since they WILL be doing IO.
>
>P.S. Reinhold Weicker is the author of Dhrystone.  I apologize for
>creating the bastardized {C}Ada source from his original Ada!

Well, ok, so you don't think multiple dhrystones would be interesting.  Hmm...
I do - it would be interesting to know how well they do when there's lots of
them.  You say it's no more than testing context switches, implying that
all context switches are equal.  Uh-uh.  For example: I heard (from Guy
Harris, who I'm sure will correct any inaccuracies) that Sun-3 memory
management is done such that 8 memory mapping context blocks are in memory
at all times.  This leads to fast-fast-fast response for active jobs <= 8,
but what happens when you go to 16? 32?

I think we both agree that testing I/O is a mess: really hard to get an 
objective and accurate reflection of a machine's performance.  I think we
also both agree that what people would like to see is some sort of 
measurement of a machine's multi-{user,tasking} capability.  So, I made a 
pass -- what have you to offer instead?

-larry

BTW - sorry about the {C}Ada crack, just my peevishness at not being able
      to decipher it...
-- 
Larry McVoy
-----------
Arpa:  mcvoy@rsch.wisc.edu                              
Uucp:  {seismo, ihnp4}!uwvax!geowhiz!geophiz!larry      

"If you are undertaking anything substantial, C is the only reasonable 
 choice of programming language"   -  Brian W. Kernighan

jph@whuxlm.UUCP (Holtman Jim) (12/22/85)

> [Larry McVoy's forkstone proposal and Rick Richardson's reply,
>  quoted in full; see above]

Results for VAX 8600 running SVR2

1.2   Real
1.1   System
0.0   User

stubbs@ncr-sd.UUCP (Jan Stubbs) (01/02/86)

In article <1035@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
> There are easier ways to measure the time to do a context
>switch.  If you want to measure multi-user response, you've GOT to open the
>IO can-of-worms, since they WILL be doing IO.

How about the following as a multiuser benchmark?

iocall&
dhrystone&
iocall&
dhrystone&
etc..... 

Putting the above in a shell file and getting stop watch times on a
dedicated system gives a reasonable approximation of a real system
workload.  If you want physical IO in there as well, add a few cc hello.c&.
If you want to simulate user think time, add a sleep between programs.
Vary the mix of these programs to simulate your prospective use of the machine.
If you really want to get fancy, have one shell file for each simulated user
and measure response time degradation as you add simulated users.  IOCALL and
the cc invocations would have to be modified to use unique file names or they
will write on top of each other.
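
The unique-name change to IOCALL is small (a sketch; only the file name
handling differs from the posted source):

	char name[64];

	sprintf(name, "/tmp/testfile.%d", getpid());	/* one file per copy */
	fd = creat(name, 0777);
	close(fd);
	fd = open(name, 2);
	unlink(name);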

We have done this with some success; the problem is getting any two
performance people to agree on what is an appropriate mix.

The AIM benchmarks from AIM Technology (Santa Clara, CA) attempt to do this
sort of thing more comprehensively, for a price, and they provide results
for many machines as well.

The above opinions are those of the author only.

Jan Stubbs




brent@poseidon.UUCP (Brent P. Callaghan) (01/03/86)

The IOCALL and similar system benchmarks are fine for 
COMPARING computer systems, but if you want some numbers
for ACTUAL degradation in user response time, you have
to bite the bullet and get some real user activity
on those ttys.

Rounding up 10 or 20 "typical" users is not all that
easy, and it's even harder to have them perform the
same activity repeatedly over a week or so of performance
measurements.

If you can afford the luxury of another computer, you can
connect its ttys through null modems to your test system
and run a user process on each line.  Each process reads
a script of things to do.  The script can also specify
"think" times, typing rates, etc.  The "user" processes
are untiring, repeatable, and very accurate in response
time measurement. 
-- 
				
Made in New Zealand -->		Brent Callaghan
				AT&T Information Systems, Lincroft, NJ
				{ihnp4|mtuxo|pegasus}!poseidon!brent
				(201) 576-3475

baker@hpfcla.UUCP (01/04/86)

	Here at HP we use a system to test the multiuser  performance of
	our  mini/micro  systems using a scheme much like that described
	in previous  notes.  The system is called  terminal emulation and
	performance evaluation (TEPE).

	One or more  HP3000's  are used to simulate  users on the system
	under  test  (SUT).  Scripts  were  developed  to  simulate  the
	behavior  of users in several  different  environments  (program
	development, word processing,  computationally intensive, etc.).
	TEPE  generates  a random  inter-character  typing  delay  and a
	random "think time" between commands, given time ranges for each
	interval.  This random delay, coupled with a staggered start time
	for each "user" on the SUT, prevents corrupted timings due to
	lock-stepped script execution.  The script is iterated until the
	specified time interval has elapsed.  Measurements taken at the
	start of the test are discarded because not all "users" have
	instantiated a session and the load has not stabilized.  Tools
	for  collecting,  summarizing  and  plotting  the data have been
	developed to allow performance characterization up to 64 users.

	Obviously, this is not a standard test that many vendors publish
	as performance data.  So, if HP is interested in a true multiuser
	comparison to a competitor's system, a machine must be obtained
	for the duration of the test.  TEPE is used largely within HP
	for comparing internal products and for tuning.

	Jim Baker
	hpfcla!baker
	Fort Collins System Division

bzs@bu-cs.UUCP (Barry Shein) (01/04/86)

Re: measuring multi-user system performance

The only sane thing I have heard in years is what DEC is doing with
ULTRIX (they say the method originated at AT&T), called Remote Terminal
Emulation (RTE).

The basic idea is you put null modems on the tty mux's between two
systems (the one being measured and another, spare).  You then run
scripts through the mux from the spare machine, and it records various
performance measures (e.g. response time, service time, etc.).

Scripts can have random time distribution intervals so it needn't appear
that everyone (say, 32 terminal lines) is banging away at once; one can
go so far as to simulate breaks etc. by random long delays between
type-ins.  Measurement could be done for hours or even days.

What is left at that point is to find scripts.  One reasonable source
would be to record sessions on a current system.  This would require, I
presume, backing up the system, starting recording, restoring onto the
test system, and then re-running the script.  It's hard, I agree: if
anything is at all different then something is not going to work, and
recording things like the keystrokes to a full-screen editor would
probably be invasive (it would slow down the system being recorded and
thus probably alter users' behavior).  On the other hand, I fully believe
some reasonable compromise could be generated by simply observing a
system which could be considered analogous to the multi-user behavior
you are trying to measure.

There is obviously more to this story; I am not pretending to lay out
a methodology here, just an overview.  But the idea is provocative, even
just in its obvious simplicity (wanna find out how a system performs when
it is being typed at? type at it!).

Does anyone know of or have the AT&T references?  (I assume there are
Bell Labs TMs about this.)

	-Barry Shein, Boston University

p.s. this info derived from a session at DECUS on ULTRIX Performance
measurement by (?sorry, someone from) the ULTRIX Eng crew.

* Canonical trademark notice.

boston@celerity.UUCP (Boston Office) (01/07/86)

In article <835@bu-cs.UUCP> bzs@bu-cs.UUCP (Barry Shein) writes:
>Re: measuring multi-user system performance
>
>The only sane thing I have heard in years is what DEC is doing with
>ULTRIX (they say the method originated at AT&T), called Remote Terminal
>Emulation (RTE).
> [rest of description quoted in full; see above]
>
Prime does this as well, using a tool called the Terminal Simulator.

patrick@mcc-db2.UUCP (Patrick McGehearty) (01/08/86)

> Re: measuring multi-user system performance
> 
> The only sane thing I have heard in years is what DEC is doing with
> ULTRIX (they say the method originated at AT&T), called Remote Terminal
> Emulation (RTE).
> 
> Does anyone know of or have the ATT references (I assume there are
> Bell Lab TMs about this.)
> 
Actually, RTE has been around for a long time.  The earliest
reference I used in my thesis work was:
Lassettre, E.R., and A.L. Scherr,
"Modeling and Performance of the OS/360 Time-Sharing Option (TSO)",
Academic Press, 1967, pages 57-72.
DEC has used RTE since before 1974 to my knowledge.
I consider RTEs the best way (from the point of view of data quality)
to evaluate the performance of time-sharing environments.
The major drawback is the cost of developing a flexible RTE system.
For an extensive discussion of related issues, see
"Guidelines for the Measurement of Interactive Computer Service
Throughput, Turnaround Time, and Response Time." by M.D. Abrams,
Technical Report, Federal Information Processing Standards
Publication, 1979.