[comp.benchmarks] How do maxed out users compare to users with think times

tonys@pyra.co.uk (Tony Shaughnessy) (06/11/91)

I am running a benchmark that has been supplied by a customer. The benchmark
simulates n users (up to 100) entering transactions into the application
(a business accounting package) as quickly as possible, without any think
times or any simulated typing rate. Obviously this will be a far heavier
load than 100 real users with think times. The question is, how much heavier?

How do I calculate how many real life users these 100 maxed out users simulate?
I have looked at several approaches. Firstly you could look at the number
of transactions per second put through the database. If I was getting 100
transactions per second from my maxed out users, and the 100 real life users
would only put through 10 transactions per second, then each of my users
is looking like 10 real users. The problem is that when the response times
approach the real life think times, or are much greater than real life response
times, this breaks down because the number of transactions per second then
depends on the response time. Obviously, as I add more users, response times
are going to get worse, and TPS is going to get lower. However, there is still
a constant load on the system no matter how many users I put on, because the
system is CPU bound all the time, and even a few of my users can use 100%
of the CPU when they are maxed out. Maybe you could look at the response times
that one of my users produces, and the TPS figure for one user, and compare
that instead, but then that doesn't take any multi-user effects into account.

In real life, the load on the system will be 
some function of think time (smaller think times - larger load) and 
the response time (larger response time could be either the cause or 
the effect of larger load, or both). However, with my maxed out users, the
load is independent of the response time as a cause, although larger response
times will be an effect of a larger load. This assumes that load is defined
as percentage of CPU utilisation. Maybe I should be looking at other
definitions of load? See below.

If I try and say that in real life, I will do n TPS, where n is

	number of users / (response time + think time)

then how do I calculate how many real users each of my maxed out users
represents, given that their think time is zero and response time varies
with the number of users?
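That formula is just the standard interactive response-time law,
N = X * (R + Z), with X the throughput in TPS, R the response time, and
Z the think time. A small C sketch with invented figures shows how the
equivalence ratio itself moves with response time - which is exactly
the problem:

/* Equivalence between maxed out users (Z = 0) and real users,
 * via the interactive response-time law N = X * (R + Z).
 * All figures are invented for illustration.
 */
#include <stdio.h>

int main(void)
{
    double X = 100.0;                /* measured throughput, TPS  */
    double Z = 9.0;                  /* real-life think time, sec */
    double R[] = { 0.1, 1.0, 10.0 }; /* candidate response times  */
    int i;

    for (i = 0; i < 3; i++) {
        double n_sim  = X * R[i];        /* maxed out users needed */
        double n_real = X * (R[i] + Z);  /* real users at same TPS */
        printf("R = %4.1f s: one maxed out user ~ %5.1f real users\n",
               R[i], n_real / n_sim);
    }
    return 0;
}

With R = 0.1 s one of my users looks like 91 real users; with R = 10 s,
barely two. So the conversion factor is not a constant at all.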

I could use another definition of load - the size of the run queue (this
is a Unix system). I could look at the run queue in a real system and
on my system and compare. Unfortunately I do not have a real system to look
at, I only have my system. The only metrics I know are the predicted think
times, target response times, and actual response times. I can work out
predicted TPS, and I know actual TPS.

A further question is - how does my maxed out user load differ from the
equivalent number of users with think times? If each of my users is like 10
users with think times, how do 30 of my users compare to 300 real users? If
the transaction rate is the same, will the load be the same? I don't think so.
For one thing, contention may be less with 30 users, that is, contention for
locks, etc.  Also, overhead from context switches will be less. What other
effects will there be?

I am going to go away and think hard about this one. My mind is in turmoil.

--
Tony Shaughnessy
tonys@pyra.co.uk

	"Pedal away those tag-nut blues"

zink@hare.cdc.com (Ken Zink) (06/12/91)

In article <676640053.AA19240@flaccid> tonys@pyra.co.uk (Tony Shaughnessy) writes:
>
>I am running a benchmark that has been supplied by a customer. The benchmark
>simulates n users (up to 100) entering transactions into the application
> ... without any think times or any simulated typing rate. 
>
>How do I calculate how many real life users these 100 maxed out users simulate?
>
> [ ]
>
>If I try and say that in real life, I will do n TPS, ...
>

I think this is on the right track - represent your system as being capable
of performing n TPS (with the probably silent caveat that there are only
about 10 users contending for resources at any one time).  There is no
single type of user!  If the "interactors" with your system are themselves
intelligent systems (e.g., programmed PCs), the benchmark may be a correct
representation of the workload.  If the users are clerk/typists performing
very repetitive work, they may achieve a 50 to 60 wpm typing rate (translate
that into transactions/minute appropriate to whatever they're doing) with a
short
(one or two second) think time.  On the other hand, if the users are
executives, plan on a 2 to 3 wpm typing rate and 20- to 2000-second think
times.

Again, if the interface is graphic instead of character, the net TPS
is usually higher.

The customer will have to tell you something about the characteristics
of the expected users.  If they can't, I'd suggest presenting your performance
in the form of a set of curves with input rate and think times as the
independent variables.
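If it helps, the shape of those curves can be sketched with the standard
asymptotic bounds from operational analysis, X(N) <= min(1/Dmax, N/(D + Z)),
where D is the total service demand per transaction, Dmax the demand at the
bottleneck device, and Z the think time. The demand figures below are
invented, purely to show the shape:

/* Throughput bound curves for an interactive system:
 *   X(N) <= min(1/Dmax, N/(D + Z))
 * D, Dmax and the think times are assumed values, not measurements.
 */
#include <stdio.h>

int main(void)
{
    double D    = 0.5;                    /* total demand, sec/txn  */
    double Dmax = 0.2;                    /* bottleneck demand, sec */
    double think[] = { 2.0, 10.0, 30.0 }; /* candidate think times  */
    int t, n;

    for (t = 0; t < 3; t++) {
        printf("Z = %4.1f s:", think[t]);
        for (n = 20; n <= 100; n += 40) {
            double x = n / (D + think[t]);
            if (x > 1.0 / Dmax)
                x = 1.0 / Dmax;           /* bottleneck saturated */
            printf("  N=%3d -> %5.2f TPS", n, x);
        }
        printf("\n");
    }
    return 0;
}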

Ken

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (06/13/91)

In article <676640053.AA19240@flaccid> tonys@pyra.co.uk (Tony Shaughnessy) writes:

| How do I calculate how many real life users these 100 maxed out users simulate?

  Answer 1: you probably can't make any nice transformation; you have to
add delay times to the input stream and feed it with another system
through multiple serial or socket connections. Anything else will simply be
a guess - pick a number from 1.5 to ten and you'll be right sometimes.
See below.

| In real life, the load on the system will be 
| some function of think time (smaller think times - larger load) and 
| the response time (larger response time could be either the cause or 
| the effect of larger load, or both). However, with my maxed out users, the
| load is independent of the response time as a cause, although larger response
| times will be an effect of a larger load. This assumes that load is defined
| as percentage of CPU utilisation. Maybe I should be looking at other
| definitions of load? See below.

  You have two times, not one. One is think time, used to decide what to
type. The other is type time, which varies with the typist *and* the
information. The average user types his/her password and userid faster
than anything else they enter.

  Now, the think time (time 1) depends on what's being asked (obvious)
and system response (ah-ha!). That is, when response time gets worse
than about 200 ms, attention slips a hair. If your load shows that
response is above 1 sec, add 1-2 sec to think time, and after 10-15 sec
the mind drops completely out of gear. Thus, as your load increases, all
of a sudden your productivity totally goes to hell.

  This isn't my opinion, IBM did a very serious study of this issue back
in the late 70's, and since neither humans nor seconds have changed much
I bet it's still valid (and no I can't find it any more, we used it for
sizing for a while).

  The plot of operator response vs system response is a step function,
with the last step looking somewhat like a pie falling off a table.

| I am going to go away and think hard about this one. My mind is in turmoil.

Justly so! Doing a good simulation requires the key-fake routine to know
how long it is between the last ENTER and the next time the system is
ready for input. Then the time 1 factor is adjusted to be realistic.

The important thing here is that you have realized there is a problem.
I have spent hours trying to get management to admit that poor response
hurts productivity by far more than the dilation factor in response.
However, the other side of the coin is that when response is already
bad, you don't lose as much if it gets worse, since everyone is running
in full context switch mode anyway.

  Now you can either simulate it, or estimate it, but at least you know
you're faking it if you estimate, and your estimates are going to be
pessimistic and more accurate.
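The skeleton of a simulated user that fakes it honestly might look like
this - send_transaction() and wait_for_prompt() are stand-ins for whatever
the real harness does:

/* A simulated user that measures the gap between the last ENTER and
 * the system being ready again, then adjusts its think time to suit.
 * The two stubs below are placeholders, not a real harness.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void send_transaction(void) { /* submit one transaction */ }
static void wait_for_prompt(void)  { sleep(1); /* pretend 1 sec */ }

static double adjusted_think(double base, double response)
{                               /* step function as sketched above */
    if (response < 1.0)  return base;
    if (response < 10.0) return base + 1.5;
    return base + 30.0;
}

static void simulated_user(int txns, double base_think)
{
    while (txns-- > 0) {
        time_t start = time(NULL);
        send_transaction();
        wait_for_prompt();      /* response time ends here */
        sleep((unsigned)adjusted_think(base_think,
                                       difftime(time(NULL), start)));
    }
}

int main(void)
{
    simulated_user(3, 5.0);     /* 3 transactions, 5 sec base think */
    return 0;
}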

-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me

john@qip.UUCP (John Moore) (06/13/91)

In article <4046@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
]In article <676640053.AA19240@flaccid> tonys@pyra.co.uk (Tony Shaughnessy) writes:
]
]| How do I calculate how many real life users these 100 maxed out users simulate?

We have been doing extensive benchmarking to predict hotel reservation system
behavior. Our approach is as follows:
  -run the benchmarks with simulated "maxed out users." Measure the
   transaction rate that is sustainable with acceptable response time.
  -observe real users and measure their "think time"
  -ratio to determine numbers of users we can support

Typical numbers for one configuration:
  -max TPS: 10
  -think time per user 27 seconds
  -equivalent max number of users 270
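
In code form the arithmetic is trivial - supportable users is roughly max
sustainable TPS times per-user think time (response time is ignored here,
which is fair when think time dominates it, as it does at 27 seconds):

/* The ratio method above: users ~ max TPS * think time.
 * Figures are the ones quoted above for one configuration.
 */
#include <stdio.h>

int main(void)
{
    double max_tps    = 10.0;   /* from the maxed out benchmark  */
    double think_time = 27.0;   /* measured from real users, sec */

    printf("supported users ~ %.0f\n", max_tps * think_time);
    return 0;
}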

Note that the definition of a "transaction" is critical here. If you are
using a system where every keystroke is a "transaction," things are likely
to get weird.

By the way, our benchmarking has been testing commercial relational databases
(which must remain unnamed) on Unix super-mini multi-cpu systems. We find
(surprising at first blush, reasonable with analysis) that performance is
limited by CPU speed, not I/O bandwidth. We also find that performance
increases linearly with the number of CPUs up to some number, after
which the increase per CPU is much smaller. This implies that there
are critical bottlenecks in the database software.
-- 
John Moore HAM:NJ7E/CAP:T-Bird 381 {ames!ncar!noao!asuvax,mcdphx}!anasaz!john 
USnail: 7525 Clearwater Pkwy, Scottsdale,AZ 85253 anasaz!john@asuvax.eas.asu.edu
Voice: (602) 951-9326        Wishful Thinking: Long palladium, Short Petroleum
Opinion: Support ALL of the bill of rights, INCLUDING the 2nd amendment!
Disclaimer: The opinions expressed here are all my fault, and no one elses.

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (06/14/91)

Bill Davidsen writes:

 This isn't my opinion, IBM did a very serious study of this issue back
in the late 70's, and since neither humans nor seconds have changed much
I bet it's still valid (and no I can't find it any more, we used it for
sizing for a while).



Yes.  At least three articles have been published in the IBM Systems
Journal during the last ~20 years.  One of the authors is Thadhani.  I don't
have the complete set of references, but the Thadhani article is:

"Interactive User Productivity", IBM Systems Journal, vol. 20, Number Four,
1981, pp 407-428.  

Look also at Vol 18, #1, pp 143-163, and vol 13, No 1, pp 1-18.

In addition, the work has been summarized and presented other places, and I
saw a summary of it last spring by an IBMer.  The conclusion is simple:

Subsecond response time does matter.

More or less, most of the work I have seen, including the IBM work and other
work, points to having keystroke echo under 100 ms, and trivial response
time (time to respond to a small command, like "ls") under 200 ms.
This is the *total* response time, including network delay, so the host
delay often must be even smaller.

The other conclusion that IBM has come to is that productivity improvements
are, for lack of a better term, "multiplicative".  Fast response time,
good CASE tools, and high quality, well-documented libraries can combine 
to provide MUCH improved productivity over any single improvement.
Which is good: it means that we aren't wasting our time providing better
tools to programmers.  It really makes a big difference in cost, productivity,
and schedules.


-- 
  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster@george.arc.nasa.gov 
  Phone:  415/604-1056                            #include <std.disclaimer> 

croft@csusac.csus.edu (Steve Croft) (06/14/91)

In article <6648@qip.UUCP> john@qip.UUCP (John Moore) writes:
>We have been doing extensive benchmarking to predict hotel reservation system
>behavior. Our approach is as follows:
>  -run the benchmarks with simulated "maxed out users." Measure the
>   transaction rate that is sustainable with acceptable response time.
>  -observe real users and measure their "think time"
>  -ratio to determine numbers of users we can support
>
>Typical numbers for one configuration:
>  -max TPS: 10
>  -think time per user 27 seconds
>  -equivalent max number of users 270

There should be a clarification about these numbers; the 270 would
indicate the number of users the database engine can handle.  It
does not mean you could actually run 270 people since that could exhaust
compute resources.  In other words, it might be harder for the
OS to support 270 user applications than it is for the database
engine to handle 270 transactions every 27 seconds...

Steve Croft
stevec@water.ca.gov

john@qip.UUCP (John Moore) (06/14/91)

In article <1991Jun13.215233.11601@csusac.csus.edu> croft@csusac.csus.edu (Steve Croft) writes:
]>Typical numbers for one configuration:
]>  -max TPS: 10
]>  -think time per user 27 seconds
]>  -equivalent max number of users 270
]
]There should be a clarification about these numbers; the 270 would
]indicate the number of users the database engine can handle.  It
]does not mean you could actually run 270 people since that could exhaust
]compute resources.  In other words, it might be harder for the
]OS to support 270 user applications than it is for the database
]engine to handle 270 transactions every 27 seconds...

Not in our case:
  (1)the database consumes almost all of the CPU used
  (2)the benchmark uses the entire application
  (3)we run the same number of processes on our system no matter how many
   users there are - our number of processes is more closely related
   to the number of processors.
  (4)the OS we are using (Pyramid's) scales linearly with the number
   of user processes to well beyond 270, so even in the absence of (3),
   we still wouldn't run out of CPU (but we might run out of memory).

Sorry about the confusion.
-- 
John Moore HAM:NJ7E/CAP:T-Bird 381 {ames!ncar!noao!asuvax,mcdphx}!anasaz!john 
USnail: 7525 Clearwater Pkwy, Scottsdale,AZ 85253 anasaz!john@asuvax.eas.asu.edu
Voice: (602) 951-9326        Wishful Thinking: Long palladium, Short Petroleum
Opinion: Support ALL of the bill of rights, INCLUDING the 2nd amendment!
Disclaimer: The opinions expressed here are all my fault, and no one elses.

jhd@maths.bath.ac.uk (James Davenport) (06/16/91)

>>   This isn't my opinion, IBM did a very serious study of this issue back
>> in the late 70's, and since neither humans nor seconds have changed much
>> I bet it's still valid (and no I can't find it any more, we used it for
>> sizing for a while).
True indeed. The author of the study was Walt Doherty, IBM Yorktown Heights.
I believe it was published in IBM J. R&D.
James Davenport