[comp.lang.ada] Performance Benchmarking

roseman@ccu.UManitoba.CA (roseman) (08/09/89)

I'm involved in doing a bit of performance benchmarking work on a couple
of Ada compilers.  I'm wondering if there is anyone else out there doing
similar kinds of work.  I've also got a few questions.

Right now we're using the PIWG (Performance Issues Working Group) test
suite to do the tests.  This seems to be "the" standard Ada test suite
out there.  I'm wondering first off if people are using other tests,
and if so, what?  (Furthermore, where did they come from, why are you
using them, etc?)

Second, the machine we're running PIWG on is a Unix-based system.  With
the PIWG, we're running into some problems.  The tests themselves are
very, very short - the total time including iterations is well under a
second for most of them!

The problem with that is it's almost impossible to get any accurate
measurements that way - you've got all the little Unix daemons popping in
and out and using up some time.  We have tests which vary from 0 usecs
to almost 4 usecs per iteration, which is most unacceptable!

What can you do to correct things?  Run tests 25 (e.g.) times and take
the best?  The average?  Increase the iteration count to some ridiculous
amount to try to compensate?
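
For instance, one thing we could do is let the harness calibrate itself -
keep doubling the count until the measured loop runs for at least a second
of wall-clock time - but I don't know whether that is how the published
numbers are produced.  A sketch of what I mean (the names and the
one-second threshold are arbitrary):

with Text_IO, Calendar;
use Calendar;
procedure Calibrate is
   package Flt_IO is new Text_IO.Float_IO (Float);
   Iterations : Integer  := 1;
   Start_Time : Time;
   Elapsed    : Duration := 0.0;
begin
   while Elapsed < 1.0 loop
      Iterations := Iterations * 2;
      Start_Time := Clock;
      for I in 1 .. Iterations loop
         null;    -- the feature being measured goes here; an empty
                  -- loop like this may be optimized away entirely
      end loop;
      Elapsed := Clock - Start_Time;
   end loop;
   Text_IO.Put ("Microseconds per iteration:");
   Flt_IO.Put (1.0E6 * Float (Elapsed) / Float (Iterations));
   Text_IO.New_Line;
end Calibrate;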

I guess this is getting into general benchmarking procedures (any digests
or lists devoted to this out there?).. but how are tests like this supposed
to be used?  Surely, this must be an old problem.  You have various companies
out there who are publishing PIWG numbers for their compilers, but what
are they measuring?  Is it reasonable to measure on a souped-up system
(e.g. high priority, kill the daemons), or do people want to see results
on a real Unix system?

If anyone has any answers, comments, pointers to any papers covering these
issues, etc., I would very much like to hear from you.  I ask only that if
you post to the list, you also send a copy to my userid directly, as
my time is so tight these days that I can't keep up with the digest.  Thanks.

Mark Roseman, University of Manitoba
<ROSEMAN@ccu.UManitoba.CA> or <ROSEMAN@UOFMCC.BITNET>

pfw@aber-cs.UUCP (Paul Warren) (08/10/89)

In article <275@ccu.UManitoba.CA>, roseman@ccu.UManitoba.CA (roseman) writes:
> The problem with that is it's almost impossible to get any accurate
> measurements that way - you've got all the little Unix daemons popping in
> and out and using up some time.  We have tests which vary from 0 usecs
> to almost 4 usecs per iteration, which is most unacceptable!
> 
> What can you do to correct things?  Run tests 25 (e.g.) times and take
> the best?  The average?  Increase the iteration count to some ridiculous
> amount to try to compensate?

One good way of avoiding all the little daemons is to suspend
"cron", which is responsible for getting them running periodically.
It also helps to run the tests during periods of low use, especially
if your machine is part of a network.

Large iteration counts also help.  How are you measuring the times?
Are you using the unix command "time", or are you using CALENDAR.CLOCK
or some other means?  When measuring times under Unix, you can learn
quite a lot from the user cpu, the system cpu and the elapsed time.
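
If it is CALENDAR.CLOCK, bear in mind that it only gives you elapsed
(wall-clock) time, so every daemon wakeup gets charged to the test.  The
usual control-loop subtraction goes something like this (a sketch only;
the iteration count and names are arbitrary):

with Text_IO, Calendar;
use Calendar;
procedure Dual_Loop is
   package Flt_IO is new Text_IO.Float_IO (Float);
   Iterations : constant := 100_000;
   T1, T2, T3 : Time;
begin
   T1 := Clock;
   for I in 1 .. Iterations loop
      null;                -- empty control loop
   end loop;
   T2 := Clock;
   for I in 1 .. Iterations loop
      null;                -- replace with the feature being timed
   end loop;
   T3 := Clock;
   Text_IO.Put ("Microseconds per execution of the feature:");
   Flt_IO.Put (1.0E6 * Float ((T3 - T2) - (T2 - T1)) / Float (Iterations));
   Text_IO.New_Line;
end Dual_Loop;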

A colleague of mine wrote a package for timing portions of code.
You declare a marker for every fragment, and make a call to start
recording the time and another one to stop at the appropriate
points.  We used it to measure "tools" written on top of CAIS,
and both the "tools" and the CAIS implementation were heavily
instrumented using this package.  If anyone is interested I'll
post it.
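
The interface is roughly of this shape (a sketch with made-up names only,
not the actual package):

with Calendar;
package Code_Timer is
   type Marker is private;
   procedure Start (M : in out Marker);            -- begin a fragment
   procedure Stop  (M : in out Marker);            -- end it, accumulate
   function  Total (M : Marker) return Duration;   -- accumulated time
   function  Calls (M : Marker) return Natural;    -- fragments recorded
private
   type Marker is record
      Started     : Calendar.Time;
      Accumulated : Duration := 0.0;
      Count       : Natural  := 0;
   end record;
end Code_Timer;

package body Code_Timer is
   procedure Start (M : in out Marker) is
   begin
      M.Started := Calendar.Clock;
   end Start;

   procedure Stop (M : in out Marker) is
      use Calendar;
   begin
      M.Accumulated := M.Accumulated + (Clock - M.Started);
      M.Count       := M.Count + 1;
   end Stop;

   function Total (M : Marker) return Duration is
   begin
      return M.Accumulated;
   end Total;

   function Calls (M : Marker) return Natural is
   begin
      return M.Count;
   end Calls;
end Code_Timer;

You declare one Marker per fragment, wrap the fragment in Start/Stop calls,
and dump Total and Calls for each marker at the end of the run.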



-- 
Paul Warren,				tel +44 970 622439
Computer Science Department,		pfw%cs.aber.ac.uk@uunet.uu.net (ARPA)
University College of Wales,		pfw@uk.ac.aber.cs (JANET)
Aberystwyth, Dyfed, United Kingdom. SY23 3BZ.

roseman@ccu.UManitoba.CA (roseman) (08/12/89)

Since posting that message, a few additional tests have been run:

Most of the Unix daemons were killed (using "killall") -- no effect

The iteration counts were bumped way up -- marginal effect

So we're still stumped in other words.  For information's sake,
the timings are done by interfacing to the "C" routine
"times" within the program.

This should be much more accurate than either using the Unix time command
or the CALENDAR package.
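
For the curious, such a binding typically looks something like the
following.  The record layout, the pragma, and the tick rate are all
system- and compiler-specific (some compilers also need an extra pragma
to map the Ada name onto the C linker name), so take it as an outline
only:

with System;
package Unix_Process_Times is
   type Clock_T is new Integer;
   type TMS_Buffer is record        -- mirrors struct tms
      User_Time         : Clock_T;   -- tms_utime
      System_Time       : Clock_T;   -- tms_stime
      Child_User_Time   : Clock_T;   -- tms_cutime
      Child_System_Time : Clock_T;   -- tms_cstime
   end record;
   function Times (Buffer : in System.Address) return Clock_T;
   pragma Interface (C, Times);
end Unix_Process_Times;

Call Times once before and once after the measured code, passing
Buf'Address for a TMS_Buffer variable Buf; the difference in User_Time,
divided by the system's ticks-per-second, is user CPU time, which the
daemons cannot inflate the way they inflate elapsed time.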

lm@logm1.logm.se ("Lennart Månsson") (08/30/89)

Mark Roseman, University of Manitoba, <ROSEMAN@ccu.UManitoba.CA>
 or <ROSEMAN@UOFMCC.BITNET>, asks for methodological advice when 
doing performance benchmarking. 

I did some heavy testing a couple of years ago.  It was on a VAX and
not on UNIX, but the problem with disturbing processes seems to be
analogous.

My conclusion was that the only comparable measure is achieved by
eliminating the measurement noise as far as possible.  Thus I always
ran the tests at night, when no other people were supposed to be using
the system.  I ALSO CHECKED THAT THIS WAS IN FACT SO.  Secondly, I
turned off all possibly interfering processes, which seems to correspond
to your turning off all the daemons running around.  (If they are needed
for your Ada program to run, well then they should be part of the
measurement.)

Thirdly, I ran each test three times each night, checking that I got
stable measurements.  (I made up a script calculating means and issuing
warnings for too-big deviations; it also made printouts of the reports.
Be sure not to let those printouts interfere with the next test, since
printing is often done in parallel with other program execution.)  On
top of this, I repeated the tests another night to check long-term
stability.
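
Just to show the idea, the check amounts to something like this (an Ada
sketch with made-up timings and an arbitrary 5% tolerance; my actual
script was not this code):

with Text_IO;
procedure Check_Stability is
   package Flt_IO is new Text_IO.Float_IO (Float);
   Runs      : constant array (1 .. 3) of Float := (12.4, 12.6, 12.5);
   Tolerance : constant Float := 0.05;
   Mean      : Float := 0.0;
begin
   for I in Runs'Range loop
      Mean := Mean + Runs (I);
   end loop;
   Mean := Mean / Float (Runs'Length);
   for I in Runs'Range loop
      if abs (Runs (I) - Mean) > Tolerance * Mean then
         Text_IO.Put ("Warning: run" & Integer'Image (I) & " deviates: ");
         Flt_IO.Put (Runs (I));
         Text_IO.New_Line;
      end if;
   end loop;
   Text_IO.Put ("Mean: ");
   Flt_IO.Put (Mean);
   Text_IO.New_Line;
end Check_Stability;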

Now, if you just see to it that resources such as memory are returned
to the system after each test, you can fairly well rely on measuring the
capacity of your Ada code under no disturbances.  What you are really
interested in is how it performs on a normally loaded system.  But then
you can't compare figures between different implementations unless you
can set up a standard load situation, or do so many iterations that you
can count the mean disturbance as a standard situation.

Testing is not easily done!

Lennart Mansson
Telelogic AB, Box 4148, S-203 12 MALMÖ, SWEDEN
Phone:  +46-40-25 46 36
Fax:    +46-40-25 46 25
E-mail: lm@logm.se

------