roseman@ccu.UManitoba.CA (roseman) (08/09/89)
I'm involved in doing a bit of performance benchmarking work on a couple of Ada compilers, and I'm wondering if anyone else out there is doing similar work. I've also got a few questions.

Right now we're using the PIWG (Performance Issues Working Group) test suite, which seems to be "the" standard Ada test suite out there. First off, are people using other tests, and if so, what? (Furthermore, where did they come from, why are you using them, etc.?)

Second, the machine we're running PIWG on is a Unix-based system, and we're running into some problems. The tests themselves are very, very short - total times, including iteration, are well under a second for most of them! That makes it almost impossible to get any accurate measurements - you've got all the little Unix daemons popping in and out and using up some time. We have tests which vary from 0 usecs to almost 4 (per iteration, that is), which is most unacceptable!

What can you do to correct things? Run tests 25 (e.g.) times and take the best? The average? Increase the iteration count to some ridiculous amount to try to compensate? I guess this is getting into general benchmarking procedure (any digests or lists devoted to this out there?) - but how are tests like this supposed to be used? Surely this must be an old problem. You have various companies out there publishing PIWG numbers for their compilers, but what are they measuring? Is it reasonable to measure on a souped-up system (e.g. high priority, kill the daemons), or do people want to see results on a real Unix system?

If anyone has any answers, comments, pointers to papers covering these issues, etc., I would very much like to hear from you. I ask only that if you post to the list you also send a copy to my userid directly, as my time is so tight these days I can't keep up with the digest. Thanks.
Mark Roseman, University of Manitoba <ROSEMAN@ccu.UManitoba.CA> or <ROSEMAN@UOFMCC.BITNET>
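[Editor's note: the "run tests 25 times and take the best or the average" question above can be sketched concretely. A common convention is to prefer the minimum, since background noise from daemons can only add time to a run, never subtract it. This is a modern Python illustration of the technique, not anything from the PIWG suite itself; the function names and the repeat count are assumptions.]

```python
import time
import statistics

def measure(workload, repeats=25):
    """Run `workload` `repeats` times and return the per-run wall-clock
    times.  Interference from daemons and other processes can only
    inflate a sample, so the minimum is usually the best estimate of
    the true cost; a large spread between min and mean signals noise."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Reduce the samples to the three figures worth reporting."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
    }

if __name__ == "__main__":
    stats = summarize(measure(lambda: sum(range(100_000))))
    print(stats)
```

Reporting both the minimum and the mean lets readers judge how noisy the measurement environment was, which bears directly on the "souped-up system vs. real Unix system" question.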
pfw@aber-cs.UUCP (Paul Warren) (08/10/89)
In article <275@ccu.UManitoba.CA>, roseman@ccu.UManitoba.CA (roseman) writes:
> The problem with that is its almost impossible to get any accurate
> measurements that way - you've got all the little Unix daemons popping in
> and out and using up some time. We have tests which vary from 0 usecs
> to almost 4 (per iteration that is), which is most unacceptable!
>
> What can you do to correct things? Run tests 25 (e.g.) times and take
> the best? The average? Increase the iteration count to some ridiculous
> amount to try to compensate?

One good way of avoiding all the little daemons is to suspend "cron", which is responsible for getting them running periodically. It also helps to run the tests during periods of low use, especially if your machine is part of a network. Large iteration counts help too.

How are you measuring the times? Are you using the Unix command "time", CALENDAR.CLOCK, or some other means? When measuring times under Unix, you can learn quite a lot from the user CPU, the system CPU, and the elapsed time.

A colleague of mine wrote a package for timing portions of code. You declare a marker for every fragment, and make one call to start recording the time and another to stop at the appropriate points. We used it to measure "tools" written on top of CAIS, and both the "tools" and the CAIS implementation were heavily instrumented using this package. If anyone is interested I'll post it.
--
Paul Warren, tel +44 970 622439
Computer Science Department, pfw%cs.aber.ac.uk@uunet.uu.net (ARPA)
University College of Wales, pfw@uk.ac.aber.cs (JANET)
Aberystwyth, Dyfed, United Kingdom. SY23 3BZ.
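[Editor's note: the Ada timing package described above was never posted to the thread. A minimal sketch of the same start/stop marker idea, written in modern Python and accumulating user+system CPU time via `os.times()` (the same call the C library exposes), might look like this. The class and method names are invented for illustration.]

```python
import os

class MarkerTimer:
    """Accumulate user+system CPU time between paired start/stop calls,
    keyed by a marker name - one marker per code fragment under test."""

    def __init__(self):
        self.totals = {}   # marker -> accumulated CPU seconds
        self._open = {}    # marker -> CPU reading at start()

    def _cpu(self):
        # os.times() reports user, system, children's times, and elapsed;
        # user + system is the CPU charged to this process only.
        t = os.times()
        return t.user + t.system

    def start(self, marker):
        self._open[marker] = self._cpu()

    def stop(self, marker):
        elapsed = self._cpu() - self._open.pop(marker)
        self.totals[marker] = self.totals.get(marker, 0.0) + elapsed
        return elapsed

if __name__ == "__main__":
    timer = MarkerTimer()
    timer.start("inner_loop")
    sum(range(1_000_000))
    timer.stop("inner_loop")
    print(timer.totals)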
roseman@ccu.UManitoba.CA (roseman) (08/12/89)
Since posting that message, a few additional tests have been run:

  Most of the Unix daemons were killed (using "killall") -- no effect.
  The iteration counts were bumped way up -- marginal effect.

So we're still stumped, in other words. For information's sake, the timings are done by interfacing to the "C" routine "times" from within the program. This should be much more accurate than either the Unix time command or the CALENDAR package.
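[Editor's note: the reason "times" is preferred over CALENDAR.CLOCK is that it reports CPU time charged to the process itself, so time stolen by daemons shows up only in the wall-clock figure. A modern Python sketch of the same comparison, using `os.times()` as the analogue of the C call; the function name is an assumption.]

```python
import os
import time

def cpu_and_wall(workload):
    """Return (user_cpu, system_cpu, wall_clock) seconds consumed by
    `workload`.  CPU time excludes other processes' activity, which is
    why it is steadier than a wall clock such as CALENDAR.CLOCK; a
    large wall - (user + system) gap indicates outside interference."""
    t0, w0 = os.times(), time.perf_counter()
    workload()
    t1, w1 = os.times(), time.perf_counter()
    return (t1.user - t0.user, t1.system - t0.system, w1 - w0)

if __name__ == "__main__":
    user, system, wall = cpu_and_wall(lambda: sum(range(1_000_000)))
    print(f"user={user:.3f}s  system={system:.3f}s  wall={wall:.3f}s")
```

Note that `os.times()` has coarse tick-based resolution, so very short tests (like the sub-second PIWG runs) may still read as zero - which is consistent with the 0-to-4-usec scatter reported above.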
lm@logm1.logm.se ("Lennart Månsson") (08/30/89)
Mark Roseman, University of Manitoba, <ROSEMAN@ccu.UManitoba.CA> or <ROSEMAN@UOFMCC.BITNET>, asks for methodological advice on performance benchmarking.

I did some heavy testing a couple of years ago. It was on a VAX, not on UNIX, but the problem of disturbing processes seems to be analogous. My conclusion was that the only comparable measure is achieved by eliminating the measurement noise as far as possible. Thus I always ran the tests at night, when no other people were supposed to be using the system - AND I VERIFIED THAT THIS WAS IN FACT SO.

Secondly, I turned off all possibly interfering processes, which corresponds to your turning off all the daemons running around. (If they are needed for your Ada program to run, well, then they should be part of the measurement.)

Thirdly, I ran each test three times each night, checking that I got stable measurements. (I made up a script that calculated means and issued warnings for too-big deviations; it also printed out the reports. Be sure not to let the printouts interfere with the next test - printing is often done in parallel with other program execution.) On top of this, I repeated the tests on another night to check long-term stability.

Now, if you only see to it that resources such as memory are returned to the system after each test, you can fairly well rely on measuring the capacity of your Ada code under no disturbances. What you are really interested in is how it performs on a normally loaded system. But then you can't compare figures between different implementations unless you can set up a standard load situation, or do so many iterations that you can count the mean disturbance as a standard situation. Testing is not easily done!

Lennart Mansson
Telelogic AB, Box 4148, S-203 12 MALMÖ, SWEDEN
Phone: +46-40-25 46 36  Fax: +46-40-25 46 25
E-mail: lm@logm.se
------
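[Editor's note: the stability-checking script described above - compute the mean of the three nightly runs and warn when they deviate too much - is easy to sketch. A modern Python version; the 5% coefficient-of-variation threshold is an assumption, not from the original script.]

```python
import statistics

def check_stability(samples, max_cv=0.05):
    """Return (mean, cv, unstable) for a batch of repeated timings.

    cv is the coefficient of variation (stddev / mean); `unstable`
    is True when it exceeds `max_cv`, signalling that the runs were
    too noisy to trust and should be repeated."""
    mean = statistics.mean(samples)
    if len(samples) > 1 and mean > 0:
        cv = statistics.stdev(samples) / mean
    else:
        cv = 0.0
    return mean, cv, cv > max_cv

if __name__ == "__main__":
    # Three nightly runs of the same test, in seconds.
    mean, cv, unstable = check_stability([1.02, 0.99, 1.01])
    print(f"mean={mean:.3f}s  cv={cv:.1%}  unstable={unstable}")
```

The same check run on another night's batch, as the post suggests, guards against long-term drift as well as within-night noise.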