crowl@rochester.UUCP (11/02/87)
In article <864@tut.cis.ohio-state.edu> manson@tut.cis.ohio-state.edu (Bob Manson) writes:
>Alright, I'm a little pissed over the recent posting of these supposed
>"performance figures". Exactly what are these things supposed to mean? Well,
>they are compiled programs on different systems that are run and supposed to
>represent the speed of the various processors, MIPS etc. We all know that
>MIPS is mostly a meaningless figure (Okay, so there's some debate there, but I
>see no meaning to how many instructions per second a processor runs-"The
>COPYMEM instruction blockmoves the entire memory to disk space, but takes
>1,000,000 usec to execute, with an effective MIPS of 1/1000000").

This is in part a terminology problem.  John Mashey's performance brief states
mips as relative to a VAX 11/780.  (Note that the 780 executes about 500,000
instructions per second.  Its original one mips figure came from early
comparisons with IBM machines claiming one mips.)  I suggest we stop calling
performance "mips", and start being more specific about what we really mean.
I suggest the term "Vax Relative Performance".

Unfortunately, that is not enough.  We must define what configuration of Vax
we use as the baseline.  I suggest an 11/780 with full memory and a floating
point accelerator.  CPU oriented benchmarks should run completely in physical
memory.

The compiler and operating system also affect performance.  To make the base
machine highly available, both should be common.  I suggest Unix BSD 4.2 as
the base operating system and the portable C compiler as the base C language
compiler.  This allows realistic Unix/C benchmarks like grep, nroff, etc.
Note that such benchmarks must have the same source.  Putting a better
compiler on the Vax will increase its relative performance, so DEC can
honestly sell a 780 as having a Vax Relative Performance greater than one.

Of course, 780's are becoming scarce.  We may have to pick another machine
just to keep the base machine readily available.
Suggestions? -- Lawrence Crowl 716-275-9499 University of Rochester crowl@cs.rochester.edu Computer Science Department ...!{allegra,decvax,rutgers}!rochester!crowl Rochester, New York, 14627
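[As a sketch of the proposal above: "Vax Relative Performance" reduces to the ratio of the baseline 780's run time to the candidate machine's run time on the same benchmark sources. All timings below are hypothetical illustrative numbers, not measurements; the geometric mean is one common way (not part of the original proposal) to fold per-benchmark ratios into a single figure.]

```python
# Hypothetical sketch of a "Vax Relative Performance" calculation: run the
# same benchmark binaries (same source, e.g. grep, nroff) on the baseline
# 11/780 (4.2BSD + pcc) and on the machine under test, then take the ratio
# of times.  All timings here are invented for illustration.
from math import prod

# seconds to run each benchmark on the baseline 780
baseline = {"grep": 12.0, "nroff": 30.0, "compile": 45.0}
# seconds on the machine under test, same sources
candidate = {"grep": 3.0, "nroff": 10.0, "compile": 9.0}

# a ratio > 1 means the candidate is faster than the 780 on that benchmark
ratios = [baseline[b] / candidate[b] for b in baseline]

# the geometric mean keeps one outlier benchmark from dominating the summary
vrp = prod(ratios) ** (1 / len(ratios))
print(f"per-benchmark VRP ratios: {ratios}")
print(f"overall VRP (geometric mean): {vrp:.2f}")
```

Note that, as the thread goes on to argue, the resulting number is only meaningful in the context of the benchmark set and the baseline's compiler and OS.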
hansen@mips.UUCP (Craig Hansen) (11/03/87)
In article <3806@sol.ARPA>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
> The compiler and operating system also affect performance. To make the base
> machine highly available, both should be common. I suggest Unix BSD 4.2 as
> the base operating system and the portable C compiler as the base C language
> compiler.

BSD 4.3 has been out long enough to be "common," and has significant
improvements in the performance of compiled Fortran code, and some (more
modest) improvements in C code.  VMS has better compilers, both C and
Fortran, in terms of how the resulting code performs.  The availability of a
common machine is important, however.  MIPS has both 4.3 and VMS 780's, but
no longer keeps a 4.2 around.  It's been obsoleted by 4.3.

> Of course, 780's are becoming scarce. We may have to pick another machine just
> to keep the base machine readily available. Suggestions?

Digital Review uses the roughly equivalent MicroVax as a base machine in
their comparisons.  Ultimately, the Vax Relative Performance measure will
never be much more precise than defining the inch as "Three round and dry
barleycorns laid end to end."  In fact, it's more like a cubit - it depends
on what the leading machine of the day is (or was).  The 780 VRP at best
provides rough comparisons of machine performance, and by all rights ought
to be a moving target.

Since we (at MIPS) are clearly making an effort to use the best compiler
system we can muster when benchmarking our machines, we consider it only
fair to use the best compiler system that can be made available for the VAX
780.  We are trying to be as conservative as possible in our performance
claims in an environment where inflated and misleading claims are the order
of the day, but at times it seems to work against us....
-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen
alan@pdn.UUCP (Alan Lovejoy) (11/03/87)
In article <3806@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>[Re: "mips", let's] start being more specific about what we really mean.
>I suggest the term "Vax Relative Performance".
>
>Unfortunately, that is not enough. We must define what configuration of Vax
>we use as the baseline. I suggest an 11/780 with full memory and a floating
>point accelerator. CPU oriented benchmarks should run completely in physical
>memory.
>
>The compiler and operating system also affect performance. To make the base
>machine highly available, both should be common. I suggest Unix BSD 4.2 as
>the base operating system and the portable C compiler as the base C language
>compiler. This allows realistic Unix/C benchmarks like grep, nroff, etc.
>
>Of course, 780's are becoming scarce. We may have to pick another machine just
>to keep the base machine readily available. Suggestions?

The installed base of 11/780's is puny by comparison to the popular micros.
Why not choose one of them?  I suggest the IBM PS/2 Model 60 as the
baseline.  Most systems will be faster, so the relative performance ratios
will mostly be above 1.  Most benchmarkers will be able to get access to
such a machine.  Most readers will have some idea of the real performance of
the baseline machine.  And the machine is likely to be around for a long
time, so that familiarity with it will not be short-lived.  And even after
its demise, people will remember it.

I strongly vote against PCC as the compiler.  It is known to vary
considerably in code-generation quality on different architectures.  Not
good.  It's better if people use the best compiler available for the system,
whatever that is.

As for the operating system, I suggest an average based on standalone, UNIX
V, and the most widely used OS on that system, if that is not UNIX V.  UNIX
V is clearly headed for standard status.  Leave UNIX out if it's not
available.
I also strongly urge against UNIX utilities as benchmarks: not everyone uses
or can use UNIX (a sacrilege, I know).
--alan@pdn

P.S. I'm a Mac enthusiast and despise the Intel architecture, but that only
makes the idea of having the PS/2 m60 as a baseline even *more* attractive,
since it would create an atmosphere where everyone would market their
systems as being x times faster than Big Blue's pride and joy.
crowl@cs.rochester.edu (Lawrence Crowl) (11/04/87)
In article <1714@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>I strongly vote against PCC as the compiler. It is known to vary considerably
>in code-generation quality on different architectures. Not good. It's better
>if people use the best compiler available for the system, whatever that is.

My intent was that the base architecture/operating system/compiler be
constant.  This means that some readily available compiler must be part of
the base system.  I wanted to avoid cases where people benchmark relative to
two different compilers.  I want people to use the compiler appropriate to
their machine.

>As for operating system, I suggest an average based on standalone, UNIX V and
>the most widely used OS on that system, if that is not UNIX V. UNIX V is
>clearly headed for standard status. Leave UNIX out if it's not available. I
>also strongly urge against UNIX utilities as benchmarks: not everyone uses or
>can use UNIX (a sacrilege, I know).

I do not understand your reasoning for averaging UNIX and something else.
Why not just provide two numbers, one for each operating system on that
machine?  I suspect that the OS will have little effect on CPU bound jobs
(relative to the compiler, anyway).  The reason for choosing UNIX utilities
is that people interested in a UNIX box then have more realistic measures of
performance.
-- 
Lawrence Crowl    716-275-9499    University of Rochester
crowl@cs.rochester.edu            Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl    Rochester, New York, 14627
hansen@mips.UUCP (Craig Hansen) (11/05/87)
In article <3907@sol.ARPA>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
> My intent was that the base architecture/operating system/compiler be
> constant.  This means that some readily available compiler be part of the
> base system.  I wanted to avoid cases where people benchmark relative to
> two different compilers.  I want people to use the compiler appropriate to
> their machine.

Unfortunately, the base architecture/operating system/compiler, from a
competitive point of view, isn't constant.  Each prospective customer has a
set of machines and applications that are relevant to them.  VAX (TM DEC)
VMS (TM DEC) users generally stick up their noses at VAX BSD 4.2-relative
measurements, because they know that the BSD compilers are inferior.  The
VAX 780 is fast becoming an irrelevant machine, because faster/cheaper VAXes
are available.  UNIX V (TM ATT) comes with a file-system that's only good
for coffee breaks and lunch hours, so file system performance relative to an
off-the-shelf sys-V machine isn't meaningful....you get the idea.

When you're talking about (decimal) orders of magnitude of performance above
a VAX 780, the two machines being compared aren't in the same regime
anymore.  (Remember that the VAX 780 is approaching its tenth anniversary,
for goshsakes.)  The real problem is that the characteristics of the
architecture are going to be different for a 10-20-50 MIPS machine than they
are for a VAX 780, in terms of cache and memory system design, pipelining,
compiler optimizations, register windows, etc., so that the selection of
programs to run on the machine heavily influences the performance ratio.
We've all seen how much Dhrystone overestimates the performance of "10 MIPS"
RISC machines, compared to larger, more realistic workloads.

The only good thing I'll say about the VAX 780 is that its scalar floating
point performance (about a MegaDoubleWhetstone) is fairly well-balanced
against its scalar integer performance (about 1 MIPS).
Thus, if you build a machine in which FP applications and integer
applications both perform at about the same V.R.P., you'll have a reasonably
well-balanced machine.

As to standardizing on a single compiler/OS, remember that the company
producing the base architecture has an interest in making their machines
look competitive.  When competitors say their machine is 10X a VAX 780,
while using trussed-up benchmarks and a markedly inferior compiler/OS on the
VAX, DEC should by all rights be screaming bloody murder.  Should DEC claim
that their VAX 780 is 1.5 times faster than their VAX 780?
-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen
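[Hansen's balance criterion above can be stated as simple arithmetic. The figures below are invented for illustration, not measurements of any real machine.]

```python
# Hypothetical illustration of the balance criterion: a machine is
# "well-balanced" in the 780 sense when its integer VRP and its FP VRP are
# close, since the 780's ~1 MIPS integer rate and ~1 MegaDoubleWhetstone FP
# rate are themselves balanced.  These numbers are made up.
int_vrp = 8.0   # integer workload performance, relative to the 780
fp_vrp  = 7.5   # floating-point workload performance, relative to the 780

balance = int_vrp / fp_vrp   # close to 1.0 means well-balanced
print(f"integer/FP balance: {balance:.2f}")
```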
reiter@endor.harvard.edu (Ehud Reiter) (11/05/87)
In article <881@mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>As to standardizing on a single compiler/OS, remember that the company
>producing the base architecture has an interest in making their machines
>look competitive. When competitors say their machine is 10X a VAX 780, when
>using trussed up benchmarks and a markedly inferior compiler/OS on the VAX,
>DEC should by all rights be screaming bloody murder. Should DEC claim that
>their VAX 780 is 1.5 times faster than their VAX 780?

At one time, I was convinced that there was a well-defined "inflation"
pattern in MIPS as you went from big computer company to little computer
company.  So, for example, an IBM "1 MIPS" machine would have the same
"performance" as a DEC "2 MIPS" machine, which had the same performance as a
SUN "4 MIPS" machine, which had the same performance as an "8 MIPS" machine
from brand X start-up computer company.  Each company in the hierarchy would
ignore its smaller competitors (as being beneath its dignity to comment on),
and proudly claim that it had a large price/performance advantage over its
larger competitors.

Today, as the computer industry finally starts being somewhat competitive
(as opposed to being a monopoly by you know who), I think the big companies
(IBM, DEC) are being forced to be a bit more sophisticated in their
marketing, and stress things customers really care about, like software,
reliability, peripherals, and performance in specific applications (usually
floating-point or I/O intensive applications, not "integer crunching" ones).
It's mainly the little companies (and university people) who still go around
bragging about magic numbers which they pull from thin air and call "MIPS".

The problem with MIPS is that it attempts to measure "integer crunching"
performance, and

	(a) It is impossible to summarize "integer crunching" performance
in one number.
	(b) In any case, not many customers care about integer crunching
performance.
So, I think that attempts to give good definitions for "MIPS" will remain fairly academic exercises, of little relevance to the "real world" of computing. Ehud Reiter reiter@harvard (ARPA,BITNET,UUCP) reiter@harvard.harvard.EDU (new ARPA)
crowl@cs.rochester.edu (Lawrence Crowl) (11/05/87)
In article <881@mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>When competitors say their machine is 10X a VAX 780, when using trussed up
>benchmarks and a markedly inferior compiler/OS on the VAX, DEC should by all
>rights be screaming bloody murder. Should DEC claim that their VAX 780 is 1.5
>times faster than their VAX 780?

DEC can and should claim that their 780/VMS/vcc configuration is 1.5 times
the (hypothetical) standard 780/4.2/pcc configuration.  DEC may also pick
the benchmarks that show their machine in its best light and publish those
figures.  Remember that a Vax Relative Performance figure must be taken in
the context of the benchmarks used to measure the performance.  If someone
does not feel the benchmark set is fair to their machine, they are free to
use another.
-- 
Lawrence Crowl    716-275-9499    University of Rochester
crowl@cs.rochester.edu            Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl    Rochester, New York, 14627
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (11/05/87)
In article <3806@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
[ ... ]
|Unfortunately, that is not enough. We must define what configuration of Vax
|we use as the baseline. I suggest an 11/780 with full memory and a floating
|point accelerator. CPU oriented benchmarks should run completely in physical
|memory.
|
|The compiler and operating system also affect performance. To make the base
|machine highly available, both should be common. I suggest Unix BSD 4.2 as
|the base operating system and the portable C compiler as the base C language
|compiler. This allows realistic Unix/C benchmarks like grep, nroff, etc. Note
|that such benchmarks must have the same source. Putting a better compiler on
|the Vax will increase its relative performance, so DEC can honestly sell a 780
|as having a Vax Relative Performance greater than one.

I think this depends on what you want to test.  If you want to run 4.3BSD,
it's a good way to test; if you want to know how fast your C and FORTRAN
programs will run, you should use the best compilers, etc.  I suspect the
latter is what most people want to know.  Programs which measure the raw
speed of the hardware will give results which often don't match the high
level language results.  This doesn't imply that either is wrong, but that
you have to know what you want to measure.

I have a benchmark suite which I use for UNIX (about 70 machines so far),
and I run with the default compiler and whatever you get with "-O" for an
option.  I may repeat with other optimization options if available, and
often see a major change in performance, not always for the better.  Among
other things, I measure the highest scalar speed available from C for short,
long, float, and double.  I measure the speed of transcendental functions
and the time to do a compare and branch for integer and float.  I do a
turing machine simulation, grey to binary and binary to grey.
If I have the machine to myself I run multitasking benchmarks, and if it has
vector capability I test that also.  I look at compile speed as well, and
disk performance (to see what I get by using a "better" drive).  What this
suite tells me is the profile of capability for the machine.  There is no
one number I can find which is a meaningful index of performance; if
pressed, I use the real time to run the entire test, relative to a VAX
11/780.  This is as meaningful as any other one number.

I suggest that benchmarks using the "standard" are only valid if you are
testing the machine performance rather than the typical time it takes to do
things using the best tools on a system.  Even then, PCC performance varies
in quality from machine to machine.
-- 
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (11/05/87)
In article <1714@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
[ ... ]
|The installed base of 11/780's is puny by comparison to the popular
|micros. Why not choose one of them? I suggest the IBM PS/2 Model 60 as
|the baseline. Most systems will be faster, so that the relative
|performance ratios will mostly be above 1. Most benchmarkers will be
|able to get access to such a machine. Most readers will have some idea
|of the real performance of the baseline machine. And the machine is
|likely to be around for a long time, so that familiarity with it will
|not be short-lived. And even after its demise, people will remember it.

I like your idea.  However, the 60 is pretty low as a starting point, and
very limited.  Perhaps a Model 80 would be better, since it represents
current practice, perhaps running AIX or Xenix/386, since the 32 bit
performance is about double the 16 bit performance (as I measured it, using
Xenix/[23]86).  This also allows benchmarks on paging, which the 60 doesn't
support.

|As for operating system, I suggest an average based on standalone,
|UNIX V and the most widely used OS on that system, if that is not
|UNIX V. UNIX V is clearly headed for standard status. Leave UNIX out
|if it's not available. I also strongly urge against UNIX utilities as
|benchmarks: not everyone uses or can use UNIX (a sacrilege, I know).

I would rather see the most common UNIX version, rather than specifying
something which raises strong feelings.  For some machines, such as the
RT/PC or VAX, I would like to see all UNIX versions.  I would rather see
both sets of numbers, since I will probably have to choose one or the other.
-- 
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
mash@mips.UUCP (John Mashey) (11/06/87)
In article <3113@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>So, I think that attempts to give good definitions for "MIPS" will remain
>fairly academic exercises, of little relevance to the "real world" of computing.

Unfortunately, one can do the whole song-and-dance about
	a) No one number is enough
	b) Benchmark the real applications
	c) Individual machines vary a lot
and somebody who hears that, and intellectually accepts it, will STILL then
ask you: "But how many mips is it?"  In particular, the trade press ends up
doing this by default, as they end up showing mips-ratings, even in the same
issues that contain good explanations of why this is only a gross
approximation.  Telling an accurate story takes a tremendous amount of work,
and is nontrivial to understand; that's why people want mips-ratings.  Sigh.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (11/06/87)
In article <3113@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
|The problem with MIPS is that it attempts to measure "integer crunching"
|performance, and
|
|	(a) It is impossible to summarize "integer crunching" performance
|in one number.
|	(b) In any case, not many customers care about integer crunching
|performance.

Here I don't feel that you are correct... machine usage seems to fall into
two categories of user: the "number crunchers" who need f.p. performance,
and the rest of the software development, word processing, E-mail, record
keeping world.  The speed of integer arithmetic is *very* important to most
groups.

One of the things I have noted in my own benchmarking is that the one thing
which best predicts overall performance is the integer test and branch time
(usual disclaimers about no one number), and that machines which do well in
"transient response" are more pleasant to use.  Transient response is the
time to do little things, like ls, cat, etc.  The RT/PC scored very well on
that, and even though it was not competitive with a Sun overall, it was more
pleasant to use.

<<<< all my own opinions >>>>
-- 
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
reiter@endor.harvard.edu (Ehud Reiter) (11/06/87)
In article <884@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>Unfortunately, one can do the whole song-and-dance about
>	a) No one number is enough
>	b) Benchmark the real applications
>	c) Individual machines vary a lot
>and somebody who hears that, and intellectually accepts that, will STILL
>then ask you: "But how many mips is it?" In particular, the trade press
>ends up doing this by default, as they end up showing mips-ratings,
>even in the same issues that contain good explanations of why this is only
>a gross approximation.

There's no question that the trade press is in love with MIPS, and that
people who should know better still ask about MIPS.  The point, though, is
that since it is impossible to define a single figure that measures
performance, we, the (enlightened?) readers of comp.arch, should not waste
our time trying to do so.

We should also realize that MIPS does have one great advantage over newer
and more "scientific-sounding" performance measures: since most people
realize that there is something funny about MIPS, they take MIPS figures
with a large grain of salt.  I have a feeling that this is less true of the
newer and more "scientifically defined" benchmarks like Dhrystone, which
many people take much more seriously than MIPS, even though Dhrystone
suffers from the same fundamental problem as MIPS, namely that single
figures are meaningless (not to mention Dhrystone's numerous technical
difficulties, which have been discussed at length on comp.arch).

Ehud Reiter
reiter@harvard	(ARPA,BITNET,UUCP)
reiter@harvard.harvard.EDU  (new ARPA)
jss@hector.UUCP (Jerry Schwarz) (11/06/87)
In article <3113@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>
>Today, as the computer industry finally starts being somewhat competitive
>(as opposed to being a monopoly by you know who), I think the big companies
>(IBM, DEC) are being forced to be a bit more sophisticated in their
>marketing, and stress things customers really care about, like software,
>reliability, peripherals, and performance in specific applications (usually
>floating-point or I/O intensive applications, not "integer crunching" ones).

Maybe I'm wrong, but it has been my impression that IBM has always (at least
for 35 years) had very sophisticated marketing strategies, and has usually
(again over the course of 35 years) stressed things other than raw speed.

Jerry Schwarz
henry@utzoo.UUCP (Henry Spencer) (11/08/87)
> ... UNIX V (TM ATT) comes with a file-system that's only good for
> coffee breaks and lunch hours...

Whereas Berklix comes with a file-system that's only good for producing big
numbers on *single-user* benchmarks... :-)
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry
reiter@endor.harvard.edu (Ehud Reiter) (11/10/87)
In article <7786@steinmetz.steinmetz.UUCP> davidsen@crdos1.UUCP (bill davidsen) writes:
>In article <3113@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>|The problem with MIPS is that it attempts to measure "integer crunching"
>|performance, and ...
>|	(b) In any case, not many customers care about integer crunching
>|performance.
>
>Here I don't feel that you are correct... machine usage seems to fall
>into two categories of user, the "number crunchers" who need f.p.
>performance, and the rest of the software development, word processing,
>E-mail, record keeping world. The speed of integer arithmetic is *very*
>important to most groups.

My own impression has been that people doing the above tasks care more about
I/O performance (speed of terminals, disks, etc.) than CPU performance.  The
main exception is that people want the OS to quickly perform the bookkeeping
overhead associated with doing I/O (many systems are more limited by the
speed at which the OS can do this bookkeeping than by the actual speed of
the I/O devices).

However, since different machines have vastly different OS's, MIPS ratings
(or any hardware-only performance measure) give very little insight into how
quickly machines can do I/O bookkeeping.  Machine X could perform integer
adds at half the speed of machine Y, but still maintain an I/O throughput
rate ten times as high as machine Y, simply because X had an OS which was
much better suited to the application.

So, what I'm saying is that what I believe most people care about is not raw
hardware integer compute speed, but the speed at which the OS can perform
its chores, and there is not necessarily much correlation between the two
numbers.

Ehud Reiter
reiter@harvard	(ARPA,BITNET,UUCP)
reiter@harvard.harvard.EDU  (new ARPA)
radford@calgary.UUCP (Radford Neal) (11/11/87)
In article <881@mips.UUCP>, hansen@mips.UUCP (Craig Hansen) writes:
> As to standardizing on a single compiler/OS, remember that the company
> producing the base architecture has an interest in making their machines
> look competitive. When competitors say their machine is 10X a VAX 780, when
> using trussed up benchmarks and a markedly inferior compiler/OS on the VAX,
> DEC should by all rights be screaming bloody murder. Should DEC claim that
> their VAX 780 is 1.5 times faster than their VAX 780?

This looks like a good argument *for* standardizing on an obsolete machine.
You've *got* to standardize all aspects of the base system; otherwise,
numbers today aren't comparable with numbers from last year.  An obsolete
machine's software won't be improving much, so this won't mislead people.
And the obsolete machine's manufacturer won't care what people think of its
performance any more.

So the ideal benchmark standard is an obsolete machine (with a static
operating system and compiler) that is nevertheless still common and likely
to remain so.  The Vax/780 and the IBM PC are about as close as one is
likely to get, but the IBM PC has a "typical" architecture only if you
restrict it to "small model" programs, which is not tolerable.  Thus the
current use of a Vax/780 seems as good as we can hope for, though
standardizing on some old operating system (e.g. 4.2) is likely to be a
problem (who wants to keep it around just for benchmarking?).

   Radford Neal
   The University of Calgary