casey@admin.cognet.ucla.edu (Casey Leedom) (08/14/88)
In article <282@quintus.UUCP> ok@quintus () writes:
>
> ... kLI/s are defined solely by that particular benchmark, by the way.
> Other benchmarks may be "procedure calls per second", but _only_ Naive
> Reverse gives "logical instructions".

  I believe "kLI/s" is 1000's of Logical Inferences per second (but I may
be wrong of course).  This is normally abbreviated as kLIPS.  Really fast
PROLOG machines are rated in MLIPS (10^6 LIPS).  LIPS is a logical analog
to the floating point FLOPS metric.

  Note that both LIPS and FLOPS are useful measures, while MIPS is of
debatable use - at least until the industry can standardize the measure
and stop glomming a system's entire performance profile into a single
number.

Casey
ok@quintus.uucp (Richard A. O'Keefe) (08/15/88)
In article <15221@shemp.CS.UCLA.EDU> casey@cs.ucla.edu.UUCP (Casey Leedom) writes:
>In article <282@quintus.UUCP> ok@quintus () writes:
>>
>> ... kLI/s are defined solely by that particular benchmark, by the way.
>> Other benchmarks may be "procedure calls per second", but _only_ Naive
>> Reverse gives "logical instructions".
>
>  I believe "kLI/s" is 1000's of Logical Inferences per second (but I may
>be wrong of course).  This is normally abbreviated as kLIPS.  Really fast
>PROLOG machines are rated in MLIPS (10^6 LIPS).

Right, it is "logical _inferences_ per second".  Silly me.

There is a single specific benchmark, called naive reverse, which happens
to do 496 procedure calls.  To determine the kLI/s rating, you run this
benchmark N times, for some large N.  If it takes T seconds, you report
(496*N)/T as the LIPS rating.

When you are benchmarking, it is necessary to be precise about what you
have measured.  Some people have taken any old small program and reported
the number of procedure calls it did per second as LIPS.  It simply won't
*DO*!  Procedures can have different numbers of arguments, and the cost
of head unification can range from next to nothing to exponential in the
size of the arguments.

Don't get me wrong: Naive Reverse is not a specially good benchmark.
(Think about the fact that native code for it fits comfortably into a
68020's on-chip instruction cache...)  But using *different* benchmarks
when talking about different machines can't yield better comparisons!

There is a more comprehensive set of micro-benchmarks which was described
in AI Expert last year.  Instead of a single LI/s rating, it would be
better to report an "AIE spectrum".  But even the best micro-benchmarks
don't always predict the performance of real programs well, for reasons
explained in the Smalltalk books, amongst others.
One of the things which makes the DLM article credible is that it reports figures for several other (small) benchmarks (I surmise that "quickstart" really meant "quicksort"). I have seen enough papers that report really high performance where the system described seems never to have run anything _but_ Naive Reverse. At least the DLM is realer than that!
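[The naive reverse benchmark O'Keefe describes is ordinarily written in
Prolog.  As an illustration only, here is a rough Python transcription
with an explicit inference counter, showing where the 496 figure comes
from for a 30-element list: 31 nrev calls plus 465 append calls.  The
function names and counting convention are this sketch's own, not part
of any official definition.]

```python
# Sketch (Python, for illustration) of the Prolog "naive reverse"
# benchmark.  Each call to nrev or append counts as one logical
# inference, which is how the 496 figure for a 30-element list arises.

calls = 0

def append(xs, ys):
    """Prolog-style append: one inference per call, base case included."""
    global calls
    calls += 1
    if not xs:
        return ys
    return [xs[0]] + append(xs[1:], ys)

def nrev(xs):
    """Naive reverse: reverse the tail, then append the head at the end."""
    global calls
    calls += 1
    if not xs:
        return []
    return append(nrev(xs[1:]), [xs[0]])

result = nrev(list(range(30)))
print(calls)   # 496: 31 nrev calls + 465 append calls
```

The quadratic cost is visible in the counts: reversing a list of length
n costs (n+1) nrev calls plus n(n+1)/2 append calls, i.e. 31 + 465 = 496
for n = 30.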
eugene@eos.UUCP (Eugene Miya) (08/16/88)
In article <292@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Don't get me wrong: Naive Reverse is not a specially good benchmark.

I see you came from prolog and cross posted to arch.

>There is a single specific benchmark, called naive reverse, which happens
>to do 496 procedure calls.  To determine the kLI/s rating, you run this
>benchmark N times, for some large N.  If it takes T seconds, you report
>(496*N)/T as the LIPS rating.

I've stated this many times in comp.arch, and I'll repeat this once
for the Prolog community's benefit.  Measurement of repetition
isn't equivalent to repetition of measurement on a computer.  Cache,
paging, and optimization conspire against oversimplistic
measurements of this type.

>When you are benchmarking, it is necessary to be precise

You said it all.  I've been trying to find out what "really constitutes
a Logical Instruction".  As far as I can tell, it's totally arbitrary,
whereas Instructions and Operations tend to correspond to discrete states
(barring instruction pipelining, yes yes....).  (Yes, I have Gabriel's
thesis and others.)

Your keyword about measuring Prolog is "naive."  This isn't a putdown,
but the Prolog community will have to recognize some of these problems.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?!  HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
ok@quintus.uucp (Richard A. O'Keefe) (08/20/88)
In article <1303@eos.UUCP> eugene@eos.UUCP (Eugene Miya) writes:
>In article <292@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>Don't get me wrong: Naive Reverse is not a specially good benchmark.
>
>I see you came from prolog and cross posted to arch.

Misemphasis: it was a joint posting to both groups because I thought the
original article (comments on a paper about a new architecture) was
relevant to both groups.

>I've stated this many times in comp.arch, and I'll repeat this once
>for the Prolog community's benefit.  Measurement of repetition
>isn't equivalent to repetition of measurement on a computer.  Cache,
>paging, and optimization conspire against oversimplistic
>measurements of this type.

We *know* that.  But we *also* know that if you measure one iteration of
a typical micro-benchmark, it falls below the resolution of the clock.
Running nrev a few thousand times is done to get a figure which can be
distinguished from clock quantisation.

Let me summarise my position here:
(1) A paper describing a machine called DLM appeared in FGCS.
(2) The paper compared the DLM with a 68020 using *different*
    micro-benchmarks,
(3) one of which is the official definition of LI/s, but
(4) neither of which is good.
(5) Because of (2) and other reasons, it appears that the special-purpose
    machine is not as much of an advance over conventional chips as it
    seems.

>I've been trying to find out what "really constitutes a Logical
>Instruction".  As far as I can tell, it's totally arbitrary, whereas
>Instructions and Operations tend to correspond to discrete states
>(barring instruction pipelining, yes yes....).  (Yes, I have Gabriel's
>thesis and others.)

Gabriel's thesis?  Do you mean Tick's?

Logical Inferences per second are defined by the naive reverse benchmark
and by nothing else.  The place to look for the definition is Warren's
thesis.  The term has been *mis*applied as "number of procedure calls
per second", which can be almost anything depending on what code you run.
>Your keyword about measuring Prolog is "naive."  This isn't a putdown,
>but the Prolog community will have to recognize some of these problems.
>Another gross generalization from Eugene Miya.

Well, if it isn't a putdown, it'll do until a real one comes along.  We
*know* that these micro-benchmarks don't extrapolate well, but they're
the best we've got.  ("Naive" refers to the algorithm, by the way.)

Some constructive advice about how to structure a benchmark suite to
compare implementations of a high-level language on a range of 32-bit
workstations would be really welcome, and if the comp.arch community is
really so sophisticated, such advice should be forthcoming, no?
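[The timing discipline the thread describes — run naive reverse on a
30-element list N times, since one run falls below clock resolution, and
report (496*N)/T — can be sketched as follows.  This is a Python
stand-in, not Prolog timing code; `nrev_30` and the choice of N are this
sketch's own, and the resulting figure measures Python's interpreter,
not any Prolog system.]

```python
# Sketch of deriving a kLIPS figure: time N repetitions of the whole
# benchmark, then report (496 * N) / T logical inferences per second.
import time

def nrev_30():
    """One run of naive reverse on a 30-element list (496 inferences)."""
    def append(xs, ys):
        return ys if not xs else [xs[0]] + append(xs[1:], ys)
    def nrev(xs):
        return [] if not xs else append(nrev(xs[1:]), [xs[0]])
    return nrev(list(range(30)))

N = 10_000                      # large N so T is well above clock resolution
t0 = time.perf_counter()
for _ in range(N):
    nrev_30()
elapsed = time.perf_counter() - t0

lips = (496 * N) / elapsed
print(f"{lips / 1000:.1f} kLIPS (Python interpreter overhead included)")
```

Note that this embodies exactly the caveat Miya raises: it is a single
measurement of N repetitions, so cache warming and any optimization of
the repeated code are folded into the one figure.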