despoix@imag.imag.fr (Frederic DESPOIX) (02/20/90)
Newsgroups: comp.sys.m88k
Subject: DG's use of m88k
Reply-To: despoix@imag.fr (Frederic DESPOIX)
Distribution: world
Organization: IMAG-LGI, University of Grenoble, France

I am sure most of you have read Alan LOVEJOY's (alan@oz.nm.paradyne.com)
latest posting of SPEC benchmark results (see comp.arch, "Latest SPECmarks").
Since I will soon be using a DG 4000, I would like DG specialists to give me
reasons why the DG 5010 server gets such BAD results compared with the MIPS
3240, for example, and with other m88k-based machines like the Motorola
Delta 8864SP. One reason someone gave us here was the VERY SMALL cache
(2x16KB); I must admit I can't understand why it isn't bigger. Any other
reasons?

Allo, rtp ?

fred.

-------------------------------------------------------------------------------
From: alan@oz.nm.paradyne.com (Alan Lovejoy)
Newsgroups: comp.arch
Subject: Latest SPECmarks
Message-ID: <7377@pdn.paradyne.com>
Date: 12 Feb 90 16:23:51 GMT
References: <8859@portia.Stanford.EDU> <5190@convex.convex.com> <1850@cbnewsi.ATT.COM> <2938@oakhill.UUCP> <3085@rtmvax.UUCP>
Organization: AT&T Paradyne, Largo, Florida

In article <3085@rtmvax.UUCP> wbeebe@rtmvax.UUCP (Bill Beebe) writes:
>In article <2938@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>Something else that's interesting. In the February 7th Microprocessor
>Report, page 4, under new SPEC numbers, a Moto system with a 33 MHz 88K came
>up with a 17.8 SPECmark. Congratulations. However, the article goes on to
>note that the 88K SPECmark was only 1% over the SPARC's 17.6 SPECmark (as
>well as the MIPS). I would be most interested to see SPECmarks for the
>Heurikon board (or any other system) running the 040 at 25 MHz or even 33
>MHz.

                                      SPECmark   gcc  espresso  spice  doduc  nasa7    li
Moto Delta Model 8612                    17.8   18.3    23.0     14.8   12.2   17.5   23.9
MIPS M/2000                              17.6   19.1    18.3     12.1   17.6   18.4   23.8
Sparcserver 490                          17.6   21.4    16.5     16.4   14.0   19.9   19.5
MIPS RC3260                              17.3   18.5    18.0     11.9   17.3   18.2   23.3
Solbourne 5/801                          16.3   19.5    16.3     14.9   11.6   16.1   18.2
MIPS RC3240                              16.0   15.5    17.7     12.1   15.9   18.1   20.4
Moto Delta 8864SP Departmental Comp.     15.2   17.5    19.4     12.5   10.1   15.2   20.7
HP Apollo DN 10010                       13.9   12.5    12.9     11.8   23.0   20.4    6.7
DG AV 6200 Server                        12.7   13.4    17.1     11.0    9.2   11.9   17.5
Moto Delta 8864SP                        12.2   14.0    15.5     10.0    8.1   12.2   16.5
Sparcstation 330                         11.8   13.8    11.6     11.4    9.5   14.0   11.2
DECSystem 5400                           11.3   10.9    13.4      8.9    9.7   12.6   13.3
DG AV 5010 Server                        10.1   10.9    13.3      8.7    7.1    9.5   13.8
DG AV 310 WKS                             9.7    9.9    13.1      8.3    6.9    9.3   13.5
DEC VAX 6000 M450                         9.2    5.1     6.5      6.2    6.9   29.1    7.2
Sparcstation 1                            8.4   10.7     8.9      8.2    5.0   10.2    9.0
DEC VAX 6000 M410                         6.8    5.1     6.5      6.2    6.9    8.2    7.2

                                      SPECmark  eqntott  matrix300  fpppp  tomcatv  CPU/MHz
Moto Delta Model 8612                    17.8    20.7      21.5      15.3    14.9   88000/33MHz
MIPS M/2000                              17.6    18.3      13.3      20.4    17.7   R3000/25MHz
Sparcserver 490                          17.6    17.6      22.5      18.8    12.3   SPARC/33MHz
MIPS RC3260                              17.3    17.9      13.1      20.0    17.3   R3000/25MHz
Solbourne 5/801                          16.3    17.2      22.6      17.9    11.8   SPARC/33MHz
MIPS RC3240                              16.0    17.1      13.8      17.8    13.9   R3000/??
Moto Delta 8864SP Departmental Comp.     15.2    16.0      18.4      14.7    11.6   88000/??
HP Apollo DN 10010                       13.9     7.8       9.4      31.4    19.9   PRISM/??
DG AV 6200 Server                        12.7    13.6      14.1      10.9    11.0   88000/??
Moto Delta 8864SP                        12.2    12.8      14.7      11.7     9.3   88000/??
Sparcstation 330                         11.8    12.6      14.7      13.1     8.2   SPARC/??
DECSystem 5400                           11.3    12.8      10.1      12.5    10.1   R?000/??
DG AV 5010 Server                        10.1    10.7      11.1       8.7     8.9   88000/??
DG AV 310 WKS                             9.7    10.5      10.9       8.3     8.3   88000/??
DEC VAX 6000 M450                         9.2     6.7      13.3       7.5    20.9   VAX/??
Sparcstation 1                            8.4     9.7      11.0       7.8     6.0   SPARC/??
DEC VAX 6000 M410                         6.8     6.7       6.5       7.5     7.4   VAX/??

Benchmark numbers are relative to VAX 11/780 = 1.0, so higher numbers are
better. The programs gcc, espresso, spice, doduc, nasa7, li, eqntott,
matrix300, fpppp and tomcatv are real programs, not synthetic or toy codes.
The SPECmark is the geometric mean of the ten benchmark results (the Nth
root of the product of the members of a list with 10 members).

An examination of the numbers would lead one to conclude that the benchmarks
are heavily affected by the system as a whole, not just the processor. Such
things as cache size/speed, memory speed and disk access speed must
contribute significantly to the results. Also, the 88k systems typically
have 1/4 as much cache as some of the competing Rx000 and SPARC systems
(32K versus 128K), even though the 88k is quite capable of supporting 128K
of cache.

If someone could supply more information about each benchmark, and each
system (cache size, CPU clock rate, etc), that might prove enlightening.

____"Congress shall have the power to prohibit speech offensive to Congress"____
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me.
Mottos:  << Many are cold, but few are frozen. >>  << Frigido, ergo sum. >>

___________________________________________________________________________
DESPOIX Frederic, despoix@imag.fr        LGI-DU, B.P. 53x
despoix@imag                             38041 GRENOBLE cedex, FRANCE
uunet.uu.net!imag!despoix                Tel: (33)76.51.46.00 ext. 51.43
/------------------------------          Fax: (33)76.44.66.75
| NB : Usual disclaimer applies
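[To make the SPECmark arithmetic above concrete, here is a minimal C sketch
(an illustration, not SPEC's actual tooling) that reproduces the Delta
8612's 17.8 from its ten ratios in the tables. Summing logarithms rather
than multiplying the raw ratios keeps the intermediate product manageable:

#include <stdio.h>
#include <math.h>

/* Geometric mean via logs: exp((1/n) * sum(ln r_i)) is the nth root
 * of the product of the r_i. */
static double geometric_mean(const double *r, int n)
{
    double logsum = 0.0;
    for (int i = 0; i < n; i++)
        logsum += log(r[i]);
    return exp(logsum / n);
}

int main(void)
{
    /* The Moto Delta Model 8612 row from the tables above:
     * gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300,
     * fpppp, tomcatv (each relative to a VAX 11/780 = 1.0). */
    static const double r8612[10] = {
        18.3, 23.0, 14.8, 12.2, 17.5, 23.9, 20.7, 21.5, 15.3, 14.9
    };
    printf("SPECmark = %.1f\n", geometric_mean(r8612, 10));
    return 0;
}

Linked against the math library (e.g. cc specmark.c -lm, with an
illustrative file name), it prints "SPECmark = 17.8".]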
tom@ssd.csd.harris.com (Tom Horsley) (02/22/90)
In article <7521@imag.imag.fr> despoix@imag.imag.fr (Frederic DESPOIX) writes:
> Benchmark numbers are relative to VAX 11/780 = 1.0, so higher numbers are
> better. The programs gcc, espresso, spice, doduc, nasa7, li, eqntott,
> matrix300, fpppp and tomcatv are real programs, not synthetic or toy codes.
> The SPECmark is the geometric mean of the ten benchmark results (the Nth
> root of the product of the members of a list with 10 members).

If you believe that all these benchmarks are real programs and not toys,
you have been reading SPEC literature, not the benchmark source. Matrix300
spends 99.99% of its cycles in a single line of code; even Whetstone is a
better benchmark than this. Espresso is not much better, spending 90% of its
time in the compare routine it passes to qsort().

> If someone could supply more information about each benchmark, and each
> system (cache size, CPU clock rate, etc), that might prove enlightening.

There are lots of different clock rates/cache sizes/compilers out there.
This is the information you are supposed to pay hundreds of dollars to get
from SPEC.

One thing that is worth noting is that ALL of the floating-point benchmarks
are strictly double precision, and double precision is somewhat slow on the
88100 implementation because its internal paths are only 32 bits wide.

There are quite a lot of application areas that do not need double precision
(graphics processing is often a good example: when you only need to map
images onto a screen 1000 pixels wide, double precision is overkill). The
SPEC benchmark suite needs at least one or two single-precision benchmarks
to give it a better balance. (I am quite aware that there are also
applications requiring double precision; I just want to see some
single-precision numbers in SPEC. Don't tell me I claimed no one needs
double precision.)
--
=====================================================================
domain: tahorsley@ssd.csd.harris.com       USMail: Tom Horsley
  uucp: ...!novavax!hcx1!tahorsley                 511 Kingbird Circle
    or ...!uunet!hcx1!tahorsley                    Delray Beach, FL 33444
======================== Aging: Just say no! ========================
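[As a gloss on Tom's Matrix300 complaint above: the real benchmark is
Fortran, built from LINPACK-style routines, so the C sketch below is only
an analogue of its shape, not its source. It shows how essentially every
cycle of a 300x300 matrix product can land on one line:

/* daxpy: y := y + a*x, the kind of line that soaks up the cycles. */
static void daxpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];   /* virtually all of the runtime lands here */
}

/* C := C + A*B on 300x300 matrices, phrased as repeated daxpy calls
 * the way LINPACK-family codes do it. The caller zeroes C first for
 * a plain product. */
void matmul300(double c[300][300], const double a[300][300],
               const double b[300][300])
{
    for (int i = 0; i < 300; i++)
        for (int k = 0; k < 300; k++)
            daxpy(300, a[i][k], b[k], c[i]);
}

Profiling this on any machine in the tables would charge nearly everything
to the daxpy body, which is the sense in which such a benchmark measures
one loop rather than a whole workload.]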
tom@ssd.csd.harris.com (Tom Horsley) (02/22/90)
In article <TOM.90Feb21110800@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
> Espresso is not much better, spending 90% of its
  ^^^^^^^^
> time in the compare routine it passes to qsort().

Stop. Don't send me mail, I know. Espresso doesn't call qsort(). I meant to
write that *eqntott* spends 90% of its time in the compare routine.

While I am babbling, I might as well throw in a complaint about the xlisp
benchmark too. It would be a much better benchmark (at least for evaluating
the compiler used to build xlisp) if the lisp program the benchmark executes
exercised more of the xlisp interpreter. Xlisp is indeed a real program, but
the benchmark only executes small pieces of it because the input data is a
toy lisp program.
--
=====================================================================
domain: tahorsley@ssd.csd.harris.com       USMail: Tom Horsley
  uucp: ...!novavax!hcx1!tahorsley                 511 Kingbird Circle
    or ...!uunet!hcx1!tahorsley                    Delray Beach, FL 33444
======================== Aging: Just say no! ========================
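[For readers who have not looked at the source, the eqntott pattern Tom
means looks roughly like the C sketch below. The struct and names here are
invented for illustration, not taken from eqntott; the point is that
qsort() invokes the tiny comparison routine O(n log n) times, so it ends
up dominating the profile:

#include <stdlib.h>

/* A hypothetical record type; eqntott sorts truth-table terms, but
 * this layout is made up for the example. */
struct term {
    short bits[16];
};

/* qsort() calls this routine O(n log n) times, so it can easily
 * account for 90% of the cycles in a profile. */
static int cmp_term(const void *pa, const void *pb)
{
    const struct term *a = pa, *b = pb;
    for (int i = 0; i < 16; i++) {
        if (a->bits[i] != b->bits[i])
            return a->bits[i] < b->bits[i] ? -1 : 1;
    }
    return 0;
}

void sort_terms(struct term *tbl, size_t n)
{
    qsort(tbl, n, sizeof tbl[0], cmp_term);
}
]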
mash@mips.COM (John Mashey) (02/27/90)
In article <TOM.90Feb21110800@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
>In article <7521@imag.imag.fr> despoix@imag.imag.fr (Frederic DESPOIX) writes:
>> Benchmark numbers are relative to VAX 11/780 = 1.0, so higher numbers are
>> better. The programs gcc, espresso, spice, doduc, nasa7, li, eqntott,
>> matrix300, fpppp and tomcatv are real programs, not synthetic or toy codes.
>> The SPECmark is the geometric mean of the ten benchmark results (the Nth
>> root of the product of the members of a list with 10 members).
>If you believe that all these benchmarks are real programs and not toys,
>you have been reading SPEC literature, not the benchmark source. Matrix300
>spends 99.99% of its cycles in a single line of code; even Whetstone is a
>better benchmark than this. Espresso is not much better, spending 90% of its
>time in the compare routine it passes to qsort().
>> If someone could supply more information about each benchmark, and each
>> system (cache size, CPU clock rate, etc), that might prove enlightening.
>
>There are lots of different clock rates/cache sizes/compilers out there.
>This is the information you are supposed to pay hundreds of dollars to get
>from SPEC.
>
>One thing that is worth noting is that ALL of the floating-point benchmarks
>are strictly double precision, and double precision is somewhat slow on the
>88100 implementation because its internal paths are only 32 bits wide.
>
>There are quite a lot of application areas that do not need double precision
>(graphics processing is often a good example: when you only need to map
>images onto a screen 1000 pixels wide, double precision is overkill). The
>SPEC benchmark suite needs at least one or two single-precision benchmarks
>to give it a better balance. (I am quite aware that there are also
>applications requiring double precision; I just want to see some
>single-precision numbers in SPEC. Don't tell me I claimed no one needs
>double precision.)

All these criticisms are well-taken, although the programs are either real
ones, or kernels derived from real ones where the bulk of the time is spent
in small loops (that's the way many scientific codes really are). There is
no doubt that some of the ones that are there will drop out sooner or later
as we find better ones; the point of this exercise was to start getting some
real data out into the open that people could pound away at, get a good
process in place, and start making progress.

With regard to single precision: we'd love to have some, and I suspect there
are some on the list of candidates; nobody could find one fast enough for
the first round that could make it thru all the rest of the criteria,
whereas we had 64-bit ones in plenty. We welcome more people joining and
doing work.....
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
rfg@ics.uci.edu (Ronald Guilmette) (02/27/90)
In article <36461@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>All these criticisms are well-taken, although the programs are either real
>ones, or kernels derived from real ones where the bulk of the time is spent
>in small loops (that's the way many scientific codes really are).

If individual programs are meant to measure "scientific" performance, they
should probably be labeled as such.

Now that I've got that off my chest, I'd like to ask a question about the
GCC member of the SPEC suite.

It is my understanding that GCC spends a great deal of its time doing code
generation and/or optimization. Obviously, the actual amount of time spent
doing these things can (potentially) vary a great deal depending upon the
target machine that you are having GCC generate code for. Therefore, it
would seem that in order to obtain a really "fair" apples-to-apples
comparison of performance using GCC as a test case, you would have to run
GCC on host machine X and have it compile some code for target machine Z,
and then run GCC hosted on machine Y and again have it produce code for the
same target machine (Z) as in the previous test.

I'd just like to know if this is in fact what happens when the GCC
"benchmark" is evaluated as part of SPEC benchmarking. Is that how it is
done, or do the benchmarkers run GCC hosted on X and targeted for X, and
then run GCC hosted on Y and targeted for Y? If so, that seems to be a
possible source of misleading information.

Of course, some people might say that the cost of compiling for a particular
machine *should* be factored in as a part of the overall "performance" of
the machine itself. Perhaps that's true, but again it may be a question of
properly "labeling" the results so that people who read the SPEC numbers
fully understand that compilation speed FOR THAT MACHINE was factored in.
(People who do no software development will not care about this factor at
all, but conversely, professional software developers might care about that
particular performance factor above all else.)

// Ron Guilmette (rfg@ics.uci.edu)
// C++ Entomologist
// Motto: If it sticks, force it.  If it breaks, it needed replacing anyway.
mash@mips.COM (John Mashey) (02/27/90)
In article <25EA4069.17611@paris.ics.uci.edu> rfg@ics.uci.edu (Ronald Guilmette) writes:
>In article <36461@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>If individual programs are meant to measure "scientific" performance, they
>should probably be labeled as such.

In the SPEC newsletter, each of the benchmarks is described in at least
moderate detail, and the scientific ones are certainly identifiable.

>Now that I've got that off my chest, I'd like to ask a question about the
>GCC member of the SPEC suite.
>
>It is my understanding that GCC spends a great deal of its time doing code
>generation and/or optimization. Obviously, the actual amount of time spent
>doing these things can (potentially) vary a great deal depending upon the
>target machine that you are having GCC generate code for. Therefore, it
>would seem that in order to obtain a really "fair" apples-to-apples
>comparison of performance using GCC as a test case, you would have to run
>GCC on host machine X and have it compile some code for target machine Z,
>and then run GCC hosted on machine Y and again have it produce code for the
>same target machine (Z) as in the previous test.

That's what it does: the target is always the same.

>Of course, some people might say that the cost of compiling for a particular
>machine *should* be factored in as a part of the overall "performance" of
>the machine itself. Perhaps that's true, but again it may be a question of
>properly "labeling" the results so that people who read the SPEC numbers
>fully understand that compilation speed FOR THAT MACHINE was factored in.
>(People who do no software development will not care about this factor at
>all, but conversely, professional software developers might care about that
>particular performance factor above all else.)

There may sometime be such a benchmark, but the gcc one isn't it.
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086