mangler@cit-vax.Caltech.Edu (Don Speck) (06/13/88)
A while ago, Rick Richardson was looking for a microprocessor that could
squeeze 4000 Dhrystones out of a 4 MHz 16-bit bus.

Is this even possible?  That's only 3 MB/s of bandwidth per MIPS, barely
enough to fetch instructions.  Even the MC68010, which was designed for
slow memory, needs more like 7 MB/s per MIPS.

What's the lowest memory/cache bandwidth requirement per MIPS that has
been attained?  I.e. please add some numbers to this table:

    Processer           avg read  bus    bandwidth    MIPS   MB/s:MIPS
    Processor           latency   width  at the CPU          ratio
    SUN2 (68010)        400ns     16     5 MB/s       0.7    7
    Microvax II         400ns     32     10 MB/s      0.9    11
    VAX-11/750          ~440ns    32     9 MB/s       0.6    15
    VAX-11/780          ~440ns    32     12 MB/s      1.0    12
    PDP-11/55           300ns     16     7 MB/s       1?     7?
    88000               45ns?     32     185 MB/s?    17     11?
    Cray-1S             137ns     64     640 MB/s     20?    32?
    16 MHz MIPSco       ?         32     120 MB/s?    10     13?
    70 MHz WM           ?         32     3000 MB/s?   100?   30?
    Acorn RISC Machine            32                  ?

(I'm less sure about numbers appearing later in the table).

I'm wondering if there is some formula for the maximum number of MIPS
that can be extracted from a memory system, based on its bandwidth, bus
size, and latency, i.e. "with that memory/cache system you can't get
more than N mips"?  With a large enough table of the above type, perhaps
one could derive some rules of thumb in this direction?

Don Speck   speck@vlsi.caltech.edu   {amdahl,ames!elroy}!cit-vax!speck
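For a rough sense of whether Rick's target is feasible, the kind of bound Don
is asking about can be sketched in a few lines.  This is my own illustration,
not something from the thread; the 7 MB/s-per-MIPS figure is just the 68010
ratio quoted above, used as an assumption.

```python
def max_mips(bandwidth_mb_s, mb_s_per_mips):
    # Crude upper bound: a processor cannot sustain more MIPS than the
    # memory system's bandwidth divided by the bandwidth the processor
    # consumes per MIPS.
    return bandwidth_mb_s / mb_s_per_mips

# A 4 MHz 16-bit bus moves at most 4e6 cycles/s * 2 bytes = 8 MB/s.
bus_mb_s = 4.0 * 2
# At a 68010-like 7 MB/s per MIPS, that bus caps out near 1.1 MIPS.
print(round(max_mips(bus_mb_s, 7.0), 1))   # -> 1.1
```

With a denser instruction encoding (a smaller MB/s-per-MIPS ratio), the same
bus supports proportionally more MIPS, which is exactly the trade-off the
table is probing.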
jesup@cbmvax.UUCP (Randell Jesup) (06/13/88)
In article <6921@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>   Processor           avg read  bus    bandwidth    MIPS   MB/s:MIPS
>                       latency   width  at the CPU          ratio
>SUN2 (68010)           400ns     16     5 MB/s       0.7    7
>Microvax II            400ns     32     10 MB/s      0.9    11
>VAX-11/750             ~440ns    32     9 MB/s       0.6    15
>VAX-11/780             ~440ns    32     12 MB/s      1.0    12
>PDP-11/55              300ns     16     7 MB/s       1?     7?
>88000                  45ns?     32     185 MB/s?    17     11?
>Cray-1S                137ns     64     640 MB/s     20?    32?
>70 MHz WM              ?         32     3000 MB/s?   100?   30?
>16 MHz MIPSco          ?         32     120 MB/s?    10     13?

Try these (john?)                 32+32  128?         12?    ~10?

40 Mhz Rpm-40          100ns      32+16  120 MB/s     33     ~4
                                  data inst                  native

If one is talking Vax Mips (which from the original msg we aren't):

40 Mhz Rpm-40          100ns      32+16  120 MB/s     14-16  ~8-9
                                  data inst                  vax

Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
hankd@pur-ee.UUCP (Hank Dietz) (06/13/88)
In article <6921@cit-vax.Caltech.Edu>, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
> A while ago, Rick Richardson was looking for a microprocessor
> that could squeeze 4000 Dhrystones out of a 4 MHz 16-bit bus.
...
>   Processor           avg read  bus    bandwidth    MIPS   MB/s:MIPS
>                       latency   width  at the CPU          ratio
>   SUN2 (68010)        400ns     16     5 MB/s       0.7    7
>   Microvax II         400ns     32     10 MB/s      0.9    11
>   VAX-11/750          ~440ns    32     9 MB/s       0.6    15
>   VAX-11/780          ~440ns    32     12 MB/s      1.0    12
...
> I'm wondering if there is some formula for the maximum number of
> MIPS that can be extracted from a memory system, based on its
> bandwidth, bus size, and latency, i.e. "with that memory/cache
> system you can't get more than N mips"?  With a large enough table
> of the above type, perhaps one could derive some rules of thumb in
> this direction?

Well, obviously there is such a formula using your definition of
bandwidth... in fact, you effectively used the formula above.  The
major source of inconsistency is in what constitutes a MIP.  Consider:

1.  The average number of bits of memory referenced per instruction
    executed (hence also per MIP) depends on the instruction set and
    its encoding.  The lower bound is 0 (i.e., a processor crunching a
    microcoded instruction within its own registers) and the maximum
    is large-but-finite.

2.  Your "bandwidth at the CPU" measure simply makes the use of
    CPU-internal registers/cache/instruction-decode logic and the
    operand precision of the machine all-important.  For example, if
    we assume that, on average, a 32-bit operand will be loaded/stored
    from CPU-external memory every 4 instructions and there are 8 bits
    per instruction, we would find that we need 2 MB/s (16 Mbits/s)
    for one MIP, giving a ratio of 2:1 in your terminology.
Once you've picked your benchmark (presumably Dhrystones) and set the
precision of the operands, you're measuring how space-efficiently
instructions are encoded and how well the CPU-internal memory system
works -- not really all that interesting, because the choice of what to
call CPU-internal and what to call CPU-external is completely
arbitrary.  If you break down the bandwidth measure into bandwidths of
the component parts (i.e., on-chip registers, cache, etc.), then you
might get some interesting results...?

						-hankd
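Hank's 2:1 example in the article above can be checked with a few lines of
arithmetic.  This is a sketch of his stated assumptions only: 8-bit
instructions, and one 32-bit operand fetched from CPU-external memory every 4
instructions.

```python
mips = 1_000_000          # one MIP = 1e6 instructions/sec
instr_bytes = 1           # assumed 8-bit instruction encoding
operand_bytes = 4         # one 32-bit external operand...
instrs_per_operand = 4    # ...loaded/stored every 4 instructions

instr_bw = mips * instr_bytes                        # 1 MB/s of fetches
data_bw = mips * operand_bytes / instrs_per_operand  # 1 MB/s of data
ratio = (instr_bw + data_bw) / mips                  # MB/s per MIP
print(ratio)  # -> 2.0, i.e. Hank's 2:1 MB/s-per-MIP ratio
```

Changing either the instruction width or the operand traffic moves the ratio
directly, which is Hank's point about the measure depending on encoding and
operand precision.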
tim@amdcad.AMD.COM (Tim Olson) (06/14/88)
In article <6921@cit-vax.Caltech.Edu>, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
| A while ago, Rick Richardson was looking for a microprocessor
| that could squeeze 4000 Dhrystones out of a 4 MHz 16-bit bus.
|
| Is this even possible?  That's only 3 MB/s of bandwidth per MIPS,
| barely enough to fetch instructions.  Even the MC68010, which was
| designed for slow memory, needs more like 7 MB/s per MIPS.

The Am29000 gets 35600 Dhrystones (1.1) at 25MHz with two-cycle first
access, single-cycle burst caches.  This is well over the 1000
Dhrystones/MHz that Rick requested (although with a 32-bit bus instead
of 16 bits).  At 4 MHz, memory access times would be 250ns, easily
within DRAM range.

| What's the lowest memory/cache bandwidth requirement per MIPS that
| has been attained?  I.e. please add some numbers to this table:
|
|   Processor           avg read  bus    bandwidth    MIPS   MB/s:MIPS
|                       latency   width  at the CPU          ratio
|   SUN2 (68010)        400ns     16     5 MB/s       0.7    7
|   Microvax II         400ns     32     10 MB/s      0.9    11
|   VAX-11/750          ~440ns    32     9 MB/s       0.6    15
|   VAX-11/780          ~440ns    32     12 MB/s      1.0    12
|   PDP-11/55           300ns     16     7 MB/s       1?     7?
|   88000               45ns?     32     185 MB/s?    17     11?
|   Cray-1S             137ns     64     640 MB/s     20?    32?
|   16 MHz MIPSco       ?         32     120 MB/s?    10     13?
|   70 MHz WM           ?         32     3000 MB/s?   100?   30?
|   Acorn RISC Machine            32                  ?

These numbers were derived from the Am29000 Architectural Simulator
running Dhrystone 1.1 (since the original question was pertaining to
Dhrystones).  The bandwidth requirements are actual, not theoretical
peak.
Since you didn't specify whether MIPS were native or VAX-MIPS, I have
calculated both:

Am29000 (Video DRAM)
--------------------
Ave Read Latency    160 ns (load/store/jump)
                    40 ns (instruction burst)
Bus Width           32 bits (* 2 -- separate instruction & data buses)
Bandwidth at CPU    44.7 MB/s instruction, 11.5 MB/s data
MIPS                12.7 Native, 15.2 VAX MIPS
MB/s:MIPS ratio     3.70 (VAX), 4.43 (Native)

Am29000 (Caches)
----------------
Ave Read Latency    80 ns (load/store/jump)
                    40 ns (instruction burst)
Bus Width           32 bits (* 2 -- separate instruction & data buses)
Bandwidth at CPU    62.2 MB/s instruction, 15.8 MB/s data
MIPS                17.4 Native, 22.3 VAX MIPS
MB/s:MIPS ratio     3.50 (VAX), 4.48 (Native)

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)
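Tim's MB/s:MIPS ratios follow directly from the bandwidth and MIPS figures he
lists; a quick check of the arithmetic (my own, using his numbers):

```python
def mbs_per_mips(instr_bw, data_bw, mips):
    # Combined instruction + data bandwidth per MIPS, as in the table.
    return round((instr_bw + data_bw) / mips, 2)

# Video-DRAM model: 44.7 + 11.5 MB/s at 15.2 VAX / 12.7 native MIPS
print(mbs_per_mips(44.7, 11.5, 15.2))  # -> 3.7  (VAX)
print(mbs_per_mips(44.7, 11.5, 12.7))  # -> 4.43 (native)
# Cache model: 62.2 + 15.8 MB/s at 22.3 VAX / 17.4 native MIPS
print(mbs_per_mips(62.2, 15.8, 22.3))  # -> 3.5  (VAX)
print(mbs_per_mips(62.2, 15.8, 17.4))  # -> 4.48 (native)
```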
daver@daver.UUCP (Dave Rand) (06/14/88)
In article <22050@amdcad.AMD.COM> tim@amdcad.AMD.COM (Tim Olson) writes:
>	Am29000 (Video DRAM)
>MIPS                12.7 Native, 15.2 VAX MIPS
>	Am29000 (Caches)
>MIPS                17.4 Native, 22.3 VAX MIPS

I am confused.  How can a RISC machine have a higher "VAX MIPS" rating
than native MIPS?  MORE (not fewer) RISC instructions are required to
do the same task, when compared to a VAX.  If you are saying that the
29000 is 22.3 TIMES FASTER than a VAX, then say that - what you have
said is not reasonable.  I cannot believe that you can execute a VAX
instruction in 78% of the time of a native instruction (17.4/22.3).
This implies that, if you can execute a native instruction in 1 clock,
you can execute a VAX instruction (a memory-to-memory add, for example)
in 0.78 clocks!

-- 
Dave Rand    {pyramid|hoptoad|nsc|vsi1}!daver!daver
george@wombat.UUCP (George Scolaro) (06/14/88)
In article <22050@amdcad.AMD.COM> tim@amdcad.AMD.COM (Tim Olson) writes:
>	Am29000 (Video DRAM)
>Bandwidth at CPU    44.7 MB/s instruction, 11.5 MB/s data
>MIPS                12.7 Native, 15.2 VAX MIPS
>	Am29000 (Caches)
>Bandwidth at CPU    62.2 MB/s instruction, 15.8 MB/s data
>MIPS                17.4 Native, 22.3 VAX MIPS

The implication is that the 29000 only requires the above bandwidth to
achieve the MIPS indicated.  The 29000 has a 100 Mbyte/second bus
bandwidth.  I am sure if you limited the bus bandwidth to the above
stated figures the MIPS would decrease quite a bit.  Sure, the average
bandwidth required is as stated, but to achieve the high MIPS you still
need the peak 100 Mbytes/second, otherwise wait states wouldn't hurt
the 29000's performance, right?

-- 
george scolaro.   {pyramid|hoptoad|nsc|vsi1}!daver!wombat!george
oconnor@sungoddess.steinmetz (Dennis M. O'Connor) (06/14/88)
An article by jesup@cbmvax.UUCP (Randell Jesup) says:
] In article <...>mangler@cit-vax.Caltech.Edu (Don Speck) writes:
] > Processor avg read bus bandwidth MB/s:MIPS
] > latency width at the CPU MIPS ratio
] 40 Mhz Rpm-40 100ns 32+16 120 MB/s 33 ~4
] data inst native
]
] If one is talking Vax Mips (which from the original msg we aren't):
] 40 Mhz Rpm-40 100ns 32+16 120 MB/s 14-16 ~8-9
] data inst vax
]
] Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
How soon they forget :-). Correct figures for bandwidth of the RPM40
are 160 MBytes/sec of data, and 80 MBytes/sec of instructions, for
a total of 240 MB/s of AVAILABLE bandwidth. Unless Randell is
quoting AVERAGE figures, but those would depend on the instruction
mix (i.e. the application).
Hi Randell : got your mail, mail to you bounced. I'll try again.
--
Dennis O'Connor oconnor%sungod@steinmetz.UUCP ARPA: OCONNORDM@ge-crd.arpa
"Never confuse USENET with something that matters, like PIZZA."
tim@amdcad.AMD.COM (Tim Olson) (06/14/88)
In article <291@wombat.UUCP> george@wombat.UUCP (George Scolaro) writes:
| The implication is that the 29000 only requires the above bandwidth to
| achieve the MIPS indicated.

Sorry -- I didn't mean to imply that.  Obviously if you want to execute
at close to an instruction per cycle, you must be able to supply that
peak rate at the pins.  However, I think that average bandwidth
requirements are much more interesting -- they tell more about the cost
and complexity of a memory design than the peak rating, and seemed to
be more in line with what the original poster was asking.

| The 29000 has a 100 Mbyte/second bus bandwidth.

Actually, it is 100 MB/s for instructions and 100 MB/s for data,
although the "sustained" peak is more like 170 MB/s (running a series
of loads or stores).

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)
george@wombat.UUCP (George Scolaro) (06/15/88)
In article <22063@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>In article <291@wombat.UUCP> george@wombat.UUCP (George Scolaro) writes:
>| The implication is that the 29000 only requires the above bandwidth to
>| achieve the MIPS indicated.
>
>Sorry -- I didn't mean to imply that.  Obviously if you want to execute
>at close to an instruction per cycle, you must be able to supply that
>peak rate at the pins.  However, I think that average bandwidth
>requirements are much more interesting -- it tells more about the cost
>and complexity of a memory design than the peak rating,

Does average bandwidth tell more about the memory design?  Output from
the AMD29000 simulator (V4.21 PC) indicates that with 0 wait states the
device attains 20.71 MIPS.  With 1 wait state on every memory access
the device attains 14.08 MIPS (quoted from Byte, May 88).  So just 1
wait state impacts the performance quite dramatically.  For the 29000
to achieve maximum performance the memory must support burst mode and
as near to zero wait states as possible.  Thus even though the average
bandwidth requirement that was quoted was around 70 Mbytes/second, one
wait state, which reduces the bandwidth to 100 Mbytes/second, has a
major impact on the Dhrystone benchmark.  Of course adding cache
changes the memory speed requirements, but then cache is just
high-speed memory hidden behind a high-tech name.

>Actually, it is 100MB/s for instruction and 100MB/s for data, although
>the "sustained" peak is more like 170MB/s (running a series of loads or
>stores).

Yeah, neat.  I like the support for burst mode on both the instruction
and data paths.  Also noted is support for burst write on the data path.

george scolaro.   {pyramid|hoptoad|nsc|vsi1}!daver!wombat!george
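The wait-state penalty George quotes can be put in relative terms (my own
arithmetic on the simulator numbers above):

```python
no_wait, one_wait = 20.71, 14.08   # MIPS from the Am29000 simulator
slowdown = 1 - one_wait / no_wait  # fraction of performance lost
print(f"{slowdown:.0%}")  # -> 32%
```

A single wait state on every access costs roughly a third of the machine's
throughput, which is the "dramatic" impact George is pointing at.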
mangler@cit-vax.Caltech.Edu (Don Speck) (06/15/88)
In article <22063@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
> I think that average bandwidth
> requirements are much more interesting -- it tells more about the cost
> and complexity of a memory design than the peak rating, and seemed to be
> more in line with what the original poster was asking.

Average bandwidth requirements are the interesting thing for
shared-memory multiprocessors, but I was asking about uniprocessors,
where all of the bandwidth is dedicated to one processor and costs the
same to provide whether the processor uses all of it or not.  I
consider caches to be part of the memory system, i.e. part of the von
Neumann bottleneck.

Instead of using the ambiguous term "MIPS", I should have said "number
of times the speed of a VAX/780".  Unfortunately it wouldn't fit in the
column headings.  Dhrystones would have been less ambiguous.  I didn't
expect enough accuracy that it would make much difference.

So the table is amended as follows:

    Processor         avg read  bus    bandwidth  VAX    MB/s:MIPS
                      latency   width  available  "MIPS" ratio
    25 MHz 88000      45ns?     32+32  185 MB/s?  17     11?
    16 MHz MIPSco     ?         32+32  120 MB/s?  10?    13?
    40 MHz RPM40      100ns     32+16  240 MB/s   15     16
    25 MHz AMD 29000  80ns      32+32  170 MB/s   22     8

The AMD 29000 is remarkably bandwidth-efficient, despite using (on
average) less than half of the memory cycles available.  (Maybe this
points out the efficacy of their optimizer).

How much would the 29000 slow down if it had only one 32-bit path to a
combined instruction+data cache, i.e. half as much peak memory
bandwidth available?

I had assumed that efficient use of bandwidth would require a narrow
path to memory (with bit-addressable bit-serial being the most
efficient).  Perhaps this is not necessary.  I still suspect that
there's some lower bound on the number of bytes exchanged with
cache/memory to perform the work of a "mythical" instruction.

Don Speck   speck@vlsi.caltech.edu   {amdahl,ames!elroy}!cit-vax!speck
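Don's "less than half of the memory cycles available" remark can be checked
against Tim's cache-model figures (my own arithmetic, assuming the 170 MB/s
sustained peak Tim quoted earlier in the thread):

```python
avg_used = 62.2 + 15.8   # Tim's cache-model average MB/s (instr + data)
available = 170.0        # sustained peak bandwidth Tim quoted
utilization = avg_used / available
print(round(utilization, 2))  # -> 0.46, i.e. under half
```

So on average the 29000 leaves more than half of its sustained-peak bandwidth
idle while still posting the lowest MB/s:MIPS ratio in the amended table.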
tpmsph@ecsvax.UUCP (Thomas P. Morris) (06/16/88)
As many or all of the readers of this group are well aware, what most
literature refers to as "MIPS" or "VAX MIPS" is not really a "MIP", per
se.  The more enlightened literature points out that the comparison is
really to a mythical 1.0 "MIP" VAX 11/780.  Why don't we just refer to
"VUPS", the term DEC coined and uses in their own literature?  (VAX
11/780 Unit Processor(S))  At least they are making a nod at the
apparent fact that a 780 is not `really' a 1 MIP machine...

-- 
-----------------------------------------------------------------------------
Tom Morris                        BITNET: TOM@UNCSPHVX
UNC School of Public Health       UUCP  : ...!mcnc!ecsvax!tpmsph
-----------------------------------------------------------------------------
hankd@pur-ee.UUCP (Hank Dietz) (06/16/88)
In article <6955@cit-vax.Caltech.Edu>, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
...
> I consider caches to be part of the memory system, i.e. part of the
> von Neumann bottleneck.
...
> Instead of using the ambiguous term "MIPS", I should have said "number
> of times the speed of a VAX/780".  Unfortunately it wouldn't fit in the
> column headings.  Dhrystones would have been less ambiguous.  I didn't
> expect enough accuracy that it would make much difference.
...
> I still suspect that there's some lower bound on the number of
> bytes exchanged with cache/memory to perform the work of a
> "mythical" instruction.

What the *!%# are you measuring?

1.  What are "MIPS"?
    (a) Is it millions of instructions executed per second, or is it
        relative speed (VAX 780 = 1 MIP)?
    (b) Is it a peak rating or an average for some code?
    (c) If for a code, what code, with what precision requirements
        (e.g., is it fair to compare 16-bit to 32-bit operations?),
        and is it the hand-generated best code or are we benchmarking
        compilers?

2.  Bandwidth of what?
    (a) Do you measure bandwidth at: main memory?  Main memory with
        some VM paging overhead?  Caches?  On-chip (i.e., CPU-internal)
        caches and registers?
    (b) Peak or average?
    (c) Any concept of shared access?  I.e., are you considering that
        I/O or other processors (e.g., an FPU) might share access to
        "memory"?  If so, does their bandwidth count?

I gave you the (rather trivial) formula for determining the ratio in
my last posting... let me just repeat that the minimum bandwidth is
essentially ZERO.  This would be achieved by a machine which had a
single (probably microcoded) instruction to, for example, perform the
Dhrystone benchmark using values kept in registers.  With no memory
references (we don't count program loading, right?), the number of
MIPS has nothing to do with the bandwidth.

SO, what is MY point?  The only way to get numbers to compare is to
get numbers relating comparable things...
ALL numbers should be as completely broken down as possible (e.g., list
register, cache, and main-memory bandwidth separately) and fully
labelled.

						-hankd
jesup@cbmvax.UUCP (Randell Jesup) (06/16/88)
In article <11234@steinmetz.ge.com> oconnor%sungod@steinmetz.UUCP writes:
>An article by jesup@cbmvax.UUCP (Randell Jesup) says:
>] >  Processor        avg read  bus    bandwidth  MIPS   MB/s:MIPS
>] >                   latency   width  at the CPU        ratio
>] 40 Mhz Rpm-40      100ns     32+16  120 MB/s   33     ~4
>]                              data inst                native
>]
>] If one is talking Vax Mips (which from the original msg we aren't):
>] 40 Mhz Rpm-40      100ns     32+16  120 MB/s   14-16  ~8-9
>]                              data inst                vax
>How soon they forget :-).  Correct figures for bandwidth of the RPM40
>are 160 MBytes/sec of data, and 80 MBytes/sec of instructions, for
>a total of 240 MB/s of AVAILABLE bandwidth.

Oops.  So I can't count.  Corrected figures:

40 Mhz Rpm-40        100ns     32+16  240 MB/s   33     ~8
                               data inst                native

If one is talking Vax Mips (which from the original msg we aren't):

40 Mhz Rpm-40        100ns     32+16  240 MB/s   14-16  ~15
                               data inst                vax

Actually, the real numbers we want are for what it does with a fast
cache, and a slow memory bus beyond that.  Then measure the MIPS per
unit of main-memory bandwidth.  Of course, then we get into
cache-sizing problems...

>Hi Randell : got your mail, mail to you bounced.  I'll try again.
>	Dennis O'Connor  oconnor%sungod@steinmetz.UUCP  ARPA: OCONNORDM@ge-crd.arpa

Dennis: Try uunet!cbmvax!jesup (or steinmetz!uunet!cbmvax!jesup)

Randell Jesup, Commodore Engineering {uunet|rutgers|ihnp4|allegra}!cbmvax!jesup
jesup@cbmvax.UUCP (Randell Jesup) (06/16/88)
>In article <11234@steinmetz.ge.com> oconnor%sungod@steinmetz.UUCP writes:
>>How soon they forget :-).  Correct figures for bandwidth of the RPM40
>>are 160 MBytes/sec of data, and 80 MBytes/sec of instructions, for
>>a total of 240 MB/s of AVAILABLE bandwidth.

I did some thinking, and have a guess at actual average bandwidth
(VERY rough): 100% of instruction bandwidth + 30-40% of data bandwidth
~= 140 MB/s

40 Mhz Rpm-40        100ns     32+16  140 MB/s   33     ~4.5
                               data inst  avg           native

If one is talking Vax Mips (which from the original msg we aren't):

40 Mhz Rpm-40        100ns     32+16  140 MB/s   14-16  ~9-10
                               data inst  avg           vax

Randell Jesup, Commodore Engineering {uunet|rutgers|ihnp4|allegra}!cbmvax!jesup
tim@amdcad.AMD.COM (Tim Olson) (06/16/88)
In article <292@wombat.UUCP> george@wombat.UUCP (George Scolaro) writes:
| Does average bandwidth tell more about the memory design?  Output from
| the AMD29000 simulator (V4.21 PC) indicates that with 0 wait states
| the device attains 20.71 MIPS.  With 1 wait state on every memory
| access the device attains 14.08 MIPS (quoted from Byte, May 88).  So
| just 1 wait state impacts the performance quite dramatically.

That is true, it does if that wait state is inserted into all
instruction requests.  I was thinking more along the lines of burst
mode: there are many schemes whereby we can supply an instruction per
cycle in burst mode, with some increased latency for starting the
burst access.  Video-DRAM designs, static-column memories, and
interleaved DRAMs are all examples.  These designs are usually much
cheaper than the single-cycle SRAM needed to provide peak bandwidth,
which may not even be required.

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)
tim@amdcad.AMD.COM (Tim Olson) (06/16/88)
In article <6955@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
| So the table is amended as follows:
|
|   Processor         avg read  bus    bandwidth  VAX    MB/s:MIPS
|                     latency   width  available  "MIPS" ratio
|   25 MHz 88000      45ns?     32+32  185 MB/s?  17     11?
|   16 MHz MIPSco     ?         32+32  120 MB/s?  10?    13?
|   40 MHz RPM40      100ns     32+16  240 MB/s   15     16
|   25 MHz AMD 29000  80ns      32+32  170 MB/s   22     8
                                                  ^^^^
Well, on Dhrystone 1.1, anyway! ;-)  It would probably be more
"reasonable" to reduce this to 17, which is what we see for large UNIX
utilities.

| The AMD 29000 is remarkably bandwidth-efficient, despite using
| (on average) less than half of the memory cycles available.
| (Maybe this points out the efficacy of their optimizer).

That certainly has to be taken into account.

| How much would the 29000 slow down if it had only one 32-bit
| path to a combined instruction+data cache, i.e. half as much
| peak memory bandwidth available?

I just ran the benchmarks.  Both models are Video-DRAM memory with
4-cycle jumps, loads, and stores, and 1-cycle instruction burst
capability.  The first model has split I/D (i.e. it can have an
instruction burst concurrent with a load or store).  The second must
drop the I-burst for every load or store, wait for the load or store
to complete, then start up the I-burst again (another 4 cycles).  This
simulates connection to the memory through a single shared I/D bus.

    Model         Dhrystones (1.1)
    Split I/D:    24294
    Shared I/D:   18428

This is a drop in performance of 24%.  Part of this is due to not
being able to execute other instructions concurrently with an
in-progress load or store, because they cannot be fetched
simultaneously.  The other part is due to restarting the I-burst after
a random load or store breaks it.

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)
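The 24% figure follows directly from the two Dhrystone results (a quick check
of the arithmetic):

```python
split, shared = 24294, 18428   # Dhrystones (1.1) for the two models
drop = (split - shared) / split
print(f"{drop:.0%}")  # -> 24%
```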
mash@mips.COM (John Mashey) (06/16/88)
In article <5275@ecsvax.UUCP> tpmsph@ecsvax.UUCP (Thomas P. Morris) writes:
>As many or all of the readers of this group are well aware, what most
>literature refers to as "MIPS" or "VAX MIPS" is not really a "MIP",
>per se.  The more enlightened literature points out that the
>comparison is really to a mythical 1.0 "MIP" VAX 11/780.  Why don't
>we just refer to "VUPS", the term DEC coined and uses in their own
>literature?  (VAX 11/780 Unit Processor(S))  At least they are making
>a nod at the apparent fact that a 780 is not `really' a 1 MIP machine...

As noted before (some discussions are like boomerangs, they always come
back): 1 VUP == 1 VAX 11/780, with VMS compilers.  Note that some
people also compare to MicroVAX IIs (an MVUP!), which are slower than
780s, especially on floating point (be careful to compare apples with
apples when looking at the Digital Review benchmarks, for example,
which are often expressed as MVUP ratings).

For DEC, VUPs work fine, because they compare CPUs in a family, using
the same software.  Thus, if the compilers optimize better over time,
the processor ratios remain grossly constant per benchmark, and given
that DEC uses a lot of benchmarks, I'd guess that compiler improvements
that favor one model over another probably get washed out.

When used elsewise, a VUP is a moving target, and if your compilers
don't improve as fast as DEC's, your VUP rating can diminish over time!
As we've said many times, trying to boil even CPU performance down to
one number is nonsense [for example, a "6-VUP" 8700 can be anywhere
from 3-7X faster than a 780; other vendors' systems can vary even more
relative to the 780], but as much as you hate it, you get forced into
it.  argh.

-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
alan@pdn.UUCP (Alan Lovejoy) (06/16/88)
>In article <6955@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>| So the table is amended as follows:
>|
>|   Processor         avg read  bus    bandwidth  VAX    MB/s:MIPS
>|                     latency   width  available  "MIPS" ratio
>|   25 MHz 88000      45ns?     32+32  185 MB/s?  17     11?

Where did you get a 25 MHz 88000?  I believe all benchmark figures so
far are based on the 20 MHz part, including the 17 VUPS rating.
Motorola says that the pcc/88k compiler produces Dhrystone code that
runs 25,000 times/sec on the 20 MHz part, the Green Hills C/88k
compiler gets 34,000/sec, and Tadpole Technology claims 45,000/sec
with their compiler.  All at 20 MHz, but not necessarily with the same
cache sizes in the case of Tadpole.

-- 
Alan Lovejoy; alan@pdn; 813-530-8241; Paradyne Corporation: Largo, Florida.
Disclaimer: Do not confuse my views with the official views of Paradyne
            Corporation (regardless of how confusing those views may be).
Motto: Never put off to run-time what you can do at compile-time!
randys@mipon2.intel.com (Randy Steck) (06/17/88)
In article <6955@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>So the table is amended as follows:
>
>   Processor         avg read  bus    bandwidth  VAX    MB/s:MIPS
>                     latency   width  available  "MIPS" ratio
>25 MHz 88000         45ns?     32+32  185 MB/s?  17     11?
>16 MHz MIPSco        ?         32+32  120 MB/s?  10?    13?
>40 MHz RPM40         100ns     32+16  240 MB/s   15     16
>25 MHz AMD 29000     80ns      32+32  170 MB/s   22     8

Another data point:

    Processor         avg read  bus    bandwidth  VAX    MB/s:MIPS
                      latency   width  available  "MIPS" ratio
    20 MHz 80960      ?         32     53.3 MB/s  8      6.7

The ratio is so low because of the on-board instruction cache and the
existence of more complete addressing modes in the load/store
instructions.  The external address bus is a multiplexed bursting bus.
Performance degradation is about 7% for each wait state.  This
contrasts nicely with the 15-20% degradations per wait state typically
seen with separate busses.

A better number to give for this table would be one that took into
account the bandwidth available from the internal instruction cache.
Unfortunately, this is a relatively difficult number to calculate and
really depends on the data/instruction access mix.  When I find some
time, I will try to do this for the Dhrystone benchmark.

Randy Steck
Intel Corp.
...intelca!mipon2!randys
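As elsewhere in the thread, the MB/s:MIPS column is just available bandwidth
divided by the VAX "MIPS" figure; a quick check of Randy's 80960 numbers:

```python
bandwidth, vax_mips = 53.3, 8         # Randy's 80960 figures
print(round(bandwidth / vax_mips, 1))  # -> 6.7
```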