kruger@16bits.dec.com (I've got 50nS memory. What did you say?) (03/02/88)
a) An architecture working at a prestigious law firm, making over $40K to
start.

b) An architecture that has separate instruction and data buses so that
instruction fetches don't stop the processor from doing data transfers,
which is what it's supposed to be doing!

Motorola claims that the 68030 has an "internal Harvard architecture" by
which they mean it has separate internal instruction and data caches.

dov
brucek@hpsrla.HP.COM (Bruce Kleinman) (03/08/88)
+-------
| Motorola claims that the 68030 has an "internal Harvard architecture" by
| which they mean it has separate internal instruction and data caches.
+-------

Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Bruce Kleinman    brucek%hpnmd@hpcea.hp.com  -or-  ...hplabs!hpnmd!brucek
Hewlett Packard - Network Measurements Division    Santa Rosa, California
lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (03/09/88)
In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
writes about the 68030:
>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Actually, it will. Remember, the CDC 6600 got a win from an "instruction
stack" of 480 bits!

Plus, the two caches access in parallel (versus the one cache of the 68020).
Plus, the caches now take one clock (versus 2 on the 68020).
Plus, the caches now have burst refill (if the board designer supports it,
of course).

All in all, a clear improvement. I don't hear any suggestions as to a better
use for the silicon.
--
Don    lindsay@k.gp.cs.cmu.edu    CMU Computer Science
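[Editor's note: Lindsay's one-clock-versus-two point can be put into rough numbers with a standard average-memory-access-time (AMAT) calculation. The hit times below (2 clocks on the '020, 1 on the '030) come from the thread; the miss rate and miss penalty are invented purely for illustration and are not measurements of either chip.]

```python
# Back-of-the-envelope AMAT (average memory access time, in clocks):
#     amat = hit_time + miss_rate * miss_penalty
# Hit times are from the thread (68020: 2 clocks, 68030: 1 clock);
# miss_rate and miss_penalty are illustrative assumptions only.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

miss_rate = 0.10      # assumed miss rate for a small on-chip I-cache
miss_penalty = 6      # assumed clocks to refill from external memory

amat_020 = amat(2, miss_rate, miss_penalty)   # about 2.6 clocks per fetch
amat_030 = amat(1, miss_rate, miss_penalty)   # about 1.6 clocks per fetch

print(amat_020, amat_030)
print(amat_020 / amat_030)   # roughly 1.6x faster fetch path
```

Even with identical miss behavior, halving the hit time alone buys a meaningful fraction of the speedup under these assumed numbers, which is consistent with Lindsay's argument that small caches are not worthless.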
bcase@Apple.COM (Brian Case) (03/10/88)
In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
> writes about the 68030:
>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Talking about 68030.

>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
>stack" of 480 bits!
>
>All in all, a clear improvement. I don't hear any suggestions as to a better
>use for the silicon.

A little birdie with an EE degree told me that you can expect maybe a 20%
improvement over a 68020 at the same clock rate. An improvement, yes, but a
better use of silicon might have been some on-chip floating point. Or how
about more pins so as to expose the Harvard architecture to the external
world?
bcase@Apple.COM (Brian Case) (03/10/88)
In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
>A little birdie with an EE degree told me that you can expect maybe a 20%
>improvement over a 68020 at the same clock rate.

Oops, I should have said that the little birdie also showed me the system
running real stuff. That is one amazing little bird (has trouble holding the
soldering iron though).
brucek@hpsrla.HP.COM (Bruce Kleinman) (03/11/88)
+-------
| >Ahh, those massive 256-Byte caches are really going to speed this
| > puppy up :-)
|
| Actually, it will. Remember, the CDC 6600 got a win from an "instruction
| stack" of 480 bits !
|
| Plus, the two caches access in parallel (versus the one cache of the 68020).
| Plus, the caches now take one clock (versus 2 on the 68020).
| Plus, the caches now have burst refill (if the board designer supports it,
| of course.)
|
| All in all, a clear improvement. I don't hear any suggestions as to a better
| use for the silicon.
+-------

All in all a clear improvement? Over the '020, perhaps ...

The 256-byte data cache is of questionable value, as the miss rate will be
fairly high. I would supply numbers for "fairly high," but I can't seem to
find any miss-rate data for caches smaller than 1 Kbyte.

Furthermore, the restricted implementation of the data cache makes it useless
in multiprocessor systems as well as systems with DMA. The data cache has no
facilities for coherency. Solution: disable the data cache, or flush, flush
away.

Suggestions as to a better use for the silicon? OK ... Expand the I-cache. I
suspect that losing the D-cache could make room for a 1 Kbyte I-cache. This
would offer honest improvements in almost all systems. High-performance
machines can surround the '030 with a hefty system cache for increased
performance. Memory bandwidth problems, you say? Bring the Harvard
architecture out to the pins - which is exactly what Motorola did for the
88000.

Bruce Kleinman    brucek%hpnmd@hpcea.hp.com  -or-  ...hplabs!hpnmd!brucek
Hewlett Packard - Network Measurements Division    Santa Rosa, California
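[Editor's note: Kleinman's worry about the tiny D-cache can be made concrete with the same AMAT arithmetic. All of the miss rates and cycle counts below are invented for illustration; as he notes, published miss-rate data for sub-1-Kbyte caches was hard to come by.]

```python
# How much a 1-clock on-chip cache buys as a function of miss rate,
# compared to going straight to external memory every time.
# All numbers below are assumptions for illustration.
HIT = 1          # assumed clocks for an on-chip cache hit
PENALTY = 6      # assumed clocks to detect a miss and refill externally
UNCACHED = 3     # assumed clocks for a plain external access, no cache

for miss_rate in (0.05, 0.20, 0.40):   # hypothetical: roomy .. tiny cache
    avg = HIT + miss_rate * PENALTY
    print(miss_rate, UNCACHED / avg)   # speedup over the uncached case
```

Under these assumed numbers, a 5% miss rate yields better than a 2x win, a 20% miss rate drops that to about 1.4x, and at 40% the cache actually loses (the miss-detect-then-refill path costs more than just going to memory), which is the shape of the argument for spending the silicon on a bigger I-cache instead.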
pv@tut.fi (Vuorimaa Petri Kalevi) (03/12/88)
From article <7614@apple.Apple.Com>, by bcase@Apple.COM (Brian Case):
> A little birdie with an EE degree told me that you can expect maybe a 20%
> improvement over a 68020 at the same clock rate. An improvement, yes, but
> a better use of silicon might have been some on-chip floating point.

A floating-point unit takes much more space than an instruction cache -
actually more room than the memory management unit and instruction cache
together. You must also remember that there's just not that much more space
on the chip: Motorola had to drop the number of address translation cache
entries from 64 (MC68851) to 22 (MC68030) to fit the MMU in. At this moment
it's just not possible to add an on-chip floating-point unit to the MC68020.
So, Motorola made the best choice they could.

> Or how about more pins so as to expose the harvard architecture to the
> external world?

That means nearly 200 pins (128 + 32 + 32 + control signals). And that's much
more than I myself have ever seen (or is there such a package?). If the
address space for instructions were smaller, then maybe it could be possible,
but that causes trouble for applications already using the MC68020.
--
Petri Vuorimaa    Tampere University of Technology / Computer Systems Lab
pv@tut.FI         PO. BOX. 527, 33101 Tampere, Finland
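[Editor's note: the pin tally checks out arithmetically. The grouping below is one plausible reading of the "128 + 32 + 32" breakdown (two full 32-bit address/data bus pairs, plus assumed control and power pins); the post does not spell the grouping out, so treat it as a guess.]

```python
# One plausible reading of "nearly 200 pins (128 + 32 + 32 + control)"
# for an externally exposed Harvard architecture. The grouping of the
# last two terms is an assumption for illustration.
instruction_bus = 32 + 32      # instruction address + instruction data
data_bus        = 32 + 32      # data address + data data
signal_pins     = instruction_bus + data_bus   # 128 bus pins total
control_pins    = 32           # assumed strobes, acks, interrupts, ...
power_pins      = 32           # assumed Vcc/GND for that many fast drivers

print(signal_pins + control_pins + power_pins)   # 192, i.e. "nearly 200"
```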
bcase@Apple.COM (Brian Case) (03/15/88)
In article <2781@korppi.tut.fi> pv@tut.fi (Vuorimaa Petri Kalevi) writes:
>Floating point unit takes much more space than instruction cache.
>Actually more room than memory management unit and instruction cache
>together. You must also remember that there's just not that much
>more space on the chip: Motorola had to drop the number of address
>translation cache entries from 64 (MC68851) to 22 (MC68030) to fit MMU in.
>At this moment it's just not possible to add on-chip floating point unit
>to MC68020. So, Motorola made the best choice they could.

Yeah, there is some problem with chip area. However, the T800 transputer has
on-chip floating point plus a 2 Kbyte RAM plus 4 links plus.... There are
some floating-point assists that might have fit. One thing is for sure, Mot
has never been known for having the best process technology, er, I mean
densest anyway. This may be changing somewhat.
alan@pdn.UUCP (Alan Lovejoy) (03/19/88)
In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
/In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
/>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
/> writes about the 68030:
/>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)
/
/Talking about 68030.
/
/>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
/>stack" of 480 bits !
/>
/>All in all, a clear improvement. I don't hear any suggestions as to a better
/>use for the silicon.
/
/A little birdie with an EE degree told me that you can expect maybe a 20%
/improvement over a 68020 at the same clock rate. An improvement, yes, but
/a better use of silicon might have been some on-chip floating point. Or
/how about more pins so as to expose the harvard architecture to the external
/world?

The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
cache turned on (compared to its being turned off). Apparently the '030 gets
**AT LEAST** a 30% performance boost from turning on the data cache (so I
have been told by those who have benchmarked one).

Enough said.

--alan@pdn
mash@mips.COM (John Mashey) (03/20/88)
In article <2594@pdn.UUCP> alan@pdn.UUCP (Alan Lovejoy) writes:
>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off). Apparently the
>'030 gets **AT LEAST** a 30% performance boost from turning on the data
>cache (so I have been told by those who have benchmarked one).
>
>Enough said.

1) Even small I-caches are almost always useful: the question that started
this all was whether or not small D-caches were useful, and if so, how much,
and under what circumstances.

2) One would expect (rightly or wrongly) that overall system design would
heavily influence the benefit level of a small on-chip D-cache. I.e., one
would expect that, for example, turning the D-cache on would help more in a
Sun-3/160-style design (no external cache) than in a /260 design
(well-designed, fast external cache). (Expectations could be wrong, but...)

3) Data would help: whenever 68030 systems become widely available, and
especially if there are convenient ways to turn the caches on/off, people
could do comp.arch a large service by:
   a) running large, realistic benchmarks
   b) reporting the results
   c) reporting the clock speed and overall memory system configuration.

Until then, all we've got to go on is indirect reports, having no idea what
sorts of benchmarks and configurations are being tested. Alan: can you
possibly offer more of the details, or is it still proprietary?
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
radford@calgary.UUCP (Radford Neal) (03/21/88)
In article <2594@pdn.UUCP>, alan@pdn.UUCP (Alan Lovejoy) writes:
> The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
> cache turned on (compared to its being turned off). Apparently the
> '030 gets **AT LEAST** a 30% performance boost from turning on the data
> cache (so I have been told by those who have benchmarked one).
>
> Enough said.

This is a bit hard to believe, seeing as the 68020 takes two cycles to access
a word from cache, and only three to access the same word from external
memory (if my memory serves me right). Of course, the external memory might
be slow, and require wait states to be added to the three cycles. If you're
willing to go for that, however, I'm sure I could build a system in which
turning on the cache speeds things up by a factor of ten. (Well, actually,
I'm not sure I could, not having much experience with a soldering iron, but
you know what I mean... :-)

What's needed are data on:

(1) the benefit of the '030's data cache assuming zero wait states (and a
    32-bit bus),
(2) the benefit for various other memory configurations,
(3) the overall benefit in systems typical of various applications.

Radford Neal
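[Editor's note: Neal's cycle counts give a simple upper bound on what the I-cache can be worth. With a 2-cycle cache hit and a (3 + waits)-cycle external access, the best-case speedup from turning the cache on is (3 + waits)/2. The cycle counts are from his post; the wait-state values below are arbitrary examples.]

```python
# Best-case speedup from the 68020 I-cache, per Neal's cycle counts:
# 2 cycles for a cache hit vs (3 + wait_states) cycles externally.
# This is an upper bound: it assumes every fetch hits the cache.
def max_speedup(wait_states):
    return (3 + wait_states) / 2

for w in (0, 1, 3, 17):
    print(w, max_speedup(w))
```

At zero wait states the bound is 1.5x, so a consistent 2x measurement implies slow external memory; at 17 wait states you get the "factor of ten" system Neal jokes about building.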
pf@diab.UUCP (Per Fogelström) (03/21/88)
In article <2594@pdn.UUCP> alan@pdn.UUCP (Alan Lovejoy) writes:
>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off). ...............
>
>--alan@pdn

From our experience, tests have shown that turning off the internal cache in
the "020" results in a 20% slowdown if the external memory doesn't have wait
states.

So your figure must be from a system with slow external memory, right??
chow@batcomputer.tn.cornell.edu (Christopher Chow) (03/24/88)
In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelström) writes:
|In article <2594@pdn.UUCP> alan@pdn.UUCP (Alan Lovejoy) writes:
||The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
||cache turned on (compared to its being turned off). ...............
||
||--alan@pdn
|
|From our experience tests have shown that turning off the internal cache in
|the "020" results in a 20% slowdown if the external memory doesn't have wait
|states.
|
|So your figure must be from a system with slow external memory, right??

My own tests have also shown that on a Mac II, the 68020 instruction cache
results in a 20% performance change. (I think the Mac II has 1 wait state.)

Christopher Chow
/---------------------------------------------------------------------------\
| Internet:  chow@tcgould.tn.cornell.edu  (128.84.248.35 or 128.84.253.35)  |
| Usenet:    ...{uw-beaver|ihnp4|decvax|vax135}!cornell!batcomputer!chow    |
| Bitnet:    chow@crnlthry.bitnet                                           |
| Phone:     1-607-253-6699    Address:  7122 N. Campus 7, Ithaca, NY 14853 |
| Delphi:    chow2             PAN:      chow                               |
\---------------------------------------------------------------------------/
alan@pdn.UUCP (Alan Lovejoy) (03/25/88)
In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelström) writes:
>In article <2594@pdn.UUCP> alan@pdn.UUCP (Alan Lovejoy) writes:
>>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>>cache turned on (compared to its being turned off). ...............
>>
>>--alan@pdn
>
>From our experience tests have shown that turning off the internal cache in
>the "020" results in a 20% slowdown if the external memory doesn't have wait
>states.
>
>So your figure must be from a system with slow external memory, right??

Absolutely correct. 150ns DRAMs to be precise. Using 45ns SRAMs, the figure
is closer to the 20% you quoted (my source gets a 30% difference with his
benchmarks and his compiler). Somehow only the 150ns DRAM figure stuck in my
mind (perhaps because my main interest is personal computers, where 45ns
SRAMs are too expensive). The numbers from the '030 are also for relatively
slow external memory and no external cache. Sorry, but I can't be more
specific than that.

But an interesting point is raised here: what's good for a $50,000
workstation may not be so good for a $5000 PC, and vice versa. What's good
for running UNIX&C may not be good for running Smalltalk, and vice versa.
This is not new information, but the discussion in this group tends to lose
sight of it at times.

--alan@pdn
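[Editor's note: the 150ns-DRAM vs 45ns-SRAM split can be pushed through the same cycle arithmetic. The 16 MHz clock, the 3-cycle bus access, and the crude "DRAM cycle time is about twice its access time" rule are all assumptions for illustration, not figures from the thread.]

```python
import math

# Rough wait-state estimate: extra clocks a memory access needs beyond
# an assumed 3-cycle 68020 bus access at an assumed 16 MHz clock.
def wait_states(cycle_time_ns, clock_mhz=16, bus_cycles=3):
    clock_ns = 1000.0 / clock_mhz                 # 62.5 ns per clock
    needed = math.ceil(cycle_time_ns / clock_ns)  # clocks to cover access
    return max(0, needed - bus_cycles)

# Crude model: DRAM cycle time ~ 2x access time (RAS precharge);
# SRAM has no precharge, so its cycle time ~ its access time.
dram_w = wait_states(2 * 150)   # 150ns DRAM -> 2 wait states here
sram_w = wait_states(45)        # 45ns SRAM  -> 0 wait states here

print(dram_w, (3 + dram_w) / 2)   # best-case cache speedup with DRAM
print(sram_w, (3 + sram_w) / 2)   # best-case cache speedup with SRAM
```

Under these assumptions the cache-on speedup bound is 2.5x with the slow DRAM system (consistent with the "twice as fast" report) but only 1.5x with zero-wait SRAM (consistent with the 20-30% figures), which is exactly the reconciliation Lovejoy describes.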