roger@nsc.UUCP (04/14/87)
To stimulate some more valued discussion, let me lift some of what is
discussed in the 32532 overview brochure:

" At least a dozen manufacturers have brought 32-bit solutions to the
marketplace. While each design is similar in the broad view, the specifics
of each implementation can vary greatly. And it is those specifics that
determine which is best for your needs.

The specifics of the NS32532, however, are unprecedented in 32-bit
microprocessor architectures. In fact, National has applied for eight
separate patents on the NS32532:

1.) The method of detecting and handling memory-mapped I/O by a pipelined
    microprocessor. ----- Think about that for a while. The 32532 has a
    1024-byte, 2-way set-associative data cache. Without the special
    method of handling I/O, writing I/O drivers is somewhat problematic.

2.) Maintaining coherence between a microprocessor's integrated cache and
    the external memory. ------ Since both the instruction and data caches
    are physical caches, we were able to devise a means to provide
    "hardware" cache coherence hooks. Coherency can be maintained without
    cumbersome software overhead and at no cost in performance.

3.) Monitoring control flow in a microprocessor ----- in other words,
    branch prediction.

4.) The concept of a fully integrated cache, Memory Management Unit, and
    instruction pipeline.

5.) Method of simultaneous references to the cache and Bus Interface Unit.

6.) Method for completing instructions without waiting for writes. ----
    Yes, that's right. Reads have priority over writes. Writes are
    buffered in a 2-entry FIFO. There is one exception to this rule
    ----- memory-mapped I/O, as in patent #1 above.

7.) Method of optimizing instruction fetches.

8.) MMU that is accessible by the instruction unit, address unit, and the
    execution unit.

These unique and innovative architectural refinements give the NS32532
key performance advantages in a variety of 32-bit applications."

I'm open to discussion on any of these unique attributes.

------- Roger
shebanow@ji.Berkeley.EDU (Mike Shebanow) (04/14/87)
In article <4206@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
> The specifics of the NS32532, however, are unprecedented in 32-bit
> microprocessor architectures. In fact, National has applied for
> eight separate patents on the NS32532:
>
> 1.) The method of detecting and handling memory-mapped I/O
>     by a pipelined microprocessor. ----- Think about
>     that for a while. The 32532 has a 1024-byte, 2-way
>     set-associative data cache. Without the special method
>     of handling I/O, writing I/O drivers is somewhat problematic.

What happens in a VAX??? It has the same problem. How about the 680X0??
Same thing again. The simple solution is to have a bit in the page table
entry saying that this is I/O. That way, the data is uncached. Is there
something wrong with this solution?

> 2.) Maintaining coherence between a microprocessor's integrated cache
>     and the external memory. ------ Since both the instruction
>     and data caches are physical caches, we were able to devise
>     a means to provide "hardware" cache coherence hooks. Coherency
>     can be maintained without cumbersome software overhead and at
>     no cost in performance.

This is new for a microprocessor, but not in general. Is what you are
doing a different method for cache coherency (Archibald and Baer in the
Nov. 86 ACM Transactions on Computer Systems has a good survey)?

> 3.) Monitoring control flow in a microprocessor ----- in other
>     words, branch prediction.

This again is new for a micro, but not in general. What type of branch
prediction are you doing (Lee and Smith, IEEE Computer, Jan. 84;
McFarling and Hennessy, 13th ISCA, June 86; J.E. Smith, 8th ISCA, 81;
IBM <too many to list>)?

> 6.) Method for completing instructions without waiting for writes. ----
>     Yes, that's right. Reads have priority over writes. Writes are
>     buffered in a 2-entry FIFO. There is one exception to this
>     rule ----- memory-mapped I/O, as in patent #1 above.

Again, I don't understand what is new and unique. This is a well-known
technique.
Alan Smith, in his '83 (82???) ACM paper on "CPU Cache Memories",
describes such a write buffer being used to improve write-through cache
performance.

Sorry for creating a flame letter, but maybe I am confused about what is
being patented here. Are you claiming that the concepts above are
patentable, or that the methods used to reduce the concepts to practice
are patentable?

Mike Shebanow (shebanow@ji.berkeley.edu)
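[The page-table-bit scheme Shebanow describes can be sketched in a few
lines: a "non-cacheable" flag in each page table entry marks I/O pages, and
loads from those pages bypass the data cache and always go to the bus. The
class and flag names below are illustrative, not any particular chip's
implementation.]

```python
# Toy model of a non-cacheable PTE bit: loads from pages flagged as I/O
# bypass the cache, so a device register that changes underneath the CPU
# is always re-read from the bus. Names here are hypothetical.

NONCACHEABLE = 1 << 0                        # illustrative PTE flag bit

class SimpleMMUCache:
    def __init__(self, page_table, memory):
        self.page_table = page_table         # virtual page -> (phys page, flags)
        self.memory = memory                 # phys addr -> value ("the bus")
        self.cache = {}                      # phys addr -> cached value

    def load(self, vaddr, page_bits=12):
        vpage = vaddr >> page_bits
        offset = vaddr & ((1 << page_bits) - 1)
        ppage, flags = self.page_table[vpage]
        paddr = (ppage << page_bits) | offset
        if flags & NONCACHEABLE:
            return self.memory[paddr]        # I/O page: always hit the bus
        if paddr not in self.cache:
            self.cache[paddr] = self.memory[paddr]
        return self.cache[paddr]             # cacheable: may return stale copy
```

A two-line experiment shows the point: after an external write to the
backing memory, the cacheable mapping still returns the stale value while
the non-cacheable mapping sees the update.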
hansen@mips.UUCP (04/15/87)
In article <18308@ucbvax.BERKELEY.EDU>, shebanow@ji.Berkeley.EDU (Mike Shebanow) writes:
> In article <4206@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
> > The specifics of the NS32532, however, are unprecedented in 32-bit
> > microprocessor architectures. In fact, National has applied for
> > eight separate patents on the NS32532:
>
> Sorry for creating a flame letter, but maybe I am confused about
> what is being patented here. Are you claiming that the concepts above
> are patentable, or that the methods used to reduce the concepts to
> practice are patentable?

Seems to me that all Roger said was that National has applied for these
patents. For all we know, the applications might be rejected because they
are considered either not "novel" or because they are judged to be
"obvious to one skilled in the state of the art."
--
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen
davet@oakhill.UUCP (04/15/87)
In article <4206@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
>
> The specifics of the NS32532, however, are unprecedented in 32-bit
> microprocessor architectures. In fact, National has applied for
> eight separate patents on the NS32532:

Only eight patents? I'm just a software guy and I was associated with four
patent applications for the MC68020.

What follows is not a critique of the NS32532 at all, just a comment on
your highly touted list of architectural breakthroughs. Remember, I'm only
a software guy, so Motorola may have already done some of the other things
in your list I don't address. But if you take into consideration all of
the other microprocessor firms representing themselves in this newsgroup,
I would be rather surprised if your list doesn't turn to zip.

> 1.) The method of detecting and handling memory-mapped I/O
>     by a pipelined microprocessor. ----- Think about
>     that for a while. The 32532 has a 1024-byte, 2-way
>     set-associative data cache. Without the special method
>     of handling I/O, writing I/O drivers is somewhat problematic.

Motorola offers this via several means. First, a non-cachable bit in our
MMU descriptor can be used to indicate I/O space. Second, a class of
instructions which lock the bus automatically avoids using the on-chip
cache. Third, external hardware can signal any bus cycle to be non-cached,
thus forcing the next reference to again come out onto the external bus.

> 3.) Monitoring control flow in a microprocessor ----- in other
>     words, branch prediction.

The MC68010 (out about 5 years now?) supported this for its DBcc set of
branch instructions (loop mode). Yes, it was more primitive, but the idea
is the same.

> 5.) Method of simultaneous references to the cache and Bus Interface Unit.

The MC68020 does this. Instruction references go to both the on-chip
cache and to the bus controller. The bus controller aborts its cycle if
the cache comes up with the data.

> 6.) Method for completing instructions without waiting for writes.
> ---- Yes, that's right. Reads have priority over writes. Writes are
>      buffered in a 2-entry FIFO. There is one exception to this
>      rule ----- memory-mapped I/O, as in patent #1 above.

The MC68020 has a one-buffer write mechanism. Intel claims that both their
286 and 386 chips support a one-buffer write queue also.

> 7.) Method of optimizing instruction fetches.

Most latter-day microprocessors could make this claim. Do you have unique
logic on the part to accomplish this?

> 8.) MMU that is accessible by the instruction unit, address unit
>     and the execution unit.

Again, unique logic on the part is necessary for a patent here.

> I'm open to discussion on any of these unique attributes.
>
> ------- Roger

First you need to establish just what is unique or not.
--
Dave Trissel
Motorola Semiconductor Inc., Austin, Texas
{ihnp4,seismo}!ut-sally!im4u!oakhill!davet
kenm@sci.UUCP (04/16/87)
In article <4206@nsc.nsc.com>, roger@nsc.nsc.com (Roger Thompson) writes:
> To stimulate some more valued discussion, let me lift some
> of what is discussed in the 32532 overview brochure:
>
> " At least a dozen manufacturers have brought 32-bit solutions to the
> marketplace. While each design is similar in the broad view, the
> specifics of each implementation can vary greatly. And it is those
> specifics that determine which is best for your needs.
>
> The specifics of the NS32532, however, are unprecedented in 32-bit
> microprocessor architectures. In fact, National has applied for
> eight separate patents on the NS32532:

Introduction: When I say "we" below, I mean a group of CPU designers I was
part of at HP for about 4 years (80-84).

> 1.) The method of detecting and handling memory-mapped I/O
>     by a pipelined microprocessor. ----- Think about
>     that for a while. The 32532 has a 1024-byte, 2-way
>     set-associative data cache. Without the special method
>     of handling I/O, writing I/O drivers is somewhat problematic.

Not clear just what the problem is. Presumably the I/O addresses can
identify themselves, so the cache just has to pay attention.

> 2.) Maintaining coherence between a microprocessor's integrated cache
>     and the external memory. ------ Since both the instruction
>     and data caches are physical caches, we were able to devise
>     a means to provide "hardware" cache coherence hooks. Coherency
>     can be maintained without cumbersome software overhead and at
>     no cost in performance.

An extra tag set for the instruction cache so it can monitor all writes to
the data cache. A simpler solution is to make it illegal architecturally
to write into your own instruction stream and to provide a mechanism for
flushing cache blocks.

> 3.) Monitoring control flow in a microprocessor ----- in other
>     words, branch prediction.

We used a small special-purpose cache for this.
The way it worked was that the address of the conditional branch was
hashed down to 9 bits, which were used to index a 512x2-bit RAM. The two
bits were used to implement a "slow learner" state machine that predicted
which way the branch would go. We saw a 95% prediction rate if programs
were allowed to run long enough without a context switch. With context
switch effects this dropped into the 80-85% range for our test cases.
Being a slow learner means that it only makes one mistake on the execution
of a loop, on the very last pass. We also tried various 1-, 2-, and 3-bit
state machines, but none of them worked as well. Credit for this goes to
Mike Manlove at HP. There is also quite a bit of literature on the
subject.

> 4.) The concept of a fully integrated cache, Memory Management Unit,
>     and instruction pipeline.

Pretty vague. I have heard lots of "concepts" in this area.

> 5.) Method of simultaneous references to the cache and Bus Interface Unit.

Ditto.

> 6.) Method for completing instructions without waiting for writes. ----
>     Yes, that's right. Reads have priority over writes. Writes are
>     buffered in a 2-entry FIFO. There is one exception to this
>     rule ----- memory-mapped I/O, as in patent #1 above.

I remember reading about CDC machines back in the dark ages doing this.
Essentially, the output FIFO contained both addresses and data, and each
read did a partial comparison (about 8 bits) of the read address against
all the write addresses in the FIFO; if a match was found, then the data
was grabbed out of the FIFO and the writes had priority. Virtual
addressing might complicate this if aliasing is allowed.

> 7.) Method of optimizing instruction fetches.

Instruction buffers. Instruction caches. Fetching multiple paths
simultaneously. Using branch prediction to fetch the probable path.
Putting the instruction decoder on the other side of the instruction
cache (this takes the next address and branch target calculation out of
the critical path). ...

> 8.)
> MMU that is accessible by the instruction unit, address unit,
> and the execution unit.

If it wasn't, how would the processor work?

> These unique and innovative architectural refinements give the
> NS32532 key performance advantages in a variety of 32-bit applications."
>
> I'm open to discussion on any of these unique attributes.
>
> ------- Roger

I'm going to be interested in how many of these National manages to
patent. I'm also sure a lot of good engineering work went into the 32532,
but most new ideas in this area aren't new.

Ken McElvain
decwrl!sci!kenm
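[McElvain's 2-bit "slow learner" can be sketched as a table of saturating
counters: the branch address hashes into the table, the counter's high bit
gives the prediction, and a counter must be wrong twice in a row to flip
its prediction, so a loop branch mispredicts only once per complete loop
execution (the final, not-taken pass). The 512-entry size is from his
post; the hash, initial state, and names below are illustrative.]

```python
# Sketch of a 2-bit saturating-counter branch predictor in the style
# McElvain describes. 512 entries as in the post; the index hash and the
# starting state are illustrative assumptions.

TABLE_BITS = 9                                # 512 entries

class TwoBitPredictor:
    def __init__(self):
        self.table = [2] * (1 << TABLE_BITS)  # 2 = "weakly taken"

    def _index(self, addr):
        return (addr >> 2) & ((1 << TABLE_BITS) - 1)   # crude hash

    def predict(self, branch_addr):
        return self.table[self._index(branch_addr)] >= 2   # True = taken

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Running a 10-iteration loop branch (taken 9 times, then not taken) through
this predictor twice yields exactly one misprediction per loop execution,
on the final pass, which is the "slow learner" property he credits for the
good loop behavior.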
roger@nsc.nsc.com (Roger Thompson) (04/17/87)
In article <299@dumbo.mips.UUCP>, hansen@mips.UUCP (Craig Hansen) writes:
> Seems to me that all Roger said was that National has applied for these
> patents. For all we know, the applications might be rejected because they
> are considered either not "novel" or because they are judged to be
> "obvious to one skilled in the state of the art."

I haven't disappeared. I have been working on responding to several email
requests. I have been working sort of as a FIFO, responding by time and
date. I have but one or two left.

As it relates to the patents, there is a difference between applying for a
patent and actually getting one. Our success rate is very, very high. But
in the time between applying and actually being granted a patent and
having the text available from the Gov't printing office ------- well, it
could be late 1988 or early 1989. In the interim, the text is held by the
patent office as confidential material, so I can't send drafts of it off
to anyone. In fact, I haven't seen the full drafts myself. What I can do,
however, is answer questions in almost any detail one wishes. I may have
to search out an answer or two.

Roger
roger@nsc.nsc.com (Roger Thompson) (04/19/87)
In article <863@oakhill.UUCP>, davet@oakhill.UUCP (Dave Trissel) writes:
> some of the other things in your list I don't address. But if you take into
> consideration all of the other microprocessor firms representing themselves
> in this newsgroup, I would be rather surprised if your list doesn't turn to
> zip.

We'll see what transpires ---- the list could even get longer, but the
length of the list won't change how the features of the 32532 operate
together.

> Motorola offers this via several means. First, a non-cachable bit in our
> MMU descriptor can be used to indicate I/O space. Second, a class of
> instructions which lock the bus automatically avoid using on-chip cache.
> Third, external hardware can signal any bus cycle to be non-cached thus
> forcing the next reference to again come out onto the external bus.

I presume your references here are to the 68030, since the 020 doesn't
support a data cache. The non-cachable bit is the classic solution.
Special classes of instructions which lock the bus??? I understand the
need for bus interlocks, and yes, in this case you wish to avoid the cache
----- but how does this relate to memory-mapped I/O? My comment relates to
physical I/O devices and the mechanism we have designed into the 532 to
both force an external cycle (via hardware) and serialize reads and
writes. This is required since the internal pipeline normally prioritizes
reads over writes.

> The MC68010 (out about 5 years now?) supported this for its DBcc set of
> branch instructions (loop mode). Yes, it was more primitive, but the idea
> is the same.

Computer architecture continues to evolve -- I agree, and the designers of
tomorrow's micros will borrow from the past, BUT in the process they will
add new wrinkles. What is the effectiveness of the 010's prediction? What
overall performance gain/loss does it provide, since it is only supported
in one class of branch instructions?

> The MC68020 does this.
> Instruction references go to both the on-chip cache
> and to the bus controller. The bus controller aborts its cycle if the
> cache comes up with the data.

The concept is quite similar, BUT far more complicated in the case of the
532, since it also has an internal MMU. Yes, you say the 030 will support
that. Yes --- but the 030 really only supports the TLB, and even then the
caches are virtual. The 532 contains the whole MMU, with a 64-entry TLB,
and physical caches both quite a bit larger than on the 030.

> The MC68020 has a one-buffer write mechanism. Intel claims that both their
> 286 and 386 chips support a one-buffer write queue also.

Agreed, but the new wrinkle here relates to how it reacts to bus errors,
interrupts, traps, and other activities which affect the performance of
the pipeline.

> Most latter-day microprocessors could make this claim. Do you have
> unique logic on the part to accomplish this?

Yes ---- the issue gets interesting since the 32532 supports dynamic bus
sizing. There are situations where instructions are fetched both
sequentially and non-sequentially.

> Again, unique logic on the part is necessary for a patent here.

That is the requirement of the patent office.

Sorry for the delay in responding, Dave --- but I think I'm caught up now.

Roger
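[Dynamic bus sizing, as described here, means a 32-bit fetch from 8-, 16-,
or 32-bit-wide memory takes 4, 2, or 1 bus cycles, with the narrow pieces
assembled into one word. A toy model under stated assumptions (the
little-endian byte ordering and function names are illustrative, not taken
from the datasheet):]

```python
# Toy model of dynamic bus sizing: assemble a 32-bit fetch from a bus that
# is 1, 2, or 4 bytes wide. Byte order and names are illustrative.

def fetch_32(read_cycle, addr, bus_bytes):
    """read_cycle(addr, width_bytes) models one external bus transfer."""
    assert bus_bytes in (1, 2, 4)
    word, shift = 0, 0
    while shift < 32:                        # 4, 2, or 1 cycles total
        piece = read_cycle(addr + shift // 8, bus_bytes)
        word |= piece << shift               # little-endian assembly (assumed)
        shift += bus_bytes * 8
    return word

# A toy 8-bit-wide memory: each bus cycle returns one byte.
memory = {0: 0x78, 1: 0x56, 2: 0x34, 3: 0x12}
byte_bus = lambda a, w: memory[a]
```

With the byte-wide bus above, `fetch_32(byte_bus, 0, 1)` performs four bus
cycles to deliver the single word a 32-bit-wide memory would return in one.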
amos@instable.UUCP (Amos Shapir) (04/19/87)
Before you use the 'it has been done before' argument to flame National's
patents on the 32532, keep in mind that a specific implementation of a
combination of ideas may be patented even if each one of them has been
done before separately. Are there any patent lawyers in the audience?
--
Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel
Tel. (972)52-522261
amos%nsta@nsc.com {hplabs,pyramid,sun,decwrl}
34.48'E 32.10'N
baum@apple.UUCP (Allen J. Baum) (04/21/87)
[] I think the argument about whether National has something that is
innovative or patentable is not a question that can be answered by
examining the claims in some marketing literature. Obviously, products
have been delivered and patented that have all of the features claimed --
but that doesn't mean that National did it the same way they did, or that
National didn't do it in a way that has much better price/performance or
functionality. What National has patent applications for doesn't have to
be something that hasn't been done before, or even something better than
has been done before; it only has to do it in a different manner.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
roger@nsc.UUCP (04/22/87)
In article <4042@sci.UUCP>, kenm@sci.UUCP (Ken McElvain) writes:
> > 1.) The method of detecting and handling memory-mapped I/O
> >     by a pipelined microprocessor. -----
>
> Not clear just what the problem is. Presumably the I/O addresses
> can identify themselves, so the cache just has to pay attention.

There are two hardware mechanisms. One is a handshake protocol using two
signals, one called IOINH/ and one called IODEC/. These will both force
references to the data cache to be non-cacheable as well as force the
proper sequencing of reads and writes. The second mechanism is dedicating
the upper 16 Mbytes of the memory map to I/O.

> > 2.) Maintaining coherence between a microprocessor's integrated cache
> >     and the external memory. ----
>
> An extra tag set for the instruction cache so it can monitor all writes
> to the data cache. A simpler solution is to make it illegal
> architecturally to write into your own instruction stream and to provide
> a mechanism for flushing cache blocks.

The issue here is more related to providing hooks to allow hardware
external to the CPU to invalidate the internal caches. There are 7 cache
invalidate address inputs and 4 control lines that will allow external
hardware to invalidate either an entire cache, or a set of a cache, or an
individual line (16 bytes) of a cache or set.

> > 3.) Monitoring control flow in a microprocessor -----
>
> We used a small special-purpose cache for this. The way it worked
> was that the address of the conditional branch was hashed down to 9 bits
> which were used to index a 512x2-bit RAM. The two bits were used to
> implement a "slow learner" state machine that predicted which way the
> branch would go. We saw a 95% prediction rate if programs were allowed
> to run long enough without a context switch. With context switch effects
> this dropped into the 80-85% range for our test cases. Being a slow
> learner means that it only makes one mistake on the execution of a loop,
> on the very last pass.
> We also tried various 1-, 2-, and 3-bit state machines
> but none of them worked as well. Credit for this goes to Mike Manlove at
> HP. There is also quite a bit of literature on the subject.

Your approach is far more elaborate than the one we use. Part of the
reason is that the 32532 was/is targeted towards applications which are
context-switch intensive. Our approach takes into account that programs
typically have loops and that branches backward are taken more often than
not. Our brochure is confusing in this area. The predictor section of the
chip has a separate address calculation unit so that this can be done in
parallel with other operations. I will give a more detailed response in
this area in reference to a posting by Craig Hansen.

> > 6.) Method for completing instructions without waiting for writes. ----
>
> I remember reading about CDC machines back in the dark ages doing this.
> Essentially the output FIFO contained both addresses and data and
> each read did a partial comparison (about 8 bits) of the read address
> against all the write addresses in the FIFO and if a match was found
> then the data was grabbed out of the FIFO and the writes had priority.
> Virtual addressing might complicate this if aliasing is allowed.

Our approach is not this elaborate. Since the data cache is write-through,
the cache is always up to date and external writes can be delayed. In
addition to this, there are mechanisms that check whether a subsequent
instruction is reading an operand before it has been written, even in the
cache. The read will be delayed. This is somewhat similar to how the pipe
handles register references.

> > 7.) Method of optimizing instruction fetches.
>
> Instruction buffers.
> Instruction caches.
> Fetching multiple paths simultaneously.
> Using branch prediction to fetch the probable path.
> Putting the instruction decoder on the other side of the instruction
> cache.
> (this takes the next address and branch target calculation
> out of the critical path)

The reference here was more related to fetching the instruction opcode
itself. Yes, we have buffers and caches, etc., as you list above, but
since the CPU supports dynamic bus sizing, instruction fetching can be
from 8-, 16-, or 32-bit-wide memory. There are scenarios where both
non-sequential and sequential fetching are supported.

Roger
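[The read-priority write buffer discussed throughout the thread can be
modeled in a few lines: external writes sit in a 2-entry FIFO while reads
go ahead of them, except that a read whose address matches a buffered
write must first let the pending writes drain. This is a toy sketch of the
general technique, not the 32532's logic, and it ignores the stricter
ordering that memory-mapped I/O requires.]

```python
# Toy model of a 2-entry write buffer with read priority: reads bypass
# buffered writes unless the addresses collide, in which case the FIFO
# drains first. Names and structure are illustrative.

from collections import deque

class WriteBufferedBus:
    def __init__(self, memory, depth=2):
        self.memory = memory                 # addr -> value (external memory)
        self.fifo = deque()                  # pending (addr, value) writes
        self.depth = depth

    def write(self, addr, value):
        if len(self.fifo) == self.depth:
            self._drain_one()                # FIFO full: oldest write goes out
        self.fifo.append((addr, value))

    def read(self, addr):
        if any(a == addr for a, _ in self.fifo):
            self.drain()                     # hazard: serialize with writes
        return self.memory.get(addr, 0)      # otherwise the read passes them

    def _drain_one(self):
        a, v = self.fifo.popleft()
        self.memory[a] = v

    def drain(self):
        while self.fifo:
            self._drain_one()
```

In this model a read to an unrelated address completes while a write is
still queued, and only a read that collides with a buffered write forces
the queue to empty first -- the behavior the posters attribute to both the
32532 and the earlier CDC machines.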