afgg6490@uxa.cso.uiuc.edu (12/12/89)
comp.arch might be interested in a few details about the Cyrix math chip. This is a 387 / Weitek compatible chip, boasting significant speedups. (Biggest speedup, of course, is avoiding the coprocessor interface).

FP add features a normalization estimator that seems to be a signed-digit (carry-free) adder followed by a leading-zero counter. This runs in parallel with the carry-propagate mantissa addition. The normalization estimate yields a shift count that is immediately used to set up a variable shifter for the mantissa sum, and gets it within a 1 bit shift of the correctly normalized result.

The biggest (literally) feature of the chip is a 69x17 multiplier array. Internally this uses signed-digit representation; the article also implies that the 17 bits are signed, but a call to Cyrix did not confirm this. Using this array, IEEE extended multiplication is done in 4 cycles, but quite a few more cycles are required to fetch and store data, check for denorms, etc. Overhead almost dominates actual computation.

Most interesting is Cyrix's divide algorithm. Instead of a 2 or 4 bit digit selection, such as has been described extensively elsewhere (usually with the 4 bit, radix 16, implemented as two cascaded radix 4 stages with differing amounts of overlap), Cyrix uses the 69x17 multiplier for a 17 bit (radix 128K) digit selection. To do this, they use a 20 bit (I think it's really only 19 bit) approximation to 1/X. At each cycle they multiply the partial remainder by this approximation to estimate the next 17 bit quotient digit, Q = R*(1/X). Then they back-multiply to obtain a true partial remainder R' = R - X*Q (I conjecture that last step). The 20 bit approximation to 1/X is obtained by Newton-Raphson, beginning with an 8 bit lookup value, and 3 iterations. Signed-digit multiplication allows negative quotient bits to be used.
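
The divide loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cyrix's implementation: the names `approx_reciprocal` and `high_radix_divide` are made up here, the reciprocal is obtained with Python integer division as a stand-in for the lookup table plus Newton-Raphson, the whole remainder is multiplied rather than only its leading bits, and each digit is corrected immediately into the range 0..2^17-1 instead of being left in signed (redundant) form.

```python
DIGIT_BITS = 17   # one radix-2^17 (radix 128K) quotient digit per step
RECIP_BITS = 20   # width of the 1/X approximation quoted in the post


def approx_reciprocal(x, n):
    """Roughly 20-bit approximation to 1/X, scaled by 2^(n + 19).

    ASSUMPTION: integer division stands in for the 8-bit lookup plus
    Newton-Raphson refinement; the result is truncated, so it is never
    larger than the true reciprocal.
    """
    return (1 << (n + RECIP_BITS - 1)) // x


def high_radix_divide(a, x, n, steps):
    """Return (q, r) such that a * 2^(17*steps) == q * x + r and 0 <= r < x.

    Requires 0 <= a < x and a normalized n-bit divisor: 2^(n-1) <= x < 2^n.
    """
    recip = approx_reciprocal(x, n)
    q, r = 0, a
    for _ in range(steps):
        r <<= DIGIT_BITS                           # bring in the next 17 bits
        d = (r * recip) >> (n + RECIP_BITS - 1)    # digit estimate Q = R*(1/X)
        r -= d * x                                 # back-multiply: R' = R - X*Q
        # The truncated reciprocal can only underestimate the digit, so r
        # stays non-negative; fix up the (small) shortfall here.
        while r >= x:
            d += 1
            r -= x
        q = (q << DIGIT_BITS) + d
    return q, r
```

For example, `high_radix_divide(a, x, 64, 4)` produces 68 quotient bits in four steps. In this sketch the digit estimate is off by at most one, because the error of a 20 bit reciprocal is small compared with one unit of a 17 bit digit.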
The quotient digits are stored in redundant form, and subsequently converted to non-redundant form; Cyrix does not appear to use the on-the-fly conversion to non-redundant form that Lang and Ercegovac describe, and that Fandrianto uses in his dividers.

I think that Cyrix's use of the multiplier will soon be emulated (although not necessarily with the same algorithm, which Matula is patenting). Hybrid Newton-Raphson/digit-selection algorithms may well be used, with NR up to the multiplier width, followed by a multiplicative digit selection thereafter. Alternatively, Lang and Ercegovac's division, which uses rounding to select quotient digits after a normalization process that is implicitly multiplication by a 1/X approximation, can be used. Or, with full multiplier arrays, NR to compute 1/X, followed by an evaluation of the remainder using a full multiplier in order to get IEEE exact rounding, could be used. I.e., the return to "long division" methods was in part motivated by IEEE exact rounding, which full multipliers make possible for faster division algorithms.

Reference: "Advancing the Standard in Floating Point Performance", Tom Brightman, Cyrix Corp, Richardson, Texas, High Performance Systems, November 1989.

Cyrix makes reports on accuracy, etc. available on request; I am awaiting mine...

---
PS. I have finally finished my "survey" of computer arithmetic. I'll put the bibliography up here over Xmas, after I've cleaned it up a bit. The report is very long and verbose, I haven't had time to clean it up, so I'm going to hang onto it for a while.
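
The last alternative mentioned in the post (Newton-Raphson for 1/X, then a remainder evaluation with a full multiplier to get exact rounding) can be sketched as follows. This is a minimal illustration, not any vendor's design: the function names, the 72 fractional bits, the seed formula, and the choice of 4 iterations are all assumptions made here, and Python's unbounded integers stand in for the full multiplier array.

```python
FRAC_BITS = 72  # ASSUMPTION: fixed-point fraction width for the reciprocal


def seed_reciprocal(x, n):
    """About 8 good bits of 1/X from the top 8 bits of the n-bit divisor.

    ASSUMPTION: this formula stands in for a small lookup table; it
    evaluates 1/X at the midpoint of the interval selected by those bits.
    """
    t = x >> (n - 8)                       # 128..255 for a normalized x
    return (512 << FRAC_BITS) // (2 * t + 1)


def newton_reciprocal(x, n, iterations=4):
    """Refine y ~= 2^(n + FRAC_BITS) / x using y <- y * (2 - X*y)."""
    y = seed_reciprocal(x, n)
    two = 2 << (n + FRAC_BITS)
    for _ in range(iterations):
        y = (y * (two - x * y)) >> (n + FRAC_BITS)
    return y


def divide_round_nearest_even(a, x, n, p):
    """Return a * 2^p / x rounded to nearest, ties to even.

    Requires 2^(n-1) <= x < 2^n, n >= 8, a >= 0 and p <= n + FRAC_BITS.
    """
    y = newton_reciprocal(x, n)
    q = (a * y) >> (n + FRAC_BITS - p)     # quotient estimate via 1/X
    rem = (a << p) - q * x                 # exact remainder, full multiply
    while rem < 0:                         # estimate was too high
        q -= 1
        rem += x
    while rem >= x:                        # estimate was too low
        q += 1
        rem -= x
    # q is now the exactly truncated quotient; the remainder decides rounding.
    if 2 * rem > x or (2 * rem == x and q & 1):
        q += 1
    return q
```

The point of the remainder step is that the rounding decision no longer depends on how accurate the reciprocal was: the two correction loops make the truncated quotient exact, and comparing twice the remainder with the divisor settles the round-to-nearest-even case.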
mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) (12/12/89)
In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
|
| comp.arch might be interested in a few details about the
| Cyrix math chip. This is a 387 / Weitek compatible chip,
| boasting significant speedups. (Biggest speedup, of course,
| is avoiding the coprocessor interface).

In article <1904@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) asks:
> Two questions: (1) since this is a plug-in replacement for the 80387,
>how does it avoid the coprocessor interface, and (2) in what way is it
>Weitek compatible? I haven't heard that it will run Weitek code, is this
>a typo or underpublicized feature?

(1) It is a direct 80387 replacement, using the same co-processor interface.
(2) It is 80387 compatible, not Weitek compatible.

The prices I have seen are intermediate between the 80387 and Weitek 3167.

The operations are greatly sped up relative to the 80387. Floating-point add and multiply operations finish in 6 clocks instead of almost 40. I don't remember the details of the 80387 coprocessor interface, but I would guess that communication and synchronization overhead will dominate when this chip is used.

The benchmarks published by Cyrix show speedups of typically 35% on application code. The bar charts are very misleading, since the range of the abscissa is 0.8 to 1.5. The 80387 performance is normalized to 1.0, and the Cyrix performance is around 1.3-1.4. This makes the bars for the Cyrix about 0.55 units long vs. the 0.2 unit lengths of the bars for the 80387 --- this makes it look like the Cyrix-equipped system is almost 3 times as fast as the 80387-equipped system....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu mccalpin@scri1.scri.fsu.edu mccalpin@delocn.udel.edu
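
The bar-length arithmetic in the post above is easy to check. The small Python sketch below uses hypothetical helper names (`bar_length`, `apparent_ratio`) and takes 1.35 as a representative Cyrix score, i.e. the midpoint of the quoted 1.3-1.4 range.

```python
def bar_length(value, axis_min):
    """Drawn length of a bar when the axis starts at axis_min, not at zero."""
    return value - axis_min


def apparent_ratio(value_a, value_b, axis_min):
    """Ratio of drawn bar lengths, which is what the eye compares."""
    return bar_length(value_a, axis_min) / bar_length(value_b, axis_min)


# Chart as described in the post: axis starts at 0.8, 80387 normalized to 1.0.
visual = apparent_ratio(1.35, 1.0, axis_min=0.8)   # about 2.75
honest = apparent_ratio(1.35, 1.0, axis_min=0.0)   # about 1.35
```

With the axis starting at 0.8 the Cyrix bar is drawn about 2.75 times as long as the 80387 bar, even though the underlying speedup is about 1.35.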
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/12/89)
In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
|
| comp.arch might be interested in a few details about the
| Cyrix math chip. This is a 387 / Weitek compatible chip,
| boasting significant speedups. (Biggest speedup, of course,
| is avoiding the coprocessor interface).

Two questions: (1) since this is a plug-in replacement for the 80387, how does it avoid the coprocessor interface, and (2) in what way is it Weitek compatible? I haven't heard that it will run Weitek code, is this a typo or underpublicized feature?
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
hui@joplin.mpr.ca (Michael Hui) (12/13/89)
In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>PS. I have finally finished my "survey" of computer arithmetic.
> I'll put the bibliography up here over Xmas, after I've cleaned
> it up a bit. The report is very long and verbose, I haven't had
> time to clean it up, so I'm going to hang onto it for a while.

I am very excited that you are putting together a survey. Being new to the arithmetic side of things, I have always wanted to read a detailed document on the evolution of floating point ALU architectures, starting with what Seymour Cray did when he was at Control Data. This field is fascinating to me because it combines high speed circuit design with innovative algorithms. The same could be said about high performance graphics accelerators.

Michael Hui hui@mprgate.mpr.ca 604-985-4214 Vancouver B.C. Canada
mfinegan@uceng.UC.EDU (michael k finegan) (12/14/89)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>|
>| comp.arch might be interested in a few details about the
>| Cyrix math chip. This is a 387 / Weitek compatible chip,
>| boasting significant speedups. (Biggest speedup, of course,
>| is avoiding the coprocessor interface).
> Two questions: (1) since this is a plug-in replacement for the 80387,
>how does it avoid the coprocessor interface,
~ ~ ~
>--
>bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)

According to the technical support group at Cyrix, there is a "memory mode" where the Cyrix chip interprets the instruction stream, and decodes 80x87 opcodes without 80x86 intervention. This has been demo'ed, and requires extra hardware (i.e. plug in board ?). Apparently this arrangement yields the full 5.5 Mflop performance; the 33MHz (etc.) 80386 SLOWS DOWN the Cyrix chip :-).

I have looked at the manual for the chip - it looks like it duplicates the 80x87 only - not Weitek.

Mike Finegan mfinegan@uceng.UC.EDU
afgg6490@uxa.cso.uiuc.edu (12/14/89)
>I am very excited that you are putting together a survey. Being new to
>the arithmetic side of things, I have always wanted to read a detailed
>document on the evolution of floating point ALU architectures, starting
>with what Seymour Cray did when he was at Control Data. This field is
>fascinating to me because it combines high speed circuit design with
>innovative algorithms. The same could be said about high performance
>graphics accelerators.
>
>Michael Hui hui@mprgate.mpr.ca 604-985-4214 Vancouver B.C. Canada

Sorry, I didn't do too much history. All I did was read the past, say, 5 years' worth of papers and try to get a coherent picture in my mind of the state of the art and practice of computer arithmetic.

I'd like to read a historical survey too. The best I've seen are still Hwang, and Waser and Flynn - although I would still describe these as "introductory" texts in computer arithmetic, not as detailed as I would like. Hwang, especially, provides a bit of history.

(You know what I would like? I'd like to have an "Art of Computer Architecture" series much like Knuth's "Art of Computer Programming", with sections on arithmetic, instruction sets, memory and busses, I/O, compilers, and so on... If it doesn't exist, I'd like to write it, but I don't have time or $$ to do so.)
hui@joplin.mpr.ca (Michael Hui) (12/15/89)
In article <112400014@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>(You know what I would like? I'd like to have an "Art of Computer Architecture"
>series much like Knuth's "Art of Computer Programming", with sections on
>arithmetic, instruction sets, memory and busses, I/O, compilers, and so
>on... If it doesn't exist, I'd like to write it, but I don't have time or
>$$ to do so.)

Or petition Seymour Cray to do it, as a retirement project.

(PLEASE, no misunderstandings here. I am not passing judgement on who is most "fit" to do it. The above is just a random thought from a telecom engineer, not a computer design engineer.)
afgg6490@uxa.cso.uiuc.edu (12/15/89)
>>(You know what I would like? I'd like to have an "Art of Computer Architecture"
>>series much like Knuth's "Art of Computer Programming", with sections on
>>arithmetic, instruction sets, memory and busses, I/O, compilers, and so
>>on... If it doesn't exist, I'd like to write it, but I don't have time or
>>$$ to do so.)
>
>Or petition Seymour Cray to do it, as a retirement project.
>
>(PLEASE, no misunderstandings here. I am not passing judgement on who is
>most "fit" to do it. The above is just a random thought from a telecom
>engineer, not a computer design engineer.)

No problem, I'm not offended. Can Seymour write?

I've thought of asking Knuth, but I think I'll hold off until the "Art of Computer Programming" is finished. :-)

Note that this is a multi-year project, O(decades) - if it's done any faster it probably isn't worth it. There are a lot of books that pretend to fit the bill, but they don't... Think of how much Knuth has helped computer programming with his oeuvre...

Or how about "Encyclopedistes de l'Ordinateur", after Diderot? If the Encyclopedistes promoted the French Revolution, what would a Computer Encyclopedia do?