[comp.arch] Cyrix - Fast Divide, etc.

afgg6490@uxa.cso.uiuc.edu (12/12/89)

comp.arch might be interested in a few details about the
Cyrix math chip.  This is a 387 / Weitek compatible chip,
boasting significant speedups.  (Biggest speedup, of course,
is avoiding the coprocessor interface).

FP add features a normalization estimator that seems to
  be a signed-digit adder (carry free) followed by a 
  leading zero counter.  This is done in parallel with the 
  carry propagate mantissa addition. The normalization estimate
  yields a shift count that is immediately used to set up a
  variable shifter for the mantissa sum, and gets it within
  1 bit shift of the correctly normalized result.
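
A rough sketch of how such an estimator could behave, in Python.  This is
not Cyrix's circuit, only one textbook-style reading of the description
above: form the carry-free signed-digit difference, locate its leading
digit, and use that as a shift count that is at most 1 bit off.  The names
estimate_msb and normalized_difference are made up for illustration, and
the sketch assumes a > b >= 0 (an effective subtraction of aligned
mantissas).

```python
def estimate_msb(a: int, b: int, width: int) -> int:
    """Estimate the bit position of the leading 1 of a - b, for a > b >= 0."""
    # Carry-free signed-digit difference, most significant digit first:
    # each digit is a_i - b_i, i.e. one of -1, 0, +1.
    digits = [((a >> i) & 1) - ((b >> i) & 1) for i in range(width - 1, -1, -1)]
    i = 0
    while digits[i] == 0:                  # skip leading zero digits
        i += 1
    i += 1                                 # first nonzero digit is +1 since a > b
    while i < width and digits[i] == -1:
        i += 1                             # +1,-1,..,-1 equals 0,..,0,+1 in value
    return width - i                       # bit position of the last digit in that run


def normalized_difference(a: int, b: int, width: int):
    """Return (mantissa, shift) with the leading 1 of a - b moved to bit width-1."""
    shift = (width - 1) - estimate_msb(a, b, width)   # available before the sum is
    diff = a - b                           # the carry-propagate subtraction
    result = diff << shift                 # variable shift set up by the estimate
    if ((result >> (width - 1)) & 1) == 0:
        result <<= 1                       # estimate was one position too high
        shift += 1
    return result, shift
```

For instance, normalized_difference(0b10000000, 0b01111111, 8) returns
(128, 7): the estimator sees +1 followed by seven -1 digits and predicts
the full 7 bit shift without waiting for the borrow to ripple.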

The biggest (literally) feature of the chip is a 69x17 multiplier
  array.  Internally this uses signed digit representation;
  the article also implies that the 17 bits are signed, but a
  call to Cyrix did not confirm this.  Using this array, IEEE
  extended multiplication is done in 4 cycles, but quite a few more
  cycles are required to fetch and store data, denorm check, etc.
  Overhead almost dominates actual computation.
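
The 4 cycle figure is consistent with consuming the 64 bit extended
mantissa 17 bits at a time (4 x 17 = 68 >= 64).  A minimal sketch of that
multi-pass scheme follows; it uses plain unsigned slices rather than the
signed-digit recoding mentioned above, and the name multiply_by_slices is
an assumption of this example, not anything from the article.

```python
SLICE_BITS = 17   # width of the short side of the multiplier array


def multiply_by_slices(x: int, y: int, y_bits: int = 64):
    """Multiply x by y, consuming y in 17 bit slices; return (product, passes)."""
    mask = (1 << SLICE_BITS) - 1
    product = 0
    passes = 0
    for shift in range(0, y_bits, SLICE_BITS):
        digit = (y >> shift) & mask        # next 17 bit slice of the multiplier
        product += (x * digit) << shift    # one trip through the wide-by-17 array
        passes += 1
    return product, passes
```

With y_bits=64 the loop runs for shifts 0, 17, 34 and 51, i.e. 4 passes.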

Most interesting is Cyrix's divide algorithm.  Instead of a 2 or 4 bit
digit selection, such as have been described extensively elsewhere
(usually with the 4 bit, radix 16, implemented as two cascaded radix 4
stages with differing amounts of overlap), Cyrix uses the 69x17 multiplier
for a 17 bit (radix 128K) digit selection.   To do this, they
use a 20 bit (I think it's really only 19 bit) approximation to 1/X.
They multiply the partial remainder at each cycle to estimate the
next 17 bit quotient digit Q=R*(1/X).  Then they back-multiply to
obtain a true partial remainder  R' = R-X*Q (I conjecture that last step).
The 20 bit approximation to 1/X is obtained by Newton Raphson,
beginning with an 8 bit lookup value, and 3 iterations.
Signed digit multiplication allows negative quotient bits to be used.
The quotient digits are stored in redundant form, and subsequently
converted to non-redundant form; Cyrix does not appear to use
the on-the-fly conversion to non-redundant form that Lang and Ercegovac
describe, and that Fandrianto uses in his dividers.
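
To make the scheme above concrete, here is a sketch in Python integers.
It is my reconstruction under stated assumptions, not Cyrix's or Matula's
actual algorithm: the divisor is a normalized 64 bit mantissa, the
dividend satisfies 0 <= n < d, the seed table is built here by ordinary
division as a stand-in for a ROM, and each digit is chosen by rounding so
that it may come out negative.  All names (approx_reciprocal, divide,
SEED) are hypothetical.

```python
DIGIT_BITS = 17                      # one quotient digit per pass, radix 2**17
D_BITS = 64                          # divisor is a normalized 64 bit mantissa
F = 19                               # fractional bits in the 1/X estimate (20 bit value)
SHIFT = D_BITS + F - DIGIT_BITS      # scaling that turns r * recip into a digit

# Stand-in for an 8 bit seed ROM, indexed by the top 8 bits of d (128..255).
SEED = {t: (1 << 17) // (2 * t + 1) for t in range(128, 256)}


def approx_reciprocal(d: int, iterations: int = 3) -> int:
    """Return roughly 2**F * (2**D_BITS / d), refined by Newton-Raphson."""
    y = SEED[d >> (D_BITS - 8)] << (F - 8)
    for _ in range(iterations):
        # Fixed-point form of y <- y * (2 - (d / 2**D_BITS) * y).
        y = (y * ((2 << (D_BITS + F)) - d * y)) >> (D_BITS + F)
    return y


def divide(n: int, d: int, digits: int = 4):
    """Return (q, r, signed_digits) with n * 2**(17*digits) == q*d + r, 0 <= r < d."""
    recip = approx_reciprocal(d)
    r = n
    signed_digits = []
    for _ in range(digits):
        # Digit selection: round(r * 2**17 / d), using only the short reciprocal.
        q = (r * recip + (1 << (SHIFT - 1))) >> SHIFT
        # Back-multiply for the true partial remainder, R' = R*2**17 - X*Q.
        r = (r << DIGIT_BITS) - q * d
        signed_digits.append(q)
    # Convert the redundant (signed) digits to an ordinary integer afterwards.
    quotient = 0
    for q in signed_digits:
        quotient = (quotient << DIGIT_BITS) + q
    if r < 0:                        # final correction to a non-negative remainder
        quotient -= 1
        r += d
    return quotient, r, signed_digits
```

Because the back-multiplication is exact, an imprecise reciprocal only
affects how large the next partial remainder is, never the correctness of
the final quotient; that is what lets a 20 bit 1/X drive 17 bit digits.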
    I think that Cyrix's use of the multiplier will soon be emulated
(although not necessarily the same algorithm, which Matula is patenting).
Hybrid Newton-Raphson/Digit Selection algorithms may well be used, with NR
up to the multiplier width, followed by a multiplicative digit selection
thereafter.  Alternatively, Lang and Ercegovac's division that uses
rounding to select quotient digits, after a normalization process that
is implicitly multiplication by a 1/X approximation, can be used. Or,
with full multiplier arrays, NR to compute 1/X, followed by an evaluation
of the remainder using a full multiplier, in order to get IEEE exact
rounding, could be used.   I.e., the return to "long division" methods
was in part motivated by IEEE exact rounding, which full multipliers
make possible for faster division algorithms.
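
A sketch of that last option, again in Python integers and again only
an illustration: Newton-Raphson to a full-width 1/X, one multiply for the
quotient estimate, then a full multiply to evaluate the remainder, which
is what makes exact round-to-nearest-even possible.  The assumptions are
mine: a normalized 64 bit divisor, 0 <= n < 2**64, a seed computed by
division as a stand-in for a small ROM, and the hypothetical names
reciprocal and divide_round_nearest_even.

```python
MANT_BITS = 64      # operand width: a normalized 64 bit mantissa
P = 70              # fractional bits carried in the reciprocal (a few guard bits)


def reciprocal(d: int, iterations: int = 5) -> int:
    """Approximate 2**P * (2**MANT_BITS / d) from below by Newton-Raphson."""
    t = d >> (MANT_BITS - 8)                        # top 8 bits of d, 128..255
    y = ((1 << 17) // (2 * t + 1)) << (P - 8)       # stand-in for a seed ROM entry
    for _ in range(iterations):
        # Fixed-point form of y <- y * (2 - (d / 2**MANT_BITS) * y).
        y = (y * ((2 << (MANT_BITS + P)) - d * y)) >> (MANT_BITS + P)
    return y


def divide_round_nearest_even(n: int, d: int) -> int:
    """Return n * 2**MANT_BITS / d rounded to nearest, ties to even."""
    q = (n * reciprocal(d)) >> P                    # quotient estimate, never too large
    r = (n << MANT_BITS) - q * d                    # exact remainder via a full multiply
    while r >= d:                                   # estimate may be slightly low
        q += 1
        r -= d
    if 2 * r > d or (2 * r == d and q & 1):         # round using the exact remainder
        q += 1
    return q
```

Without the remainder step, the reciprocal-times-dividend product alone
cannot tell which side of a rounding boundary the true quotient lies on.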


Reference:

	"Advancing the Standard in Floating Point Performance"
	Tom Brightman, Cyrix Corp, Richardson Texas,
	High Performance Systems, November 1989

	Cyrix makes reports on accuracy, etc. available on request;
	I am awaiting mine...

---

PS. I have finally finished my "survey" of computer arithmetic.
   I'll put the bibliography up here over Xmas, after I've cleaned
   it up a bit.   The report is very long and verbose, I haven't had
   time to clean it up, so I'm going to hang onto it for a while.

mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) (12/12/89)

In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
| 
| comp.arch might be interested in a few details about the
| Cyrix math chip.  This is a 387 / Weitek compatible chip,
| boasting significant speedups.  (Biggest speedup, of course,
| is avoiding the coprocessor interface).

In article <1904@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E
Davidsen Jr) asks:
>  Two questions: (1) since this is a plug-in replacement for the 80387,
>how does it avoid the coprocessor interface, and (2) in what way is it
>Weitek compatible? I haven't heard that it will run Weitek code, is this
>a typo or underpublicized feature?

(1) It is a direct 80387 replacement, using the same co-processor
interface.

(2) It is 80387 compatible, not Weitek compatible.  The prices I have
seen are intermediate between the 80387 and Weitek 3167.

The operations are greatly speeded up relative to the 80387.
Floating-point add and multiply operations finish in 6 clocks instead
of almost 40.  I don't remember the details of the 80387 coprocessor
interface, but I would guess that communication and synchronization
overhead will dominate when this chip is used.

The benchmarks published by Cyrix show speedups of typically 35% on
application code.  The bar charts are very misleading, since the range
of the abscissa is 0.8 to 1.5.  The 80387 performance is normalized to
1.0, and the Cyrix performance is around 1.3-1.4.  This makes the bars
for the Cyrix about 0.55 units long vs. the 0.2 unit lengths of the
bars for the 80387 --- this makes it look like the Cyrix-equipped
system is almost 3 times as fast as the 80387-equipped system....
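
The arithmetic behind that impression, as a small Python check.  The
value 1.35 is just the midpoint of the 1.3-1.4 range quoted above, and
apparent_ratio is a made-up helper name.

```python
def apparent_ratio(value: float, baseline: float, axis_origin: float) -> float:
    """Ratio of drawn bar lengths when the axis starts at axis_origin, not 0."""
    return (value - axis_origin) / (baseline - axis_origin)


cyrix, i387, origin = 1.35, 1.0, 0.8
print("drawn bar ratio:", apparent_ratio(cyrix, i387, origin))  # about 2.75
print("actual speedup: ", cyrix / i387)                         # about 1.35
```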
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
		   mccalpin@scri1.scri.fsu.edu
		   mccalpin@delocn.udel.edu

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/12/89)

In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
| 
| comp.arch might be interested in a few details about the
| Cyrix math chip.  This is a 387 / Weitek compatible chip,
| boasting significant speedups.  (Biggest speedup, of course,
| is avoiding the coprocessor interface).

  Two questions: (1) since this is a plug-in replacement for the 80387,
how does it avoid the coprocessor interface, and (2) in what way is it
Weitek compatible? I haven't heard that it will run Weitek code, is this
a typo or underpublicized feature?
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

hui@joplin.mpr.ca (Michael Hui) (12/13/89)

In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>PS. I have finally finished my "survey" of computer arithmetic.
>   I'll put the bibliography up here over Xmas, after I've cleaned
>   it up a bit.   The report is very long and verbose, I haven't had
>   time to clean it up, so I'm going to hang onto it for a while.

I am very excited that you are putting together a survey. Being new to
the arithmetic side of things, I have always wanted to read a detailed
document on the evolution of floating point ALU architectures, starting
with what Seymour Cray did when he was at Control Data. This field is
fascinating to me because it combines high speed circuit design with
innovative algorithms. The same could be said about high performance
graphics accelerators.

Michael Hui   hui@mprgate.mpr.ca   604-985-4214  Vancouver B.C. Canada

mfinegan@uceng.UC.EDU (michael k finegan) (12/14/89)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>In article <112400012@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>| 
>| comp.arch might be interested in a few details about the
>| Cyrix math chip.  This is a 387 / Weitek compatible chip,
>| boasting significant speedups.  (Biggest speedup, of course,
>| is avoiding the coprocessor interface).

>  Two questions: (1) since this is a plug-in replacement for the 80387,
>how does it avoid the coprocessor interface,
~
~
~
>-- 
>bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)

According to the technical support group at Cyrix, there is a "memory mode"
where the Cyrix chip interprets the instruction stream, and decodes 80x87
opcodes without 80x86 intervention. This has been demo'ed, and requires extra
hardware (i.e. plug in board ?). Apparently this arrangement yields the full
5.5 Mflop performance; the 33MHz (etc.) 80386 SLOWS DOWN the Cyrix chip :-).
I have looked at the manual for the chip - it looks like it duplicates the
80x87 only - not Weitek.

					Mike Finegan
					mfinegan@uceng.UC.EDU

afgg6490@uxa.cso.uiuc.edu (12/14/89)

>I am very excited that you are putting together a survey. Being new to
>the arithmetic side of things, I have always wanted to read a detailed
>document on the evolution of floating point ALU architectures, starting
>with what Seymour Cray did when he was at Control Data. This field is
>fascinating to me because it combines high speed circuit design with
>innovative algorithms. The same could be said about high performance
>graphics accelerators.
>
>Michael Hui   hui@mprgate.mpr.ca   604-985-4214  Vancouver B.C. Canada

Sorry, I didn't do too much history.  All I did was read the past, say,
5 years' worth of papers and try to get a coherent picture in my mind
of the state of the art and practice of computer arithmetic.  I'd like
to read a historical survey too.

The best I've seen are still Hwang, and Waser and Flynn - although I would
still describe these as "introductory" texts in computer arithmetic,
not as detailed as I would like.  Hwang, especially, provides a bit
of history.

(You know what I would like? I'd like to have an "Art of Computer Architecture"
series much like Knuth's "Art of Computer Programming", with sections on
arithmetic, instruction sets, memory and busses, I/O, compilers, and so 
on...  If it doesn't exist, I'd like to write it, but I don't have time or
$$ to do so.)

hui@joplin.mpr.ca (Michael Hui) (12/15/89)

In article <112400014@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>(You know what I would like? I'd like to have an "Art of Computer Architecture"
>series much like Knuth's "Art of Computer Programming", with sections on
>arithmetic, instruction sets, memory and busses, I/O, compilers, and so 
>on...  If it doesn't exist, I'd like to write it, but I don't have time or
>$$ to do so.)

Or petition Seymour Cray to do it, as a retirement project.

(PLEASE, no misunderstandings here. I am not passing judgement on who is
most "fit" to do it. The above is just a random thought from a telecom
engineer, not a computer design engineer.)

afgg6490@uxa.cso.uiuc.edu (12/15/89)

>>(You know what I would like? I'd like to have an "Art of Computer Architecture"
>>series much like Knuth's "Art of Computer Programming", with sections on
>>arithmetic, instruction sets, memory and busses, I/O, compilers, and so 
>>on...  If it doesn't exist, I'd like to write it, but I don't have time or
>>$$ to do so.)
>
>Or petition Seymour Cray to do it, as a retirement project.
>
>(PLEASE, no misunderstandings here. I am not passing judgement on who is
>most "fit" to do it. The above is just a random thought from a telecom
>engineer, not a computer design engineer.)

No problem, I'm not offended. Can Seymour write?

I've thought of asking Knuth, but I think I'll hold off until the 
"Art of Computer Programming" is finished. :-)  Note that this is a 
multi-year project, O(decades) - if it's done any faster it probably isn't
worth it. There are a lot of books that pretend
to fit the bill, but they don't...  
Think of how much Knuth has helped computer programming with his oeuvre...

Or how about "Encyclopedistes de l'Ordinateur", after Diderot?
If the Encyclopedistes promoted the French Revolution, what
would a Computer Encyclopedia do?