[comp.arch] Dual FPUs?

mmm@cup.portal.com (Mark Robert Thorson) (03/16/90)

The Compaq Systempro allows both a 387 and a Weitek chip
to be installed on the same 386 CPU.  This is for software compatibility,
because only a few programs, like AutoCAD, use the Weitek chip.  Everything
else uses the 387.  But I suppose a program written specifically for the
Compaq with dual FPUs could keep both of them busy.  The trap
handler might need to be rewritten, because both FPUs use the same
interrupt request line.

danh@halley.UUCP (Dan Hendrickson) (03/17/90)

In article <24915@princeton.Princeton.EDU> haahr@princeton.edu (Paul Haahr) writes:
}... the only applications (at least according to what people say
}in this group) where the traditional supercomputers are significantly
}faster than killer micros are vectorizable number crunchers.
}..............................  For codes like these, wouldn't it be
}possible to take advantage of two (or more) independent, off-the-shelf
}floating point units?
}

I would venture to guess that 64 32-bit registers are too few to do
much with in a vector machine.  For instance, the Crays have 8 vector
registers, each holding 64 elements of 64 bits, and I believe the users
would like to see this number grow.

talex@blake.acs.washington.edu (Thomas Alexander) (03/17/90)

>In article <24915@princeton.Princeton.EDU> Paul Haahr writes:
> .....
> For codes like these, wouldn't it be possible to take advantage
> of two (or more) independent, off-the-shelf floating point units?
>
>For example, given a MIPS R3000, could one attach two R3010s, one as
>the usual coprocessor 1, and the other as coprocessor 2.  In loops in
>which the computations on elements i and i+1 do not interfere with
>each other (ie, are vectorizable), do the computations on the different
>fpus.  This gives you a shot at overlapping (say) multiplications.
> ......

As a matter of fact, the TMS34020 Graphics System Processor (from Texas
Instruments) allows you to hook up as many as 8 of its floating-point
coprocessors in parallel. Each coprocessor has a 50 nsec cycle, resulting
in a peak performance (for advertising purposes only) of 40 MFLOPs when
doing multiply-accumulates. Each coprocessor can be addressed and
accessed independently by the CPU (the 34020), and can execute different
instructions simultaneously. To top things off, each FPU can access
external microcode through a separate microcode bus, allowing you to
custom-tailor your coprocessor instructions.

Some people over here recently put together a system with one 34020
and four coprocessors. Expected peak performance of 160 MFLOPs, right?
They got about 3 on compiled code, rising to 10 or so on hand-optimized
assembly language. And this was on heavily vectorizable workloads -
2-D convolution on 1024 x 1024 images, where you can work on
million-element vectors at a time doing nothing but multiply-accumulates (a
little oversimplification here).
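
[Editorial sketch] The peak figures quoted above follow from simple
arithmetic, worked through below. It assumes that the 40 MFLOPs figure
counts a multiply-accumulate as two floating-point operations per 50 nsec
cycle; that reading is an assumption, not something the post states. The
measured figures are the approximate 3 and 10 MFLOPs reported above.

```python
# Peak vs. measured throughput for the 34020 system described above.
CYCLE_NS = 50          # coprocessor cycle time, from the post
FLOPS_PER_CYCLE = 2    # assumption: one multiply plus one add per cycle
N_COPROCESSORS = 4     # the system described in the post


def peak_mflops(cycle_ns, flops_per_cycle, n_units=1):
    """Peak MFLOPs if every unit completes work on every cycle."""
    cycles_per_microsecond = 1000 / cycle_ns   # numerically equal to MHz
    return cycles_per_microsecond * flops_per_cycle * n_units


per_unit = peak_mflops(CYCLE_NS, FLOPS_PER_CYCLE)
system = peak_mflops(CYCLE_NS, FLOPS_PER_CYCLE, N_COPROCESSORS)

# Approximate measured rates reported in the post, in MFLOPs.
measured = {"compiled": 3, "hand-optimized": 10}
efficiency = {name: rate / system for name, rate in measured.items()}

if __name__ == "__main__":
    print(f"peak per coprocessor: {per_unit:.0f} MFLOPs")
    print(f"peak for {N_COPROCESSORS} coprocessors: {system:.0f} MFLOPs")
    for name, fraction in efficiency.items():
        print(f"{name}: {fraction:.1%} of peak")
```

Under that assumption the hand-optimized code reaches only a few percent
of the advertised peak.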

Multiple coprocessors appear to suffer from one or more of the
following:

* limited instruction issue rate - can't keep them all busy.
* VERY limited data transfer rate - trying to support several data-hungry
	FPUs on one bus does not pay, especially when the same bus/memory
	system is already maxed out trying to keep up with the CPU.
* limited internal register resources - too many data transfers
	in and out of the FPU when large vectors are involved.

In a word - BANDWIDTH! Personally, I'll trade all the MFLOPs you can get
for one good megabyte/sec of data transfer :-)
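
[Editorial sketch] The bandwidth argument can be illustrated with a very
rough model in which throughput is capped either by the arithmetic units
or by the shared bus, whichever saturates first. Everything numeric here
beyond the 40 and 160 MFLOPs peaks is hypothetical: the post gives no bus
bandwidth, and the 4 bytes per floating-point operation assumes each
multiply-accumulate fetches two 4-byte operands with no reuse from
registers.

```python
def attainable_mflops(peak_mflops, bus_mbytes_per_s, bytes_per_flop):
    """Throughput cap: the smaller of the compute limit and the bus limit."""
    return min(peak_mflops, bus_mbytes_per_s / bytes_per_flop)


def required_bandwidth(mflops, bytes_per_flop):
    """Bus bandwidth in Mbytes/s needed to sustain the given MFLOPs rate."""
    return mflops * bytes_per_flop


# Assumption: two 4-byte operands per multiply-accumulate (2 flops),
# so 8 bytes / 2 flops = 4 bytes per flop.
BYTES_PER_FLOP = 4

# Hypothetical bus figure chosen only for illustration.
HYPOTHETICAL_BUS_MBYTES_PER_S = 40

one_fpu = attainable_mflops(40, HYPOTHETICAL_BUS_MBYTES_PER_S, BYTES_PER_FLOP)
four_fpus = attainable_mflops(160, HYPOTHETICAL_BUS_MBYTES_PER_S, BYTES_PER_FLOP)
needed_for_peak = required_bandwidth(160, BYTES_PER_FLOP)

if __name__ == "__main__":
    print(f"one FPU:   {one_fpu:.0f} MFLOPs attainable")
    print(f"four FPUs: {four_fpus:.0f} MFLOPs attainable")
    print(f"bus needed for 160 MFLOPs: {needed_for_peak:.0f} Mbytes/s")
```

In this model, once the bus is the limit, adding coprocessors changes
nothing; only more bandwidth or more operand reuse inside the FPU
registers raises the attainable rate.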

- Tom

sauer@dell.dell.com (Charlie Sauer) (03/20/90)

In article <27907@cup.portal.com> mmm@cup.portal.com (Mark Robert Thorson) writes:
>The Compaq Systempro allows both a 387 and a Weitek chip
>to be installed on the same 386 CPU.  This is for software compatibility,
>because only a few programs, like AutoCAD, use the Weitek chip.  Everything
>else uses the 387.  

Similarly, some 486 machines, such as the Dell 425E announced today, have a
socket for the Weitek 4167, even though the 486 has built-in floating point.  I
just watched a Weitek demo that alternates back and forth between the "387" and
the 4167.
-- 
Charlie Sauer  Dell Computer Corp.     !'s:uunet!dell!sauer
               9505 Arboretum Blvd     @'s:sauer@dell.com
               Austin, TX 78759-7299   
               (512) 343-3310

daveh@cbmvax.commodore.com (Dave Haynie) (03/20/90)

In article <27907@cup.portal.com> mmm@cup.portal.com (Mark Robert Thorson) writes:
>The Compaq Systempro allows both a 387 and a Weitek chip
>to be installed on the same 386 CPU.

Since the Weitek chip is addressed as a peripheral device rather than as a
coprocessor, this is nothing out of the ordinary.  In fact, you could have as
many Weitek chips attached as you have memory space, at least logically (though
I don't imagine Compaq or pretty much anyone else leaves room on their PCB for
more than one).  With the '387 acting as a floating-point coprocessor, there
is probably a fixed number of FPU slots available.  I don't know how many
coprocessors a '386 would permit -- a 68030 allows a total of 8, though one
slot is permanently allocated to the MMU, leaving a real limit of 7 external
coprocessors.  The FPU normally sits in a standard coprocessor slot; additional
FPUs would require custom software to drive them, though that's probably the
case for almost every architecture around.
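
[Editorial sketch] The limit of 8 coprocessors on the 68030 comes from a
3-bit coprocessor ID field in the instruction word. The sketch below
decodes that field, assuming the usual 680x0 F-line layout (top four bits
all ones, coprocessor ID in bits 11-9). Taking ID 0 as the MMU and ID 1 as
the conventional FPU slot is my understanding of the convention, not
something stated in the post; the function and constant names are
illustrative only.

```python
MMU_ID = 0   # assumption: the ID reserved for the MMU
FPU_ID = 1   # assumption: the conventional ID for the 68881/68882 FPU


def coprocessor_id(opword):
    """Return the 3-bit coprocessor ID of an F-line opword, else None."""
    if (opword >> 12) & 0xF != 0xF:
        return None                  # not an F-line (coprocessor) opcode
    return (opword >> 9) & 0x7       # bits 11-9 select the coprocessor


# IDs left over for external coprocessors once the MMU's ID is excluded.
external_ids = [cp_id for cp_id in range(8) if cp_id != MMU_ID]

if __name__ == "__main__":
    for word in (0xF000, 0xF200, 0xFE00, 0x4E75):
        print(f"{word:#06x} -> {coprocessor_id(word)}")
    print(f"{len(external_ids)} IDs available: {external_ids}")
```

A second FPU would have to answer to one of the remaining IDs, and since
compilers emit floating-point instructions for a single fixed ID, code
meant for the extra unit would need to be generated specially, which is
the "custom software" point made above.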


-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough