[comp.unix.wizards] Vax 86x0 FBOX decision summary

dpl@cisunx.UUCP (Dave P. Lithgow) (05/21/87)
This is a summary of the pertinent information relating to the
article I posted a while ago about whether to recommend acquiring
an FP86 (FBOX) for a Vax 8650 that had been purchased without one.
This Vax is running DEC Ultrix-32 V1.2.

Thanks to several who greatly helped direct my investigation:
Jonathan Clark, Bill Mitchell, Guy Harris, and Rex Black.

[Guy Harris helps with:]

Script started on Thu Mar 26 23:42:58 1987
gorodish$ egrep "float|double" /arch/4.3/usr/src/bin/ld.c
				if (t >= sizeof (double))
					rnd = sizeof (double);
gorodish$ 

script done on Thu Mar 26 23:43:06 1987

Well, I don't know what DEC did to "/bin/ld" in ULTRIX, but it must
have been pretty impressive to get it to use floating point....

It *does* use *integer* modulo to compute some hash indices when
doing relocation relative to external symbols.  The following claim
was made in "Comments on 'The Case for the Reduced Instruction Set
Computer'", Douglas W. Clark and William D. Strecker, ACM Computer
Architecture News, Vol. 8, No. 6, 15 October 1980, on page 38:

	The explanation of this anomaly is that the 780's Floating
	Point Accelerator speeds up the multiply in the
	multi-instruction implementation, but doesn't see INDEX at
	all.

So it is conceivable that the F-BOX on the 8650 may speed up the modulo
computation as well.  I'd still want to see this 50% speedup
first-hand before I believed it, though.

[end of Guy's response]
[Bill Mitchell:]
I've heard from good authorities that the 8600 FPA does integer multiplications,
so that's a source of potential speedup.  They apparently did this after
finding that a lot of programs do a lot of integer multiplies, but it's
also good for sales(!): "We don't do much floating point, so we don't need
an FPA."  "Well if you do integer multiplies, our FPA will help you out!"

There was an issue of the Digital Technical Journal (?) that was devoted to
the 8600 and I think there was an article on the FBOX in there.  Your
friendly DEC salesman can probably get you a copy.
[end of Bill Mitchell's reply]
--------------------------------------------------------------------------
[the research I promised:]
I also ran the same egrep on the Ultrix-32 (V1.2) ld.c, which I forgot to
mention earlier, and got:

				if (t >= sizeof (double))
					rnd = sizeof (double);

(which is exactly the same as Guy Harris got with his egrep on BSD4.3 ld.c)

Further research:

[ quoted from the "Vax Hardware Handbook", Vol. 1-1986: pp.7-17 ]
-------------------------------------------------------------------------
Floating Point Accelerator Functions

The floating-point accelerator (Fbox) is used to decrease the time to perform
floating point instructions and some integer instructions. [clue #1]
The Fbox logic contains the following functions:
o Adder and Multiplier Logic
o 16 General Purpose Registers
[I think the last sentence in the following paragraph is the answer to the puzzle -dpl]
Operands received from the adder are multiplied by the multiplier logic
and the product is returned to the adder for rounding and normalization.
During floating-point operations, the multiplier operates on the fractions
and the adder operates on the exponent. During integer operations the
entire two's complement number is multiplied.

The adder logic performs addition, subtraction, and division and handles
the exponents for all the basic operations. The adder contains the logic to
unpack the floating-point numbers, to align fractions, and to round and
normalize the results. The operands are received from the OP bus [operand
bus -dpl] and returned to the W bus [write bus -dpl]. The Fbox receives GPR
[general purpose register -dpl] updates and Ebox data, and accesses some
Fbox registers through the W bus.
---------------------------------------------------------------------------
[end of text quoted from Vax Hardware Handbook]
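
To make the handbook's division of labor concrete, here is a rough sketch in
Python (mine, not anything from DEC) of a floating-point multiply done the way
the text describes: fractions go through a multiply, exponents go through an
add, and the result is repacked.  It uses the host's own doubles via
math.frexp and math.ldexp rather than the VAX F/D/G/H formats, it ignores the
rounding details, and the name multiply_by_parts is made up for illustration.

```python
import math

def multiply_by_parts(a, b):
    """Multiply two floats by multiplying fractions and adding exponents."""
    # frexp unpacks x into (fraction, exponent) with x == fraction * 2**exponent
    frac_a, exp_a = math.frexp(a)
    frac_b, exp_b = math.frexp(b)
    frac = frac_a * frac_b   # the "multiplier" part: fractions only
    exp = exp_a + exp_b      # the "adder" part: exponents only
    # ldexp repacks fraction * 2**exponent, renormalizing as needed
    return math.ldexp(frac, exp)

print(multiply_by_parts(6.0, 0.75))   # prints 4.5
```

For an integer multiply there is nothing to unpack, which matches the
handbook's remark that the entire two's complement number is multiplied.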

It would also seem that performing a straightforward simulation of the Fbox's
effect would require register-to-register clock tick timing info and bus
speed info ... not expecting to receive that really soon :-)... 

*** Further research:
% cc -S ld.c
% egrep "(mull2|mull3|mulw2|mulw3)" ld.s >> mulcount.ints
% wc -l < mulcount.ints
27
% grep mull2 mulcount.ints | wc -l
25
% grep mull3 mulcount.ints | wc -l
2
---> so there *are* places inside ld where an FBOX would help...
---> perhaps DEC wasn't so wrong after all...
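
For anyone who wants to repeat the tally on another program, the counts can
be taken in one pass instead of several greps.  This is only a minimal sketch
in Python; it assumes the mnemonic appears as a separate word on each line of
the assembler output, and both the function name count_mnemonics and the
three-line sample listing are made up for illustration (they are not real
ld.s output).

```python
import re
from collections import Counter

# The integer multiply mnemonics searched for in the egrep above.
INT_MULTIPLIES = ("mull2", "mull3", "mulw2", "mulw3")

def count_mnemonics(listing, mnemonics=INT_MULTIPLIES):
    """Count how many lines of an assembler listing use each mnemonic."""
    pattern = re.compile(r"\b(" + "|".join(mnemonics) + r")\b")
    counts = Counter()
    for line in listing.splitlines():
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical listing for illustration only.
sample = """\
    mull2   $12,r0
    movl    r0,r1
    mull3   r2,r3,r4
"""
counts = count_mnemonics(sample)
print(dict(counts))
```

Run against a real ld.s, the totals for mull2 and mull3 should agree with the
grep counts shown above.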

Some history: VAX Hardware Handbook 1982-83
	All the floating point boards for the 730, 750 & 780 perform
all floating point instructions, EMOD, and POLY, and also perform conversion
between floating point and integer data types.  In addition:
FP730 - executes "32-bit integer multiplication and division instructions."
FP750 - "The FP750 also executes the integer multiplication
  instruction."
FPA aka FP780 - "enhances the performance of the 32-bit integer multiplication
  instructions."
The Vax Architecture Handbook Vol.1 1986, pp.4-18 documents 43 instructions
  handled by the FP750, including MULW{23} and MULL{23}.  The specific
  instructions handled by the 86xx FBOX are not documented, but I'd guess
  that at least one of the two pairs of integer opcodes handled by the
  FP750 is also handled by the 86xx FBOX (aka FP86).

*** further, probably definitive research:

	At long last, DEC has come through with a page entitled "FBOX
instruction set", listing the instructions handled by the FBOX.  Here they
are:

ADDF*, ADDD*, ADDG*, SUBF*, SUBD*, SUBG*, DIVF*, DIVD*, DIVG*, MULF*, MULD*,
MULG*, MULH*, POLYF, POLYD, POLYG, POLYH, MULL*, EMUL, EMODF, EMODD, EMODG,
EMODH, INDEX.
(where * denotes both the 2-operand and 3-operand formats)

	Notice that this set does intersect with the instructions used by
ld.s (for MULL*, since ld.s uses 25 MULL2 and 2 MULL3 instructions).  ld.s
does not use any of the other instructions assisted or executed by the
FBOX.  The documentation shows that some opcodes are "executed" by the FBOX
and for others the FBOX merely "assists".  Those which are "assisted" by
the FBOX are MULH*, POLYH, MULL*, EMUL, EMODF, EMODD, EMODG, EMODH, & INDEX.
The remainder are "executed" by the FBOX.
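
The same comparison can be made mechanically for any other program.  The
sketch below is mine, not DEC's: it expands the starred names from the list
above into their 2- and 3-operand forms and sorts a program's opcodes into
FBOX-executed, FBOX-assisted, and everything else.  The names expand and
classify are made up for illustration, and movl is included only as an
example of an opcode that is not on the FBOX list.

```python
# Instruction names from DEC's "FBOX instruction set" page, as listed above.
# A trailing "*" stands for both the 2-operand and 3-operand forms.
FBOX_FAMILIES = """ADDF* ADDD* ADDG* SUBF* SUBD* SUBG* DIVF* DIVD* DIVG*
MULF* MULD* MULG* MULH* POLYF POLYD POLYG POLYH MULL* EMUL
EMODF EMODD EMODG EMODH INDEX""".split()

# The subset the documentation says the FBOX merely "assists".
ASSISTED_FAMILIES = "MULH* POLYH MULL* EMUL EMODF EMODD EMODG EMODH INDEX".split()

def expand(families):
    """Expand 'MULL*' into MULL2 and MULL3; leave other names unchanged."""
    opcodes = set()
    for name in families:
        if name.endswith("*"):
            opcodes.update({name[:-1] + "2", name[:-1] + "3"})
        else:
            opcodes.add(name)
    return opcodes

def classify(used):
    """Sort the opcodes a program uses by how the FBOX treats them."""
    fbox = expand(FBOX_FAMILIES)
    assisted = expand(ASSISTED_FAMILIES)
    used = {op.upper() for op in used}
    return {
        "executed": sorted(used & (fbox - assisted)),
        "assisted": sorted(used & assisted),
        "other": sorted(used - fbox),
    }

# mull2 and mull3 come from the ld.s counts; movl is an illustrative extra.
result = classify(["mull2", "mull3", "movl"])
print(result)
```

For the ld.s case this puts MULL2 and MULL3 under "assisted" and nothing under
"executed", which is the point made in the paragraph above.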

	For FBOX fans (i.e. number crunchers), I include here all (2 paragraphs)
of the (unintended) info that DEC sent along with the FBOX instruction set
table:

Note again that the FP86 (i.e. FBOX) is *identical* in both the 8600 and 8650.
-------
The VAX 8600 Fbox is composed of two modules.
	1. Fbox Adder, FBA, a L0212 located in slot AC8
	2. Fbox multiplier, FBM, a L0213 located in slot AC7

If the FBOX is not installed, the following modules are installed.
	1. Fbox Terminator Module, FTM, a L0223 in slot AC8
	2. Fbox Jumper Module, FJM, a L0218 in slot AC7

If an Fbox is installed into a system that previously did not contain an
Fbox, no changes to the power system are required. The FTM and FJM modules
are described in appendix D.

The FBM contains jumpers that the console can read with the visibility paths.
These jumpers inform the console of the Fbox revision. If either Fbox module
is changed, the FBM must be changed to update the Fbox revision number.

Appendix D lists the functions of the FTM and FJM modules.
------- end of DEC-supplied FBOX info -------

						-David

David P. Lithgow		Sr. Systems Analy./Pgmr., Univ. of Pittsburgh
USENET:  {allegra,bellcore,ihpn4!cadre,decvax!idis,psuvax1}!pitt!cisunx!dpl
CCnet(DECnet): CISVM{123}::DPL,CISVXO::DPL (I admit it: I'm a unix and VMS) 
Bitnet(Jnet):  psuvax1!dpl@pittvms.bitnet    ( systems programmer too)
ARPA: (via UUCP) pitt!cisunx!dpl@cadre.arpa
ARPA: (via Bitnet) DPL%PITTVMS.BITNET@WISCVM.WISC.EDU
CSNET: dpl%pitt@csnet-relay