daw@houxs.UUCP (D.WOLVERTON) (12/06/86)
In some systems, the hardware floating point (fp) unit is _optional_.
The Itty Bitty Machines (IBM) PC is a good example. From the point of
view of a compiler writer, how does one deal with that uncertainty?
[<--this one's a rhetorical question]

I know of, or can imagine, several flavors of code generation in the
face of this situation:

1) Code generation emits calls to a floating point library. This
library checks for the presence of fp hardware, and uses the fp
hardware if it is present, otherwise it emulates the operation.

2) Like (1), but the test for the fp unit is made before the function
call. The code is larger, but in the case where the fp unit is present
it is faster because no function call was performed.

3) Code generation pretends that the fp unit will always be present,
so it emits code which uses the fp unit directly into the instruction
stream. If a fp unit is not present, the hardware arranges for a trap
to occur which transfers control to the OS. At this point either:

   a) The OS recognizes that a fp operation was intended, and
   completes the operation by executing its own emulation code.
   Control is then transferred back to the user code.

   b) The OS recognizes that a fp operation was intended, and calls a
   special fp emulation entry point in the user code. When the
   function which emulates the fp operation is finished, it transfers
   control back to the user code.

4) Code generation emits code which always causes transfer to the OS,
e.g. by illegal opcodes or TRAP instructions. The OS then proceeds
like (3a) or (3b) above except that the fp unit may be used if
present.

I like (3) the best. In the case where a fp unit is present, the
performance is no worse than if it were assumed that the fp unit would
_always_ be present. If a fp unit is not present, the user's code will
still execute, but more slowly. Furthermore, the user can upgrade his
floating point performance by adding the fp unit, without re-compiling
his code.

(3a) has the slight additional advantage over (3b) that the user
programs will be smaller because they do not have to carry the baggage
of a fp emulation library.

However, (3) also requires that the fp unit architecture is known a
priori. It also does not account for a need to support more than one
incompatible fp unit.

Now the questions:

Are there other scenarios in use?

Anyone have a different choice for "best"? Why?

Which is "best" if more than one fp unit must be supported, or if the
architecture of the fp unit is not known a priori?

===================================================================
David Wolverton
...!ihnp4!houxs!daw
AT&T Information Systems, Holmdel
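The difference between options (1) and (2) in the post above is only where the test for the fp unit sits. A minimal sketch, assuming a hypothetical `Machine` object that records which path each operation takes; none of these names come from the thread, and the "hardware" and "emulation" routines are plain Python stand-ins:

```python
class Machine:
    """Hypothetical target that records which path each operation took."""

    def __init__(self, has_fpu):
        self.has_fpu = has_fpu
        self.trace = []

    def fpu_add(self, a, b):
        # Stands in for a single hardware fp instruction.
        self.trace.append("fpu")
        return a + b

    def emulate_add(self, a, b):
        # Stands in for a software emulation routine.
        self.trace.append("emulate")
        return a + b


def lib_add(m, a, b):
    """Option (1): every add is a call; the library tests for the fp unit."""
    m.trace.append("call lib_add")
    return m.fpu_add(a, b) if m.has_fpu else m.emulate_add(a, b)


def inline_add(m, a, b):
    """Option (2): the test is emitted at the use site, so a call is
    made only when the fp unit is absent."""
    if m.has_fpu:
        return m.fpu_add(a, b)
    m.trace.append("call emulate_add")
    return m.emulate_add(a, b)


# Usage: with an fp unit present, option (2) skips the call entirely.
m1, m2 = Machine(has_fpu=True), Machine(has_fpu=True)
lib_add(m1, 1.5, 2.0)
inline_add(m2, 1.5, 2.0)
print(m1.trace, m2.trace)
```

The trace lists make the trade-off visible: option (1) pays for a call on every operation, while option (2) pays in code size because the test is repeated at every use site.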
merlin@hqda-ai.UUCP (David S. Hayes) (12/08/86)
In article <394@houxs.UUCP>, daw@houxs.UUCP (D.WOLVERTON) writes:
> In some systems, the hardware floating point (fp) unit is _optional_.
> 4) Code generation emits code which always causes transfer to the
> OS, e.g. by illegal opcodes or TRAP instructions. The OS then
> proceeds like (3a) or (3b) above except that the fp unit may be used
> if present.
>
> Are there other scenarios in use?

According to my memory (which operates on fuzzy logic :-), the Sun-2
had several different optional FPU boards. The compiler would generate
code that always trapped to the OS. Then:

No FPU: OS calls a subroutine to do the work.

FPU: OS replaces the user instruction (a 68010 TRAP) with the
equivalent hardware instruction. The user program is then restarted at
the new instruction, which now causes the FPU to do the work.

I like this scheme. The overhead of going into the OS is only paid
once (assuming you actually have an FPU). Once the OS changes the TRAP
instruction, further FP work goes directly to the hardware, without
software intervention.

Of course, this has to be done once for each distinct FP instruction
in the program. In a large program, that could take a while. On the
other hand, the most frequently executed instructions should be
replaced fairly quickly.

Anyone (particularly old Sun engineers) care to correct my memory?
--
David S. Hayes, The Merlin of Avalon
PhoneNet: (202) 694-6900
ARPA: merlin%hqda-ai@brl
UUCP: ...!seismo!sundc!hqda-ai!merlin
greg@utcsri.UUCP (Gregory Smith) (12/08/86)
In article <394@houxs.UUCP> daw@houxs.UUCP (D.WOLVERTON) writes:
>In some systems, the hardware floating point (fp) unit is _optional_.
>The Itty Bitty Machines (IBM) PC is a good example. From the
>point of view of a compiler writer, how does one deal with
>that uncertainty?

As a compiler writer, you should provide an option to directly use the
fp unit. Run-time subroutines will be used otherwise. If you are
writing distribution software, which must run whether an fp unit
exists or not, and which uses the fpu if it does exist, then you have
this problem.

>
>I know of, or can imagine, several flavors of code generation
>in the face of this situation:
>
>1) Code generation emits calls to a floating point library.
>This library checks for the presence of fp hardware, and uses
>the fp hardware if it is present, otherwise it emulates the operation.
>
>2) Like (1), but the test for fp unit is made before the function
>call.
>
>3) Code generation pretends that the fp unit will always be present,
>so it emits code which uses the fp unit directly into the instruction
>stream. If a fp unit is not present, the hardware arranges for a trap
>to occur which transfers control to the OS [ and the fp op is emulated..]
>4) Code generation emits code which always causes transfer to the
>OS, e.g. by illegal opcodes or TRAP instructions. The OS then
>proceeds like (3a) or (3b) above except that the fp unit may be used
>if present.

5) The program contains a jump table to fp routines. At program
start-up, the presence or absence of the fp unit is determined, and
the jump table is modified to point either to routines which use the
fp hardware, or to routines which do the work in software. The
generated code then makes calls indirectly via this jump table. So no
testing is done at run-time once the table is set up.

Even better, but a little weirder: The code directly calls routines
which use the fp hardware. If there is no fp unit, the start-up code
puts a jump instruction at the start of each routine, which jumps to
the equivalent software subroutine. This makes the code a little
faster when an fp unit is present. When it isn't, the extra jump won't
matter much anyway.
--
----------------------------------------------------------------------
Greg Smith University of Toronto UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...
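The jump table of scheme (5) above maps naturally onto a table of function references filled in once at start-up. A minimal sketch under stated assumptions: `build_fp_table`, `compiled_expression` and the `hw_*`/`sw_*` routines are hypothetical names, and both sets of routines are ordinary Python arithmetic standing in for hardware-driving and software-emulation code:

```python
def hw_add(a, b):
    return a + b  # stand-in for a routine that drives the fp hardware


def hw_mul(a, b):
    return a * b  # stand-in for a routine that drives the fp hardware


def sw_add(a, b):
    return a + b  # stand-in for a software emulation routine


def sw_mul(a, b):
    return a * b  # stand-in for a software emulation routine


def build_fp_table(has_fpu):
    """Run once by the start-up code; no tests are made afterwards."""
    if has_fpu:
        return {"add": hw_add, "mul": hw_mul}
    return {"add": sw_add, "mul": sw_mul}


def compiled_expression(fp, x, y):
    """Roughly what generated code for (x + y) * y would do: every fp
    operation is an indirect call through the table."""
    return fp["mul"](fp["add"](x, y), y)


# Usage: the probe result is passed in here; a real start-up routine
# would detect the hardware itself.
fp = build_fp_table(has_fpu=False)
print(compiled_expression(fp, 1.0, 2.0))
```

The cost of this design is one indirection per operation, in exchange for never testing for the fp unit after start-up.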
johnl@ima.UUCP (John R. Levine) (12/09/86)
In article <394@houxs.UUCP> daw@houxs.UUCP (D.WOLVERTON) writes:
>In some systems, the hardware floating point (fp) unit is _optional_.
> ...
>I know of, or can imagine, several flavors of code generation
>in the face of this situation:
>
>1) Code generation emits calls to a floating point library.
>This library checks for the presence of fp hardware, and uses
>the fp hardware if it is present, otherwise it emulates the operation.

This is the most common in PC languages I've seen.

>2) Like (1), but the test for fp unit is made before the function
>call. The code is larger, but in the case where the fp unit is
>present it is faster because no function call was performed.

Never seen it. PC compilers usually are more concerned with small code
size than fast execution.

>3) Code generation pretends that the fp unit will always be present,
>so it emits code which uses the fp unit directly into the instruction
>stream. If a fp unit is not present, the hardware arranges for a trap
>to occur which transfers control to the OS. ...

This is what the PDP-11 versions of Unix always did. Originally, the
FP emulator was linked into all of the executables which caught their
own illegal instruction faults and then did the emulation. More recent
versions have moved the FP emulation into the OS so that you can just
emit code that assumes that the floating point is present.

Unfortunately, this trick does not work on PCs and other 8088 machines
because if you have no 8087, your floating point instructions go into
outer space and hang or return random results. One clever trick used
in the PC/IX version of Unix is this: Every floating point instruction
has to be preceded by a one-byte "wait" instruction to make sure that
the FP unit has finished the preceding instruction. It also turns out
that the first byte of all FP instructions is DC, DD, or DE hex. When
the assembler emits an FP instruction, rather than emitting a wait
instruction, it emits the first byte of an INT instruction which
causes a software trap. The trap number is determined by the next byte
in the instruction stream, which is the DC, DD, or DE. When the OS
gets such an interrupt, it checks to see if the system has an 8087. If
so, it patches the INT instruction to a WAIT and returns to it, so
that the hardware executes the instruction. Otherwise it emulates the
operation and returns. This means that there is a trap for each
instruction the first time it is encountered in the program, but if
there is an 8087, the program runs at full speed after that.

The 80286 and its successors make this hack unnecessary, since they
have a bit you can set to force traps on execution of unimplemented
instructions.
--
John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or
organization.
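The patch-on-first-trap trick described above can be simulated on a byte buffer. This is a minimal sketch, not PC/IX source: `assemble_fp`, `fp_trap`, the `emulate` callback and the sample instruction bytes are hypothetical. Only the opcode values 0xCD (first byte of the 8086 `INT n` instruction) and 0x9B (`WAIT`) are standard encodings.

```python
INT_OPCODE = 0xCD   # first byte of the two-byte 8086 "INT n" instruction
WAIT_OPCODE = 0x9B  # one-byte 8086 WAIT instruction


def assemble_fp(fp_bytes):
    """Emit an fp instruction with INT's first byte where WAIT would go.

    The first fp byte then doubles as the interrupt number.
    """
    return bytearray([INT_OPCODE]) + bytearray(fp_bytes)


def fp_trap(code, pc, has_8087, emulate):
    """Simulated OS handler for the trap raised at offset pc.

    Returns the offset at which the user program resumes.
    """
    if code[pc] != INT_OPCODE:
        raise ValueError("not an unpatched fp site")
    if has_8087:
        code[pc] = WAIT_OPCODE  # patch once; later passes never trap
        return pc               # re-execute as WAIT + fp instruction
    # Hypothetical callback: emulates the fp instruction that starts at
    # pc + 1 and returns its length in bytes.
    length = emulate(code, pc + 1)
    return pc + 1 + length


# Usage: one fp instruction, represented by two placeholder bytes.
code = assemble_fp([0xDC, 0xC1])
resume = fp_trap(code, 0, has_8087=True, emulate=lambda c, p: 2)
print(code.hex(), resume)
```

With an 8087 present the site is rewritten in place and re-executed, so the trap cost is paid only on the first pass; without one, the bytes stay as they are and every pass goes through the emulator.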
guy@sun.uucp (Guy Harris) (12/09/86)
> Anyone (particularly old Sun engineers) care to correct my memory?
Yes. The Sun-2 did, in fact, have an FPU option, namely a Sky floating
point board. However, unless we did so in release 1.x, Sun NEVER generated
code to trap to the OS. In 2.0, I think the compiler generated subroutine
calls for floating-point operations. By default, the subroutines either
jumped to software floating-point routines or to Sky floating-point
routines, depending on whether the program detected that there was a Sky
board on the machine when it started (the C startup code did this check).
An option could tell the compiler to generate direct calls to the Sky
routines.
In 3.0 and later releases, the same scheme was used to support the 3
possibilities for Sun-3 floating point support: no hardware, MC68881, and
FPA. There were now options to tell the compiler to generate code to call
the "switched" floating-point routines (that used the information on what
hardware was present to choose which routines to jump to), to call the
software routines directly, to call the Sky routines, to use the 68881
instructions, or to use the FPA "instructions" ("move"s, etc. to the FPA
registers).
Since it took *several* 68010 instructions to make the Sky board perform a
floating-point operation (unless you're talking about the 68881 or, in some
cases, the FPA, there aren't any single-instruction floating-point
operations on Suns; there certainly weren't any on the Sun-2), you couldn't
just replace a TRAP with the operation in question. And, since many Sun-2s
didn't have a Sky board, the overhead of doing floating point in the OS
would have been prohibitive in most cases, so the OS certainly wouldn't have
done floating-point computations if there wasn't a Sky board.
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
stuart@bms-at.UUCP (Stuart D. Gathman) (12/10/86)
In article <394@houxs.UUCP>, daw@houxs.UUCP (D.WOLVERTON) writes:
> In some systems, the hardware floating point (fp) unit is _optional_.
> The Itty Bitty Machines (IBM) PC is a good example. From the
> 3) Code generation pretends that the fp unit will always be present,
> so it emits code which uses the fp unit directly into the instruction
> stream. If a fp unit is not present, the hardware arranges for a trap
> to occur which transfers control to the OS. At this point either:
> 4) Code generation emits code which always causes transfer to the
> OS, e.g. by illegal opcodes or TRAP instructions. The OS then

With the *86/*87 chips, there is a very elegant solution combining 3 &
4. Both emulation and real code are easily defined to be identical
except by a constant difference in the first byte. Then the OS can
select for the presence or absence of the chip at program load time by
using a relocation table to modify the initial instruction bytes if
the *87 is absent. This way you get optimal emulator performance and
optimal hardware performance.
--
Stuart D. Gathman <..!seismo!dgis!bms-at!stuart>
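The load-time variant above amounts to a fixup pass over the program image. A minimal sketch under stated assumptions: `load_program` and the fixup list are hypothetical, and the post does not say what the constant difference is. The delta used here (turning a WAIT byte, 0x9B, into an INT byte, 0xCD) is borrowed from the PC/IX description earlier in the thread purely for illustration.

```python
WAIT_TO_INT_DELTA = 0xCD - 0x9B  # assumed delta: WAIT byte -> INT byte


def load_program(image, fp_fixups, has_87, delta=WAIT_TO_INT_DELTA):
    """Return a loaded copy of image, patched only when no *87 is present.

    fp_fixups lists the offset of the first byte of each fp instruction
    and plays the role of the relocation table.
    """
    code = bytearray(image)
    if not has_87:
        for offset in fp_fixups:
            code[offset] = (code[offset] + delta) & 0xFF
    return code


# Usage: two fp sites (offsets 0 and 4) around one unrelated byte.
image = bytes([0x9B, 0xDC, 0xC1, 0x90, 0x9B, 0xDD, 0xC2])
print(load_program(image, [0, 4], has_87=False).hex())
```

Because the patching happens once at load time, neither configuration pays a per-instruction trap-and-patch cost at run time.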
spain@alliant.UUCP (12/10/86)
In article <394@houxs.UUCP> daw@houxs.UUCP (D.WOLVERTON) writes:
>
>In some systems, the hardware floating point (fp) unit is _optional_.
>...
> Are there other scenarios in use?

I am familiar with one more mechanism, call it 3.5, which goes
something like:

3.5) Code generation pretends that the fp unit will always be present,
so it emits code which uses the fp unit directly in the instruction
stream. If a fp unit is not present, the "hardware", in the form of
the machine's microcode, emulates the instruction using the machine's
integer hardware. No OS trapping is involved and there is no change of
control from the user's code.
jc@piaget.UUCP (John Cornelius) (12/11/86)
David Wolverton gives 3 of the most common methods for doing floating
point in an environment where the availability of floating point
hardware is unknown.

I suggest that the most common method, however, is to assume that
floating point hardware does not exist. In Unix this is accomplished
by having the cc command map to cc -f which uses library routines that
do not check for the presence of floating point hardware.

If floating point hardware is subsequently installed, the C compiler
invocation routine (/bin/cc usually) is changed to cause pass 2 to
emit actual floating point code of the desired type.
--
John Cornelius (...!sdcsvax!piaget!jc)