landauer@sun.UUCP (11/22/86)
In article <3576@utcsri.UUCP>, greg@utcsri.UUCP (Gregory Smith) writes:
> This is silly. Broken computers don't give wrong answers. They crash,
> or they log soft errors, or they act flaky. It is almost impossible to
> imagine a hardware fault that would have no visible effect other than
> to make the 'value' (whatever it may be) of the output wrong.

About ten years ago, I was debugging someone's disk driver (I didn't
write it) on a PDP11/45 (it was long enough ago that I've forgotten
which of the myriad PDP11 operating systems it was written for), and
I was reaching the end of my patience. Without an instruction-level
debugger (actually, we didn't have any debugger at all), it took quite
a while to narrow down the problem. Finally, once I thought to suspect
the hardware, I wrote a little FORTRAN program:

      x=2.0
      y=x * x
      write(6,10) x,y
   10 format( e12.6, 4x, e12.6 )
      stop
      end

and it printed out something like this:

      0.200000e+01    0.183740e+07

As you can see, the problem turned out to be that the floating point
unit was broken -- it was simply giving the wrong answers.

Now why would an idiot write a disk driver that required floating
point? Well, it had to translate the I/O requests from "block number"
to a "cylinder, head, sector" form; this required 32-bit arithmetic,
and the PDP11 floating point unit had some 32-bit arithmetic
instructions, while the PDP11's integer unit did not.

There are two morals to this one:

1. Broken computers do, occasionally, simply give wrong answers.

2. When you inherit a piece of code, make sure you understand what
   parts of the computer/system it requires.

--
Doug Landauer        Sun's Net: landauer@morocco    Phone: 415 691-7655
ARPANET (aka DDN): landauer@sun.com
UUCP: {amdahl, decwrl, hplabs, seismo, ...}!sun!landauer
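
For readers who have not met it, the "block number" to "cylinder,
head, sector" translation mentioned in the post can be sketched in C
roughly as follows. This is only an illustration, not the original
driver: the struct names, the function name block_to_chs, and the
zero-based sector numbering are all assumptions made for the example.
It does show why 16 bits are not enough: once a disk holds more than
65535 blocks, the block number no longer fits in one 16-bit word, so
the divisions have to be done at a wider width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical disk geometry; the field names are illustrative only. */
struct geometry {
    uint32_t heads;              /* tracks per cylinder */
    uint32_t sectors_per_track;  /* sectors on each track */
};

/* Result of the translation; all three fields are zero-based here. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* Translate a zero-based linear block number into cylinder/head/sector.
 * The arithmetic is done in 32 bits because the block number itself
 * can exceed 65535 on a large disk. */
static struct chs block_to_chs(uint32_t block, struct geometry g)
{
    struct chs out;
    uint32_t blocks_per_cylinder;

    assert(g.heads > 0 && g.sectors_per_track > 0);
    blocks_per_cylinder = g.heads * g.sectors_per_track;

    out.cylinder = block / blocks_per_cylinder;
    out.head     = (block % blocks_per_cylinder) / g.sectors_per_track;
    out.sector   = block % g.sectors_per_track;
    return out;
}
```

With an assumed geometry of 19 heads and 22 sectors per track, block
100000 works out to cylinder 239, head 4, sector 10.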