[comp.arch] Broken computers giving wrong answers

landauer@sun.UUCP (11/22/86)

In article <3576@utcsri.UUCP>, greg@utcsri.UUCP (Gregory Smith) writes:
> This is silly. Broken computers don't give wrong answers. They crash,
> or they log soft errors, or they act flaky. It is almost impossible to
> imagine a hardware fault that would have no visible effect other than
> to make the 'value' (whatever it may be) of the output wrong.

About ten years ago, I was debugging a disk driver that someone else had
written for a PDP11/45 (it was long enough ago that I've forgotten which of
the myriad PDP11 operating systems it ran under), and I was reaching the
end of my patience.  We had no debugger at all, let alone an
instruction-level one, so it took quite a while to narrow down the
problem.  Finally, once I thought to suspect the hardware, I wrote a little
FORTRAN program:

	x=2.0
	y=x * x
	write(6,10) x,y
10	format( e12.6, 4x, e12.6 )
	stop
	end

and it printed out something like this:

0.200000e+01    0.183740e+07

The second number should have been 0.400000e+01.  As you can see, the
problem turned out to be that the floating point unit was broken -- it was
simply giving the wrong answers.

Now why would an idiot write a disk driver that required floating point?
Well, it had to translate the I/O requests from "block number" to a
"cylinder, head, sector" form; this required 32-bit arithmetic, and the
PDP11 floating point unit had some 32-bit arithmetic instructions, while the
PDP11's integer unit did not.
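
For anyone who hasn't had to do it, the translation described above looks
roughly like the sketch below.  This is not the original driver (which is
long gone, and was not written in C); the names block_to_chs, struct
geometry and struct chs are made up for illustration, the numbering is
assumed to be zero-based throughout, and the geometry is whatever the
caller passes in.  The point is only that block numbers outgrow 16 bits
quickly, so the divide and remainder need a 32-bit dividend.

```c
#include <stdint.h>

/* Hypothetical drive geometry: tracks per cylinder and sectors per track. */
struct geometry {
    uint32_t heads;
    uint32_t sectors;
};

/* Result of the translation; all three fields are zero-based here. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/*
 * Translate a linear block number into cylinder/head/sector form.
 * The block number is held in 32 bits because a disk of any real
 * size has more than 65535 blocks.
 */
struct chs block_to_chs(uint32_t block, struct geometry g)
{
    struct chs addr;
    uint32_t blocks_per_cylinder = g.heads * g.sectors;
    uint32_t within_cylinder = block % blocks_per_cylinder;

    addr.cylinder = block / blocks_per_cylinder;
    addr.head     = within_cylinder / g.sectors;
    addr.sector   = within_cylinder % g.sectors;
    return addr;
}
```

With an illustrative geometry of 19 heads and 22 sectors per track, block
100000 comes out as cylinder 239, head 4, sector 10.  If the unit doing
that division is broken, every one of those numbers can be quietly wrong,
and the driver will cheerfully seek to the wrong place.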

There are two morals to this one:

1. Broken computers do, occasionally, simply give wrong answers.

2. When you inherit a piece of code, make sure you understand
   what parts of the computer/system it requires.
--
	Doug Landauer		    Sun's Net:		 landauer@morocco
	Phone:   415 691-7655	    ARPANET (aka DDN):   landauer@sun.com
	UUCP:  {amdahl, decwrl, hplabs, seismo, ...}!sun!landauer