[comp.sys.sun] sun bug 1008635: hardware? software? fixed?

forsyth@minster.york.ac.uk (05/11/89)

The Sun Technical Bulletin (Dec. 1988) includes the following report:

	Reference Number: 1008635
	Release: 3.2, 3.4, 3.5
	Synopsis: possible cache problem on 3/280

	Description:
	        Possible cache problems on 3/200 machines seem to cause
	        abnormal behaviour when the system is required to `page'
	        heavily.  The presence of an `ie1' card also seems to
	        contribute significantly to the occurance [sic] of failure.

Has the trouble been traced?  Was it hardware or software?  Has it been
fixed?  What was the ``abnormal behaviour''?  One application that pages
energetically sees a block of 16 bytes on a 0 mod 16 boundary cleared to
zero.  The failure is unpredictable in both time and location in the data
space.  The machine has an ie1 (all our 3/280s have an ie1).  16 bytes on a
16-byte boundary sounds remarkably like a cache line to me.  Could the
trouble be a missing vac_*flush?  The program does not fail on a 3/50.
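
The symptom described above (16 bytes cleared to zero at an address that is
0 mod 16) could be looked for with something like the following.  This is
only a minimal sketch: the names fill_pattern and find_zeroed_line and the
0xA5 fill byte are hypothetical, and LINE_SIZE of 16 is an assumption taken
from the symptom, not from any Sun documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16  /* assumed line size, inferred from the 16-byte symptom */

/* Fill a buffer with a nonzero pattern so that a cleared line stands out. */
void fill_pattern(unsigned char *buf, size_t len)
{
    memset(buf, 0xA5, len);
}

/*
 * Return the byte offset of the first block of LINE_SIZE bytes that starts
 * at an address that is 0 mod LINE_SIZE and is entirely zero, or -1 if no
 * such block exists in buf[0..len).
 */
long find_zeroed_line(const unsigned char *buf, size_t len)
{
    /* Distance from buf to the first LINE_SIZE-aligned address. */
    size_t start = (LINE_SIZE - ((uintptr_t)buf % LINE_SIZE)) % LINE_SIZE;
    size_t off, i;

    for (off = start; off + LINE_SIZE <= len; off += LINE_SIZE) {
        for (i = 0; i < LINE_SIZE; i++)
            if (buf[off + i] != 0)
                break;
        if (i == LINE_SIZE)
            return (long)off;
    }
    return -1;
}
```

A test program might call fill_pattern on a large buffer, force heavy paging,
and then call find_zeroed_line periodically; this only helps if the data
never legitimately contains an aligned run of 16 zero bytes.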

ks@ee.ecn.purdue.edu (Kirk Smith) (05/11/89)

In article <swordfish.610223134@minster.york.ac.uk> forsyth@minster.york.ac.uk writes:
>	Synopsis: possible cache problem on 3/280
>...
>Has the trouble been traced?  Was it hardware or software?

I filed the initial bug report.  It took a while to isolate, but Sun
found that code in IP forwarding was calling bcopy improperly.  The
kernel attempted to use the bcopy "hardware", which does not work if the
source and destination overlap; in fact, it trashes the cache in that
case.  Sun provided a fix for SunOS 3.X that included a modified
ip_something.o and a new movc.o that had a check in bcopy for this
improper calling condition, and would panic rather than trash the cache.
It is fixed in SunOS 4.0.

					Kirk Smith
					Purdue EE
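
The kind of overlap check described in the reply above might look roughly
like this.  It is a sketch under stated assumptions, not Sun's code: the
names regions_overlap and checked_copy are hypothetical, memcpy stands in
for the hardware-assisted copy, and a -1 return stands in for the panic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [src, src+len) and [dst, dst+len) share at least one byte. */
int regions_overlap(const void *src, const void *dst, size_t len)
{
    uintptr_t s = (uintptr_t)src;
    uintptr_t d = (uintptr_t)dst;

    if (len == 0)
        return 0;
    /* The regions overlap when the start addresses are less than len apart. */
    return (s < d) ? (d - s < len) : (s - d < len);
}

/*
 * Hypothetical stand-in for a copy routine that must never be handed
 * overlapping regions.  Arguments follow the bcopy order (src, dst, len).
 * Returns 0 on success, or -1 without copying if the regions overlap
 * (the fix described in the post panicked at this point instead).
 */
int checked_copy(const void *src, void *dst, size_t len)
{
    if (regions_overlap(src, dst, len))
        return -1;
    memcpy(dst, src, len);  /* safe: regions are known not to overlap */
    return 0;
}
```

Refusing the copy outright, rather than falling back to a slower
overlap-safe copy, matches the behaviour reported for the patched movc.o:
it exposes the bad caller instead of hiding it.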