nr@notecnirp.Princeton.EDU (Norman Ramsey) (07/28/89)
I am writing a code generator for a DECstation 3100 (MIPS R2000 architecture) running Ultrix Worksystem V2.0 (rev. 7) System #1: Fri Mar 3 19:46:51 EST 1989. (I am told this is equivalent to Ultrix 3.0.) Instead of generating assembly code, I generate machine code directly into a file. At run time, this code is loaded into the data space (the heap) and then branched to.

My programs are failing in nonrepeatable ways, and the nonrepeatability has made them damned difficult to debug. I did find one unusual behavior that I want to ask the net about.

I have a piece of straight-line code (40 instructions) that consists entirely of adds and stores. The stores are all fullword (4-byte) stores, and they write 14 consecutive word locations. The locations are stored in the order 2 1 5 4 3 8 7 6 11 10 9 14 13 12, so the 2nd word is stored first, then the 1st, then the 5th, and so on. The stores go to new memory on the heap that has never been written before, so every location is 0 until it is stored.

On one run, not all the stores went to the right places. In what follows, "store N" means the Nth store executed, not location N. Post-mortem analysis showed that stores 1 through 5 went to the right locations. Stores 6 and 7 landed 12 bytes (three words) lower than they should have, i.e. in locations 5 and 4 instead of locations 8 and 7. Store 8 went to the correct location. I could not find where stores 9 and 10 went (perhaps their values were overwritten by a later store). Store 11 was correctly placed. I could not find where store 12 went. Store 13 landed 24 bytes (six words) low, i.e. in location 7 instead of location 13. Store 14 was correctly placed.

I observed this pattern in memory after some small number of instructions (<100?) had executed. The program then got a segmentation fault when C code fetched a pointer from location 14, which should have been written by store 12. Since that location had never been stored, its contents were zero, and the fault came when my C code tried to dereference the zero.
I isolated the offending code fragment and ran it several hundred thousand times in an effort to make it, and it alone, fail. It works fine by itself, both in the text segment and in the data segment. I should add that I'm not doing anything fancy with delay slots; all the delay slots are filled with nops (add $0,$0,$0).

Question: Can anyone out in net-land envision a failure mode (hardware or software) that would either lead to the results I describe or cause successive runs of the same program to behave very differently?

I would be happy to hear from anyone via email (nr@princeton.edu, ...!allegra!princeton!nr) or by phone (609/452-5135). Please note I am sending followups to comp.sys.dec.

Norman Ramsey
nr@princeton.edu