aarons@syma.sussex.ac.uk (Aaron Sloman) (04/12/90)
My colleagues who have been porting Poplog to the DECstation 3100 running Ultrix have come across some serious bugs. The worst is that the sharable (text) segment is over-writeable: Poplog provides some "fast" procedures for system programmers. If accidentally misused these can over-write the shared segment on the DECstation. This should be trapped by the memory management system and an error generated. On DECstation Ultrix this is not detected and instead the shared file gets corrupted, so that other users can suddenly find their programs misbehaving. Protecting the executable file does not prevent this as the pager presumably ignores protection, for speed.

Poplog does not have this problem on any other operating system as far as I know. It would be interesting to know whether anyone else has met the problem and whether there are work-arounds. Unfortunately the problem is not easy to reproduce in a small program, so it may be related to the size of page-tables. (However, Poplog is not all that large - the problems arise with an image of not much more than one megabyte.)

We have reported the problem to Digital, but they have apparently not come across the problem. Comments and suggestions welcome.

When the problem is fixed (presumably when Ultrix is fixed!) Poplog will be available on DEC RISC machines with Ultrix, providing incremental compilers for Prolog, Common Lisp, Pop-11 and ML in a common environment with X11 interface.

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
EMAIL aarons@cogs.sussex.ac.uk
or: aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
    aarons%uk.ac.sussex.cogs%nsfnet-relay.ac.uk@relay.cs.net
BITNET: aarons%uk.ac.sussex.cogs@uk.ac
UUCP: ...mcvax!ukc!cogs!aarons or aarons@cogs.uucp
jg@max.crl.dec.com (Jim Gettys) (04/13/90)
Are you sure that this is what is going on?

Remember the machine has separate I and D caches.

If you are doing run-time loading of code into your address space that you will be executing, you must flush the I-cache. Failure to do so will cause random behavior as you are reporting (i.e. executing whatever code happened to be left around in the I-cache in those locations).

I've seen N lisp/scheme implementors trip over this one, most recently within the last week.... So this sounds dreadfully familiar...

- Jim
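The rule Jim describes (write the code, then flush the I-cache for exactly that address range before jumping to it) can be sketched in C as below. This is only an illustration under stated assumptions: the thread does not name the Ultrix call, so the sketch uses the GCC/Clang builtin `__builtin___clear_cache` instead, and `publish_code` is a hypothetical helper name, not anything from Poplog or Ultrix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated machine code into place and
 * then invalidate the instruction cache for the bytes just written, so the
 * CPU cannot execute stale instructions left over from that range. */
static void *publish_code(void *dst, const void *src, size_t len)
{
    char *begin = (char *)dst;

    /* The copy goes through the data side of the machine. */
    memcpy(begin, src, len);

    /* Assumption: a GCC- or Clang-compatible compiler. The builtin expands
     * to whatever the target needs to make the range visible to instruction
     * fetch, and to nothing on targets with coherent I and D caches. */
    __builtin___clear_cache(begin, begin + len);

    return dst;
}
```

A run-time loader or garbage collector would call something like this every time it writes or relocates code, and only afterwards transfer control into the destination range. The destination must also be mapped executable, which this sketch does not arrange.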
aarons@syma.sussex.ac.uk (Aaron Sloman) (04/13/90)
Previously I reported that in certain situations Ultrix allows the text segment to be corrupted on a DECstation.

jg@max.crl.dec.com (Jim Gettys) wrote in response:

> Date: 12 Apr 90 17:22:14 GMT
> Reply-To: jg@max.crl.dec.com (Jim Gettys)
> Organization: DEC Cambridge Research Lab
>
> Are you sure that this is what is going on?
>
> Remember the machine has separate I and D caches.
>
> If you are doing run-time loading of code into your address
> space that you will be executing, you must flush the
> I-cache. Failure to do so will cause random behavior as
> you are reporting. (i.e. executing whatever code happened to
> be left around in the Icache in those locations).
>
> I've seen N lisp/scheme implementors trip over this one, most
> recently within the last week.... So this sounds dreadfully
> familiar...
> - Jim

We did at first have trouble with the data cache, e.g. because a user procedure that was being executed could be relocated as a result of garbage collection that it triggered. However, this problem was overcome by flushing the cache after each garbage collection (although it took some effort to discover the correct system call: the manual was wrong)!

This, however, is not the source of our current problem. We can explicitly give a command to corrupt the text segment, then prove that it has been corrupted by using "cmp" to compare it with a saved copy. Ultrix should prevent the corruption and force an error.

I am surprised that nobody else has met this problem.

Aaron Sloman
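The behaviour Aaron expects (a store into a read-only page is trapped and the writer gets an error) can be checked on a POSIX system with a small probe along these lines. It is a sketch, not the Poplog test case: it uses an anonymous read-only mapping as a stand-in for a shared text segment, and the function name `write_to_readonly_page_is_trapped` is made up for illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe. Returns 1 if a store to a read-only page kills the
 * writer with SIGSEGV or SIGBUS, 0 if the store went through unnoticed,
 * and -1 if the probe itself could not be set up. */
static int write_to_readonly_page_is_trapped(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Stand-in for a text segment: one page mapped without PROT_WRITE. */
    void *mem = mmap(NULL, (size_t)page, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: attempt the illegal store. A correct VM system faults here. */
        volatile char *p = mem;
        p[0] = 1;
        _exit(0); /* reached only if the write was NOT trapped */
    }

    /* Parent: see how the child ended. */
    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    munmap(mem, (size_t)page);
    if (reaped != pid)
        return -1;
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        return sig == SIGSEGV || sig == SIGBUS;
    }
    return 0;
}
```

A result of 0 would correspond to the situation reported above, where the write succeeds silently. The store runs in a forked child so that the fault, when it does occur, does not take the probing process down with it.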
jg@max.crl.dec.com (Jim Gettys) (04/13/90)
Presuming your diagnosis is correct, I'm surprised too...

Garbage collection is not the only place you must be careful; when creating the procedure in the first place, you have to flush the I-cache for the addresses of the procedure (or you'll see the old data).

If you've checked everything, then it is time to report the bug formally.

- Jim
jg@max.crl.dec.com (Jim Gettys) (04/14/90)
I found out this afternoon that there is a patch for a problem that sounds like yours. You should be able to get it from the Atlanta CSC. The patch is for vm_mem.o:

"This patch fixes a problem encountered by programs which are self-modifying, or which execute data which they have written. It is seen if the executable data is paged out and then paged in for execution."

- Jim