[comp.unix.ultrix] Problems with Ultrix on DECStation

aarons@syma.sussex.ac.uk (Aaron Sloman) (04/12/90)

My colleagues who have been porting Poplog to the DECStation 3100
running Ultrix have come across some serious bugs.

The worst is that the sharable (text) segment is over-writeable: Poplog
provides some "fast" procedures for system programmers. If accidentally
misused these can over-write the shared segment on the DECStation. This
should be trapped by the memory management system and an error
generated. On DECStation Ultrix this is not detected and instead the
shared file gets corrupted, so that other users can suddenly find their
programs misbehaving. Protecting the executable file does not prevent
this as the pager presumably ignores protection, for speed.
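
A minimal sketch of the behaviour expected here, assuming a POSIX system
with sigaction and sigsetjmp: a store into the text segment should raise
SIGSEGV (or SIGBUS on some systems) rather than succeed. The names
probe_target and text_write_is_trapped are hypothetical, and this is not
Poplog code. The probe stores a byte back unchanged, so nothing is
damaged even on a system that fails to trap the write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

static sigjmp_buf probe_env;

/* Signal handler: jump back into the probe, reporting which signal fired. */
static void on_fault(int signo)
{
    siglongjmp(probe_env, signo);
}

/* Hypothetical target: its machine code lives in the text segment. */
int probe_target(void)
{
    return 42;
}

/* Returns 1 if a store into the text segment was trapped by the memory
 * management system, 0 if the store silently succeeded. */
int text_write_is_trapped(void)
{
    struct sigaction sa, old_segv, old_bus;
    /* Function-to-object pointer conversion is implementation-defined,
     * but works on the usual POSIX platforms. */
    volatile unsigned char *p =
        (volatile unsigned char *)(uintptr_t)probe_target;
    int trapped;

    assert(p != NULL);

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        *p = *p;        /* store the same byte back: harmless if allowed */
        trapped = 0;    /* reached only if the store was not trapped */
    } else {
        trapped = 1;    /* the handler jumped back here */
    }

    /* Restore whatever handlers were installed before the probe. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return trapped;
}
```

A result of 0 from text_write_is_trapped() would match the failure
described above, where the write goes through undetected.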

Poplog does not have this problem on any other operating system as
far as I know.

It would be interesting to know whether anyone else has met the problem
and whether there are work-arounds. Unfortunately the problem is not
easy to reproduce in a small program, so it may be related to the size
of page-tables. (However, Poplog is not all that large - the problems
arise with an image of not much more than one megabyte.)

We have reported the problem to Digital, but they have apparently not
come across the problem.

Comments and suggestions welcome.

When the problem is fixed (presumably when Ultrix is fixed!) Poplog will
be available on DEC RISC machines with Ultrix, providing incremental
compilers for Prolog, Common Lisp, Pop-11 and ML in a common environment
with X11 interface.

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
    EMAIL   aarons@cogs.sussex.ac.uk
or:
            aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
            aarons%uk.ac.sussex.cogs%nsfnet-relay.ac.uk@relay.cs.net
    BITNET: aarons%uk.ac.sussex.cogs@uk.ac
    UUCP:     ...mcvax!ukc!cogs!aarons
            or aarons@cogs.uucp

jg@max.crl.dec.com (Jim Gettys) (04/13/90)

Are you sure that this is what is going on?

Remember the machine has separate I and D caches.

If you are doing run-time loading of code into your address
space that you will be executing, you must flush the
I-cache.  Failure to do so will cause random behavior as
you are reporting. (i.e. executing whatever code happened to
be left around in the Icache in those locations).
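
A rough sketch of the flush step, assuming a GCC or Clang compiler: the
builtin __builtin___clear_cache is a stand-in for whatever system call
the target OS provides (the Ultrix call is not named in this thread),
and install_code is a hypothetical helper name. The point is only that
the flush covers exactly the address range just written, and happens
before anything jumps into it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy freshly generated machine code into its final location, then make
 * the instruction fetch path see the new bytes. Returns dst.
 * Assumption: dst is already mapped with execute permission by the caller;
 * this sketch does not arrange that. */
void *install_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    memcpy(dst, src, len);              /* the writes go through the D-cache */

    /* GCC/Clang builtin: synchronise the I-cache for [dst, dst + len).
     * On machines with coherent caches this compiles to nothing. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);

    return dst;
}
```

The same call would be needed after anything else that moves or
rewrites code, such as a relocating garbage collector.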

I've seen N lisp/scheme implementors trip over this one, most
recently within the last week....  So this sounds dreadfully
familiar...
				- Jim

aarons@syma.sussex.ac.uk (Aaron Sloman) (04/13/90)

Previously I reported that in certain situations Ultrix allows the
text segment to be corrupted on a DECstation.

jg@max.crl.dec.com (Jim Gettys) wrote in response:

> Date: 12 Apr 90 17:22:14 GMT
> Reply-To: jg@max.crl.dec.com (Jim Gettys)
> Organization: DEC Cambridge Research Lab

> Are you sure that this is what is going on?
>
> Remember the machine has separate I and D caches.
>
> If you are doing run-time loading of code into your address
> space that you will be executing, you must flush the
> I-cache.  Failure to do so will cause random behavior as
> you are reporting. (i.e. executing whatever code happened to
> be left around in the Icache in those locations).
>
> I've seen N lisp/scheme implementors trip over this one, most
> recently within the last week....  So this sounds dreadfully
> familiar...
> 				- Jim

We did at first have trouble with the data cache, e.g. because a
user procedure that was being executed could be relocated as a
result of garbage collection that it triggered. However, this
problem was overcome by flushing the cache after each garbage
collection (although it took some effort to discover the correct
system call: the manual was wrong)!

This, however, is not the source of our current problem. We can
explicitly give a command to corrupt the text segment, then prove
that it has been corrupted by using "cmp" to compare it with a saved
copy. Ultrix should prevent the corruption and force an error.
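
The check with "cmp" can be approximated in a few lines of C, for anyone
who wants to script it. This is only a sketch: first_difference is a
hypothetical name, and the two paths would be the installed executable
and the saved copy. Note that cmp itself reports byte positions counting
from 1, whereas this sketch counts from 0.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns -1 if they are identical, -2 if either cannot be opened,
 * otherwise the 0-based offset of the first differing byte (a file
 * ending early counts as a difference at that offset). */
long first_difference(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    long offset = 0;
    long result = -1;

    assert(path_a != NULL && path_b != NULL);

    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -2;
    }

    for (;;) {
        int ca = getc(fa);
        int cb = getc(fb);
        if (ca != cb) {         /* differing bytes, or one file ended first */
            result = offset;
            break;
        }
        if (ca == EOF)          /* both ended together: identical */
            break;
        offset++;
    }

    fclose(fa);
    fclose(fb);
    return result;
}
```

Any result other than -1 for the executable against its saved copy
would confirm that the file on disk has changed.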

I am surprised that nobody else has met this problem.

Aaron Sloman

jg@max.crl.dec.com (Jim Gettys) (04/13/90)

Presuming your diagnosis is correct, I'm surprised too...

Garbage collection is not the only place you must be careful;
when creating the procedure in the first place, you have to flush the
I cache (or you'll see the old data) for the addresses of the procedure.

If you've checked everything, then it is time to report the bug formally.
				- Jim

jg@max.crl.dec.com (Jim Gettys) (04/14/90)

I found out this afternoon that there is a patch for a problem that
sounds like yours; you should be able to get it from the Atlanta CSC.

The patch is for vm_mem.o.

"This patch fixes a problem encountered by programs which are self-modifying,
or which execute data which they have written.  It is seen if the
executable data is paged out and then paged in for execution."
				- Jim