[net.arch] oops, corrupted memory again!

dan@prairie.UUCP (04/28/86)

-------------

>Now, if only somebody would invent an architecture where all objects,
>including dynamically allocated objects, are isolated in memory, then any
>subscript error would cause an immediate memory fault.

   If I'm not mistaken, this was done on the iAPX432, using a capability-
based addressing scheme.  Dimmed the lights.  You could probably construct
such an environment on the 80286, but no one does, probably for efficiency
reasons.

   You're probably better off with a language that compiles checks into
the code, and an option to turn off those checks once you're confident
(?!) of the program.  With a capability-based architecture, you pay the
price all the time, whether you want to or not.


-- 
	Dan Frank
	    ... uwvax!geowhiz!netzer!prairie!dan
	    -or- dan@caseus.wisc.edu

kwh@bentley.UUCP (KW Heuer) (04/30/86)

In article <117@prairie.UUCP> prairie!dan (Dan Frank) writes:
[comments on overflow-checking architecture]
>   You're probably better off with a language that compiles checks into
>the code, and an option to turn [them] off...

As I mentioned, you can do it this way in C++, but when you want to use
pointers you have to copy three words instead of one.  (Or you can use
a language like Pascal, which "solves" the problem by disallowing pointer
arithmetic.)  What I was thinking of, though, was a computer with strict
architecture that could be used for development and testing; when the
program is shipped to the Real World it would presumably run on "normal"
architecture.

Karl W. Z. Heuer (ihnp4!bentley!kwh), The Walking Lint

rbutterworth@watmath.UUCP (Ray Butterworth) (05/01/86)

>    You're probably better off with a language that compiles checks into
> the code, and an option to turn off those checks once you're confident
> (?!) of the program.  With a capability-based architecture, you pay the
> price all the time, whether you want to or not.

Many years ago I worked with a language in which all arrays had to
have dimensions that were a power of two (like 4.2 malloc).  The
code which indexed into the array simply ANDed the index with the
appropriate bit mask.  This was very fast, yet it guaranteed that
any bad indexes wouldn't corrupt anything except the array being
addressed.  As a side-effect, one could use this feature to cycle
continuously through an array or could even use negative indexes
without any extra overhead.
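
Something like this C fragment captures the trick (the names and the
size are mine, just for illustration; the array size must be a power
of two so that the mask is simply SIZE-1):

	#include <stdio.h>

	#define SIZE 16                 /* must be a power of two */
	#define MASK (SIZE - 1)         /* the index is ANDed with this */

	int ring[SIZE];

	int main()
	{
		int i;

		/* Out-of-range (and even negative) indexes just wrap
		 * around, so a bad index can never touch memory outside
		 * the array itself. */
		for (i = -5; i < 40; i++)
			ring[i & MASK] = i;

		for (i = 0; i < SIZE; i++)
			printf("%d ", ring[i]);
		printf("\n");
		return 0;
	}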

jer@peora.UUCP (J. Eric Roskos) (05/02/86)

> >Now, if only somebody would invent an architecture where all objects,
> >including dynamically allocated objects, are isolated in memory, then any
> >subscript error would cause an immediate memory fault.
>
>    If I'm not mistaken, this was done on the iAPX432, using a capability-
> based addressing scheme.  Dimmed the lights.  You could probably construct
> such an environment on the 80286, but no one does, probably for efficiency
> reasons.

One problem with the 432's approach was that it was very extreme; I don't
think it's good to say "the 432 tried these approaches and it was too slow,
therefore the checking can't be efficiently implemented."

I posted some comments here (in net.arch) about a week ago on apparently
the same subject, but nobody replied to it in net.arch (although I got a
couple of replies by mail).  Of the people who replied by mail, one (whose
technical knowledge I have a high opinion of) pointed out that C compilers
exist where subscript/pointer checking is done in software, and thus it
would seem likely that similar checking could be done in hardware.

The way you could do it (a point the two people who replied seemed to
agree on) is to associate with every pointer a "minimum address" and a
"maximum address" for the object being pointed to.
Bear in mind that in C array names are just constant pointers, so
constructs like a[i] can use this method as well as plain pointer
references such as *p.  If p is a pointer to objects of type t, then to
use p you first have to assign it a value, either by referencing an
existing object or by creating a new one:

	typedef <whatever> t;
	t  a[100];
	t  *p;

	p = a;                  /* (1) */
	p = &a[40];             /* (2) */
	p = (t *)malloc(300);   /* (3) */

In cases 1 and 2, you can easily set p.base to &a[0] and p.bound to
&a[99], and set p.pointer to &a[0] for (1) and to &a[40] for (2).  So p
then carries
around with it the range of valid addresses it can point to.  (Note that
nothing says anything about what a pointer in C has to look like, so
p can easily be a 3-word struct-like object, and if you were building a
new machine to support such things, you could make the machine have
3-word address registers).

In case 3, you could have malloc set the base and bound -- though if
malloc is written in C then you'd have to provide some way to reference
the base and bound fields from within the language -- so things like
malloc would also work.  I had originally thought that some
counterexamples existed, but one of the respondents (John Gilmore) pointed
out that the counterexamples really involved semantically inconsistent
uses of the pointers (e.g., having two pointers around and changing the
bounds on one of them).

In any case, if you change p, e.g. p++, then you'd change what I called
p.pointer above, and leave p.base and p.bound alone.  If you generated an
effective address which was outside [p.base .. p.bound], then you'd
generate an addressing fault.
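
In C terms, a checked pointer might look something like the sketch
below (the struct layout and names are mine, purely for illustration;
on a real machine the three words would live in an address register
and the comparison would be done by the hardware on every reference):

	#include <stdio.h>
	#include <stdlib.h>

	typedef int t;                  /* stands in for any object type */

	struct cptr {                   /* the three-word "fat" pointer */
		t *pointer;             /* current value; the only part p++ changes */
		t *base;                /* lowest valid address */
		t *bound;               /* highest valid address */
	};

	/* What the hardware would do on every dereference. */
	t cfetch(struct cptr p)
	{
		if (p.pointer < p.base || p.pointer > p.bound) {
			fprintf(stderr, "addressing fault\n");
			abort();
		}
		return *p.pointer;
	}

	int main()
	{
		t a[100];
		struct cptr p;
		int i;

		for (i = 0; i < 100; i++)
			a[i] = i;

		p.base = &a[0];         /* cases (1) and (2) above */
		p.bound = &a[99];
		p.pointer = &a[40];

		printf("%d\n", cfetch(p));      /* in range: fine */

		p.pointer += 60;        /* arithmetic leaves base/bound alone */
		printf("%d\n", cfetch(p));      /* past a[99]: faults */
		return 0;
	}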

I don't think this checking would be that slow, although on a machine with
a narrow bus (especially those like the 8088 where you are already
fetching pointers through multiple bus cycles) fetching the range
attributes of the pointer would increase the bus time by a factor of 3.
It would also reduce the number of register variables you could have, if
you kept the bounds in registers also -- I think it would work best if you
had a machine that had registers set aside specifically for pointer
checking.  On a machine such as the 3280*, which does quadword reads from
memory because the data bus is very wide, the bus overhead would be much
less.  So the checking by this method would probably not be that bad
(certainly not as bad as the 432, which I believe had to sometimes fetch
several descriptor objects in order to validate references) at least on
larger machines (and after all, microprocessors are getting larger all the
time in terms of width of the bus, etc.).

-----
*I cite this machine because I'm more familiar with it; I suspect
 other machines like VAXes have similarly wide buses.
-- 
E. Roskos

ludemann%ubc-cs@ubc-cs.UUCP (05/03/86)

In article <780@bentley.UUCP> kwh@bentley.UUCP (KW Heuer) writes:
>[comments on overflow-checking architecture]
>         ...  What I was thinking of, though, was a computer with strict
>architecture that could be used for development and testing; when the
>program is shipped to the Real World it would presumably run on "normal"
>architecture.

A mere 7 years ago, I worked on a "real world" machine which _always_ 
had subscript range checking (and pointer validity checking) turned 
on.  It had to be very reliable (digital telephone switching).  
Definitely _not_ an AT&T product --- rather, Northern Telecom (BNR).  
I don't recall a single problem with bad pointers trashing memory.  

The machine's performance was pretty impressive (remember, this was 
1979 with a 16-bit architecture): the average instruction time was 
1usec, and most of the instructions were _not_ very RISC-y.  The only 
registers were for data and stack bases; everything else was done on 
the stack.  I don't recall any cache.  The machine had fairly wide 
micro-code, so I think there was very little overhead in the array 
index range checking.  

We did _all_ our programming in a dialect of Pascal.  If anyone had 
suggested turning off range checking on the production machine, s/he 
would have been laughed out of the building.  We derived great peace 
of mind from the built-in checking.

paul%unisoft@unisoft.UUCP (05/06/86)

<oog>

	I too spent many happy years working on a system that ALWAYS verified
pointers and array bounds ... Burroughs large systems (B6700 et al).  They
have such crude memory management (to be fair, they were designed in the
late 60s/early 70s, and have a stack architecture) that system integrity
depends on the fact that NO ONE can create machine code without using a
special trusted program called a "compiler" (i.e. only a "compiler" can
set a file's type to code file).  It takes the super-user to mark a code
file as being a "compiler" code file.  Needless to say, nobody really
misses being able to turn off range checking, except for the Fortran
types who get programs from elsewhere ...

		Paul Campbell
		..!ucbvax!unisoft!paul