[comp.virus] Arrayboundcheck in C

XPUM01@prime-a.central-services.umist.ac.uk (Dr. A. Wood) (02/13/90)

This is not a virus as such, but program mishaps cause more system
upsets and loss of data than viruses do, and users should eliminate
all other causes of error before suspecting a virus. One main cause
of mishaps is writing to arrays out of bounds. In Fortran, Algol60,
Algol68 and similar languages, writing a compiler so that it can
compile in array bound check mode is easy; but C can step pointers
along by adding integer values, which complicates the job a lot. I
don't know if there are any C compilers with full array bound check,
but Prime's C compiler hasn't got one. Bound checking array accesses
is easy; the problem is bound checking pointer accesses. I hereby
submit a possible method of bound checking pointer accesses in C
programs.

In C, I define a <table> as any group of stored values, all of the
same type, which are all adjacent in store. There are these types of
tables:-

(1) <Arrays> declared at compile time with [ ]. E.g. 'int x[12],y[4][7];'.
A pointer into an array is created by one of these forms:-
a) An array element preceded by '&', e.g.               &x[i]    &y[3][j+k]
b) An arrayname followed by 'too few' subscripts, e.g.           y[h]
c) An arrayname without subscripts, e.g.                x        y
The arrayname can be some compound form such as a struct field or the like,
e.g. 'z.c' after the declaration 'struct {int a; char c[12]; } z;'.

(2) <Allocations> alias <mallocks> created by calls of malloc() and
similar functions, which return a pointer to the allocation thus
created.

(3) <Runs>, i.e. consecutive members of a struct which are all of the
same type, e.g. 'x,y,z' in 'struct density {float x,y,z; double value;
} den;'. Pointers to them are created by prefixing a '&', e.g.
'&den.y'.

(4) Other cases where users are tempted to step a pointer over several
values, e.g.  'a,b,c,d' in the declaration 'double a,b,c,d;', are
compiler dependent and I will not consider them further.

My suggestion is for all pointer values to be accompanied by two other
pointer values which contain its safe range limits. (Thus
sizeof(<pointer type>), which == 6 in Prime C ordinarily, will become
18 in Prime C compiled in array bound check mode.) Examples are:-

declaration assumed      pointer value  lower limit       upper limit       type
int x[4];                &x[3]          &x[0]             &x[4]             int*
int x[4];                x              &x[0]             &x[4]             int*
int y[6][7];             y[i]           y[0]              y[6]              int*
struct{int w,x,y,z;}a;   &a.x           &a.w              &a.z+1            int*
int k;                   malloc(k)      (returned value)  (same + k bytes)
int s; /* not table */   &s             &s                &s+1              int*
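
As a rough illustration of the table above, the three values could be
carried around in a struct like the one below. This is only a sketch
written in ordinary C, not what a compiler in array bound check mode
would actually generate; the names int_cptr, cptr_make and
cptr_examples are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "checked pointer" to int: the value plus its two limits.
   lo is the first element of the table, hi is one past its last element. */
typedef struct {
    int *p;    /* the pointer value itself */
    int *lo;   /* lower limit */
    int *hi;   /* upper limit */
} int_cptr;

/* Attach the limits of a table of n ints starting at 'table' to p. */
static int_cptr cptr_make(int *p, int *table, size_t n)
{
    int_cptr c;
    assert(table <= p && p <= table + n);  /* p must start inside its table */
    c.p  = p;
    c.lo = table;
    c.hi = table + n;
    return c;
}

/* Some rows of the table above, written out by hand. */
static void cptr_examples(void)
{
    static int x[4];
    static struct { int w, x, y, z; } a;
    static int s;

    int_cptr c1 = cptr_make(&x[3], x, 4);    /* &x[3]: limits &x[0], &x[4] */
    int_cptr c2 = cptr_make(x, x, 4);        /* x:     limits &x[0], &x[4] */
    int_cptr c3 = { &a.x, &a.w, &a.z + 1 };  /* run w,x,y,z inside a struct */
    int_cptr c4 = cptr_make(&s, &s, 1);      /* lone int, not a table      */
    (void)c1; (void)c2; (void)c3; (void)c4;
}
```

The struct holds three pointers, which matches the tripling of
sizeof(<pointer type>) mentioned above.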

Procedure in the various uses of pointers:-    (Here, b and c are pointers)
sort of use                 example   procedure
accessing value pointed at  *b        check that b is within its limits.
pointer +- integer          b+i       check that b+i is within limits of b;
                                      copy limits of b as limits of b+i .
pointer with ++ or --       b++       (ditto)
pointer-pointer             b-c       error unless b and c have same ranges.
pointer[integer]            b[i]      treat as *(b+i) .
casting a pointer           (float*)c if casting to a pointer to a pointer, or
                                      to a pointer to a struct with a pointer
                                      member, the compiler should moan that
                                      "array bound check can't help here".

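The checks in the table above might look roughly like this if written
by hand as library functions. It is a sketch under assumptions, not a
definitive implementation: the names (int_cptr, cptr_deref_ok, cptr_add,
cptr_diff, cptr_index) are invented, errors are reported by returning 0
rather than by stopping the program, and b+i is allowed to land exactly
on the upper limit (as ordinary C permits) provided it is not then
dereferenced.

```c
#include <stddef.h>

/* Hypothetical checked pointer: value, lower limit, upper limit. */
typedef struct { int *p, *lo, *hi; } int_cptr;

/* *b : allowed only if lo <= p < hi. */
static int cptr_deref_ok(int_cptr b)
{
    return b.lo <= b.p && b.p < b.hi;
}

/* b+i : check the result against the limits of b, then copy those
   limits to the result. Offsets are used for the test so that no
   out-of-range pointer is ever formed. Returns 1 if ok, 0 on error. */
static int cptr_add(int_cptr b, ptrdiff_t i, int_cptr *out)
{
    ptrdiff_t off = b.p - b.lo;    /* where b is now        */
    ptrdiff_t len = b.hi - b.lo;   /* size of the table     */
    if (i < -off || i > len - off)
        return 0;
    *out = b;                      /* same limits as b      */
    out->p = b.lo + (off + i);
    return 1;
}

/* b-c : error unless b and c have the same ranges. */
static int cptr_diff(int_cptr b, int_cptr c, ptrdiff_t *out)
{
    if (b.lo != c.lo || b.hi != c.hi)
        return 0;
    *out = b.p - c.p;
    return 1;
}

/* b[i] : treated as *(b+i). */
static int cptr_index(int_cptr b, ptrdiff_t i, int *out)
{
    int_cptr t;
    if (!cptr_add(b, i, &t) || !cptr_deref_ok(t))
        return 0;
    *out = *t.p;
    return 1;
}
```

b++ and b-- are then just cptr_add with i equal to 1 or -1.
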
A pointer to an allocation that has been released by a call of
'free()' will then be invalid, even though the limits stored with it
are unchanged. Best not to call 'free()' when running in array bound
check mode.
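
One possible way to follow that advice without editing every call is a
wrapper that does nothing in check mode. This is only a sketch:
BOUNDCHECK is a made-up build flag standing for "array bound check
mode", and bc_free is a hypothetical name that the program would call
in place of free().

```c
#include <stdlib.h>

/* BOUNDCHECK is an assumed build flag, not a real compiler option. */
#ifndef BOUNDCHECK
#define BOUNDCHECK 1
#endif

/* Hypothetical stand-in for free(). In check mode the block is
   deliberately never released, so every pointer into it, and the
   limits stored with that pointer, remain valid. */
static void bc_free(void *p)
{
#if BOUNDCHECK
    (void)p;      /* keep the allocation alive */
#else
    free(p);      /* normal behaviour outside check mode */
#endif
}
```

The cost is that memory use only ever grows, which may be tolerable
for a debugging run but not for normal running.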
- ----------------------------------------------------------------------
This should ensure that any pointer will only point to within the
bounds of the table that it was intended to point to.
{A.Appleyard} (email: APPLEYARD@UK.AC.UMIST), Tue, 13 Feb 90 08:38:40 GMT