[comp.sources.d] Obscure Not-Quite-Bug in Compress

pf@csc.ti.com (Paul Fuqua) (06/16/88)

     I recently translated compress to Common Lisp to run it on my Lisp
machine (I don't deal with versionless filesystems any more than I have
to).  Along the way I discovered a bit of code that is, strictly
speaking, a bug, but which C doesn't catch or seem to care about.
     In the function getcode, there is a local variable bp that is a
pointer into the array buf, which is 16 8-bit-bytes long (for 16-bit
compress).  There is some hairy code that is used when not on a Vax
(which does most of the work in one instruction).  When decompressing
with 16-bit codes, as getcode grabs the last code in buf, bp starts off
pointing at buf[14], is bumped to buf[15] by one *bp++, then is bumped
to buf[16], one past the end, by another *bp++.
     At this point is the following fragment:

	/* high order bits. */
	code |= (*bp & rmask[bits]) << r_off;

*bp reads the word just off the end of buf, but rmask[bits] is always 0
in this case, so the word is unimportant and everything works.
     This bit caused me trouble because Common Lisp bounds-checks array
references (and lispms tend to crash when referencing unallocated
memory).  My correction was to check for rmask[bits] == 0 before doing
the above, so bp wouldn't reference off the end.  C, of course, doesn't
bounds-check, especially not when using pointers, and this bit of code
has been happily running on countless machines.
     Is this a bug or a feature?  I have my own opinion.
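
     For concreteness, the correction described above might look
roughly like the sketch below.  It is modeled on the fragment quoted
in this post, not copied from the compress source: the name get_code,
its parameters, and the asserts (which stand in for a bounds-checking
runtime) are assumptions for illustration.  Testing bits > 0 is the
same as testing rmask[bits] != 0, since only rmask[0] is zero.

```c
#include <assert.h>
#include <stddef.h>

/* rmask[n] keeps the low n bits of a byte; rmask[0] is 0. */
static const unsigned char rmask[9] = {
    0x00, 0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff
};

/*
 * Hypothetical stand-in for the non-VAX path of getcode: extract the
 * n_bits-wide code (9 <= n_bits <= 16) starting at bit `offset` of buf.
 * Each assert plays the role of a bounds check on the read that follows.
 */
long get_code(const unsigned char *buf, size_t len, int offset, int n_bits)
{
    const unsigned char *bp = buf + (offset >> 3);  /* first byte of the code */
    int r_off = offset & 7;                         /* bit offset inside it */
    int bits = n_bits;
    long code;

    assert(n_bits >= 9 && n_bits <= 16);

    /* low order bits */
    assert(bp < buf + len);
    code = *bp++ >> r_off;
    bits -= 8 - r_off;
    r_off = 8 - r_off;          /* now the offset into the code word */

    /* one whole byte in the middle, if the code is wide enough */
    if (bits >= 8) {
        assert(bp < buf + len);
        code |= (long)*bp++ << r_off;
        r_off += 8;
        bits -= 8;
    }

    /* high order bits: skip the read entirely when none are left */
    if (bits > 0) {
        assert(bp < buf + len);
        code |= (long)(*bp & rmask[bits]) << r_off;
    }
    return code;
}
```

With a 16-byte buf and 16-bit codes, get_code(buf, 16, 112, 16) reads
only buf[14] and buf[15]; without the bits > 0 test the third assert
would fire on buf[16].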

                              pf

Paul Fuqua
Texas Instruments Computer Science Center, Dallas, Texas
CSNet:  pf@csc.ti.com (ARPA too, sometimes)
UUCP:   {smu, texsun, im4u, rice}!ti-csl!pf

chris@mimsy.UUCP (Chris Torek) (06/16/88)

In article <51610@ti-csl.CSNET> pf@csc.ti.com (Paul Fuqua) writes:
>	/* high order bits. */
>	code |= (*bp & rmask[bits]) << r_off;
>
>*bp reads the word just off the end of buf, but rmask[bits] is always 0
>in this case, so the word is unimportant and everything works.
>     This bit caused me trouble because Common Lisp bounds-checks array
>references .... My correction was to check for rmask[bits] == 0 before doing
>the above, so bp wouldn't reference off the end.  C, of course, doesn't
>bounds-check, especially not when using pointers, and this bit of code
>has been happily running on countless machines.
>     Is this a bug or a feature?  I have my own opinion.

It is a bug, and it is not a `feature of C'; any legal C compiler could
produce a runtime error message to the effect of `reference outside of
address space'.

I would handle this by making the buffer one larger, so that there
is always something to read, even if it is not used.  For instance,
in my `PK' font reading code:

			if (rowsize < wb) {	/* get more row space */
				if (row)
					free(row);
				/* keep a slop byte */
				row = malloc((unsigned) (wb + 1));
				if (row == NULL) {
					rowsize = 0;	/* nothing allocated now */
					return (-1);	/* ??? */
				}
				rowsize = wb;
			}

			...
					/*
					 * Finally, begin a new byte, or
					 * add to the current byte, with
					 * j more bits.	 We know j <= 8-b.
					 * (If j==0, we may be writing on
					 * our slop byte, which is why we
					 * keep one around....)
					 */
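
Applied to the compress case, the same slop-byte idea might be
sketched as follows.  This is an illustration only, not the compress
source: struct code_buf, code_buf_init, and get_code_slop are
hypothetical names, and the mask is computed inline rather than taken
from the rmask table.  The final read is left unguarded, as in the
original fragment, but it now always lands inside the array.

```c
#include <assert.h>
#include <string.h>

#define MAXBITS 16      /* widest code, as in 16-bit compress */

/* Hypothetical buffer type: MAXBITS data bytes plus one slop byte. */
struct code_buf {
    unsigned char bytes[MAXBITS + 1];
};

/* Zero the whole buffer so the slop byte holds a defined value. */
void code_buf_init(struct code_buf *cb)
{
    memset(cb->bytes, 0, sizeof cb->bytes);
}

/*
 * Extract the n_bits-wide code (9 <= n_bits <= 16) starting at bit
 * `offset` of the data bytes.  The last read may touch the slop byte,
 * but only with a mask of 0, so its contents never reach the result.
 */
long get_code_slop(const struct code_buf *cb, int offset, int n_bits)
{
    const unsigned char *bp = cb->bytes + (offset >> 3);
    int r_off = offset & 7;
    int bits = n_bits;
    long code;

    assert(n_bits >= 9 && n_bits <= MAXBITS);
    assert(offset >= 0 && offset + n_bits <= 8 * MAXBITS);

    /* low order bits */
    code = *bp++ >> r_off;
    bits -= 8 - r_off;
    r_off = 8 - r_off;

    /* one whole byte in the middle, if the code is wide enough */
    if (bits >= 8) {
        code |= (long)*bp++ << r_off;
        r_off += 8;
        bits -= 8;
    }

    /* high order bits; (1 << bits) - 1 is 0 when bits == 0 */
    assert(bp < cb->bytes + sizeof cb->bytes);   /* in bounds thanks to slop */
    code |= (long)(*bp & ((1 << bits) - 1)) << r_off;
    return code;
}
```

The cost is one byte of storage per buffer; in exchange the inner
loop carries no extra test.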

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris