[net.lang.c] Why does Kernighan pg 106 work?

physics@utcsstat.UUCP (08/09/83)

Here is a fragment of code based directly on p.106,107
of Kernighan & Ritchie:

#define SIZE 30000
#define BLOCKS 15000
main() {
	char *blkptr[BLOCKS]; /* Pointer to blocks of text, see f1 */
	int n, m;
	if((n = f1(blkptr)) >= 0){
		m = f2(blkptr,n); writestrings(blkptr,m);
	}
}
f1(blkptr)
char *blkptr[];
{
	char s[SIZE]; /* Here is where the text is */
	int c, i = 0; /* c is int so that EOF can be told apart from a char */
	blkptr[0] = s;
	while(i < (SIZE-1) && (c = getchar()) != EOF)
		s[i++] = c;
	s[i++] = '\0'; return(i);
}
f2(blkptr,n)
char *blkptr[];
int n;
{	int i, nblock = 0;
	for(i=0; i<n; i++){
		if(something){ /* placeholder test from the original */
			*(blkptr[0] + i) = '\0';
			blkptr[++nblock] = blkptr[0] + (i + 1);
		}
	}
	return(nblock);
}
This all works fine, except that it is limited to an input size
of 30,000 chars, it eats a lot of memory, and the
application doesn't require that
the whole thing be accessed at once.  Obvious solution: set
SIZE to a round number like 512, add appropriate flags to
f1, and change the if in main() to a while.  Except, when
one does that, on exit from f1 the array s gets garbage in
it.  Changing the declaration of s to static char s[SIZE];
fixes the problem.  But the question is: what's the difference
between an if and a while in main()?  With the if, s does not get
clobbered, but with the while it does.  Why does the code in
K & R, p106/107 work?
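A minimal sketch of the chunked loop described above, written in ANSI C rather than K&R style. It assumes a 512-byte chunk and a hypothetical helper read_block() that reads from a FILE * instead of getchar(); neither name comes from the original post. Because the buffer is static, the pointer it returns stays valid after the call, until the next call overwrites it.

```c
#include <stdio.h>

#define CHUNK 512 /* assumed chunk size, the "round number like 512" */

/* Hypothetical helper: read up to CHUNK-1 chars from fp into a static
 * buffer, NUL-terminate it, and store the count (excluding the NUL) in
 * *len.  Returns NULL when no input is left.  The buffer is static, so
 * it survives the return; each call overwrites the previous chunk. */
char *read_block(FILE *fp, int *len)
{
    static char s[CHUNK];
    int c, i = 0;

    while (i < CHUNK - 1 && (c = getc(fp)) != EOF)
        s[i++] = (char)c;
    s[i] = '\0';
    *len = i;
    return i > 0 ? s : NULL;
}

/* Caller loop: a while rather than an if, so every chunk is handled
 * in turn.  Returns the total number of chars seen. */
long count_all(FILE *fp)
{
    char *blk;
    int len;
    long total = 0;

    while ((blk = read_block(fp, &len)) != NULL)
        total += len; /* use blk[0..len-1] here, before the next call */
    return total;
}
```

Each chunk has to be used (or copied) before the loop calls read_block() again, since all chunks share one buffer.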
			David Harrison
			Dept. of Physics
			Univ. of Toronto
			...!linus!utzoo!utcsstat!physics

tom@rlgvax.UUCP (Tom Beres) (08/11/83)

Another small lesson in C for those who would like it.  Others may skip.

The problem mentioned is the result of a subroutine which is supposed to:
(a) return a pointer to something (in this case, a char array); and
(b) allocate the space for that something itself.  The trick, then, is
how the subroutine allocates the space. Take the simplified example:

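#include <stdio.h>

char *foo();	/* declare foo() so main() knows it returns a pointer */

main() {
	char *p;
	if (something) {
		p = foo();
		printf("%s", p);
	}
}

char *foo() {
	int i, c;	/* c is int so that EOF can be detected reliably */
	char buf[80+1];
	for (i = 0; i < 80; i++) {
		if ((c = getchar()) == EOF || c == '\n')
			break;
		buf[i] = c;
	}
	buf[i] = 0;
	return(buf);
}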

Note what happens because buf[] is declared to be an automatic variable.
foo() returns the address of buf[], which is assigned to p.  But upon
exit from foo(), buf[] is de-allocated (i.e. it is popped off the stack),
so p points to de-allocated space!  As long as the de-allocated space is
not re-allocated and re-used, p will point to the desired data.  As soon
as the de-allocated space is re-used (usually the result of another
subroutine call), the data p points to will be trashed.  So, the printf()
in main() is not guaranteed to work; if it does, it is out of sheer luck,
because nothing has yet re-used that stack space.  This is the setup in the
question that was posed.
What was seen as the difference between an IF and a WHILE probably was
the difference between 1 and several subroutine calls.  The de-allocated
space survived after one call, but eventually got clobbered after several.

So, in foo() change the declaration to:

	static char buf[80+1];

Now, buf[] will not be de-allocated upon exit from foo(), and the above
example will work fine.  However, if main() is changed to:

	main() {
		char *p[10];
		int i;
		for (i = 0; i < 10; i++)
			p[i] = foo();
		for (i=0; i < 10; i++)
			printf("%s", p[i]);
	}

Because buf[] is static, the same buf[] is re-used every time foo() is called,
so p[0], p[1], ..., p[9] ALL end up pointing to the same space (buf[]),
which will hold the characters from the last call to foo().
Not what we want.
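A small sketch of that aliasing, assuming a variant of foo() renamed next_line() that reads from a FILE * instead of stdin so it can be exercised without a terminal (the name and the FILE * parameter are illustrative, not from the post). Two calls return the very same address, and the first result has been overwritten by the second line.

```c
#include <stdio.h>

/* foo() with a static buffer, reading from fp instead of stdin.
 * Every call returns the SAME address: the one static buf[]. */
char *next_line(FILE *fp)
{
    static char buf[80 + 1];
    int i, c;

    for (i = 0; i < 80; i++) {
        if ((c = getc(fp)) == EOF || c == '\n')
            break;
        buf[i] = (char)c;
    }
    buf[i] = '\0';
    return buf;
}
```

Given the input "first\nsecond\n", saving the result of the first call in p0 and the second in p1 leaves p0 == p1, and both strings read "second".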

The resolution for this (and the most general solution) is what K & R did.
Their equivalent of foo() reads the line, then calls alloc() to
allocate new space for the string, copies the string to the new space,
and returns the address of the new space.  Therefore every call gets new
space, and there is no problem of re-use and reallocation of space.
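A sketch of that approach in ANSI C, using the standard malloc() in place of K&R's own alloc(); the function name read_line_copy() and the FILE * parameter are assumptions made for illustration. The line is read into an automatic buffer, which is fine because only its contents, copied into fresh storage, escape the function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a line into a local (automatic) buffer, copy it into newly
 * malloc'd space, and return that space.  The automatic buffer goes
 * away on return; the malloc'd copy does not.  The caller must free()
 * the result.  Returns NULL at end of input or if malloc() fails. */
char *read_line_copy(FILE *fp)
{
    char buf[80 + 1];
    char *p;
    int i, c = 0;

    for (i = 0; i < 80; i++) {
        if ((c = getc(fp)) == EOF || c == '\n')
            break;
        buf[i] = (char)c;
    }
    if (i == 0 && c == EOF)
        return NULL;        /* nothing left to read */
    buf[i] = '\0';

    p = malloc(i + 1);      /* new space on every call */
    if (p != NULL)
        strcpy(p, buf);
    return p;
}
```

Filling p[0] through p[9] with read_line_copy() now gives ten distinct strings, at the cost of one free() per line when they are no longer needed.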

- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom