[comp.lang.c] stupid compilers

martin@prodix.liu.se (Martin Wendel) (08/31/90)

Can anyone explain to me why this piece of code is OK to run:

           #include <stdio.h>
           #include <strings.h>
           main()
           {
             char line[];
             char *tmp = "1234";
             strcpy(line, tmp);
             printf("%s\n", line);
           }

when this produce a segmentation fault:

           #include <stdio.h>
           #include <strings.h>
           main()
           {
             char *line;
             char *tmp = "1234";
             strcpy(line, tmp);
             printf("%s\n", line);
           }

I have a sparcstation 1+ and run SUNOS 4.03 and I have tried the
Sun C compiler and the GNU C compiler with and without the -ansi
flag set, but they all behave the same.

Thanks in advance.

 _____________________________________________________________
<     Martin Wendel         >     martin@solix.udac.uu.se     > 
 >    Postmaster at UDAC   <      Martin.Wendel@UDAC.UU.SE   <
<     Uppsala University -  >     Postmaster@UDAC.UU.SE       >
 >    Data Centre          <      Phone:  018 - 18 77 80     <
<     Sysslomansgatan 21    >       Int: +46 18 18 77 80      >
 >    S-750 02 Uppsala     <---------------------------------<
<     SWEDEN                >           /\/\ \ \/ /           >
 >_________________________<___________/ /\ \_\/\/___________<

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/01/90)

In article <163@prodix.liu.se> martin@prodix.liu.se (Martin Wendel) writes:
\\\
>           #include <stdio.h>
>           #include <strings.h>
>           main()
>           {
>             char line[];
>             char *tmp = "1234";
>             strcpy(line, tmp);
>             printf("%s\n", line);
>           }
\\\

Strictly speaking neither of these programs is ok. In the first one
you have allocated how many bytes for ``line''? (It is supposed
to be an array). In the second, which uses a char * instead of
an array, how many bytes have you allocated to store them?

None.

If you try:
		char line[5];

things will look a bit better -- remember to allocate 1 extra
byte to store that pesky null at the end of every (normal) C
string. Alternatively,
		char *line;
		line = malloc(5);
should also work.

-Kym Horsell

gld2@clutx.clarkson.edu (E.W.D, ,0,0) (09/01/90)

From article <163@prodix.liu.se>, by martin@prodix.liu.se (Martin Wendel):
> Can anyone explain to me why this piece of code is OK to run:
> main()
> {
>    char line[];
>    char *tmp = "1234";
>    strcpy(line, tmp);
>    printf("%s\n", line);
> }

main ()
{
   char line[];
   char *tmp = "1234\336\255\276\357";
   int   innocent = 030073335276;

   printf("wanted %08X\n", innocent);
   strcpy(line, tmp);
   printf("got    %08X\n", innocent);
}


wanted C0EDBABE
got    DEADBEEF

(in certain endian contexts).

Eliot W. Dudley                       edudley@rodan.acs.syr.edu
RD 1 Box 66
Cato, New York 13033                  315 437 0215

schlake@nmt.edu (William Colburn) (09/01/90)

In article <163@prodix.liu.se> martin@prodix.liu.se (Martin Wendel) writes:
>
>Can anyone explain to me why this piece of code is OK to run:
>
>           #include <stdio.h>
>           #include <strings.h>
>           main()
>           {
>             char line[];
>             char *tmp = "1234";
>             strcpy(line, tmp);
>             printf("%s\n", line);
>           }
>
>when this produce a segmentation fault:
>
>           #include <stdio.h>
>           #include <strings.h>
>           main()
>           {
>             char *line;
>             char *tmp = "1234";
>             strcpy(line, tmp);
>             printf("%s\n", line);
>           }
>

It seems to me that they should BOTH fail.  You are copying a string to a
pointer, and not having the pointer point anyplace.  The fact that the first
program works is pure luck.

#include <stdio.h>
#include <strings.h>

char *malloc();
int strlen();

main()
{
  char *line;
  char *tmp = "1234";
  int strsize;

  strsize=strlen(tmp);
  line=malloc(strsize+1);
  strcpy(line,tmp);
  printf("%s\n",line);
}

							Schlake

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (09/01/90)

In article <163@prodix.liu.se>, martin@prodix.liu.se (Martin Wendel) writes:
>
>Can anyone explain to me why this piece of code is OK to run:
>
>           #include <stdio.h>
>           #include <strings.h>
>           main()
>           {
>             char line[];
>             char *tmp = "1234";
>             strcpy(line, tmp);
>             printf("%s\n", line);
>           }
>
>when this produce a segmentation fault:
>
>           #include <stdio.h>
>           #include <strings.h>
>           main()
>           {
>             char *line;
>             char *tmp = "1234";
>             strcpy(line, tmp);
>             printf("%s\n", line);
>           }
>
Simple.  In  the first case, strcpy receives the address of array
line, and copies the string to it, clobbering whatever  variables
happen  to follow the array in memory. The program has a bug, but
it does not result in  an  exception.  In  the  second  case  the
address  that  strcpy  tries  to  use is the value of the pointer
line. Since it  has  not  been  initialized  you  should  not  be
surprised that it happens to point somewhere illegal.
-------------------+-------------------------------------------
Alastair Dunbar    | Edmonton: a great place, but...
Edmonton, Alberta  | before Gretzky trade: "City of Champions"
CANADA             | after Gretzky trade: "City of Champignons"
-------------------+-------------------------------------------

mccaugh@sunb0.cs.uiuc.edu (09/02/90)

Hmmmmm: "pure luck" that the first program works, comments the previous note.
But the question originally posed did not address correctness of either program,
but rather why the second version precipitates a segmentation fault where the
first one does not. So what is the key difference in the two programs? It would
appear to be in the declaration of variable 'line' which is a null (length = 0)
string in the first declaration (char []) and a char-ptr in the second. We are
not informed as to whether the assignment (via 'strcpy') caused the problem or
the subsequent 'printf' but I would suspect the latter. (If the former caused
the problem in the second version, why not in the first?) Perhaps "%s" allows
for "normal" execution in the first program -- since 'line' is technically a
string -- but not in the second. I, too, have encountered a similar problem
with C compilers on VAXen.  My point is that certain compilers MAY draw some
serious distinction between char-ptrs and "true" strings (char [*]) even when
the string is null.

   Since un-initialized ptrs so often lead to segmentation faults, here is my
guess as to what happened. The first declaration (char line [];) must have
initialized variable 'line' as a char-ptr to some 0-length area, while the
second declaration (char *line;) left 'line' un-initialized. Hence the value
of 'line' in the first case was legitimate -- even if it addressed 0-length
space -- while the un-initialized "value" of 'line' in the second case could 
not even be considered legitimate.

 Scott McCaughrin

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/03/90)

In article <24700008@sunb0.cs.uiuc.edu> mccaugh@sunb0.cs.uiuc.edu writes:
\\\
>   Since un-initialized ptrs so often lead to segmentation faults, here is my
>guess as to what happened. The first declaration (char line [];) must have
>initialized variable 'line' as a char-ptr to some 0-length area, while the
>second declaration (char *line;) left 'line' un-initialized. Hence the value
>of 'line' in the first case was legitimate -- even if it addressed 0-length
\\\

Your analysis seems substantially correct -- but why guess? Try
running:

main(){
	char a[];
	char *b;
	printf("%d\n",sizeof(a));
	printf("%d\n",sizeof(b));
	}

Output:

0
4

The *compiler* sure thinks ``a'' is zero length -- if any
string copy is done there you will essentially be copying
``above the stack pointer'' (if nothing else is declared local).
This *may*, but not necessarily, cause a trap (depends on the hardware).

-Kym Horsell

chris@mimsy.umd.edu (Chris Torek) (09/03/90)

In article <24700008@sunb0.cs.uiuc.edu> mccaugh@sunb0.cs.uiuc.edu writes:
>... the question originally posed did not address correctness of either
>program, but rather why the second version precipitates a segmentation
>fault where the first one does not. So what is the key difference in
>the two programs? It would appear to be in the declaration of variable
>'line' ...

This much is correct.  Quick recap: the program that caused a `segmentation
fault, core dumped' result was of the form

  prog2>	main() { char *line; strcpy(line, "foo"); }

while the program that appeared to work was of the form

  prog1>	main() { char line[]; strcpy(line, "foo"); }

>which is a null (length = 0) string in the first declaration (char [])

This is a slightly peculiar definition for `null string'.  The program
labelled `prog1' has a constraint violation: the subscript brackets in
the declaration must not be empty.  A buggy compiler allowed the empty
declaration, and---since I happen to know the internal implementation
of this compiler, I know what it did---treated it as `char line[0];',
reserving zero bytes for the array `line'.

>and a char-ptr in the second. We are not informed as to whether the
>assignment (via 'strcpy') caused the problem or the subsequent 'printf'
>but I would suspect the latter.

You would suspect incorrectly.

>(If the former caused the problem in [prog2], why not in [prog1]?)

The actual generated code on a VAX for prog1 is (unoptimized but
slightly simplified):

	_main:
		.word	0		# save no registers
		subl2	$0,sp		# allocate 0 bytes of stack for line[]
		pushab	L1		# push &"foo"[0]
		pushab	(fp)		# push &line[0]
		calls	$2,_strcpy	# call strcpy()
		ret			# return from main, no value
	L1:	.ascii	"foo\0"		# C string {f,o,o,\0}

Compare this with a correct program in which line[] is declared as
`char line[4];':

	_main:
		.word	0		# save no registers
		subl2	$4,sp		# allocate 4 bytes of stack for line[]
		pushab	L1		# push &"foo"[0]
		pushab	-4(fp)		# push &line[0]
		calls	$2,_strcpy	# call strcpy()
		ret			# return from main, no value
	L1:	.ascii	"foo\0"		# C string {f,o,o,\0}

The only difference between these two programs at run time is what goes
on the stack.  Assume that the stack pointer sp in main() is 0x7fffeb80.
(At the entry to a subroutine, the VAX makes sp==fp; fp is later used
to mean `sp before we adjusted it with a subl2 or push instruction'.)
In the first program, the `subl2 $0' does not affect sp at all; then we
have a pushab that pushes, say, 0x1000 on the stack, and in the process
changes sp to 0x7fffeb7c.  Then we have a `pushab (fp)'; this pushes
0x7fffeb80 on the stack, in the process changing sp to 0x7fffeb78.  The
`calls $2' then pushes 2, and then a register save mask, and some other
stuff.  strcpy() then copies {foo\0} (four bytes) to locations 0x7fffeb80
through 0x7fffeb83, and returns to main().  At this point the word `foo\0'
has overwritten whatever used to be at 0x7fffeb80.  The only question then
is: what was there, and was it any use?

As it happens, on the VAX, what was there was a 0, and it does not get
used.

In the corrected program, strcpy() copies into four bytes at 0x7fffeb7c..
0x7fffeb7f, which were set aside for that purpose by the `subl2 $4,sp'.

Program 2, on the other hand, compiles to code something like this:

	_main:
		.word	0		# save no registers
		subl2	$4,sp		# make space for `line'
		pushab	L1		# push &"foo"[0]
		pushl	-4(fp)		# push value of `line'
		calls	$2,_strcpy
		ret
	L1:	.ascii	"foo\0"

Again, on the VAX, this might start with sp=fp=0x7fffeb80.  The subl2
would then set sp=0x7fffeb7c.  The `pushl' instruction would then push
the contents of locations 0x7fffeb7c..0x7fffeb7f.  This is, on the VAX,
normally preset to 0 (newly allocated stack pages are cleared so that
programs cannot search memory for passwords, or unencrypted files from
the last invocation of the editor, or whatever).  Thus, this asks strcpy
to copy {foo\0} into location 0.  Location 0 is not writable, and the
program gets a segmentation violation signal and crashes.

>My point is that certain compilers MAY draw some
>serious distinction between char-ptrs and "true" strings (char [*]) even
>when the string is null.

*Every* C compiler *must* draw a serious distinction between a pointer
and an array.  See the Frequently Asked Questions list for some of the
differences.  The two are not and never have been equivalent, and an
array is never a pointer.  An array *object* is *converted to* a
pointer *value* in some places, and in one very special place an array
*declaration* is *rewritten as* a pointer declaration.  But an array
never `is' a pointer.

>   Since un-initialized ptrs so often lead to segmentation faults, here is my
>guess as to what happened. The first declaration (char line [];) must have
>initialized variable 'line' as a char-ptr to some 0-length area, while the
>second declaration (char *line;) left 'line' un-initialized. Hence the value
>of 'line' in the first case was legitimate -- even if it addressed 0-length
>space -- while the un-initialized "value" of 'line' in the second case could 
>not even be considered legitimate.

Not quite.  The empty brackets incorrectly passed through the compiler,
leaving line[] as a zero-length area but *NOT* a `pointer to' a
zero-length area.  The pointer aspect only comes in when the array name
(`line') is used in a value context (argument to strcpy); then the
compiler changes the object to a value by computing the address of
element 0 of the array.

In other words, the analysis above derives one correct conclusion from
false premises.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris