[comp.sys.ibm.pc] TC2.0 bugfree ?

rhm@druwy.ATT.COM (Roger Massey) (01/20/89)

No bugs in TC2.0 ?

This small model C program :

sub( cp )
unsigned char 	*cp;
{
	int 	i;

	i = (*cp++ << 8) + *cp++;
}

generates the following code fragment (I left out some of the .asm):
	

_TEXT	segment	byte public 'CODE'
;	?debug	L 1
_sub	proc	near
	push	bp
	mov	bp,sp
	sub	sp,2
	push	si
	mov	si,word ptr [bp+4]
;	?debug	L 6
	mov	al,byte ptr [si]
	mov	ah,0
	mov	cl,8
	shl	ax,cl
	mov	dl,byte ptr [si]
	mov	dh,0
	add	ax,dx
	mov	word ptr [bp-2],ax
	inc	si
	inc	si
@1:
;	?debug	L 7
	pop	si
	mov	sp,bp
	pop	bp
	ret	
_sub	endp
_TEXT	ends


note that si (i.e. cp) is not incremented between references
but instead after both references.

Roger Massey
AT&T  Denver

andrews@calgary.UUCP (Keith Andrews) (01/21/89)

In article <3785@druwy.ATT.COM>, rhm@druwy.ATT.COM (Roger Massey) writes:
> No bugs in TC2.0 ?
> 
> This small model C program :
> 
> sub( cp )
> unsigned char 	*cp;
> {
> 	int 	i;
> 
> 	i = (*cp++ << 8) + *cp++;
> }
> 
    *** Omitted code showing that "cp" gets incremented twice at the end of
        the generated code instead of after each reference ***
> 
> Roger Massey
> AT&T  Denver

Sorry, this isn't a bug.  The order of evaluation of the ++ operators in
an expression is not guaranteed.  As evidence, this is what lint has to
say about the above code fragment:

	Script started on Fri Jan 20 09:21:51 1989
	1: lint foo.c
	foo.c(6): warning: cp evaluation order undefined
	foo.c(6): warning: i set but not used in function sub
	sub defined( foo.c(3) ), but never used
	2: ^D
	script done on Fri Jan 20 09:22:04 1989

Nevertheless, Turbo C likes to make noises about practically everything else;
I wonder why they don't issue warnings about this.

					Keith Andrews
					andrews@cpsc.UCalgary.CA

ralf@b.gp.cs.cmu.edu (Ralf Brown) (01/21/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
}No bugs in TC2.0 ?
}
}This small model C program :
}
}sub( cp )
}unsigned char 	*cp;
}{
}	int 	i;
}	i = (*cp++ << 8) + *cp++;
}}
}
}generates the following code fragment (I left out some of the .asm):
[fragment omitted]
}
}note that si (i.e. cp) is not incremented between references
}but instead after both references.

That is not a bug.  K&R explicitly state that the compiler may do the
increment any time between the value's use and the next sequence point
(comma operator, end of statement, etc.).  You cannot rely on the order of
evaluation within an expression, since the compiler is free to rearrange
things as it sees fit.  The code fragment

	i = 4 ;
	printf("%d",i++ + i++ + i++) ;

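has at least three plausible results:
	12
	14
	15
depending on how many of the increments are deferred until after the
additions.  (Strictly, modifying i more than once between sequence points
is undefined, so a compiler is not bound to produce any of them.)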
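If the left-to-right answer is what was actually wanted, one way to get it reliably (a minimal sketch; the function name is hypothetical) is to give each increment its own statement.  The end of each full expression is a sequence point, so the sum is always 15 for a starting value of 4:

```c
/* Well-defined counterpart of printf("%d", i++ + i++ + i++):
   each increment sits in its own statement, so every side effect
   is complete before the next statement begins. */
static int sum_three_increments(int i)
{
    int a, b, c;

    a = i++;    /* with i == 4: a = 4, i becomes 5 */
    b = i++;    /* b = 5, i becomes 6 */
    c = i++;    /* c = 6, i becomes 7 */
    return a + b + c;
}
```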
-- 
{harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) 
ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make.
FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler
BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?
--

kneller@cgl.ucsf.edu (Don Kneller) (01/21/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
>No bugs in TC2.0 ?
>
>sub( cp )
>unsigned char 	*cp;
>{
>	int 	i;
>
>	i = (*cp++ << 8) + *cp++;
>}
>
>[ ASM code removed which shows cp is not incremented between references
>  but instead after both references - dgk ]

This is perfectly valid behavior.  Not exactly DWIM, but certainly not
disallowed.  In essence, foo++ in an expression means to use the value
of foo and, sometime before the next sequence point (here, the end of
the statement), increment the value of foo.  We all know what you mean
to say in the above expression, but you don't have complete control over
the order of evaluation!  That is, you have no control over which of
(*cp++ << 8) or *cp++ is evaluated first.  The compiler makers are free
to do whatever they please.

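The take-home lesson (as such) is for C programmers to never depend on
the order in which the operands of an expression are evaluated;
precedence and associativity only decide how operands are grouped, not
when they are evaluated.  Within a single expression, only a few
operators force an explicit order (||, &&, ?: and the comma operator);
otherwise, split the work into separate statements.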
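As a sketch of the comma operator doing that job (the helper name be16_next is hypothetical, and the const and unsigned types are ANSI-style additions), the big-endian fetch from the original post can be written as a single expression whose order is guaranteed:

```c
/* Hypothetical helper: fetch a big-endian 16-bit value and advance *pp.
   The comma operator is a sequence point, so hi is read (and cp bumped)
   before the right-hand operand reads the second byte. */
static unsigned int be16_next(const unsigned char **pp)
{
    const unsigned char *cp = *pp;
    unsigned int hi, v;

    v = (hi = *cp++, (hi << 8) + *cp++);
    *pp = cp;       /* cp has now moved past both bytes */
    return v;
}
```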

- don

P.S.  I recently fell for (getchar() << 8) + getchar().
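A minimal sketch of the getchar() case done safely, assuming a hypothetical helper read_be16() that reads from any FILE with getc(); each read is its own statement, so the high-order byte is always the first one consumed:

```c
#include <stdio.h>

/* Hypothetical helper: read a big-endian 16-bit value from fp.
   Returns -1 if the stream ends before two bytes have been read. */
static long read_be16(FILE *fp)
{
    int hi, lo;

    hi = getc(fp);          /* first byte in the stream: high-order */
    if (hi == EOF)
        return -1;
    lo = getc(fp);          /* second byte: low-order */
    if (lo == EOF)
        return -1;
    return ((long)hi << 8) | lo;
}
```

Returning long leaves room for the -1 end-of-file marker alongside every 16-bit value.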
-----
	Don Kneller
UUCP:		...ucbvax!ucsfcgl!kneller
INTERNET:	kneller@cgl.ucsf.edu
BITNET:		kneller@ucsfcgl.BITNET

johnl@ima.ima.isc.com (John R. Levine) (01/21/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
>No bugs in TC2.0 ?
> ...
>	i = (*cp++ << 8) + *cp++;

 [ compiles as though he had written
	i = (*cp << 8) + *cp, cp += 2; ]

No bug here, consult your C manual.  C only promises that a postfix ++ will
be executed before the next sequence point and the only sequence point here
is the end of the statement.  Try this instead:

	i = (cp[0] << 8) + cp[1], cp += 2;
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
You're never too old to have a happy childhood.

chris@mimsy.UUCP (Chris Torek) (01/21/89)

In article <16674@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu
(RAMontante) writes:
>I am cross-posting the following to comp.lang.c, because the language
>expertise is there.  I am not convinced that you have any guarantee about
>just when the post-increments happen -- consider p.50 of K&R 1st edition,
>for example.  So it's not a TC bug, just an implementor's choice.

This is correct.  The pANS uses the notion of `sequence points' to
decide when a side effect (such as post-increment) must happen; there
is no sequence point within a simple assignment expression like the
one quoted below.

Incidentally, the quoted assembly shows that TC2.0 (Turbo C, I suppose)
misses one possible optimisation.

>>	i = (*cp++ << 8) + *cp++;

cp is in the SI register at this point; the goal is to compute AX=[SI]<<8:

>>	mov	al,byte ptr [si]
>>	mov	ah,0
>>	mov	cl,8
>>	shl	ax,cl

A much better sequence is

	mov	al,0
	mov	ah,byte ptr [si]

A simple peephole optimiser should be able to catch this.  More complex
analysis might even figure out that

	i = (cp[0] << 8) + cp[1];
	cp += 2;

could be computed as

	mov	ah,byte ptr [si]
	inc	si
	mov	al,byte ptr [si]
	inc	si
	; i is now in ax

while

	i = cp[0] + (cp[1] << 8);

is even more simple:

	mov	ax,word ptr [si]

but I would not expect most compilers to manage either of these.  (If
you rearranged the source to read

	i = *(int *)cp;

I would expect the latter sequence.)
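For anyone checking this on a modern compiler, here is a sketch under stated assumptions: the helpers le16() and native16() are hypothetical, uint16_t comes from C99's <stdint.h> (which Turbo C 2.0 predates), and memcpy() stands in for the *(int *)cp cast because it has no alignment requirement.  The explicit cp[0] + (cp[1] << 8) form gives the same value on every host, while the memcpy() load matches it only where the host is little-endian, as the 8086 is:

```c
#include <stdint.h>
#include <string.h>

/* Little-endian 16-bit value assembled byte by byte, the same
   arithmetic as i = cp[0] + (cp[1] << 8); correct on any host. */
static unsigned int le16(const unsigned char *cp)
{
    return cp[0] + ((unsigned int)cp[1] << 8);
}

/* Host-order 16-bit load.  Like *(int *)cp it uses the machine's own
   byte order, but memcpy imposes no alignment requirement on cp. */
static unsigned int native16(const unsigned char *cp)
{
    uint16_t v;
    memcpy(&v, cp, sizeof v);
    return v;
}

/* Returns 1 if the host stores the low-order byte first. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```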
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

naughton%wind@Sun.COM (Patrick Naughton) (01/21/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
>No bugs in TC2.0 ?
>
>This small model C program :
>sub( cp )
>unsigned char 	*cp;
>{
>	int 	i;
>
>	i = (*cp++ << 8) + *cp++;

This is bogus code anyway...

Remember that '+' is commutative, and C does not say which operand is
evaluated first.  You are saying that
	i = (*cp++ << 8) + *cp++;
is the same as
	i = *cp++ + (*cp++ << 8);
which it obviously is not.

You should use:
	i = (*cp << 8) + *(cp+1); cp += 2;

...but the code you sent out did look like it showed up a compiler bug,
nonetheless.

-Patrick


    ______________________________________________________________________
    Patrick J. Naughton				    ARPA: naughton@Sun.COM
    Window Systems Group			    UUCP: ...!sun!naughton
    Sun Microsystems, Inc.			    AT&T: (415) 336 - 1080

rjchen@phoenix.Princeton.EDU (Raymond Juimong Chen) (01/21/89)

<537@cs-spool.calgary.UUCP>, andrews@calgary.UUCP (Keith Andrews):
> Nevertheless, Turbo C likes to make noises about practically everything else;
> I wonder why they don't issue warnings about this.

Maybe because they don't know...

In an issue of Turbo Technix, an example in one of the articles
actually RELIED ON the order of evaluation to get the correct answer.
And the author proceeded to explain that two results are reasonable
(left-to-right and right-to-left evaluation) and furthermore added
that the left-to-right evaluation is CORRECT!

I wrote them a letter about it, but by that time the magazine had
gone extinct.
-- 
Raymond Chen	UUCP: ...allegra!princeton!{phoenix|pucc}!rjchen
		BITNET: rjchen@phoenix.UUCP, rjchen@pucc
		ARPA: rjchen@phoenix.PRINCETON.EDU, rjchen@pucc.PRINCETON.EDU
"Say something, please!  ('Yes' would be best.)" - The Doctor

abcscnge@csuna.UUCP (Scott "The Pseudo-Hacker" Neugroschl) (01/23/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
[ statement about TC2.0 being bug-free, in the context of a subroutine ]
>	i = (*cp++ << 8) + *cp++;


I can't find it in my copy of K&R, but I think that they (and most
C compilers I have seen) indicate that ++'ing the same variable multiple
times in a single expression is undefined.  Hence, this statement cannot
really be used as an example of a TC bug.

NO FLAMES PLEASE!!!  I would like any ANSI X3J11'ers or TC hacks or anyone
who can find documentation one way or the other to respond in a reasonable
manner.

-- 
Scott "The Pseudo-Hacker" Neugroschl
UUCP:  ...!sm.unisys.com!csun!csuna!abcscnge
-- "Beat me, whip me, make me code in Ada"
-- Disclaimers?  We don't need no stinking disclaimers!!!

abcscnge@csuna.UUCP (Scott "The Pseudo-Hacker" Neugroschl) (01/23/89)

In article <15560@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>(If you rearranged the source to read
>
>	i = *(int *)cp;
>
>I would expect the latter sequence.)

The problem with this construct is that it is not portable.  Many
machines (read 68000) require integer data to be aligned.
Actually, I have in my "standard library" of home-brewed routines:

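unsigned int intat(p)
unsigned char *p;
{
	unsigned int x;
	x = (unsigned int)(*p) << 8;	/* high-order byte comes first */
	x += *++p;			/* then the low-order byte */
	return(x);
}

unsigned long longat(p)
unsigned char *p;
{
	/* high 16 bits from p[0..1], low 16 bits from p[2..3] */
	return(((unsigned long)intat(p) << 16) + (unsigned long)intat(p+2));
}

/* Please note these are from memory; unsigned types avoid the sign
   extension woes, but these are just skeletal for the idea */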

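Actually, these two routines aren't completely general either, because they
assume the data is stored big-endian (most significant byte first).  Since
they assemble the value a byte at a time they don't depend on the host's
own byte order, but they would have to be rewritten (or ifdefed) to read
little-endian data.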
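Instead of ifdefed copies, one alternative is a single routine that takes the byte order of the data as a parameter.  A minimal sketch, assuming values of at most four bytes; getword() is a hypothetical name, not part of any library:

```c
/* Hypothetical generalization of intat()/longat(): assemble an
   nbytes-wide unsigned value (nbytes <= 4) stored in the given byte
   order.  Only the data's byte order matters, not the host's. */
static unsigned long getword(const unsigned char *p, int nbytes, int big_endian)
{
    unsigned long v = 0;
    int k;

    for (k = 0; k < nbytes; k++) {
        /* big-endian: take bytes front to back, most significant first;
           little-endian: take them back to front */
        unsigned char b = big_endian ? p[k] : p[nbytes - 1 - k];
        v = (v << 8) | b;
    }
    return v;
}
```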


jk0@clutx.clarkson.edu (Jason Coughlin) (01/23/89)

From article <537@cs-spool.calgary.UUCP>, by andrews@calgary.UUCP (Keith Andrews):

> Nevertheless, Turbo C likes to make noises about practically everything else;
> I wonder why they don't issue warnings about this.
>

   I compiled something like this with Turbo C and 4.2 BSD cc:

  int hello(c)
  char c;
  {
     switch (c) {
        case 'a': ....
            return (b*a);
        case 'b': ....
            return (b*c);
     }
  }

	This is JUST an example, but while TC didn't give me ANY warnings,
   4.2 BSD figured out that my simple return at the end of the function
   was inconsistent.  I wonder why they don't issue warnings like this?

--
Jason Coughlin
(jk0@clutx, jk0@clutx.clarkson.edu)

ralf@b.gp.cs.cmu.edu (Ralf Brown) (01/23/89)

In article <2041@sun.soe.clarkson.edu> jk0@sun.soe!clutx.clarkson.edu.UUCP writes:
}   I compiled something like this with Turbo C and 4.2 BSD cc:
[sample code omitted]
}	This is JUST an example, but while TC didn't give me ANY warnings,
}   4.2 BSD figured out that my simple return at the end of the function
}   was inconsistent.  I wonder why they don't issue warnings like this?

Do you have warnings turned on (-w)?  I put the code through TC2.0, and it
definitely warned that a value should be returned at the end of the function.
Adding a return; at the end of the function generates the different warning
that
   "Both return and return of a value used in function hello"
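For comparison, a version in which every path returns a value should compile without either warning.  This is a sketch only: since the original elides the case bodies, a and b are assumed here to be int parameters, and returning 0 from the default case is an arbitrary choice:

```c
/* Hypothetical stand-in for the hello() example: every path returns
   an int, so control can never fall off the end of the function. */
static int hello(char c, int a, int b)
{
    switch (c) {
    case 'a':
        return b * a;
    case 'b':
        return b * c;
    default:
        return 0;   /* explicit value instead of falling off the end */
    }
}
```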


colburn@sip7.SRC.Honeywell.COM (Mark H. Colburn) (02/08/89)

In article <3785@druwy.ATT.COM> rhm@druwy.UUCP (MasseyR) writes:
>	i = (*cp++ << 8) + *cp++;

>note that si (i.e. cp) is not incremented between references
>but instead after both references.

Note that this is not necessarily incorrect.  It is undefined to
modify the same variable twice (for example with two ++ operators)
between sequence points.  Among other things it is bad coding style.
Worse, it produces unexpected results on a number of compilers.

The code would be better written:

	i = (*cp << 8) + *(cp + 1);
	cp += 2;

One of the classic examples of side effects is given below:

	i = 10;
	i = 10 * i++;

What is the value of i going to be?  100?  11?  101?  Remember
that C may entirely reorder your expression before evaluating it.

You shouldn't write code like that.
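If the intent behind that kind of expression is to scale i and bump it, the unambiguous fix is to put each side effect in its own statement and pick the order explicitly.  A minimal sketch with hypothetical function names; note that the two orders give different answers:

```c
/* Scale first, then increment: each side effect in its own statement. */
static int scale_then_increment(int i)
{
    i = 10 * i;   /* 10 -> 100 */
    i++;          /* 100 -> 101 */
    return i;
}

/* Increment first, then scale. */
static int increment_then_scale(int i)
{
    i++;          /* 10 -> 11 */
    i = 10 * i;   /* 11 -> 110 */
    return i;
}
```

Starting from 10, the first gives 101 and the second 110; writing it out this way forces you to decide which one you meant.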

Mark H. Colburn           MN65-2300		colburn@SRC.Honeywell.COM
Systems Administration and Support
Honeywell Systems & Research Center