[comp.lang.c] TC2.0 bugfree ?

chris@mimsy.UUCP (Chris Torek) (01/21/89)

In article <16674@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu
(RAMontante) writes:
>I am cross-posting the following to comp.lang.c, because the language
>expertise is there.  I am not convinced that you have any guarantee about
>just when the post-increments happen -- consider p.50 of K&R 1st edition,
>for example.  So it's not a TC bug, just an implementor's choice.

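This is correct.  The pANS uses the notion of `sequence points' to
decide when a side effect (such as post-increment) must happen; there
is no sequence point within a simple assignment expression like the
one quoted below.  In fact, since cp is modified twice with no
intervening sequence point, the behaviour is undefined, so the compiler
is entitled to produce any result at all.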
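To get a well-defined result, the two increments can be put in separate full expressions, since each ends at a sequence point.  The sketch below assumes that cp points to unsigned char and that the first byte is meant to be the high byte.  The helper name read_be16 is hypothetical and does not come from the thread:

```c
/* Hypothetical helper (not from the thread): read two bytes from *pp,
   first byte as the high byte, and advance *pp past them. */
unsigned int read_be16(const unsigned char **pp)
{
	const unsigned char *cp = *pp;
	unsigned int i;

	i = (unsigned int)*cp++ << 8;	/* ';' ends the full expression: sequence point */
	i += *cp++;			/* so cp is modified only once per statement */
	*pp = cp;
	return i;
}
```

Called twice on the bytes 0x12 0x34 0x56 0x78, it yields 0x1234 and then 0x5678.  The order is guaranteed by the language, so it no longer depends on the compiler.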

Incidentally, the quoted assembly shows that TC2.0 (Turbo C, I suppose)
misses one possible optimisation.

>>	i = (*cp++ << 8) + *cp++;

cp is in the SI register at this point; the goal is to compute AX=[SI]<<8:

>>	mov	al,byte ptr [si]
>>	mov	ah,0
>>	mov	cl,8
>>	shl	ax,cl

A much better sequence is

	mov	al,0
	mov	ah,byte ptr [si]

A simple peephole optimiser should be able to catch this.  More complex
analysis (relying on cp pointing to unsigned char, as the zero extension
`mov ah,0' above implies) might even figure out that

	i = (cp[0] << 8) + cp[1];
	cp += 2;

could be computed as

	mov	ah,byte ptr [si]
	inc	si
	mov	al,byte ptr [si]
	inc	si
	; i is now in ax

while

	i = cp[0] + (cp[1] << 8);

is simpler still, because the 8086 stores the low byte of a word first:

	mov	ax,word ptr [si]

but I would not expect most compilers to manage either of these.  (If
you rearranged the source to read

	i = *(int *)cp;

I would expect the latter sequence.)
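A way to ask for the same single word load without the pointer cast is to copy the bytes with memcpy.  memcpy places no alignment requirement on its source, so this sidesteps the alignment objection to *(int *)cp.  The result comes out in the host's own byte order.  The helper name load_native_u16 and the use of C99 uint16_t are assumptions for illustration.  Many current compilers (GCC and Clang, for example) typically turn a small fixed-size memcpy like this into one load on targets that allow unaligned access.  This is not a claim about Turbo C:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 16-bit value stored in the host's own
   byte order.  memcpy copies bytes, so cp need not be suitably
   aligned, unlike *(int *)cp. */
uint16_t load_native_u16(const unsigned char *cp)
{
	uint16_t v;

	memcpy(&v, cp, sizeof v);	/* no alignment requirement on cp */
	return v;
}
```

On a little-endian machine such as the 8086 this gives the same value as cp[0] + (cp[1] << 8).  On a big-endian machine such as the 68000 it matches (cp[0] << 8) + cp[1].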
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

abcscnge@csuna.UUCP (Scott "The Pseudo-Hacker" Neugroschl) (01/23/89)

In article <15560@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>(If you rearranged the source to read
>
>	i = *(int *)cp;
>
>I would expect the latter sequence.)

The problem with this construct is that it is not portable.  Many
machines (read: the 68000) require integer data to be aligned.
For this reason I keep these in my "standard library" of home-brewed routines:

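/* Big-endian (high byte first) reads; p need not be aligned.
   Unsigned arithmetic avoids overflow and sign-extension trouble. */
unsigned int intat(const unsigned char *p)
{
	return ((unsigned int)p[0] << 8) | p[1];
}

unsigned long longat(const unsigned char *p)
{
	/* high 16-bit half first, then the low half */
	return ((unsigned long)intat(p) << 16) | intat(p + 2);
}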

/* Please note these are from memory; I think I handled the sign-extension
   woes, but these are just skeletons to show the idea. */

Actually, these two routines aren't completely general either, because
they assume the bytes are stored big-endian (most significant byte first).
To read data stored little-endian, as a machine like the 8086 writes it,
they would have to be rewritten (or #ifdef'ed).
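For data stored least significant byte first, the counterparts read the bytes in the opposite order.  The following is a minimal sketch.  The names intat_le and longat_le are hypothetical.  Unsigned arithmetic is used throughout to avoid sign extension:

```c
/* Hypothetical little-endian counterparts: the first byte in memory is
   the least significant.  Reading byte by byte works at any alignment
   and gives the same answer whatever the host's own byte order. */
unsigned int intat_le(const unsigned char *p)
{
	return p[0] | ((unsigned int)p[1] << 8);	/* low byte first */
}

unsigned long longat_le(const unsigned char *p)
{
	/* low 16-bit half first, then the high half */
	return intat_le(p) | ((unsigned long)intat_le(p + 2) << 16);
}
```

For the bytes 0x78 0x56 0x34 0x12, longat_le returns 0x12345678.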

-- 
Scott "The Pseudo-Hacker" Neugroschl
UUCP:  ...!sm.unisys.com!csun!csuna!abcscnge
-- "Beat me, whip me, make me code in Ada"
-- Disclaimers?  We don't need no stinking disclaimers!!!