[comp.sys.amiga.tech] Lattice 4.1 register yuck!

dca@toylnd.UUCP (David C. Albrecht) (06/18/88)

Oooh, I'm annoyed.  For those of you that use Lattice 4.1, I have found
what I consider a big BUG.  Not the 'code doesn't work' variety but rather
the 'produces lousy code' type.  Over the last year or so I have developed
the habit of localizing register variables and auto variables to their area
of usage, to give maximum efficiency in register allocation and use of stack
space.  Come to find out this can produce worse code than not using register
declarations at all.  Gack! Ack! Phooey!

Example:
    If I were to loop through an array assigning a value to each of its
    elements I might have:
 
    { register short i;
      for (i = 0; i < MAX_SIZE; i++)
        {
          some_array[i] = -1;
        }
    }

    One expects the register variable i to be dead after the last brace
    and thus available for re-use, right?  Wrongo!  Lattice doesn't
    think so.

Or how bout this:

     { short i[100];
       i[0] = 12;
     }

     You would expect the space for i on the stack to be freed after the last
     brace for re-use.  Nope.

Want proof?
Take a look at the following code section:

main(argc,argv)
int argc;
char **argv;
{
  { register long r1, r2, r3, r4, r5;
    long a[1];

    r1 = 0;
    a[r1] = 1;
    r2 = 2;
    r3 = 3;
    r4 = 4;
    r5 = r4 + r3;
    a[r1] = r1 + r2;
  }
  { register long r1, r2, r3, r4;
    long a[1];

    r1 = 0;
    a[r1] = r1 + r1;
    r2 = 2;
    r3 = 3;
    r4 = a[r1];
  }
    
}

Now let's look at the omd (object module disassembler) output:
Some notes:  Lattice seems to reserve D0-D3 for its use and allocates
starting with D0.  It allocates D4-D7 to register variables and starts
with D7.

LATTICE OBJECT MODULE DISASSEMBLER V2.00

Amiga Object File Loader V1.00
68000 Instruction Set

EXTERNAL DEFINITIONS

_main 0000-00

SECTION 00 "testcase.o" 00000054 BYTES
main(argc,argv)
int argc;
char **argv;
{
0000 4E55FFE8                   LINK      A5,FFE8
0004 48E73F00                   MOVEM.L   D2-D7,-(A7)
  { register long r1, r2, r3, r4, r5;
    long a[1];

    r1 = 0;
0008 7E00                       MOVEQ     #00,D7      First register alloc
    a[r1] = 1;
000A 2007                       MOVE.L    D7,D0       First lattice reg
000C 2200                       MOVE.L    D0,D1       Second lattice reg
000E E541                       ASL.W     #2,D1
0010 7401                       MOVEQ     #01,D2      Third lattice reg
0012 2B8210E8                   MOVE.L    D2,E8(A5,D1.W)  Note a[] is at E8.
    r2 = 2;
0016 7C02                       MOVEQ     #02,D6      Second register alloc
    r3 = 3;
0018 7A03                       MOVEQ     #03,D5      Third register alloc
    r4 = 4;
001A 7804                       MOVEQ     #04,D4      Last register alloc
    r5 = r4 + r3;
001C 2404                       MOVE.L    D4,D2       Reuse lattice reg
001E 2404                       MOVE.L    D4,D2       ?
0020 D485                       ADD.L     D5,D2
    a[r1] = r1 + r2;
0022 2607                       MOVE.L    D7,D3       Last lattice reg
0024 2600                       MOVE.L    D0,D3
0026 D686                       ADD.L     D6,D3
0028 2B8310E8                   MOVE.L    D3,E8(A5,D1.W)
  }
  { register long r1, r2, r3, r4;
    long a[1];

    r1 = 0;                                           Note that it put r1 in
002C 7200                       MOVEQ     #00,D1      a lattice reg not a
                                                      user reg.
    a[r1] = r1 + r1;
002E 2601                       MOVE.L    D1,D3
0030 E543                       ASL.W     #2,D3
0032 2001                       MOVE.L    D1,D0
0034 D081                       ADD.L     D1,D0
0036 2B8030EC                   MOVE.L    D0,EC(A5,D3.W)  Note a[] is at EC.

    r2 = 2;                                           Out of lattice regs
003A 7002                       MOVEQ     #02,D0      Saves r2 on the stack
003C 2B40FFF8                   MOVE.L    D0,FFF8(A5) gag!

    r3 = 3;                                           Ditto!
0040 7003                       MOVEQ     #03,D0
0042 2B40FFF4                   MOVE.L    D0,FFF4(A5)

    r4 = a[r1];                                       Ditto again.
0046 2B7530ECFFF0               MOVE.L    EC(A5,D3.W),FFF0(A5)
  }
    
}
004C 4CDF00FC                   MOVEM.L   (A7)+,D2-D7
0050 4E5D                       UNLK      A5
0052 4E75                       RTS

SECTION 01 "__MERGED" 00000000 BYTES

Moral of the story: check your register variables; they may not be
producing the code you expect.

We are not amused.  Time to go find the LBBS number.  Growl, snarl, snap.

David Albrecht

dillon@CORY.BERKELEY.EDU (Matt Dillon) (06/22/88)

:    One expects the register variable i to be dead after the last brace
:    and thus available for re-use, right?  Wrongo!  Lattice doesn't
:    think so.

	Damn right it should.  I do the same sort of thing... use localized
register variables.

	One thing I have yet to see addressed properly by either Aztec or
Lattice is the following:

	{
	    register short i, j, k;

	    for (i = 0; i < 10; ++i)
		<blah>
	    for (j = 0; j < 10; ++j) 
		<blah>
	    for (k = 0; k < 10; ++k)
		<blah>
	}

	I is not used while J is being used, neither I nor J is being
	used while K is being used, etc....

	ONLY ONE REGISTER SHOULD BE USED FOR ALL THREE REGISTER VARIABLES!!

	Often, I have all sorts of temporary variables of differing types
	(usually differing pointer types), and even though I use them
	sequentially (where the same register could have been used), the
	compiler always assigns different registers to them.  Allowing
	multiple register variables to 'share' registers under the above
	circumstances greatly increases register utilization.

:Or how bout this:
:
:     { short i[100];
:       i[0] = 12;
:     }
:
:     You would expect the space for i on the stack to be freed after the last
:     brace for re-use.  Nope.

	Actually, no.  Usually, stack space is overlayed:

	{
	    {
		char x[256];
		<blah>
	    }
	    {   
		char y[256];
		<blah>
	    }
	}

	I.e. all stack is allocated at entry.  In this case, 256 bytes should
	be allocated because x and y do not mix.  Think about the efficiency
	this gives you.  You have a tight loop:

	for (i = 0; i < 1000; ++i) {
	    short x = i << 2;
	    <blah>
	}

	You do NOT want stack space to be allocated and deallocated for
	every loop!!!!  That's right, the variable x DIES on every loop
	by semantics.

				-Matt

glewis@cit-vax.Caltech.Edu (Glenn M. Lewis) (06/23/88)

In article <8806212123.AA01328@cory.Berkeley.EDU> Matt Dillon writes:
>...
>	One thing I have yet to see addressed properly by either Aztec or
>Lattice is the following:
>
>	{
>	    register short i, j, k;
>
>	    for (i = 0; i < 10; ++i)
>		<blah>
>	    for (j = 0; j < 10; ++j) 
>		<blah>
>	    for (k = 0; k < 10; ++k)
>		<blah>
>	}
>
>	I is not used while J is being used, neither I nor J is being
>	used while K is being used, etc....
>
>	ONLY ONE REGISTER SHOULD BE USED FOR ALL THREE REGISTER VARIABLES!!

	Are you suggesting that the compiler ought to figure out that the
value of 'i' will not be used again in this function, and the same for 'j'
and 'k'?  It seems that you are.  That would be interesting.

	Often, when manipulating strings, I have a register variable such
as 'i' that runs through the string in a for loop, and then I use the value
after it, and then continue in another for loop.  But I don't believe that
I have ever written a routine that declared more than one register variable
where that variable couldn't be re-used, if the value was no longer needed.
In other words, if I had a situation like the one above, I would just say
	"register int i"
and just let 'i' handle all those loops.  I don't see any need to allocate
two more variables to do the work that the first one could have done.

	I believe it would be dangerous to let the compiler re-use a variable
in the manner that you describe, especially when using a debugger.  If you
check the address and/or value of 'i' or 'j', they would be the same, and
you would think that a bug has been found.

	But yes, I agree that if the compiler were smart enough to look
through the entire routine before allowing re-use of a register variable,
that it should work properly.  I would just like to point out that it is
easy enough for the programmer to detect these situations, and use the
register variable over "manually".
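
For what it's worth, a minimal sketch of that kind of manual reuse might
look like the following.  The function name and the string-scanning task are
made up for illustration; the only point is that a single register variable
carries over from one loop to the next, and its value is used in between:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not from the thread: returns the length of the
   second blank-separated word in s, using one counter for every loop. */
int second_word_len(const char *s)
{
    register int i;      /* the only loop variable, reused "manually" */
    int start;

    assert(s != NULL);

    for (i = 0; s[i] != '\0' && s[i] != ' '; i++)
        ;                /* loop 1: skip the first word */
    for (; s[i] == ' '; i++)
        ;                /* loop 2: carry on from i, skip the blanks */
    start = i;           /* the value of i is used after the loop */
    for (; s[i] != '\0' && s[i] != ' '; i++)
        ;                /* loop 3: scan the second word */
    return i - start;
}
```

With this, second_word_len("hello world") comes out as 5, and a string with
no second word gives 0.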

							-- Glenn

-- 
glewis@cit-vax.caltech.edu

scott@applix.UUCP (Scott Evernden) (06/23/88)

In article <8806212123.AA01328@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>	One thing I have yet to see addressed properly by either Aztec or
>Lattice is the following:
>
>	    register short i, j, k;
>
>	    for (i = 0; i < 10; ++i)
>		<blah>
>	    for (j = 0; j < 10; ++j) 
>		<blah>
>	    for (k = 0; k < 10; ++k)
>		<blah>
>...
>	ONLY ONE REGISTER SHOULD BE USED FOR ALL THREE REGISTER VARIABLES!!

As much as I might agree with you, I have yet to encounter
a compiler that will do this.  Even the best optimizing compilers
fare no better than Manx and Lattice in this area.  Does anyone
know different??

BTW (and to fill some space here so this will pass rn), I noted
while studying PDC some years ago, that it actually would ignore
'register' declarations altogether.  The dang thing actually
selected registers based on a computed usage of variables
in generated blocks.  Incredible.

-scott

louie@trantor.umd.edu (Louis A. Mamakos) (06/23/88)

In article <728@applix.UUCP> scott@applix.UUCP (Scott Evernden) writes:
>
>As much as I might agree with you, I have yet to encounter
>a compiler that will do this.  Even the best optimizing compilers
>fare no better than Manx and Lattice in this area.  Does anyone
>know different??
>

I've used two compilers which do what you want.  Given this test program:

main(argc, argv)
	int argc; char **argv;
{
	register int i, j, k;

	for(i = 0; i < 20; i++) {
		foo();
	}
	for(j = 0; j < 30; j++) {
		foo();
	}
	for(k = 0; k < 660; k++) {
		foo();
	}
}


The Greenhills 68000 C compiler produces this code:

	SECTION	9
	XDEF	main
main:
	MOVE.L	D2,-(SP)
	MOVE.L	8(SP),D0
	MOVE.L	12(SP),D0
	MOVEQ	#0,D2
.L10:
	JSR	foo
	ADDQ.L	#1,D2
	MOVEQ	#20,D0
	CMP.L	D2,D0
	BGT	.L10
	MOVEQ	#0,D2
.L7:
	JSR	foo
	ADDQ.L	#1,D2
	MOVEQ	#30,D0
	CMP.L	D2,D0
	BGT	.L7
	MOVEQ	#0,D2
.L4:
	JSR	foo
	ADDQ.L	#1,D2
	CMPI.L	#660,D2
	BLT	.L4
	MOVE.L	(SP)+,D2
	RTS	
	SECTION	14
* allocations for main
*	D2	i
*	D2	j
*	D2	k
*	8(SP)	argc
*	12(SP)	argv
	SECTION	9
	SECTION	14
	XREF	foo
* allocations for module
	SECTION	9
	END

Note that it does, in fact, use D2 for all three register variables.

Looking at the output of the GNU C compiler (unfortunately, I only have the
VAX target around at the moment..) we see much the same thing:

#NO_APP
.text
	.align 1
.globl _main
_main:
	.word 0x40
	clrl r6
L4:
	calls $0,_foo
	incl r6
	cmpl r6,$20
	jlss L4
	clrl r6
L8:
	calls $0,_foo
	incl r6
	cmpl r6,$30
	jlss L8
	clrl r6
L12:
	calls $0,_foo
	incl r6
	cmpl r6,$660
	jlss L12
	ret


This uses r6 for all three register variables. 


I'd love to have a GNU C compiler hosted on my Amiga, just need a few more
meg of memory.
Louis A. Mamakos  WA3YMH    Internet: louie@TRANTOR.UMD.EDU
University of Maryland, Computer Science Center - Systems Programming

tom@garth.UUCP (Tom Granvold) (06/23/88)

     The C compiler for the Intergraph Clipper is able to know the lifetime
of a variable and can reuse the same register.  They call this 'register
allocation by coloring'.  This is done in addition to many other optimizations,
and the variables do not need to be declared register in order for the compiler
to do this.  Of course this is not of much help to us Amiga owners.
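
A rough sketch of the idea behind coloring, for the curious.  This is not
the Clipper compiler's algorithm, whose internals are not described here.  It
assumes each variable's lifetime is a single range of statement numbers (a
simplification: real allocators derive lifetimes from flow analysis and must
also deal with spilling), and the names live_range and color_registers are
invented for the example:

```c
#include <assert.h>

#define MAX_VARS 16

/* Hypothetical live range: first and last statement index at which the
   variable holds a value that is still needed. */
struct live_range { int first, last; };

/* Two variables interfere when their live ranges overlap. */
static int interferes(struct live_range a, struct live_range b)
{
    return a.first <= b.last && b.first <= a.last;
}

/* Greedy coloring: give each variable the lowest-numbered register not
   already taken by an earlier variable it interferes with.  Returns the
   number of registers used; reg[v] receives the register for variable v. */
int color_registers(const struct live_range *lr, int n, int *reg)
{
    int v, w, r, used = 0;

    assert(n >= 0 && n <= MAX_VARS);
    for (v = 0; v < n; v++) {
        int taken[MAX_VARS] = {0};

        for (w = 0; w < v; w++)
            if (interferes(lr[v], lr[w]))
                taken[reg[w]] = 1;
        for (r = 0; taken[r]; r++)
            ;                    /* find the first free register */
        reg[v] = r;
        if (r + 1 > used)
            used = r + 1;
    }
    return used;
}
```

Feeding it three back-to-back loop counters with ranges {0,1}, {2,3} and
{4,5} gives all three the same register, since none of the ranges overlap.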

Tom Granvold

dillon@CORY.BERKELEY.EDU (Matt Dillon) (06/24/88)

>I thought (to an extent) that the idea behind C was that a sufficiently
>simple, machine-oriented language would make optimizers unnecessary,
>since you could specify HOW you wanted things done at the simplest level.

	No.  It *allows* you to get down to the bare bones when you want,
but a programmer would do it only for very critical sections of code, as
it usually makes the code unreadable.

>For example, isn't it reasonable that a compiler should produce better
>code for B than A?
>
>A:  *p = ~mask[column & 15] & *p	B: register int x;
>       |  mask[column & 15] & value;	   x = mask[column & 7];
>					  *p = (~x & *p) | (x & value);
>

	See what I mean?

>Then again, why program in a high-level language at all? :-)

	Portability.  For example, you might wonder why the very first
language available for the MC88000 is C (by Greenhills, in fact)?  Because
with that, one can port just about any UNIX OS to it in less than a month,
the libraries in even less time.  Once you've got that, suddenly thousands
of programs are available without having to be ported at all... simply
recompile and <poof>.

	Also, debuggability (is that a word?)

	In fact, one is less likely to make an error coding in a high level
language than coding in assembly, assuming he knows the language of course.

>The point is, compilers shouldn't put their time into making ridiculous code
>reasonable; they should spend their time making reasonable code tight.

	It depends what your definition of reasonable code is, doesn't it?
Frankly, I would rather have something that's readable.

						-Matt

dca@toylnd.UUCP (David C. Albrecht) (06/25/88)

> 	One thing I have yet to see addressed properly by either Aztec or
> Lattice is the following:
> 
> 	{
> 	    register short i, j, k;
> 
> 	    for (i = 0; i < 10; ++i)
> 		<blah>
> 	    for (j = 0; j < 10; ++j) 
> 		<blah>
> 	    for (k = 0; k < 10; ++k)
> 		<blah>
> 	}
> 
> 	I is not used while J is being used, neither I nor J is being
> 	used while K is being used, etc....
> 
> 	ONLY ONE REGISTER SHOULD BE USED FOR ALL THREE REGISTER VARIABLES!!
> 

Well, this is a bit more complicated, as it requires live/dead analysis.
Brace scoping is simpler: on brace entry/exit the compiler has to clear the
variables from the symbol table anyway, so in a proper implementation freeing
register variable allocations and local variable stack space ought to be
relatively easy.  Virtually every C compiler I know of gets this right.  Even
pcc (gasp).  To some degree you can use this 'basic' feature to get good
register allocation in the absence of a good register allocator.  This is
what really miffs me about Lattice's screw-up.
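
A minimal sketch of what releasing registers at the closing brace could
look like inside a compiler.  This is an illustration only, not how Lattice,
pcc or anyone else actually does it; the structure and function names are
hypothetical, and the pool of D7 down to D4 is taken from the listing in the
original post:

```c
#include <assert.h>

#define FIRST_REG 7          /* hand out D7 first, as the listing shows */
#define NUM_REGS  4          /* D7..D4 are available to register vars   */
#define MAX_DEPTH 32

struct reg_alloc {
    int in_use;              /* registers handed out so far             */
    int saved[MAX_DEPTH];    /* value of in_use at each open brace      */
    int depth;               /* current brace nesting level             */
};

/* Opening brace: remember how many registers were in use. */
void scope_enter(struct reg_alloc *ra)
{
    assert(ra->depth < MAX_DEPTH);
    ra->saved[ra->depth++] = ra->in_use;
}

/* Returns the data register number (7 for D7, ...) or -1 when the pool
   is empty and the variable has to live somewhere else. */
int alloc_reg(struct reg_alloc *ra)
{
    if (ra->in_use == NUM_REGS)
        return -1;
    return FIRST_REG - ra->in_use++;
}

/* Closing brace: everything allocated inside the block is free again. */
void scope_exit(struct reg_alloc *ra)
{
    assert(ra->depth > 0);
    ra->in_use = ra->saved[--ra->depth];
}
```

Run against the test case at the top of the thread, the first block would
get D7, D6, D5, D4 and then run out for r5; after the closing brace the
second block would start again at D7 rather than falling back to the stack.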


> :Or how bout this:
> :
> :     { short i[100];
> :       i[0] = 12;
> :     }
> :
> :     You would expect the space for i on the stack to be freed after the last
> :     brace for re-use.  Nope.
> 
> 	Actually, no.  Usually, stack space is overlayed:
> 
Apologies for obscure terminology.
The compiler should be maintaining the amount of stack necessary to store
local variables for the routine.  On exit from a set of braces any local
variables should be 'freed' and thus that stack space be available for re-use.
Note that when I say 'freed' I am referring to a compile time concept here,
not any sort of actual run-time adjustment of the stack pointer.
I would expect that the high water mark of the stack space required in the
routine would be allocated at runtime on entry and deallocated on exit, but
local variables would reuse portions of the allocated section if they are not
simultaneously active or, as Matt put it, they should be 'overlayed'.
The point remains, however, that Lattice 4.1 doesn't 'overlay' variables local
to braces within the body of a routine but rather allocates space enough for
every variable in the routine.  This isn't as big a faux pas as the register
allocation but it is generally wasteful.
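
The compile-time bookkeeping described above can be sketched roughly as
follows.  The names are hypothetical, alignment is ignored, and displacements
are taken as negative offsets from the frame pointer in the style of LINK A5;
treat it as an illustration of the high water mark idea rather than any
particular compiler's code:

```c
#include <assert.h>

#define MAX_DEPTH 32

struct frame_layout {
    long offset;             /* bytes of locals live at this point      */
    long high_water;         /* largest offset seen in the routine      */
    long saved[MAX_DEPTH];   /* offset at each open brace               */
    int depth;               /* current brace nesting level             */
};

/* Opening brace: remember where the live locals end. */
void block_enter(struct frame_layout *f)
{
    assert(f->depth < MAX_DEPTH);
    f->saved[f->depth++] = f->offset;
}

/* Reserve size bytes for a local and return its displacement from the
   frame pointer.  Only the high water mark is ever allocated at run time. */
long declare_local(struct frame_layout *f, long size)
{
    assert(size > 0);
    f->offset += size;
    if (f->offset > f->high_water)
        f->high_water = f->offset;
    return -f->offset;
}

/* Closing brace: the block's locals are 'freed' at compile time only. */
void block_exit(struct frame_layout *f)
{
    assert(f->depth > 0);
    f->offset = f->saved[--f->depth];
}
```

For Matt's two sibling blocks holding char x[256] and char y[256], both
arrays land at displacement -256 and the high water mark stays at 256 bytes
rather than 512.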

David Albrecht

dillon@CORY.BERKELEY.EDU.UUCP (06/30/88)

>where that variable couldn't be re-used, if the value was no longer needed.
>In other words, if I had a situation like the one above, I would just say
>	"register int i"
>and just let 'i' handle all those loops.  I don't see any need to allocate
>two more variables to do the work that the first one could have done.

	Please, don't remark on my lack of a good example.  I *DID* say
that this comes up (all the time in my case) not when you have the
same type, but differing types... usually differing pointer types.

FOREXAMPLE, I might want to put a passed pointer variable in a register
so I can initialize some other structure, but beyond that never use
the passed pointer variable again.

poof(ss)
register SOMESTRUCTURE *ss;
{
    register BLAH *blah;
    ss->x = 43;
    ss->t = 23;
    ss->querty = "hello";
    blah->ss = ss;
    blah
    blah
    blah ...

    <lots of code that never uses ss again>
}

In my case, this occurs often enough that I use up all the available
registers and then I'm up shit creek.  Using sub code blocks only 
partially fixes the situation.

>	I believe it would be dangerous to let the compiler re-use a variable
>in the manner that you describe, especially when using a debugger.  If you
>check the address and/or value of 'i' or 'j', they would be the same, and
>you would think that a bug has been found.

	I won't argue with you too much, since you are obviously unaware 
that this is a standard compiler design practice.

>that it should work properly.  I would just like to point out that it is
>easy enough for the programmer to detect these situations, and use the
>register variable over "manually".

	I'm glad you agree with me, though I disagree with your last
remark.

>
>							-- Glenn
>

					-Matt