c2 optimizes calls instructions

david@ukma.UUCP (David Herron, NPR Lover) (08/19/85)

Index: /lib/c2, BRL-Unix release 3, likely appears in other 4.?BSD.

Problem: Calls instructions have a format "calls <const-#-of-args>,<addr>".
	The <const-#-of-args> argument will sometimes be "optimized" into
	a register reference.  This isn't necessary, and breaks the
	massager which helps create the code in the kernel.

Repeat-by: Compile the following program segment:

	b() {}
	a()
	{
		register int i;

		i = 0;
		b();
	}

	For which you get:

LL0:
	.data
	.text
	.align	1
	.globl	_b
_b:
	.word	L12
	jbr 	L14
L15:
	ret
	.set	L12,0x0
L14:
	jbr 	L15
	.data
	.text
	.align	1
	.globl	_a
_a:
	.word	L17
	jbr 	L19
L20:
	clrl	r11
	calls	$0,_b
	ret
	.set	L17,0x800
L19:
	jbr 	L20

After running /lib/c2 -i:

.data
LL0:.data
.text
.align	1
.globl	_b
.set	L12,0x0
.data
.text
_b:.word	L12
ret
.align	1
.globl	_a
.set	L17,0x800
.data
.text
_a:.word	L17
clrl	r11
calls	r11,_b
ret


	Notice that the $0 has been changed to r11.

Fix: I was going to provide a fix, but I took one look at the code for
	c2 and decided I wasn't that brave :-).  I did notice a few
	places in c21.c which looked to be likely candidates for fixing.
	What I suggest is that c2 be taught not to optimize calls instructions.

Work-around:  Add rules to asm.sed or inline (as appropriate) to match
	all the possibilities that calls might be optimized into.
	Since this is non-trivial I would really prefer for c2 to be fixed.


Could someone who still runs 4.2BSD (and 4.3BSD too) check to see if the
bug is that widely spread?  And System V while yer at it..... :-)




-- 
--- David Herron
--- ARPA-> ukma!david@ANL-MCS.ARPA
--- UUCP-> {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!david
---        {ihnp4,decvax,ucbvax}!cbosgd!ukma!david

Hackin's in me blood.  My mother was known as Miss Hacker before she married!

chris@umcp-cs.UUCP (Chris Torek) (08/22/85)

>Index: /lib/c2, BRL-Unix release 3, likely appears in other 4.?BSD.

>Problem: Calls instructions have a format "calls <const-#-of-args>,<addr>".
>	The <const-#-of-args> argument will sometimes be "optimized" into
>	a register reference.  This isn't necessary, and breaks the
>	massager which helps create the code in the kernel.

It saves a little teensy bit of time.  Anyway, here's how to work
around it.  I added a new flag to /lib/c2 (-c) but it should be
obvious how to make it always do what -c does.

First, change /usr/src/lib/c2/c20.c to handle "-c".  (Beware, these
are to what amounts to the 4.3BSD c2; your code may vary slightly.)

RCS file: RCS/c20.c,v
retrieving revision 1.1
diff -c2 -r1.1 c20.c
*** /tmp/,RCSt1015521	Thu Aug 22 01:45:07 1985
--- c20.c	Thu Aug 22 01:44:13 1985
***************
*** 15,18
  caddr_t sbrk();
  int ioflag;
  int fflag;
  long	isn	= 2000000;

--- 15,19 -----
  caddr_t sbrk();
  int ioflag;
+ int cflag;
  int fflag;
  long	isn	= 2000000;
***************
*** 57,60
  			if ((*argv)[1]=='i') ioflag++;
  			else if ((*argv)[1]=='f') fflag++;
  			else nflag++;
  		} else if (infound==0) {

--- 58,62 -----
  			if ((*argv)[1]=='i') ioflag++;
  			else if ((*argv)[1]=='f') fflag++;
+ 			else if ((*argv)[1]=='c') cflag++;
  			else nflag++;
  		} else if (infound==0) {


Second, modify the optimizer code in /usr/src/lib/c2/c21.c to check
cflag before optimizing "calls" constants.  This also includes a
fix from Donn Seeley for a longstanding fencepost error:

RCS file: RCS/c21.c,v
retrieving revision 1.1
diff -c2 -r1.1 c21.c
*** /tmp/,RCSt1015526	Thu Aug 22 01:45:24 1985
--- c21.c	Thu Aug 22 01:44:49 1985
***************
*** 14,17
  #define NUSE 6
  int ioflag;
  int biti[NUSE] = {1,2,4,8,16,32};
  int bitsize[] = {	/* index by type codes */

--- 14,18 -----
  #define NUSE 6
  int ioflag;
+ int cflag;
  int biti[NUSE] = {1,2,4,8,16,32};
  int bitsize[] = {	/* index by type codes */
***************
*** 509,512
  		break;
  
  /* .rx,.rx,.rx */
  	case PROBER:

--- 510,521 -----
  		break;
  
+ /* .rx,.rx */
+ 	case CALLS:
+ 		if (cflag)	/* don't optimize constants in calls */
+ 			break;
+ 		/* fall through */
+ 	case MTPR:
+ 	case CMP:
+ 	case BIT:
  /* .rx,.rx,.rx */
  	case PROBER:
***************
*** 514,522
  	case CASE:
  	case MOVC3:
- /* .rx,.rx */
- 	case MTPR:
- 	case CALLS:
- 	case CMP:
- 	case BIT:
  		splitrand(p);
  		/* fool repladdr into doing right number of operands */

--- 523,526 -----
  	case CASE:
  	case MOVC3:
  		splitrand(p);
  		/* fool repladdr into doing right number of operands */
***************
*** 724,728
  			if (!equstr(regs[RT3],"-(sp)")) p->combop=T(MOVA,BYTE);
  			else {p->combop=T(PUSHA,BYTE); *cp2=0;}
! 			if (uses[r]==0) {uses[r]=p; regs[r][0]=OPX<<4;}
  			p->pop=0;
  		}

--- 728,735 -----
  			if (!equstr(regs[RT3],"-(sp)")) p->combop=T(MOVA,BYTE);
  			else {p->combop=T(PUSHA,BYTE); *cp2=0;}
! 			if (r < NUSE && uses[r] == 0) {
! 				uses[r]=p;
! 				regs[r][0]=OPX<<4;
! 			}
  			p->pop=0;
  		}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

ado@elsie.UUCP (Arthur David Olson) (08/22/85)

> >Index: /lib/c2, BRL-Unix release 3, likely appears in other 4.?BSD.
> >Problem: Calls instructions have a format "calls <const-#-of-args>,<addr>".
> >	The <const-#-of-args> argument will sometimes be "optimized" into
> >	a register reference.  This isn't necessary. . .
> 
> It saves a little teensy bit of time. . .

Hmmm. . .let's check this out with the following program on a VAX 11/750
(4.1bsd, cp_urev = 94, cp_hrev = 72):

	main(argc, argv)
	int	argc;
	char *	argv[];
	{
		register int	i;

		i = atoi(argv[1]);
		do {
			one();
			two();
		} while (--i > 0);
	}

	dummy1() { dummy1(); dummy1(); dummy1(); dummy1(); dummy1(); }

	one()
	{
		register int	i, j;

		i = 1000;
		do {
			j = 0;
			subr();
		} while (--i > 0);
	}

	dummy2() { dummy2(); dummy2(); dummy2(); dummy2(); dummy2(); }

	two()
	{
		register int	i, j;

		i = 1000;
		do {
			subr();
			j = 0;
		} while (--i > 0);
	}

	dummy3() { dummy3(); dummy3(); dummy3(); dummy3(); dummy3(); }

	subr() {}

If I name the above "try.c" and
	cc -c -O try.c ; cc -p -O try.o ; a.out 10000 ; prof
I get this output:
	 %time  cumsecs  #call  ms/call  name
	  60.8   325.10                  _subr
	  20.2   433.28                  _one
	  18.9   534.22                  _two
	   0.1   534.73                  _main
which says that the function "one", with its
	L33:clrl	r10
	calls	r10,_subr
	sobgtr	r11,L33
takes more time than function "two", with its
	L46:calls	$0,_subr
	clrl	r10
	sobgtr	r11,L46
loop.

Comments?
--
Bugs is a Warner Brothers trademark.
UNIX is an AT&T Bell Laboratories trademark.
BRL is a Bethesda Research Laboratories trademark.
--
	UUCP: ..decvax!seismo!elsie!ado    ARPA: elsie!ado@seismo.ARPA
	DEC, VAX and Elsie are Digital Equipment and Borden trademarks

hans@log-hb.UUCP (Hans Albertsson) (08/24/85)

In article <5207@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
> .....
>which says that the function "one", with its
>	L33:clrl	r10
>	calls	r10,_subr
>	sobgtr	r11,L33
>takes more time than function "two", with its
>	L46:calls	$0,_subr
>	clrl	r10
>	sobgtr	r11,L46
>loop.
>
>Comments?

You have changed the alignment, the "_subr" arg to calls is at a non-optimum
address, such as an odd word address or maybe even worse, in one,
assuming it's on an optimum address in two. The VAX architecture allows
any alignment for any part of any instruction, but the bus is fixed in
both width and memory access alignment, even for cache accesses. This 
may sometimes severely penalise seemingly optimum programs.

The difference 20 to 18 is furthermore very small in comparison with some
such effects I have seen. I remember once when removing completely an
inactive ( always false ) IF statement ( some 8 out of 30 lines, I seem to
remember ) DOUBLED both user and system times..... That felt VERY
humiliating, I can assure you. The code became smaller consistent with the
reduction of the source program size, but... We could find NO other
explanation.  ( The language was a very early stage of the TeleSOFT ADA. )
This would also seem to completely invalidate stuff like Dhrystone
benchmarks...

----------
This is the FIRST time ever I feel the need to point out that any
opinions expressed above are my own, and does not represent an official
TeleLOGIC opinion. In fact, some people here are likely to disagree
violently. With anything I say, on any subject...
-- 
Hans Albertsson, USENET/uucp: {decvax,philabs}!mcvax!enea!log-hb!hans
Real World:  TeleLOGIC AB, Box 1001, S-14901 Nynashamn,SWEDEN