[comp.lang.c] defeating the optimiser

chris@mimsy.UUCP (Chris Torek) (04/30/89)

>In article <17133@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>If your compiler does not understand `volatile', and has no way to
>>disable optimisation, you are out of luck.  (You can resort to assembly
>>language subroutines.)

In article <10136@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>Back, back!  (Making the sign of the cross.)  No need to resort to
>assembly language for something so simple.
>
>What is the real problem here?  It's that the compiler knows that
>we only need to inspect one byte in order to determine the state of
>the bit.  So how do we outwit the compiler? ... [various suggestions
>deleted]

As someone else has already pointed out, this approach leads to the
dreaded Compiler Upgrade Problem.  The next release of the compiler
may require you to change all of your defeat mechanisms.  As it happens,
though, you can usually get away with only a few small assembly
routines---often you need only one for each special instruction.

For instance, some Unibus devices respond differently to a `bisw2'
(read-modify-write) instruction than they would to a `movw' (read) ...
`movw' (write) sequence.  But you need not write an entire driver in
assembly.
If the compiler will not cooperate, at worst you can write

	bisw(&reg, bits);

and have the routine

	_bisw:	.globl	_bisw
		.word	0
		bisw2	8(ap),*4(ap)
		ret

somewhere callable.  Often you can insert this sort of thing directly
into the compiler's assembly output (most serious compilers are capable
of producing assemblable code, even if their default is to produce
object code directly) to avoid subroutine call overhead.  Sun provide
a program called `inline' that uses this approach, and (I presume)
also tries to avoid unnecessary pushes and pops, changing something
like

		pea	a4@(12)
		jsr	_readlong
		movl	#10,d1
		btst	d1,d0	| btst cannot test bit 10 directly

plus

	_readlong:
		movl	sp@(4),a0
		movl	a0@,d0
		rts

into

		lea	a4@(12),a0
		movl	a0@,d0
		movl	#10,d1
		btst	d1,d0

or even (if smart enough) merging the lea+movl into one movl.
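
For comparison, here is a minimal sketch of what the C side might look
like on a compiler that does understand `volatile'.  The names bisw and
readlong follow the routines above; test_bit10 is a hypothetical helper
added only for illustration.  This assumes that `volatile' is enough to
make the compiler perform the full-width access, which is up to the
implementation.  In particular, the C version of bisw may still compile
to a separate read and write rather than a single bisw2, so it is not a
substitute for the assembly routine on devices that care about the
difference.

```c
/* Sketch only: C-level counterparts of the assembly helpers above.
 * Assumes a compiler that honours `volatile'. */

/* OR `bits' into a 16-bit register.
 * Caveat: the compiler is free to emit a read followed by a write here,
 * so this does NOT guarantee a single read-modify-write instruction. */
void bisw(volatile unsigned short *reg, unsigned short bits)
{
    *reg |= bits;
}

/* Read a register through a volatile pointer, so the access may not be
 * optimised away; the access width is still implementation-defined. */
unsigned long readlong(volatile unsigned long *reg)
{
    return *reg;
}

/* Hypothetical helper: nonzero if bit 10 of the register is set. */
int test_bit10(volatile unsigned long *reg)
{
    return (readlong(reg) & (1UL << 10)) != 0;
}
```

A caller would write bisw(&reg, bits) exactly as before, so the C
version and the assembly version can be swapped without touching the
rest of the driver.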
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

bill@twwells.uucp (T. William Wells) (04/30/89)

In article <17195@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
:                                                           Sun provide
: a program called `inline' that uses this approach, and (I presume)
: also tries to avoid unnecessary pushes and pops, changing something
: like
:
:               pea     a4@(12)
:               jsr     _readlong
:               movl    #10,d1
:               btst    d1,d0   | btst cannot test bit 10 directly
:
: plus
:
:       _readlong:
:               movl    sp@(4),a0
:               movl    a0@,d0
:               rts
:
: into
:
:               lea     a4@(12),a0
:               movl    a0@,d0
:               movl    #10,d1
:               btst    d1,d0
:
: or even (if smart enough) merging the lea+movl into one movl.

If this is what I think it is, I read about it some time ago (it's in
the floating point manual, an obvious place, right?).  What is described
there are .il files, which you use by naming them on your cc command
line.

The .il files contain assembly code which the compiler inserts in
line for you. The manual gives some instructions on how to write the
functions in such a way that the optimizer will remove the function
call overhead.

It's a neat trick if you need inline assembly. And avoids nonportables
like the asm keyword. (I think I got this right. It's been a while.)

---
Bill                            { uunet | novavax } !twwells!bill

rsalz@bbn.com (Rich Salz) (05/01/89)

>The .il files contain assembly code ...
...
>It's a neat trick if you need inline assembly. And avoids nonportables
>like the asm keyword. ...


Hunh?
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.