[net.lang.c] ANSI C, optimization, and "hardware registers"

padpowell@wateng.UUCP (PAD Powell) (10/12/84)

I have just run into a really fun thing with an optimizer.  The problem
was in the code for a hardware level driver, which wanted to:
1. Stuff a value into a register.
2. Look at the register until a flag (bit) went high.

The code written was
	struct regs{
		int r1;
	} *csr;
	...
	csr->r1 = ST_START;
	while( (csr->r1 & ST_DONE) == 0 );

Well, imagine my surprise when the generated code only did the following:
	1.  loaded the ST_START value into a CPU register (byte value actually)
	2.  placed the CPU register value into memory (word value)
	3.  Did not generate a test instruction, because the ST_START and
		ST_DONE values were identical.

Now here is the question:
1.  Was this legal code generation?
2.  Note that this compiler did "simple" optimizations as part of the code
	generation. Is this legal?

I know of several ways around this, but I thought that it should be addressed
by the ANSI standard.

Patrick Powell

henry@utzoo.UUCP (Henry Spencer) (10/14/84)

> [Compiler is optimizing out a wait-for-hardware-done loop.]
> ...
> 1.  Was this legal code generation?
> 2.  Note that this compiler did "simple" optimizations as part of the code
> 	generation. Is this legal?

As to whether it's legal by K&R, the only answer is "mumble".  This
thorny issue was never addressed in the old days.  The draft ANSI standard
has a "volatile" declaration that you can use to tell the compiler "don't
get tricky with this variable, it may change underfoot".
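
For illustration, here is roughly how the original loop might look with such
a declaration.  This is only a sketch: the bit values, the function name and
the poll limit are assumptions that do not come from the original post, and
the limit is there only so the example cannot hang when run against ordinary
memory.

```c
/* Hypothetical bit values; the original post does not give them. */
#define ST_START 0x01
#define ST_DONE  0x80

struct regs {
	volatile int r1;	/* every read and write of r1 must be performed */
};

/*
 * Start the device, then poll for ST_DONE.
 * Returns 1 if ST_DONE was seen, 0 if the poll limit ran out.
 */
int start_and_wait(struct regs *csr, unsigned long max_polls)
{
	csr->r1 = ST_START;
	while (max_polls-- > 0) {
		if ((csr->r1 & ST_DONE) != 0)
			return 1;
	}
	return 0;
}
```

Because r1 is volatile, the compiler may not assume that the value it just
stored is the value it will read back, so the test stays in the loop.
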
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bright@dataio.UUCP (Emperor) (10/15/84)

If you are using an optimizing compiler, and your code deals with
hardware registers and such, it is usually a good idea to turn off
all optimizations. Nearly all optimizing transformations applied
to a program will cause certain kinds of programs that deal with
hardware to fail. Since, however, 99% of code written does not
deal directly with hardware, and a good optimizer can double the
speed of the resulting code, the optimizer is worth keeping around.
				Walter Bright

guy@rlgvax.UUCP (Guy Harris) (10/17/84)

> I have just run into a really fun thing with an optimizer.  ...
> (Discussion of optimization that doesn't work on "volatile" locations
> like device registers)
> 
> Now here is the question:
> 1.  Was this legal code generation?
> 2.  Note that this compiler did "simple" optimizations as part of the code
> 	generation. Is this legal?

I'd say "yes" to both questions.  (BTW, if this was code for a VAX-11, there's
an undocumented "-i" flag to "c2" which turns off these optimizations; the
4.2BSD Makefile uses it for anything declared as "device-driver" in the
"files" or "files.vax" file.)

> I know of several ways around this, but I thought that it should be addressed
> by the ANSI standard.

It is; there's a pseudo-storage-class called "volatile" which says "this
is subject to change without notice, so don't be clever and optimize references
to it."  This is actually useful for things other than device registers,
given that the UNIX kernel has data within it shared by multiple processes,
and that several versions of UNIX, as well as other OSes, support data shared
between user processes.
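
A small stand-in for the non-device case, using a signal handler within one
process rather than memory shared between processes (which standard C has no
portable way to set up).  The names are made up for the example.  Note that
"volatile" here only keeps the compiler from caching the flag; it does not by
itself give atomicity or ordering for larger shared structures.

```c
#include <signal.h>

/* Written by the handler, read by the main flow of control. */
static volatile sig_atomic_t flag_set = 0;

static void on_signal(int signo)
{
	(void)signo;
	flag_set = 1;
}

/*
 * Install the handler, send SIGINT to ourselves, and wait for the flag.
 * Returns 1 once the flag is seen, -1 if setup failed.
 */
int wait_for_flag(void)
{
	flag_set = 0;
	if (signal(SIGINT, on_signal) == SIG_ERR)
		return -1;
	if (raise(SIGINT) != 0)
		return -1;
	while (!flag_set)
		;	/* flag_set is re-read on every pass */
	signal(SIGINT, SIG_DFL);
	return 1;
}
```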

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/17/84)

> Well, imagine my surprise when the generated code only did the following:
> 	1.  loaded the ST_START value into a CPU register (byte value actually)
> 	2.  placed the CPU register value into memory (word value)
> 	3.  Did not generate a test instruction, because the ST_START and
> 		ST_DONE values were identical.
> 
> Now here is the question:
> 1.  Was this legal code generation?
> 2.  Note that this compiler did "simple" optimizations as part of the code
> 	generation. Is this legal?

Step 2 of the generated code was sloppy but legal... incomplete optimization.

Optimizations certainly are legal, but your C compiler needs an escape
to avoid the sort of over-optimization you have described.  Frequently
this is done by testing for references that can be seen at compile time
to be possible "I/O page" addresses and skipping optimizations for them.
The given example would not have been detected under this test since the
CSR variable was loaded at run-time, not compile-time.
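
A rough sketch of the kind of test described above, written as an ordinary C
function so it can be read and run.  The range used is the PDP-11 I/O page in
a 16-bit address space (0160000 to 0177777 octal), picked as an assumption;
the function name is hypothetical and nothing here is taken from a real
compiler.

```c
/* Assumed range: the top 8K bytes of a 16-bit PDP-11 address space. */
#define IO_PAGE_BASE 0160000UL
#define IO_PAGE_TOP  0177777UL

/* Nonzero if a known address falls inside the assumed I/O page. */
int in_io_page(unsigned long addr)
{
	return addr >= IO_PAGE_BASE && addr <= IO_PAGE_TOP;
}
```

A compiler can only apply a check like this when the address is a constant
it can see in the source.  With a pointer such as csr that is loaded at run
time, there is no value to test, which is why the example slips through.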

The ANSI C committee was supposed to be considering this issue; the
BLISS-style "volatile" type modifier was mentioned as one possibility.
(This tells the compiler not to optimize any expression containing
the variable so flagged.)  I don't know the outcome of the discussions.

tom@ucbcad.UUCP (10/18/84)

I ran into some similar problems when I was writing a device driver.  I
had to put in some weird kludges to make things work.  MOST of my
problems could be solved by avoiding the optimizer, but not all.  In any
case, I wanted to use the optimizer to tighten the code as much as I
could.  I concluded that there should be a storage class that indicates
hardware side-effects.

I also thought it would be nice to have storage attributes for read-only
registers (I guess this is the "const" storage class in the proposed
ANSI standard -- I don't think much of the mnemonic value here, but I
suppose consistency is more important than mnemonics) and one for
write-only registers, so you would get a compile-time error if you tried
to read a write-only register or vice versa.
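
A sketch of how far standard C qualifiers might go toward this, assuming a
made-up two-register device.  "const volatile" gives the read-only half: an
assignment to status is rejected at compile time.  There is no matching
qualifier for write-only, so that half is only a convention enforced by
offering no function that reads the register.

```c
#include <stdint.h>

/* Hypothetical two-register device. */
struct dev_regs {
	const volatile uint16_t status;	/* read-only: assignment is rejected */
	volatile uint16_t command;	/* write-only by convention only */
};

/* Read the status register. */
uint16_t dev_status(const struct dev_regs *dev)
{
	return dev->status;
}

/* Write the command register; no reader is provided for it. */
void dev_command(struct dev_regs *dev, uint16_t value)
{
	dev->command = value;
}
```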

I'm sure there are other strange storage classes/attributes that people
would like to see in the standard.  What do people think is a reasonable
set?  I personally think the side-effect class is very important (IS
this in the proposed standard?  I don't remember seeing it, but I may be
senile.), but the others are harder to justify.

			Tom Laidig

thomson@uthub.UUCP (Brian Thomson) (10/19/84)

The 'volatile' keyword will tell an (optimizing) compiler that a
location can change value asynchronously, but sometimes that isn't
enough.
Another common idiosyncrasy of hardware is that it is read-only
or write-only, or can only be accessed using byte or halfword or
fullword accesses.
I believe the Ritchie PDP-11 compiler would compile
		a = b + c;
into
		mov	b,a
		add	c,a
in appropriate circumstances; this clearly will fail if 'a' refers to
a write-only register.
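
One way to write the source so that the intent is a single store, sketched
under the assumption that the compiler honors "volatile" by performing
exactly the accesses the source asks for.  The function name is made up.

```c
#include <stdint.h>

/* Compute the sum in a local, then store it to the register once. */
void store_sum(volatile uint16_t *a, uint16_t b, uint16_t c)
{
	uint16_t sum = (uint16_t)(b + c);

	*a = sum;	/* the only access to *a in the source is this store */
}
```
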
The second situation is exemplified by registers in nexus space
on a VAX (forgive me if you don't know what this means), which
may only be accessed as longwords, and by Unibus registers which
may only be accessed as bytes or halfwords.
A favourite improvement made by /lib/c2 when compiling
		short a;
		...
		if (a&1) ...
is to replace the straightforward
		bitw	$1,_a
		jeql	L1
by
		jlbc	_a,L1
(Jump if Lower Bit Clear, that means) which is fine except that
the first sequence references _a as a halfword while the second
does a longword reference.
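
A sketch of the same test written against a register declared with a fixed
16-bit type.  Standard C leaves what counts as an access to a volatile object
up to the implementation, so the width is not strictly promised, but in
practice compilers read a volatile object at its declared size.  The function
name is made up.

```c
#include <stdint.h>

/* Read the register once as a 16-bit value, then test the low bit. */
int low_bit_set(volatile uint16_t *reg)
{
	uint16_t value = *reg;	/* one read, through a 16-bit type */

	return (value & 1u) != 0;
}
```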

( the undocumented "-i" flag to VAX /lib/c2 disables
  these optimizations that alter reference size, and doesn't have
  anything to do with memory volatility as suggested by an earlier
  article )
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson

henry@utzoo.UUCP (Henry Spencer) (10/19/84)

A paranoid compiler could, presumably, assume that "volatile" meant not
only that the location might change underfoot, but also that there was
something strange about it and it had better be accessed in the most
straightforward way possible.  My own thought would be that the special
keyword ("volatile" is perhaps not an ideal choice, it's too specific)
should simply mean that the compiler should be as paranoid as possible
on the given architecture.  I suspect that trying to pin down the exact
semantics is both difficult and unwise.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

alan@apollo.uucp (Alan Lehotsky) (10/22/84)

APOLLO's implementation of C has support for both the notion of
memory being shared between asynchronous activities (VOLATILE)
and memory-mapped i/o (DEVICE).  Rather than implement these as
storage classes, we chose to create an "extensible" extension
mechanism with an attributes list.

The semantics of VOLATILE are based on the notion that the
named variable can be modified between any two references, so
that the variable may not be a component of any common subexpression,
nor may it be hoisted out of a loop.
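
A sketch of what that rule means in source terms, using the standard
"volatile" qualifier rather than the Apollo attribute syntax.  The function
name is made up.  The two reads below look like a common subexpression, but
since the location may change between them, both have to be performed.

```c
/* Read the same location twice and report whether the value changed. */
int changed_between_reads(volatile unsigned *reg)
{
	unsigned first = *reg;
	unsigned second = *reg;	/* a separate read, not merged with the first */

	return first != second;
}
```

Without the qualifier, a compiler would be entitled to read once and return 0.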

DEVICE implies more stringent conditions on the optimizer and code
generator. (Still somewhat of a DWIM [Do What I Mean] situation, though)
In addition to implying VOLATILE, it indicates that extra references
by the compiled code will be MOST unwelcome.  As an explicit example,
a DEVICE location will never be the target of a CLR instruction by the
68000 code generator, as this instruction does a READ-MODIFY bus cycle!
(which can really annoy some of the dumber write-only Multibus peripheral
cards!)

The flavor of the syntax for this extension is

       int devreg #attribute[volatile];
      
      short outmask #attribute[device(write)],
            inmask #attribute[device(read)];

(NO FLAMES ABOUT HERETICAL VIOLATIONS OF THE K&R "BIBLE", PLEASE)

We also support an "ADDRESS(expr)" attribute which allows a name to
be associated with a compile-time constant expression which denotes
a memory address.
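
For comparison, the usual way to get a similar effect in plain C is to cast
a numeric address to a pointer to a volatile type.  A sketch, with a made-up
helper name; it is not the Apollo mechanism.

```c
#include <stdint.h>

/* Turn a numeric address into a pointer to a 16-bit device register. */
volatile uint16_t *reg_at(uintptr_t addr)
{
	return (volatile uint16_t *)addr;
}
```

On real hardware the argument would be a constant taken from the device
documentation.  To try the helper without hardware, pass it the address of an
ordinary variable instead.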

All of the above functionality also appears in our PASCAL, with
different syntax (similar to VAX-11 PASCAL's attributes).  The
implementation and semantics of this were based on very similar
work I implemented in DIGITAL's common BLISS compilers in the late
70's.  (Just another example of BLISS being a much better systems
programming language than C will EVER be.....)