edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (04/12/88)
I've sat back and digested some of this debate over volatile. I've come to the conclusion that it's not a good idea to add it to the language. Let's first discuss the basic premise behind its proposed addition. For volatiles, the rationale seems to be that they are needed to correctly handle variables that may be modified by multiple threads of execution. I think this statement covers the problems associated with direct multi-tasking, signal handlers, and memory-mapped i/o. The above rationale is not totally correct, however. It misses a key point that I believe is critical to the whole argument for the addition of volatiles, which is that "heavy" optimization of the data-flow variety must also be taking place in order to justify the addition of volatile. It is with this last clause that I find many problems.

1) Volatile is being used to make up for a deficiency in data-flow algorithms (their inability to handle multiple threads). I have a couple of complaints along this line. One is that it is not clear to me that volatile will be sufficient to handle the deficiencies of data-flow optimizations. Is there "prior art" to suggest that it will? Do there exist better techniques for handling data-flow analysis (or similar optimizations) within a multi-threaded environment?

2) When variables are not correctly declared as volatile, a program will exhibit different behavior between the optimized and unoptimized versions. I have two complaints about this. One is that this sort of behavior is contrary to the overall philosophy behind optimization. An optimization on a language is a set of transformations that do not change the behavior of programs but are beneficial by some metric. Clearly, the first clause has been violated. Conclusion: it's inappropriate to try to perform standard data-flow analysis techniques in a multi-threaded environment. My second complaint stems from a more pragmatic standpoint.
Mainly, how does one go about debugging a program that works in the unoptimized version and pukes in the optimized version? All the source-language debuggers I know of work only on unoptimized code. If you try the printf technique, you may find your program changing behavior simply because of the presence of the printf statement (a loop invariant may not migrate out of the loop if it is accessed by the print statement). I can only picture the horrors of trying to debug in that sort of environment.

-- Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
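The heisenbug Eddie describes can be sketched in a few lines. This is only an illustration, not code from any of the posters: "shared" stands for a variable modified by another thread of control (it is set ahead of time here so the sketch terminates), and the function names are invented.

```c
#include <stdio.h>

int shared = 0;   /* in the real scenario, changed behind the compiler's back */

/* An optimizer may treat "shared" as loop-invariant here, hoist the
   load, and spin forever on a stale register copy: */
int wait_quiet(void)
{
    while (shared == 0)
        ;                                  /* load may be hoisted */
    return shared;
}

/* The printf forces the compiler to assume "shared" may change
   across the call, so the "debugging" build accidentally works: */
int wait_noisy(void)
{
    while (shared == 0)
        printf("shared = %d\n", shared);   /* call blocks hoisting */
    return shared;
}
```

The point is exactly Eddie's: inserting the printf changes which optimizations are legal, so the act of debugging alters the behavior being debugged.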
len@mips.COM (Len Lattanzi) (04/13/88)
In article <1394@pt.cs.cmu.edu> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
:
: 1) Volatile is being used to make up for a deficiency in the
: data flow algorithm (their inability to handle multiple
: threads). I have a couple of complaints along this line.
: One being, it is not clear to me that volatile will
: be sufficient in handling the deficiencies of data
: flow optimizations. Is there "prior art" to suggest
: that it will? Do there exist better technique to
: handle data flow analysis (or similar optimizations)
: within a multi-thread environment?
volatile is implemented in Mips C (K&R + prototypes/volatile/void). It has handled all cases put to it involving signal handlers, shared memory, and device i/o registers while applying optimization. Cases include BSD4.3, SysVr3, and processes using shmem for exclusive access.
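The signal-handler case Len mentions is the canonical one; a minimal sketch (illustrative names, and raise() standing in for an asynchronous interrupt so the fragment can actually run):

```c
#include <signal.h>

/* Without the volatile qualifier, a data-flow optimizer may hoist
   the load of "done" out of the loop, since nothing visible in the
   loop body modifies it, and the program would spin forever. */
static volatile sig_atomic_t done = 0;

static void handler(int sig)
{
    (void)sig;
    done = 1;               /* modified by a second thread of control */
}

int wait_for_signal(void)
{
    signal(SIGINT, handler);
    raise(SIGINT);          /* stand-in for an external interrupt */
    while (!done)           /* must re-load "done" every iteration */
        ;
    return (int)done;
}
```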
:
: 2) When variables are not correctly declared as volatiles,
: a program will exhibit different behavior between
: the optimized and unoptimized versions. I have two
: complaints about this. One being this sort of
: behavior is contradictory to the over all philosophy
: behind optimization. A optimization on a language is
: the set of transformations that do not change the
: behavior of the programs but are beneficial by
: some metric. Clearly, the first clause has been
: violated.
Now hold on: first you say that the variables were *not correctly* declared, and then you demand the same behavior regardless of optimization? If anything, optimization will stress the source code for *correct* declarations. If you depend on uninitialized (auto) data, or on writing/reading beyond the end of a record, you will most likely observe different behavior with and without optimization. And if the unoptimized code behaves "correctly", you were LUCKY!
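The uninitialized-auto case Len names is worth seeing in code. A sketch (the function is invented for illustration): without the initializer, "sum" is indeterminate; an unoptimized build may happen to hand you a zeroed stack slot, while an optimized build assigns a register holding leftover garbage.

```c
/* The fix is the explicit initializer, not turning off the optimizer. */
int sum_array(const int *a, int n)
{
    int sum = 0;    /* without "= 0", the value is undefined and may
                       differ between optimized and unoptimized builds */
    int i;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```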
: Conclusion, it's inappropriate to
: try to perform standard data flow analysis techinques
: in a multi-threaded environment. My second complain
: stems from a more pragmatic stand point. Mainly,
: how does one go about debugging a program that
: works in the unoptimized version, pukes in the
: optimized version. All the source language debuggers
: I know of only work only on unoptimized code.
The mips compiler system supports

-g0       Have the compiler produce no symbol table information for
          symbolic debugging. This is the default.
-g1       Have the compiler produce additional symbol table information
          for accurate but limited symbolic debugging of partially
          optimized code.
-g or -g2 Have the compiler produce additional symbol table information
          for full symbolic debugging and not do optimizations that
          limit full symbolic debugging.
-g3       Have the compiler produce additional symbol table information
          for full symbolic debugging for fully optimized code. This
          option makes the debugger inaccurate.
: If you
: try the printf technique, you may find your programming
: changing behavior simple because of the presents of
: the printf statement (loop invariant may not migrate out
: of the loop if accessed by the print statement).
: I can only picture the horrors of trying to debug in
: that sort of environment.
With mips dbx and our extended symbol table format, it is feasible to debug optimized code without source changes. This dbx can insert breakpoints that print out values and adjust/test dbx variables (like shell variables). Also, the symbol table keeps track of which statement each instruction came from.
:--
:
:Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
I hope this doesn't sound like marketing hype; I'm just trying to give you some idea of 'prior art'.

Actually, in true RISC spirit, I'd rather the proposed 'C' standard weren't so complex.
--
Len Lattanzi (len@mips.com) <{ames,prls,pyramid,decwrl}!mips!len>
My employers will disavow any knowledge of my opinions.
"If Ronald Reagan isn't lying, why do they keep cutting off parts of his nose?"
-Rep. Jimmy Hayes, D-La
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (04/13/88)
> volatile is implemented in Mips C (K&R + prototypes/volatile/void)
> It has handled all cases put to it involving signal handlers, shared memory
> and device i/o registers while applying optimization. Cases include
> BSD4.3, SysVr3, processes using shmem for exclusive access.

Is its sufficiency based totally on empirical evidence? I know a little about data-flow analysis, and the volatile construct seems like a reasonable approach, but you always have to wonder whether all cases were taken into account.

> : 2) When variables are not correctly declared as volatiles,
> :    a program will exhibit different behavior between
> :    the optimized and unoptimized versions. I have two
> :    complaints about this. One being this sort of
> :    behavior is contradictory to the over all philosophy
> :    behind optimization. A optimization on a language is
> :    the set of transformations that do not change the
> :    behavior of the programs but are beneficial by
> :    some metric. Clearly, the first clause has been
> :    violated.
> Now hold on, first you say that the variables were *not correctly* declared
> and then you demand the same behavior reqardless of optimization?

YES! If not the correct behavior, then at least the ability to automatically detect the fault.

> If anything optimization will stress the source code for *correct*
> declarations, if you depend on uninitialized data (auto), or
> writing/reading beyond the end of a record in a manner that you will
> most likely observe different behavior with and without optimization.
> And if the unoptimized code behaves "correctly" you were LUCKY!

Ah, but here's where your analogy is incorrect. In my opinion, the code that didn't declare the variable to be volatile is correctly written. It is the optimizer that is at fault, for making an assumption about the code that was incorrect! The assumption being that strict data-flow analysis techniques are applicable; they are not! How about another perspective?
Why is the assumption made that any variable not declared as volatile has no external references (aliases that can't be detected)? Why not leave it up to the programmer to mark which variables have no aliases (this is starting to sound like an argument for noalias in place of volatile, isn't it :-). With this approach, the programmer at least has to go out of his way to screw up.

-- Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
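Eddie's "mark the safe variables" idea already has a partial precedent in C's existing "register" storage class: taking the address of a register variable is illegal, so the compiler knows it cannot be aliased and is free to cache it. A sketch (the function is invented for illustration):

```c
/* "register" is an opt-in promise of no aliases, much like what
   Eddie proposes: the compiler may keep s and i in registers for
   the whole loop because &s and &i are forbidden. */
int dot(const int *a, const int *b, int n)
{
    register int s = 0;
    register int i;
    for (i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```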
bright@Data-IO.COM (Walter Bright) (04/14/88)
In article <1394@pt.cs.cmu.edu> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
> 2) When variables are not correctly declared as volatiles,
>    a program will exhibit different behavior between
>    the optimized and unoptimized versions. I have two
>    complaints about this. One being this sort of
>    behavior is contradictory to the over all philosophy
>    behind optimization. A optimization on a language is
>    the set of transformations that do not change the
>    behavior of the programs but are beneficial by
>    some metric. Clearly, the first clause has been
>    violated. Conclusion, it's inappropriate to
>    try to perform standard data flow analysis techinques
>    in a multi-threaded environment.

Programs that run successfully when unoptimized and fail when optimized suffer from one of the following:
(1) The optimizer has bugs.
(2) The program is incorrect, i.e. it is dependent on coincidental or undefined behavior.

Let's presume that the optimizer is bug-free. That leaves us with (2). For years I've seen programmers present me with the following reasoning:
1. Program compiles and runs perfectly with compiler A.
2. Program compiles but crashes with compiler B.
3. Therefore compiler B has bugs in it.

Some of the causes for these problems have been:
o Program stores something 1 byte past the end of a malloc'd array. (Some libraries leave a 'pad' at the end of malloc'd data.)
o Program depends on char being signed/unsigned.
o Program depends on auto variables being initialized to 0.
o Program stores through an uninitialized pointer, which winds up pointing to different locations when compiled with different compilers.
o Program depends on the layout of variables in storage.
o Program depends on being able to decrement a pointer below the value returned from malloc, and have it test as 'less' than the malloc'd pointer. (Some of the examples in the C++ book depend on this.)
Saying that the compiler shouldn't optimize, instead of exorcising the above problems, is fixing the symptom, not the problem. On most machines, big wins are realized by avoiding redundant loads/stores of variables into/from registers. These wins are on the order of 20-30%, in both speed and space. Many applications would have to go back to assembly if these were removed. Data-flow analysis is a powerful method for determining which loads/stores are redundant. Adding a bit to the type of a variable to prevent redundant load/store elimination is trivial (I implemented volatile in my optimizer).

> how does one go about debugging a program that
> works in the unoptimized version, pukes in the
> optimized version.

As far as debugging optimized code goes, there's no easy answer. What I do personally is look at a mixed source/assembly listing of the routine with the problem, after I've tried to narrow the problem down to the smallest sequence of lines possible.
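The redundant load/store elimination Walter describes, and the bit that disables it, can be shown in a few lines (an illustrative sketch, not his optimizer's code):

```c
/* The two stores to "plain" may legally be collapsed into one, since
   the first is dead under single-thread assumptions.  The two stores
   to "vol" may not: each volatile access is significant. */
int plain;
volatile int vol;

void touch(void)
{
    plain = 1;      /* dead store: the optimizer may delete it */
    plain = 2;
    vol = 1;        /* must be emitted */
    vol = 2;        /* must also be emitted */
}
```

This is the "adding a bit to the type" point: the only change data-flow analysis needs is to treat every volatile-qualified access as live.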
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (04/14/88)
> Programs that run successfully when unoptimized and fail when optimized
> suffer from one of the following:
> (1) The optimizer has bugs.
> (2) The program is incorrect, i.e. it is dependent on coincidental
>     or undefined behavior.

And for the case of volatiles, it's my opinion that the optimization technique is buggy. It assumes that data-flow analysis of the code can be used to conduct dead-code elimination, induction-variable analysis, migration of loop invariants, etc. It cannot when you have multiple threads of execution with shared variables, because in general any of the shared variables may change at any time. volatile is just a HACK to make the optimization technique work! volatile adds no additional expressive power to the language, nor does it add convenience; it only tells the optimizer where it is about to goof up so that it doesn't commit a blunder!

-- Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
levy@ttrdc.UUCP (Daniel R. Levy) (04/14/88)
In article <1412@pt.cs.cmu.edu>, edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
# [Optimization] assumes that data flow analysis of the code
# can be used to conduct dead code elimination, variable induction,
# migration of loop invariants, etc. It can not when you have multi-threads
# of execution with shared variable because in general any of the shared
# variables may changed at any time. volatile is just a HACK to make the
# optimization technique work!

What optimization technique, then, would you suggest in place of data-flow analysis with "volatile" to tell the optimizer that a variable that COULD change behind its back actually will do so? The only alternatives I can think of are showing the optimizer ALL your source code at once, which blows the concept of incremental compilation and modules all to hell (and what do you do about variables that can be modified by hardware?), or putting up with a very conservative, pessimistic "optimizer." If you have something better in mind than the conventional optimizer technology you are bashing, tell us about it, fella!

--
|------------Dan Levy------------| Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|        an Engihacker @         | <most AT&T machines>}!ttrdc!ttrda!levy
|    AT&T Data Systems Group     | Disclaimer? Huh? What disclaimer???
|--------Skokie, Illinois--------|
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (04/14/88)
> What optimization technique, then, would you suggest in the place of data
> flow analysis with "volatile" to tell the optimizer that a variable that
> COULD change behind its back, actually will do so? The only alternatives
> I can think of are showing the optimizer ALL your source code at once, which
> blows the concept of incremental compilation and modules all to hell (and
> what do you do about variables that can be modified by hardware?), or put
> up with a very conservative, pessimistic "optimizer." If you have something
> better in mind than the conventional optimizer technology you are bashing,
> tell us about it fella!

One could ask why do data-flow optimization at all when, in my opinion, the gains are minimal (in the range of a 20% to 30% speedup and reduction in code size) and the cost is high (incorrect code, or an obscure modification to the underlying language). I'd rather my compiler be pessimistic than wrong!

Another option is to consider developing an entirely new language with multi-tasking support. Design it in such a way that it is easy to optimize, and general enough to describe external events such as hardware as a special kind of task.

Another option is to have the user mark the variables that are targets for optimization, instead of taking the attitude that it's up to the user to mark those variables that are not to be optimized. Again, my attitude here is that at least then the user has to go out of his way to screw himself over.

My real fear about the addition of volatile is that a program compiled without data-flow optimization will run perfectly fine, and porting it to a machine with a data-flow optimizing compiler will break it. Now who is to blame? I think in those cases the optimizer will become useless, and the only thing you will accomplish is giving a bunch of people large headaches.

-- Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
peter@athena.mit.edu (Peter J Desnoyers) (04/15/88)
In article <1412@pt.cs.cmu.edu>, edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
># [Optimization] assumes that data flow analysis of the code
># can be used to conduct dead code elimination, variable induction,
># migration of loop invariants, etc. It can not when you have multi-threads
># of execution with shared variable because in general any of the shared
># variables may changed at any time. volatile is just a HACK to make the
># optimization technique work!

A compiler must make certain assumptions about the operation of a machine in order to compile a program at all: namely, that the processor functions in a defined manner when instructions are issued, and that memory acts in a defined fashion, i.e. when data is written to a location, it can be read back. (Of course, you may have to wait to issue the read instruction on some architectures.) If memory locations are allowed to change their values arbitrarily, then compilation (never mind optimization) is impossible. Volatile exists to allow code which accesses such 'magic' locations to work. Of course, naive compilation usually works, but that is no guarantee.

@begin(flame)
Finally, to all those people who say "so what, everyone programs on MSDOS or UNIX": the last company I worked for made real products. Modems, networks, and other things like that. A lot more money is spent on boxes (things that make noise when you kick them) than on software, and a lot of the value of these electronic boxes is software. Maybe these people just aren't on the net; the company I worked for isn't. But they are out there, and there are a lot of them. Much of the code for such embedded systems is now written in C instead of assembler, and it needs both optimization and 'volatile' to work. Some compilers give you /**OPTIMIZE OFF**/, others give you some other kludge, and others give you nothing. I've seen some gory code written to get around this problem, and it shouldn't be necessary.
@end(flame)

Peter Desnoyers peter@athena.mit.edu
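The embedded 'magic location' case Peter describes usually looks like polling a device status register. A sketch: on real hardware, "status" would point at a fixed memory-mapped address (for example (volatile unsigned char *)0x3F8; that address and all names here are hypothetical), but it points at an ordinary variable below so the fragment can run anywhere.

```c
static unsigned char fake_register = 0;           /* simulated device */
static volatile unsigned char *status = &fake_register;

/* The volatile qualifier guarantees a real load on every call; a
   compiler may not cache *status across reads, which is exactly
   what a busy-wait loop on real hardware depends on. */
int device_ready(void)
{
    return (*status & 0x01) != 0;
}

void hardware_sets_ready(void)    /* stands in for the device itself */
{
    fake_register = 0x01;
}
```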
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/16/88)
In article <1414@pt.cs.cmu.edu> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
> One could argument why do data flow at all when, in my opinion,
>the gains are minimum (in the range of 20% to 30% speed up of code
>and reduction of code size) and payment high (incorrect code or an
>obscure modification to the underlying language). I'd rather my compiler
>be pessimistic than wrong!

But the compiler isn't wrong, the code is! If data-flow optimization is allowed, then programs have to be written with the single-thread notion in mind. Since 99% of program execution is single-threaded, this is not an unreasonable assumption to make. Since this assumption is not always true, however, there needs to be a way to get around it. Volatile is the proposed way around it (although personally, I don't think the solution is good enough).

> Another option for you is to consider developing an entire new
>language with multi-tasking support. Design it in such way that it
>is easy to optimize, and is general enough to describe external events
>such as hardware as a special task.

Do you intend to rewrite all of Unix and all the C applications currently in use in your new language? And are you willing to train all the people who are going to use this new language? And what do we do during the three or four years that it is going to take to fully develop this language? Writing an entirely new language is not a viable solution for the short term.

> Another option for you is to have the user target the variables
>for optimization instead of taking the attitude that its up to the user
>to target those variable that are not be optimized. Again my attitude
>here is that at least the user has to go out of his way to screw
>himself over.

This may be true, but 99% of the variables in a program can legally be optimized in this fashion. This is like saying that all variables which are going to have their address taken should have the keyword 'noregister' in front of them.
All this does is lead to a very verbose language.

> My real fear about the addition of volatile is that a programs compiled
>without data flow optimization will run perfectly fine. Porting it
>to a machine with data flow optimizing compiler will break it. Now who
>is to blame? I think in those cases the optimizer will become useless
>and the only thing you will accomplish is giving a bunch of people
>large head aches

Then just declare all your variables as volatile! Just using a different compiler or a different implementation of the libraries can break code! The data-flow assumption is not unreasonable, since you ought to know which variables in your program can change by some mechanism other than normal execution anyway; if you don't know this, I don't see how it is possible to write a non-trivial program. Did you ever assume that you could save the result of strlen() for a string you never change? Without assuming single-thread operation, that is a bad assumption and should not be made (and we would all have very inefficient programs). Non-volatility always has to be assumed to some degree.

-- _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194 ' ) ) "The secret compartment of my ring I fill / / _ , __o ____ with an Underdog super-energy pill." / (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
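Nevin's strlen() point can be made concrete. A sketch (the function name is invented): caching the length is only valid under the single-thread assumption that nothing changes the string between the strlen call and the loop, and that is exactly the kind of assumption data-flow optimizers make about every non-volatile variable.

```c
#include <string.h>

int count_upper(const char *s)
{
    size_t len = strlen(s);   /* cached: assumes s cannot change
                                 behind our back mid-loop */
    size_t i;
    int n = 0;
    for (i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            n++;
    return n;
}
```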
henry@utzoo.uucp (Henry Spencer) (04/17/88)
> ... volatile is just a HACK to make the
> optimization technique work! volatile adds no additional expressive
> power to the language, neither is it adding convenience - it only
> tell the optimizer where it is about to goof up so the optimizer
> doesn't commit a blunder!

This whole debate is entirely a matter of definitions. If you believe, deep down in your heart, that C is basically a sequential language and that the compiler is entitled to assume so, then such optimizations are entirely legitimate. If you believe, deep down in your heart, that C ought to be parallel-capable with no effort on your part, then of course such optimizations are abominations that should not have to be explicitly disabled. There is no point in screaming at each other in attempts to change the other's mind; you are arguing from different underlying assumptions, and no amount of rational (or irrational) debate will change anyone's mind unless the assumptions change.

Historically, by the way, C has been a sequential language, and parallel folks have had to watch their step and make allowances. I see no reason to change this. In fact, I see reasons not to change it, since parallelism requires care anyway (not every multiprocessor machine has hardware-guaranteed cache consistency, for example) and most existing C code requires no parallelism and would prefer to run fast.

--
"Noalias must go. This is  | Henry Spencer @ U of Toronto Zoology
non-negotiable." --DMR     | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
daveb@geac.UUCP (David Collier-Brown) (04/19/88)
In article <1414@pt.cs.cmu.edu> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
|| Another option for you is to consider developing an entire new
|| language with multi-tasking support. Design it in such way that it
|| is easy to optimize, and is general enough to describe external events
|| such as hardware as a special task.

Ok, see below...

In article <4437@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
| Do you intend to rewrite all of Unix and all the C applications currently
| in use in your new language? And are you willing to train all the people
| who are going to use this new language? And what do we do during the three
| or four years that it is going to take to fully develop this language?
| Writing an entirely new language is not a viable solution for the short
| term.

Short term? The short term is **always** already over. The language has existed for a number of years, was announced in a special issue of Software, Practice and Experience, has a small but vocal programmer base, is used in several universities to teach operating systems, and is even available on an IBM Poisonous Computer... Per Brinch Hansen's "Edison":

    while (not done)
        outputBuffer = inputBuffer;
        cobegin
            1 fill(inputBuffer, stream1); and
            2 empty(outputBuffer, stream2);
        coend
    end while

--dave (minor syntactic errors admitted) c-b
--
David Collier-Brown. {mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc.,    | Computer Science loses its
350 Steelcase Road, Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279  | every 6 months.
jss@hector.UUCP (Jerry Schwarz) (04/20/88)
In order to understand "volatile" we must first be able to answer the question: what is the meaning of a C program? It isn't the mapping of input to output: I/O takes place only through library functions, and unhosted C environments don't even have such functions. It is clear that the meaning is somehow connected to the fetch/store history of global variables, and perhaps to the call history of some foreign (i.e. not written in C) functions, and, even worse, to external events that cause C functions to be called. But does this mean that the fetch/store history of all global variables is significant? And must the compiler assume that there is no relation between what the code stores at a location and what it fetches later?

Prior to the introduction of "volatile" there was no satisfactory answer to these questions. To answer "yes" constrains the code generator too much and results in slower compiled code. To answer "no" results in confusion about the meaning of a C program and in disputes about what an optimizer is allowed to do.

The introduction of "volatile" changes the situation. The meaning of a C program is now related to the fetch/store history of "volatile", and only "volatile", locations. Thus adding "volatile" to the language doesn't make a minor change in the meaning of C programs that use it; it makes a radical change in the concept "meaning of a C program". I still think that it's a good idea, but I no longer think (as I used to) that it is a simple one.

Jerry Schwarz Bell Labs, Murray Hill
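Jerry's definition can be illustrated in two lines (a sketch with invented names): under it, the meaning of read_reg() includes exactly two fetches of "reg", which an optimizer may not merge, while the two fetches of "x" are not part of the program's meaning and may legally be collapsed into one load.

```c
volatile int reg;   /* each access is part of the program's meaning */
int x;              /* accesses carry no meaning of their own */

int read_reg(void) { return reg + reg; }   /* two loads required  */
int read_x(void)   { return x + x; }       /* one load suffices   */
```

Both functions return the same arithmetic result; the difference Jerry is pointing at is entirely in which fetch/store histories the compiled code must preserve.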
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/24/88)
In article <10242@ulysses.homer.nj.att.com> jss@hector (Jerry Schwarz) writes: >The introduction of "volatile" changes the situation. ... >I still think that it's a good idea but I no longer think (as I used >to) that it is a simple one. My opinion is that "volatile" provides a hook for implementation-specific control over accesses to variables that might be botched by optimizations. However, there is no portable meaning for "volatile". Its main advantage lies in its ABSENCE, which gives the implementation license to optimize the hell out of it (so long as aliasing is properly maintained).