tomc@cloud9.Stratus.COM (Tom Clark) (02/01/89)
In any computer system, the programmer expects operations in the source
code to be carried out in the order specified.  This is especially true
in a multi-processor system where multiple processes may share data.
For example, process A may write two locations with x' and y'.  Process
B, seeing y', assumes x' has been written and reads it.  This may be
extended causally by process B then writing a third location z' and
process C, seeing z', knowing that x' must have been written.  If for
some reason these writes are performed out of order, the model on which
the programmers have based their thinking is wrong.

Programmers have been used to such a model for quite a while.  By the
use of both explicit and implicit locks, they have exploited it to
arrive at well-behaved high-performance MP systems.  However, today
compilers are optimizing and rearranging the order of the operations
specified in the program (especially for RISC).  In addition, newer
high-performance processors will reorder operations within the chip to
improve performance by the use of data-driven techniques.  Also, newer
computers have more complex busses (multiple paths) between the CPUs
and memories.  The problem of cache coherence also adds complexity to
the problem.

In the case of explicit locks, the programmer can give hints to the
compiler and the hardware in order to force a certain sequence of
operations.  These hints may take the form of a 20 pound sledge at
times (e.g. forcing serial operation of the CPU or disabling compiler
optimization in some parts of code) but they will work.

The problem comes with implicit locks.  An implicit lock is the
dependence on the ordering of data references (both reads and writes).
These are often very hard to find by inspection, even if one has the
time to examine all parts of the source code.  Have people thought of
how to get older software to work on newer machines and compilers?
Obviously older applications and operating systems would like to take
advantage of the new technology, but finding all these problems before
putting the system into production is very difficult.  The class of
bugs introduced is such that testing cannot be depended upon to find
them, and they will likely occur when the system is very busy (cost of
failure is high).

I'd like to hear any suggestions for dealing with this problem.  Even
if you can handle the issue for your own code, how do you train a
customer to do it for their code?  What techniques (hardware and
software) can be applied?  Is the problem solvable?

 - Tom Clark
 - Stratus Computer, Inc., Marlboro, MA
 - Disclaimer: My opinions, nobody else's.
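A minimal C sketch of the implicit ordering assumption described above.
The names (shared_x, ready_flag, use) are hypothetical; the point is only
that nothing in the code tells the compiler or the hardware that the two
stores in process A must stay in program order:

    extern int shared_x;                /* the x' of the example           */
    extern int ready_flag;              /* the y' of the example           */
    extern void use(int);               /* stands in for B's consumer      */

    void process_A(void)
    {
        shared_x   = 42;                /* write x' ...                    */
        ready_flag = 1;                 /* ... then y'; the implicit lock  */
    }                                   /* is the assumed store order      */

    void process_B(void)
    {
        while (ready_flag == 0)
            ;                           /* spin until y' is observed       */
        use(shared_x);                  /* assumes x' is already visible;  */
    }                                   /* false if the stores reorder     */

If either the compiler or the memory system swaps the two stores in
process_A, process_B can read a stale shared_x even though its own code
is correct.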
billo@snacpac.npac.syr.edu (Bill O) (02/01/89)
In article <3492@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
>In any computer system, the programmer expects operations in the source
>code to be carried out in the order specified.
> ...
>However, today compilers are optimizing and rearranging the order of the
>operations specified in the program (especially for RISC).  In addition,
>newer high-performance processors will reorder operations within the
>chip to improve performance by the use of data-driven techniques.  Also,
>newer computers have more complex busses (multiple paths) between the
>CPUs and memories.  The problem of cache coherence also adds complexity
>to the problem.

But it is a problem that has been solved.  There exist many protocols
for efficient coherent caches that correctly implement atomic lock
operations.

> ...
>The problem comes with implicit locks.  An implicit lock is the
>dependence on the ordering of data references (both reads and writes).
>These are often very hard to find by inspection, even if one has the
>time to examine all parts of the source code.  Have people thought of
>how to get older software to work on newer machines and compilers?
>Obviously older applications and operating systems would like to take
>advantage of the new technology,

No optimizing or parallelizing compiler worth its salt will reorder
statements (either through parallelization or global optimization) if
such reordering would change the meaning of the program.  The
technology of optimizing compilers is about 30 years old, and is very
robust.  The technology of automatically parallelizing compilers is,
perhaps, 10 years old, but there are many examples of success.  A
couple that we know about at NPAC are the parallelizing Fortran
compilers for the Alliant FX/80 and the Encore Multimax.  Compilers of
this sort examine programs for loops that can be run in parallel on
separate processors, and insert synchronization points for any
data dependencies that are found.  The Alliant compiler also examines
loops for vectorizability, and will usually produce code that runs
"concurrent outer, vector inner", which means that an outer loop is
having its iterations performed in parallel, while the inner loop has
been "unwound" into vector operations.  Both the Alliant and Encore
compiler also perform "good old" global optimization, and both never
NEVER perform an optimization if it would change the meaning of a
program.

>
>I'd like to hear any suggestions for dealing with this problem.  Even if
>you can handle the issue for your own code, how do you train a customer
>to do it for their code?

I have a good deal of experience with the Alliant compiler, so I'll
talk about it.  The compiler, by its very nature, is conservative.  It
does not perform an optimization or parallelization if it thinks
there's any chance that it could change the order of interdependent
computations.  Of course, it sometimes is too conservative, and will
fail to optimize where it really could have.  In these cases it prints
an "informational message" which says what it thinks is the problem.
The programmer can then opt, if he/she feels the compiler has been too
conservative, to include a compiler directive in the code telling the
compiler to go ahead and optimize anyway.  These informational messages
are tremendously helpful to our users.
Being primarily a Fortran engine, the FX/80 is used principally by
"real" scientific Fortran programmers -- not computer scientists -- yet
we have had real success in training our users how to interpret the
messages, and when to try compiler directives.

>What techniques (hardware and software) can be
>applied?

Well, automatically parallelizing compilers, as mentioned, are a viable
option.  Perhaps the most aggressive compiler of this sort is the
Fortran compiler for the Multiflow Trace machines.  The Trace compilers
move code even when it *will* affect meaning -- specifically,
assumptions are made about the result of branch tests before the test
is performed.  These assumptions allow more parallelism to be
exploited, and the compiler can "get away with it" because it inserts
extra instructions to "undo the damage" in cases when the branch went a
different way than predicted.  Just as with the Alliant compiler, and
with any good optimizing compiler, the overall semantics of the program
are not changed.

This is not handwaving.  All of the techniques used by such compilers
are provably correct.  (Naturally, any compiler may have bugs, and an
optimizing compiler is no exception, but that is a problem of software
engineering.)  I should point out that Alliant is developing an
optimizing/vectorizing/parallelizing C compiler too, so the techniques
aren't limited to Fortran.  It just so happens that Fortran is where
the demand is, so that is why so much effort has been directed at the
development of Fortran compilers.

As for RISC, perhaps Tom is thinking about RISC-specific techniques
such as inserting a branch sooner in the code than one would for a
non-RISC machine.  But such branch instructions are always of the
deferred-branch sort, which guarantees that the semantics of the
program will be preserved.  Of course, RISC compilers also may perform
global optimization, but such techniques are well known and understood.

Finally, concerning explicit locks, an optimizing compiler will never
move an explicit lock operation, either because the locking operation
is built into the language, and so it knows better, or because the
locking operation is performed by an external subroutine, which the
compiler will likely make worst-case assumptions about, thus preserving
language semantics.

Is the problem solvable?  YES, but maybe I've just been rambling, and
haven't answered your question.

>
> - Tom Clark
> - Stratus Computer, Inc., Marlboro, MA
> - Disclaimer: My opinions, nobody else's.

Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse
University (billo@cmx.npac.syr.edu)
brooks@vette.llnl.gov (Eugene Brooks) (02/01/89)
In article <3492@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
>In any computer system, the programmer expects operations in the source
>code to be carried out in the order specified.  This is especially true
>in a multi-processor system where multiple processes may share data.

In ANSI C the tool to use to enforce reference order for reads and
writes is the keyword volatile, although I suppose that the purveyors
of this keyword did not have shared memory multiprocessors in mind when
they inserted it into the ANSI C definition.  If instruction reordering
can be performed invisibly by the system at lower levels than the
compiler, the effect of these keywords must trickle down into the
hardware properly.  We use the volatile keyword support in GNU C on our
shared memory multiprocessors routinely in codes which require certain
references to actually go out to main memory for synchronization
purposes.
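A sketch of the kind of use described above; the names are illustrative,
and the only assumption is that "done" lives in memory shared by two
processors:

    volatile int done = 0;          /* flag that must really hit memory  */

    void producer(void)
    {
        /* ... produce the result ... */
        done = 1;                   /* volatile store is actually issued */
    }

    void consumer(void)
    {
        while (!done)               /* volatile load on every iteration; */
            ;                       /* without volatile the compiler may */
    }                               /* hoist the load and spin forever   */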
hoefling@uicsrd.csrd.uiuc.edu (02/02/89)
Manufacturers of multi-processors provide program restructurers
(sometimes called "parallelizers" or "vectorizers").  All of the ones I
know about accept a sequential version of the program and produce a new
version of the program containing vector statements (if you have
vector-processing hardware) and parallel constructs.  There are many
such restructurers in use in universities, and I know of two commercial
restructurers (KAP and VAST).  Typically, these restructurers are used
with Fortran, but some are beginning to be used for C and Lisp.

Hope this answers your question.

Jay Hoeflinger
Center for Supercomputing Research and Development
University of Illinois at Urbana-Champaign
UUCP:     {ihnp4,uunet,convex}!uiucuxc!uicsrd!hoefling
ARPANET:  hoefling%uicsrd@uxc.cso.uiuc.edu
CSNET:    hoefling%uicsrd@uiuc.csnet
BITNET:   hoefling@uicsrd.csrd.uiuc.edu
tomc@cloud9.Stratus.COM (Tom Clark) (02/02/89)
In article <19635@lll-winken.LLNL.GOV>, brooks@vette.llnl.gov (Eugene Brooks) writes:
> In article <3492@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
> >In any computer system, the programmer expects operations in the source
> >code to be carried out in the order specified.  This is especially true
> >in a multi-processor system where multiple processes may share data.
>
> In ANSI C the tool to use to enforce reference order for reads and writes
> is the keyword volatile, although I suppose that the purveyors of this keyword
> did not have shared memory multiprocessors in mind when they inserted it into
> the ANSI C definition.

Unfortunately volatile does not do it.  Volatile does indeed force a
write to memory instead of holding an intermediate result in a
register, but it says nothing about ordering of instructions.  Such a
construct does not exist in most languages, except for the disabling of
optimization.  Sigh.

Tom Clark, Stratus Computer
cik@l.cc.purdue.edu (Herman Rubin) (02/02/89)
In article <3492@cloud9.Stratus.COM>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
> In any computer system, the programmer expects operations in the source
> code to be carried out in the order specified.

The programmer expects the results to be as if the operations were
carried out in the order specified.  I see no reason why it is
necessary that the actual computations be done in that order.  There
are architectures where that is not the case now.

< This is especially true
< in a multi-processor system where multiple processes may share data.
< For example, process A may write two locations with x' and y'.  Process
< B, seeing y', assumes x' has been written and reads it.  This may be
< extended causally by process B then writing a third location z' and
< process C, seeing z', knowing that x' must have been written.  If for
< some reason these writes are performed out of order, the model on which
< the programmers have based their thinking is wrong.  Programmers have
< been used to such a model for quite a while.  By the use of both
< explicit and implicit locks, they have exploited it to arrive at
< well-behaved high-performance MP systems.

One way of blocking this is to require that a write to a location,
including a register, put a block on that location.  Also, a pending
read from a location must put a block on a write to that location.  For
scalar operations, this can be handled in hardware at some cost.  The
only cheap way is to have a write block all reads and writes, and a
pending read block all writes, but this will slow down code.

> However, today compilers are optimizing and rearranging the order of the
> operations specified in the program (especially for RISC).  In addition,
> newer high-performance processors will reorder operations within the
> chip to improve performance by the use of data-driven techniques.  Also,
> newer computers have more complex busses (multiple paths) between the
> CPUs and memories.  The problem of cache coherence also adds complexity
> to the problem.
>
> In the case of explicit locks, the programmer can give hints to the
> compiler and the hardware in order to force a certain sequence of
> operations.  These hints may take the form of a 20 pound sledge at times
> (e.g. forcing serial operation of the CPU or disabling compiler
> optimization in some parts of code) but they will work.
>
> The problem comes with implicit locks.  An implicit lock is the
> dependence on the ordering of data references (both reads and writes).
> These are often very hard to find by inspection, even if one has the
> time to examine all parts of the source code.  Have people thought of
> how to get older software to work on newer machines and compilers?
> Obviously older applications and operating systems would like to take
> advantage of the new technology, but finding all these problems before
> putting the system into production is very difficult.  The class of bugs
> introduced is such that testing cannot be depended upon to find them,
> and they will likely occur when the system is very busy (cost of failure
> is high).
>
> I'd like to hear any suggestions for dealing with this problem.  Even if
> you can handle the issue for your own code, how do you train a customer
> to do it for their code?  What techniques (hardware and software) can be
> applied?  Is the problem solvable?

Here is a simple example of this problem, which requires knowledge of
the machine to handle.  It is an actual situation, not invented for
this posting.
A good method for generating pseudo-random numbers is to form
x[n] = x[n-j] OP x[n-k], where there are several choices for OP
(exclusive or is always one of them) depending on the machine.  Now it
is necessary that the values of x[n-j] and x[n-k], for sufficiently
large n, are the values computed by the recurrence.  The only
completely safe method I can think of is to make sure that the number
of new items generated at one time is at most min(j,k).  Whether more
can be done depends on machine parameters.  On the vector register
machines in my ken, this is no problem, as the vector lengths are
fairly small, although it can rule out small values of j and k.  But on
vector stream machines, if one knows, for example, that an item M units
back in the array must have been stored before reading, all that is
needed is that j and k are at least M.

Thus it takes interaction between the programmer and the compiler, and
some machine dependence.  It is inefficient not to allow the programmer
to override the safety features, but also dangerous.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
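A small C sketch of the safe batching just described, generating at most
min(j,k) values per pass so every operand was stored by an earlier pass.
The lags and the choice of exclusive or for OP are only illustrative, and
the table is assumed to be seeded elsewhere:

    #define J  24
    #define K  55
    #define BATCH  (J < K ? J : K)    /* at most min(j,k) new items       */

    static unsigned long x[K];        /* holds the last K values          */
    static int p = 0;                 /* position of x[n-k]               */

    void next_batch(unsigned long out[BATCH])
    {
        int i;
        for (i = 0; i < BATCH; i++) {
            /* x[n] = x[n-j] OP x[n-k], with OP = exclusive or here */
            unsigned long v = x[(p + K - J) % K] ^ x[p];
            x[p] = v;                 /* overwrite the slot of x[n-k]     */
            out[i] = v;
            p = (p + 1) % K;
        }
    }

Because the batch is no longer than min(j,k), no iteration reads a value
produced by another iteration of the same batch, so the loop can be
vectorized or reordered without picking up a stale element.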
brooks@vette.llnl.gov (Eugene Brooks) (02/03/89)
In article <3507@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
>Unfortunately volatile does not do it.  Volatile does indeed force a write
>to memory instead of holding an intermediate result in a register, but it
>says nothing about ordering of instructions.  Such a construct does not
>exist in most languages, except for the disabling of optimization.  Sigh.

Reordering of instructions is fine as long as the meaning of the code
is not changed.  The volatile keyword is needed to take care of the
added dimension that memory cells may change spontaneously, or must be
changed at the point of the write in order to effect communication to
other devices which the compiler is not managing.  The volatile
references must be performed in proper order, and I believe that ANSI C
specifies this, but any reordering of instructions that does not affect
the values written to volatile memory cells poses no problem.
cquenel@polyslo.CalPoly.EDU (88 more school days) (02/03/89)
brooks@vette.llnl.gov (Eugene Brooks) writes:
> In ANSI C the tool to use to enforce reference order for reads and writes
> is the keyword volatile, although I suppose that the purveyors of this keyword
> did not have shared memory multiprocessors in mind when they inserted it into
> the ANSI C definition.

tomc@cloud9.Stratus.COM (Tom Clark) writes:
>Unfortunately volatile does not do it.  Volatile does indeed force a write
>to memory instead of holding an intermediate result in a register, but it
>says nothing about ordering of instructions.  Such a construct does not
>exist in most languages, except for the disabling of optimization.  Sigh.

Wait a minute.  What exactly is "ordering of instructions"?  You talk
about it as if there is one true "order" for instructions generated by
the compiler, and optimizations *change* that order.  The compiler
GENERATES the order.  Probably no two compilers will generate the SAME
code sequence for a significantly large program.  So which one is
right?  *Neither* obviously, right?  It depends.

So, what you're trying to say is this: I want the compiler to generate
code within certain constraints.  I want it to generate code that is
what I expect, so that I can second-guess the code and do things with
my code that are directly supported by the language.

This is not a goal that I can frown on easily.  C's notable lack of
support for multi-processor synchronization makes it very tempting to
try to wangle a way to *reliably* synchronize processes on different
processors; however, this is not a part of the language.  Unless you
can define exactly WHAT your constraints are, you can't expect a
compiler implementor to generate his code by any other rules than that
of supporting the definition of the language, and NOT "what the
language has always done before."  The reason "it's always worked
before" is a Marketing reason for doing something, not a
Software/Computer Engineering reason for doing it.

--chris
--
@---@ ----------------------------------------------------------------- @---@
\. ./ |Chris Quenelle (The First Lab Rat)  cquenel@polyslo.calpoly.edu | \. ./
 \ /  | YOU can do away with mode-less editors in YOUR life time!!!    |  \ /
==o== ----------------------------------------------------------------- ==o==
khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) (02/03/89)
In article <1066@cmx.npac.syr.edu> billo@snacpac.npac.syr.edu.UUCP (Bill O'Farrell) writes:
>means that an outer loop is having its iterations performed in parallel, while
>the inner loop has been "unwound" into vector operations.  Both the Alliant
>and Encore compiler also perform "good old" global optimization, and both
>never NEVER perform an optimization if it would change the meaning of a program.

This is not quite true.  Most very highly optimizing compilers have
(often undocumented) unsafe modes which intentionally change the
semantics of the code.  On a CDC system with the U level of
optimization (0,1,2,3,U) it produced correct results about 70% of the
time, which made it a very useful tool.

In addition, even modest levels of optimization can cause subtle
changes in numerical results.  In IEEE 64-bit mode this is seldom seen,
but in 32 bits it can often be observed.

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus
schow@bnr-public.uucp (Stanley Chow) (02/03/89)
The problem that Tom posted refers to compilers doing optimizations
that are correct for a single process but become wrong under
multi-processing.  Specifically, if x and y are declared in shared
memory (but not volatile), then it is legal for the compiler to reorder
writes to them.  All semantics can be preserved (if the compiler did it
right).  The problem comes from a second process looking at y and
making inferences about the value of x.  Reordered writes become a
problem.

Note that using 'volatile' is one possible solution but is incomplete
in two ways:

 - if all my data is shared and I put volatile on everything, it can be
   a major performance hit.

 - the ANSI definition of volatile is intended for dealing with I/O
   registers and such (at least what little I know of it).  There can
   be circumstances when reordering of shared memory and my own process
   memory can cause errors.  (I am not saying this is good or
   desirable, just that there are systems with this attribute.)

The more aggressive optimizations mentioned, where program semantics
can be violated momentarily (due to wrong branch prediction and/or
vector performance, but fixed later), can be very nasty if an interrupt
comes at the wrong time and the interrupt handler (or scheduler, or
trap handler) was written knowing a certain ordering of the memory
references.

Preemptive flame retardant: I am not defending this style of coding.
But the reality is that there exist large software bases that were
written relying on this ordering (perhaps because the software was
written before anybody knew how to do multi-processing).

Stanley Chow
..!utgpu!bnr-vpa!schow%bnr-public
(613) 763-2831

RISCy: We already know the ultimate RISC (the single instruction CPU)
is a low performer.  Why does anyone want to be RISCy?
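A sketch of the incompleteness described above, with hypothetical names
(compute, consume, shared_x, x_ready).  Only the flag is volatile, so the
plain store to the shared datum is still free to move past the flag store,
and the hardware may reorder the two writes even if the compiler does not:

    extern int  compute(void);        /* hypothetical producer of x       */
    extern void consume(int);         /* hypothetical user of x           */

    int          shared_x;            /* shared, but not volatile         */
    volatile int x_ready = 0;         /* flag the second process tests    */

    void writer(void)
    {
        shared_x = compute();         /* ordinary store: may legally be   */
        x_ready  = 1;                 /* sunk past this volatile store    */
    }

    void reader(void)
    {
        while (!x_ready)
            ;
        /* the inference "x_ready implies shared_x is written" can fail */
        consume(shared_x);
    }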
bjornl@octopus.tds.kth.se (Bj|rn Lisper) (02/03/89)
In article <88088@sun.uucp> khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) writes:
>In article <1066@cmx.npac.syr.edu> billo@snacpac.npac.syr.edu.UUCP (Bill
>O'Farrell) writes:
>> ..... Both the Alliant
>>and Encore compiler also perform "good old" global optimization, and both
>>never NEVER perform an optimization if it would change the meaning of a
>>program.
>This is not quite true.  Most very highly optimizing compilers have
>(often undocumented) unsafe modes which intentionally change the
>semantics of the code. ....

The problem is, of course, the difference between the semantics of the
programming language and what the programmer considers to be the
meaning.  In scientific/numerical applications I think the intended
meaning often is a function (e.g. the solution vector to a system of
linear equations, which is a function of the system matrix and the
right-hand side vector).  The semantics of an imperative program (read:
Fortran) is, however, based on states and state transitions.  Since the
state is comprised of the values of all the variables in a program and
the value of the program counter, this means that an optimizing or
parallelizing compiler really must take ALL variables into account,
including those just intended as temporary work areas, if it is not to
violate the semantics.  Thus, a compiler for an imperative language can
either be very conservative and abide by the semantics of the
programming language, or try to guess the real intention of the
programmer in order to allow itself a richer set of transformations.

>In addition, even modest levels of optimization can cause subtle
>changes in numerical results.  In IEEE 64-bit mode this is seldom seen,
>but in 32 bits it can often be observed.

Yes, this is another problem.  For instance, a typical parallelizing
operation is to use associativity to change a linear, inherently serial
n-chain of binary operations into a balanced tree with O(log n) height.
Sums can be treated this way PROVIDED THAT ADDITION IS ASSOCIATIVE.
Floating point addition is, however, NOT associative.  Changing the
order of summation will give slightly different results, because
round-off and cancellation errors depend on the magnitude of the added
numbers.  One can, in fact, easily construct examples where reordering
of summations gives disastrous results from a numerical point of view.
Thus, a compiler should not use algebraic laws for real numbers to
transform floating-point expressions without the consent of the
programmer.

Bjorn Lisper
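A tiny C illustration of the non-associativity point, in IEEE single
precision; the constants are chosen only so that the two groupings of the
same three numbers round differently:

    #include <stdio.h>

    int main(void)
    {
        float big = 1.0e8f, small = 3.0f;     /* ulp of big is 8.0 here   */

        float left  = (big + small) + small;  /* each 3 is rounded away   */
        float right =  big + (small + small); /* 6 rounds up to the next  */
                                              /* representable value      */
        printf("%.1f vs %.1f\n", left, right);
        /* prints 100000000.0 vs 100000008.0 on an IEEE machine */
        return 0;
    }

A tree summation produced by a parallelizer effectively picks a different
grouping than the source, which is why such transformations need the
programmer's consent.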
eugene@eos.UUCP (Eugene Miya) (02/04/89)
Here we go again!  Reinventing the HEP.
Here we go again!  Reinventing the ILLIAC.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Post follow ups.  Contribute to network noise."
cik@l.cc.purdue.edu (Herman Rubin) (02/04/89)
In article <7650@polyslo.CalPoly.EDU>, cquenel@polyslo.CalPoly.EDU (88 more school days) writes:

			......................

> Wait a minute.  What exactly is "ordering of instructions"?
> You talk about it as if there is one true "order" for instructions
> generated by the compiler, and optimizations *change* that order.
>
> The compiler GENERATES the order.  Probably no two compilers will
> generate the SAME code sequence for a significantly large program.
> So which one is right?  *Neither* obviously, right?  It depends.
>
> So, what you're trying to say is this: I want the compiler
> to generate code within certain constraints.  I want it to
> generate code that is what I expect, so that I can second-guess
> the code and do things with my code that are directly supported
> by the language.
>
> This is not a goal that I can frown on easily.

Nor I.  But the problem is more difficult than it seems.  A "bug" was
found in the CDC6600 architecture.  Consider the following machine
instructions, which I am writing in pseudocode, as I do not expect the
readers to understand COMPASS.  Xi denotes X register i.

	X7 = X0/X1
	memloc = X7
	if (X3 < 0) goto LL
	...
LL	memloc = X6
	...

But the timing is such that if X3 < 0, then the "second" store is
completed before the "first" store begins.  The original architecture
did not have a pending store block the issuance of a store instruction,
and this was changed to prevent this from happening.  Of course, this
slows things down when this type of conflict does not occur.

THIS one could be caught by a compiler.  But there are others which can
not.  The programmer could tell if different symbolic locations are
used, but not in all cases.  Suppose the first memloc was array1[B3]
and the second array2[B4].  It is much more difficult now.  One could
have hardware blocks, which requires the CPU to keep track of all
pending load and store addresses.

The upshot is that one can put severe general restrictions on the
compiler and assembler, or one can try to get the programmer into the
act at the machine level.  I strongly suggest the latter.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
brooks@vette.llnl.gov (Eugene Brooks) (02/05/89)
In article <1120@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) notes
that failure of the hardware to enforce the ordering of side effects to
memory locations can be a problem. We in fact ran into this problem with the
Cerberus multiprocessor simulator, in which accesses to shared memory are
fully pipelined through a packet-switched network.  Because conflicts in
the network could cause the actual stores to the memory banks to occur
out of order, the set of a flag guarding access to data being written
could beat the actual writes to the data.  In Cerberus, we dealt with
this problem by requiring return receipts for all memory accesses.  A counter in each
processor kept track of the number of pending memory operations issued by
the processor, and the instruction "sync" stalled further instruction
issue until all pending memory operations were complete. The user explicitly
used "sync" in his algorithm before a flag was set indicating written data
was ready.  By making "sync" a separate instruction, one could fully
pipeline the writes of an aggregate data object.  We actually found that
the use of "sync" was required in this sort of data transaction for a
customized Gauss elimination algorithm; without the "sync", the set of a
flag would indeed beat the writes of the data it was protecting in rare
cases and caused incorrect execution of the algorithm.
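A C sketch of the pattern just described, with sync() standing in as a
hypothetical intrinsic for the Cerberus "sync" instruction and the other
names invented for illustration:

    extern void sync(void);            /* stalls until all pending memory */
                                       /* operations have return receipts */
    extern double        row[];        /* aggregate data being handed off */
    extern volatile int  row_ready;    /* flag another processor polls    */

    void publish_row(const double *src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            row[i] = src[i];           /* writes stay fully pipelined ... */
        sync();                        /* ... but are drained here ...    */
        row_ready = 1;                 /* ... so the flag cannot beat them */
    }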
brooks@maddog.llnl.gov or brooks@maddog.uucp
beyer@houxs.ATT.COM (J.BEYER) (02/06/89)
In article <3507@cloud9.Stratus.COM>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
> > In ANSI C the tool to use to enforce reference order for reads and writes
> > is the keyword volatile, although I suppose that the purveyors of this keyword
> > did not have shared memory multiprocessors in mind when they inserted it into
> > the ANSI C definition.
>
> Unfortunately volatile does not do it.  Volatile does indeed force a write
> to memory instead of holding an intermediate result in a register, but it
> says nothing about ordering of instructions.  Such a construct does not
> exist in most languages, except for the disabling of optimization.  Sigh.

The trick of disabling optimization to obtain a particular style of
code at the assembly-language level cannot be relied upon, because
anything an optimizer can do, a compiler can do as well.  One need only
imagine an ideal compiler that produces perfect code without an
optimizer.  With such a compiler, the concept of turning off
optimization does not exist.

It seems to me that if the higher level language in use does not
support the semantics desired, you must either rely on the
characteristics of the compiler in use (and risk portability), change
to a higher level language that does support the desired semantics, or
write in a lower level language.  But beware, some assemblers have
optimizers in them, too.
--
Jean-David Beyer
A.T.&T., Holmdel, New Jersey, 07733
houxs!beyer
markhall@pyramid.pyramid.com (Mark Hall) (02/08/89)
In article <3507...>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
> > In ANSI C the tool to use to enforce reference order for reads and writes
> > is the keyword volatile,
>
> Unfortunately volatile does not do it.  Volatile does indeed force a write
> to memory instead of holding an intermediate result in a register, but it
> says nothing about ordering of instructions.

I must be wrong about this (cuz no one else has posted it yet) but are
you SURE ANSI volatile ``says nothing about the ordering of
instructions''?  Well, what about the part that says ``the value of the
volatile object shall agree with that prescribed by the abstract
machine at all sequence points''?  Dang, I thought it meant that you
couldn't move computations involving volatile objects across sequence
points, which to me means you have constraints on ordering.  As for
operator application order, for the expression a <op1> b <op2> c, the
evaluation must proceed `as if' op1 were applied first, and then op2,
etc.  I don't get it (I'm sure someone will set me straight, though).

(I love getting into these discussions a month late).

-Mark Hall
(smart mailer): markhall@pyramid.pyramid.com
(uucp paths):   {ames|decwrl|sun|seismo}!pyramid!markhall
mash@mips.COM (John Mashey) (02/15/89)
In article <58218@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>In article <3507...>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
>> > In ANSI C the tool to use to enforce reference order for reads and writes
>> > is the keyword volatile,
>>
>> Unfortunately volatile does not do it.  Volatile does indeed force a write
>> to memory instead of holding an intermediate result in a register, but it
>> says nothing about ordering of instructions.
>
>I must be wrong about this (cuz no one else has posted it yet)
>but are you SURE ANSI volatile ``says nothing about the ordering
>of instructions''?  Well, what about the part that says ``the value
>of the volatile object shall agree with that prescribed by the
>abstract machine at all sequence points''?  Dang, I thought it
>meant that you couldn't move computations involving volatile
>objects across sequence points, which to me means you have
>constraints on ordering.  As for operator application order, for
>the expression a <op1> b <op2> c, the evaluation must proceed
>`as if' op1 were applied first, and then op2, etc.

When you do global optimization on the UNIX kernel, including device
drivers, "volatile" better work "right", regardless of what the
standard says or doesn't say.  Specifically, if you declare something
volatile:

a) Take the completely unoptimized version of the code, and consider
   the sequence of loads and stores from/to volatile variables.

b) The optimized version of the code had better do exactly the same
   number, in the same sequence, of such loads and stores.

If it doesn't work exactly this way, life will be hellish for the
kernel folks, especially those writing device drivers.

BTW, doing this right does not necessarily mesh well with classical
optimization technology; it's very hard to get right and still have
aggressive optimization; bugs are very subtle.  [We used to do "binary
search" on the kernel, i.e., optimize half of it; if it still worked,
optimize half of the rest, etc., until the offending module was found.
This was not fun.]
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
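A driver-flavored C sketch of the criterion above; the register address
and bit layout are invented for illustration, the point being only that
every access through the volatile pointer must survive optimization in
both number and order:

    /* Hypothetical device: a single control/status register. */
    #define CSR  ((volatile unsigned long *)0xBF800000)

    unsigned long start_and_wait(void)
    {
        unsigned long status = *CSR;    /* load: read current status      */
        *CSR = status | 0x1;            /* store: set the GO bit          */
        while ((*CSR & 0x80) == 0)      /* one load per poll of DONE bit  */
            ;
        return *CSR;                    /* final load: completion status  */
    }

Unoptimized and globally optimized builds must emit this same sequence of
loads and stores to the CSR; collapsing the polls or caching the status in
a register is precisely the kind of subtle bug that forces the "binary
search" described above.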