preston@asta.rice.edu (Preston Briggs) (06/14/91)
bs@frieda.mitre.org (Robert D. Silverman) writes: >: ibig = absdx .le. dmax > ^^^^^^^^^^^^^^^^^^^^^^^^ > >I would be very interested in seeing the assembler code that gets >emitted for this line of Fortran. How can this statement get executed >WITHOUT a branch?? The RS/6000 could do this, basically assigning ibig to one of the condition code registers. Actually, many machines could, though they don't always have enough CC registers to be useful. If this test can be issued enough in advance, then later tests of 'ibig' would be free. Don't know of any conditional assignments, though we might use a skip instruction to accomplish the purpose. Preston Briggs
bs@faron.mitre.org (Robert D. Silverman) (06/15/91)
In article <1991Jun14.163141.17728@rice.edu> preston@asta.rice.edu (Preston Briggs) writes: :bs@frieda.mitre.org (Robert D. Silverman) writes: : :>: ibig = absdx .le. dmax :> ^^^^^^^^^^^^^^^^^^^^^^^^ :> :>I would be very interested in seeing the assembler code that gets :>emitted for this line of Fortran. How can this statement get executed :>WITHOUT a branch?? : :The RS/6000 could do this, basically assigning ibig to one of the :condition code registers. Actually, many machines could, though they don't :always have enough CC registers to be useful. However, there is STILL a problem with the above expression. absdx .le. dmax is a BOOLEAN expression. It evaluates to either TRUE or FALSE. Hence, the above code simply says to store the value TRUE or FALSE into ibig according to whether absdx is less than dmax IT DOES NOT MEAN ibig = MINIMUM(absdx, dmax) For people to do this 'conditional assignement' it would have to be: ibig = if (absdx .le. dmax) then absdx else dmax -- Bob Silverman #include <std.disclaimer> Mitre Corporation, Bedford, MA 01730 "You can lead a horse's ass to knowledge, but you can't make him think"
preston@asta.rice.edu (Preston Briggs) (06/15/91)
bs@faron.mitre.org (Robert D. Silverman) writes: >:>: ibig = absdx .le. dmax >:> ^^^^^^^^^^^^^^^^^^^^^^^^ >:> >:>I would be very interested in seeing the assembler code that gets >:>emitted for this line of Fortran. How can this statement get executed >:>WITHOUT a branch?? And I wrote >:The RS/6000 could do this, basically assigning ibig to one of the >:condition code registers. And Silverman replies >However, there is STILL a problem with the above expression. > >absdx .le. dmax is a BOOLEAN expression. It evaluates to either TRUE >or FALSE. > >Hence, the above code simply says to store the value TRUE or FALSE >into ibig according to whether absdx is less than dmax > >IT DOES NOT MEAN > >ibig = MINIMUM(absdx, dmax) Right. I think it's time for some of the original context: andy@DEC-Lite.Stanford.EDU (Andy Freeman) wrote: > > isamax = 1 > dmax = abs(dx(1)) > do 30 i = 2, n > absdx = abs(dx(i)) > ibig = absdx .le. dmax > asm("<if ibig is true, replace isamax's value with i>") > asm("<if ibig is true, replace dmax's value with absdx>") > 30 continue Surely it's clear that ibig is intended to be a boolean, and hence may be (even "should be") allocated to a condition code register. I expect Silverman misread the original code, think that 'ibig' was to hold the min of absdx and dmax. The question he probably wanted to ask is: How do you implement asm("<if ibig is true, replace isamax's value with i>") ?? Unfortunately, I answered the question he asked rather than what he may have meant. Trying again, ... I suggest a skip instruction and a copy. The idea of a skip is that the instruction fetcher gets ahead of the executer. By the time the condition is resolved, the fetcher has already fetched well past the potentially skipped instruction; therefore, execution can continue, skipping or not, with minimal bubbling of the pipeline. Of course, this _is_ a form of conditional branch; but it's fast, executing in only 1 cycle, with an extra cycle for the copy (if not skipping). Tera's machine has (or will have) skip instructions. Probably others do too. Preston Briggs
slackey@bbn.com (Stan Lackey) (06/19/91)
In article <1991Jun14.163141.17728@rice.edu> preston@asta.rice.edu (Preston Briggs) writes: >bs@frieda.mitre.org (Robert D. Silverman) writes: > >>: ibig = absdx .le. dmax >> ^^^^^^^^^^^^^^^^^^^^^^^^ >> >>I would be very interested in seeing the assembler code that gets >>emitted for this line of Fortran. How can this statement get executed >>WITHOUT a branch?? > >Don't know of any conditional assignments, though we might >use a skip instruction to accomplish the purpose. I once worked on a machine that had instructions like x=min(y,z) in hardware. Also there were vector instructions that would elementwise compare vectors and build a vector of logicals which could be applied to a subsequent vector op. So a loop of the form where (a(i) .lt. b(i)) c(i) = d(i) + e(i) could be done with two vector instructions and no branches (other than loop control, of course). -Stan
tonys@pyra.co.uk (Tony Shaughnessy) (06/19/91)
In article <64739@bbn.BBN.COM> slackey@BBN.COM (Stan Lackey) writes: > >I once worked on a machine that had instructions like x=min(y,z) in hardware. >Also there were vector instructions that would elementwise compare vectors >and build a vector of logicals which could be applied to a subsequent vector >op. So a loop of the form > > where (a(i) .lt. b(i)) c(i) = d(i) + e(i) > >could be done with two vector instructions and no branches (other than >loop control, of course). > >-Stan How many cycles would it take to run each of these instructions? If it would take many cycles, then wouldn't this dominate any expense incurred in taking a branch, or are there other reasons for avoiding a branch? -- Tony Shaughnessy tonys@pyra.co.uk "Pedal away those tag-nut blues"
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (06/19/91)
>>>>> On 19 Jun 91 09:18:53 GMT, tonys@pyra.co.uk (Tony Shaughnessy) said: Tony> In article <64739@bbn.BBN.COM> slackey@BBN.COM (Stan Lackey) writes: > >I once worked on a machine that had instructions like x=min(y,z) in >hardware. Also there were vector instructions that would elementwise >compare vectors and build a vector of logicals which could be applied >to a subsequent vector op. So a loop of the form > > where (a(i) .lt. b(i)) c(i) = d(i) + e(i) > >could be done with two vector instructions and no branches (other than >loop control, of course). > >-Stan Tony> How many cycles would it take to run each of these instructions? Tony> If it would take many cycles, then wouldn't this dominate any Tony> expense incurred in taking a branch, or are there other reasons Tony> for avoiding a branch? On the Cyber 205/ETA-10, the second instruction requires one cycle per element, whether the result is stored or not. I think that the first instruction also requires one cycle per element. Then add two vector startups to the total time required. So the cycle count is something like (on the Cyber 205): WHERE block: 140 + 2*N original op: 70 + N (No IF tests -- compute at all locations) For long vectors, if a(i)<b(i) half of the time, then the WHERE block above will run at 1/4 the MFLOPS rate of the basic loop. If you get to re-use the control vector generated by the comparison, then you can amortize the overhead cost of the comparison over all of its uses. (The compiler was never smart enough to do this for you, so you had to create a bit vector manually and use it explicitly). Since scalar code tended to be a bit slow on the Cyber 205/ETA-10 machines, the point at which the scalar and vector codes ran at the same speed might be (for long vectors) where a(i) .lt. b(i) is true at least 15-20% of the time. -- John D. McCalpin mccalpin@perelandra.cms.udel.edu Assistant Professor mccalpin@brahms.udel.edu College of Marine Studies, U. Del. J.MCCALPIN/OMNET
slackey@bbn.com (Stan Lackey) (06/20/91)
>>>>>> On 19 Jun 91 09:18:53 GMT, tonys@pyra.co.uk (Tony Shaughnessy) said: > >>I once worked on a machine that had instructions like x=min(y,z) in >>hardware. Also there were vector instructions that would elementwise >>compare vectors and build a vector of logicals which could be applied >>to a subsequent vector op. So a loop of the form >> >> where (a(i) .lt. b(i)) c(i) = d(i) + e(i) >> >>could be done with two vector instructions and no branches (other than >>loop control, of course). > >Tony> How many cycles would it take to run each of these instructions? >Tony> If it would take many cycles, then wouldn't this dominate any >Tony> expense incurred in taking a branch, or are there other reasons >Tony> for avoiding a branch? The min/max took one or two depending upon which version of the CPU you had. The interesting thing about the scalar machine was it was heavily pipelined, and used branch prediction. Unfortunately, branches which depend upon floating point values are often wrong 50% of the time, causing the pipe to be "drained" and restarted. In this case the min/max took the same amount of time as the compare by itself, so the branch followed by the assignment would be added on. The vector case took one cycle per element plus overhead of a few cycles per instruction, so the total time was 2*(#elements + overhead). This machine did not have conditional scalar instructions as others mentioned in this thread, but that is a good solution to this class of problem: instructons are just fetched and dropped down a pipe; "masked" instructions are just no-op'ed; the scalar pipeline is not disturbed. -Stan