[comp.lang.misc] After aliassing -- parallelisation?

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/09/90)

In article <5085@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>By the way, as I've pointed out before, the _LACK_ of aliasing
>should be the default - the presence of possible aliasing should
>require the assertion.  It makes sense to make the most commonly
>desired configuration be the default.
 ^^^^^^^
	The key word here is "desired".  The programmer may "desire"
that the program run like a bat out of hell and that what it does is
what it seems to do.  These desires sometimes conflict.

	Anyway, what's worrying me now is the prospect of a replay
of this debate in respect of parallelisation.  The following C
fragment is erroneous:

	int i = 0;
	int f() { return i++; }
	...
	{ int j = (f(), 2) + (f(), 2);
	  printf ("%d\n", i);
	}

because the two invocations of "f" may be run in parallel, so that
the two occurrences of "i++" may interfere with each other.  But
"f" doesn't know it's going to be invoked in parallel, and the rest
of the program doesn't know that "f" is dangerous.

	In real life, this has rarely mattered so far.  If there
are two separate invocations of "f" on a serial processor, you
will get 2 for "i" no matter how you re-order the expression for
"j";  if (as in Fortran) you optimise out one of the calls, you
get 1, and the programmer who complains can be pointed to some
corner of the language definition.  But if the two calls of "f"
are handed over to different processors, you will usually get
2, but occasionally 1.  Insidious bugs strike again.

	Of course, we never write code like that.  But it is
easy to write code like "g() + h()" where "g" and "h" happen
(unknown to the writer) both to invoke "f".

	In languages with semaphores or similar, we can write
"f" like

	sema s = 1;
	f() { int k; down s; k = i++; up s; return k; }

but who is going to go to so much trouble [and why will we think of
it in the first place]?  What should users of C and Fortran do?
Pray?

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)

In article <1990Nov8.183058.25045@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
> 	int f() { return i++; }
> 	{ int j = (f(), 2) + (f(), 2);
> 	  printf ("%d\n", i);
  [ two f()s may run in parallel, so i's value is undefined ]

One thing that helps detect this situation is pure functions, which
depend only on their arguments and local variables initialized from
their arguments. (Clearly subroutines of pure functions must be pure.)
Once you add the ``pure'' keyword, the compiler can warn about parallel
non-pure functions. It also helps vectorization if you can assert that a
series of statements is pure.

  [ semaphores ]
> but who is going to go to so much trouble [and why will we think of
> it in the first place]?  What should users of C and Fortran do?
> Pray?

Be careful with static and global variables?

---Dan

jlg@lanl.gov (Jim Giles) (11/10/90)

From article <1990Nov8.183058.25045@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> [... lots of stuff about functions with side-effects in parallel ...]
> [... ending with a solution involving explicit semaphores        ...]
> 	sema s = 1;
> 	f() { int k; down s; k = i++; up s; return k; }
> 
> but who is going to go to so much trouble [and why will we think of
> it in the first place]?  What should users of C and Fortran do?
> Pray?

Hard to tell.  The problem, as stated, is beyond the language
definitions of either language (and most other popular languages as
well).  Certainly the semaphore idea will work (since semaphores are
presumed to be an extension available on all parallel machines).

My own opinion is that functions with side-effects should be
explicitly identified and the the compiler should be responsible for
them.  On a serial machine, the compiler would just be constrained
against reordering or eliminating calls of such functions.  On a
parallel machine, the compiler would be required, not just to
exclude concurrent operations on side-effect functions, but to
enforce a specific order on them.  This means that side-effects
would be expensive - but we knew that already.

J. Giles

jlg@lanl.gov (Jim Giles) (11/10/90)

From article <24674:Nov906:50:1890@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <1990Nov8.183058.25045@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
>> 	int f() { return i++; }
>> 	{ int j = (f(), 2) + (f(), 2);
>> 	  printf ("%d\n", i);
>   [ two f()s may run in parallel, so i's value is undefined ]
> 
> One thing that helps detect this situation is pure functions, which
> depend only on their arguments and local variables initialized from
> their arguments.  [...]

Finally an agreement!  Pretty much the same thing I said.  However,
I'm not too sure about your default - shouldn't 'pure' be the
default and 'impure' require the extra declaration? :-)  I'm only
half kidding here - I haven't got an opinion about which should be
the default, but it should not be selected by arbitrary means.  Do
an experiment to find the most common usage.  Do another to find
the most common expectations of programmers.  Then pick you default.

> [...]            (Clearly subroutines of pure functions must be pure.)

And how do you determine _this_ at compile time? :-)  Again, only
half kidding.  This is another thing that the loader would be very
good at enforcing.  I can _imagine_ arcane schemes involving header
files to do this earlier than load-time, but it again requires that
the headers and the routines themselves actually match.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/10/90)

In article <5499@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <24674:Nov906:50:1890@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > One thing that helps detect this situation is pure functions, which
> > depend only on their arguments and local variables initialized from
> > their arguments.  [...]
> Finally an agreement!  Pretty much the same thing I said.

Shall we get into an argument about who posted first? :-) (I did!)

> However,
> I'm not too sure about your default - shouldn't 'pure' be the
> default and 'impure' require the extra declaration? :-)

I'm pretty sure that impure is the right default, for various reasons.

> > [...]            (Clearly subroutines of pure functions must be pure.)
> And how do you determine _this_ at compile time? :-)  Again, only
> half kidding.  This is another thing that the loader would be very
> good at enforcing.  I can _imagine_ arcane schemes involving header
> files to do this earlier than load-time, but it again requires that
> the headers and the routines themselves actually match.

Jim, can you please shed your paranoia about .h files that don't reflect
c files?

There's absolutely nothing arcane about putting purity into prototypes
in header files, and enforcing these restrictions locally. It works
perfectly.

---Dan