[comp.sys.transputer] Re: producers and consumers

HALLAM@vax1.physics.oxford.ac.uk ("Phillip M. Hallam-Baker") (01/30/91)

Dear Net,

>>The following was extracted from the article:
Jaap Hollenberg
"People are getting more comfortable with parallelism" - an interview
....
>> There is a tremendous amount of 
>management function placed on him. In the parallel Fortran paradigm the user
>has very little management requirement, he can write an algorithm that - apart
>from the fact that he must use the number of processors - works very much like
>the programming paradigm he is used to.  ... 

Does anybody have any comments regarding this article ? Particularly, can we say
that occam is a producer-initiated language  and Fortran is a 
consumer-initiated language ?

..... yes I have a comment 

	1	I am unable to understand what it means
	2	I don't agree with the parts I can make out.

The premise of the article appears to be that communication-based languages
such as occam are less efficient than parallel Fortran. The justification for
this appears to be that interprocess communication involves a lot of copying
of data.

The first point I would make is that `efficiency' is very much less important
to me than coding clarity. I am quite prepared to sacrifice half my CPU time
if it makes the code easier to maintain. The amount of code I see which has
turned into useless junk because of (frequently misguided) attempts at
`optimisation' is quite depressing. If the code doesn't run fast enough I
would prefer to write a tool to perform the optimisation of the code than
start hacking around in `parallel Fortran'. The clarity of the CSP model and
the simplicity of coding it allows are for me its most significant
advantages.

My second point is that I do not believe that the CSP paradigm is inefficient.
Most communications between processes involve small packets of data. Time taken
to copy the data is then less significant than the time taken to set up the
transfer. I agree that passing data around a single processor via a common
block may be faster for large amounts of data but that trick is not
particularly effective between different processors (even with shared
memory). In any case it is a technique allowed in the inmos occam compiler -
just turn off the usage checking. Provided that your synchronisation mechanism
prevents read/write and write/write ambiguities it should work. If you need
that particular optimisation it is available; however, a better alternative
would probably be to reduce the amount of virtual parallelism and rewrite
a parallel section as a single sequential one. If your problem allows effective
use of parallel common blocks in the manner described it almost certainly
isn't taking advantage of virtual parallelism.

Thirdly what is meant by 
>>>>> In the
>parallel Fortran area (anybody's standard, IBM, PCF) it is a receiver-initiated
>transmission.You can assume that the data is somehow left behind by name by the
>producers (we typically associate that with a storage cell in a shared-memory)
>and the consumer then names the data and retrieves it under its own-initiative
<<<<<
?

Does this mean that in this parallel Fortran there is no synchronisation
provided and therefore there is no need to worry about it?

Fourthly, Occam and CSP are neither receiver nor originator oriented. The whole
point is that the communication is symmetric. The only exception is the occam
ALT, which only allows input guards. However it is simple enough to extend the
model to output guards - the only problem being that you start to lose
efficiency since you have to provide a global synchronisation between all
pairs of processes which might possibly be engaging either directly or
indirectly in a given guarded communication.
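
To make the point about symmetry and guards concrete, here is a minimal
sketch in Go, a language that is not discussed in this thread but whose
unbuffered channels and `select` follow the CSP style. The function name
`offer` and the values used are made up for illustration. Unlike an occam
ALT, a Go `select` accepts both input and output guards, so whichever
partner turns up first decides which branch runs.

```go
package main

import "fmt"

// offer runs one select that holds an output guard (send on out) and an
// input guard (receive from in) at the same time, and reports which fired.
func offer(in <-chan int, out chan<- int, v int) string {
	select {
	case out <- v: // output guard: not expressible in an occam ALT
		return "sent"
	case x := <-in: // input guard: the only kind an occam ALT allows
		return fmt.Sprintf("received %d", x)
	}
}

func main() {
	// Unbuffered channels: a communication happens only when both the
	// sender and the receiver are ready, as in CSP.
	in := make(chan int)
	out := make(chan int)

	// Only a receiver is waiting on out, so the output guard must fire.
	done := make(chan int)
	go func() { done <- <-out }()
	fmt.Println(offer(in, out, 7), <-done)
}
```

Neither side of the channel is privileged here: the send completes only
because a receiver was ready, and the reverse would be equally true.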

	Phillip M. Hallam-Baker
	ZEUS Group
	Oxford Dept nuclear Physics.

zenith@ensmp.fr (02/06/91)

In article <1461.9101292134@prg.oxford.ac.uk> HALLAM@vax1.physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes:

    from the article:
    Jaap Hollenberg

   "People are getting more comfortable with parallelism" - an interview
   ....
   > There is a tremendous amount of management function placed on him.
   > In the parallel Fortran paradigm the user has very little
   > management requirement, he can write an algorithm that - apart from
   > the fact that he must use the number of processors - works very much
   > like the programming paradigm he is used to.  ...

   Does anybody have any comments regarding this article ? Particularly,
   can we say
   that occam is a producer-initiated language  and Fortran is a 
   consumer-initiated language ?

   ..... yes I have a comment 
      1  I am unable to understand what it means
      2  I don't agree with the parts I can make out.

   The premise of the article appears to be that communication-based
   languages such as occam are less efficient than parallel Fortran. The
   justification for this appears to be that interprocess communication
   involves a lot of copying of data.

   The first point I would make is that `efficiency' is very much less
   important to me than coding clarity. I am quite prepared to sacrifice
   half my CPU time if it makes the code easier to maintain. The amount
   of code I see which has turned into useless junk because of
   (frequently misguided) attempts at `optimisation' is quite
   depressing. If the code doesn't run fast enough I would prefer to
   write a tool to perform the optimisation of the code than start
   hacking around in `parallel Fortran'. The clarity of the CSP model
   and the simplicity of coding it allows are for me its most
   significant advantages.

The premise of the article is probably correct.

I agree that coding clarity is important, but (and this is especially
the case on the transputer since it is so slow) efficiency is important.
Why is this? Well it derives from the fact that anyone writing parallel
code today is almost certainly doing so because they are looking for
high performance.

   My second point is that I do not believe that the CSP paradigm is
   inefficient.  Most communications between processes involve small
   packets of data. Time taken to copy the data is then less significant
   than the time taken to set up the transfer.

Well, I would like to see a study done of the claim in your second
sentence here - but it's still going to be a few years before parallel
programming has become so commonplace that a study of programming
practices can produce meaningful results. Your comments are however
counter to my experience with Occam. Many times I received desperate
pleas from INMOS marketing asking me to talk to a customer because the
parallel code running on their shiny new box of 16 transputers only ran
at twice (or at times half :) the speed of their single processor
workstation. The reason almost always was that they were paying a copy
penalty. I see no evidence to support your claim and my experience
suggests otherwise.

Indeed, in your environment, it may seem that processes involve small
packets of data *because* the programmers involved are aware they will
incur a copy penalty if they do otherwise. Programming Language
Semiotics are a big interest of mine - we should talk about such issues
offline.

   I agree that passing data around a single processor via a common
   block may be faster for large amounts of data but that trick is not
   particularly effective between different processors (even with shared
   memory). In any case it is a technique allowed in the inmos occam
   compiler - just turn off the usage checking.

Where has your requirement for coding clarity gone? ... Out the window,
that's where ;-)

This just isn't convincing - look, the fact is anyone wanting to write
"efficient", topology-specific Occam code does turn the usage checker off.
From what I hear of Occam 3 (Eek) there will now be kludges in the language
which are there essentially in recognition of this fact.

   Thirdly what is meant by
   > In the parallel Fortran area (anybody's standard, IBM, PCF) it is a
   > receiver-initiated transmission. You can assume that the data is
   > somehow left behind by name by the producers (we typically associate
   > that with a storage cell in a shared-memory) and the consumer then
   > names the data and retrieves it under its own-initiative

Most Americans will not understand your puzzlement or what you mean by
"what is meant by". It seems perfectly clear to me - although it is
written in American English not Oxford English. I'll translate:

* "In parallel Fortran (any standard) we can call communication
*<receiver-initiated>. Data is created by producers and mapped by
*receivers to local names."

Ok, the American original waffles and is imprecise (and it's not clear
that my interpretation is correct) - but come on this is News not IEEE
transactions or an Oxford monograph, and if you want to understand what
people in the USA say you'll have to be less pompous.

   Fourthly Occam and CSP are neither receiver nor originator oriented.
   The whole point is that the communication is symmetric. The only
   exception is the occam ALT which only allows input guards. However it
   is simple enough to extend the model to output guards - the only
   problem being that you start to lose efficiency since you have to
   provide a global synchronisation between all pairs of processes which
   might possibly be engaging either directly or indirectly in a given
   guarded communication.

My advice to anyone wanting to write "efficient" Occam programs is
"don't use alt". But anyhow, now you're waffling (and maybe I am too:).

I wish people (and esp. people at Oxford) would stop using CSP and Occam
in the same breath. Occam is, at best, a poor and imperfect copy of CSP.
CSP is a rich and very elegant process mathematics (one of several such
notations around these days) and in its mathematical context it is very
useful to language designers and computer architects like myself.  But,
hey, I wouldn't ever give it to a programmer to write a program in!  To
associate this fine work with Occam continually is to do CSP a
disservice. Occam was a nice try, another step, didn't quite work out, I
think some of us learned a few lessons from it, time to move on.

Hey, have you ever had that Deja vu feeling? I'm almost certain I've
said this before ;-) [for the benefit of comp.parallel readers].

And in answer to the (rephrased) original question (I don't know the
originator's id):

   "can we say that occam is ... producer-initiated ... and [the]
   Fortran [model] is a consumer-initiated ..."

No, I don't think we can say this. As Phillip says in his fourth point,
in Occam the synchronization characteristics of input and output are
such that either side can actually "initiate" a data exchange and
semantically the event happens when both processes are ready. I guess we
could use the term "consumer-initiated" (or "receiver-initiated") for
the Fortran model but I'm not sure it is strictly correct or useful to
do so.
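
For what it's worth, my reading of the "consumer-initiated" model can be
sketched as follows. This is Go, not Fortran, and the `store` type with its
`put` and `get` methods is hypothetical, invented only to illustrate the
idea: the producer leaves data behind under a name, and the consumer fetches
it by name whenever it chooses. Note that nothing here makes the consumer
wait for the producer, so some synchronisation still has to be supplied
separately.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical named shared-memory area.
type store struct {
	mu    sync.Mutex
	cells map[string][]float64
}

// put is the producer side: leave data behind under a name.
func (s *store) put(name string, data []float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cells[name] = data // only the slice header is stored, no element copy
}

// get is the consumer side: fetch by name on the consumer's own initiative.
// ok is false when the producer has not written the cell yet.
func (s *store) get(name string) (data []float64, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok = s.cells[name]
	return data, ok
}

func main() {
	s := &store{cells: make(map[string][]float64)}

	_, ok := s.get("grid")
	fmt.Println(ok) // the consumer arrived before the producer

	s.put("grid", []float64{1, 2, 3})
	data, ok := s.get("grid")
	fmt.Println(data, ok)
}
```

The mutex only prevents read/write and write/write clashes on the table
itself; it does not tell the consumer when the data is ready.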

Steven
--
Steven Ericsson Zenith * Email: zenith@ensmp.fr  *    Fax:(1)64.69.47.09
                       | Francais:(1)64.69.47.08 | Office:(1)64.69.48.52
Center for Research in Computer Science - Centre de Recherche en Informatique
	     CRI - Ecole Nationale Superieure des Mines de Paris
	       35 rue Saint-Honore 77305 Fontainebleau France
    "All see beauty as beauty only because they see ugliness" LaoTzu

greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (02/07/91)

CSP is a fine theoretical entity, but it surely is not a programming language,
since CSP (as defined in Hoare's book) "programs" are not capable of actually 
doing anything.  (Hoare's definition of parallel composition is inadequate.)

Occam is the rather ugly result of one attempt at including the elegant
concepts of CSP in a practical and efficient programming language.
(Note that, ignoring the procedural/functional differences, the main 
conceptual difference between CSP and occam involves the action of parallel 
composition.)

As far as the debate over the 'copying controversy' goes, let me offer my
own opinions.  Whether one prefers a shared-variable paradigm or a message-
passing paradigm, anyone should recognize that it is just plain ugly to
mix the use of these paradigms.  In the absence of a virtual shared-memory
(in which the underlying message-passing is made invisible to the programmer),
physically distributed memory forces us to accept a message-passing paradigm.

At a purely practical level, the 'copy-penalty' is probably not a real issue
since any program that aspires to efficiency (when compared to sequential
programs) must be coarse-grained enough so as to make the communication
time negligible when compared to the computation time.  When this is the case
the 'copy-penalty' will obviously be negligible also.
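
The grain-size argument can be put in rough numbers. The sketch below is
written in Go and uses a deliberately crude model that is my own assumption,
not anything measured: the compute time divides evenly across p processors,
while the communication time (copying included) stays fixed and does not
overlap with computation. The function name `speedup` and all the figures
are hypothetical.

```go
package main

import "fmt"

// speedup estimates how much faster p processors finish a job than one
// processor does. comp is the sequential compute time; comm is the extra
// time spent communicating (including copying), assumed here not to
// shrink or overlap with computation as processors are added.
func speedup(p, comp, comm float64) float64 {
	return comp / (comp/p + comm)
}

func main() {
	// Hypothetical figures, chosen only for illustration.
	fmt.Println(speedup(16, 16, 0))    // no communication cost
	fmt.Println(speedup(16, 16, 0.01)) // coarse grain: cost is negligible
	fmt.Println(speedup(16, 16, 7))    // fine grain: communication dominates
}
```

Under this model, 16 processors with comm = 7 run only twice as fast as
one, whereas with a negligible comm the speedup stays close to 16.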

Incidentally, the ALT statement in occam is NOT a source of inefficiency for
occam programs.  Since occam channels may be accessed by only two processes,
the system designers were able to implement the ALT statement quite
efficiently.  An ALT process enables all of its inputs, and then deschedules
itself until there is a matching output process for one of the ALT guards.  
Only then does the ALT process get rescheduled.  Since the ALT process does not
consume processor cycles during the waiting period (other processes can be 
executed), ALT execution is quite efficient.
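
The waiting behaviour described above can be sketched in Go, whose `select`
plays a role similar to ALT. This is an analogy under my own assumptions,
not a description of the transputer implementation; the function name
`merge` is made up. The goroutine running `merge` is parked by the Go
runtime while neither sender is ready, so it uses no processor time until
one of its inputs can proceed.

```go
package main

import "fmt"

// merge behaves like an occam ALT inside a loop: it waits on two input
// channels and forwards whichever value becomes available. It handles
// exactly n values and then closes out.
func merge(a, b <-chan int, out chan<- int, n int) {
	for i := 0; i < n; i++ {
		select {
		case v := <-a:
			out <- v
		case v := <-b:
			out <- v
		}
	}
	close(out)
}

func main() {
	a, b, out := make(chan int), make(chan int), make(chan int)
	go merge(a, b, out, 2)
	go func() { a <- 1 }()
	go func() { b <- 2 }()

	sum := 0
	for v := range out {
		sum += v
	}
	fmt.Println(sum)
}
```

The order in which the two values arrive is not fixed, which is why the
example adds them up instead of printing them one by one.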


Jonathan

zenith@isatis.isatis.ensmp.fr (unknown) (02/08/91)

In article <1991Feb6.122949.8210@rodan.acs.syr.edu> greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) writes:

   At a purely practical level, the 'copy-penalty' is probably not a real issue
   since any program that aspires to efficiency (when compared to sequential
   programs) must be coarse-grained enough so as to make the communication
   time negligible when compared to the computation time.  When this is the case
   the 'copy-penalty' will obviously be negligible also.

And only true when message passing is not used as the generalized
paradigm (as in Occam) and you're writing topology-specific code with a
detailed awareness of the target machine... I agree.

On the other hand ...

Steven

greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (02/12/91)

In article <ZENITH.91Feb8105612@isatis.isatis.ensmp.fr> zenith@isatis.isatis.ensmp.fr (unknown) writes:

>>At a purely practical level, the 'copy-penalty' is probably not a real issue
>>since any program that aspires to efficiency (when compared to sequential
>>programs) must be coarse-grained enough so as to make the communication
>>time negligible when compared to the computation time.  When this is the case
>>the 'copy-penalty' will obviously be negligible also.
>
>And only true when message passing is not used as the generalized
>paradigm (as in Occam)

Are you saying that such languages (as occam) can never be used to write
programs in which the communication time is negligible?  What is the basis for 
saying this?

>and you're writing topology specific code with a detailed awareness of the 
>target machine... I agree.

I'm not sure how this is relevant.  The 'copy-penalty' is only relevant to
synchronization of processes within one physical processor.  If you are
dealing with a system in which processes are automatically mapped to
processors, then you can never assume that two processes reside on a single 
processor.  Therefore, the program could not make use of shared variables
anyway.  (Assuming that the system does not provide a virtual shared memory.)

If you are suggesting that such systems could never support programs in which
the communication time were negligible, then I don't believe your statement to
be fundamentally true.  Systems that support non-topology specific programs 
currently involve large communication overhead, making it difficult (but not 
necessarily impossible) to develop programs in which the communication time is
negligible.  However, this is a result of the current state of technology.  
There is no reason for us to assume that the communication overhead for such 
systems will not see significant reductions in the future.

Jonathan

don@ohm.york.ac.uk (Don Goodeve) (02/12/91)

In article <1991Feb6.122949.8210@rodan.acs.syr.edu>, greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) says:

> Occam is the rather ugly result of one attempt at including the elegant
> concepts of CSP in a practical and efficient programming language.
> 
> Jonathan

Hmmm. well.

The issue as to the aesthetic nature of occam is a rather subjective
one. In my own experience, Occam (2) provides a concise, clear and
expressive medium for coding parallel applications. Competitors such
as parallel C etc. do not combine communications particularly elegantly
and cause programmers to err on the side of coarse-grain programming.

Occam and the transputer go together, having been developed side by side along
the lines of Hoare's CSP. As a result, Occam is about 30% or so more efficient
on the transputer in terms of binary size and execution speed than any other
programming medium. This may be seen as a limitation.

As a counter to this I would suggest that a general-purpose parallel machine
needs a solid foundation. The combination of the transputer and occam does
provide a unified foundation and is the only example of its kind.

The CSP notation is perhaps not the most expressive or useful notation. In a lot
of ways it is limited in my opinion, but is very useful nevertheless. Milner's
CCS provides a greater flexibility and expressiveness although it does not
directly map to a language in the way that CSP does.

The advantage of basing a complete system design on such a model as CCS (CSP) is
the versatility that results. Any parallel programming language / paradigm can
be represented in the form of CCS. A system which can efficiently manage a
CCS-like language should therefore be able to support any paradigm built
on this foundation.

Some incomplete thoughts but I think you get the idea.

As regards shared memory, why not?? - OK so suddenly multiple processes have
to be bundled together to talk to the same section of memory (or to the same
device ...). This is NOT a problem. I agree that large shared memories are
bad news on the implementation front, but shared memory objects are not
necessarily a bad idea.

Enough rambling..... Back to work.....

 

-- 
 ---------------------------------------------
| Don --- Well why not? Someone has got to be!|
 ---------------------------------------------