HALLAM@vax1.physics.oxford.ac.uk ("Phillip M. Hallam-Baker") (01/30/91)
> Dear Net,
>
>> The following was extracted from the article: Jaap Hollenberg,
>> "People are getting more comfortable with parallelism" - an interview ....
>>
>> There is a tremendous amount of management function placed on him. In the
>> parallel Fortran paradigm the user has very little management requirement,
>> he can write an algorithm that - apart from the fact that he must use the
>> number of processors - works very much like the programming paradigm he is
>> used to. ...
>
> Does anybody have any comments regarding this article? Particularly, can we
> say that occam is a producer-initiated language and Fortran is a
> consumer-initiated language? .....

Yes, I have a comment:

1 I am unable to understand what it means.
2 I don't agree with the parts I can make out.

The premise of the article appears to be that communication-based languages such as occam are less efficient than parallel Fortran. The justification for this appears to be that interprocess communication involves a lot of copying of data.

The first point I would make is that `efficiency' is very much less important to me than coding clarity. I am quite prepared to sacrifice half my CPU time if it makes the code easier to maintain. The amount of code I see which has turned into useless junk because of (frequently misguided) attempts at `optimisation' is quite depressing. If the code doesn't run fast enough I would prefer to write a tool to perform the optimisation of the code than start hacking around in `parallel Fortran'. The clarity of the CSP model and the simplicity of coding it allows are for me its most significant advantage.

My second point is that I do not believe that the CSP paradigm is inefficient. Most communications between processes involve small packets of data. Time taken to copy the data is then less significant than the time taken to set up the transfer.

I agree that passing data around a single processor via a common block may be faster for large amounts of data, but that trick is not particularly effective between different processors (even with shared memory). In any case it is a technique allowed in the INMOS occam compiler - just turn off the usage checking. Provided that your synchronisation mechanism prevents read/write and write/write ambiguities it should work. If you need that particular optimisation it is available; however, a better alternative would probably be to reduce the amount of virtual parallelism and rewrite a parallel section as a single sequential one. If your problem allows effective use of parallel common blocks in the manner described it almost certainly isn't taking advantage of virtual parallelism.

Thirdly, what is meant by

> In the parallel Fortran area (anybody's standard, IBM, PCF) it is a
> receiver-initiated transmission. You can assume that the data is somehow
> left behind by name by the producers (we typically associate that with a
> storage cell in a shared memory) and the consumer then names the data and
> retrieves it under its own initiative

? Does this mean that in this parallel Fortran there is no synchronisation provided and therefore there is no need to worry about it?

Fourthly, Occam and CSP are neither receiver nor originator oriented. The whole point is that the communication is symmetric. The only exception is the occam ALT, which only allows input guards. However, it is simple enough to extend the model to output guards - the only problem being that you start to lose efficiency, since you have to provide a global synchronisation between all pairs of processes which might possibly be engaging either directly or indirectly in a given guarded communication.

Phillip M. Hallam-Baker
ZEUS Group
Oxford Dept Nuclear Physics.
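The second point in the post above (setup cost versus copy cost) can be made concrete with a toy cost model. This is only a sketch under stated assumptions: the function names and both timing constants are hypothetical and chosen purely to show the shape of the trade-off; real figures depend on the machine and the link.

```python
def transfer_time(n_bytes, setup_s, per_byte_s):
    """Time for one message: a fixed setup cost plus a per-byte copy cost."""
    return setup_s + n_bytes * per_byte_s


def copy_fraction(n_bytes, setup_s, per_byte_s):
    """Fraction of the total transfer time spent copying the payload."""
    return (n_bytes * per_byte_s) / transfer_time(n_bytes, setup_s, per_byte_s)


# Hypothetical constants, not measurements of any real system.
SETUP_S = 1e-5      # assumed fixed cost to set up one transfer
PER_BYTE_S = 1e-7   # assumed cost to copy one byte

for size in (4, 100, 100_000):
    share = copy_fraction(size, SETUP_S, PER_BYTE_S)
    print(f"{size:>7} bytes: copying is {share:.1%} of the transfer time")
```

Under these assumed constants, copying is a small share of the cost for word-sized messages and dominates for large ones, which is the regime the "copy penalty" argument is about.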
zenith@ensmp.fr (02/06/91)
In article <1461.9101292134@prg.oxford.ac.uk> HALLAM@vax1.physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes:

>> from the article: Jaap Hollenberg, "People are getting more comfortable
>> with parallelism" - an interview ....
>>
>>> There is a tremendous amount of management function placed on him. In
>>> the parallel Fortran paradigm the user has very little management
>>> requirement, he can write an algorithm that - apart from the fact that
>>> he must use the number of processors - works very much like the
>>> programming paradigm he is used to. ...
>>
>> Does anybody have any comments regarding this article? Particularly, can
>> we say that occam is a producer-initiated language and Fortran is a
>> consumer-initiated language? .....
>
> Yes, I have a comment:
>
> 1 I am unable to understand what it means.
> 2 I don't agree with the parts I can make out.
>
> The premise of the article appears to be that communication-based languages
> such as occam are less efficient than parallel Fortran. The justification
> for this appears to be that interprocess communication involves a lot of
> copying of data.
>
> The first point I would make is that `efficiency' is very much less
> important to me than coding clarity. I am quite prepared to sacrifice half
> my CPU time if it makes the code easier to maintain. The amount of code I
> see which has turned into useless junk because of (frequently misguided)
> attempts at `optimisation' is quite depressing. If the code doesn't run
> fast enough I would prefer to write a tool to perform the optimisation of
> the code than start hacking around in `parallel Fortran'. The clarity of
> the CSP model and the simplicity of coding it allows are for me its most
> significant advantage.

The premise of the article is probably correct. I agree that coding clarity is important, but (and this is especially the case on the transputer since it is so slow) efficiency is important. Why is this? Well, it derives from the fact that anyone writing parallel code today is almost certainly doing so because they are looking for high performance.

> My second point is that I do not believe that the CSP paradigm is
> inefficient. Most communications between processes involve small packets
> of data. Time taken to copy the data is then less significant than the
> time taken to set up the transfer.

Well, I would like to see a study done of the claim in your second sentence here - but it's still going to be a few years before parallel programming has become commonplace enough that a study of programming practices can produce meaningful results.

Your comments are, however, counter to my experience with Occam. Many times I received desperate pleas from INMOS marketing asking me to talk to a customer because the parallel code running on their shiny new box of 16 transputers only ran at twice (or at times half :) the speed of their single-processor workstation. The reason almost always was that they were paying a copy penalty.

I see no evidence to support your claim and my experience suggests otherwise. Indeed, in your environment, it may seem that processes involve small packets of data *because* the programmers involved are aware they will incur a copy penalty if they do otherwise. Programming Language Semiotics are a big interest of mine - we should talk about such issues offline.

> I agree that passing data around a single processor via a common block may
> be faster for large amounts of data, but that trick is not particularly
> effective between different processors (even with shared memory). In any
> case it is a technique allowed in the INMOS occam compiler - just turn off
> the usage checking.

Where has your requirement for coding clarity gone? ... Out the window, that's where ;-)

This just isn't convincing - look, the fact is anyone wanting to write "efficient", topology-specific Occam code does turn the usage checker off. From what I hear of Occam 3 (Eek) there will now be kludges in the language which are there essentially in recognition of this fact.

> Thirdly, what is meant by
>
>> In the parallel Fortran area (anybody's standard, IBM, PCF) it is a
>> receiver-initiated transmission. You can assume that the data is somehow
>> left behind by name by the producers (we typically associate that with a
>> storage cell in a shared memory) and the consumer then names the data and
>> retrieves it under its own initiative

Most Americans will not understand your puzzlement or what you mean by "what is meant by". It seems perfectly clear to me - although it is written in American English, not Oxford English. I'll translate:

* "In parallel Fortran (any standard) we can call communication
* <receiver-initiated>. Data is created by producers and mapped by
* receivers to local names."

Ok, the American original waffles and is imprecise (and it's not clear that my interpretation is correct) - but come on, this is News, not IEEE Transactions or an Oxford monograph, and if you want to understand what people in the USA say you'll have to be less pompous.

> Fourthly, Occam and CSP are neither receiver nor originator oriented. The
> whole point is that the communication is symmetric. The only exception is
> the occam ALT, which only allows input guards. However, it is simple enough
> to extend the model to output guards - the only problem being that you
> start to lose efficiency, since you have to provide a global
> synchronisation between all pairs of processes which might possibly be
> engaging either directly or indirectly in a given guarded communication.

My advice to anyone wanting to write "efficient" Occam programs is "don't use alt". But anyhow, now you're waffling (and maybe I am too :).

I wish people (and esp. people at Oxford) would stop using CSP and Occam in the same breath. Occam is, at best, a poor and imperfect copy of CSP. CSP is a rich and very elegant process mathematics (one of several such notations around these days) and in its mathematical context it is very useful to language designers and computer architects like myself. But, hey, I wouldn't ever give it to a programmer to write a program in! To associate this fine work with Occam continually is to do CSP a disservice. Occam was a nice try, another step, didn't quite work out, I think some of us learned a few lessons from it, time to move on.

Hey, have you ever had that deja vu feeling? I'm almost certain I've said this before ;-) [for the benefit of comp.parallel readers].

And in answer to the (rephrased) original question (I don't know the originator's id):

"can we say that occam is ... producer-initiated ... and [the] Fortran [model] is a consumer-initiated ..."

No, I don't think we can say this. As Phillip says in his fourth point, in Occam the synchronization characteristics of input and output are such that either side can actually "initiate" a data exchange, and semantically the event happens when both processes are ready. I guess we could use the term "consumer-initiated" (or "receiver-initiated") for the Fortran model, but I'm not sure it is strictly correct or useful to do so.

Steven
--
Steven Ericsson Zenith * Email: zenith@ensmp.fr * Fax:(1)64.69.47.09
| Francais:(1)64.69.47.08 | Office:(1)64.69.48.52
Center for Research in Computer Science - Centre de Recherche en Informatique
CRI - Ecole Nationale Superieure des Mines de Paris
35 rue Saint-Honore 77305 Fontainebleau France
"All see beauty as beauty only because they see ugliness" LaoTzu
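The symmetric synchronisation described in this exchange, where the communication event happens only when both processes are ready, can be sketched as an unbuffered channel. This is a hedged illustration in Python, not occam and not any INMOS implementation: the `Channel` class, its method names and the demo function are all hypothetical, and the sketch assumes exactly one sender and one receiver per channel.

```python
import threading


class Channel:
    """Unbuffered point-to-point channel (assumes one sender, one receiver).

    Neither side completes until the other has arrived, so it does not
    matter which of the two gets there first.
    """

    def __init__(self):
        self._item = None
        self._item_ready = threading.Semaphore(0)  # sender has arrived
        self._item_taken = threading.Semaphore(0)  # receiver has arrived

    def send(self, value):
        self._item = value
        self._item_ready.release()   # announce that a value is on offer
        self._item_taken.acquire()   # block until the receiver has taken it

    def receive(self):
        self._item_ready.acquire()   # block until a sender is ready
        value = self._item
        self._item_taken.release()   # let the blocked sender continue
        return value


def run_demo(n=3):
    """Send n integers from a producer thread and collect them in order."""
    ch = Channel()

    def producer():
        for i in range(n):
            ch.send(i)

    worker = threading.Thread(target=producer)
    worker.start()
    received = [ch.receive() for _ in range(n)]
    worker.join()
    return received


print(run_demo())
```

Note that the value is still copied (here, by reference) from sender to receiver; the sketch shows only the synchronisation, not the cost of the copy that the thread is arguing about.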
greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (02/07/91)
CSP is a fine theoretical entity, but it surely is not a programming language, since CSP (as defined in Hoare's book) "programs" are not capable of actually doing anything. (Hoare's definition of parallel composition is inadequate.)

Occam is the rather ugly result of one attempt at including the elegant concepts of CSP in a practical and efficient programming language. (Note that, ignoring the procedural/functional differences, the main conceptual difference between CSP and occam involves the action of parallel composition.)

As far as the debate over the 'copying controversy' goes, let me offer my own opinions. Whether one prefers a shared-variable paradigm or a message-passing paradigm, anyone should recognize that it is just plain ugly to mix the use of these paradigms. In the absence of a virtual shared memory (in which the underlying message passing is made invisible to the programmer), physically distributed memory forces us to accept a message-passing paradigm.

At a purely practical level, the 'copy-penalty' is probably not a real issue, since any program that aspires to efficiency (when compared to sequential programs) must be coarse-grained enough so as to make the communication time negligible when compared to the computation time. When this is the case the 'copy-penalty' will obviously be negligible also.

Incidentally, the ALT statement in occam is NOT a source of inefficiency for occam programs. Since occam channels may be accessed by only two processes, the system designers were able to implement the ALT statement quite efficiently. An ALT process enables all of its inputs, and then deschedules itself until there is a matching output process for one of the ALT guards. Only then does the ALT process get rescheduled. Since the ALT process does not consume processor cycles during the waiting period (other processes can be executed), ALT execution is quite efficient.

Jonathan
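The ALT behaviour described above (enable every input, sleep, and wake only when some sender is ready) can be sketched roughly as follows. This is an illustrative Python model, not the transputer's actual mechanism: the `Chan` class, the `alt` function and the single shared condition variable are assumptions made to keep the sketch short, and each channel is assumed to have one sender and one receiver.

```python
import threading

# One condition variable shared by every channel: a simplification for this
# sketch only, standing in for the scheduler's wake-up mechanism.
_cond = threading.Condition()


class Chan:
    """Channel with one sender and one receiver, usable as an ALT guard."""

    def __init__(self):
        self.offers = []  # holds at most one value offered by a blocked sender

    def send(self, value):
        with _cond:
            self.offers.append(value)
            _cond.notify_all()                       # wake a waiting ALT
            _cond.wait_for(lambda: not self.offers)  # block until taken


def alt(channels):
    """Wait until any channel has a sender ready; return (index, value).

    While no sender is ready the caller sleeps inside wait(), so it uses no
    processor time, mirroring the descheduled ALT process described above.
    """
    with _cond:
        while True:
            for index, ch in enumerate(channels):
                if ch.offers:
                    value = ch.offers.pop()
                    _cond.notify_all()  # release the matched sender
                    return index, value
            _cond.wait()


def run_demo():
    """ALT over two channels when only the second one has a sender."""
    a, b = Chan(), Chan()
    sender = threading.Thread(target=b.send, args=("from b",))
    sender.start()
    result = alt([a, b])
    sender.join()
    return result


print(run_demo())
```

Only input guards are modelled here, matching the occam ALT discussed earlier in the thread; output guards would need the extra synchronisation mentioned in the first post.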
zenith@isatis.isatis.ensmp.fr (unknown) (02/08/91)
In article <1991Feb6.122949.8210@rodan.acs.syr.edu> greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) writes:

> At a purely practical level, the 'copy-penalty' is probably not a real
> issue, since any program that aspires to efficiency (when compared to
> sequential programs) must be coarse-grained enough so as to make the
> communication time negligible when compared to the computation time. When
> this is the case the 'copy-penalty' will obviously be negligible also.

And only true when message passing is not used as the generalized paradigm (as in Occam) and you're writing topology-specific code with a detailed awareness of the target machine... I agree.

On the other hand ...

Steven
greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (02/12/91)
In article <ZENITH.91Feb8105612@isatis.isatis.ensmp.fr> zenith@isatis.isatis.ensmp.fr (unknown) writes:

>>At a purely practical level, the 'copy-penalty' is probably not a real issue
>>since any program that aspires to efficiency (when compared to sequential
>>programs) must be coarse-grained enough so as to make the communication
>>time negligible when compared to the computation time. When this is the case
>>the 'copy-penalty' will obviously be negligible also.
>
>And only true when message passing is not used as the generalized
>paradigm (as in Occam)

Are you saying that such languages (as occam) can never be used to write programs in which the communication time is negligible? What is the basis for saying this?

>and you're writing topology specific code with a detailed awareness of the
>target machine... I agree.

I'm not sure how this is relevant. The 'copy-penalty' is only relevant to synchronization of processes within one physical processor. If you are dealing with a system in which processes are automatically mapped to processors, then you can never assume that two processes reside on a single processor. Therefore, the program could not make use of shared variables anyway. (Assuming that the system does not provide a virtual shared memory.)

If you are suggesting that such systems could never support programs in which the communication time is negligible, then I don't believe your statement to be fundamentally true. Systems that support non-topology-specific programs currently involve large communication overhead, making it difficult (but not necessarily impossible) to develop programs in which the communication time is negligible. However, this is a result of the current state of technology. There is no reason for us to assume that the communication overhead for such systems will not see significant reductions in the future.

Jonathan
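The granularity argument in this exchange can be put as a back-of-the-envelope model: if each unit of work costs some computation time plus some communication time (copying included), the useful fraction is the ratio of computation to the total. The sketch below is an idealisation under stated assumptions: it ignores load imbalance and any overlap of communication with computation, and the function names and sample timings are hypothetical.

```python
def parallel_efficiency(compute_s, comm_s):
    """Fraction of each processor's time spent on useful computation."""
    return compute_s / (compute_s + comm_s)


def speedup(processors, compute_s, comm_s):
    """Idealised speedup over one processor that pays no communication."""
    return processors * parallel_efficiency(compute_s, comm_s)


# Hypothetical timings per unit of work, in seconds.
for compute_s, comm_s in ((1.0, 7.0), (1.0, 1.0), (1000.0, 1.0)):
    s = speedup(16, compute_s, comm_s)
    print(f"compute={compute_s}, comm={comm_s}: speedup on 16 processors = {s:.2f}")
```

In this model, fine-grained work with heavy communication wastes most of the machine, while coarse-grained work approaches the processor count; whether a given language or system lets programs reach the coarse-grained regime is exactly what the posters disagree about.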
don@ohm.york.ac.uk (Don Goodeve) (02/12/91)
in article <1991Feb6.122949.8210@rodan.acs.syr.edu>, greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) says:

> Occam is the rather ugly result of one attempt at including the elegant
> concepts of CSP in a practical and efficient programming language.
>
> Jonathan

Hmmm. Well. The issue as to the aesthetic nature of occam is a rather subjective one. In my own experience, Occam (2) provides a concise, clear and expressive medium for coding parallel applications. Competitors such as parallel C etc. do not combine communications particularly elegantly and cause programmers to err on the side of coarse-grain programming.

Occam and the transputer go together, having been developed together along the lines of Hoare's CSP. As a result, Occam is about 30% or so more efficient on the transputer in terms of binary size and execution speed than any other programming medium. This may be seen as a limitation. As a counter to this I would suggest that a general-purpose parallel machine needs a solid foundation. The combination of the transputer and occam does provide a unified foundation and is the only example of its kind.

The CSP notation is perhaps not the most expressive or useful notation. In a lot of ways it is limited in my opinion, but it is very useful nevertheless. Milner's CCS provides a greater flexibility and expressiveness, although it does not directly map to a language in the way that CSP does.

The advantage of basing a complete system design on such a model as CCS (CSP) is the versatility that results. Any parallel programming language / paradigm can be represented in the form of CCS. A system which can efficiently manage a CCS-like language should therefore be able to support any paradigm built on this foundation. Some incomplete thoughts, but I think you get the idea.

As regards shared memory, why not?? - OK, so suddenly multiple processes have to be bundled together to talk to the same section of memory (or to the same device ...). This is NOT a problem. I agree that large shared memories are bad news on the implementation front, but shared memory objects are not necessarily a bad idea.

Enough rambling..... Back to work.....

--
---------------------------------------------
| Don --- Well why not? Someone has got to be!|
---------------------------------------------