jim_d@cimcorMN.ORG (Jim Dahlberg) (03/18/89)
Here is a summary of the responses to my question about using VME to broadcast to other processors. As an aside, I was surprised that I received most of my replies via email. I thought that this was supposed to be more of a forum for everyone, so that everyone could benefit from the discussions.

Here is a summary of the original question:

| I am working on a multiprocessor system on the VME bus. [How can I]
| *broadcast* data to the other processors' local memory?

===========================================================================
FROM markw@hpsal2.HP.COM

Try the following:

Dedicate some region of VME address space or a "User defined" address modifier code (10-1F) to the broadcast address space. To do a broadcast, issue a write to the broadcast space. Your VME processors will accept the broadcast and no others will be affected.

Depending on what you want to use broadcasts for, you may want to self-acknowledge (assert DTACK) the VME broadcast at the master (if, for example, the broadcast is a reset, which must be reliable).

===========================================================================
From mcc.com!shamash!@MCC.COM:rfg

In article <646@cimcor.mn.org> you write:
>
> I am working on a multiprocessor system which will have
>multiprocessors on the VME bus. It will be necessary for the
>application software to broadcast data to the other processors' local
>memory.

I am particularly interested in this approach, i.e. what I would call "firm" coupling, wherein each processor does in fact have local memory (so that the system in general has features of a loosely coupled system), but where there is also a hardware-supported broadcast capability which can be used to effectively simulate shared memory.

Unfortunately, I have yet to be able to convince anybody that this is a good or useful idea. (All of the people on this project are loose-coupling bigots.)

The particular case in which this approach would be a huge win would be Ada applications.
As you may know, Ada's model of parallel computation pretty much requires hardware which at least simulates tight coupling, because the language semantics allow variables to be directly shared between multiple (otherwise independent) tasks.

I suggested a multiprocessor system design including a unique broadcast capability in my 1987 CS Master's thesis. (Of course, because I went to a small State college in California, nobody ever read it.)

I'll try not to bore you too much, but, in a nutshell, here is the idea I had.

Basically, on each "node" you would use a stock 68K MMU (68451?). I recall that the specs for this part said that there was a kind of spare bit in each page descriptor entry (either in memory or cached in the MMU). This extra bit was called the "shared" bit or the "don't cache" bit or something. Anyway, it was always driven out on one of the MMU pins during each bus cycle.

I figured that you could use this bit (that is to say, the MMU output signal it produced) to tell the hardware whether or not the memory location currently being accessed was *replicated* on other nodes. If it was not, or if the operation was a read, then the whole transaction is satisfied locally. If, however, the replicated bit is set for the current page, and if the operation is a write, then the memory interface hardware detects this fact, and *only* then causes the write to be done *both* locally and as a broadcast onto the global bus.

This approach assures that only those pages which must cause broadcasts on writes will in fact do so. Also, the particular set of pages which cause broadcasting is, at all times, directly under control of the operating system, and can be easily changed by the OS.

Now regarding the listening side. I also envisioned that on each "node" you would have a second MMU, just like the first, except that its address input lines are hooked to the global bus.
It thus acts like a global bus "snoop", waiting for global bus transactions which it cares about before doing anything. As with the "sender" MMU, this "receiver" MMU would have its own set of mapping tables, which could be maintained by the operating system. Also, just as for the sender, you could use the extra bit in each of the mapping table entries to tell this receiver MMU that it needs to do something special whenever it sees input addresses which fall into this particular page. The "something special" it would do for global bus transactions whose addresses fall onto the receiver MMU's "special pages" would be to go ahead and actually "accept" the broadcast data, and actually do the write locally (at the receiver-MMU mapped local physical address).

This whole scheme allows you to simulate a shared memory machine on what is mostly a loosely coupled machine, while minimizing global bus traffic as much as possible. Note that only writes (never reads) go onto the global bus. Also note that even for writes, the operating system can set up the mapping tables so that only the writes which go to a particular (OS-determined) set of pages ever cause global bus traffic. This reduces global bus traffic (and contention) even further... to near the absolute minimum needed to implement simulated shared memory via broadcast-based distributed replication of shared variables.

This scheme in effect uses local main memory banks in much the same way as caches are normally used in tightly coupled multiprocessor systems. So what is the advantage of this scheme over typical caching? Significantly reduced contention for global resources (i.e. busses and/or memory) because of the fine-grained control (i.e. page by page) of broadcasting. Also, note that there is no reason that you could not use this scheme and also use traditional caching hardware at the same time. This would give you a double win.

Well, I've said more than enough.
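
The sender/receiver scheme described above can be sketched as a small software simulation. This is only an illustration under stated assumptions, not the hardware design from the thesis: the names `GlobalBus`, `Node`, `send_map`, `recv_map` and the 256-byte `PAGE_SIZE` are all hypothetical, and the idea that a replicated page is identified on the bus by a "global page number" is an assumption made so the two mapping tables have something to agree on.

```python
PAGE_SIZE = 256  # hypothetical page size, chosen only for illustration


class GlobalBus:
    """Stands in for the global bus; it carries broadcast writes only."""

    def __init__(self):
        self.nodes = []
        self.traffic = 0  # number of write cycles that reached the bus

    def attach(self, node):
        self.nodes.append(node)

    def broadcast_write(self, sender, global_addr, value):
        self.traffic += 1
        for node in self.nodes:
            if node is not sender:
                node.snoop(global_addr, value)


class Node:
    """One processor with local memory and two OS-maintained tables."""

    def __init__(self, bus, mem_size=4096):
        self.bus = bus
        self.memory = [0] * mem_size
        # "Sender MMU": local page -> global page, with an entry only
        # for pages whose replicated bit is set (assumed encoding).
        self.send_map = {}
        # "Receiver MMU": global page -> local page, with an entry only
        # for the global pages this node accepts (assumed encoding).
        self.recv_map = {}
        bus.attach(self)

    def read(self, addr):
        # Reads are always satisfied locally and never touch the bus.
        return self.memory[addr]

    def write(self, addr, value):
        # Every write is done locally...
        self.memory[addr] = value
        page, offset = divmod(addr, PAGE_SIZE)
        # ...and is also broadcast only if the page is marked replicated.
        if page in self.send_map:
            global_addr = self.send_map[page] * PAGE_SIZE + offset
            self.bus.broadcast_write(self, global_addr, value)

    def snoop(self, global_addr, value):
        # Accept the broadcast only for pages the receiver table maps.
        page, offset = divmod(global_addr, PAGE_SIZE)
        if page in self.recv_map:
            self.memory[self.recv_map[page] * PAGE_SIZE + offset] = value


bus = GlobalBus()
a, b, c = Node(bus), Node(bus), Node(bus)
a.send_map[2] = 7  # a's local page 2 is replicated as global page 7
b.recv_map[7] = 5  # b accepts global page 7 into its local page 5
# c maps nothing, so it ignores every broadcast

a.write(2 * PAGE_SIZE + 3, 99)  # replicated page: local write plus broadcast
a.write(10, 1)                  # private page: local write only
```

In this run only the first write generates bus traffic, node `b` ends up with the value at the same offset within its own local page 5, and node `c` is untouched. The point of the sketch is that the two tables are independent, so the operating system on each node decides separately which pages it sends and which it accepts.
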
I'd like to know what you think of the idea, and I'd be interested in finding out more about the system you are planning to build. If it is to have a broadcast capability, then perhaps you will be looking for somebody to port an Ada compiler to it someday. :-)

=========================================================================
From shamash!wheaties.ai.mit.edu!sundar

This is in reference to your question regarding VME bus broadcasting. We looked at this problem over a year ago and there was no way you could do this without bending the specs a whole lot. I also don't know of anyone who does this sort of thing commercially. I'd appreciate it if you could send me a note if you hear otherwise.

-Sundar

=========================================================================
From shamash!uunet.UU.NET!mcvax!memex.co.uk!peter

It is feasible, but only if you know the details of ALL the boards you are broadcasting to. VME was not designed for this, but if your slowest board takes the data before the fastest one produces DTACK you should be OK. If your boards have CPUs on them which might hold up VME access to the local memory for unpredictable amounts of time, you may have problems.

A colleague here and I argue about this from time to time. He says, rightly, that it is not strictly illegal to do this; I say it is against the spirit of the VME spec. What you really want is P896 FutureBus; it had broadcast write (and broadcall read) in the spec.

> If necessary, we can 'bend' the VME spec to allow this, since we
>are already planning to use a non-standard VME connector. Also all the
>VME cards will be unique, so they don't have to conform to the standard.
>But we would like to stay with the standard as much as possible.

What do you mean by "unique"? I am curious why you want VME if you don't use the standard connector and your boards are also "unique". This seems to prevent you from using any standard VME boards in your system.

Peter Ilieve    peter@memex.co.uk

===========================================================================
Jim Dahlberg
Internet: jim_d@cimcor.mn.org
UUCP: uunet!rosevax!cimcor!jim_d