daveh@cbmvax.commodore.com (Dave Haynie) (03/22/91)
I'm looking for references, discussion, flames, whatever on various solutions to multiprocessing, especially in relation to support for said on microcomputer buses. It looks like the typical answer for "who caches what" on a microcomputer is somewhere between "the host processor is the only thing that can cache" and "nobody caches any shared memory". What kind of cache support protocols are being used, if any? The only bus I have any reference to that supports any sort of cache coherency scheme, at the moment, is FutureBus+, which would imply that at the moment, zero microcomputer systems solve this problem.

Along with cache problems, interrupts are another multiprocessor question. If a device issues an interrupt, which processor does it go to? Most systems seem to be saying "only the host processor". Pre-Apple NuBus did have a solution to this problem, but the solution was to simply eliminate normal level-sensitive, easy-to-use interrupts. An I/O device would then generate an interrupt by mastering the bus and banging a magic location in the memory map of the processor of choice. Sure, it would work, but requiring bus master capability isn't an easy way to build a $50 serial port card that supports multiple masters. Is anyone doing it better (again, in the context of a micro)?

--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
{uunet|pyramid|rutgers}!cbmvax!daveh PLINK: hazy BIX: hazy
"What works for me might work for you" -Jimmy Buffett
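The interrupt-by-bus-write scheme described above for pre-Apple NuBus can be sketched as a toy model. Everything in it is hypothetical: the `Bus` and `Processor` classes, the mailbox addresses, and the vector value are illustrative only and are not the NuBus register layout.

```python
class Processor:
    """A CPU that takes an interrupt whenever its mailbox address is written."""

    def __init__(self, name):
        self.name = name
        self.pending = []              # interrupt vectors waiting for service

    def service(self):
        """Drain and return the pending interrupts in arrival order."""
        taken, self.pending = self.pending, []
        return taken


class Bus:
    """Shared bus with ordinary RAM plus one 'magic' mailbox address per CPU."""

    def __init__(self):
        self.ram = {}
        self.mailboxes = {}            # mailbox address -> Processor

    def attach(self, cpu, mailbox_addr):
        self.mailboxes[mailbox_addr] = cpu

    def write(self, addr, value):
        """Any bus master may call this; the target address picks the CPU."""
        cpu = self.mailboxes.get(addr)
        if cpu is None:
            self.ram[addr] = value     # plain memory write
        else:
            cpu.pending.append(value)  # the write itself is the interrupt


bus = Bus()
host, io_cpu = Processor("host"), Processor("io")
bus.attach(host, 0xF000_0000)          # hypothetical mailbox addresses
bus.attach(io_cpu, 0xF100_0000)

# A serial card acting as bus master interrupts the processor of its choice.
bus.write(0xF100_0000, 0x42)           # 0x42: hypothetical "serial RX" vector
bus.write(0x0000_1000, 0xFF)           # ordinary data write, no interrupt
```

The model makes the cost visible: the device has to call `write` itself, so even the cheapest card needs bus-master logic.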
jerry@TALOS.UUCP (Jerry Gitomer) (03/22/91)
daveh@cbmvax.commodore.com (Dave Haynie) writes:
|I'm looking for references, discussion, flames, whatever on various solutions
|to multiprocessing, especially in relation to support for said on microcomputer
|buses.

Dave, do you want to limit this discussion to production chips such as the 80x86 and the 680x0 and buses or do you want to include custom chips, systems that never made it to market, and whatever?

Jerry
--
Jerry Gitomer at National Political Resources Inc, Alexandria, VA USA
I am apolitical, have no resources, and speak only for myself.
Ma Bell (703)683-9090 (UUCP: ...{uupsi,vrdxhq}!pbs!npri6!jerry
koll@NECAM.tdd.sj.nec.com (Michael Goldman) (03/23/91)
The only experience I've had with multiprocessing and bus issues was in a real-time system using 680x0 & Ready Systems' VRTX kernel and the related MPV (MultiProcessor for VME). MPV uses a table in global memory to keep track of where a CPU is and how to access it. System calls (posts to queues, semaphores, etc.) can be made across the bus by interrupting the addressed CPU (VME interrupt or mailbox interrupt) and passing a structure to the MPV code on the 2nd CPU that contains the parameters necessary to make the system call locally. In this situation, even though all memory is global, it is shared through semaphore-like arrangements, using the Motorola 68K TAS (test and set) instruction.

One member of our team discovered an interesting problem with the TAS instruction using cycle-by-cycle-interleaved memory access. Although the VME spec requires that the bus be locked during the multi-cycle TAS to ensure that TAS is an atomic operation, the local CPU bus did not. Thus, we occasionally had CPU-1's TAS interleave with CPU-2's TAS. This worked as follows:

    Cycle   CPU-1                          CPU-2                          Memory
    1       TEST (thinks lock open)                                       0
    2                                      TEST (thinks lock open)        0
    3       SET (thinks it owns memory)                                   1
    4                                      SET (thinks it owns memory)    1 (trash)

The solution for us was to put the MPV tables on a separate memory-only board. I believe Software Components' pSOS+ MPV-equivalent approaches the problem differently.

Radstone sells a VME board that has a 68040 and a 68020 on the same board. They consider the 68020 as an I/O processor that fields all the interrupts for Ethernet, SCSI, VMEbus, etc., with separate memory for each CPU. One of the proprietary Real-Time Unix companies did something similar to minimize interrupt latency.

There was a lot of concern about bus-snooping in the Futurebus+ discussions and I'm curious as to how it was resolved.
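The interleaving failure described above can be replayed with a small deterministic model. This is a sketch under stated assumptions, not the VRTX/MPV code: `run_schedule` is a hypothetical helper that treats TAS as two bus cycles (TEST, then SET) and lets a schedule decide which CPU gets each cycle.

```python
def run_schedule(schedule, atomic):
    """Replay a schedule of CPU ids against one shared lock byte.

    Each CPU performs TAS as two bus cycles: TEST (read) then SET (write 1).
    With atomic=True the bus is held for both cycles, so each schedule entry
    is one complete, indivisible TAS.
    Returns the list of CPUs that believe they acquired the lock.
    """
    memory = 0                 # 0 = lock open, 1 = lock taken
    seen = {}                  # cpu -> value read during TEST, awaiting SET
    owners = []
    for cpu in schedule:
        if atomic:
            old, memory = memory, 1        # indivisible read-modify-write
            if old == 0:
                owners.append(cpu)
        elif cpu not in seen:
            seen[cpu] = memory             # TEST cycle only
        else:
            old = seen.pop(cpu)            # SET cycle, acting on a stale TEST
            memory = 1
            if old == 0:
                owners.append(cpu)
    return owners


# Cycle-by-cycle interleaving as in the post: TEST-1, TEST-2, SET-1, SET-2.
broken = run_schedule([1, 2, 1, 2], atomic=False)
# The same two requests with the bus locked across each TAS.
locked = run_schedule([1, 2], atomic=True)
print(broken, locked)
```

With the split schedule both CPUs end up in `broken`, matching the "trash" row of the table; with the bus held for the whole read-modify-write only the first requester wins.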
jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (03/23/91)
It appears that there are many people who don't realize that snooping caches on bus-based multiprocessors are a reality in the marketplace. Encore and Sequent have both been offering products based on this technology for some time. The Encore Multimax uses NS 32000 family microprocessors, while Sequent builds machines around the M 68000 and Intel 80x86 families. In addition, the DEC Firefly is built around MicroVAX processors, but I don't know if they built it commercially or just for in-house research.

We have 2 Encore Multimax machines at Iowa, both with around 18 processors. One is used as a workhorse supporting undergraduate computer science instruction; the other supports only research users.

Parallel processing on the Multimax is supported at the lowest level with shared memory and spin-locks. There is a threads package sitting on top of this low level. The Encore operating system (MULTIMAX) is a version of UNIX with memory sharing between processes allowed on a page-by-page basis (mark the page as shared, then fork); there's a suite of library routines to manage a shared heap using a shared version of malloc -- that's how I usually do shared memory applications on this machine.

Doug Jones
jones@cs.uiowa.edu
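A spin-lock over shared memory, the low-level mechanism mentioned above, can be sketched as follows. This is only an illustration: Python threads stand in for processors, `SpinLock` is a hypothetical class rather than the Multimax library interface, and a non-blocking `threading.Lock.acquire` is used as a stand-in for a hardware test-and-set.

```python
import threading


class SpinLock:
    """Busy-waiting lock; Lock.acquire(blocking=False) plays the role of TAS."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Atomically try to take the flag; keep retrying until it succeeds.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()


counter = 0                    # the shared datum protected by the lock
lock = SpinLock()


def worker(increments):
    global counter
    for _ in range(increments):
        lock.acquire()
        counter += 1           # critical section on shared memory
        lock.release()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Spinning only pays off when critical sections are short, as here; a waiter burns processor cycles for as long as the lock is held.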
glew@pdx007.intel.com (Andy Glew) (03/23/91)
> One member of our team discovered an interesting problem with the TAS instruction using cycle-by-cycle-interleaved memory access. Although the VME spec requires that the bus be locked during the multi-cycle TAS to ensure that TAS is an atomic operation, the local CPU bus did not. Thus, we occasionally had CPU-1's TAS interleave with CPU-2's TAS.

This is exactly the sort of cautionary tale I put in my "Survey of Synchronization Primitive Implementations". Michael - can you give me more details (like company name) or does that violate non-disclosure agreements?

Other readers - if you have any other real-life examples of problems with synchronization primitive implementations, please send them to me.

Reminiscing - this is the sort of thing that got me interested in computer systems architecture in the first place.

--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions of Intel Corp.
bobg@.UUCP (Bob Greiner) (03/23/91)
In article <20037@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>I'm looking for references, discussion, flames, whatever on various solutions
>to multiprocessing, especially in relation to support for said on microcomputer
>buses.

The Motorola Computer Group ships multiprocessors on VMEbus. The VME141 is a 68030 board with a write-through cache that supports queued invalidates. The VME188 is a 4 processor 88100 board with copy-back cache that supports cache coherence through retries. Other boards interact through uncached memory regions.

The '188 is not cache coherent versus other cached masters on VMEbus. It is internally cache coherent over its proprietary bus. VME is used as an IO bus; IO masters that access main memory on the '188 see a hardware-enforced cache coherent image.

This technique is used by many other companies: use VME to get to IO, have a proprietary bus (or switch) with cached processors and main memory.

This limitation, no cache coherence over the standard backplane bus, is why I became the author of the cache coherence chapter of Futurebus+.

Bob Greiner, bobg@phx.mcd.mot.com
Not necessarily the opinion of Motorola.
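The idea of a write-through cache with queued invalidates can be illustrated with a generic model. This is a sketch of the general technique only and makes no claim about how the VME141 hardware works; `WriteThroughCache`, `snoop_write`, and the `peers` argument are hypothetical names.

```python
from collections import deque


class WriteThroughCache:
    """Per-processor cache: writes go straight to memory, snooped writes are queued."""

    def __init__(self, memory):
        self.memory = memory               # shared dict: address -> value
        self.lines = {}                    # locally cached copies
        self.invalidate_queue = deque()    # addresses written by other masters

    def snoop_write(self, addr):
        """Another bus master wrote addr; remember to drop our copy later."""
        self.invalidate_queue.append(addr)

    def drain(self):
        """Apply all queued invalidates."""
        while self.invalidate_queue:
            self.lines.pop(self.invalidate_queue.popleft(), None)

    def read(self, addr):
        self.drain()                       # must happen before a hit is allowed
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, peers=()):
        self.drain()
        self.lines[addr] = value
        self.memory[addr] = value          # write-through: memory is always current
        for peer in peers:
            peer.snoop_write(addr)         # other caches see the bus write


memory = {0x10: 1}
a, b = WriteThroughCache(memory), WriteThroughCache(memory)
first = a.read(0x10)                       # a now caches the line
b.write(0x10, 7, peers=[a])                # a only queues the invalidate
stale_line = a.lines[0x10]                 # old copy still present in a's cache
second = a.read(0x10)                      # queue drained, line refetched
```

Because memory is always up to date, coherence reduces to discarding stale copies; the queue only has to be emptied before the next access that could hit one.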
ddr@cs.edinburgh.ac.uk (Doug Rogers) (03/26/91)
In article <20037@cbmvax.commodore.com>, daveh@cbmvax.commodore.com (Dave Haynie) writes:
> I'm looking for references, discussion, flames, whatever on various solutions
> to multiprocessing, especially in relation to support for said on microcomputer
> buses.
>
> ............. The only bus I have any reference to that supports any
> sort of cache coherency scheme, at the moment, is FutureBus+. Which would
> imply that at the moment, zero microcomputer systems solve this problem.

Apart from specialist in-house buses like those used by Sequent.

> Along with cache problems, interrupts are another multiprocessor question. If
> a device issues an interrupt, which processor does it go to? Most systems seem
> to be saying "only the host processor".

Futurebus supports interrupts through its message-passing scheme. The distributed arbitration mechanism goes to two passes: the first shows that a message is being sent, which wins over modules arbitrating for the bus, and the second places the message on the bus. It is up to the other arbiters to recognise messages that are intended as interrupts locally. Some messages are reserved for powerdown etc.

There are 3 agreed profiles for Futurebus, all of which might be a bit heavy for small personal machines. New profiles are being put forward for such machines; if you are interested, why not contact someone on the IEEE 896 working group?

--
Douglas Rogers                    JANET: ddr@uk.ac.ed.lfcs
Department of Computer Science    UUCP: ..!mcvax!ukc!lfcs!ddr
University of Edinburgh           ARPA: ddr%lfcs.ed.ac.uk@nsfnet-relay.ac.uk
Edinburgh EH9 3JZ, UK.            Tel: 031-650 5172 (direct line)
koll@NECAM.tdd.sj.nec.com (Michael Goldman) (03/27/91)
In reply to Andy Glew's question about the name of the manufacturer that made the board allowing cycle-by-cycle interleaving, I can give you the name but it wouldn't be fair to virtually every other VME manufacturer that does the same. The only exception I can think of is Matrix (Raleigh, NC) which claims that it is slower to do the arbitration cycle-by-cycle than operation-by-operation.
markw@hpcuhe.cup.hp.com (mark williams) (03/27/91)
The base note asks for references to multiprocessors using microcomputers, especially in reference to cache coherence and interrupt handling. Judging from the poster's address, I think the writer might want a focused response (limited to MP micros using CISCy or RISCy chips). I could respond from that perspective, but I won't. This subject is much broader. The amusing thing for me is that the issues are the same whether the processors are Am386s(tm) :+) or supercomputers, so the problem has been studied for a while.

These topics are very well covered in the literature. If I were to recommend just one reference, it would be the excellent tutorial in Computer two years ago: "Synchronization, Coherence and Event Ordering in Multiprocessors", by Dubois, Scheurich and Briggs, IEEE Computer, Feb 1988. There are literally hundreds of other references to pursue, so many that an annotated bibliography has been done by Eugene Miya. It was posted to the net recently. Check your archive.

As for systems, currently dozens of MP systems are shipping, ranging from PCs (Compaq SystemPro, a dual 486) to supercomputers (Crays). They use a wide range of architectures, from loosely coupled MP (the highly successful Tandem Computers) to shared memory MP. Within the shared memory camp, which has the most implementations, all the vendors I know of use a proprietary processor/memory bus, and most couple to some standard bus like VME or EISA to leverage low-cost I/O controllers. The processor/memory bus maintains cache coherence, which the standard I/O buses cannot do (without proprietary extensions). This leads to some tricky problems extending locks to I/O space with acceptable performance. Cache coherence over I/O space can be implemented in the bus converter between the processor/memory bus and the I/O bus. This too is tricky.

In an earlier post, Futurebus+ was mentioned as a potential solution, since it is a "standard bus" with a defined cache coherence scheme. Whenever the "standard" gets approved (and frozen), it's worth taking a look at. At least one major vendor has plans to use Futurebus+ as a standard I/O bus, because it's faster than other current "standard" I/O buses.

Disclaimer: Just one man's opinion.
Mark Williams