greg@xios.XIOS.UUCP (Greg Franks) (03/24/88)
I really hate to break up this interesting discussion on MIPS and VIPS et al, but a question has come to mind.... RISC generally implies single instruction per clock cycle, and a load/store type architecture. Now for _tightly coupled_ multiprocessing, one needs some sort of atomic test-and-set instruction. How do the various RISC chips provide this function, with LOCK prefixes, or with some other technique? Sign me Just curious... -- Greg Franks XIOS Systems Corporation, 1600 Carling Avenue, utzoo!dciem!nrcaer!xios!greg Ottawa, Ontario, Canada, K1Z 8R8. (613) 725-5411. "Those who stand in the middle of the road get hit by trucks coming from both directions." Evelyn C. Leeper.
ard@pdn.UUCP (Akash Deshpande) (03/30/88)
In article <503@xios.XIOS.UUCP>, greg@xios.XIOS.UUCP (Greg Franks) writes: > Now for _tightly coupled_ > multiprocessing, one needs some sort of atomic test-and-set instruction. > How do the various RISC chips provide this function, with LOCK prefixes, > or with some other technique? > Greg Franks RISC people (as I discovered at ASPLOS II, San Jose, Oct 87) would rather not speak of parallel processing. Reminds me of the ostrich. Ask them - "how are you going to maintain cache coherency, TLB flushing, accesses integrity, etc in a parallel processing system?" and they will say "why do you want parallel processing when one RISC machine is so much faster than even parallel CISCs?" I would prefer a philosophy that allows for clean parallelisability over any single cpu speedups. Akash -- Akash Deshpande Paradyne Corporation {gatech,rutgers,attmail}!codas!pdn!ard Mail stop LF-207 (813) 530-8307 o Largo, Florida 34649-2826 Like certain orifices, every one has opinions. I haven't seen my employer's!
petolino%joe@Sun.COM (Joe Petolino) (03/31/88)
>RISC generally implies single instruction per clock cycle, and a >load/store type architecture. Now for _tightly coupled_ >multiprocessing, one needs some sort of atomic test-and-set instruction. >How do the various RISC chips provide this function, with LOCK prefixes, >or with some other technique? SPARC provides this function with an atomic test-and-set instruction. From a processor chip's point of view, 'store-but-look-at-the-old-contents' is not much different from a simple 'store', so RISC does not necessarily preclude a test-and-set instruction. The really difficult part of interprocessor synchronization comes at the system level, outside of the processor chip. -Joe
garner@gaas.Sun.COM (Robert Garner) (03/31/88)
> Now for _tightly coupled_multiprocessing, one needs some sort of > atomic test-and-set instruction. How do the various RISC chips > provide this function, with LOCK prefixes, or with some other technique? The SPARC instruction set includes two instructions for this purpose: LDSTUB - Load/Store Unsigned Byte - reads a byte from memory and then rewrites the same byte to -1. SWAP - exchanges an integer register and a memory word. LDSTUB and SWAP are currently implemented as multi-cycle operations. Between the load and the store, the processor asserts a signal to the memory (or I/O) bus that prevents intervening accesses. (A precise requirement is that, if these instructions are issued by more than one processor, they must execute in some serial order.) Assuming a specialized memory system that includes an arithmetic unit, SWAP can also implement the Fetch_and_Add instruction. On another subject, I recall some confusion in an old msg about SPARC's multiply-step instruction (MULScc). The author thought that MULScc was limited to signed multiplies. This is certainly not true: MULScc implements both signed and unsigned 32x32 multiplies. [BTW, I noticed that the Am29000 has three instructions--Multiply Step (MUL), Multiply Last Step (MULL), and Multiply Step Unsigned (MULU). MULU and MULL seem unnecessary since the fix up for 32x32 unsigned multiplies (or a negative multiplier in the case of signed multiplies) requires only 3 cycles.] On yet another subject, am I still correct in believing that AMD's Am29C327 floating-point coprocessor does NOT directly execute the Am29000 floating-point instruction set? In other words, must Am29000 instructions such as FMUL and FEQ be emulated via a trap handler? Wouldn't this make them too slow? ----------------------------------- Robert Garner ARPA: garner@sun.com UUCP: {ucbvax,decvax,decwrl,seismo}!sun!garner Phone: (415) 960-1300 or (415) 691-2125
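For concreteness, LDSTUB's read-then-write-all-ones semantics can be modeled in C as below; `__atomic_exchange_n` here is only a stand-in for the bus-locked load/store pair the chip actually performs, and the lock convention (0 = free) is the usual one, not mandated by the instruction:

```c
#include <stdint.h>

/* Model of SPARC LDSTUB: atomically fetch the old byte and rewrite
   it as all-ones (0xFF).  With 0 meaning "free", an LDSTUB that
   returns 0 means this processor is the one that acquired the lock. */
uint8_t ldstub(uint8_t *addr)
{
    return __atomic_exchange_n(addr, 0xFF, __ATOMIC_SEQ_CST);
}

int try_lock(uint8_t *lock)
{
    return ldstub(lock) == 0;        /* 1 if we got it */
}

void unlock(uint8_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```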
webber@porthos.rutgers.edu (Bob Webber) (03/31/88)
In article <47649@sun.uucp>, garner@gaas.Sun.COM (Robert Garner) writes: > ... > On yet another subject, am I still correct in believing that AMD's > Am29C327 floating-point coprocessor does NOT directly execute How does the 327 differ from the Am29027? The floating-point instructions trapping on the Am29000 looked a bit odd to me. Is the notion that it is important to standardize the interface to floating-point stuff so that people can buy floating-point chips later and not have to recompile? Or is it that one would want to ``hardwire'' the coprocessor interactions that are currently being done at trap when the chip space becomes available? --- BOB (webber@athos.rutgers.edu ; rutgers!athos.rutgers.edu!webber) [By the way, I have been pondering the SPARC and Am29000 chips for a while now trying to figure out if it is plausible to build a simple home computer around them. If anyone has references that talk about the sort of glue that holds together a board with such a processor, 1 or 2 SCSI ports, 1 or 2 serial ports, and some static RAM, I would certainly be interested.]
walter@garth.UUCP (Walter Bays) (04/01/88)
In article <2676@pdn.UUCP> ard@pdn.UUCP (Akash Deshpande) writes: > RISC people (as I discovered at ASPLOS II, San Jose, Oct 87) would > rather not speak of parallel processing. Reminds me of the ostrich. > Ask them - "how are you going to maintain cache coherency, TLB > flushing, accesses integrity, etc in a parallel processing system?" > and they will say "why do you want parallel processing when one > RISC machine is so much faster than even parallel CISCs?" Most RISC computers are, in a limited sense, multiprocessors, because you'll want at least an 80286 or something for an IOP. The Clipper has bus-watch hardware in the CAMMU's (Cache and Memory Management Unit) to assure cache and TLB consistency in copy-back mode. And yes, there is a test-and-set instruction. Of course you couldn't put too many Clippers (or any fast CPU) on a bus before it saturated... I'd rather not talk about it :-) -- ------------------------------------------------------------------------------ Any similarities between my opinions and those of the person who signs my paychecks is purely coincidental. E-Mail route: ...!pyramid!garth!walter USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303 Phone: (415) 852-2384 ------------------------------------------------------------------------------
jpa@celerity.UUCP (Jeff Anderson) (04/01/88)
In article <2676@pdn.UUCP> ard@pdn.UUCP (Akash Deshpande) writes: > RISC people (as I discovered at ASPLOS II, San Jose, Oct 87) would > rather not speak of parallel processing. Reminds me of the ostrich. > Ask them - "how are you going to maintain cache coherency, TLB > flushing, accesses integrity, etc in a parallel processing system?" > and they will say "why do you want parallel processing when one > RISC machine is so much faster than even parallel CISCs?" The Celerity 6000 has many RISC attributes, and was designed as a multiprocessor. The instruction set of all Celerity's processors includes separate fetch and receive instructions, which provide an easy mechanism for implementing semaphores in hardware without "special" instructions. Basically there is a semaphore "box" in the memory address space of the processor (which incidentally is implemented as 4 semaphores per 16KB page, in order that they can be assigned 4-at-a-time to the virtual address space of a user process). Each semaphore in the box has two "registers", a "content register" for reading and initializing, and an "access register" for P'ing and V'ing. V's are stores to the access register, and P's are fetches to the access register. The data returned indicates the semaphore state for determining whether the processor should wait. I won't give you any more specifics since it is patentable, but it's pretty fast to execute, and easy to implement. This semaphore implementation doesn't violate any RISC principle since the instruction set does not need to be tweaked and most of the work is done in special off-processor memory. Cache coherency and TLB flushing present problems which understandably RISC chip designers would rather not commit to silicon; but there ARE fast, off-chip solutions. Maybe you're talking to the wrong group? We don't buy the argument that RISC people "don't care" about multiprocessing. Celerity does.
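Since the specifics are withheld as patentable, the following is only a rough model of the P-as-fetch / V-as-store idea; the names, the counting discipline, and the use of compiler atomics are all invented for illustration and are not the Celerity design:

```c
#include <stdint.h>

/* Rough model of a memory-mapped semaphore "box".  The "content
   register" reads or initializes the count; a load from the "access
   register" performs P, a store-side operation performs V.  The
   __atomic builtins stand in for the box's own serializing logic. */
typedef struct { int32_t count; } sem_box;

void sem_init(sem_box *s, int32_t n) { s->count = n; }

/* P: a single fetch; the value handed back tells the processor
   whether it got the semaphore (1) or must wait (0). */
int sem_P(sem_box *s)
{
    return __atomic_fetch_sub(&s->count, 1, __ATOMIC_ACQUIRE) > 0;
}

/* V: a single store-side operation. */
void sem_V(sem_box *s)
{
    __atomic_fetch_add(&s->count, 1, __ATOMIC_RELEASE);
}
```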
-- The "J" Team at Celerity Computing JJ Whelan Jeff Anderson ucsd!celerity!jpa ucsd!celerity!jjw 619-271-9940
mikep@amdcad.AMD.COM (Mike Parker) (04/02/88)
In article <Mar.31.04.58.35.1988.1216@porthos.rutgers.edu> webber@porthos.rutgers.edu (Bob Webber) writes: > >How does the 327 differ from the Am29027? The floating-point >instructions trapping on the Am29000 looked a bit odd to me. Is the >notion that it is important to standardize the interface to >floating-point stuff so that people can buy floating-point chips later >and not have to recompile? Yes. >Or is it that one would want to >``hardwire'' the coprocessor interactions that are currently being >done at trap when the chip space becomes available? > >--- BOB (webber@athos.rutgers.edu ; rutgers!athos.rutgers.edu!webber) > No comment, except that if one were to find the chip space and hardwire what are now traps wouldn't that relate back to the "not have to recompile" question. >[By the way, I have been pondering the SPARC and Am29000 chips for a >while now trying to figure out if it is plausible to build a simple >home computer around them. If any one has references that talk about >the sort of glue that holds together a board with such a processor, 1 >or 2 SCSI ports, 1 or 2 serial ports, and some static ram, I would >certainly be interested.] Oh, no. SRAM, too expensive for a PC. The Am29000 memory design handbook ought to be available real soon with a couple of inexpensive memory systems that still provide better performance than a Sun 4. (Given some SRAM to build a cache, the Am29000 averages twice the performance of the Sun 4, including floating-point with those "slow" traps). Talk to phil@amdcad.amd.com about his $1200 PC accelerator performance. mikep -- ------------------------------------------------------------------------- UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!mike ARPA: amdcad!mike@decwrl.dec.com
dougj@rosemary.Berkeley.EDU.berkeley.edu (Doug Johnson) (04/02/88)
>In article <503@xios.XIOS.UUCP>, greg@xios.XIOS.UUCP (Greg Franks) writes: >> Now for _tightly coupled_ >> multiprocessing, one needs some sort of atomic test-and-set instruction. >> How do the various RISC chips provide this function, with LOCK prefixes, >> or with some other technique? >> Greg Franks >RISC people (as I discovered at ASPLOS II, San Jose, Oct 87) would >rather not speak of parallel processing. Reminds me of the ostrich. >Ask them - "how are you going to maintain cache coherency, TLB >flushing, accesses integrity, etc in a parallel processing system?" >and they will say "why do you want parallel processing when one >RISC machine is so much faster than even parallel CISCs?" >I would prefer a philosophy that allows for clean parallelisability >over any single cpu speedups. Take a look at the SPUR project being done at UC Berkeley. (There is an overview article in the November 1986 issue of Computer.) It is a RISC designed to do tightly coupled, coarse grained, multiprocessing. It addresses cache coherency (with snoopy caches), TLB flushing (no TLB, address translation is done in the cache), has a test_and_set, etc. -- Doug
vandys@hpindda.HP.COM (Andy Valencia) (04/02/88)
From all I can tell from the various MPs which have been built (or died trying... :->), the real point isn't which atomic ops you offer; it's how you make multiple CPUs live together in the same memory domain--cache consistency, bus bandwidth, and memory cycle time. I don't recall ever reading a post-mortem which said "if we'd only used synchronization mechanism X instead of Y, we would have been golden". Most of them instead talked about the unpleasant bus and memory traffic characteristics which show up when you try to get a respectable number of processors going simultaneously. And never underestimate what cache consistency is going to cost you. Andy Valencia
hjm@cernvax.UUCP (hjm) (05/09/88)
Dear All, I see the thorny subjects of RISC v. CISC and scalar v. vector have reared their ugly heads again, but in a different guise - multiprocessing! Allow me to point out some of the ENGINEERING issues involved: - the cost of a computing system is primarily a function of size, weight and the number of chips or pins; - to go really fast and to be efficient, the hardware should be simple; So what am I trying to point out? Merely that a large amount of hardware in present-day machines is there because of difficulties in software. For example, take the commonplace example of your local UNIX or VMS box. Inside these beasts is a *lot* of hardware to keep one user away from his fellow hackers. An equally large amount of hardware is provided for the demand-paged virtual memory system. Add to that a healthy(?) helping of cache chippery and what do you get - yes, a machine built upon boards the size of a small squash court! None of this hardware is simple, and applies to both the uniprocessor and the multiprocessor case. Now, add in the magic multiprocessor devices and all hell breaks loose on the hardware front (not to mention the software - groan). Everyone's favourite trick seems to be finding ever more complicated ways of getting large numbers of CPUs to talk to the memory all at once. Just imagine an ever-increasing number of waiters trying to get in and out of the same kitchen all at once through one door, and you can see the mess. OK, let's increase the number of doors ... in hardware terms this means separating the memory into several pages which can be accessed simultaneously, thereby increasing the effective bandwidth of the memory. Is this really admitting that shared memory is not necessary? Surely the highest bandwidth is achieved when each processor has its own memory which it shares with no one else? It also makes the hardware a lot smaller.
To summarise all of this in a few points: - virtual memory is useful only when an application won't fit in physical memory. But memory is cheap, so with lots of Mbytes who needs it, especially if the program is written well. - multi-user machines are too complicated to be both fast and simple. - shared-memory is not necessary; it's a software issue that shouldn't be solved in hardware. For example, 10 MIPS of computation with 4 MB of ECC RAM can be placed on a single 4" x 6" Eurocard. Add multi-user support, virtual memory or multiple CPUs and the board looks like a football pitch in comparison. Guess which is cheaper as well! Remember, S I M P L E = F A S T = E F F I C I E N T = C H E A P. ------------------------------------------------------------------------------ Hubert Matthews (software junkie, surprisingly enough) ------------------------------------------------------------------------------ #include <disclaimer.h>
crowl@cs.rochester.edu (Lawrence Crowl) (05/10/88)
In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: > - virtual memory is useful only when an application won't fit in > physical memory. But memory is cheap, so with lots of Mbytes > who needs it, especially if the program is written well. Here are some counter-examples. Others can provide more. The benefit varies with your application. inter-process protection - If a process cannot address memory belonging to another, then it cannot trash it. Even when security is not an issue, correctness is. I prefer not to see errant programs trashing others. There are other approaches. copy on write - Virtual memory allows one to implement copy semantics without actually copying the data. This makes Unix fork and passing very large messages more efficient. single level store - One can manage huge amounts of data within a virtual address space without having to write file access code. The virtual address space becomes an easy file system. The operating system ensures that the portions I am actually working with are present. Does this mean my application "won't fit"? Well, if I knew I had to use real memory, I would use memory in a much more conservative manner. Since virtual memory affects my programming style, it is not a does/does not fit question. tagged addresses - Some implementers of Lisp use some bits of a large sparse address space to implement tags. If the address space were physical, it would require gigabytes of physical storage. Memory is not that cheap. -- Lawrence Crowl 716-275-9499 University of Rochester crowl@cs.rochester.edu Computer Science Department ...!{allegra,decvax,rutgers}!rochester!crowl Rochester, New York, 14627
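The single-level-store point can be made concrete with the POSIX `mmap` interface: the file becomes part of the address space, no read()/write() code is written, and the OS pages in only the portions actually touched. A minimal sketch (error handling is abbreviated; the path is supplied by the caller):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file into the address space and work on it as ordinary
   memory; the VM system stands in for explicit file access code. */
int touch_mapped_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) < 0) { close(fd); return -1; }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    memset(p, 'x', len);        /* no read()/write() calls anywhere */

    munmap(p, len);
    close(fd);
    return 0;
}
```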
grunwald@uiucdcsm.cs.uiuc.edu (05/10/88)
VM offers more than protection from errant programs. You can use it to: + do cheap heap based allocation. See the DEC-SRC report by Li and Appel (and someone else) on a heap based allocation scheme which uses page level protection to make heap allocation almost as efficient as stack based allocation. + Process migration. See Zayas (sp?) in the last SOSP. Using demand paging for process migration *even given crappy hardware* was a big win. + Cheap memory copies, less memory fragmentation, etc. + You *need* paging hardware for the software solutions to shared memory hardware. Also, I want a single-user, multi-programming machine. That means I *still* need protection -- from myself. As for caches being expensive. Well, if you presume that SRAM costs drop faster than cache controller chips, maybe. Even if you put 32Mb in a system, it's not going to help if you can only afford 32Mb of 150ns DRAM but you need 25ns SRAM to pump your system.
glennw@nsc.nsc.com (Glenn Weinberg) (05/11/88)
In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: > > To summarise all of this in a few points: > > - virtual memory is useful only when an application won't fit in > physical memory. But memory is cheap, so with lots of Mbytes > who needs it, especially if the program is written well. > > - multi-user machines are too complicated to be both fast and simple. > > - shared-memory is not necessary; it's a software issue that shouldn't > be solved in hardware. > > For example, 10 MIPS of computation with 4 MB of ECC RAM can be placed on >a single 4" x 6" Eurocard. Add multi-user support, virtual memory or multiple >CPUs and the board looks like a football pitch in comparison. Guess which is >cheaper as well! > I beg your pardon? There are VME boards available today that contain a NS32532 (a 10 MIP processor) with 64KB of cache and 4MB of memory, support Unix* System V Release 3 (which is a multi-user system, of course), virtual memory, and can be combined into a multiprocessor configuration. I do believe that a double-height VME board is slightly smaller than a "football pitch" (the actual dimensions are 6" x 9"). Furthermore, put the board into a VME chassis with a SCSI controller, a 5-1/4" hard disk, a cartridge tape and an Ethernet board and you have one hell of a system in a box that's about the size of the proverbial breadbox for less than $20,000. Sure, you can argue that supporting multi-user environments and virtual memory costs you something, but there are very, very few real-world situations in which you have no need to interact with other systems and people. You simply can't do that unless you have a system which both allows you to have that interaction and protects you from the (un)intentional dangers of the outside world. Not to mention the other benefits you get from a multi-user, multi-tasking operating system such as Unix. 
In summary, unless your system is used only as a dedicated processor that does no interaction with human beings, the advantages of a multi-user virtual memory (or at least memory-protected) environment significantly make up for any increase in cost or board space. -- Glenn Weinberg Email: glennw@nsc.nsc.com National Semiconductor Corporation Phone: (408) 721-8102
lgy@pupthy2.PRINCETON.EDU (Larry Yaffe) (05/11/88)
In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: [[ much stuff about "simple" machines deleted ]] > - virtual memory is useful only when an application won't fit in > physical memory. But memory is cheap, so with lots of Mbytes > who needs it, especially if the program is written well. I find this claim completely bogus. Especially when discussing future architectures for high performance machines (a major topic of this newsgroup). Real, worthwhile, uses of more memory than you will ever be able to afford exist in many, many areas. My view is that "memory is ALWAYS expensive". The price in $/Mb is completely irrelevant, since cheaper memory simply increases the range of interesting problems which become practical to pursue. I would include this statement as one of the "laws" of computer science. Certainly, when designing new machines/software/languages, I would argue that the goal should always be to accommodate applications larger than are practical today. (For this reason, I find "dataflow" languages hopeless - they waste too much memory.) > Hubert Matthews (software junkie, surprisingly enough) ------------------------------------------------------------------------ Laurence G. Yaffe lgy@pupthy.princeton.edu Department of Physics lgy@pucc.bitnet Princeton University ...!princeton!pupthy!lgy PO Box 708, Princeton NJ 08544 609-452-4371 or -4400
bertil@carola.uucp (Bertil Reinhammar) (05/11/88)
In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: > > - the cost of a computing system is primarily a function of size, > weight and the number of chips or pins; > From the hardware engineers point of view, yes, but not when considering a complete system including S/W. > > - to go really fast and to be efficient, the hardware should be simple; > As a matter of fact, pipelines can be faster and more efficient with added delay units which don't really simplify matters... > > So what am I trying to point out? Merely that a large amount of hardware >in present-day machines is there because of difficulties in software. ... Hmmm. >... Inside >these beasts is a *lot* of hardware to keep one user away from his fellow >hackers. An equally large amount of hardware is provided for the demand-paged >virtual memory system. Add to that a healthy(?) helping of cache chippery... > You imply that memory management hardware can securely be replaced by a good piece of S/W had we the appropriate tools ? The same comment on VM ! And do you really mean that software may provide the efficiency gained from a cache !? Either I'm pretty stupid or You must restate Your points more clearly. I don't get ANY point... > > - virtual memory is useful only when an application won't fit in > physical memory. But memory is cheap, so with lots of Mbytes > who needs it, especially if the program is written well. > And what if I have a lot of 'concurrent' processes ? I DON'T like swap time delays but disk is by far cheaper than RAM ( don't you know ? ) Also, I like to provide the entire address space to each of the running processes. This requires VM regardless of the quality of the program(mer)s. > > - multi-user machines are too complicated to be both fast and simple. > You hit right on the usual tradeoff stuff. > > - shared-memory is not necessary; it's a software issue that shouldn't > be solved in hardware. > !!!! My opinion: - A computer program is really a virtual machine. 
The real machine ( H/W ) actually kind of interprets your object code. OK, well known stuff. So how do you expect it to be more efficient to execute a number of instructions to manage memory/protection/speed/etc problems with all semaphores and such, when a piece of hardware can fix it in a few cycles ? - In general: The basic reason ( as I see it ) to have software at all is FLEXIBILITY. Special purpose hardware is ALWAYS faster than general-purpose hardware in solving the intended problem. Software is not cheap to produce ( just calculate on your own salary :-) So the real trick is TRADEOFF. We have a price/performance ratio to take care of. Just having good software tools and languages will not solve that part. -- Dept. of Electrical Engineering ...!uunet!mcvax!enea!rainier!bertil University of Linkoping, Sweden bertil@rainier.se, bertil@rainier.UUCP
hankd@pur-ee.UUCP (Hank Dietz) (05/12/88)
In article <674@cernvax.UUCP>, hjm@cernvax.UUCP (hjm) writes: > - shared-memory is not necessary; it's a software issue that shouldn't > be solved in hardware. Shared memory's full name is "shared memory address space" -- it means simply that some portion of the memory is addressable by more than one processor. In other words, it says that although memory may be physically distributed, and may have access times which depend on the physical structure as well as on bus/network traffic conditions, the WAY in which it is referenced appears as a conventional load/store on an address. The alternative is to create a MESSAGE which REQUESTS THAT SOMETHING ELSE REFERENCE the desired memory location. How do you create a message? Well, maybe a GET/PUT instruction or somesuch, but the key idea is that you're sending the message TO SOME ACTIVE ELEMENT, not to a memory address. As for which is better, because most message-passing systems only use messages to access non-local memory, one must distinguish between local and non-local references at compile-time to generate efficient code -- unfortunately, this is not always possible, and since the shared-memory model doesn't require this distinction be made at compile time, it is in some sense more powerful. The implementation difficulty usually depends on how you connect to memory: for word transfers, shared-memory is easier; for longer block transfers, messages are easier... for the obvious reasons. Shared memory DOES NOT MEAN CONSTANT ACCESS TIME independent of memory cell addressed -- if that's your "software" definition of shared-memory, forget it, because no highly-parallel machine will *ever* support that. Ok, maybe your software would still run if you wrote it with that assumption, but it ain't gonna run fast, and that is what parallel processing is all about. -hankd
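The distinction can be shown with two toy fragments; the types and names are invented for illustration. Under a shared address space a remote word is referenced like any other word, while under message passing the reference becomes a request aimed at whatever active element owns the word:

```c
#include <stdint.h>

/* Shared address space: a remote reference is just a load.
   Latency may vary with distance and traffic, but the code
   generated does not. */
uint32_t shared_read(volatile uint32_t *addr)
{
    return *addr;
}

/* Message passing: the reference is packaged as a GET request
   to be sent to the owning active element, which must reply. */
typedef struct { int op; uint64_t addr; uint32_t data; } msg;
enum { GET, PUT };

msg make_get(uint64_t remote_addr)
{
    msg m = { GET, remote_addr, 0 };
    return m;           /* ...then send m and await the reply */
}
```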
jim@belltec.UUCP (Mr. Jim's Own Logon) (05/13/88)
In article <674@cernvax.UUCP>, hjm@cernvax.UUCP (hjm) writes: > > Dear All, > > I see the thorny subjects of RISC v. CISC and scalar v. vector have reared > > - the cost of a computing system is primarily a function of size, > weight and the number of chips or pins; > > - to go really fast and to be efficient, the hardware should be simple; > > This is quite incorrect. The cost breakdown of any standard computer system is power supply, hard disk, enclosure, memory (if you have a lot), burdened assembly cost, processor, other. The actual cost difference between a Z80 and a 386 is minimal. They cost so much more because the entire system is upscaled: bigger power supply, large hard disk, etc. A 386 PC which costs $1000 to build only has $100 or so in logic (discounting the CPU and memory). None of the major cost has anything to do with multiprocessing. (I won't even waste time arguing that you can do multiprocessing on a single chip micro. It's the software that is complex for multiprocessing, not the hardware.) -Jim Wall Bell Technologies Inc. ...{ames,pyramid}!pacbell!belltec!jim
jkrueger@daitc.ARPA (Jonathan Krueger) (05/13/88)
>In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: >> - virtual memory is useful only when an application won't fit in >> physical memory. But memory is cheap, so with lots of Mbytes >> who needs it, especially if the program is written well. In article <9558@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes: >Here are some counter-examples. Others can provide more. Two more: avoiding memory fragmentation - virtual memory management provides a way for multiple processes to share the physical store, cleanly and without performance bottlenecks. New processes start and grow all the time, multiple requirements for space vary dynamically, each is satisfied efficiently to the limits of available physical memory. Even when physical memory is cheap, processor time to manage it is not. preventing unnecessary i/o - virtual memory systems need not load in an entire image, thus performing fewer disk-to-memory reads per execution, an advantage in a development cycle, among other places. Even when physical memory is cheap, i/o bandwidth to fill it with stuff copied from disk is not. -- Jon
terry@wsccs.UUCP (Every system needs one) (05/17/88)
In article <674@cernvax.UUCP>, hjm@cernvax.UUCP (Hubert Matthews) writes: > Dear All, > > I see the thorny subjects of RISC v. CISC and scalar v. vector have > reared their ugly heads again, but in a different guise - multiprocessing! > > Allow me to point out some of the ENGINEERING issues involved: > > - the cost of a computing system is primarily a function of size, > weight and the number of chips or pins; > > - to go really fast and to be efficient, the hardware should be simple; > > So what am I trying to point out? Merely that a large amount of hardware > in present-day machines is there because of difficulties in software. Let me point out that the entire purpose of hardware is to run the software. Further, difficulties in software can generally be divided into 2 parts: 1) Difficulties based in the inherent complexity of a software structure. (We will ignore this, as some structures can not be easily reduced.) 2) Difficulties caused by inadequate/bad hardware design... such as multiple instructions to cause a commonly desired result rather than a single instruction, badly implemented flags/overflow/branching/testing, etc. > For example, take the common-place example of your local UNIX or VMS box. > Inside these beasts is a *lot* of hardware to keep one user away from his > fellow hackers. Let me point out that this is a largely policy and/or design issue, NOT one of software. 1) The concept of "keeping a user away" from data is one more of philosophy, rather than necessity. Those people who refer to themselves as hackers, and those that refer to them as hackers using the word correctly, generally are not the security problem on systems, unless sensitive data needs to be protected from outside eyes as well as damage. People who damage/destroy/alter data are not known as hackers; they are known as assholes. There are many good reasons why data should be kept away from all but a select group of people; national security, to name one.
2) The requirement of "a *lot* of hardware" is a silly one which has been imposed by hardware designers refusing to attend to security issues which are better relegated to software. Instead, security is generally implemented as additional hardware not because it is necessary to do so, but because hardware designers have yet to see the merits of VLSI when applied to anything other than a CPU or its support chips; most hardware security measures could easily be implemented in a single chip or in software. > An equally large amount of hardware is provided for the demand-paged > virtual memory system. I think you are mixing your models here. A virtual memory system need not support demand paging, and a demand paging system need not imply virtual memory. If hardware designers understood what the machine was supposed to do when it was put together more clearly than is made apparent by this statement, perhaps there would not be this problem. In addition, most hardware designers use MMU chips to alleviate this problem entirely. > Add to that a healthy(?) helping of cache chippery I believe the memory cache was invented by a HARDWARE company so that their HARDWARE would appear faster than their competitors. Caching is, at times, helpful; however, it can also be a great inconvenience, especially when one is trying to design a multiprocessor system or implement memory-mapped I/O. When you are using dual-ported RAM to communicate with other hardware because the designer was unable to cause the hardware to run at a reasonable rate if the communication took place via interrupts, it is horribly inconvenient to have your I/O cached, and perhaps lost. [I realize I'm going to get a lot of flak here from people who love DMA and hate interrupts, such as the designers of message-passing operating systems, but consider this: a well designed computer system with an interrupt-based architecture can not lose data as long as you stay within system performance limits.
Synchronization of data flow in software is always subject to a number of failure modes, not the least of which is directly related to prioritization of tasks.] > and what do you get - yes, a machine built upon boards the size of a small > squash court! You work for IBM, right? ;-) > None of this hardware is simple, and applies to both the uniprocessor > and the multiprocessor case. I agree, but I have to modify this with the statement that this is only true in the case of badly designed hardware. > Now, add in the magic multiprocessor devices and all hell breaks loose on > the hardware front (not to mention the software - groan). Exactly... "groan". Software is always more complicated than hardware - that's why software takes longer. Add into this the apparent inability of hardware designers to comprehend what is necessary for software AND hardware to be smaller/faster/sooner, and you have machines which are so radically different in operational concept from what needs to be done that programmers who have to deal with the hardware are more often than not prone to mistakes. And built upon this are the users concepts of what "needs to be there"... the actual bottom line. To get from hardware to bottom line can often take 7 or more layers of software fixing or bypassing bad (or worse, ill-informed) design decisions made at the hardware level. This is the sort of atrocity that made it impossible to write decent terminal emulators on CP/M systems: the UART hardware was often capable of getting characters in excess of 19200 baud; the screen was often capable of displaying characters at rates in excess of 38400 baud. The bottoleneck was that when the screen scrolled, the hardware locked out serial interrupts, thus causing lost data from the serial channel. There were exceptions, but not many. > Everyones favourite trick seems to be finding evermore complicated ways > of getting large numbers of CPUs to talk to the memory all at once. 
Just > imagine an ever increasing number of waiters trying to get in and out of > the same kitchen all at once through one door, and you can see the mess. > OK, let's increase the number of doors ... in hardware terms this means > separating the memory into several pages which can be accessed simultaneously, > thereby increasing the effective bandwidth of the memory. Not everyones. This is only true in cases where bad implementations have occurred, in both hardware and/or software. Extreme paralellism is only useful for things which lend themselves to paralellism, and even then the only truly useful emergence from the whole paralell mess has been data- flow architectures, such as GoodYear's, for use in finite element modelling or fluid-dynamics, and things such as the Sequent/NCR/Sperry/Multiflow systems when applied to large numbers of users or online transaction processing. In these instances, it is, with very little intercommunication, like having a number of computers with shared resources, in the same box... not a bad idea, if you want to save money. I have yet to sit down at one of these machines and type "make" and have a seperate processor allocated to each compile. > Is this really admitting that shared memory is not necessary? Shared memory is *not* something to just throw at a problem. As you admitted, increased bandwidth improves performance. > Surely the highest bandwidth is achieved when each processor has its own > memory which it shares with noone else? It also makes the hardware a lot > smaller. Absolutely not. Think of an infinite memory plane with processors "crawling" over it, doing what needs to be done... somewhat like many small spiders cooperating to perfect a web which none could complete by themselves. This, I think, is a representation of the ideal dataflow machine. It might even be useful, if we could figure out how to talk to it. > - virtual memory is useful only when an application won't fit in > physical memory. 
But memory is cheap, so with lots of Mbytes > who needs it, especially if the program is written well. Memory is no longer cheap. > - multi-user machines are too complicated to be both fast and simple. Due to inappropriate hardware implementation, yes. > - shared-memory is not necessary; it's a software issue that shouldn't > be solved in hardware. How do you propose it be resolved in software, given hardware "protection"? This is an inane idea, and perpetuates the major problem with breaking software down into a true field of engineering. Given the complexity of software as compared to hardware, it may be impossible to derive a formula method of producing software; perhaps it will always be an art. But there are things which could be done in hardware to make it easier and less arcane than it currently is. The current problem is one of hardware not resembling the soloution which is the goal. | Terry Lambert UUCP: ...{ decvax, ihnp4 } ...utah-cs!century!terry | | @ Century Software OR: ...utah-cs!uplherc!sp7040!obie!wsccs!terry | | SLC, Utah | | These opinions are not my companies, but if you find them | | useful, send a $20.00 donation to Brisbane Australia... | | 'Admit it! You're just harrasing me because of the quote in my signature!' |
jack@cwi.nl (Jack Jansen) (05/19/88)
In article <53@daitc.ARPA> jkrueger@daitc.UUCP (Jonathan Krueger) writes:
[Giving reasons for virtual memory]
>avoiding memory fragmentation - virtual memory management provides a
>way for multiple processes to share the physical store, cleanly and
>without performance bottlenecks.

Not providing virtual memory doesn't mean that segments in memory need to be contiguous.  There might be a point for copy-on-write or zero-fill-on-demand, but I'm not sure that these alone are worth the trouble.

>preventing unnecessary i/o - virtual memory systems need not load in
>an entire image, thus performing fewer disk-to-memory reads per
>execution, an advantage in a development cycle, among other places.

Again, given heaps of memory and fast communication this doesn't matter any more.  If my file server can keep a copy of emacs in its cache and download it at 700Kb/s, I'm up and running in a second.  Quite acceptable, I think.

Of course, there will always be applications that *do* benefit from virtual memory, but I guess that within a reasonable time VM will fall into the same class as vector processors and other esoteric features that can be found on specialized machines, not on everyday workstations.
--
Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
The shell is my oyster.