mcdonald@Daisy.EE.UND.AC.ZA (Bruce J McDonald) (05/15/91)
Variant Instruction Set Computers - VISC
----------------------------------------

A way to speed up future Motorola CISC MPUs could be: widen the data bus to 64 bits and make the internal data paths and ALU width all 64 bits. Introduce an exclusive mode-switch instruction which would switch in an enhanced, RISC-like micro-engine, with a totally new instruction set geared for 64-bit operations. Superset the existing 16 x 32-bit register file up to an n x 64-bit register file, so that the new 64-bit mode could access the old-style register file as part of the new, larger register file. Access to the FPU, cache and MMU (and any other functional units) would be maintained transparently, as would the same pipeline stages (this would be harder to do ...) as the old CISC core.

Notice that the RISC-like enhancements to the CISC core should be dropped and the CISC core kept for downward compatibility only - all speedy execution should be handled by the RISC core. This opens up the interesting option of, say, implementing a SPARC RISC core, or an HP-PA core, which would mean that an existing 680x0 product would run HP-PA code on executing the mode-switch instruction.

This would mean that new compilers would have to be written which would be able to switch the MPU into the new mode for enhanced performance. I would think that this means an additional CCR bit, but since there are slots available, it should be no problem.

What I do not like about this scheme is that it resorts to kludging in the same fashion that Intel used to upgrade their 8080 to the 80486, by adding bits and pieces which destroyed the orthogonality of the original design (except that the 8080 wasn't a great design). The mode switch should be possible without having to reset or destroy data in the CPU, as opposed to the real <-> protected mode switch horrors of the 80x86's.

Comments please ... (flames to /dev/null)

BJ McDonald, University of Natal, Durban, King George V Ave, South Africa.
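[Editor's sketch: the mode-switch idea above can be made concrete as a toy model. This is not any real Motorola design — the opcode names, the MODE_SWITCH encoding, and the register-file aliasing are invented purely for illustration of a single status-register bit selecting between two decoders over shared register state.]

```python
# Toy model of the proposed VISC mode switch: one hypothetical CCR bit
# selects which decoder interprets opcodes; register state is shared, so
# switching modes destroys no data (unlike the 80x86 real/protected switch).

CISC_MODE, RISC_MODE = 0, 1

class ToyVISC:
    def __init__(self):
        self.mode = CISC_MODE      # the hypothetical extra CCR bit
        self.regs = [0] * 32       # widened n x 64-bit file; the low
                                   # registers alias the old 16 x 32-bit file

    def execute(self, opcode, *ops):
        if opcode == "MODE_SWITCH":    # flip cores without a reset
            self.mode ^= 1
            return
        if self.mode == CISC_MODE:
            self.decode_cisc(opcode, ops)
        else:
            self.decode_risc(opcode, ops)

    def decode_cisc(self, opcode, ops):
        # old-style decode; writes land in the shared register file
        if opcode == "MOVE":
            self.regs[ops[0]] = ops[1]

    def decode_risc(self, opcode, ops):
        # new three-operand 64-bit decode over the same register file
        if opcode == "ADD":
            self.regs[ops[0]] = self.regs[ops[1]] + self.regs[ops[2]]

cpu = ToyVISC()
cpu.execute("MOVE", 1, 40)     # CISC mode writes a register
cpu.execute("MODE_SWITCH")     # flip the CCR bit; no state is lost
cpu.execute("ADD", 0, 1, 1)    # RISC mode still sees the same registers
```

The point the sketch makes is Bruce's requirement in the last paragraph: the switch is just a bit flip, so nothing in the register file needs to be saved or reset across it.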
sef@kithrup.COM (Sean Eric Fagan) (05/16/91)
In article <1991May15.110000.25800@Daisy.EE.UND.AC.ZA> mcdonald@Daisy.EE.UND.AC.ZA (Bruce J McDonald) writes:
>Variant Instruction Set Computers - VISC
>
>ALU width all 64 bits. Introduce an exclusive mode switch instruction
>which would switch in an enhanced, RISC-like micro-engine, with totally
>new instruction set geared for 64bit operations.

Uhm, it would probably be better to devote all that chip space to the "RISC" processor and ship a software emulator. If one puts in enough cache on the chip, one might even be able to make the entire emulator fit in cache (see Henry Spencer).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
martin@adpplz.UUCP (Martin Golding) (05/17/91)
In <1991May15.110000.25800@Daisy.EE.UND.AC.ZA> mcdonald@Daisy.EE.UND.AC.ZA (Bruce J McDonald) writes:
>Variant Instruction Set Computers - VISC
>----------------------------------------

[description of hypothetical future machine with 64 bit risc and 68xxx modes]

Mode switching is Vax. Extending the instruction set is Eagle. Plus ca change, plus ce la meme chose (My French is as good as my c).

For the MOST exciting variable instruction set, consider the Burroughs B1700, with a byte width of 1, a variable word width up to 24, and microcode swapping to adapt the instruction set to the program that was running. (PROOF that Cobol isn't a real language: on the B1700, Cobol and RPG ran on the _same virtual machine_. Peugh.)

And while we're busy making current computers into old-fashioned computers: don't forget the Cyber trick of running multiple tasks on multiple memory buses with a single CPU (adapt cheap memory to fast RISC chips), or the IBM fancy that decoded 360 instructions for a dataflow processor (for some instruction sequences, dataflow beats scoreboard).

Martin Golding    | sync, sync, sync, sank ... sunk:
Dod #0236         | He who steals my code steals trash.

A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp
plinio@turing.seas.ucla.edu (Plinio Barbeito) (05/17/91)
In article <1991May15.110000.25800@Daisy.EE.UND.AC.ZA> mcdonald@Daisy.EE.UND.AC.ZA (Bruce J McDonald) writes:
>A way to speed up future Motorola CISC MPUs could be:
> ...
>core. Notice that the RISC-like enhancements to the CISC core should
>be dropped and the CISC core kept for downward compatibility only - all
>speedy execution should be handled by the RISC core.

Does having a RISC core in itself guarantee fast execution? I thought the reason RISC was fast was the great amount of space it freed up on the chip that could be used to speed up basic operations.

I think it would be more in line with RISC philosophy to rip out as much of the CISC core as possible, leaving close to the bare minimum of what is needed to emulate via software traps those addressing modes that would be deleted (most) and those instructions that would be deleted (anything that compilers are staying away from, up to the neck of the curve). This is how many FPU ops in the 68040 are implemented, and the strategy seems to have been successful, if SPEC numbers are worth their salt. Keeping the old CISC core would hog chip real estate that could be better applied to speeding up other things, like the ubiquitous 'move's, IMHO.

As to whether they should go load/store, I think it would be less of a compatibility-kludge nightmare to keep the move's but enlarge the cache to outweigh whatever benefits that approach might have had. Besides, I've always savored the fact that 68k programs have been consistently and significantly smaller than many equivalent RISC binaries. This helps load-time performance if processes are I/O bound, or use slow I/O devices. It also saves space on mass-storage devices, but I haven't seen these issues dealt with extensively in this group. Apparently, everyone buys enough RAM so that they never page fault :-)

Regarding 64 bits, in my opinion, it depends. In the short term it looks like it would disproportionately raise costs and complicate compatibility issues. However, if memory speeds continue to lag behind CPU speeds, it may eventually become the only feasible solution. Comments? Is everyone jumping on the 64-bit bandwagon?

>This would mean that new compilers would have to be written which would
>be able to switch the MPU into the new mode for enhanced performance. I
>would think that this mean a addition CCR bit but since there are slots
>available, it should be no problem.

The advantage of doing it the other way is that you don't have to write any new software except maybe a new optimizer for your compilers.

Interesting side-note: since it would be cheap to add new instructions via traps, how about putting in an opcode to speed up string comparisons, to deny Intel the ability to claim higher performance in any benchmark category? (Flamesuit on)

plin
-- 
To mak wridin mo eficiend, i sujes de folouin janjs: drop deleder 'c',
as 'k' uil do jus fin. gt rid of endn 'e', sins ids nevr pronncd aniuai.
als, 't' is nevr nedd; us 'd'. repeddv knsnnds shd b
nls bpp ngbbl rr...01011101
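[Editor's sketch: the trap-based emulation Plinio describes — a shrunken core that raises an illegal-instruction trap for deleted instructions, with a software handler finishing the job — can be modelled in a few lines. The opcode names and handler table are invented for illustration; they are not the actual 68040 mechanism.]

```python
# Sketch of trap-based emulation of deleted instructions: the core
# implements only the common fast cases; anything else raises an
# illegal-instruction trap, and a software handler emulates it.

class IllegalInstruction(Exception):
    pass

# the shrunken core implements only the common, fast cases in "hardware"
CORE_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
}

# rarely-used instructions live in a software emulation package instead
TRAP_HANDLERS = {
    "MULU": lambda a, b: a * b,    # emulated: slower, but still correct
}

def execute(op, a, b):
    try:
        if op not in CORE_OPS:
            raise IllegalInstruction(op)
        return CORE_OPS[op](a, b)
    except IllegalInstruction:
        # in real hardware the O/S trap vector would dispatch here
        return TRAP_HANDLERS[op](a, b)

print(execute("ADD", 6, 7))    # fast path in the core: 13
print(execute("MULU", 6, 7))   # trapped and emulated in software: 42
```

Old binaries keep working unchanged; they just pay the trap overhead on the instructions the designers decided compilers were "staying away from".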
preston@ariel.rice.edu (Preston Briggs) (05/18/91)
plinio@turing.seas.ucla.edu (Plinio Barbeito) writes:
>Does having a RISC core in itself guarantee fast execution? I thought
>the reason RISC was fast was the great amount of space it freed up on
>the chip that could be used to speed up basic operations.
>
>I think it would be more in line with RISC philosophy to rip out as
>much of the CISC core as possible, leaving close to the bare minimum of
>what is needed to emulate via software traps those addressing modes that
>would be deleted (most) and those instructions that would be deleted
>(anything that compilers are staying away from, up to the neck of the
>curve).

I think an important part of the "risc philosophy" is to expose low-level operations to the compiler. If you bundle them up into cisc-like globs, the optimizer loses many opportunities. Emulation should probably be restricted to maintaining object compatibility, with the understanding that recompilation is always preferable, in terms of performance.

Preston Briggs
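[Editor's sketch: Preston's point about globs hiding opportunities from the optimizer can be shown concretely. The three-address IR and the redundancy-elimination pass below are invented for illustration; the idea is that two memory-to-memory "ADD M[y], M[x]" style globs each implicitly reload M[x], while the decomposed load/store form makes the second load visibly redundant.]

```python
# Two CISC-style globs both implicitly load M[x]; the optimizer can't see it:
#   ADD M[y], M[x]      ; M[y] += M[x]
#   ADD M[z], M[x]      ; M[z] += M[x]
# Exposed as RISC-style IR, the second load of x is visibly redundant:

code = [
    ("load",  "r1", "x"),
    ("load",  "r2", "y"),
    ("add",   "r2", "r2", "r1"),
    ("store", "y",  "r2"),
    ("load",  "r3", "x"),          # redundant: x is still live in r1
    ("load",  "r4", "z"),
    ("add",   "r4", "r4", "r3"),
    ("store", "z",  "r4"),
]

def eliminate_redundant_loads(code):
    """Forward pass: reuse a register that already holds a loaded address."""
    available, out = {}, []        # maps address -> register holding it
    for instr in code:
        if instr[0] == "load" and instr[2] in available:
            # replace the memory access with a cheap register copy
            out.append(("copy", instr[1], available[instr[2]]))
        else:
            if instr[0] == "load":
                available[instr[2]] = instr[1]
            if instr[0] == "store":
                available.pop(instr[1], None)   # memory changed; invalidate
            out.append(instr)
    return out

optimized = eliminate_redundant_loads(code)
# the second load of x has become a register copy; one memory access saved
```

The CISC glob form gives the optimizer nothing to work with: the reload of M[x] is buried inside the instruction's semantics.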
torek@elf.ee.lbl.gov (Chris Torek) (05/22/91)
[Context: someone suggested adding/deleting/changing 680x0 instructions for newer 680x0s, with a compatibility mode in the status register or some such.]

In article <1991May15.183328.22820@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>Uhm, it would probably be better to devote all that chip space to the "RISC"
>processor and ship a software emulator. ...

Why not just build a multiprocessor system with completely different processors? I.e., ship a system that contains, say, one 68040 and one or more 88x00s. There is no particular reason that the O/S cannot run the proper binary on the proper CPU automatically.

Of course, this takes more board space unless the 68040 and 88100 are in the same package (and if that is the case you might have pin problems).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain: torek@ee.lbl.gov
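[Editor's sketch: Chris's "run the proper binary on the proper CPU" needs only an architecture tag in the binary's header and a per-architecture run queue in the O/S. The header format, CPU names, and queueing policy below are invented for illustration.]

```python
# Sketch of exec-time dispatch in a heterogeneous MP: read the binary's
# architecture tag and queue the process on the least-loaded matching CPU.

from collections import defaultdict

CPUS = {"m68k": ["68040#0"], "m88k": ["88100#0", "88100#1"]}
run_queues = defaultdict(list)

def exec_binary(name, header):
    arch = header["arch"]                # e.g. stamped by the linker
    if arch not in CPUS:
        raise OSError("no CPU can run %s binaries" % arch)
    # pick the least-loaded CPU of the right architecture
    cpu = min(CPUS[arch], key=lambda c: len(run_queues[c]))
    run_queues[cpu].append(name)
    return cpu

exec_binary("oldapp",  {"arch": "m68k"})   # legacy binary -> the 68040
exec_binary("newapp",  {"arch": "m88k"})   # recompiled binary -> an 88100
exec_binary("newapp2", {"arch": "m88k"})   # balanced onto the other 88100
```

The scheduling decision is trivial once the tag exists; the hard parts Chris names (board space, packaging, pins) are all in the hardware.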
plinio@turing.seas.ucla.edu (Plinio Barbeito) (05/23/91)
In article <1991May23.210000.8152@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>In article <13445@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>>Why not just build a multiprocessor system with completely different
>>processors? I.e., ship a system that contains, say, one 68040 and one
>>or more 88x00s. There is no particular reason that the O/S cannot run
>>the proper binary on the proper CPU automatically.

Interesting ... this could lead to a more intelligent distributed process server: one that not only takes into consideration which CPU is the least loaded when exec'ing new processes, but also how fast the given code will run on that system or CPU. For example, in such a system, the O/S could use the 040 to move blocks of memory around and do floating point, the 486 to do the strcmp's, the RIOS to do the published Dhrystone benchmark, :-) and the i860 to provide an excuse for the frequent down-time and delivery delays :-) :-) (I went a bit overboard with that one ...)

Maybe not just binaries, but different library functions could be assigned to different processors. I can't help but think that there would be *something* a CISC processor would be consistently better at than its RISC contemporary. Based on previous experience in this group, Somebody Has Already Thought Of This. If it's true that SHATOT, then please share it with us.

>Because then you wouldn't get any speedup on your old programs. I guess.
>Historically, such ventures have not done too well. (Anyone remember the
>machine, many years ago, that had a 68k, a 6502, a Z80, and possibly one or
>two other processors? Dimension, mayhap? Anyway, it failed.)

Yes, but how many Apple II's (ages hence) didn't have a Z80 card so people could run CP/M?

Also, Intel's 'vision of the future', as they have explained it, is that the i860 family is not intended to compete with or replace the 80x86 line for desktop systems, but that it will accompany (or so they would hope) most of them as a card to "speed up operations". Then again, maybe they don't really think this would fly and are just using it as a ploy to calm investors paranoid of self-competition hurting immediate maximal profits.

Possibly, the strategy of combining different processors would meet more consistent success if each processor were *needed* there to run a given operating system (or other suitable, presently existing investment in a body of software), and if the parts were cheap enough, and the signals compatible enough (the latter of which is why I think Chris must have mentioned the 040 together with the 88k's). A nice example of this might be a Mac server, with the 040 running Mac OS and the other processors running an unhindered version of Unix, serving files out, etc.

plin
-- 
----- ---- --- -- ------ ---- --- -- - - - plinio@seas.ucla.edu
Para-noia will destroy-yaaaaa...
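[Editor's sketch: Plinio's "more intelligent distributed process server" amounts to a cost model combining current load with per-CPU relative speed on the kind of code being exec'd. The speed table and workload numbers below are invented for illustration.]

```python
# Sketch of load-AND-speed-aware placement: pick the CPU minimizing the
# estimated completion time for this particular kind of code.

# relative throughput of each CPU on each kind of code (bigger = faster)
speed = {
    ("68040", "block_move"): 2.0, ("68040", "float"): 1.0,
    ("i486",  "block_move"): 1.0, ("i486",  "float"): 0.5,
}
load = {"68040": 3.0, "i486": 0.5}    # queued work, in arbitrary units

def place(kind, cost=1.0):
    """Choose the CPU with the smallest estimated finish time."""
    def finish_time(cpu):
        return load[cpu] + cost / speed[(cpu, kind)]
    best = min(load, key=finish_time)
    load[best] += cost / speed[(best, kind)]
    return best

assignment = place("float")
# the lightly loaded i486 wins here even though it is slower at floats:
# 68040 would finish at 3.0 + 1.0 = 4.0, the i486 at 0.5 + 2.0 = 2.5
```

A pure least-loaded scheduler ignores the speed table entirely; this is the extra consideration Plinio is asking for.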
gdtltr@brahms.udel.edu (gdtltr@limbo.org (The Befuddled One)) (05/23/91)
In article <13445@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
=>Why not just build a multiprocessor system with completely different
=>processors? I.e., ship a system that contains, say, one 68040 and one
=>or more 88x00s. There is no particular reason that the O/S cannot run
=>the proper binary on the proper CPU automatically.

There was a paper on something like this in Operating Systems Review about a year ago. The system was called AAMP (I think) and was written by someone from Sequent. The system arranged resource management in a hierarchical structure, with the root actually managing memory and I/O. Client operating systems perform basic I/O functions by passing messages to their immediate server, and on up the tree. A server has access to the memory spaces of its clients, which facilitates message passing and allows for easy debugging of client operating systems. The example system was a Sequent Symmetry running several copies of a modified Dynix, but the paper made its point that a heterogeneous, multi-OS multiprocessor is possible.

Gary Duzan
Time Lord
Third Regeneration
-- 
gdtltr@brahms.udel.edu   _o_ ---------------------- _o_
[|o o|] To be is to be networked. [|o o|]
|_o_| Disclaimer: I have no idea what I am talking about. |_o_|
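[Editor's sketch: the hierarchical arrangement Gary describes — client operating systems satisfying I/O by passing requests to their immediate server, on up the tree, until the node that actually owns the resource handles it — can be modelled as a simple chain of nodes. Names and the request format are invented for illustration; this is not the actual AAMP interface.]

```python
# Sketch of hierarchical resource management: each client O/S forwards
# I/O requests to its immediate server; the root really owns the devices.

class OSNode:
    def __init__(self, name, server=None, devices=()):
        self.name, self.server = name, server
        self.devices = set(devices)    # resources this node really manages

    def request_io(self, device, data):
        if device in self.devices:
            return "%s handled %r on %s" % (self.name, data, device)
        if self.server is None:
            raise IOError("no manager for device %s" % device)
        return self.server.request_io(device, data)   # pass it up the tree

root   = OSNode("root", devices={"disk0", "tty0"})  # manages memory and I/O
server = OSNode("dynix1", server=root)              # client of root
client = OSNode("dynix2", server=server)            # client of a client

result = client.request_io("disk0", "write block 7")
```

The tree shape also gives each server visibility into its clients, which is where the easy-debugging property Gary mentions comes from.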
peter@ficc.ferranti.com (peter da silva) (05/23/91)
In article <1991May23.210000.8152@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <13445@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
> >There is no particular reason that the O/S [on a heterogeneous MP] cannot run
> >the proper binary on the proper CPU automatically.
> Because then you wouldn't get any speedup on your old programs. I guess.

If your older programs are *the* critical thing to speed up, then that's a problem. But in that case you're unlikely to abandon the 68000, 8086, VAX, or whatever family anyway.

For a concrete example ... if I could get an 88000 in my Amiga, and I just had to recompile one of the PD raytracers to get it to use it, it might well be worthwhile. Especially with the fine-grained multitasking the Amiga supports.

On another point, heterogeneous networks that operate this way have existed for some time.
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
sef@kithrup.COM (Sean Eric Fagan) (05/24/91)
In article <13445@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>Why not just build a multiprocessor system with completely different
>processors? I.e., ship a system that contains, say, one 68040 and one
>or more 88x00s. There is no particular reason that the O/S cannot run
>the proper binary on the proper CPU automatically.

Because then you wouldn't get any speedup on your old programs. I guess.

Historically, such ventures have not done too well. (Anyone remember the machine, many years ago, that had a 68k, a 6502, a Z80, and possibly one or two other processors? Dimension, mayhap? Anyway, it failed.)
-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
aduane@urbana.mcd.mot.com (Andrew Duane) (05/29/91)
In article <21621@brahms.udel.edu> gdtltr@brahms.udel.edu (gdtltr@limbo.org (The Befuddled One)) writes:
>In article <13445@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>=>Why not just build a multiprocessor system with completely different
>=>processors? I.e., ship a system that contains, say, one 68040 and one
>=>or more 88x00s. There is no particular reason that the O/S cannot run
>=>the proper binary on the proper CPU automatically.
>
> There was a paper on something like this in Operating Systems Review
>about a year ago. The system was called AAMP (I think) and was written
>by someone from Sequent.

Perhaps this is the XA/MP architecture from Intel? I worked on some whiteboard-type research to make a project proposal based on this architecture last year. It was a combination of the 80x86 (where 'x' probably == 4) and the i860. Our instance of this would have run Mach or something like it.

We looked at several problems with a heterogeneous architecture, and (as long as byte order was the same between CPUs) the actual selection of a processor to run a thread on was pretty simple. We even figured out how to induce the compiler to emit both flavors of object code, and let the exec facility select the right one.

Andrew L. Duane (JOT-7)			w:(408)366-4935
Motorola Microcomputer Design Center	decvax!cg-atla!samsung!duane
10700 N. De Anza Boulevard		uunet/
Cupertino, CA  95014			duane@samsung.com

Only my cat shares my opinions, and she's too heavy to care.
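[Editor's sketch: the "both flavors of object code" scheme Andrew describes is essentially a fat binary with exec-time selection. The file layout below is invented for illustration; the placeholder byte strings are not real machine code.]

```python
# Sketch of a fat binary: the compiler emits one image per architecture
# into a single file, and exec picks the one matching the chosen CPU.

FAT_BINARY = {            # one program, two architecture-specific images
    "i486": b"\x55\x89\xe5",    # placeholder bytes standing in for code
    "i860": b"\xd0\x04\x30",
}

def select_image(fat, cpu_arch):
    """Exec-time selection of the right flavor for the chosen CPU."""
    try:
        return fat[cpu_arch]
    except KeyError:
        raise OSError("binary has no image for %s" % cpu_arch)

image = select_image(FAT_BINARY, "i860")
```

Combined with a thread scheduler, this is what makes per-thread processor selection "pretty simple": whichever CPU the thread lands on, a matching image exists.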
mac@gold.kpc.com (Mike McNamara) (05/29/91)
I've worked on two different commercially unsuccessful heterogeneous processor machines. I don't think that I am the common thread of failure ;-)

The first was the Cydrome Cydra 5, which had a 50/25 MFLOP ECL supercomputer wedded to a symmetric six-68020 CPU system running our MP'ized System V R3.2. This machine was introduced in 1987.

The second was the Stardent Stiletto, which had two MIPS R3000s and four Intel i860 processors. Each R3000 had a tightly coupled i860 which was connected as a vector processor, via our implementation of the canonical MIPS write-buffer chip. This is the same model we used to build the Ardent Titan super graphics workstations, although we implemented the vector unit with gate arrays and floating point cores. The other two i860s were on a separate graphics board, with associated pixel processing power.

One common problem with both machines is that the benchmark results did not justify the cost of the machine. This isn't a problem with the design of the machine, per se: instead, it is just hard to write a benchmark that could effectively use all the power available on a heterogeneous machine. SPEC ran on Stiletto just as well as it would run on any machine with two 33MHz R3000/R3010 CPUs. SPEC didn't "notice" that 3D graphics could go on concurrently with no performance loss. [Mashey: add a gSPEC to the mix: you must rotate a teapot while gcc'ing ;-]

Another common problem with both machines is twice the technology-risk exposure. At Cydrome, the ECL supercomputer was ready, and demonstrated at a trade show, one year before the general-purpose 68020 system was ready. At Ardent/Stardent, the dual-R3000 system was ready 10 months before all the bugs were ironed out of the i860 chips and our software.

> Based on previous experience in this group, Somebody Has Already Thought
> Of This. If it's true that SHATOT, then please share it with us.

So, yes, Virginia, SHATOT (twice), and it did not succeed. Of course that by no means indicates that it is not a viable proposition. However, there are bones littered on the side of the trail ...

-mac
-- 
+-----------+-----------------------------------------------------------------+
|mac@kpc.com| Increasing Software complexity lets us sell Mainframes as       |
|           | personal computers. Carry on, X windows/Postscript/emacs/CASE!! |
+-----------+-----------------------------------------------------------------+
martin@adpplz.UUCP (Martin Golding) (05/31/91)
In <MAC.91May29095345@gold.kpc.com> mac@gold.kpc.com (Mike McNamara) writes:
> I've worked on two different commercially unsucessful
>heterogeneous processor machines. I don't think that I am the common
>thread of failure ;-)

[description of two machines canceled due to lack of interest]

> One common problem with both machines is that the benchmark
>results did not justify the cost of the machine.

AHA. I just realised that I have useful information to contribute here. Here at ADP we built a _successful_ dual-processor system; it ran a (disk-based) RT11-like operating system and (virtual-memory-based) Reality, simultaneously. For about 4 years they were our most popular single model.

Our machine sold because we had just oodles of software for _each_ processor, and the cost and risk of rewriting was worse than the engineering for the computer. (Two systems types and a hardware engineer for the computer; 200 programmers working 5 years for the software.) If the stuff had _all_ been in _one_ (preferably popular) language, there wouldn't have been any point.

Moral: it's only worth building the dual-processor machine if you _already_ have software that _can't otherwise_ be ported. See the interesting DOS and CP/M add-ins for all kinds of interesting computers. Note also that binary converters and interpreters are gaining strength.

Drawback: interesting software is currently produced for DOS and Unix. DOS machines are DOS machines, eh? And Unix computers sell based on how many interesting packages get ported. So the demand for a RISC computer with a 68xxx binary coprocessor is probably not worth the engineering. Besides, we get better stuff for our 88k's than for our 68k's; I think that the software-porting types have a personal fondness for the RISC systems, which biases the results.

All of this has strayed very wide of RISC vs CISC long-term performance, which is heavily language dependent; unless someone wants to make vectorising compilers that extract string functions from C.

Martin Golding    | sync, sync, sync, sank ... sunk:
Dod #0236         | He who steals my code steals trash.

A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp