smryan@garth.UUCP (Steven Ryan) (06/29/88)
In article <7570@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes: >In article <12171@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: > > It came up with a four-68000-instruction `signum' function, > > and its versions of min and max (and minmax together) are also quite > > surprising. None of them have any branches. >(What is signum?) >Not so surprising for min and max (and minmax). However, they require a >sign extending shift, so that implementation is not possible on all >machines. Also, if a branch takes only one cycle (with delay slot), >you do not gain anything (in general). The article was interesting, but the technique does not appear ready for compilers. In the article, the optimiser was used as standalone program that was fed small pieces of code. Branches tend to be deadly to fast machines. Delay slots or no, it can still give the cache/instruction stack indigestion. By the way, as we continue forward into the past, the Cyber 170 macros are filled with handcoded sequences to avoid branches.
chuck@amdahl.uts.amdahl.com (Charles Simmons) (06/30/88)
In article <834@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes: >The article was interesting, but the technique does not appear ready for >compilers. In the article, the optimiser was used as standalone program that >was fed small pieces of code. yeah... The authors basically suggested using the results of the standalone program to do things similar to peephole optimization, or writing code sequences for the GNU backend, or providing help in figuring out cute instruction sequences to use in your innermost handcoded loop. >Branches tend to be deadly to fast machines. Delay slots or no, it can still >give the cache/instruction stack indigestion. Gosh! You guys aren't thinking big enough. How about multiple parallel pipelines to compute all the various instruction threads in parallel and just keep the results of the one that is actually taken? Takes Big Bucks? Sure. But when you're using gallium arsenide, who cares?
aglew@urbsdc.UUCP (07/01/88)
> >Branches tend to be deadly to fast machines. Delay slots or no, it can still > >give the cache/instruction stack indigestion. > >Gosh! You guys aren't thinking big enough. How about multiple >parallel pipelines to compute all the various instruction threads >in parallel and just keep the results of the one that is actually >taken? Takes Big Bucks? Sure. But when you're using gallium >arsenide, who cares? The IBM 360 model 97 (I always get the model number wrong; anyway, one of the early 360s) did this.
smryan@garth.UUCP (07/01/88)
> >Branches tend to be deadly to fast machines. Delay slots or no, it can still > >give the cache/instruction stack indigestion. > >Gosh! You guys aren't thinking big enough. How about multiple >parallel pipelines to compute all the various instruction threads >in parallel and just keep the results of the one that is actually >taken? Actually, it is a 170 (aka 6600) and 205 coding technique (read: trick). Given a conditional expression like if p then a else b if a and b are side-effect and fault-free, define a mask mask(p) = all ones if p true all zeros if p false then the above conditional is encoded as mask(p) and a or not mask(p) and b For example, the following 170 sequence folds Ascii 8/12 control characters into blanks: MX0 0 +0. SX7 40B 8/12 space character code. IX6 X5-X7 X5=original character. IX6 X6+X0 get rid of -0. If control, X5<X7, X5-X7<0, so sign set. AX6 59 all ones if X5 is a control character, otherwise zeros. BX5 -X6*X5 zero if control character, otherwise original character. BX6 X6*X7 space character if control, otherwise zero. BX5 X5+X6 X5 is a control folded to space or a printing character. If I hadn't run out of registers (this is a macro), I could have interleaved the next sequence which folds lower to upper case, so that it would have four parallel streams for two conditionals. We guys do think big. Cheaply.
mash@mips.COM (John Mashey) (07/02/88)
In article <28200172@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: > >> >Branches tend to be deadly to fast machines. Delay slots or no, it can still >> >give the cache/instruction stack indigestion. >>Gosh! You guys aren't thinking big enough. How about multiple >>parallel pipelines to compute all the various instruction threads >>in parallel and just keep the results of the one that is actually >>taken? Takes Big Bucks? Sure. But when you're using gallium >>arsenide, who cares? >The IBM 360 model 97 (I always get the model number wrong; >anyway, one of the early 360s) did this. 360/91, of which maybe (on the order of) 20 were built; there was also a 360/95 that was a little faster. It is instructive to examine this, compared with, for example, the 360/85, which was built at aboutthe same time (late 60s). The complexity of the 360/91 occurred because the CPU was too much faster than the memories. The 360/85 used a cache instead, was more cost-effective. Certainly the later machines derive more from its ideas than from the 91. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
baum@Apple.COM (Allen J. Baum) (07/02/88)
[] >In article <28200172@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: > >>>Branches tend to be deadly to fast machines. Delay slots or no, it can still >>>give the cache/instruction stack indigestion. >> >>Gosh! You guys aren't thinking big enough. How about multiple >>parallel pipelines to compute all the various instruction threads >>in parallel and just keep the results of the one that is actually >>taken? Takes Big Bucks? Sure. But when you're using gallium >>arsenide, who cares? > >The IBM 360 model 97 (I always get the model number wrong; >anyway, one of the early 360s) did this. I don't believe that any model of IBM 360 or 370 (including /91, /92, or /195) ever executed multiple threads and threw away the unused one. The did have a high degree of overlap, and out-of-order execution, but that is all (and was enough). The only rumors I've heard about multiple threading machines from IBM was a machine that was never fully built, killed by the promise of the 360/91, which (I believe) John Cocke worked on, called the ACS. I've heard rumors of all kinds of features that this machine was supposed to have, some of which were pretty hairy. If only it had ben 360 compatible, it may have seen the light of day. -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
mat@amdahl.uts.amdahl.com (Mike Taylor) (07/02/88)
> > I don't believe that any model of IBM 360 or 370 (including /91, /92, or /195) > ever executed multiple threads and threw away the unused one. As I recall, the 3033 did this ( and I think maybe the 168 too, but I'm not sure about that), in a limited sense. The 3033 was the "tiger team" response to Amdahl's 470 V/6. It had a three(two?)-pipe IPPF (Instruction Pre-Processing Facility) that got three(two?) instruction streams ready for processing - i.e. operands checked and ready to go, etc. However, I don't think any machine has had multiple E-units, as you point out. -- Mike Taylor ...!{hplabs,amdcad,sun}!amdahl!mat [ This may not reflect my opinion, let alone anyone else's. ]
pl@tut.fi (Pertti Lehtinen) (07/04/88)
From article <834@garth.UUCP>, by smryan@garth.UUCP (Steven Ryan): > > Branches tend to be deadly to fast machines. Delay slots or no, it can still > give the cache/instruction stack indigestion. > > By the way, as we continue forward into the past, the Cyber 170 macros are > filled with handcoded sequences to avoid branches. > When avoiding branches, interesting technique is adopted in ARM-processor (Acorn Risc Machine). All instructions are conditional, which makes it possible to avoid short branches very easily. -- pl@tut.fi ! All opinions expressed above Pertti Lehtinen ! are preliminary and in subject N 61 26' E 23 50' ! to change without any further notice.
urjlew@ecsvax.uncecs.edu (Rostyk Lewyckyj) (07/05/88)
In article <13157@apple.Apple.COM>, baum@Apple.COM (Allen J. Baum) writes: > >>Gosh! You guys aren't thinking big enough. How about multiple > >>parallel pipelines to compute all the various instruction threads > >>in parallel and just keep the results of the one that is actually > >>taken? > > > >The IBM 360 model 97 (I always get the model number wrong; > >anyway, one of the early 360s) did this. > > I don't believe that any model of IBM 360 or 370 (including /91, /92, or /195) > ever executed multiple threads and threw away the unused one. The did have If I remember correctly, the machine would decode the addresses of the instructions on both sides of the branch, and then decode the addresses of their operands, and begin fetching these. As soon as the decision on which side of the branch would be taken was decided the machine would abort, throw away, the fetching for that thread. No actual instruction executions were done in parallel. i.e. if one side of the branch was a multiply and the other an add, neither the multiplication nor the addition would be started, just the operands would be sent for if possible. ----------------------------------------------- Reply-To: Rostyslaw Jarema Lewyckyj urjlew@ecsvax.UUCP , urjlew@tucc.bitnet or urjlew@tucc.tucc.edu (ARPA,SURA,NSF etc. internet) tel. (919)-962-9107
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/05/88)
In article <28200172@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: | The IBM 360 model 97 (I always get the model number wrong; | anyway, one of the early 360s) did this. Was that the "stretch" model? I can't remember the model number, but very few were ever made. It supposedly was a reasearch item which went into production but was never marketed, so you could buy one if you could find out it existed (or something bizarre like that). I find it hard to believe that IBM used to have a machine which they built for sale but didn't market. Now, of course they have come 180 degrees from that position. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
rw@beatnix.UUCP (Russell Williams) (07/05/88)
In article <2530@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes: >360/91, of which maybe (on the order of) 20 were built; >there was also a 360/95 that was a little faster. >It is instructive to examine this, compared with, for example, the >360/85, which was built at aboutthe same time (late 60s). >The complexity of the 360/91 occurred because the CPU was too much faster >than the memories. >The 360/85 used a cache instead, was more cost-effective. Certainly >the later machines derive more from its ideas than from the 91. The 195 was basically a 91 with a cache. The 91 spent its time waiting on memory unless you did mostly FP. SLAC does, & a friend of mine there said that for her work, the 91 was much faster than the 3033. Russell Williams ..uunet!elxsi!rw ..ucbvax!sun!elxsi!rw
hankd@pur-ee.UUCP (Hank Dietz) (07/06/88)
In article <4014@korppi.tut.fi>, pl@tut.fi (Pertti Lehtinen) writes: > When avoiding branches, interesting technique is adopted in > ARM-processor (Acorn Risc Machine). All instructions are > conditional, which makes it possible to avoid short branches > very easily. A similar technique is used in the Cydra machine... as I understand it, this machine performs VLIW-like scheduling of blocks of code (relatively conventional control-flow stuff) and simply enables/disables output arcs (i.e., stores) based on condition inputs for each block. In other words, you can execute both sides of a branch intermingled, then decide to ignore the results of either. I hav e only a single literature sheet on this architecture, however. Anyone have more details? -hankd
root@cca.ucsf.edu (Computer Center) (07/06/88)
In article <11458@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes: | In article <28200172@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: | | | The IBM 360 model 97 (I always get the model number wrong; | | anyway, one of the early 360s) did this. | | Was that the "stretch" model? I can't remember the model number, but | very few were ever made. It supposedly was a reasearch item which went The Stretch was the 7030; it established technology which later appeared in the 7090 and 7094. This was several years before the 360 series and "Solid Logic Technology" (SLT). The 7000 series machines were discrete transistor based. Much of the technology was downgraded going from the 7030 to the 7090; e.g. the Stretch disk ran the disk access comb in parallel while the 7090 etc. version (1301) did a head select and read from one track at a time. I don't believe there was a Model 97; the high speed scientific machine was the Model 91; the fast general purpose machine was the Model 85. There was also a Model 195 about which I have no details. IBM allegedly lost money on each 91 and built them only to keep CDC out of prestige accounts. Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca (The I.G.) (...ucbvax!ucsfcgl!cca.ucsf!thos) OS|2 -- an Operating System for puppets. #include <disclaimer.std>
eugene@pioneer.arc.nasa.gov.arpa (Eugene N. Miya) (07/07/88)
In article <1300@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes: >In article <11458@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes: >IBM allegedly lost money on each 91 and built them only to keep CDC >out of prestige accounts. Do you think it is any different today? Do you think IBM reads the comp.arch? [I don't, well maybe a few at ibmpa.. ;-)]. Only today it's not just CDC, it's all those little micro companies. Mainframe people seem really depressed of late. --enm
paul@unisoft.UUCP (n) (07/07/88)
In article <1300@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes: > >The Stretch was the 7030; it established technology which later appeared >in the 7090 and 7094. This was several years before the 360 series and >"Solid Logic Technology" (SLT). The 7000 series machines were discrete >transistor based. > One of the best stories about the Stretch (I think I read about it in CACM maybe 5 years ago) was about an early problem in the core memory: I order to keep the temperature down the core stacks were embedded in moving oil for cooling, an engineer was trying to find an intermittent failure, this lasted for several weeks untill finally someone noticed that the failure tended to move through memory addresses sequentially, finally the problem was diagnosed - a piece of solder in the oil. This was the first (and probably last) computer to be repaired by giving it an oil change ..... Paul -- Paul Campbell, UniSoft Corp. 6121 Hollis, Emeryville, Ca E-mail: ..!{ucbvax,hoptoad}!unisoft!paul Nothing here represents the opinions of UniSoft or its employees (except me) "Nuclear war doesn't prove who's Right, just who's Left" (ABC news 10/13/87)
sxr@purdue.UUCP (07/13/88)
In article <11458@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes: >In article <28200172@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: > >| The IBM 360 model 97 (I always get the model number wrong; >| anyway, one of the early 360s) did this. > > Was that the "stretch" model? I can't remember the model number, but >very few were ever made. It supposedly was a reasearch item which went >into production but was never marketed, so you could buy one if you >could find out it existed (or something bizarre like that). > > I find it hard to believe that IBM used to have a machine which they >built for sale but didn't market. Now, of course they have come 180 >degrees from that position. >-- > bill davidsen (wedu@ge-crd.arpa) The Stretch belonged to an earlier generation than the 360. Stretch was the IBM 7030 which IBM withdrew from the market in 1961. Its performance was spectacular for its time, but not as good as had been promised. IBM agreed to deliver eight systems at a reduced price at which it would lose money on every system delivered. See "IBM's Early Computers" by Bashe, Johnson, Palmer, and Pugh. MIT Press 1986. There was no 360 model 97. IBM did introduce a model 90 series of very high performance 360 systems, at least in part to help it compete with Control Data. This was the basis of an anti-trust suit by CDC against IBM. IBM built and delivered a few model 91 systems, and at least one model 95. They offered and sold a few model 91 systems to University Computing centers at clearance prices, and withdrew them from the market when the model 85 was introduced, and then the top of the line 195.