cs4g6ad@maccs.dcss.mcmaster.ca (Custeau RD) (09/27/89)
I am looking for references or information on VLIW architectures for a fourth-year architecture seminar. Any help would be greatly appreciated. Please reply by e-mail, as I don't often read the news. Thanks in advance!
cutler@mfci.UUCP (Ben Cutler) (09/29/89)
In article <251FCB3F.12366@maccs.dcss.mcmaster.ca> cs4g6ad@maccs.dcss.mcmaster.ca (Custeau RD) writes:
>I am looking for references or information on VLIW architectures for
>a fourth-year architecture seminar. Any help would be greatly appreciated.
>Please reply by e-mail, as I don't often read the news.
>Thanks in advance!

A good ``introductory'' text is ``Bulldog: A Compiler for VLIW Architectures'', by John Ellis, which won the ACM Doctoral Dissertation Award in 1985. It's available from MIT Press. For information on a commercial VLIW implementation, contact Multiflow Computer at (203) 488-6090.
slackey@bbn.com (Stan Lackey) (09/29/89)
In article <1050@m3.mfci.UUCP> cutler@mfci.UUCP (Ben Cutler) writes:
>In article <251FCB3F.12366@maccs.dcss.mcmaster.ca> cs4g6ad@maccs.dcss.mcmaster.ca (Custeau RD) writes:
>> I am looking for references or information on VLIW architectures for
>>a fourth-year architecture seminar. Any help would be greatly appreciated.
>A good ``introductory'' text is ``Bulldog: A Compiler for VLIW Architectures'',

It's a great work. Nearly all of the attention is on the compiler; it doesn't get into hardware or OS issues.

A main difference between the book and what Multiflow ended up shipping is in the memory interface. The book expected a separate path from each main data path in the CPU to its corresponding bank of memory, with relatively slow transfer paths in between. Thus, for peak performance, the compiler would have to be able to predict which memory bank a datum is in, and give the instruction to the corresponding CPU. In many cases this is not a problem, as scalar variables (in FORTRAN anyway) have their addresses known at compile time, and of course the bases of all arrays are known. But apparently calculated indices into arrays are common enough to be a problem. Much attention was paid to this "memory bank disambiguation" in the book, which even included a syntax for allowing the user to give hints to the compiler to aid in disambiguation.

From what I could tell from looking at Multiflow's released documentation, Multiflow eliminated this problem with a kind of "distributed crossbar" (as I would call it) between the CPUs and the memory banks. This allows the disambiguation to be resolved at runtime, and answered my main criticism of the technology.

Another difference is that the prototype compiler for the thesis was written in Lisp. I think Multiflow ended up using C.

-Stan
Not affiliated with Multiflow except as an interested observer. Not necessarily the views of BBN either.
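The bank-prediction problem Lackey describes can be shown with a toy model: with word-interleaved memory, the bank of an address is simply the address modulo the number of banks, so a compile-time-known address maps to a known bank, while a computed array index does not. A minimal sketch (all names and the interleaving scheme are my illustration, not taken from Bulldog):

```python
# Toy model of interleaved memory banks. Assumption: word interleaving,
# so bank = address mod NBANKS, as in a simple banked VLIW memory system.
NBANKS = 8

def bank_of(address):
    """Bank holding a given word, under simple word interleaving."""
    return address % NBANKS

# A scalar at a fixed address: the compiler knows its bank at compile time.
X_ADDR = 0x1000
assert bank_of(X_ADDR) == X_ADDR % NBANKS

# An array reference a(i) with base known but i computed at runtime:
# the bank depends on i, so the compiler cannot, in general, route the
# load to the right memory pipe -- the "disambiguation" problem.
A_BASE = 0x2000
def bank_of_elem(i):
    return bank_of(A_BASE + i)

# A case the compiler *can* disambiguate: a loop whose stride equals the
# number of banks always hits the same bank, whatever the start offset.
banks = {bank_of_elem(i) for i in range(3, 100, NBANKS)}
assert len(banks) == 1
```

A runtime crossbar, as Lackey describes Multiflow's solution, removes the need for this compile-time prediction entirely.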
cik@l.cc.purdue.edu (Herman Rubin) (10/01/89)
In article <1050@m3.mfci.UUCP>, cutler@mfci.UUCP (Ben Cutler) writes:
> In article <251FCB3F.12366@maccs.dcss.mcmaster.ca> cs4g6ad@maccs.dcss.mcmaster.ca (Custeau RD) writes:
> > I am looking for references or information on VLIW architectures for
> > a fourth-year architecture seminar. Any help would be greatly appreciated.
> A good ``introductory'' text is ``Bulldog: A Compiler for VLIW Architectures'',
> by John Ellis, which won the ACM Doctoral Dissertation Award in 1985. It's
> available from MIT Press. For information on a commercial VLIW
> implementation, contact Multiflow Computer at (203) 488-6090.

As one who always finds ways to use the architecture that the compiler writers did not think about, I maintain that this book helps little.

Why is it the case that people in the computing field think that someone can understand a computer in ignorance of its instruction set and the temporal operation of those instructions?
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
rodman@mfci.UUCP (Paul Rodman) (10/02/89)
In article <1626@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> A good ``introductory'' text is ``Bulldog: A Compiler for VLIW Architectures'',
>> by John Ellis, which won the ACM Doctoral Dissertation Award in 1985. It's
>> available from MIT Press. For information on a commercial VLIW
>> implementation, contact Multiflow Computer at (203) 488-6090.
>
>As one who always finds ways to use the architecture that the compiler
>writers did not think about, I maintain that this book helps little.

Helps little for what? Compilers ALWAYS involve tradeoffs, as does hardware, as does any engineering endeavor. I would be surprised if you _couldn't_ find ways to use computers that the compiler writers didn't think about (or were too stupid to realize, or decided to blow off...). IMHO, this book is _extremely_ well written and filled with good ideas. When I first read it there were some things I disagreed with, but I was very impressed with it overall.

>Why is it the case that people in the computing field think that someone
>can understand a computer in ignorance of its instruction set and the
>temporal operation of those instructions?

What do you mean by "understand"? The Bulldog effort broke important ground, period. Throwing stones at it because it didn't cover everything is silly. It was a PhD thesis, not a product! Believe me, in order to generate code for a VLIW you _do_ need to understand the instruction set and the temporal operation of those instructions! But the kind of stuff Ellis talks about is at a higher level and has to come first. Obviously the best product will try to integrate things from the lowest gate level back to the phase-one front end of the compiler, but it is human nature to try to compartmentalize things. Too _many_ folks have designed computers and focused on the low-level instruction set / pipelining issues without having a clue as to how the compiler will actually find a way to use them.
When it comes time to emit a .o file, hand waving won't work and bad algorithms will fall apart on somebody's code. Written any compilers lately?

Paul K. Rodman / KA1ZA / rodman@multiflow.com
Multiflow Computer, Inc. Tel. 203 488 6090 x 236
Branford, Ct. 06405
hankd@pur-ee.UUCP (Hank Dietz) (10/03/89)
In article <1626@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>As one who always finds ways to use the architecture that the compiler
>writers did not think about, I maintain that this book helps little.

VLIW machine compilers essentially search for optimal schedules of instructions; I don't see how a full-width search could be omitting things that Dr. Rubin would want to do. ;-)

>Why is it the case that people in the computing field think that someone
>can understand a computer in ignorance of its instruction set and the
>temporal operation of those instructions?

The primary reason humans have done so well is not that we are so much better than other life forms, but rather that we build much better tools. To accomplish any large task, a human must be able to be ignorant of at least some details -- VLIW compiler technology is a prime example of a mechanism keeping track of, and optimizing use of, a structure too complex for humans to manage directly. I suppose you'd rather not use any tools... but then why do you want to use the tools called "computers?"

-hankd@ecn.purdue.edu

PS: Dr. Rubin's views do not represent those of Purdue University. ;-)
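The "search for optimal schedules" Dietz mentions is, at its core, list scheduling over a dependence graph (Ellis's trace scheduling applies the same idea across basic-block boundaries). A minimal greedy list scheduler, assuming two identical functional units and unit latency -- a deliberate simplification; the operation names and the tie-breaking rule are mine:

```python
def list_schedule(deps, n_units=2):
    """Greedy list scheduling: each cycle, issue up to n_units operations
    whose predecessors have all completed. `deps` maps each op to the set
    of ops it depends on. Returns {op: cycle issued}."""
    remaining = set(deps)
    done, cycle, placement = set(), 0, {}
    while remaining:
        ready = [op for op in remaining if deps[op] <= done]
        issue = sorted(ready)[:n_units]   # deterministic tie-break
        for op in issue:
            placement[op] = cycle         # ops sharing a cycle share one
        done |= set(issue)                # long instruction word
        remaining -= set(issue)
        cycle += 1
    return placement

# a and b are independent; c needs both; d needs c.
deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
sched = list_schedule(deps)
assert sched["a"] == sched["b"] == 0   # packed into one VLIW instruction
assert sched["c"] == 1 and sched["d"] == 2
```

Four dependent-looking operations fit in three cycles because the scheduler finds the parallelism mechanically; on real code the dependence graphs are far too large to pack by hand, which is Dietz's point.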
cik@l.cc.purdue.edu (Herman Rubin) (10/03/89)
In article <13038@pur-ee.UUCP>, hankd@pur-ee.UUCP (Hank Dietz) writes:
> In article <1626@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >As one who always finds ways to use the architecture that the compiler
> >writers did not think about, I maintain that this book helps little.
>
> VLIW machine compilers essentially search for optimal schedules of
> instructions; I don't see how a full-width search could be omitting things
> that Dr. Rubin would want to do. ;-)
>
> >Why is it the case that people in the computing field think that someone
> >can understand a computer in ignorance of its instruction set and the
> >temporal operation of those instructions?

What kind of integer arithmetic is there in hardware? Examples of the problem are what kinds of integer products can be formed in one operation. Some machines can form long x long -> double long in one operation, and some cannot. Some allow this to be done unsigned and some do not. Some machines allow double/single getting quotient and remainder in one operation and some do not. Some allow this to be unsigned and some do not. Some machines have hardware square root and some do not. A few allow fixed-point arithmetic, but most do not. Different machines have different addressing modes; vector operations should be coded differently depending on that.

I have profitably used bit operations on floating-point numbers; this means that I must know the exact representation. How do you code the following: look at one bit and transfer if it is a 1, setting pointers so that that bit will not be looked at again? In the cases where I envision using this, it is a tool and will be used frequently. Another such is the spacing between 1's in a bit stream, and in this case there are alternatives. What does your HLL have to say about these?

> The primary reason humans have done so well is not that we are so much
> better than other life forms, but rather that we build much better tools.
> To accomplish any large task, a human must be able to be ignorant of at least
> some details -- VLIW compiler technology is a prime example of a mechanism
> keeping track of, and optimizing use of, a structure too complex for humans
> to manage directly.

The human knows when not to use the tools provided. Are you so dead sure that I cannot manage such structures better than the compiler? The most complicated instruction set I have seen is MUCH simpler than HLLs. The compiler maps a route on Interstates; I find a short cut on good roads. Will an autodriver let you back the car out of the garage so you can clean the garage?

> I suppose you'd rather not use any tools... but then why do you want to use
> the tools called "computers?"

Use the best tool for the user and the job. The crude mechanic must put the car on the analyzer and use what it tells him. There are a few good mechanics who can do better listening to the engine.

> PS: Dr. Rubin's views do not represent those of Purdue University. ;-)

PPS: Neither do those of Dr. Dietz.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
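Rubin's two bit-stream examples -- branch on the next 1 bit so that it is never looked at again, and the spacing between 1's -- map directly onto find-lowest-set-bit and count-trailing-zeros operations. A sketch in Python (a real implementation would use a hardware find-first-set or count-trailing-zeros instruction when the machine has one, which is exactly his point about knowing the instruction set; the function name is mine):

```python
def gaps_between_ones(word):
    """Distances between successive 1 bits of `word`, scanning from the
    low end; each 1 bit is consumed so it is never looked at again."""
    gaps, prev, pos = [], -1, 0
    while word:
        tz = (word & -word).bit_length() - 1   # count trailing zeros
        pos += tz
        if prev >= 0:
            gaps.append(pos - prev)
        prev = pos
        word >>= tz + 1                        # consume the 1 bit found
        pos += 1
    return gaps

# 0b101001 has ones at bit positions 0, 3, and 5 -> gaps of 3 and 2.
assert gaps_between_ones(0b101001) == [3, 2]
```

On a machine without a trailing-zero-count instruction, the inner step degenerates into a shift-and-test loop, so the choice of algorithm really does hinge on one hardware instruction, as Rubin argues.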
hankd@pur-ee.UUCP (Hank Dietz) (10/04/89)
In article <1629@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...[numerous "why don't HLLs let me say..." flames omitted]...
>What does your HLL have to say about these?

What HLL? What does this have to do with VLIW techniques? Dr. Rubin is complaining about languages -- but we are talking about VLIW compiler analysis and program transformation technology, not language design.

I hope that this confusion is not common. Perhaps a lot of compilers do blindly spit out the "obvious" code for badly-designed language constructs, but that certainly isn't the state of the art. I would think that a person who has spent some time counting T-states would really appreciate the VLIW work that Ellis presents in his award-winning PhD thesis... I know I do.

>... Are you so dead sure that
>I cannot manage such structures better than the compiler? The most complicated
>instruction set I have seen is MUCH simpler than HLLs. ...

VLIW technology is complex because of its use of parallelism -- it has very little to do with instruction set complexity issues. Generating good code for a VLIW is most like microcode scheduling/compaction for a huge, asymmetric, microcoded machine. You really don't even want to try it by hand... well, I know I don't. And why bother? VLIW compiler techniques come very close to optimal schedules every time. I can't match it, let alone beat it.

-hankd@ecn.purdue.edu

"A good workman is known by his tools."
    Intro to Chapter 12, F. Brooks, "The Mythical Man-Month," p. 127.
cik@l.cc.purdue.edu (Herman Rubin) (10/04/89)
In article <13050@pur-ee.UUCP>, hankd@pur-ee.UUCP (Hank Dietz) writes:
> In article <1629@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> ...[numerous "why don't HLLs let me say..." flames omitted]...
> >What does your HLL have to say about these?
>
> What HLL? What does this have to do with VLIW techniques? Dr. Rubin is
> complaining about languages -- but we are talking about VLIW compiler
> analysis and program transformation technology, not language design.

If the language the compiler compiles does not have the operations to be used in my program, I cannot use them. I have seen NO language designed for the efficient use of hardware operations. The choice of the algorithm depends on the hardware, and there can be lots of different algorithms. An example: most of the efficient ways of generating non-uniform random variables are acceptance-rejection. This is only partly vectorizable on CRAYs, and is especially difficult on the CRAY 1. This is due to the non-existence of some hardware instructions. It would be necessary to try out a number of computationally less efficient methods to find the ones which are best on the machine. I would not consider any of them on the CYBER 205, as they are obviously worse.

> I hope that this confusion is not common. Perhaps a lot of compilers do
> blindly spit out the "obvious" code for badly-designed language constructs,
> but that certainly isn't the state of the art. I would think that a person
> who has spent some time counting T-states would really appreciate the VLIW
> work that Ellis presents in his award-winning PhD thesis... I know I do.
>
> VLIW technology is complex because of its use of parallelism -- it has very
> little to do with instruction set complexity issues. Generating good code
> for a VLIW is most like microcode scheduling/compaction for a huge,
> asymmetric, microcoded machine. You really don't even want to try it by
> hand... well, I know I don't. And why bother? VLIW compiler techniques
> come very close to optimal schedules every time. I can't match it, let
> alone beat it.

What you are saying is that an instruction scheduler can do better than a human. This is mostly true. It is necessary to allow human override, as the scheduler only takes into account what its designers thought was of value. But this does not answer my complaint. I need to know these instructions to make use of them. I certainly can use information about their timing and overlap to select those algorithms which will run faster. I do not insist that an assembler carry out the instructions in the order I present them, as long as the program logic is maintained. This is what an optimizer does, and I do not oppose it. I do not even object to the optimizer-compiler-scheduler suggesting alternates which are more efficient, but which it cannot be sure will work.

One of our programmers asked me whether some transfers could be deleted from code. It was possible by essentially duplicating some code and making inessential changes in the program. I doubt a compiler could figure this out.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
preston@titan.rice.edu (Preston Briggs) (10/04/89)
In article <1630@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>One of our programmers asked me whether some transfers could be deleted
>from code. It was possible by essentially duplicating some code, and
>making inessential changes in the program. I doubt a compiler could
>figure this out.

Steve Carr showed me this transformation, which can be done automatically (and *is* done, in a local experimental compiler). This is slightly contrived, for expository simplicity. A simple DO-loop:

        DO i = 1, 2*n
          x(i) = f(y(i))
          z(i) = g(x(i-1))
        ENDDO

where x, y, and z are arrays and f and g are arbitrary functions. Note the reuse of each x(i) value one iteration after it has been set. We can take advantage of this reuse:

        x1 = x(0)
        DO i = 1, 2*n
          x0 = f(y(i))
          x(i) = x0
          z(i) = g(x1)
          x1 = x0
        ENDDO

This is nice because we save a memory access per iteration. (Assuming a smart backend -- a normal optimizing compiler -- that will allocate x0 and x1 to registers.) The problem is that we'll end up with an extra reg-reg copy (x1=x0) at the end of the loop. This can be avoided by unrolling slightly to subsume the copy:

        x1 = x(0)
        DO i = 1, n
          x0 = f(y(i))
          x(i) = x0
          z(i) = g(x1)
          x1 = f(y(i+1))
          x(i+1) = x1
          z(i) = g(x0)
        ENDDO

The first transformation, dependence-based register allocation, is due to Randy Allen and Ken Kennedy. The second transform, unrolling to subsume copies, is in Kennedy's thesis (1971). Both have been implemented at Rice by Steve Carr. Of course, this is a simple example; but these transformations (with others of a similar nature) can give tremendous improvements in real code.

What's the point? Compilers are getting better. Humans had to come up with the original ideas (certainly compiler writers learn from human coders), but a compiler can apply the ideas repeatedly, without human effort, to amazingly nasty code.

Regards, Preston Briggs
preston@titan.rice.edu (Preston Briggs) (10/05/89)
In article <1914@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:

>        DO i = 1, 2*n
>          x(i) = f(y(i))
>          z(i) = g(x(i-1))
>        ENDDO

becomes

>        x1 = x(0)
>        DO i = 1, 2*n
>          x0 = f(y(i))
>          x(i) = x0
>          z(i) = g(x1)
>          x1 = x0
>        ENDDO

becomes

>        x1 = x(0)
>        DO i = 1, n
>          x0 = f(y(i))
>          x(i) = x0
>          z(i) = g(x1)
>          x1 = f(y(i+1))
>          x(i+1) = x1
>          z(i) = g(x0)
>        ENDDO

but this last is wrong. It should be

        x1 = x(0)
        DO i = 1, 2*n, 2
          x0 = f(y(i))
          x(i) = x0
          z(i) = g(x1)
          x1 = f(y(i+1))
          x(i+1) = x1
          z(i+1) = g(x0)
        ENDDO

I'm sorry for any confusion. Think of this as an example of why I'd prefer an optimizer to transform my program.

Preston
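The corrected transformation is easy to check mechanically. A sketch in Python with arbitrary stand-in functions f and g (the payoff on real hardware is one fewer load of x per iteration once x0 and x1 live in registers, which this simulation cannot show, only that the results match):

```python
def original(x, y, z, n, f, g):
    # The untransformed loop: DO i = 1, 2*n.
    for i in range(1, 2 * n + 1):
        x[i] = f(y[i])
        z[i] = g(x[i - 1])

def transformed(x, y, z, n, f, g):
    # Scalar replacement of x(i-1), plus unroll-by-2 to subsume the
    # x1 = x0 copy, as in Briggs's corrected loop (step 2).
    x1 = x[0]
    for i in range(1, 2 * n + 1, 2):
        x0 = f(y[i])
        x[i] = x0
        z[i] = g(x1)
        x1 = f(y[i + 1])
        x[i + 1] = x1
        z[i + 1] = g(x0)

# Both versions produce identical x and z arrays.
n = 4
f = lambda v: 2 * v + 1
g = lambda v: v * v
y = list(range(2 * n + 2))
xa, za = [7] * (2 * n + 2), [0] * (2 * n + 2)
xb, zb = [7] * (2 * n + 2), [0] * (2 * n + 2)
original(xa, y, za, n, f, g)
transformed(xb, y, zb, n, f, g)
assert xa == xb and za == zb
```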
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/05/89)
In article <1630@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I have seen NO language designed for the efficient use of hardware operations.

Gee, I have, albeit for specific computers, e.g.:

        Bliss-10        PDP-10
        Fortran         IBM 709

Neither was very good in the portability department, at least not in its initial form, as befits a hardware-specific language.

To return to the original argument, John Ellis' book doesn't concern itself with specific instruction sets because at the time he wrote it, VLIWs existed only on paper. The concrete design of a VLIW happened only after most of the rest of the Yale VLIW project left to build real hardware at Multiflow. I was at Yale at the time and the longest instruction word we had was 36 bits in our DEC-20. Or maybe 48 bits in a PDP-11.
--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu
Massachusetts has 64 licensed drivers who are over 100 years old. -The Globe
ok@cs.mu.oz.au (Richard O'Keefe) (10/05/89)
In article <1630@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> If the language the compiler compiles does not have the operations to be used
> in my program, I cannot use them. I have seen NO language designed for
> the efficient use of hardware operations. The choice of the algorithm
> depends on the hardware, and there can be lots of different algorithms.

How about PL-360 and Bliss-10, both of which let you generate any instruction you like? Come to that, how about the C compiler that comes with UNIX in System V.3 for the 386, which lets you write assembly-code routines that are expanded in-line? Or how about the SunOS C/Fortran/Pascal/Modula2 compilers, which have a pass that replaces apparent procedure calls with arbitrary assembly code of your choice (specified in .il files) before final code optimisation?

Herman Rubin clearly enunciates a policy which I believe is responsible for much of the unreliability of present-day software:
    the machine is what it is,
    the programmer's job is to exploit the machine,
    the language's job is to expose the machine to the programmer.
The view that I take is
    the programmer's job is to express his intentions clearly,
    the machine's job is to obey the programmer,
    the language's job is to provide a *simple* conceptual model.
Rubin is willing to be the servant of his machines. I'm not.

I've exchanged E-mail with a floating-point expert who had some illuminating things to say about the work being done for Ada; it turns out that they can do amazing things without needing to know the details of the hardware instructions, but what does hurt them is the extent to which Ada already lets hardware vagaries show through.
cik@l.cc.purdue.edu (Herman Rubin) (10/05/89)
In article <2307@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes:
> How about PL-360 and Bliss-10, both of which let you generate any instruction
> you like? Come to that, how about the C compiler that comes with UNIX in
> System V.3 for the 386, which lets you write assembly-code routines that
> are expanded in-line? Or how about the SunOS C/Fortran/Pascal/Modula2
> compilers, which have a pass that replaces apparent procedure calls with
> arbitrary assembly code of your choice (specified in .il files) before
> final code optimisation?

I do not believe that the .il approach is quite adequate. As to the others you mention, I am unfamiliar with them. And even that is inadequate; I want to be able even to introduce overloaded infix operators in addition to the usual ones. Another feature missing from most languages since day 1 is to allow the return of a list (NOT a struct).

> Herman Rubin clearly enunciates a policy which I believe is responsible
> for much of the unreliability of present-day software:
>     the machine is what it is,
>     the programmer's job is to exploit the machine,
>     the language's job is to expose the machine to the programmer.
> The view that I take is
>     the programmer's job is to express his intentions clearly,
>     the machine's job is to obey the programmer,
>     the language's job is to provide a *simple* conceptual model.
> Rubin is willing to be the servant of his machines. I'm not.

I would like to be able to travel to any place in the world at low cost by stepping into a booth and pressing a button. I would like a computer which has a rather large number of machine instructions now lacking, with a gigabyte of memory, a reasonable screen, good color graphics, and a WYSIWYG editor capable of handling mathematical expressions well, at a cost of $1000. These machines do not exist. Does that mean that I should not use the ones that are available? That I should not travel by airplane because there are no teleportation booths? That I should do no computing because the machines I would like do not exist? That I should write no papers because the processors I want are not there? What would O'Keefe do in these situations?

My travel intentions depend on what is available and how long it will take. I would like to go to more meetings than I do. So I state that my intention is to go to such-and-such meetings for X dollars and return home to sleep each night. Okay, machines; obey me!

I have clearly stated that the performance of an algorithm depends on the hardware, so much so that a single hardware instruction can be crucial. The current languages do not provide a conceptual model for the operations which I have thought of. I ask O'Keefe to tell me how to find his Utopian machine and language situation for my problems.

> I've exchanged E-mail with a floating-point expert who had some
> illuminating things to say about the work being done for Ada; it turns
> out that they can do amazing things without needing to know the details
> of the hardware instructions, but what does hurt them is the extent to
> which Ada already lets hardware vagaries show through.

How can it be otherwise?
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
ok@cs.mu.oz.au (Richard O'Keefe) (10/06/89)
In article <1630@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) complained that he had "seen no language designed for the efficient use of hardware operations". In article <2307@munnari.oz.au>, ok@cs.mu.oz.au (I) listed several languages (PL-360, Bliss-10, SunOS C) which provide full access to all hardware instructions. In article <1632@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) replied:

> I do not believe that the .il approach is quite adequate.

What, precisely, can it not do? Admittedly, this approach relies on the back-end optimiser being able to handle any assembly code you throw at it, but then, so must any scheme giving you access to all the operations.

> As to the others
> you mention, I am unfamiliar with them. And even that is inadequate;

If you don't know them, how can you know they are inadequate?

> I want
> to be able even to introduce overloaded infix operators in addition to the
> usual ones. Another feature missing from most languages since day 1 is to
> allow the return of a list (NOT a struct).

Having used Fortran, Algol, SNOBOL, Lisp, Prolog, APL, COBOL, PL/I, and lots of other languages, I gave up thinking that the syntax of programming languages had any great importance a long time ago. I should have mentioned Ada as one of the languages which provides full machine access; it has both assembly code inserts and overloaded operators. I don't know precisely what Rubin means by returning a list; Algol 68 permits the return of anything (except pointers into the local frame). The reason that most languages don't return lists is precisely that they are designed to let the hardware show through, and you can't fit much in a register.

I wrote:
> Herman Rubin clearly enunciates a policy which I believe is responsible
> for much of the unreliability of present-day software:
>     the machine is what it is,
>     the programmer's job is to exploit the machine,
>     the language's job is to expose the machine to the programmer.
> The view that I take is
>     the programmer's job is to express his intentions clearly,
>     the machine's job is to obey the programmer,
>     the language's job is to provide a *simple* conceptual model.
> Rubin is willing to be the servant of his machines. I'm not.

Rubin replied that he would like transmatters and the ultimate computer.

> These machines do not exist. Does that mean that I should not use the ones
> that are available? I should not travel by airplane because there are no
> teleportation booths? That I should do no computing because the machines
> I would like do not exist? That I should write no papers because the
> processors I want are not there? What would O'Keefe do in these situations?

Why, I would do the same as I do now. Nowhere in the text cited above do I say that machines should be ideal! The language's job is to conceal the deficiencies of the computer. The computer can only do 32-bit arithmetic? Then it is the language's job to implement arbitrary-precision integers on top of that and not bother me with stupid little details. For a more common example, I really do not want to be bothered with the MC68010 limitation that a shift-by-literal can shift by at most 8. (I have used a language where this limit showed through, and I did _not_ like it.) A C compiler papers over this limit, and so it should.

I normally use a language which runs 2 to 4 times slower than C; I do this gladly because the reduction in _language_ complexity means that I can manage more complex _algorithms_, and win back O(n) speed that way. (Yes, such algorithms can then in principle be recoded in C. But it is much harder to _develop_ complex algorithms in complex languages.)

I wrote:
> I've exchanged E-mail with a floating-point expert who had some
> illuminating things to say about the work being done for Ada; it turns
> out that they can do amazing things without needing to know the details
> of the hardware instructions, but what does hurt them is the extent to
> which Ada already lets hardware vagaries show through.

Rubin replied:
> How can it be otherwise?

But that's _my_ point; letting the hardware show through is precisely what _he_ is asking for, and _that_ is the problem! This is not unlike the RISC/CISC debate. I want simple regular languages (C is a monster of complexity) which I can master completely. I don't care whether it's a ROMP or a WM or a /360 underneath. I particularly don't want to _have_ to know.
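The MC68010 quirk O'Keefe mentions (immediate shifts limited to a count of 8) is exactly the kind of thing a compiler papers over, for instance by lowering one large shift into a chain of shifts of at most 8. A toy sketch of such lowering (the lowering strategy is mine; `lsl.l` is the 68000-family shift-left mnemonic, whose immediate count really is limited to 1-8):

```python
MAX_IMM_SHIFT = 8   # MC68010 shift-by-immediate limit

def emit_shift_left(reg, amount):
    """Lower `reg << amount` into shifts of at most MAX_IMM_SHIFT,
    the way a compiler hides the hardware limit from the programmer."""
    ops = []
    while amount > 0:
        step = min(amount, MAX_IMM_SHIFT)
        ops.append(f"lsl.l #{step},{reg}")
        amount -= step
    return ops

# A shift by 20 becomes three instructions; the source language never
# sees the limit.
assert emit_shift_left("d0", 20) == ["lsl.l #8,d0", "lsl.l #8,d0", "lsl.l #4,d0"]
```

(A real compiler would instead load the count into a data register and use the shift-by-register form when that is cheaper; the point is only that the language need not expose the limit.)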
lewitt@Alliant.COM (Martin Lewitt) (10/06/89)
In article <1989Oct5.025841.2046@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes: *** much deleted *** >To return to the original argument, John Ellis' book doesn't concern itself >with specific instruction sets because at the time he wrote it, VLIWs >existed only on paper. The concrete design of a VLIW happened only after >most of the rest of the Yale VLIW project left to build real hardware at >Multiflow. I was at Yale at the time and the longest instruction word we >had was 36 bits in our DEC-20. Or maybe 48 bits in a PDP-11. Ouch!! Why don't the Floating Point System's AP120B and FPS-164 processors qualify as VLIW? They were commercial products introduced in 1975 and 1981 respectively: 1) Both have 64 bit instructions, with fields controlling 10 operations in parallel, i.e., micro-coded. 2) Both have FORTRAN compilers, the 164 from its introduction in 1981 and the 120B in 1985 (though one may have been available from a third party earlier) 3) By 1983, the FORTRAN compiler for the 164 was "software pipelining", i.e., executing operations from different iterations of a loop in parallel in an instruction. 4) Both were installed on the Yale campus by 1985 (correct me on this one). 8-) I've tried to think of possible objections to their classification as VLIW: 1) The 64 bit instruction word isn't long enough. This quantitative argument falls to the qualitative, 10 operations controlled by a horizontal micro-instruction. 2) They were only "array processors" or "subroutine boxes". This is true, especially for the AP120B, but the 164 was able to run complete FORTRAN jobs from 1982 on. It is more of a back end or attached processor, like the CRAY. 3) No virtual memory or multi-tasking. What does this have to do with VLIW? 4) They were too easy to micro-program, humans could do it. True! And they were fun to program as well. The software pipelining compiler was faster though, and produced fine code. 
There were only a few hand-coding tricks it couldn't handle, e.g., moving the loop control calculations to the floating point adder when the integer/address unit was too busy.
The Multiflow machine was no mystery to FPS sales analysts. It was the FPS dream machine: UNIX, virtual memory, better compilers, etc. We had asked FPS to give us these features right from the beginning of the 164 back in 1981. By the time Multiflow delivered, it was too late: Convex, Alliant, and FPS's own ECL machine, the 264, were already on the scene, and the high end workstations arrived in short order.
It will be interesting to see if FPS gets the credit it deserves when the history is written; I doubt it. Will the historians do any better than the contemporaries? In 1985, an article about micro-programming appeared in Scientific American, written by a Stanford professor. He anticipated the day when there would be commercial machines compiling directly to micro-code. FPS had sold nearly 150 machines already.
Let's get the history right.
1) The first commercially available VLIW machine was the FPS-164, not the Multiflow (by 6 years).
2) The "first affordable supercomputer" was the FPS-164, not the Convex (by 4 years). FPS was using the "affordable supercomputer" and "departmental supercomputing" phrases long before the Convex advertisements and literature took them up.
3) The first commercially available machine to compile complete HLL applications to micro-code was once again the FPS-164 (well, actually the VAX, since it was cross-compilation).
4) The first commercially available machine to successfully exploit parallel processors automatically using "dusty deck", serial FORTRAN was the Alliant FX8 (by 4 years and counting).
5) The first commercially available RISC machine was the FPS-164. (I'd love to see this one discussed; are VLIWs RISCy?) 8-)
I don't know if I've got the history right, I've only been in the industry since '81, so feel free to propose "minor" adjustments.
I'll gracelessly hide out when the heavyweights start swinging. > -- >John R. Levine, Segue Software, POB 349, Cambridge MA 02238,
lewitt@Alliant.COM (Martin Lewitt) (10/06/89)
Apologies to John Levine and the net if the authorship of my last posting on this subject was unclear. I'm posting without a home directory for the first time, so my signature wasn't automatically appended. It is manually appended here. Please double-check before flaming John; you may really intend to flame me. --- Phone: (206) 931-8364 Martin E. Lewitt My opinions are Domain: lewitt@alliant.COM 2945 Scenic Dr. SE my own, not my UUCP: {linus|mit-eddie}!alliant!lewitt Auburn, WA 98002 employer's.
dricejb@drilex.UUCP (Craig Jackson drilex1) (10/06/89)
In article <2307@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >In article <1630@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes: >> If the language the compiler compiles does not have the operations to be used >> in my program, I cannot use them. I have seen NO language designed for >> the efficient use of hardware operations. The choice of the algorithm >> depends on the hardware, and there can be lots of different algorithms. > >Herman Rubin clearly enunciates a policy which I believe is responsible >for much of the unreliability of present-day software: > the machine is what it is, > the programmer's job is to exploit the machine, > the language's job is to expose the machine to the programmer. >The view that I take is > the programmer's job is to express his intentions clearly, > the machine's job is to obey the programmer, > the language's job is to provide a *simple* conceptual model. >Rubin is willing to be the servant of his machines. I'm not. >I've exchanged E-mail with a floating-point expert who had some >illuminating things to say about the work being done for Ada; it turns >out that they can do amazing things without needing to know the details >of the hardware instructions, but what does hurt them is the extent to >which Ada already lets hardware vagaries show through. I think you have probably guessed Herman Rubin's view pretty well (I don't know him). It is a view which was very common up until around 1980, when computers finally began to be fast enough for many uses. (Before then, there were few users who really thought the computer that they worked on was fast enough.) I choose 1980 because that is when the VAX 11/780 and the 4341 became common; before those machines, few organizations had more than one 32-bit machine with more than one MIPS or so of power. Your view is conventional wisdom in computer science today, partially driven by the common surfeit of computational power available today. 
When there is enough computing power so that few tasks run into compute-hours, and compilers are good enough to translate rather abstract programming languages to good-enough machine code, this is a reasonable idea. I think that the difference is that Mr. Rubin still doesn't see himself as having a surfeit of computing power available to him. I am not familiar with his area of work, but I know that there remain many problems which still cannot be solved in reasonable time by today's fastest computers. When you are at the bleeding edge of technology, you *do* want to know about the relative speeds of various instructions, and other capabilities of the processor. It might make a difference if multiply were 3 or 10 times faster than divide--a difference of minutes, if not hours. If your compute times were on that order, you would care about how you used the underlying hardware. We should be thankful for the people at the bleeding edge of computability --it is they who keep the MIPS (VUPS, whatever) increasing every year... -- Craig Jackson dricejb@drilex.dri.mgh.com {bbn,ll-xn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
cik@l.cc.purdue.edu (Herman Rubin) (10/06/89)
In article <2311@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes: > In article <1630@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) > complained that he had "seen no language designed for the efficient use of > hardware operations". > In article <2307@munnari.oz.au>, ok@cs.mu.oz.au (I) listed several > languages (PL-360, Bliss-10, SunOS C) which provide full access to > all hardware instructions. > In article <1632@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) > replied < > I do not believe that the .il approach is quite adequate. > What, precisely, can it not do? Admittedly, this approach relies on > the back-end optimiser being able to handle any assembly code you throw > at it, but then, so must any scheme giving you access to all the operations. I admit I am not familiar with .il. How would you do the following: fabs(x) implemented as x &~ CLEARSIGN; where this clears the sign bit, and may depend on the precision of x. No moving of x to specific registers, etc., should be used. < > As to the others < > you mention, I am unfamiliar with them. And even that is inadequate; > If you don't know them, how can you know they are inadequate? < > I want < > to be able even to introduce overloaded infix operators in addition to the < > usual ones. Another feature missing from most languages since day 1 is to < > allow the return of a list (NOT a struct). And suppose I want to use << and >> on double longs? How do I get this in without using any unnecessary memory references, and keeping things which are already in registers there? I want to be able to define a heavily overloaded power operator with a reasonable notation such as a!b or something similar. I see no more reason why I should have to use prenex form for this than I should have to write sum(a,b) instead of a+b. To paraphrase, why should I be a slave to the compiler's notation rather than using familiar mathematical notation? 
> Having used Fortran, Algol, SNOBOL, Lisp, Prolog, APL, COBOL, PL/I, > and lots of other languages, I gave up thinking that the syntax of > programming languages had any great importance a long time ago. > I should have mentioned Ada as one of the languages which provides > full machine access; it has both assembly code inserts and overloaded > operators. I don't know precisely what Rubin means by returning a list; > Algol 68 permits the return of anything (except pointers into the local > frame). The reason that most languages don't return lists is precisely > that they are designed to let the hardware show through, and you can't > fit much in a register. When computers first came out, most of them would multiply an integer by an integer, giving a most and least significant part. Most of them would divide a double length integer by an integer, giving a quotient and a remainder. > I wrote < > Herman Rubin clearly enunciates a policy which I believe is responsible < > for much of the unreliability of present-day software: < > the machine is what it is, < > the programmer's job is to exploit the machine, < > the language's job is to expose the machine to the programmer. < > The view that I take is < > the programmer's job is to express his intentions clearly, < > the machine's job is to obey the programmer, < > the language's job is to provide a *simple* conceptual model. < > Rubin is willing to be the servant of his machines. I'm not. > > Rubin replied that he would like transmatters and the ultimate computer. < > These machines do not exist. Does that mean that I should not use the ones < > that are available? I should not travel by airplane because there are no < > teleportation booths? That I should do no computing because the machines < > I would like do not exist? That I should write no papers because the < > processors I want are not there? What would O'Keefe do in these situations? > > Why, I would do the same as I do now. 
Nowhere in the text cited above > do I say that machines should be ideal! The language's job is to > conceal the deficiencies of the computer. The computer can only do > 32-bit arithmetic? Then it is the language's job to implement > arbitrary-precision integers on top of that and not bother me with > stupid little details. For a more common example, I really do not want > to be bothered with the MC68010 limitation that a shift-by-literal can > shift by at most 8. (I have used a language where this limit showed > through, and I did _not_ like it.) A C compiler papers over this limit, > and so it should. I normally use a language which runs 2 to 4 times > slower than C; I do this gladly because the reduction in _language_ > complexity means that I can manage more complex _algorithms_, and win > back O(n) speed that way. (Yes, such algorithms can then in principle > be recoded in C. But it is much harder to _develop_ complex algorithms > in complex languages.) Getting O(n) may or may not be appropriate. Algorithms for generating non-uniform random variables efficiently tend to be short, but with a fairly intricate structure. They will be used millions of times. The most efficient, from the standpoint of computational complexity, are difficult even knowing the hardware, and may not even be worth programming. Two of my operations are transferring on a bit and "removing" the bit whether or not the transfer occurs, and finding the distance to the next one in a bit stream and "removing" all bits scanned. These are tools, as basic as addition. As for not worrying about those stupid little details about multiple precision arithmetic, it is the difference between factoring a large number in a month and several years. I can give algorithms which anyone who understands even what is taught in computing will understand, but which the HLLs cannot express. In a university, it is almost impossible to get programming assistance in mathematics and statistics. 
Also, these supposedly highly optimizing compilers do not seem to be available. I doubt that I would have any more difficulty in writing a decent machine language than any HLL, because a decent machine language would be weakly typed and have overloaded operators. > I wrote < > I've exchanged E-mail with a floating-point expert who had some < > illuminating things to say about the work being done for Ada; it turns < > out that they can do amazing things without needing to know the details < > of the hardware instructions, but what does hurt them is the extent to < > which Ada already lets hardware vagaries show through. > > Rubin replied: < > How can it be otherwise? > > But that's _my_ point; letting the hardware show through is precisely > what _he_ is asking for, and _that_ is the problem! > > This is not unlike the RISC/CISC debate. I want simple regular > languages (C is a monster of complexity) which I can master completely. > I don't care whether it's a ROMP or a WM or a /360 underneath. I > particularly don't want to _have_ to know. When I design an algorithm, I _have_ to know. O'Keefe wants to master his language completely, but he is a slave to the language, whether or not he knows it. I will use different algorithms on different computers, and have no problem keeping them straight. I find it no problem whatever in seeing how the hardware interacts with the algorithm and exploiting it. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
ECULHAM@vm.ucs.UAlberta.CA (Earl Culham) (10/06/89)
In article <1630@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
.... all sort of things about languages accessing the hardware
through computer languages
This is an open letter to Herman Rubin.
========================================================================
Rubin, I realise that what you are trying to say in your postings is
"Don't throw away the power of the computer by limiting yourself to
high level languages."
But, you are going overboard when you say that NO language is designed
for the efficient use of hardware operations. I can name two right off,
Assembler, and PL360.
We have a dichotomy between wanting a program to run as fast as possible
on our current machine, and wanting it to keep running as we migrate
to new machines.
Often, your comments are along the line
"We need access to the nitty-gritty in order make this algorithm fly."
Nice ideas, but they don't belong in comp.arch. Please take them to
a language discussion.
On the other hand, comments along the lines of
"Having feature 'x' in hardware would sure make algorithm 'y' run fast"
are welcome grist.
schwartz@psuvax1.cs.psu.edu (Scott Schwartz) (10/06/89)
In article <2311@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: Herman Rubin writes: > I want > to be able even to introduce overloaded infix operators in addition to the > usual ones. Another feature missing from most languages since day 1 is to > allow the return of a list (NOT a struct). The reason that most languages don't return lists is precisely that they are designed to let the hardware show through, and you can't fit much in a register. Isn't the idea to return out-parameters on the left side of an assignment? I.e. (out1, out2, out3) := foo (in1, in2, in3); corresponds to procedure foo (in in1, in2, in3; out out1, out2, out3) ... foo (in1, in2, in3, out1, out2, out3); This doesn't seem that bad to me. A byte full of syntactic sugar helps the eyestrain go down. :-) -- Scott Schwartz <schwartz@shire.cs.psu.edu> Now back to our regularly scheduled programming....
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/07/89)
In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes: >Ouch!! Why don't the Floating Point System's AP120B and FPS-164 processors >qualify as VLIW? ... >4) Both were installed on the Yale campus by 1985 ... They were indeed there, excuse me, my mind curdled. (The fact that I never used either of them while I was there may have something to do with it.) Ellis discusses the FPS-164 in his book. He calls it an LIW, kind of a predecessor to a VLIW. It's a horizontally microcoded machine rather than a RISC with a lot of parallel functional units. The trace scheduling compiler used for VLIWs is quite different from the one used for the FPS. Someone did try trace scheduling for the FPS, but gave up because the asymmetry of the architecture made it very hard. There are several pages in Ellis' book that talk about this. -- John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869 johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu Massachusetts has 64 licensed drivers who are over 100 years old. -The Globe
cutler@mfci.UUCP (Ben Cutler) (10/08/89)
In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes: >Ouch!! Why don't the Floating Point System's AP120B and FPS-164 processors >qualify as VLIW? They were commercial products introduced in 1975 and 1981 >respectively: > > [ lots of deleted text ] > >Let's get the history right. > 1) The first commercially available VLIW machine was the FPS-164, > not the Multiflow (by 6 years). VLIWs and Trace Scheduling (TM) Compiler technology go hand-in-hand. Ellis describes why the FPS-164 doesn't qualify as a good compilation target in his thesis: ``Many micro-programmed architectures have so-called `hot spots', latches, that will hold a value for only one cycle or until another value displaces it. For example, the outputs of the multiplier and adder pipelines in the FPS-164 [ APAL64 Programmer's Guide, FPS 1982 ] are latched and can be fed back to the register banks or directly to the inputs of the pipelines. There isn't enough bandwidth into the register banks, so to keep the pipelines full the programmer is often forced to use the latched pipeline outputs directly as pipeline inputs. But the programmer must be very careful to synchronize the pipelines' inputs and outputs, since the next value coming out of the pipeline will overwrite the previous latched value.'' Ellis goes on to describe how hot spots are unpleasant artifacts for a compiler to deal with, how it isn't always possible to move a value in a latch out of the way and into a register bank, or if you can do so by making room in the register bank, that in turn might require displacing a value in another latch, and so on. If you want to do a good job on complex loops, a backtracking compiler (blech) may be in order. According to Ellis, ''Every value producing operation and every data transfer reads its operands from a register bank and delivers its result to a register bank.'' Ellis notes that, ``Architectures with hot spots are easy to build, but building compilers for them is hard. 
(It took years to build the FPS-164 compiler.) Better to build the compiler and hardware in parallel, avoiding hardware features that can't be used easily by the compiler.'' If you want to know more, then read the thesis, which is extremely well-written and informative. ( My apologies for any typos in the above quotations. )
cutler@mfci.UUCP (Ben Cutler) (10/08/89)
Oops, in case it wasn't clear, the sentence reproduced below describes a fundamental characteristic of VLIWs (and NOT FPS machines), that operations don't leave values sitting around in hot spots waiting to be trashed or pushed along someplace else: According to Ellis, ''Every value producing operation and every data transfer reads its operands from a register bank and delivers its result to a register bank.''
khb%chiba@Sun.COM (Keith Bierman - SPD Advanced Languages) (10/09/89)
In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes:
... stuff about FPS being first VLIW...
It is my understanding that FPS purchased the basic design from Glen
Culler. Culler continued his design activities and spun up a company
for rev 7 of the basic design (The infamous Culler-7) down in Santa
Barbara.
Glen is still at it, and perhaps the 8th, 9th or 10th edition will
gain a popular following.
Keith H. Bierman |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault | MTS --Only my work belongs to Sun*
I Voted for Bill & | Advanced Languages/Floating Point Group
Opus | "When the going gets Weird .. the Weird turn PRO"
lewitt@Alliant.COM (Martin E. Lewitt) (10/09/89)
In article <1068@m3.mfci.UUCP> cutler@mfci.UUCP (Ben Cutler) writes: *** some deleted *** >VLIWs and Trace Scheduling (TM) Compiler technology go hand-in-hand. Ellis >describes why the FPS-164 doesn't qualify as a good compilation target in >his thesis: I guess what I find strange is that the definition of a type of computer and a (trademarked) compiler technology are so closely tied. The definition doesn't seem as general as the RISC and CISC definitions (which I won't attempt to give here). It is strange that the definition doesn't encompass the similarities with the FPS architecture. Ex-FPS persons I know immediately saw the similarities with the Multiflow and correctly anticipated (in a general way) how it would perform. We would have welcomed the improvements in register unit bandwidth that Ellis requires, though we would not have given up the ability to latch the result of one pipeline directly to the input of another, if there were a performance penalty. A sophisticated compiler technology should be able to handle this (but why bother if you don't have to?). From a marketing viewpoint the virtual memory and multiuser OS were more important features. The Multiflow architecture just seemed a natural evolution of the FPS architecture. From your posting, it appears Ellis correctly understood the limitations of the FPS architecture and his analysis of it seems to acknowledge some related place for it in history. *** much deleted *** >If you want to know more, then read the thesis, which is extremely >well-written and informative. Yes, I hope to have the opportunity to read this. -- Phone: (206) 931-8364 Martin E. Lewitt My opinions are Domain: lewitt@alliant.COM 2945 Scenic Dr. SE my own, not my UUCP: {linus|mit-eddie}!alliant!lewitt Auburn, WA 98002 employer's.
colwell@mfci.UUCP (Robert Colwell) (10/09/89)
In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes: >In article <1989Oct5.025841.2046@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes: >*** much deleted *** >>To return to the original argument, John Ellis' book doesn't concern itself >>with specific instruction sets because at the time he wrote it, VLIWs >>existed only on paper. The concrete design of a VLIW happened only after >>most of the rest of the Yale VLIW project left to build real hardware at >>Multiflow. I was at Yale at the time and the longest instruction word we >>had was 36 bits in our DEC-20. Or maybe 48 bits in a PDP-11. > >Ouch!! Why don't the Floating Point System's AP120B and FPS-164 processors >qualify as VLIW? They were commercial products introduced in 1975 and 1981 >respectively: FPS failed to realize a couple of key concepts, and this is why they don't get to retrofit the "VLIW" label to their boxes (after all, Josh Fisher invented this technology and came up with that label, so he gets to provide the definition for VLIW). The first concept is that you can't just invent whatever architecture and instruction set comes to mind, or is easiest to build, or is cheapest. You must start with what the compiler wants to see as a target. A trace-scheduling compiler is hard enough to do that you certainly do not want to add to its burdens without a compelling reason. If you have only a wide-word (and I won't quibble about the actual width) machine with parallel functional units then you're not even halfway there. >The Multiflow machine was no mystery to FPS sales analysts. It was the >FPS dream machine: UNIX, virtual memory, better compilers, etc. We had >asked FPS to give us these features right from the beginning of the 164 >back in 1981. By the time, Multi-flow delivered, it was too, late, >Convex, Alliant and FPS's own ECL machine, 264 were already on the scene, >and the high end workstations arrived in short order. 
Too late for whom or for what? What a weird paragraph. >It will be interesting to see if FPS gets the credit it deserves when the >history is written, I doubt it. Will the historians do any better than >the contemporaries? In 1985, an article about micro-programming appeared >in Scientific American, written by a Stanford professor. He anticipated >the day when there would be commercial machines compiling directly to >micro-code. FPS had sold nearly 150 machines already. There *is* a relationship between the FPS boxes and the VLIW work done at Multiflow and Yale. As Ellis reports, the FPS boxes showed what would be wrong with a machine that was not designed at the outset to be a good target for compiled code. There are hot spot registers, the functional units are not decoupled at the instruction set level, the datapaths are not regular, and there aren't enough registers to support the flops. When you design a machine to be completely scheduled at compile-time, it shows in the design. Retrofitting a compiler to an existing architecture will never work. The influence that the FPS work had on the Multiflow machines was a negative one -- we knew that was one approach which would not get us what we needed. >Let's get the history right. > 1) The first commercially available VLIW machine was the FPS-164, > not the Multiflow (by 6 years). The FPS machines do indeed enjoy a unique place in the computer hall of fame, but the label under the exhibit won't say VLIW, nor should it. But the FPS machines were not statically scheduled and controlled by the compiler. They therefore don't qualify as VLIWs under Josh's definition of the term. Feel free to make up some other name if you want; "attached processor" has always seemed pretty appropriate to me. The FPS designs also try to compress the instruction stream in a sub-optimal way, pre-combining certain operations (your "parallel micro-operations") in much the same way (and for the same reasons) that CISCs did. 
A VLIW, on the other hand, completely decouples the control of all functional units, and avoids the baggage of carrying around the whole instruction word width during sequences of sequential code by using an encoding scheme described in our IEEE Transactions paper. This makes the compiler much happier. Ours smiles a lot. > 2) The "first affordable supercomputer" was the FPS-164, not the > Convex (by 4 years). FPS was using the "affordable > supercomputer" and "departmental supercomputing" phrases > long before the Convex advertisements and literature took > them up. Agreed. Although calling an attached processor a "computer" smacks of marketing hype, to me. And if your references to having a workable compiler were legit, then why did Convex's first machine beat the 164, which had more mflops under the hood? > 3) The first commercially available machine to compile complete > HLL applications to micro-code was once again, the FPS-164 > (well, actually the VAX since it was cross-compilation). Yes, although with the cart before the horse this effort was largely doomed from the start. > 4) The first commercially available machine to successfully exploit > parallel processors automatically using "dusty deck", serial > FORTRAN, the Alliant FX8, (by 4 years and counting). "And counting"? What does that mean? > 5) The first commercially available RISC machine was the FPS-164. > (I'd love to see this one discussed, are VLIWs RISCy?) 8-) What about the FPS-164 qualified it as a RISC? You must have some interesting definition of RISC. From Hwang and Briggs: "The AP-120B and the FPS-164 are back-end attached arithmetic processors specially designed to process large vectors or arrays (matrices) of data. Operationally, these processors must work with a host computer, which can be either a minicomputer (such as the VAX-11 series) or a mainframe computer (IBM 308X series)." 
Whatever else the RISC acronym implies, the "C" stands for computer, and to me, attached processors aren't computers. I think discussing VLIWs as RISCs is interesting, partly because it so aptly illustrates why the early insistence on counting the number of instructions before declaring something a RISC was so misguided. The important thing is to be able to implement the most-used ops such that they're as fast as can be, which usually means they get hard-wired control. If you can manage to implement your design such that you get fast, hard-wired simple ops, but lots of them in parallel (controlled by the wide word) then I think you've got the essence of the RISC approach without ending up hamstrung by religious zealotry. >I don't know if I've got the history right, I've only been in the industry >since '81, so feel free to propose "minor" adjustments. I'll gracelessly hide >out when the heavyweights start swinging. Aw, you're just a baby. You should have stated this earlier and I'd have used smaller words. :-) Bob Colwell ..!uunet!mfci!colwell Multiflow Computer or colwell@multiflow.com 31 Business Park Dr. Branford, CT 06405 203-488-6090
hankd@pur-ee.UUCP (Hank Dietz) (10/10/89)
In article <1071@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes: >In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes: >>Let's get the history right. >> 1) The first commercially available VLIW machine was the FPS-164, >> not the Multiflow (by 6 years). > >The FPS machines do indeed enjoy a unique place in the computer hall of >fame, but the label under the exhibit won't say VLIW, nor should it. I quite agree that the FPS-164 isn't trace-scheduled etc. as a VLIW, but I'm not sure that means the hardware isn't VLIW.... However, it isn't the first in either case. Way back, Burroughs had some nifty WCS stuff that probably qualifies as the first "VLIW hardware," but Multiflow's Trace clearly deserves the title "first VLIW computer system." >> 2) The "first affordable supercomputer" was the FPS-164, not the >> Convex (by 4 years). FPS was using the "affordable >> supercomputer" and "departmental supercomputing" phrases >> long before the Convex advertisements and literature took >> them up. > >Agreed. Although calling an attached processor a "computer" smacks of >marketing hype, to me. Define "supercomputer." Intel's IPSC tried to claim the first title, and there are plenty more if you define "affordable" the right way. ;-) >> 3) The first commercially available machine to compile complete >> HLL applications to micro-code was once again, the FPS-164 Were "complete HLL applications" really compiled to microcode? I thought they ran partly on the host machine? >> 4) The first commercially available machine to successfully exploit >> parallel processors automatically using "dusty deck", serial ^^^^^^^^ >> FORTRAN, the Alliant FX8, (by 4 years and counting). > >"And counting"? What does that mean? Are you defining "parallel" to be MIMD, but not SIMD/Vector? In either case, doesn't Cray count? I don't know what compilers have been marketed, but I know of more than a few "dusty deck" compilers for MIMD targets.... 
>> 5) The first commercially available RISC machine was the FPS-164. >> (I'd love to see this one discussed, are VLIWs RISCy?) 8-) >... The important thing is >to be able to implement the most-used ops such that they're as fast as can be, >which usually means they get hard-wired control. Most early machines looked pretty RISCy... in exactly the sense stated above. The main difference in recent RISC design is the use of compiled-code statistics rather than applications algorithms in determining which ops are frequently used. I see no reason to single-out the relatively-modern FPS-164; of course, just as the "first VLIW" title goes to whoever coined the name, "first RISC" really belongs to UCB. VLIW does tend to imply RISC... and RISC implies VLIW if you really think about it. After all, parallelism is the *OBVIOUS* way to speed-up things, and the compiler emphasis of RISC + parallelism = VLIW. Not that the full VLIW concept is trivial, rather that it is natural & elegant. Best of all, it even seems to work pretty well. ;-) -hankd@ecn.purdue.edu
lewitt@Alliant.COM (Martin E. Lewitt) (10/10/89)
In article <13109@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
= In article <1071@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
= >In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes:
= >>   3) The first commercially available machine to compile complete
= >>      HLL applications to micro-code was once again, the FPS-164
= 
= Were "complete HLL applications" really compiled to microcode?  I thought
= they ran partly on the host machine?

The FPS-164 and its successors almost always ran complete applications;
the "subroutine box" mode was not used by most customers.  This was made
possible by an advanced (for FPS) OS called the Single Job Executive
(SJE).

= >>   4) The first commercially available machine to successfully exploit
= >>      parallel processors automatically using "dusty deck", serial
=                                                               ^^^^^^^^
= >>      FORTRAN, the Alliant FX8, (by 4 years and counting).
= >
= >"And counting"?  What does that mean?
= 
= Are you defining "parallel" to be MIMD, but not SIMD/Vector?  In either
= case, doesn't Cray count?  I don't know what compilers have been marketed,
= but I know of more than a few "dusty deck" compilers for MIMD targets....

I was thinking MIMD.  Until recently, didn't Cray MIMD require
micro-tasking directives inserted in the code based on user analysis of
the code?

= -hankd@ecn.purdue.edu
-- 
Phone: (206) 931-8364       Martin E. Lewitt      My opinions are
Domain: lewitt@alliant.COM  2945 Scenic Dr. SE    my own, not my
UUCP: {linus|mit-eddie}!alliant!lewitt  Auburn, WA 98002  employer's.
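[The difference between user-inserted micro-tasking directives and
"dusty deck" auto-parallelization is a dependence-analysis question: the
compiler, rather than the user, must prove that a loop's iterations are
independent before farming them out to multiple CPUs.  A minimal sketch
of that test (an editorial illustration, not Alliant's or Cray's actual
analysis), for the simple model where iteration i writes a[i] and reads
a[i + d] for some set of constant offsets d:

```python
# Minimal sketch of the loop-carried dependence test an auto-parallelizing
# compiler must make before running a serial loop's iterations on multiple
# CPUs.  Model (an illustrative simplification): each iteration i writes
# a[i] and reads a[i + d] for each read offset d.  A nonzero offset means
# some iteration reads an element a different iteration writes: a
# loop-carried dependence, so the loop must stay serial.

def parallelizable(read_offsets):
    """True iff no iteration reads an element written by a different
    iteration, i.e. every read offset is zero under this model."""
    return all(d == 0 for d in read_offsets)

# a[i] = a[i] * 2.0       -> offsets {0}:     iterations independent
# a[i] = a[i-1] + a[i]    -> offsets {-1, 0}: recurrence, stays serial
```

Directives amount to the user asserting the left-hand case by hand when
the compiler cannot prove it. --ed.]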
lewitt@Alliant.COM (Martin E. Lewitt) (10/10/89)
In article <1071@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>In article <3449@alliant.Alliant.COM> lewitt@Alliant.COM (Martin Lewitt) writes:
>>The Multiflow machine was no mystery to FPS sales analysts.  It was the
>>FPS dream machine: UNIX, virtual memory, better compilers, etc.  We had
>>asked FPS to give us these features right from the beginning of the 164
>>back in 1981.  By the time Multiflow delivered, it was too late:
>>Convex, Alliant and FPS's own ECL machine, the 264, were already on the
>>scene, and the high end workstations arrived in short order.
>
>Too late for whom or for what?  What a weird paragraph.

I agree, what a weird paragraph!  By "too late", I meant that Multiflow
was too late to exploit the wide open market opportunities that the FPS
analysts saw when they were specifying the machine they wanted FPS to
build, i.e., a wide instruction machine supporting virtual memory and
UNIX.  FPS's analysts saw these requirements of the marketplace in
plenty of time to preempt the later arrivals of Convex and Alliant.

I was with Alliant when the Multiflow Trace 7 first started shipping.
While we didn't fear the Trace 7 in most application areas, the Trace 7
did turn out to be a tough competitor in certain areas, e.g., the integer
(non-floating point) ECAD codes.  However, after some initial competitive
success for the Trace 7, I noticed that Alliant was no longer losing this
market to the Multiflow, but to the SUN-4 Sparcstation instead.  While
this was bad news for Alliant, it had to be terrible news for Multiflow.
A key area of competitive strength against the product that Alliant was
shipping at that time was being undercut by cheap integer mips from SUN.
So this is another sense in which I meant "too late".

*** stuff deleted ***

>marketing hype, to me.  And if your references to having a workable
>compiler were legit, then why did Convex's first machine beat the 164,
>which had more mflops under the hood?

Convex's first machine, the C1-XL, had more megaflops: 40 single- and 20
double-precision peak megaflops vs. the FPS-164's 11 peak megaflops.
Even so, the Convex rarely beat the FPS-164 on benchmarks unless the
codes were highly vectorizable.  The XL had poor scalar performance, and
the FPS-164 often beat it by a factor of two.  Good scalar performance is
a strength of these wide instruction machines.  The C1-XP had much
improved scalar performance but still faced tough competition from the
164.

Of course, performance is only one purchase criterion, and FPS could
lose sales because of the lack of virtual memory, lack of a multi-user
operating system, reliability concerns, etc.

>>   4) The first commercially available machine to successfully exploit
>>      parallel processors automatically using "dusty deck", serial
>>      FORTRAN, the Alliant FX8, (by 4 years and counting).
>
>"And counting"?  What does that mean?

No competitors seem to have replicated Alliant's success in automatically
producing code for a MIMD machine, so Alliant's lead is 4 years and
growing.  By success, I mean that this is the standard way most customers
use the machine.  Most Cray jobs still use only one CPU.

*** more stuff deleted ***

>Bob Colwell  ..!uunet!mfci!colwell
-- 
Phone: (206) 931-8364       Martin E. Lewitt      My opinions are
Domain: lewitt@alliant.COM  2945 Scenic Dr. SE    my own, not my
UUCP: {linus|mit-eddie}!alliant!lewitt  Auburn, WA 98002  employer's.
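[The scalar-vs-peak tradeoff Lewitt describes can be put in rough
numbers with a simple Amdahl-style model.  Only the 40 and 11 peak
megaflop figures and the factor-of-two scalar advantage come from the
posting; the absolute scalar rates below (2 MFLOPS for the FPS-164,
1 for the C1-XL) are illustrative assumptions chosen to match that
ratio.

```python
# Amdahl-style model: a fraction f of the work vectorizes at the peak
# rate V; the rest runs at the scalar rate S (rates in MFLOPS).
def effective_mflops(f, V, S):
    return 1.0 / (f / V + (1.0 - f) / S)

# Peak figures from the posting (40 vs 11); the scalar rates (2 vs 1,
# a factor of two in FPS's favor) are assumed for illustration.
fps    = lambda f: effective_mflops(f, V=11.0, S=2.0)
convex = lambda f: effective_mflops(f, V=40.0, S=1.0)
```

Under these assumptions the C1-XL's bigger peak only pays off once
roughly 88% or more of the work vectorizes; below that, the FPS-164's
scalar speed dominates, which is consistent with the claim that the
Convex "rarely beat the FPS-164 on benchmarks unless the codes were
highly vectorizable". --ed.]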
cutler@mfci.UUCP (Ben Cutler) (10/11/89)
In article <3456@alliant.Alliant.COM> lewitt@alliant.Alliant.COM (Martin E. Lewitt) writes:
>In article <1068@m3.mfci.UUCP> cutler@mfci.UUCP (Ben Cutler) writes:
>*** some deleted ***
>>VLIWs and Trace Scheduling (TM) Compiler technology go hand-in-hand.
>>Ellis describes why the FPS-164 doesn't qualify as a good compilation
>>target in his thesis:
>
>I guess what I find strange is that the definition of a type of computers
>and a (trademarked) compiler technology are so closely tied.

That's precisely the point.  If you want to design the best high level
language possible, you don't have one team design the control structures
and a second team design the data structures and hope the two fit
together well!  By analogy, if you want a high performance computer
system, you want the fastest overall system, not a superfast CPU for
which you cannot generate a compiler (and hence fast code).  So you
design all the pieces together, in this case VLIW hardware and Trace
Scheduling compilation.  And by extension, you want a real computer, not
an awkward attached processor.  FPS never saw the big picture.

>From your posting, it appears Ellis correctly understood the limitations
>of the FPS architecture and his analysis of it seems to acknowledge
>some related place for it in history.

Historical footnote: The FPS machine was an attached numerical processor.
It bit the dust when general purpose computers with as much or more power
and greater ease of use appeared on the scene for many fewer dollars.
brooks@vette.llnl.gov (Eugene Brooks) (10/12/89)
In article <3460@alliant.Alliant.COM> lewitt@alliant.Alliant.COM (Martin E. Lewitt) writes:
>I was with Alliant when the Multiflow Trace 7 first started shipping.
>While we didn't fear the Trace 7 in most application areas, the Trace 7
>did turn out to be a tough competitor in certain areas, e.g., the
>integer (non-floating point) ECAD codes.  However, after some initial
>competitive success for the Trace 7, I noticed that Alliant was no
>longer losing this market to the Multiflow, but to the SUN-4
>Sparcstation instead.

You vendors of custom processors had better take note of this statement.
It is really amusing to see the Multiflow and Alliant fellows bragging
about how well they are doing when single chip microprocessors run
circles around their entire multiprocessor complexes.  The SUN-4
Sparcstation which did all this "damage" is an absolute DOG compared to
the most recent Sparc, MIPS, or i860 chip sets.  No doubt a real terror
from Motorola is not far away.

The lifetimes of custom processors, even processors like the Cray line,
are real short.  Imagine the surprise of CRI executives when they realize
that they are no longer losing market share to the Japanese supercomputer
companies, but are now losing market share to microprocessors.  We are
watching this happen internally at LLNL and do not doubt that it is
happening elsewhere.  There is no defense against the ATTACK OF THE
KILLER MICROS!

brooks@maddog.llnl.gov, brooks@maddog.uucp
mash@mips.COM (John Mashey) (10/13/89)
In article <35596@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
....
>There is no defense against the ATTACK OF THE KILLER MICROS!

Well, even if I agree with the fundamental point, this might be a teeny
bit strong :-)

1) Supers and mini-supers and mainframes still do things that micros
   don't yet, like huge memories, very fast I/O, very high connectivity,
   or very high bandwidths in more places.  Those things cost money.  One
   can hope that having fast cheap micros will induce people to build
   more fast cheap peripheral chips to help with some parts of the
   problem.  Some things will always cost money, no matter what.
2) It's hard to believe that there will not always be a niche for:
	a) The fastest box, at almost any cost
	b) Very fast boxes with good cost/performance
3) The issue raised really is: How big are those niches?  How many
   companies can thrive in them?
4) OPINION: 1-2 companies each in both a) and b).  This is a high-stakes
   game, and the ante to play keeps rising.
5) OBSERVATION: RISC micros are munching away at the lower levels of the
   minisuper business; going up (i.e., CONVEX) was a good strategy.
   Trying to fight it out with them where they can do the job is like
   the fight that the huge horde of 1970s mini-makers had with CISC
   micros: once the latter got to be reasonably competitive on
   performance, the former just got run over, except for the very
   biggest players.
6) An interesting paper is "SUPERCOMPUTING, Myth & Reality", by George
   J. Luste, of the Physics Department at U. Toronto.  (I think this was
   in Supercomputing Symposium '89.)  It is a nice intro to some of the
   issues of vector code versus scalar code, and the cost/performance of
   supercomputers versus RISC micros.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086