hrubin@pop.stat.purdue.edu (Herman Rubin) (04/08/91)
In article <1991Apr5.172533.6717@agate.berkeley.edu>, doug@eris.berkeley.edu (Doug Merritt) writes: > In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes: > > He observes that "Few compilers use any of the nifty instructions that > > a CISC has." I believe I read recently that RISC was based on the > > observation that, in fact, only about 30% of the instructions in CISC > > computers were used by compilers. The rest of the instructions, for > > all practical purposes, were just excess baggage. ......................... > Anyway, the point is that even if compilers *do* use every possible feature > of a CISC, it still usually won't make the CISC s/w competitive with RISC > s/w. CISC cpus almost never put sufficient h/w optimization into the support > of the fancy instructions. (There are exceptions to this, of course, and I > haven't been watching the 030 and 040 to see how they do in this regard. > CISC may yet rise again, but not until after superscalar RISC has been > completely exploited.) If the languages do not allow the user to communicate the use of those features of the CISC architecture which can make the object code considerably faster, the compiler cannot take advantage of them. This IS the situation. For example, suppose there was an instruction, which could be effectively used, which would divide a float by a float, obtaining an integer quotient and a float remainder. Now just how is the compiler to recognize that this is in fact wanted? There is, at present, no language designed to take into account the collection of "natural" useful hardware which the present users can find uses for, and which are far more expensive in software than in hardware. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
ram+@cs.cmu.edu (Rob MacLachlan) (04/09/91)
>From: hrubin@pop.stat.purdue.edu (Herman Rubin) >Subject: Compilers and efficiency >Date: 7 Apr 91 19:04:12 GMT > >For example, suppose there was an instruction, which could be effectively >used, which would divide a float by a float, obtaining an integer quotient >and a float remainder. Now just how is the compiler to recognize that this >is in fact wanted? > >There is, at present, no language designed to take into account the collection >of "natural" useful hardware which the present users can find uses for, and >which are far more expensive in software than in hardware. Well, in Common Lisp, (truncate 3.5 .5) returns 3 (an integer) and .5 (a float). Rob
ram+@cs.cmu.edu (Rob MacLachlan) (04/09/91)
>From: hrubin@pop.stat.purdue.edu (Herman Rubin) >Subject: Compilers and efficiency >Date: 7 Apr 91 19:04:12 GMT > >For example, suppose there was an instruction, which could be effectively >used, which would divide a float by a float, obtaining an integer quotient >and a float remainder. Now just how is the compiler to recognize that this >is in fact wanted? > >There is, at present, no language designed to take into account the collection >of "natural" useful hardware which the present users can find uses for, and >which are far more expensive in software than in hardware. Well, in Common Lisp, (truncate 4.5 2.0) returns 2 (an integer) and .5 (a float). And you can use CEILING, FLOOR or ROUND to get rounding to +inf, -inf or nearest. The Common Lisp numeric support was designed by someone with a pretty good understanding of numerical analysis, and of the IEEE float standard. [Guy Steele, I believe...] Unfortunately, most Common Lisp implementations don't implement numeric stuff well enough to be worth using. In my work on CMU Common Lisp, I have made some major improvements in basic efficiency, but nobody has yet realized the full potential of numbers in Common Lisp. Rob
guy@auspex.auspex.com (Guy Harris) (04/11/91)
>If the languages do not allow the user to communicate the use of those >features of the CISC architecture which can make the object code considerably >faster, the compiler cannot take advantage of them. What about those features of the CISC architecture that don't make the object code *any* faster? Methinks that's what a lot of people are complaining about here....
chip@tct.com (Chip Salzenberg) (04/11/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin): >There is, at present, no language designed to take into account the collection >of "natural" useful hardware which the present users can find uses for, and >which are far more expensive in software than in hardware. If Herman would write be so kind as to write a spec for such a language, we could try to implement it. We're waiting, Herman... -- Brand X Industries Custodial, Refurbishing and Containment Service: When You Never, Ever Want To See It Again [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/11/91)
In article <7117@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) writes: > >If the languages do not allow the user to communicate the use of those > >features of the CISC architecture which can make the object code considerably > >faster, the compiler cannot take advantage of them. > > What about those features of the CISC architecture that don't make the > object code *any* faster? Methinks that's what a lot of people are > complaining about here.... In some cases this is true. A badly implemented procedure is bad. If a hardware polynomial evaluation takes longer than an explicit loop, it is not the fault of the instruction, but of the implementation. Also, it is important not to compare the object codes produced by compilers, but by intelligent human beings, who can reason out how to use the features not supported by the languages. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/11/91)
In article <2803849F.483A@tct.com>, chip@tct.com (Chip Salzenberg) writes: > According to hrubin@pop.stat.purdue.edu (Herman Rubin): > >There is, at present, no language designed to take into account the collection > >of "natural" useful hardware which the present users can find uses for, and > >which are far more expensive in software than in hardware. > > If Herman would write be so kind as to write a spec for such a > language, we could try to implement it. > > We're waiting, Herman... 1. I am not quite sure what you mean by a spec. If it is what I think it is, it is far too early to do this. 2. I am willing to provide a list of such instructions and their descriptions, by no means intended to be complete, written in such a way that a reasonable undergraduate (or even a high school student who can understand the idea of mathematical symbolism and operations) with an understanding of binary representation can understand. Many of these have been posted to this group by others as well as myself. Such a list will include only those operations of which I am acquainted, and others will come up with additional suggestions. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
turner@lance.tis.llnl.gov (Michael Turner) (04/12/91)
In article <7117@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >>If the languages do not allow the user to communicate the use of those >>features of the CISC architecture which can make the object code considerably >>faster, the compiler cannot take advantage of them. > >What about those features of the CISC architecture that don't make the >object code *any* faster? Methinks that's what a lot of people are >complaining about here.... Guy Harris's example (if I remember correctly) was that no language he knew of had semantics for retrieving both the integer quotient and the remainder from a floating divide, the point being that both values are usually available from any implementation of floating point divide, but that the language stands in the way of getting them at the same time. I think a fairly trivial ad hoc common-subexpression analysis could handle this particular case. I also don't see how this is a CISC/RISC issue-- if anything, one instruction that yielded both quotient and remainder is more "RISC", than having two that did div and rem separately. As for CISC-architecture features that don't appear to be faster than using combinations of other native-code features, this gets slippery. Yes, those added 68020 addressing modes don't make it faster, but they do make the corresponding code denser, which might reduce I-cache misses. If implementing them used up space on the chip that would have been wasted otherwise, then there has been some good done, and little harm but for a possible slight decrease in yield. I don't know that that's the case with the 68020. From what I'm hearing of the 68040, it even seems doubtful. But that shouldn't get in the way of my point: that when you're designing an architecture for single-chip implementation, at some point you've got a little left-over real estate to play around with. Maybe you can even (*gasp*) add some instructions. If this be treason, make the most of it, I say. Just my arrogant, ignorant opinion, as usual. --- Michael Turner turner@tis.llnl.gov
cs450a03@uc780.umd.edu (04/12/91)
Michael Turner writes: >Guy Harris's example (if I remember correctly) was that no language he >knew of had semantics for retrieving both the integer quotient and the >remainder from a floating divide ... You mean like 0 3.141592654 #: 20 6 1.150444076 ??? That's been around for ages. It's just in a language that not many people write compilers for (APL). Raul Rockwell
msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (04/12/91)
In <1406@ncis.tis.llnl.gov> turner@lance.tis.llnl.gov (Michael Turner) writes: >If implementing them used up space on the chip that would have been >wasted otherwise, then there has been some good done, and little harm >but for a possible slight decrease in yield. I don't know that that's >the case with the 68020. From what I'm hearing of the 68040, it even >seems doubtful. But that shouldn't get in the way of my point: that >when you're designing an architecture for single-chip implementation, >at some point you've got a little left-over real estate to play around >with. Maybe you can even (*gasp*) add some instructions. If this be >treason, make the most of it, I say. The problem is that you will have to include those instructions in the next version of your processor in order to remain binary compatible, and if things work out differently, as they probably will, they may be hard to implement. That happens in microcoded machines: we have some extra microcode store space, lets add some instructions. Next version, you have to work like hell to cram all those instructions in because things worked out differently, and probably almost no one uses those instructions anyway. -- Michael Pereckas * InterNet: m-pereckas@uiuc.edu * just another student... (CI$: 72311,3246) Jargon Dept.: Decoupled Architecture---sounds like the aftermath of a tornado
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/12/91)
In article <11APR91.21114644@uc780.umd.edu>, cs450a03@uc780.umd.edu writes: > Michael Turner writes: > > >Guy Harris's example (if I remember correctly) was that no language he > >knew of had semantics for retrieving both the integer quotient and the > >remainder from a floating divide ... > You mean like > 0 3.141592654 #: 20 > 6 1.150444076 Translate, please. Seeing the results, I still do not have any idea what the instruction does or how to put the results in a useful place, probably registers. The problem with APL and other such wierdos (like 99+% of the assembler languages) is their highly artificial syntax. I can, even at my age, learn the machine instructions quickly, but the stupid "mnemonics" of assembler language are another matter. This is what computer language people do not realize; they seem to think that memorizing even highly obscure notation is easy, and understanding slightly unusual operations is difficult. APL does have one feature lacking in at least most other languages; it attempts to have a reasonably compact notation. But there is no adequate macro processor available. An adequate macro processor would include the capability of easily translating APL into other languages, for example, and even having the translation take into account types. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/12/91)
In article <10095@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: | If a hardware polynomial evaluation takes longer than an explicit loop, | it is not the fault of the instruction, but of the implementation. Also, | it is important not to compare the object codes produced by compilers, but | by intelligent human beings, who can reason out how to use the features not | supported by the languages. Obviously a bad algorithm is slow, however you implement it. A good implementation can be faster than the best code, due to overlap of instructions. We had someone do an FFT instruction for VAX loadable control store (master's thesis) and he got about 15-20% over the hand coded assembler. You can get somewhat the same effect on a RISC machine if you feed it good enough code and it has register scoreboarding or other techniques which allow overlap. If I wanted maximum speed for some operation I would still hardcode an instruction, but you need a certain dollar volume to justify building a special instruction into a CPU instead of using the real estate for something else. This is why we have coprocessors, to allow the user to buy the instructionss/he needs. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "Most of the VAX instructions are in microcode, but halt and no-op are in hardware for efficiency"
chip@tct.com (Chip Salzenberg) (04/13/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin): >In article <2803849F.483A@tct.com>, chip@tct.com (Chip Salzenberg) writes: >> If Herman would write be so kind as to write a spec for such a >> language, we could try to implement it. > >1. I am not quite sure what you mean by a spec. Okay. Let's suppose that holy grail of programming languages, HROL (Herman Rubin Oriented Language), has been designed, implemented, and tested, and it's in a box on your desk. In the box is a tape, a programmer's reference manual, and an architecture-specific reference manual. Here's your job, Herman: Write that programmer's reference manual. If you feel that you can't write it because you're not sure what should be in it, then obviously you don't know what you want, and I for one would be happy if you would stop lamenting its absence. >2. I am willing to provide a list of such instructions ... Not an instruction reference, Herman, a *language* specification. The instruction set is way down the line from there. -- Brand X Industries Custodial, Refurbishing and Containment Service: When You Never, Ever Want To See It Again [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
cs450a03@uc780.umd.edu (04/13/91)
Herman Rubin > Me >| Michael Turner >** >** Guy Harris's example (if I remember correctly) was that no >** language he knew of had semantics for retrieving both the integer >** quotient and the remainder from a floating divide ... >| You mean like >| 0 3.141592654 #: 20 >| 6 1.150444076 >Translate, please. Seeing the results, I still do not have any idea what >the instruction does or how to put the results in a useful place, probably >registers. Well, (x,y) #: z where x, y, and z are all single numbers returns two results. The "number on the right" is the remainder from z divided by y. The "number on the left" is the quotient of that division modulo x. "N modulo 0" is defined to be N. Therefore, (0,y) #: z computes the dividend and remainder of y divided by z. Finally, note that for constants, the notation (x, y) may be abbreviated by just writing the two numbers separated by a space. >The problem with APL and other such wierdos (like 99+% of the >assembler languages) is their highly artificial syntax. Anything you understand is simple. Anything you do not understand is not simple. I suppose I should also point out that in infix notation a function takes arguments on both the right side and the left side. I don't know if it is relevant to describe the details of this mechanism. Also note that #: is a single symbol. >APL does have one feature lacking in at least most other languages; >it attempts to have a reasonably compact notation. But there is no >adequate macro processor available. An adequate macro processor >would include the capability of easily translating APL into other >languages, for example, and even having the translation take into >account types. He he... APL is just now getting around to using ASCII... give it a little time before it translates to every language under the sun. :-) Other than that, it looks like all you are asking for is constant folding. Raul
guido@cwi.nl (Guido van Rossum) (04/13/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >[...] An adequate macro processor would >include the capability of easily translating APL into other languages, >for example, and even having the translation take into account types. Can you give some more clues that this statement is true? Without any more context it sounds like what you call a macro processor is something completely different from what is usually called a macro processor; for instance, normally macro processors have no notion whatsoever of the types they are working on.
herrickd@iccgcc.decnet.ab.com (04/16/91)
In article <2803849F.483A@tct.com>, chip@tct.com (Chip Salzenberg) writes: > According to hrubin@pop.stat.purdue.edu (Herman Rubin): >>There is, at present, no language designed to take into account the collection >>of "natural" useful hardware which the present users can find uses for, and >>which are far more expensive in software than in hardware. > > If Herman would write be so kind as to write a spec for such a > language, we could try to implement it. > > We're waiting, Herman... Herman, and some others, have already published a requirement spec for the language, mostly in the "bizarre instructions" thread here quite recently. Of course, it would help if someone would gather it together in a form that could be reviewed and resolve some of the contradictions. dan herrick herrickd@iccgcc.decnet.ab.com
guy@auspex.auspex.com (Guy Harris) (04/16/91)
>Guy Harris's example (if I remember correctly) was that no language he >knew of had semantics for retrieving both the integer quotient and the >remainder from a floating divide, the point being that both values are >usually available from any implementation of floating point divide, but >that the language stands in the way of getting them at the same time. I don't think you remember correctly; if *I* remember correctly, I didn't give any particular example, and I certainly didn't give *that* example. *Herman Rubin* made the claim that no language he knew of had those semantics, and at least two other people jumped in to claim that Common LISP *did* have those semantics. I was mainly thinking of, indeed, such things as the added addressing modes; they may increase code density, but do they complicate instruction decoding and slow the machine down there? And what about some of the more elaborate procedure-calling, procedure-entry, or procedure-exit instructions? Admittedly, to some extent, the problems with the more elaborate features may come either from 1) sloppy implementations of them and 2) using the elaborate features even when inappropriate, e.g. not making use of simplifications that can be done at code-generation time, such as better management of registers. Both seem to come, at least in part, from the notion that "well, it's *one instruction*, that means it *has* to be fast!" - i.e., since it's a single instruction, they didn't worry about making it fast, or making the common case fast, and/or assumed it was *always* the right thing to do to use that instruction. As an example of 1), the CCI Power 6/32, as I remember, did a lot better job at implementing a VAX-like CALLS instruction than did the VAX-11/780; one trick I remember them doing was to generate the fields of the stack frame in order, so that the stores that built the stack frame would work well with interleaved memory. They also stored *decoded* instructions, rather than code bytes, in the instruction cache, as a way of avoiding the overhead of decoding VAX-style instructions. As an example of 2), consider, say, treating leaf procedures differently - can you get away with doing less than a full procedure entry?
guy@auspex.auspex.com (Guy Harris) (04/17/91)
>If a hardware polynomial evaluation takes longer than an explicit loop, >it is not the fault of the instruction, but of the implementation. What if a hardware polynomial evaluation takes longer than an *unrolled* loop, with the zero-coefficient terms taken out? Or do you have an implementation in mind that takes zero cycles to skip the terms with zero coefficients?
miklg@sono.uucp (Michael Goldman ) (04/17/91)
msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) writes: >In <1406@ncis.tis.llnl.gov> turner@lance.tis.llnl.gov (Michael Turner) writes: >>If implementing them used up space on the chip that would have been >>wasted otherwise, then there has been some good done, and little harm >>but for a possible slight decrease in yield. I don't know that that's >>the case with the 68020. From what I'm hearing of the 68040, it even >>seems doubtful. But that shouldn't get in the way of my point: that >>when you're designing an architecture for single-chip implementation, >>at some point you've got a little left-over real estate to play around >>with. Maybe you can even (*gasp*) add some instructions. If this be >>treason, make the most of it, I say. >The problem is that you will have to include those instructions in the >next version of your processor in order to remain binary compatible, >and if things work out differently, as they probably will, they may be >hard to implement. That happens in microcoded machines: we have some >extra microcode store space, lets add some instructions. Next >version, you have to work like hell to cram all those instructions in >because things worked out differently, and probably almost no one uses >those instructions anyway. Actually, as I follow the 68020 to the 68030 to the 68040 I find a number of deleted instructions as well as added insrtuctions at each move. The added instructions are either essential (MMU support) or not (BKPT). The deleted instructions are of the same nature. That this requires rewriting code is just one of those things you're supposed to put up with to maintain application compatibility. The 680x0 doesn't have a lot of really complex instructions like the 80x86 family. The string handling instructions of the 80x86 come to mind as something best implemented as library functions. The context switching instructions of the 80286 are sufficiently involved as to be part of a large function. I think one of the problems with the incredibly complex instructions for OS stuff in the 80286 and beyond, is that you may not WANT to implement your OS the way they designed the instruction. Another problem is that it takes learning about the instruction set that much harder and longer. This results in delays in implementation and bugs. Then, also, as pointed out above, you've got compatibility issues if you actually wrote your OS around their specific hardware concept of how context switching, should be done.
john@iastate.edu (Hascall John Paul) (04/17/91)
In article <7207@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: }>If a hardware polynomial evaluation takes longer than an explicit loop, }>it is not the fault of the instruction, but of the implementation. }What if a hardware polynomial evaluation takes longer than an *unrolled* }loop, with the zero-coefficient terms taken out? Or do you have an }implementation in mind that takes zero cycles to skip the terms with }zero coefficients? Doesn't seem so impossible: POLY CMASK,CTABLE,X,RESULT where the CMASK operand is a bitmask which indicates which elements of the CTABLE array are non-zero. [probably this instruction would still be a waste of real estate for most] John -- John Hascall An ill-chosen word is the fool's messenger. Project Vincent Iowa State University Computation Center john@iastate.edu Ames, IA 50011 (515) 294-9551
mjs@hpfcso.FC.HP.COM (Marc Sabatella) (04/17/91)
>no language he [ Herman Rubin, actually ] >knew of had semantics for retrieving both the integer quotient and the >remainder from a floating divide, the point being that both values are >usually available from any implementation of floating point divide, but >that the language stands in the way of getting them at the same time. In addition to Common LISP, Forth also has this ability. In fact, it comes about as close to Herman's idealized "user definable syntax macro processor" as any language I can think of.
robertsw@gtephx.UUCP (Wild Rider) (04/18/91)
In article <1991Apr12.001324.5@ux1.cso.uiuc.edu> msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) writes: >The problem is that you will have to include those instructions in the >next version of your processor in order to remain binary compatible, >and if things work out differently, as they probably will, they may be >hard to implement. That happens in microcoded machines: we have some >extra microcode store space, lets add some instructions. Next >version, you have to work like hell to cram all those instructions in >because things worked out differently, and probably almost no one uses >those instructions anyway. or, you could do like motorola did when they designed the 030: dump a couple of unused 020 instructions, i.e., the "callm" ("call module") and the corresponding "rtm" ("return from module") instructions. yes, that's correct, you can have an executable which runs _only_ on the 020, if it uses those instructions. i guess motorola called up all their major 020 customers and asked them: "do you use the callm/rtm instructions?" apparently, the answer was "no" from every customer so motorola dumped the 2 unused instructions. i just dug out my "mc68020 user's manual" and looked up the "callm" and "rtm" instructions... yuk. no wonder they dumped them. talk about "complex instructions" ... anyway, i'll bet moto's designers freed up quite a bit of microcode real estate when they released those 2 losers. -- Wallace Roberts, AG (formerly GTE) Communication Systems, Phoenix, AZ UUCP: ...!{ncar!noao!asuvax | uunet!zardoz!hrc | att}!gtephx!robertsw Internet: gtephx!robertsw@asuvax.eas.asu.edu Bike: '82 GS1100L Suz voice: (602)581-4555 fax: (602)582-7624 Cage: '89 Mustang GT
shankar@hpcupt3.cup.hp.com (Shankar Unni) (04/20/91)
In comp.arch, robertsw@gtephx.UUCP (Wild Rider) writes: > or, you could do like motorola did when they designed the 030: > dump a couple of unused 020 instructions, i.e., the "callm" > ("call module") and the corresponding "rtm" ("return from > module") instructions. yes, that's correct, you can have an That's surely an oversimplification. What usually happens is that those instructions will raise some sort of "unimplemented instruction" (or so-called "assist") exception, which is different from the run-of-the-mill "illegal instruction", moving the responsibility for simulating those instructions to the kernel (or the user program, if you are running stand-alone embedded programs). This sort of thing is commonly done with certain types of floating-point instructions. ----- Shankar Unni E-Mail: HP India Software Operation, Bangalore Internet: shankar@cup.hp.com Phone : +91-812-261254 x417 UUCP: ...!hplabs!hpda!shankar
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (04/22/91)
In article <7184@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >And what about some of the >more elaborate procedure-calling, procedure-entry, or procedure-exit >instructions? > >Admittedly, to some extent, the problems with the more elaborate >features may come either from 1) sloppy implementations of them and >2) using the elaborate features even when inappropriate The problem with an elaborate procedure calling convention is that you *have* to do it that way - by emulation if not by use of the fancy instructions. To do otherwise is to confuse debuggers, exception/longjmp packages, and smart linkers, not to mention breaking separate compilation. Actually, the worst calling convention I ever dealt with had none of those problems. IBM 360 PL/I-F didn't *have* a debugger, smart linkers hadn't been invented, and the exception package was easily tacked on. The reason it was easy was because the machine code for a procedure prologue, contained (get this) a *call* to the runtime package... -- Don D.C.Lindsay Carnegie Mellon Robotics Institute
meissner@osf.org (Michael Meissner) (04/22/91)
In article <12740@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes: | Actually, the worst calling convention I ever dealt with had none of | those problems. IBM 360 PL/I-F didn't *have* a debugger, smart | linkers hadn't been invented, and the exception package was easily | tacked on. The reason it was easy was because the machine code for a | procedure prologue, contained (get this) a *call* to the runtime | package... If I remember correctly, the Unix V6 (Ritchie) compiler did this too.... -- Michael Meissner email: meissner@osf.org phone: 617-621-8861 Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142 Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
peter@ficc.ferranti.com (peter da silva) (04/22/91)
In article <12740@pt.cs.cmu.edu>, lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes: > The reason it was easy was because the machine code for a > procedure prologue, contained (get this) a *call* to the runtime > package... The K&R C compiler for the PDP-11 did this, too. And it did returns with a "jmp cret". You want a weird calling convention, try the 1802. A subroutine call is made by changing which register is acting as the PC: all subroutines were coroutines. To get stack calling you set up a pair of call-return subroutines, burned a couple registers as their start address, and called them with SEP 4 and SEP 5. The only machine I know for which Forth (particularly token indirect threaded code) was faster than regular subroutine calls using the Standard Call/Return technique. (I know, you're all tired of me flaming about the 1802. But it was a really CUTE micro. Sort of like a boutique car.) -- Peter da Silva. `-_-' peter@ferranti.com +1 713 274 5180. 'U` "Have you hugged your wolf today?"
henry@zoo.toronto.edu (Henry Spencer) (04/23/91)
In article <MEISSNER.91Apr21153714@curley.osf.org> meissner@osf.org (Michael Meissner) writes: >| ... The reason it was easy was because the machine code for a >| procedure prologue, contained (get this) a *call* to the runtime >| package... > >If I remember correctly, the Unix V6 (Ritchie) compiler did this too... All instantiations of the Ritchie compiler did it, in fact. On the whole, it made sense: it saved code space, which was usually more of an issue on the 11 than execution time. And it wasn't actually that much slower than inlining the relevant code, given that we're talking about machines that had little or no pipelining. -- And the bean-counter replied, | Henry Spencer @ U of Toronto Zoology "beans are more important". | henry@zoo.toronto.edu utzoo!henry
halkoD@batman.moravian.EDU (David Halko) (04/23/91)
In article <9782@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > In article <1991Apr5.172533.6717@agate.berkeley.edu>, doug@eris.berkeley.edu (Doug Merritt) writes: > > In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes: > > > He observes that "Few compilers use any of the nifty instructions that > > > a CISC has." I believe I read recently that RISC was based on the > > > observation that, in fact, only about 30% of the instructions in CISC > > > computers were used by compilers. The rest of the instructions, for > > > all practical purposes, were just excess baggage. > ......................... > > Anyway, the point is that even if compilers *do* use every possible feature > > of a CISC, it still usually won't make the CISC s/w competitive with RISC > > s/w. CISC cpus almost never put sufficient h/w optimization into the support > > of the fancy instructions. (There are exceptions to this, of course, and I > > haven't been watching the 030 and 040 to see how they do in this regard. > > CISC may yet rise again, but not until after superscalar RISC has been > > completely exploited.) > ......................... > If the languages do not allow the user to communicate the use of those > features of the CISC architecture which can make the object code considerably > faster, the compiler cannot take advantage of them. This IS the situation. > For example, suppose there was an instruction, which could be effectively > used, which would divide a float by a float, obtaining an integer quotient > and a float remainder. Now just how is the compiler to recognize that this > is in fact wanted? > ......................... What do you mean how is a compiler suppoosed to recognize that this is wanted? AT&T came out with a DSP board which utilized neato CPU features in its high level languages... since floating point multiply was so much faster than integer multiply, it would take integers, convert them to float, do the floating point multiplication, and convert back to integer again. Man, talk about fast and nifty. This was all handled internally, inside the C Compiler. It seems quite useful for integer division to do the same, here. There were, however, some minor flaws in the Compiler which took away from some of it's efficiency. I used to use global search and replace with specific patterns to optimize the code I produced... > There is, at present, no language designed to take into account the collection > of "natural" useful hardware which the present users can find uses for, and > which are far more expensive in software than in hardware. I think it may be more of a question of "Are CISC features being fully realized by compiler designers?" I wonder if RISC architecture is all what it is cracked up to be. I work on a SUN SS1, and it is not all that much faster than a 3/60. Quite often, I use the 3/60's we have instead. -- _______________________________________________________________________ / \ / "The use of COBOL cripples the mind; its teaching should, therefore, be \ / regarded as a criminal offence." E.W.Dijkstra, 18th June 1975. \ +-----------------------------------------------------------------------------+ \ "Have you purchased a multi- halkoD@moravian.edu Have you booted your / \ media machine from IMS yet?" David J. Halko OS-9 computer today?/ \_______________________________________________________________________/
barmar@think.com (Barry Margolin) (04/23/91)
In article <1991Apr23.151932.2125@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes: >In article <MEISSNER.91Apr21153714@curley.osf.org> meissner@osf.org (Michael Meissner) writes: >>| ... The reason it was easy was because the machine code for a >>| procedure prologue, contained (get this) a *call* to the runtime >>| package... >>If I remember correctly, the Unix V6 (Ritchie) compiler did this too... >All instantiations of the Ritchie compiler did it, in fact. On the whole, >it made sense: it saved code space, which was usually more of an issue on >the 11 than execution time. And it wasn't actually that much slower than >inlining the relevant code, given that we're talking about machines that >had little or no pipelining. On Multics, external procedures also make a call to the runtime system (called "operators", for some reason -- I guess to distinguish the library routines that the compiler "knows" about from the ones that it doesn't) during the standard entry sequence. Besides reducing code space, it is a very useful feature of the software development environment. The operator procedures are addressed through a transfer vector. Thus, alternate versions of the operators could be dynamically linked into the process (similar to the way personal computer customizations patch trap vectors). The most notable uses of this are the procedure call tracing mechanism and the metering facility. On most systems, one must compile a program with a special option in order to make it maintain metering statistics; on Multics, one just issues a command that switches operators, runs the program, and then switch back to the regular operators (there's a program that combines all this). For call tracing, one switches to versions of the entry and exit operators that display a message. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (04/25/91)
In article <1991Apr23.162814.20093@Think.COM> barmar@think.com (Barry Margolin) writes: | ... The reason it was easy was because the machine code for a | procedure prologue, contained (get this) a *call* to the runtime | package... >...alternate versions of the operators could be >dynamically linked into the process (similar to the way personal computer >customizations patch trap vectors). The most notable uses of this are the >procedure call tracing mechanism and the metering facility. Another debugging feature available via this trick is break-when- expression-true. The expression is of course only evaluated on every procedure entry, not on every cycle. However, my personal experience (on a DEC 20) was very postive. I was usually trying to find out which routine was using a wild pointer, so per-call was the perfect granularity. Of course, there are other ways to achieve the same effect on vanilla hardware. Aside from page protection cleverness, one can run a snoop process on one of the other CPUs. (Multiprocessors *are* vanilla, right?) -- Don D.C.Lindsay Carnegie Mellon Robotics Institute
meissner@osf.org (Michael Meissner) (04/26/91)
In article <4082@batman.moravian.EDU> halkoD@batman.moravian.EDU (David Halko) writes: | I wonder if RISC architecture is all what it is cracked up to be. I work on | a SUN SS1, and it is not all that much faster than a 3/60. Quite often, I | use the 3/60's we have instead. From your posting, I would guess that you use a lot of integer multiplies (non-constant in particular). The initial Sparc hardware does not have an integer multiply or divide instruction. It has a multiply step instruction (not sure about divide) to give a partial answer. -- Michael Meissner email: meissner@osf.org phone: 617-621-8861 Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142 Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/27/91)
In article <4082@batman.moravian.EDU>, halkoD@batman.moravian.EDU (David Halko) writes: > In article <9782@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > > In article <1991Apr5.172533.6717@agate.berkeley.edu>, doug@eris.berkeley.edu (Doug Merritt) writes: > > > In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes: > > > > He observes that "Few compilers use any of the nifty instructions that > > > > a CISC has." I believe I read recently that RISC was based on the > > > > observation that, in fact, only about 30% of the instructions in CISC > > > > computers were used by compilers. The rest of the instructions, for > > > > all practical purposes, were just excess baggage. > > ......................... > > > Anyway, the point is that even if compilers *do* use every possible feature > > > of a CISC, it still usually won't make the CISC s/w competitive with RISC > > > s/w. CISC cpus almost never put sufficient h/w optimization into the support > > > of the fancy instructions. (There are exceptions to this, of course, and I > > > haven't been watching the 030 and 040 to see how they do in this regard. > > > CISC may yet rise again, but not until after superscalar RISC has been > > > completely exploited.) > > ......................... > > If the languages do not allow the user to communicate the use of those > > features of the CISC architecture which can make the object code considerably > > faster, the compiler cannot take advantage of them. This IS the situation. > > For example, suppose there was an instruction, which could be effectively > > used, which would divide a float by a float, obtaining an integer quotient > > and a float remainder. Now just how is the compiler to recognize that this > > is in fact wanted? > > ......................... > What do you mean how is a compiler suppoosed to recognize that this is wanted? I mean exactly that. I am assuming that the operations are done by means of a sequence of instructions, and not a call to some (possibly inlined) function. I know of NO hardware on which I would consider calls to be reasonable where short sequences of code will do the job. It is not the case that the algorithm to be used is the same on different machines. Hardware differences can cause major problems, as was seen when cache/paging first came in; an algorithm for matrix multiplication which would seem to be better from the timing standpoint (same operations, fewer loads, far fewer stores) caused page faults on the great bulk of its loads, and thus became non-competitive for large matrices. Consider the following two operations, which comprise more than 25% of the operations in some highly efficient, from the "picocode" standpoint, methods for generating non-uniform random numbers. 1. Examine a bit, and take some action depending on it. Also, the next time this is to be done, this bit effectively no longer exists. It is necessary to make provision at least somewhere for replenishing the supply of bits. 2. Find the distance to the next one in a bit stream. The same considerations as above apply. Try coding this in any present language, including assembler, in such a way that a compiler can be expected to recognize one of these without being explicitly instructed. I am not saying that this should not be the case. But it is the user, not the language designer, who has to do this. Whatever language, present or future, is used, human beings are going to come up with extremely simple operations which the language designers have not thought of, and for which doing those operations using the language primitives will be clumsy. The user must be able to instruct the compiler about these operations in a manner similar to the way it now looks at addition and multiplication. If there are competing ways to do something, the compiler must know the available variety. This is even the case for adding vectors in C. > AT&T came out with a DSP board which utilized neato CPU features in its > high level languages... since floating point multiply was so much faster > than integer multiply, it would take integers, convert them to float, do > the floating point multiplication, and convert back to integer again. Man, > talk about fast and nifty. This was all handled internally, inside the C > Compiler. It seems quite useful for integer division to do the same, here. > There were, however, some minor flaws in the Compiler which took away from > some of it's efficiency. I used to use global search and replace with specific > patterns to optimize the code I produced... If this was the case, the hardware was deficient. To a mathematician, floating point arithmetic is a complicated version of fixed point arithmetic. This is a point which the RISC designers do not see; the chip real estate for doing good integer arithmetic is already there in the floating point stuff, and the design should be such as to use it. Floating point arithmetic uses fixed point internally, so this would not be a problem AT THE TIME OF THE DESIGN OF THE ARCHITECTURE. It is a problem afterward. > > There is, at present, no language designed to take into account the collection > > of "natural" useful hardware which the present users can find uses for, and > > which are far more expensive in software than in hardware. > > I think it may be more of a question of "Are CISC features being fully realized by > compiler designers?" Have any language designers or hardware designers ever asked for input on the variety of operations which are readily understood by people like Silverman, Lehmer, etc.? Have language designers considered that vastly different algorithms should be used with different architectures? Most optimization procedures which I have seen assume not. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
bengtl@maths.lth.se (Bengt Larsson) (04/28/91)
In article <11411@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >In article <4082@batman.moravian.EDU>, halkoD@batman.moravian.EDU (David Halko) writes: >> What do you mean how is a compiler suppoosed to recognize that this is wanted? > >I mean exactly that. I am assuming that the operations are done by means of >a sequence of instructions, and not a call to some (possibly inlined) function. >I know of NO hardware on which I would consider calls to be reasonable where >short sequences of code will do the job. What is the difference between an _inlined_ function and a "short sequence of code"? Explanation, please. >It is not the case that the algorithm to be used is the same on different >machines. Hardware differences can cause major problems, as was seen when >cache/paging first came in; ... This no news. >1. Examine a bit, and take some action depending on it. Also, the next time >this is to be done, this bit effectively no longer exists. It is necessary >to make provision at least somewhere for replenishing the supply of bits. You are being obscure. What is this if not a shift instruction? >2. Find the distance to the next one in a bit stream. The same considerations >as above apply. True. And some hardware has instructions for "find first set bit". The Vax has, for example. That may be a useful addition, though it's somewhat specialized. Useful for cryptography, I hear. >Try coding this in any present language, including assembler, in such a way >that a compiler can be expected to recognize one of these without being >explicitly instructed. It isn't doable to make a compiler which can recognize _anything_ without being instructed. If you want an extensible language, you'll have to show how it can be made extensible. I can assure you that it isn't quite as easy as it looks. >I am not saying that this should not be the case. But it is the user, not >the language designer, who has to do this. Whatever language, present or >future, is used, human beings are going to come up with extremely simple >operations which the language designers have not thought of, and for which >doing those operations using the language primitives will be clumsy. This is utterly obvious. >The user must be able to instruct the compiler about these operations >in a manner similar to the way it now looks at addition and multiplication. OK, you like infix notation. >If there are competing ways to do something, the compiler must know the >available variety. This is even the case for adding vectors in C. You don't seem to know anything about the problems in what you are talking about. >If this was the case, the hardware was deficient. To a mathematician, floating >point arithmetic is a complicated version of fixed point arithmetic. Hardware integer multiply should probably be provided for, yes. >Have any language designers or hardware designers ever asked for input on the >variety of operations which are readily understood by people like Silverman, >Lehmer, etc.? Why do you assume that people should come and ask you (a matematician) about computer language development? What makes you an expert in the field? Bengt L. -- Bengt Larsson - Dep. of Math. Statistics, Lund University, Sweden Internet: bengtl@maths.lth.se SUNET: TYCHE::BENGT_L
preston@ariel.rice.edu (Preston Briggs) (04/28/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > If the languages do not allow the user to communicate the use of those > features of the CISC architecture which can make the object code considerably > faster, the compiler cannot take advantage of them. This IS the situation. > For example, suppose there was an instruction, which could be effectively > used, which would divide a float by a float, obtaining an integer quotient > and a float remainder. Now just how is the compiler to recognize that this > is in fact wanted? In Fortran, you might write iq = f1 / f2 and a while later (during which f1 and f2 are not changed) fr = amod(f1, f2) The compiler could recognize, during value numbering, that an quotient and remainder of the same operands was being computed. During instruction selection (or perhaps peephole optimization), it could recognize that it could avoid the real2int conversion of the quotent by using the "hrubin" instruction. Not everyone does these things the way I've described them, but many compilers can achieve the same effect, one way or another. Preston Briggs
preston@ariel.rice.edu (Preston Briggs) (04/28/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >Consider the following two operations, which comprise more than 25% of the >operations in some highly efficient, from the "picocode" standpoint, methods >for generating non-uniform random numbers. > >1. Examine a bit, and take some action depending on it. > Also, the next time this is to be done, this bit effectively > no longer exists. It is necessary to make provision at least > somewhere for replenishing the supply of bits. > >2. Find the distance to the next one in a bit stream. > The same considerations as above apply. These sort of things are certainly CISCish and would be difficult to recognize in a compiler. However, I would argue that they are possibly misrepresented. How about a linked-list (oh no!) of integers. An element in the list represents a "1" bit. The integer is number of "zero" bits preceding the 1. The stream starting with 001010000001... would look like (2, 1, 6, ...) So, a single memory access per 1 bit. The "distance to next 1 bit" questions can be answered immediately. The "examine a bit and take an action" can be done with some of those fancy "decrement, test, and branch" instructions people use for closing loops. Alternatively, ... It seems like there's something producing the bits and something consuming them and making decisions based on them. Why not have the producer simply call the alternate choices in the consumer directly? Generally though, my critisism is that you're thinking very low-level. You decided how you want these operations represented and you seem unwilling to talk about them at a level above "bit stream". If you must think this way, I expect you're going to have to code in the same fashion; that is, in assembly. Preston Briggs
hrubin@pop.stat.purdue.edu (Herman Rubin) (04/29/91)
In article <1991Apr28.154603.8003@rice.edu>, preston@ariel.rice.edu (Preston Briggs) writes: > hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > > > If the languages do not allow the user to communicate the use of those > > features of the CISC architecture which can make the object code considerably > > faster, the compiler cannot take advantage of them. This IS the situation. > > For example, suppose there was an instruction, which could be effectively > > used, which would divide a float by a float, obtaining an integer quotient > > and a float remainder. Now just how is the compiler to recognize that this > > is in fact wanted? > > In Fortran, you might write > > iq = f1 / f2 > > and a while later (during which f1 and f2 are not changed) > > fr = amod(f1, f2) > > The compiler could recognize, during value numbering, that an > quotient and remainder of the same operands was being computed. > During instruction selection (or perhaps peephole optimization), > it could recognize that it could avoid the real2int conversion > of the quotent by using the "hrubin" instruction. > > Not everyone does these things the way I've described them, but many > compilers can achieve the same effect, one way or another. Any specific instance of this can be added to the compiler's ken, but until it does, the compiler cannot take advantage of it. The point is that the compiler must be instructed to look for them. As to what it can do with such instructions, this depends very strongly on what the hardware provides. The real2int conversion still has to be done on most machines, and also an integerize of real. But suppose that the machine has integer hardware comparable to floating hardware, and can also handle unnormalized floats. Then one could put a shifted significand of f1 in a double register, divide by the significand of f2, setting iq to be the quotient, and the remainder with the exponent of f2, and then normalized, as fr. Now I wonder how many of the readers would think of doing it this way? And how would the hardware designers recognize that putting the operation in hardware might be a good idea? The individual compiler might or might not see it. The community definitely would not. Now, how would you write the code for doing a conditional transfer efficiently on the next bit of a bit stream, which can be extended (usually by a subroutine call) when necessary? It is unlikely that one will know, without querying, just where one is in the stream. If you wrote the code, would someone else so recognize it? -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
jesup@cbmvax.commodore.com (Randell Jesup) (04/29/91)
In article <11411@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >If this was the case, the hardware was deficient. To a mathematician, floating >point arithmetic is a complicated version of fixed point arithmetic. This is a >point which the RISC designers do not see; the chip real estate for doing good >integer arithmetic is already there in the floating point stuff, and the design >should be such as to use it. Floating point arithmetic uses fixed point >internally, so this would not be a problem AT THE TIME OF THE DESIGN OF THE >ARCHITECTURE. It is a problem afterward. The RPM-40 used the floating point HW for integer multiple and divide, and I'm certain at least a few others have also. However, don't do this if you have only a couple FP registers (a problem the rpm40 had). -- Randell Jesup, Keeper of AmigaDos, Commodore Engineering. {uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com BIX: rjesup Disclaimer: Nothing I say is anything other than my personal opinion. Thus spake the Master Ninjei: "To program a million-line operating system is easy, to change a man's temperament is more difficult." (From "The Zen of Programming") ;-)
preston@ariel.rice.edu (Preston Briggs) (04/29/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) wrote > If the languages do not allow the user to communicate the use of those > features of the CISC architecture which can make the object code considerably > faster, the compiler cannot take advantage of them. This IS the situation. > For example, suppose there was an instruction, which could be effectively > used, which would divide a float by a float, obtaining an integer quotient > and a float remainder. Now just how is the compiler to recognize that this > is in fact wanted? And I described how it could be done, using Fortran 77 and pretty standard optimization (value numbering and peephole optimization). hrubin@pop.stat.purdue.edu (Herman Rubin) replies: >Any specific instance of this can be added to the compiler's ken, but >until it does, the compiler cannot take advantage of it. The point is >that the compiler must be instructed to look for them. Yes indeed. That's the job of the compiler writer. If he does a bad job in using the instruction set of the machine, you should complain to him. >The real2int conversion still has to be done on most machines, and also >an integerize of real. But suppose that the machine has integer hardware >comparable to floating hardware, and can also handle unnormalized floats. Well, we can suppose all sorts of things. Nevertheless, my solution handles your first set of suppositions. If you change the machine, you need to change compilers. I thought your original question was posed in an attempt to show that many languages don't have the facilities you consider necessary to control the machine. I just tried to show that the effect you wanted could be achieved. Preston Briggs
bob@tera.com (Bob Alverson) (04/29/91)
In article <11489@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >[stuff deleted] >The real2int conversion still has to be done on most machines, and also >an integerize of real. But suppose that the machine has integer hardware >comparable to floating hardware, and can also handle unnormalized floats. >Then one could put a shifted significand of f1 in a double register, >divide by the significand of f2, setting iq to be the quotient, and >the remainder with the exponent of f2, and then normalized, as fr. >Now I wonder how many of the readers would think of doing it this way? >And how would the hardware designers recognize that putting the operation >in hardware might be a good idea? > Gee, and what does the hardware do if the division is 1e300 / 1e-300? It's got to extract about 2000 bits of quotient before it gets all the integer bits. I take it you really want iq = (f1 / f2) % some-power-of-two Since continuable instructions don't mesh well with pipelined systems, most hardware would only calculate 53 or maybe 64 quotient bits at a time. For example, look at the remainder *sequence* used on the 80x87. Nobody wants to wait for the full calculation to finish before taking an interrupt or context switching. Strange as it may seem, the dividers in today's computers can compute the correct quotient without having the correct remainder around to return to the user. Fortunately, some can compute the correct remainder in one operation with a fused multiply add. >-- >Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 >Phone: (317)494-6054 >hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP) Bob Alverson, Tera Computer Resident arithmetic lunatic
glew@pdx007.intel.com (Andy Glew) (05/01/91)
Just thought that I would juxtapose two posts to this newsgroup: Strange as it may seem, the dividers in today's computers can compute the correct quotient without having the correct remainder around to return to the user. Fortunately, some can compute the correct remainder in one operation with a fused multiply add. Bob Alverson, Tera Computer Resident arithmetic lunatic And: From the Program for the IEEE Symposium on Computer Arithmetic: 8.1 ``Integer Division Using Reciprocals'' Robert Alverson, Tera Computer Company Nuff said? (NB. I'm on Bob Alverson's side. This sort of thing is good.) -- Andy Glew, glew@ichips.intel.com Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497 This is a private posting; it does not indicate opinions or positions of Intel Corp.
herrickd@iccgcc.decnet.ab.com (05/04/91)
In article <1991Apr28.151831.2768@lth.se>, bengtl@maths.lth.se (Bengt Larsson) writes: > In article <11411@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu > (Herman Rubin) writes: >>Have any language designers or hardware designers ever asked for input on the >>variety of operations which are readily understood by people like Silverman, >>Lehmer, etc.? > > Why do you assume that people should come and ask you (a matematician) > about computer language development? What makes you an expert in the > field? > He is a customer. Unfortunately, he does not represent a large enough subset of their customers. Unfortunately because he is better informed than many other customers and all the customer base might be better served if Henry's ideas could penetrate the language design community. Not talking to the mathematicians let the FORTRAN design team at IBM put the label REAL on a finite subset of the rationals. We'll never displace floating point numbers now, but what might we have if the game theorist had spent more time talking with number theorists back around 1940-1945? How many engineering juniors believe that DO I = 1, 50000 SUM = SUM + DATA (I) END DO computes a good approximation of the sum of 50000 numbers they measured somehow? Could we have realized in hardware a number system that does give us a commutative addition? I think there is good reason for the hardware and software architects to listen to the number theorists. dan herrick herrickd@iccgcc.decnet.ab.com
herrickd@iccgcc.decnet.ab.com (05/04/91)
In article <4474.28219270@iccgcc.decnet.ab.com>, I said: > > Could we have realized in hardware a number system that does give us > a commutative addition? > after giving the example that shows floating point arithmetic is not associative. But the fix for the algorithm is to sort the input data - doesn't that suggest the word "commutative" to you, too? I'm still > dan herrick > herrickd@iccgcc.decnet.ab.com
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (05/08/91)
In article <11411@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > In article <4082@batman.moravian.EDU>, halkoD@batman.moravian.EDU (David Halko) writes: > > In article <9782@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > > > In article <1991Apr5.172533.6717@agate.berkeley.edu>, doug@eris.berkeley.edu (Doug Merritt) writes: > > > > In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes: I had to leave the above 4 lines in, just cause it looks so awesome to see a discussion going so many levels deep :^) ... > Consider the following two operations, which comprise more than 25% of the > operations in some highly efficient, from the "picocode" standpoint, methods > for generating non-uniform random numbers. > > 1. Examine a bit, and take some action depending on it. Also, the next time > this is to be done, this bit effectively no longer exists. It is necessary > to make provision at least somewhere for replenishing the supply of bits. > > 2. Find the distance to the next one in a bit stream. The same considerations > as above apply. I have got to ask. Why is it so generally important that the distance between bits can be determined efficiently. Note that I want to know, `why is it important to ME, and to the general computing base?'. This is the question that hardware designers and compiler writers ask themselves. For the hardware/software designer to incorporate the features you want, it is not enough to know that they are important to YOU. From a business case, the only reason that they care if you are happy is if you give them money. If the desired features are generally useful, then the probability is high that inclusion of such features will generate lots of money. It seems to me that the only way your crusade for bit distance will ever be successful is to show the general utility of such functions. Otherwise, no matter how interesting (to you), you will not get these features. Anyway, please tell us. What is so interesting/useful about non-uniform random numbers? -- Terry Ingoldsby ingoldsb%ctycal@cpsc.ucalgary.ca Land Information Services or The City of Calgary ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
stephen@pesto.uchicago.edu (Stephen P Spackman) (05/08/91)
In article <653@ctycal.UUCP> ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
It seems to me that the only way your crusade for bit distance will ever be
successful is to show the general utility of such functions. Otherwise, no
matter how interesting (to you), you will not get these features.
Are we talking about "find first one" here?
Find first one was a pretty useful instruction in our lisp compiler.
In fact, we figured we could speed the thing up a LOT by moving to the
Amiga and using the blitter to do bitmask stuff (which we wanted (a)
for closure algorithms and (b) for resource allocation).
A lot of CPU seems to get expended on variable length data encoding
around here. I haven't examined the code myself but it sounds like it
would come up there.
It's a notoriously useful thing for resource allocation in operating
systems, too.
I've surely wanted this more often than, say, floating point addition.
Of course, there are always other data representations, and so long as
the hardware people take the attitude that computers are for fluid
dynamics and to hell with compilers, operating systems and databases
(who uses computers for THAT stuff anyway? :-)....
Lotsa smilies.
----------------------------------------------------------------------
stephen p spackman Center for Information and Language Studies
systems analyst University of Chicago
----------------------------------------------------------------------
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/08/91)
In article <653@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: > In article <11411@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > > In article <4082@batman.moravian.EDU>, halkoD@batman.moravian.EDU (David Halko) writes: > > > In article <9782@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > > > > In article <1991Apr5.172533.6717@agate.berkeley.edu>, doug@eris.berkeley.edu (Doug Merritt) writes: > > > > > In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes: > I had to leave the above 4 lines in, just cause it looks so awesome to see > a discussion going so many levels deep :^) ... > > Consider the following two operations, which comprise more than 25% of the > > operations in some highly efficient, from the "picocode" standpoint, methods > > for generating non-uniform random numbers. > > 1. Examine a bit, and take some action depending on it. Also, the next time > > this is to be done, this bit effectively no longer exists. It is necessary > > to make provision at least somewhere for replenishing the supply of bits. > > 2. Find the distance to the next one in a bit stream. The same considerations > > as above apply. > I have got to ask. Why is it so generally important that the distance between > bits can be determined efficiently. Note that I want to know, `why is it > important to ME, and to the general computing base?'. This is the question > that hardware designers and compiler writers ask themselves. I believe most people are aware of the existence of simulation, including Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise intractable problems. In these, it is almost always the case that considerable use must be made of non-uniform random variables; the quantities of interest are usually exponential, normal, Cauchy, binomial, Poisson, etc., and not uniform. In any particular problem, it is quite likely that thousands or millions or even billions of these random numbers will be used. If this is not a situation which calls for efficiency, it will do until a real one comes along. :-) -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
wright@Stardent.COM (David Wright) (05/09/91)
In article <653@ctycal.UUCP> ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: >In article <11411@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >> In article <4082@batman.moravian.EDU>, halkoD@batman.moravian.EDU (David Halko) writes: >> > In article <9782@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >> Consider the following two operations, which comprise more than 25% of the >> operations in some highly efficient, from the "picocode" standpoint, methods >> for generating non-uniform random numbers. >> >> 1. Examine a bit, and take some action depending on it. Also, the next time >> this is to be done, this bit effectively no longer exists. It is necessary >> to make provision at least somewhere for replenishing the supply of bits. >> >> 2. Find the distance to the next one in a bit stream. The same >> considerations as above apply. > >I have got to ask. Why is it so generally important that the distance between >bits can be determined efficiently. Note that I want to know, `why is it >important to ME, and to the general computing base?'. This is the question >that hardware designers and compiler writers ask themselves. Hear hear. And this leads to a second question of HR, which has been posed once before but not answered (or I missed it): Why does your application have to work on a bit stream? What is wrong with instead computing the data about the bit stream, and passing the results along as, say, pairs (a,b), where a is the number of consecutive 0's and b the number of consecutive 1's in the current part of the stream? >Anyway, please tell us. What is so interesting/useful about non-uniform >random numbers? Well, if he's generating numbers that conform to some other distribution, the results certainly wouldn't be uniform random numbers (though I think this is usually done by *starting* with a uniform distribution and then manipulating that into, say, a Gaussian). -- David Wright, not officially representing Stardent Computer Inc wright@stardent.com or uunet!stardent!wright Groucho: America is waiting to hear him sing! Chico: Well, he can sing pretty loud, but not that loud. -- A Night at the Opera
chip@tct.com (Chip Salzenberg) (05/09/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin): >In article <653@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: >> I have got to ask. Why is it so generally important that the distance >> between bits can be determined efficiently. Note that I want to know, >> `why is it important to ME, and to the general computing base?'. > >I believe most people are aware of the existence of simulation, including >Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise >intractable problems. That kind of problem is not "generally important" in that it does not come up in the "general computing base". Care to try again? -- Brand X Industries Sentient and Semi-Sentient Being Resources Department: Because Sometimes, "Human" Just Isn't Good Enough [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
gideony@microsoft.UUCP (Gideon YUVAL) (05/10/91)
In article <STEPHEN.91May8001742@pesto.uchicago.edu> stephen@pesto.uchicago.edu (Stephen P Spackman) writes: >Are we talking about "find first one" here? > >Find first one was a pretty useful instruction in our lisp compiler. If you start from the low-order bit, you can find the first "one" bit by: table[ (x XOR (x-1)) MOD 37] On a 386, this is 4X as fast (worst-case) as the "hardware" BSF. -- Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/10/91)
In article <28297C23.6984@tct.com>, chip@tct.com (Chip Salzenberg) writes: > According to hrubin@pop.stat.purdue.edu (Herman Rubin): > >In article <653@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: > >> I have got to ask. Why is it so generally important that the distance > >> between bits can be determined efficiently. Note that I want to know, > >> `why is it important to ME, and to the general computing base?'. > >I believe most people are aware of the existence of simulation, including > >Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise > >intractable problems. > That kind of problem is not "generally important" in that it does not > come up in the "general computing base". That something is not important to everyone is no reason no to make it available to those who can see a use for it. How much of the driving public uses the low gears of automatic transmissions? -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (05/10/91)
Chip Salzenberg writes: #> That kind of problem is not "generally important" in that it does not #> come up in the "general computing base". Herman Rubin writes: #That something is not important to everyone is no reason no to make it #available to those who can see a use for it. Chip didn't say something had to be important for everyone before making it available, Chip said it had to be important to the general base. Those aren't the same thing. Herman, you seem to have quite specialized computer needs, and more knowledge about computers than the average person, but the types of things you do are not useful to J. Random User, so people who write compilers for general computer users don't find it cost effective to add your optimizations into their code. As a radio-astronomer, I have specialized needs in computing also. I don't complain that there is no commodity language to reduce my data, instead I (and other radio-astronomers) write all our code ourselves. Our market is not large enough to expect off the shelf solutions to our problems. Why don't you apply for a grant, and hire someone to write a language for you? -- -Greg Hennessy, University of Virginia USPS Mail: Astronomy Department, Charlottesville, VA 22903-2475 USA Internet: gsh7w@virginia.edu UUCP: ...!uunet!virginia!gsh7w
rwa@cs.athabascau.ca (Ross Alexander) (05/11/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: [much chat about whether Herman's apps are typical elided] >How much of the driving public uses the low gears of automatic transmissions? All of them, Herman. Whenever they accelerate from a stop. That's what an automatic transmission is for. -- Ross Alexander rwa@cs.athabascau.ca (403) 675 6311 ve6pdq `You were s'posed to laugh!' -- Zippy
guy@auspex.auspex.com (Guy Harris) (05/11/91)
>> I have got to ask. Why is it so generally important that the distance between >> bits can be determined efficiently. Note that I want to know, `why is it >> important to ME, and to the general computing base?'. This is the question >> that hardware designers and compiler writers ask themselves. > >I believe most people are aware of the existence of simulation, including >Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise >intractable problems. Yes, but unless the person who posed the original question does those kinds of simulations, it's not important to him, and unless a given member of the general computing base does them, it's not important to them, either. I.e., you answered an interesting question, namely "tell me a use for finding the distance to the next 1 in a bit stream", but you didn't answer the question that was asked. Now, there are other places where finding the distance to the next 1 in a bit stream is useful. It may well be that it's worth providing some kind of hardware assist for it; I'm curious what forms of "hardware assist" of that sort exist (other than doing it using the obvious simple-minded loop, but in microcode). Some questions then are "what other useful hardware would you have to sacrifice, in some given implementation, to add the hardware assist for 'find next 1'?" and "will we, at some point, not have to sacrifice anything really useful to get it?" so that you might want to put such an operation into the instruction set anyway, and have a non-assisted implementation early on. (Is 'find next 1' similar enough to floating-point normalization that the same hardware assistance can be used for both?" BTW, I assume that by "most people" in "I believe most people are aware of the existence of simulation..." you mean "most readers of this group", which may well be true. I don't know how "find first 1" comes in handy for those simulations, though.
jpk@ingres.com (Jon Krueger) (05/11/91)
In article <12054@mentor.cc.purdue.edu>, Herman Rubin does not address the question: >> Why is it so generally important that the distance between >> bits can be determined efficiently ... why is it >> important to ME, and to the general computing base? Instead he addresses the question "why is it so specifically ipmortant to Herman Rubin": > I believe most people are aware of the existence of simulation, including > Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise > intractable problems. No one asked that question. We believe it's important to you, Herman. Can you answer the question that was asked? Why is it important to the general computing base that the distance between bits can be determined efficiently? -- Jon -- Jon Krueger, jpk@ingres.com
rockwell@socrates.umd.edu (Raul Rockwell) (05/11/91)
Herman Rubin: >> Consider the following two operations, ... >> 1. Examine a bit, ... >> 2. Find the distance to the next one in a bit stream. Terry Ingoldsby: >I have got to ask. Why is it so generally important that the >distance between bits can be determined efficiently. David Wright: Hear hear. And this leads to a second question of HR, which has been posed once before but not answered (or I missed it): Why does your application have to work on a bit stream? Chip Salzenberg: That kind of problem is not "generally important" in that it does not come up in the "general computing base". harumph. At work we have a hand-optimized assembly routine to convert from a bit array to a list of indices. It gets used a lot. Please note that a bit list is very handy for dealing with selection problems. They are very compact, and have very predictable size requirements. Selection problems include highlighting text for emphasis, intermediate results during a database query, as well as Herman Rubin's probabalistic models. They're also handy in some domains for converting from some arbitrary list of integers to a unique, sorted list (but otherwise the same integers), in O(n). We use "bit lists as integers" for other things too (once you have a generic feature you tend to use it whenever it's handy -- particularly if it's fast). As for the problem of "distance between bits" vs "absolute position of bits" note that if you have a _lot_ of bits you start talking about very long lists of integers (and possibly even exceeding maxint, or whatever for the representation you're using). This is particularly painful if your bitlist is dense. Both have their uses (as does a list ordered pairs of the form (position, offset)), and we happen to use them a lot. Raul Rockwell
zalman@mips.com (Zalman Stern) (05/11/91)
In article <7738@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: [...] >Now, there are other places where finding the distance to the next 1 in >a bit stream is useful. It may well be that it's worth providing some >kind of hardware assist for it; I'm curious what forms of "hardware >assist" of that sort exist (other than doing it using the obvious >simple-minded loop, but in microcode). The IBM RS/6000 has a single cycle count leading zeros instruction. The hardware is also used by the multiplier to short circuit multiplies by smaller constants. The operation takes from 3 to 5 cycles depending on the size of the multiplier. The clz hardware is used to decide how much work it has to do. > >Some questions then are "what other useful hardware would you have to >sacrifice, in some given implementation, to add the hardware assist for >'find next 1'?" and "will we, at some point, not have to sacrifice >anything really useful to get it?" so that you might want to put such an >operation into the instruction set anyway, and have a non-assisted >implementation early on. A Count leading zeros (or ffs if you prefer) instruction is not a big deal other than that you have to design the hardware to do it. (I.e. it doesn't place unreasoanble demands on the register file, it doesn't eat lots of opcode space.) However its not clear that this single instruction does the job. You can either look for a 1 or a 0 bit and you can look from most significant to least significant or vice-versa. Herman probably wants all of these options... > >(Is 'find next 1' similar enough to floating-point normalization that >the same hardware assistance can be used for both?" Putting it in the FP unit makes it harder to uses the result for shifts and such on machines with seperate FP and integer units. (Hooking the integer register file up to the FP normalization hardware is going to be very painful.) My guess is that special purpose hardware would be useful for this sort of operation. DMA it in at bus speed and write the values into scratch RAM. Interrupt the processor when the scratch RAM gets full or the buffer gets empty. (Or DMA the scratch RAM back out too...) Go ahead and toss a Xlinx or something into the hardware to make it programable. -- Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088 zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman (408) 524 8395 "Never rub another man's rhubarb" -- the Joker via Pop Will Eat Itself
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/11/91)
In article <rwa.673895513@aupair.cs.athabascau.ca>, rwa@cs.athabascau.ca (Ross Alexander) writes: > hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > [much chat about whether Herman's apps are typical elided] > >How much of the driving public uses the low gears of automatic transmissions? > > All of them, Herman. Whenever they accelerate from a stop. That's > what an automatic transmission is for. To clarify, I meant the specific ability to place the transmission in low gears. Current computer hardware can do anything,assuming sufficient external storage, but not efficiently. Similarly with software. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (05/12/91)
Herman Rubin writes:
#To clarify, I meant the specific ability to place the transmission in low gears.
I do this every time it snows. I would have thought that Lafayette had
its fare share of snow.
--
-Greg Hennessy, University of Virginia
USPS Mail: Astronomy Department, Charlottesville, VA 22903-2475 USA
Internet: gsh7w@virginia.edu
UUCP: ...!uunet!virginia!gsh7w
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/12/91)
In article <1991May10.204426.24828@ingres.Ingres.COM>, jpk@ingres.com (Jon Krueger) writes: > In article <12054@mentor.cc.purdue.edu>, Herman Rubin does not address > the question: > >> Why is it so generally important that the distance between > >> bits can be determined efficiently ... why is it > >> important to ME, and to the general computing base? > > Instead he addresses the question "why is it so specifically > ipmortant to Herman Rubin": > > > I believe most people are aware of the existence of simulation, including > > Monte Carlo, or Las Vegas, methods for obtaining answers to otherwise > > intractable problems. > > No one asked that question. We believe it's important to you, Herman. > Can you answer the question that was asked? > > Why is it important to the general computing base that the distance > between bits can be determined efficiently? The summary points out the problems. Does one have to go to great lengths to get a car which has features not in the "general driving base"? Do film companies only make films for the "general public"? Should one even consider that universities only teach courses for the "general student"? In addition, there is much use of instructions for systems programs and libraries which the individual programmer will not use. It is unlikely that the person using the subroutines which would make use of efficient obtaining of the distance between 1's in a bit stream would be aware of this, any more than the person who computes a trigonometric or exponential function is aware that that algorithm has an integer quotient and floating remainder in it. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
dmocsny@minerva.che.uc.edu (Daniel Mocsny) (05/12/91)
In article <12235@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >In article <1991May10.204426.24828@ingres.Ingres.COM>, jpk@ingres.com (Jon Krueger) writes: >> Why is it important to the general computing base that the distance >> between bits can be determined efficiently? > >The summary points out the problems. Does one have to go to great lengths >to get a car which has features not in the "general driving base"? I believe that Herman Rubin could solve his problem with these steps: 1. Develop a computer code which solves a reasonable set of problems, and which relies extensively on a computer's ability to determine the distance between bits. 2. Submit the code to SPEC and get it included in the standard benchmark suite. 3. Watch panic-stricken marketroids scurry to protect their SPEC ratings by further partitioning SPEC into an additional category: SPECrubins. -- Dan Mocsny Internet: dmocsny@minerva.che.uc.edu
jpk@ingres.com (Jon Krueger) (05/13/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) whines: > That something is not important to everyone is no reason no to make it > available to those who can see a use for it. But of course it is available. What exactly is preventing you from using whatever feature you like? -- Jon -- Jon Krueger, jpk@ingres.com
peter@ficc.ferranti.com (peter da silva) (05/13/91)
In article <12235@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > The summary points out the problems. Does one have to go to great lengths > to get a car which has features not in the "general driving base"? Yes. For example, it is getting harder and harder to find a car with competantly designed setabelts instead of rube-goldberg "passive restraint" systems (whether or not accompanied by an airbag). That's because the general public is lazy, so that's where all the money is. > Do film companies only make films for the "general public"? In general, yes. At least that's where all the money is. > Should one even consider that universities only teach courses for the > "general student"? That's where all the money is. > In addition, there is much use of instructions for systems programs and > libraries which the individual programmer will not use. So how about some *examples*? The "general computer base" is where the money is, and building a chip is not cheap. Nor is the return on investment that high. -- Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180; Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
rwa@cs.athabascau.ca (Ross Alexander) (05/13/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >In article <rwa.673895513@aupair.cs.athabascau.ca>, rwa@cs.athabascau.ca (Ross Alexander) writes: >> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >> [much chat about whether Herman's apps are typical elided] >> >How much of the driving public uses the low gears of automatic transmissions? >> >> All of them, Herman. Whenever they accelerate from a stop. That's >> what an automatic transmission is for. >To clarify, I meant the specific ability to place the transmission in low gears. But the auto transmission knows when to do that for you. Some poor underpaid mechanical engineer spent years designing an analogue (fluid) computer that monitors engine speed, accelerator position, drive wheel speed, and torques and then selects a gear ratio with a view to optomizing fuel economy, engine wear, and ghods know what else. They are remarkably robust (mine currently has 275,000 km without any failures) and maintenance-free. Using one allows the operator to concentrate on an appropriate problem domain - keeping the vehicle out of the ditch. Why are you so d*mned eager to second guess this mechanism, and divert your attention from the thing that, compared to the machinery, you do well (guidance)? BTW, I don't accept the `snow hypothesis'; with proper tires, second gear starts to prevent slippage are redundant (and besides, they're hard on the engine - low speed/high torque (lugging) strains the crankshaft and its bearings). All these arguements can be moved back into the computer architecture discussion with remarkably little rephrasing, I might add. >Current computer hardware can do anything,assuming sufficient external storage, >but not efficiently. Similarly with software. I can only wonder that, if they're inefficient, why are they so much faster and cheaper that they used to be? What the heck do you mean by `efficient'? Obviously some new use of the word I was previously unaquainted with (apologies to Arthur Dent). -- Ross Alexander rwa@cs.athabascau.ca (403) 675 6311 ve6pdq `You were s'posed to laugh!' -- Zippy
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/14/91)
In article <rwa.674151323@aupair.cs.athabascau.ca>, rwa@cs.athabascau.ca (Ross Alexander) writes: > hrubin@pop.stat.purdue.edu (Herman Rubin) writes: ................. > >To clarify, I meant the specific ability to place the transmission in low gears. > > But the auto transmission knows when to do that for you. Some poor > underpaid mechanical engineer spent years designing an analogue > (fluid) computer that monitors engine speed, accelerator position, > drive wheel speed, and torques and then selects a gear ratio with a > view to optomizing fuel economy, engine wear, and ghods know what > else. According to those who know, they do not do a very good job. And there are situations in which the overrides are needed; why do you think that the manufacturers still keep them? There are places I would not drive without using these overrides. ................... > >Current computer hardware can do anything,assuming sufficient external storage, > >but not efficiently. Similarly with software. > > I can only wonder that, if they're inefficient, why are they so much > faster and cheaper that they used to be? What the heck do you mean by > `efficient'? Obviously some new use of the word I was previously > unaquainted with (apologies to Arthur Dent). If you make something 10^6 times as fast, and with 10^5 times the capacity, and get 10^4 times as much done, it is not as efficient. Now these are not precise figures. But the CRAYs, for example, take about 20 instructions to perform a double precision multiplication of two single precision numbers, while the CYBER 205 takes exactly 2. The earliest computers multiplied two integers of more than 32 bits and obtained a double length result in one instruction. I suggest you program this on your box, and see how many instructions it takes, if the instruction is not there. If some hardware does not have communication between the integer and floating registers, conversion is a real horror. I do not think that any kludge is likely to be as bad as the complicated code needed, using memory, no less. Now the early computers were fast on memory and transfer, but the newer ones are very definitely not so, unless trickery is used. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
preston@ariel.rice.edu (Preston Briggs) (05/14/91)
hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >But the CRAYs, for example, take about 20 instructions >to perform a double precision multiplication of two single precision >numbers, while the CYBER 205 takes exactly 2. The earliest computers >multiplied two integers of more than 32 bits and obtained a double length >result in one instruction. Sure, but each instruction was very slow, in modern terms. >Now the early computers were fast on memory and transfer, but the newer >ones are very definitely not so, unless trickery is used. New computers have much faster memory access. They also have much, much faster FP and integer instructions. So the balance has changed. Everyone knows this. The idea is that the overall throughput goes up. Certainly my programs run faster. I expect yours do too. If you want to take advantage of new architectures and implementations, your probably going to have to rethink some of your old assumptions. For example, in the old days, FP was much more expensive than integer arithmetic. Now they are about the same. In the old days, FP was more expensive than memory accesses. Now FP is generally cheaper. It's easy to get old instructions (say int x int -> long int) in 1 cycle, just slow the cycle time down to old rates.
jpk@ingres.com (Jon Krueger) (05/14/91)
hrubin@pop.stat.purdue.edu (Herman Rubin): > Does one have to go to great lengths > to get a car which has features not in the "general driving base"? Yes. "Great lengths" usually means cost. TVRs are available for those who are willing to pay. > Do film companies only make films for the "general public"? Yes. Custom emulsions are available for those who are willing to pay. The rest of us benefit from commodity hardware. > Should one even consider that universities only teach courses for the > "general student"? Of course. Universities regularly cancel courses if too few people sign up. Courses are not seminars. Seminars are not tutoring. > [a] person using the subroutines which would make use of efficient > obtaining of the distance between 1's in a bit stream [might be > unaware of it] Where's your DATA? Regardless of whether he's aware of it, just how many microseconds has Joe User saved this year because of efficient bit distance routines working on his behalf? Where's your DATA? Skip the analogies, they're not compelling and even if they were they're without substance. Where's your DATA? -- Jon -- Jon Krueger, jpk@ingres.com
kers@hplb.hpl.hp.com (Chris Dollin) (05/14/91)
Ross Alexander says:
[quoting Herman Rubin]
>To clarify, I meant the specific ability to place the transmission in low gears.
But the auto transmission knows when to do that for you.
My experience with manual gears [in Britain] vs automatic gears [America, much
less experience] is ``knows when to do that for you'' is an overstatement.
Using one allows the operator to concentrate on an appropriate problem
domain - keeping the vehicle out of the ditch.
Is this a matter of fact, or of plausible opinion? I'm less aware of changing
gear [in my car my car] than I am of an automatic mishandling gear changes
- that is to say, it [the former] happens, but not very often.
Why are you so d*mned eager to second guess
this mechanism, and divert your attention from the thing that,
compared to the machinery, you do well (guidance)?
He then adds
All these arguements can be moved back into the computer architecture
discussion with remarkably little rephrasing, I might add.
Well, I'd agree, but perhaps with the opposite sign! I usually disagree with
Herman on architecture; but it seems to me that the car geraing analogy is all
in his favour. Perhaps, if we wish to use this analogy, we should find a part
of the car more suitaed to our purposes ...
One of the Rules for System Design [the authors name I recall with enough
uncertainty to be to embarrassed to attempt to write here] was ``don't hide
power''. Isn't this applicable here? [The Rule doesn't say ``go out of your way
to provide power in the form the use would most like'', of course.]
--
Regards, Kers. | "You're better off not dreaming of the things to come;
Caravan: | Dreams are always ending far too soon."
hinton@gca.UUCP (Edward Hinton) (05/14/91)
In article <rwa.674151323@aupair.cs.athabascau.ca> rwa@cs.athabascau.ca (Ross Alexander) writes: >hrubin@pop.stat.purdue.edu (Herman Rubin) writes: > >>To clarify, I meant the specific ability to place the transmission in low gears. > >Why are you so d*mned eager to second guess >this mechanism, and divert your attention from the thing that, >compared to the machinery, you do well (guidance)? BTW, I don't >accept the `snow hypothesis'; with proper tires, second gear starts to >prevent slippage are redundant (and besides, they're hard on the >engine - low speed/high torque (lugging) strains the crankshaft and >its bearings). > I'm new to this thread, but I have a bone to pick with this reasoning, as well as whether Ross has ever lived through a winter driving an automatic transmission car in New England or a similar snow area. Snow here is WET. It slides. It becomes something akin to sawdust and motor oil mixed together during the early portions of a storm, which happens to be when MOST accidents due to skidding occur up here. I much prefer the control of a manual (translate: RISC) transmission, although when driving an automatic (translate: CISC), I have many times found that lower gears are the ONLY way to go up hill unless you care to invest in chains which contribute to road damage unless the road is covered with ice. Low gears are also necessary when you have certain engine problems miles from the nearest service station. CISC might be claimed to do things better under the conditions designed for, but I recently had a problem which required me to use the two low gears to get home in order to prevent stalling every few feet. In computer architecture terms, the principles are the same. CISC can do wonderful things that the designers thought of. But those are usually the easy cases to predict. 'When things go wrong' and 'under adverse conditions' are the ones where no amount of hardware forsight will suffice, and exceptions are where software folks really make their money. RISC makes the hard cases easier. Of course, one could claim that just as automatic transmissions also allow manual use of lower gears, CISC machines don't need to forbid simple instructions also. But we're dealing with extra baggage then, and just as the lower gears in my truck don't work nearly as well as my standard transmission car, so too, extra baggage in a CISC is unlikely to be as efficient as the regular RISC instructions. Please don't simply flame my next remark, but my philosophy is simple: "Hardware designers shouldn't keep trying to guess what unexpected ways I will want to use their hardware." I am just as creative as a hardware guy, just in a different line of work. I want fast, simple, general tools that let me be creative. RISC gives me that.
abaum (Allen Baum) (05/14/91)
In article <1991May13.211555.28824@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes: >It's easy to get old instructions (say int x int -> long int) in 1 cycle, >just slow the cycle time down to old rates. It's easy to do now, with the same cycle times. Cycle times are not the issue in this case. The issue is one of returning two results, and what to do with them. The are various solutions, almost all are disliked by either compiler writers, architects, or chip designers. These solutions include specifying two destination registers in the instruction, overwriting one or both of the source registers, writing back to dest reg/dest reg+1 pair, keeping the upper half in a special register..... you can probably think of more, and come up with the reasons that each of the groups above dislike it. The reasons for disliking them aren't terribly frivolous either- they just outweigh the advantages of returning both (in the eyes of those particular groups, AT the particular time - times change!)3
chip@tct.com (Chip Salzenberg) (05/15/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin): >That something is not important to everyone is no reason no to make it >available to those who can see a use for it. I agree... unless the inclusion of a feature slows down other, more widely used features. The addition of an instruction can make a CPU slower, harder to build and harder to test. "Neat" instructions are not free. -- Brand X Industries Custodial, Refurbishing and Containment Service: When You Never, Ever Want To See It Again [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
paulh@cimage.com (Paul Haas) (05/16/91)
In article <ROCKWELL.91May10233855@socrates.umd.edu> rockwell@socrates.umd.edu (Raul Rockwell) writes: > >At work we have a hand-optimized assembly routine to convert from a >bit array to a list of indices. It gets used a lot. > >Please note that a bit list is very handy for dealing with selection >problems. They are very compact, and have very predictable size >requirements. > ... >Raul Rockwell Our product also uses bit lists to store selection lists. According to gprof my rather naive find_next_bit() function (coded in C) took an insignificant amount of run time. In our application we have to render each selected entity, and that takes far longer than the bit twiddling. To see if a new instruction would be usefull, it helps to have a some numbers to show how much it would speed up a real application. For our application we will probably get a measurable improvement (with a profiler you can measure lots of insignificant things :-)) when we can use 64 bit words. Sometimes the selection lists are rather sparse, so we spend some time scanning for the next non-zero word. 64 bit words will speed up lots of applications. -- Paul Haas paulh@cimage.com (313) 761-7478 I am not now, nor have I ever been an official spokesman for Cimage Corporation.
ward@vlsi.waterloo.edu (Paul Ward) (05/16/91)
In article <2831299D.53F7@tct.com> chip@tct.com (Chip Salzenberg) writes: >According to hrubin@pop.stat.purdue.edu (Herman Rubin): >>That something is not important to everyone is no reason no to make it >>available to those who can see a use for it. > >I agree... unless the inclusion of a feature slows down other, more >widely used features. > >The addition of an instruction can make a CPU slower, harder to build >and harder to test. "Neat" instructions are not free. Even if the "Neat" instruction is free, it should still not be added unless clearly warrented. What is free today has to be supported tomorrow. Paul Ward University of Waterloo -- They will say, "As surely as the LORD lives, who brought the Israelites up out of the land of the north and out of all the countries where he had banished them." For I will restore them to the land I gave to their forefathers. Jeremiah 16:15
mlord@bwdls58.bnr.ca (Mark Lord) (05/16/91)
<Ross Alexander says: < < [quoting Herman Rubin] < >To clarify, I meant the specific ability to place the transmission in low gears. < < But the auto transmission knows when to do that for you. This is probably a lousy way to try to shoot down this analogy. It is fairly common knowledge that manual transmissions get significantly better fuel economy than manual transmissions. Therefore, when fuel performance is important, manual control is also very important. :)
dik@cwi.nl (Dik T. Winter) (05/17/91)
In article <6812@bwdls58.bnr.ca> mlord@bwdls58.bnr.ca (Mark Lord) writes: > This is probably a lousy way to try to shoot down this analogy. Extremely lousy. > It is fairly common knowledge that manual transmissions get significantly > better fuel economy than manual transmissions. Right on the dot. It only depends on the driver. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
chip@tct.com (Chip Salzenberg) (05/17/91)
According to rockwell@socrates.umd.edu (Raul Rockwell): >Chip Salzenberg: > That kind of problem is not "generally important" in that it does not > come up in the "general computing base". > >harumph. > >At work we have a hand-optimized assembly routine to convert from a >bit array to a list of indices. It gets used a lot. Mr. Rockwell's point is presumably that it would be more widely used were it more widely implemented. I cannot help but agree. Unfortunately, it is impossible to do cost/benefit analyses on possible universes of computation in which given tools are more or less available or efficient. A pity. I've been taken to task by another reader for my use of the term "general computing base", since I can't define it. That point is also well taken. Things are not as simple as I had believed... -- Brand X Industries Custodial, Refurbishing and Containment Service: When You Never, Ever Want To See It Again [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>