radford@calgary.UUCP (Radford Neal) (02/21/88)
Several instruction sets, such as the 68020 and the VAX, have an "indexed" addressing mode, in which a register is shifted left by 0, 1, 2, or 3 bits before being added to a displacement, allowing easy access to arrays of ints, etc. I has occurred to me that this "auto-shifting" might be useful for ALL register references. Every reference to a register as a source operand would be accompanied by two bits giving a shift amount - equivalet to a multiplication by 1, 2, 4, or 8. This gets you indexed addressing for no extra work, given register-offset addressing. It also lets you do various other operations quickly. For example: add r1 = 2*r2+8*r2 Calculates 10*r2 in a single instruction add r1 = 2*r1+r2 Shifts a bit into the bottom r1 (I assume here an instruction set with three-address, register-to- register instructions, a la the CDC 6600.) Speaking from a position of marginal hardware literacy, it seems to me that the shifting might be done in parallel with the register address decode. The data path out of the register bank would contain a one-bit shifter and a two-bit shifter (essentially a very small barell-shifter). While the register address is being decoded, the decision as to whether the data is shifted in either of these two shifters is being made also. When the data comes out of the register, it propagates through the two shifters at close to the speed of light. Anyone want to comment on this idea? Radford Neal The University of Calgary
kers@otter.hple.hp.com (Christopher Dollin) (02/23/88)
"radford@calgary.UUCP (Radford Neal)" says |Several instruction sets, such as the 68020 and the VAX, have |an "indexed" addressing mode, in which a register is shifted |left by 0, 1, 2, or 3 bits before being added to a displacement, |allowing easy access to arrays of ints, etc. | |I has occurred to me that this "auto-shifting" might be useful |for ALL register references. Every reference to a register as |a source operand would be accompanied by two bits giving a |shift amount - equivalet to a multiplication by 1, 2, 4, or 8. The Acorn Risc Machine has some of this. The data manipulation instructions have the general form Target := Source1 Op Source2 where Target and Source1 are registers, and Source2 can be an immediate value (an 8-bit field on an even bit boundary) or a register shifted left, right (logical or arithmetic), or rotated right, either by a small constant (0..31) or by the content of another register (this adds an extra cycle). The simple load/store instructions compute the memory address as Base {+|-} {12Bit | Register_with_Shift} 12Bit is a 12bit byte offset, RegisterWithShift is a register possibly shifted (left/right (log/arith), rotated right) by a constant 0..31 (the register case was eliminated from the second-generation chip as no-one ever used it and they saved some chip area). Regards, Kers | "Why Lisp if you can talk Poperly?"
firth@sei.cmu.edu (Robert Firth) (02/23/88)
In article <1370@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes: >Several instruction sets, such as the 68020 and the VAX, have >an "indexed" addressing mode, in which a register is shifted >left by 0, 1, 2, or 3 bits before being added to a displacement, >allowing easy access to arrays of ints, etc. > >I has occurred to me that this "auto-shifting" might be useful >for ALL register references. Every reference to a register as >a source operand would be accompanied by two bits giving a >shift amount - equivalet to a multiplication by 1, 2, 4, or 8. The problem with your suggestion, and with scaled-index modes, is that they introduce yet another nonorthogonality into the architecture. On the 68020, for example, arrays with component size 1 are a special case (no scaling); arrays of component size 2, 4, 8 are another special case (can use index mode), and arrays of any other size are another special case (can't use index mode). That's one case too many. Moreover, if you are going to generate slick code for, say, arrays whose component size is 12, you have to do all that loop induction, strength reduction &c stuff. Surprise: once you've done that, you can use those algorithms for all arrays, junk the scaled-index mode, and get better code. My own preference is for as simple an architecture as possible (but no simpler, as Einstein remarked). However, if people want to build a scaled index mode, PLEASE let the damn thing scale for ANY array a modern HLL can generate. In particular for component sizes SMALLER than the addressable unit as well as larger. Nothing is more tiresome than a special feature that works some of the time. The Data General Nova did some of what you want, and the Acorn does more. Writing a codegenerator for either of them is a pig, because you have to handle an unconscionable number of special cases for seemingly simple things like multiply, add, subtract, &c.
jmd@granite.dec.com (John Danskin) (02/24/88)
In article <1370@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes: >Several instruction sets, such as the 68020 and the VAX, have >an "indexed" addressing mode, in which a register is shifted >left by 0, 1, 2, or 3 bits before being added to a displacement, >allowing easy access to arrays of ints, etc. > >I has occurred to me that this "auto-shifting" might be useful >for ALL register references. Every reference to a register as >a source operand would be accompanied by two bits giving a >shift amount - equivalet to a multiplication by 1, 2, 4, or 8. . .stuff deleted. . >Anyone want to comment on this idea? > > Radford Neal > The University of Calgary Add with shift instructions can be very nice. They speed address calculations, and can be used as part of a multiply by small constant. At least one RISC floating point oriented chip set incorporates such instructions. (Sorry, they don't like people to talk about their chipset so I can't say who they are.) However, based on my own coding practice, it isn't worth giving up two instruction bits for the capability. Instruction bits are incredibly useful things, and it is hard enough to stay at 32 bits without 'using just a couple more bits' for something that isn't going to be used even every tenth cycle. -- John Danskin | decwrl!jmd DEC Workstation Systems Engineering | (415)853-6724 100 Hamilton Avenue | My comments are my own. Palo Alto, CA 94306 | I do not speak for DEC.
viggy@hpcupt1.HP.COM (Viggy Mokkarala) (02/24/88)
radford@calgary.UUCP (Radford Neal) writes: >It has occurred to me that this "auto-shifting" might be useful >for ALL register references. Every reference to a register as >a source operand would be accompanied by two bits giving a >shift amount - equivalet to a multiplication by 1, 2, 4, or 8. ... ... > Radford Neal > The University of Calgary The HP Precision Architecture provides for these kinds of operations by its Shift and Add instructions. There is a pre-shifter before one of the inputs to the CPU. It allows for one of the operands to be pre-shifted by upto 3 bits before an addition happens. These Shift and Add instructions come in two flavors - one which traps on an overflow, and the other which doesn't. A general integer multiply routine takes, on the average, less than 20 instructions. The assumptions and details of this claim can be found in the paper titled "Integer Multiplication and Division on the HP Precision Architecture", Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II), pp 90 - 99. Viggy Mokkarala, Hewlett Packard Co., Cupertino, CA. (hpda!viggy)
jk3k+@andrew.cmu.edu (Joseph G. Keane) (02/24/88)
The problem i see with your suggestion is that it takes up much-needed instruction bits that could probably be better used some other way. --Joe
ok@quintus.UUCP (Richard A. O'Keefe) (02/24/88)
In article <4307@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes: : In article <1370@vaxb.calgary.UUCP: radford@calgary.UUCP (Radford Neal) writes: : :Several instruction sets, such as the 68020 and the VAX, have : :an "indexed" addressing mode, in which a register is shifted : :left by 0, 1, 2, or 3 bits before being added to a displacement, : :allowing easy access to arrays of ints, etc. : Moreover, if you are going to generate slick code for, say, arrays whose : component size is 12, you have to do all that loop induction, strength : reduction &c stuff. Surprise: once you've done that, you can use those : algorithms for all arrays, junk the scaled-index mode, and get better : code. It's worth noting that doing the indexing in software is FASTER than using the scaled modes on the MC68020. The MC68020 has lots of fancy addressing modes that the MC68010 has, but (no doubt because they had to be shoehorned into an existing instruction set) it is usually faster to forget about them. (The full-size multiply and divide are quite another matter!) I wouldn't like to see auto-increment addressing junked, though.
andy@polya.STANFORD.EDU (Andy Freeman) (02/25/88)
In article <188@granite.dec.com> jmd@granite.UUCP (John Danskin) writes: >In article <1370@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes: >>[Radford Neal suggests that the ability to shift a register's value >> in any kind of register reference might be a good thing.] >Add with shift instructions can be very nice. They speed address >calculations, and can be used as part of a multiply by small constant. >.... However, based on my own coding practice, it isn't worth giving >up two instruction bits for the capability. Instruction bits are >incredibly useful things, and it is hard enough to stay at 32 bits >without 'using just a couple more bits' for something that isn't going >to be used even every tenth cycle. The MIPS-X compute instructions, which are all register to register, use 12 bits to determine the operation. There are 17 of them (79 if you count the shift instructions separately - MIPS-X uses 7 bits to encode the shift amount but more compact schemes exist). Now, it may be that a more compact encoding would slow the chip down more than ``shift any register value'' would help, but that's different than saying that 32 bits is a tight fit for register reference instructions in a load-store architecture. -andy
radford@calgary.UUCP (Radford Neal) (02/26/88)
In article <4307@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes: > The problem with your suggestion, and with scaled-index modes, is that > they introduce yet another nonorthogonality into the architecture... > The Data General Nova did some of what you want, and the Acorn does more. > Writing a codegenerator for either of them is a pig, because you have to > handle an unconscionable number of special cases for seemingly simple > things like multiply, add, subtract, &c. This brings up an interesting point. What distinguishes an unnecessary complication in the compiler from an advantagous transfer of work from run time to compile time? To my mind, replacing a multiply by two by a shift is an entirely appropriate activity for a compiler. I don't really mind doing more of the same, if it gives better performance. The added complexity in the compiler is at the level of detail, not overall design. The shift is *always* better than the multiply, so there's no real difficulty. On the other hand, I hate two-address instructions, no matter how orthogonal, because they introduce more real options for the compiler to worry about. The locally optimal code destroys one operand. Which operand should you destroy? Should you insteadad save both operands at the cost of another move? Answering these questions is a lot more work than handling special cases for multiplication. As for the auto-shifted registers: my idea is that the shifting is done at no extra cost, in parellel with the register address decode. If this isn't true, I'd agree that it is a bad idea. If it is, handling it in the compiler seems like a small price for doing two operations in parellel. Radford Neal
radford@calgary.UUCP (Radford Neal) (02/26/88)
In article <6310005@hpcupt1.HP.COM>, viggy@hpcupt1.HP.COM (Viggy Mokkarala) writes: > The HP Precision Architecture provides for these kinds of operations by its > Shift and Add instructions. There is a pre-shifter before one of the inputs to > the CPU. It allows for one of the operands to be pre-shifted by upto 3 bits > before an addition happens. Does this cost you anything (other than chip area), or does the shift overlap another operation? If it does overlap, is there any reason not to allow it for all relevant instructions (e.g. and and or)? Is there any reason to keep the unshifted add (given a shift of zero is possible)? Radford Neal
viggy@hpcupt1.HP.COM (Viggy Mokkarala) (03/02/88)
/ hpcupt1:comp.arch / radford@calgary.UUCP (Radford Neal) / 3:35 pm Feb 25, 1988 / In article <6310005@hpcupt1.HP.COM>, viggy@hpcupt1.HP.COM (Viggy Mokkarala) writes: > The HP Precision Architecture provides for these kinds of operations by its > Shift and Add instructions. There is a pre-shifter before one of the inputs to > the CPU. It allows for one of the operands to be pre-shifted by upto 3 bits > before an addition happens. Does this cost you anything (other than chip area), or does the shift overlap another operation? If it does overlap, is there any reason not to allow it for all relevant instructions (e.g. and and or)? Is there any reason to keep the unshifted add (given a shift of zero is possible)? Radford Neal ----------
viggy@hpcupt1.HP.COM (Viggy Mokkarala) (03/02/88)
radford@calgary.UUCP (Radford Neal) writes: >In article <6310005@hpcupt1.HP.COM>, viggy@hpcupt1.HP.COM (Viggy Mokkarala) writes: >> The HP Precision Architecture provides for these kinds of operations by its >> Shift and Add instructions. There is a pre-shifter before one of the inputs to >> the CPU. It allows for one of the operands to be pre-shifted by upto 3 bits >> before an addition happens. >Does this cost you anything (other than chip area), or does the shift >overlap another operation? If it does overlap, is there any reason >not to allow it for all relevant instructions (e.g. and and or)? >Is there any reason to keep the unshifted add (given a shift of zero >is possible)? > Radford Neal Sorry about the previous failed posting. The pre-shifter before one of the CPU inputs is the same shifter that is used for the "indexed loads" of the HP Precision Arch. The index register (which can be any of the general registers), may be optionally shifted left by 1, 2, or 3 bits so that integer addressing to half words, words, or double words is possible. Therefore, it turns out that the "shift" operation in the shift and add instructions, which were included as primitives for integer multiplication, comes for free. Instructions such as Shift and And,or Shift and Or are not frequently used instructions, and weren't considered. There isn't a Shift Zero and Add instruction (it is simplly called the Add instruction, to keep the instruction names simple I guess :-) ). In the HP 9000/840 (the first HPPA product - off the shelf TTL parts), the pre-shift operation takes place in the beginning of the execute phase (the pipe is 3 stages deep, and the execute phase is the second one). The pre-shift operation takes 6-7 nsec and happens in parallel with the immediate generation. All register operands do pass through the pre-shifter and only some indexed loads, and the shift and add instructions get shifted by more than zero. In this implementation, it required about 8 or 9 packs extra to do this. Viggy Mokkarala, Hewlett Packard Co., Cupertino, CA. (hpda!viggy)