schow@bcarh185.bnr.ca (Stanley T.H. Chow) (02/03/90)
In article <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>
>Microcoded unaligned data takes two cycles to load an unaligned datum.
>(Assuming the unaligned datum overlaps two data bus widths.) MIPSco
>style load-left and load-right take two cycles to load the same
>unaligned datum.

As you point out later, a lot depends on the actual alignment in relation to the bus. A lot also depends on the hardware available. It is not true that all microcode (or H/W) takes two cycles. It is true that every RISC ISA announced to date takes a minimum of two instructions.

Also, note that for microcode or H/W, the extra cycles (if any) may well be hidden in some pipeline stages, whereas the RISC instructions must be issued one per clock. (Even for superscalar stuff, register scoreboarding probably forces one per clock, unless the compiler gets clever.)

> If the *possibly* unaligned datum is *actually* aligned, then a
>microcoded unaligned operation _might_ require only one cycle -- but
>the determination of alignment would probably be done so late in the
>pipeline that it would probably be easier to just require two pipeline
>slots for the unaligned load.

As someone else posted, at least the IBM 3090 does this at no time penalty. There is also a rumor that the (new? unannounced?) Intel chips are zero penalty if the unknown-alignment datum is actually aligned, and the penalty for real misalignment is only one extra cycle. Anyone know better?

> Such a model would only win if actually unaligned data occurred
>infrequently enough that you would only allocate one cycle, and be
>prepared to stall the pipeline (and insert another transfer) if the
>datum were unaligned.

How could this model lose? Can it *ever* do worse than the RISC must-align-everything model?

>Handling the overlapping case, case (1), inherently requires two bus
>transfers, and two bus transfers cost just about as much as two
>instructions.

This is not true. Two bus transfers to successive words done at the same time can take advantage of burst, etc. The transfers also typically happen in the memory access pipeline. It should cost much less than two full instructions that take two slots everywhere.

>Thing is, though, a processor with such a wide bus is probably so much
>damned faster than any external I/O device you have (external
>representation being the best justification for badly aligned data
>formats) that you probably don't care if it didn't try to optimize
>this case anyway.

But there are lots of applications that need to pack memory. I have seen some really time-critical code that (essentially) does only possibly misaligned data accesses. It may be true that many, or even most, applications have the freedom and luxury of padding at will, but a significant fraction wants performance with arbitrary alignment.

Stanley Chow        BitNet: schow@BNR.CA
BNR                 UUCP: ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831            ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185
Me? Represent other people? Don't make them laugh so hard.
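For readers who want the "two transfers" case made concrete, here is a minimal C sketch of an unaligned 32-bit load built from aligned word reads only, roughly what a load-left/load-right pair or a microcoded split access has to accomplish. It is an illustration under stated assumptions, not any vendor's implementation: memory is modeled as an array of 32-bit words with little-endian byte numbering, and the names load32_unaligned, mem, nwords, off and transfers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit value that starts at byte offset `off` in a memory image
 * made of aligned 32-bit words, touching whole words only.
 * Assumption: byte k of memory is bits 8*(k%4) .. 8*(k%4)+7 of mem[k/4]
 * (little-endian numbering). `*transfers` reports how many aligned word
 * reads were needed: 1 if the datum is actually aligned, 2 if it straddles
 * a word boundary. */
uint32_t load32_unaligned(const uint32_t *mem, size_t nwords, size_t off,
                          unsigned *transfers)
{
    size_t word = off / 4;
    unsigned shift = (unsigned)(off % 4) * 8;

    assert(mem != NULL && transfers != NULL);
    assert(word < nwords);

    uint32_t lo = mem[word];
    *transfers = 1;
    if (shift == 0)
        return lo;                 /* actually aligned: one transfer */

    assert(word + 1 < nwords);
    uint32_t hi = mem[word + 1];   /* the second, successive word */
    *transfers = 2;

    /* Merge the high bytes of the first word with the low bytes of the
     * second. shift is 8, 16 or 24 here, so both shift counts are valid. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The sketch shows only the work involved: the possibly-unaligned access costs one transfer when the datum turns out to be aligned and two when it really straddles words. Whether the second transfer costs a full instruction slot or hides in the memory pipeline is the hardware question argued above, and the code says nothing about it.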
kds@blabla.intel.com (Ken Shoemaker) (02/04/90)
The i486 handles misaligned transfers transparently from the programming standpoint. From a performance standpoint, there is no penalty for aligned transfers; a misaligned transfer adds two clocks to the two transfers that need to be performed to get the data. Thus, for all cache hits, an aligned load takes 1 clock while a misaligned load takes 4 clocks (2 + 1 for each transfer).

The two clocks are a "false start clock" for the transfer that gets aborted (since split processing is required) and a "tickle" cycle that ensures, before any data transfer is attempted, that the whole object is addressable. Thus, no transfer for a memory object will be seen on the pins until the processor has determined that it can retrieve both parts.

In addition, the i486 has the ability to force traps on all misaligned transfers. This lets people ensure that the data structures for their programs are such that they are getting the highest performance from their machines, and also ensures that their data structures are portable to machines that don't support misaligned transfers (assuming, of course, that the hardware-supported representation of objects is the same in memory). This facility is provided through a bit in control register 0, which, by default, allows misaligned objects.

----------
Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
csnet/arpanet: kds@mipos2.intel.com
uucp: ...{hplabs|decwrl|pur-ee|hacgate|oliveb}!intelca!mipos3!kds
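The clock arithmetic in this post can be written down as a small cost model. This is a sketch for cache hits only, using the figures quoted above (1 clock aligned, 2 penalty clocks plus 1 clock per transfer when split). It assumes that "misaligned" means the datum crosses a 4-byte boundary; BUS_BYTES, is_split and load_clocks are hypothetical names, not anything from Intel documentation.

```c
#include <assert.h>
#include <stdint.h>

enum { BUS_BYTES = 4 };  /* assumption: splits occur at 4-byte boundaries */

/* Nonzero if a datum of `size` bytes at `addr` straddles a boundary and so
 * needs two transfers. */
int is_split(uint32_t addr, unsigned size)
{
    assert(size >= 1 && size <= BUS_BYTES);
    return (addr % BUS_BYTES) + size > BUS_BYTES;
}

/* Clocks for a load that hits the cache, per the figures in the post. */
unsigned load_clocks(uint32_t addr, unsigned size)
{
    if (!is_split(addr, size))
        return 1;            /* aligned: no penalty */
    return 2                 /* false-start clock + "tickle" cycle */
         + 2 * 1;            /* two transfers, one clock each */
}
```

Under this model a 4-byte load at address 0 costs 1 clock and the same load at address 1 costs 4. A 2-byte load at address 1 stays inside one 4-byte unit and is charged 1 clock, which follows from the boundary assumption rather than from anything stated in the post.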
mash@mips.COM (John Mashey) (02/05/90)
In article <1577@mipos3.intel.com> kds@blabla.UUCP (Ken Shoemaker) writes:
>The i486 handles misaligned transfers transparently from the programming
...
>In addition, the i486 has the ability to force traps on all misaligned
>transfers. This lets people insure that the data structures for their

1) That's a very nice feature, like what we once asked IBM to add to S/370s. I hope it gets plenty of use.

2) Although coming from the other direction, what MIPS does is:
   a) If you say nothing, trap misaligned references (killing process).
   b) On request, trap misalignment, and either fix them up, or fix them up, and keep a record of where they occurred, to help the user figure out what's happening.
   c) If you recompile the program and request it, the unaligned operations get generated.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
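Option (b), fix up and keep a record, can be imitated at user level to show the idea. The sketch below is only an analogy in portable C, not the MIPS trap handler: it checks the address, notes where a misaligned access occurred, and then performs the load bytewise with memcpy. The names fixup_record, MAX_RECORDS and load32_fixup are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_RECORDS = 16 };   /* illustrative limit on remembered sites */

/* Where misaligned loads happened, so the user can find and fix them. */
struct fixup_record {
    const void *where[MAX_RECORDS];
    unsigned count;          /* total misaligned loads seen */
};

/* Load a 32-bit value from any address. If the address is misaligned,
 * record it (when `rec` is non-NULL) and still return the right value. */
uint32_t load32_fixup(const void *p, struct fixup_record *rec)
{
    uint32_t v;

    assert(p != NULL);
    if ((uintptr_t)p % sizeof v != 0 && rec != NULL) {
        if (rec->count < MAX_RECORDS)
            rec->where[rec->count] = p;   /* remember the offending address */
        rec->count++;
    }
    memcpy(&v, p, sizeof v);  /* bytewise copy is valid at any alignment */
    return v;
}
```

Passing NULL for rec gives the plain "fix them up" behaviour; passing a record gives the diagnostic variant, whose output would be used to locate and pad the offending structures.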
tihor@acf4.NYU.EDU (Stephen Tihor) (02/07/90)
Since I am a little-endian, it's the same any way you look at it. Since these are mostly non-portable, it mostly matters how you stuff them on each machine. The only portability requirement I can see is that the field occupy SIZE bits (or |RIGHT-LEFT|+1 bits).
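The width rule is easy to check in code. A minimal sketch, assuming bit positions are given as two endpoint numbers in either numbering direction and that fields are at most 32 bits wide; field_width and field_mask are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Width of a field whose end bits are numbered `left` and `right`.
 * The result is the same whichever end is numbered higher. */
unsigned field_width(unsigned left, unsigned right)
{
    return (left > right ? left - right : right - left) + 1;
}

/* A right-justified mask of that many bits (at most 32). */
uint32_t field_mask(unsigned left, unsigned right)
{
    unsigned w = field_width(left, right);

    assert(w <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle it apart. */
    return w == 32 ? UINT32_MAX : (UINT32_C(1) << w) - 1;
}
```

Whichever way a machine numbers its bits, a field described as 7..0 or as 0..7 comes out 8 bits wide, which is the one property the post asks to be portable.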
tihor@acf4.NYU.EDU (Stephen Tihor) (02/14/90)
A barrel shifter does not make you un-RISC. "RISC"ishness means figuring out how often you use it and how, and putting it wherever it gives maximum utility for the investment, given the alternative investments. Perhaps you put it somewhere that real estate isn't so expensive, such as off chip, nearer the I/O devices, unless there are other applications that justify it being near chip.