[comp.arch] Handling mis-alignment

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (02/03/90)

In article <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>
>Microcoded unaligned data takes two cycles to load an unaligned datum.
>(Assuming the unaligned datum overlaps two data bus widths.)  MIPSco
>style load-left and load-right take two cycles to load the same
>unaligned datum.

As you point out later, a lot depends on the actual alignment in relation
to the bus. A lot also depends on the hardware available. It is not true
that all microcode (or H/W) takes two cycles. It is true that every RISC
ISA announced to date takes a minimum of two instructions.
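Whether an unaligned datum costs an extra transfer at all depends on where it falls relative to the bus width. A minimal sketch of that arithmetic; the function name and the byte-addressed model are illustrative assumptions, not anything from a particular machine:

```c
#include <stdint.h>

/* Number of bus_bytes-wide transfers needed to fetch `size` bytes starting
   at byte address `addr`. Assumes size >= 1 and bus_bytes >= 1. */
unsigned bus_transfers(uintptr_t addr, unsigned size, unsigned bus_bytes)
{
    uintptr_t first = addr / bus_bytes;              /* bus word holding the first byte */
    uintptr_t last  = (addr + size - 1) / bus_bytes; /* bus word holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

For example, a 4-byte datum at address 2 needs two transfers on a 32-bit bus but only one on a 64-bit bus, so "unaligned" does not automatically mean "two transfers".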

Also, note that for microcode or H/W, the extra cycles (if any) may well
be hidden in some pipeline stages, whereas the RISC instructions must be
issued one per clock. (Even for superscalar designs, register scoreboarding
probably forces one per clock, unless the compiler gets clever.)
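The two-instruction RISC sequence amounts to fetching the two aligned words that straddle the datum and merging them with shifts. The C model below is a sketch of that idea only, assuming a little-endian byte order and a 4-byte word; it is not the exact definition of MIPS load-left/load-right, which merge partial words into a register rather than combining two full words.

```c
#include <stddef.h>
#include <stdint.h>

/* One aligned 4-byte "bus transfer", assembled little-endian so the result
   does not depend on the host's byte order. `base` must be a multiple of 4. */
static uint32_t load_aligned_le(const uint8_t *mem, size_t base)
{
    return (uint32_t)mem[base]
         | (uint32_t)mem[base + 1] << 8
         | (uint32_t)mem[base + 2] << 16
         | (uint32_t)mem[base + 3] << 24;
}

/* Load a 32-bit little-endian value at any byte offset `off`.
   When unaligned, bytes up to (off & ~3) + 7 must be readable. */
uint32_t load_unaligned_le(const uint8_t *mem, size_t off)
{
    size_t base = off & ~(size_t)3;          /* aligned word containing the first byte */
    unsigned shift = (unsigned)(off & 3) * 8;
    uint32_t lo = load_aligned_le(mem, base);
    if (shift == 0)
        return lo;                           /* actually aligned: one transfer */
    uint32_t hi = load_aligned_le(mem, base + 4);
    /* Low bytes come from the top of `lo`, high bytes from the bottom of `hi`. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The `shift == 0` test is the software analogue of discovering the datum is actually aligned; the fixed RISC sequence pays for both words regardless.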

>    If the *possibly* unaligned datum is *actually* aligned, then a
>microcoded unaligned operation _might_ require only one cycle -- but
>the determination of alignment would probably be done so late in the
>pipeline that it would probably be easier to just require two pipeline
>slots for the unaligned load.

As someone else posted, at least the IBM 3090 does this with no time penalty.
There is also a rumor that the (new? unannounced?) Intel chips have zero
penalty if the unknown-alignment datum is actually aligned, and that the
penalty for real misalignment is only one extra cycle. Anyone know better?

>    Such a model would only win if actually unaligned data occurred
>infrequently enough that you would only allocate one cycle, and be
>prepared to stall the pipeline (and insert another transfer) if the
>datum were unaligned.

How could this model lose? Can it *ever* do worse than the RISC must-align-
everything model?
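One way to frame the question is a toy cost model for a load whose alignment is unknown at compile time. The assumptions here are mine, not from the thread: every access hits in cache, the optimistic design allocates one slot and stalls for `stall` extra cycles only when the datum really is unaligned, and the must-align RISC sequence always costs two slots.

```c
#include <stdbool.h>

/* Expected cycles per load for the optimistic design, where
   frac_unaligned is the fraction of loads that are actually unaligned. */
double optimistic_cycles(double frac_unaligned, double stall)
{
    return 1.0 + frac_unaligned * stall;
}

/* The fixed two-instruction sequence costs two slots whatever the alignment. */
double fixed_two_slot_cycles(void)
{
    return 2.0;
}

/* True when the optimistic design is no slower on average. */
bool optimistic_no_worse(double frac_unaligned, double stall)
{
    return optimistic_cycles(frac_unaligned, stall) <= fixed_two_slot_cycles();
}
```

Under this model the optimistic scheme loses only when `frac_unaligned * stall > 1`; with the rumored one-cycle penalty it can never do worse, while a multi-cycle penalty can lose if most loads really are unaligned.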

>Handling the overlapping case, case (1), inherently requires two bus
>transfers, and two bus transfers cost just about as much as two
>instructions.

This is not true.

Two bus transfers to successive words done at the same time can take
advantage of burst transfers, etc. The transfers also typically occupy only
the memory-access stage of the pipeline. They should cost much less than two
full instructions, each of which takes a slot in every stage.

>Thing is, though, a processor with such a wide bus is probably so much
>damned faster than any external I/O device you have (external
>representation being the best justification for badly aligned data
>formats) that you probably don't care if it didn't try to optimize
>this case anyway.

But there are lots of applications that need to pack memory. I have
seen some really time-critical code that (essentially) does only
possibly-misaligned data accesses. It may be true that many, or even
most, applications have the freedom and luxury of padding at will, but
a significant fraction want performance with arbitrary alignment.
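For packed data in C, dereferencing a misaligned `uint32_t *` is undefined behavior and traps on strict-alignment machines. A common portable idiom is to copy through `memcpy`, which compilers typically turn into a single load on machines that tolerate misalignment and into a multi-instruction sequence on those that don't. The helper names below are hypothetical, and the value is kept in host byte order:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value at any byte offset in a packed buffer. */
uint32_t read_u32_at(const unsigned char *buf, size_t off)
{
    uint32_t v;
    memcpy(&v, buf + off, sizeof v);   /* safe regardless of alignment */
    return v;
}

/* Write a 32-bit value at any byte offset in a packed buffer. */
void write_u32_at(unsigned char *buf, size_t off, uint32_t v)
{
    memcpy(buf + off, &v, sizeof v);
}
```

For instance, a packed record with a 1-byte tag followed directly by a 4-byte count puts the count at offset 1, and these helpers read and write it without any padding.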

Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831		     ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185
Me? Represent other people? Don't make them laugh so hard.

kds@blabla.intel.com (Ken Shoemaker) (02/04/90)

The i486 handles misaligned transfers transparently from the programming
standpoint. From a performance standpoint, there is no penalty for aligned
transfers; a misaligned transfer adds two clocks to the two transfers that
must be performed to get the data. Thus, for cache hits, an aligned load
takes 1 clock while a misaligned load takes 4 clocks (2, plus 1 for each
transfer). The two extra clocks are a "false start" clock for the transfer
that gets aborted (since split processing is required) and a "tickle" cycle
that ensures, before any data transfer is attempted, that the whole object
is addressable. Thus, no transfer for a memory object will be seen on the
pins until the processor has determined that it can retrieve both parts.
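The cache-hit figures above can be written down as a small function. This is a sketch using only the numbers in the post, with one labeled assumption: an access is treated as "misaligned" when it straddles a 4-byte (32-bit bus) boundary, since that is what forces two transfers. Whether the i486 charges a penalty for other unaligned cases is not stated here.

```c
#include <stdint.h>

/* Clocks for a cache-hit load of `size` bytes (1..4) at `addr` on the i486,
   per the figures in the post. ASSUMPTION: "misaligned" means the access
   crosses a 4-byte boundary. */
unsigned i486_hit_load_clocks(uint32_t addr, unsigned size)
{
    uint32_t first = addr / 4;
    uint32_t last  = (addr + size - 1) / 4;
    if (first == last)
        return 1;                       /* one transfer, no penalty */
    unsigned transfers   = 2;           /* 1 clock per transfer */
    unsigned false_start = 1;           /* aborted first attempt */
    unsigned tickle      = 1;           /* addressability check for both parts */
    return transfers + false_start + tickle;
}
```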

In addition, the i486 has the ability to force traps on all misaligned
transfers. This lets people check that their programs' data structures are
laid out to get the highest performance from their machines, and also
ensures that their data structures are portable to machines that don't
support misaligned transfers (assuming, of course, that the
hardware-supported representation of objects in memory is the same).
This facility is controlled by a bit in control register 0, which, by
default, allows misaligned objects.
----------
Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
csnet/arpanet: kds@mipos2.intel.com
uucp: ...{hplabs|decwrl|pur-ee|hacgate|oliveb}!intelca!mipos3!kds

mash@mips.COM (John Mashey) (02/05/90)

In article <1577@mipos3.intel.com> kds@blabla.UUCP (Ken Shoemaker) writes:
>The i486 handles misaligned transfers transparently from the programming 
...
>In addition, the i486 has the ability to force traps on all misaligned
>transfers.  This lets people insure that the data structures for their

1) That's a very nice feature, like what we once asked IBM to add to S/370s.
I hope it gets plenty of use.
2) Coming at it from the other direction, what MIPS does is:
	a) If you say nothing, misaligned references trap (killing the process).
	b) On request, misaligned references trap, and the system either just
	fixes them up, or fixes them up and keeps a record of where they
	occurred, to help the user figure out what's happening.
	c) If you recompile the program and request it, the compiler
	generates the unaligned-access instruction sequences.
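Option b) can be imitated in user code to find where misaligned accesses come from. The sketch below is a software stand-in, not the MIPS kernel mechanism: the "trap" is an explicit alignment test, the "fix-up" is a `memcpy`, and the record is a small table keyed by a caller-supplied site label. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SITES 16

struct fixup_site { const char *where; unsigned long count; };
static struct fixup_site sites[MAX_SITES];
static size_t nsites;

/* Count one misaligned access at `where` (silently drops sites past MAX_SITES). */
static void record_fixup(const char *where)
{
    for (size_t i = 0; i < nsites; i++)
        if (strcmp(sites[i].where, where) == 0) { sites[i].count++; return; }
    if (nsites < MAX_SITES)
        sites[nsites++] = (struct fixup_site){ where, 1 };
}

/* Load 32 bits from p; if p is misaligned, record the site, then "fix up". */
uint32_t checked_load32(const void *p, const char *where)
{
    uint32_t v;
    if ((uintptr_t)p % sizeof v != 0)
        record_fixup(where);
    memcpy(&v, p, sizeof v);   /* performs the access safely either way */
    return v;
}

/* How many misaligned accesses were recorded for `where`. */
unsigned long fixups_at(const char *where)
{
    for (size_t i = 0; i < nsites; i++)
        if (strcmp(sites[i].where, where) == 0)
            return sites[i].count;
    return 0;
}
```

Passing something like `__FILE__` plus a line number as `where` gives the per-location report the post describes.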
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

tihor@acf4.NYU.EDU (Stephen Tihor) (02/07/90)

Since I am a little-endian, it's the same any way you look at it.
Since these are mostly non-portable, what mostly matters is how you
stuff them on each machine. The only portability requirement I can see
is that the field occupy SIZE bits (or |RIGHT-LEFT|+1 bits).
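The width rule can be made concrete with a field extractor. This sketch assumes bits are numbered from the least significant end (bit 0 = LSB, little-endian bit numbering) and that both positions are in 0..31; with that convention the argument order doesn't matter and the width is always |RIGHT-LEFT|+1.

```c
#include <stdint.h>

/* Extract the field between bit positions left and right (inclusive,
   bit 0 = least significant) of a 32-bit word. */
uint32_t extract_field(uint32_t word, unsigned left, unsigned right)
{
    unsigned lo = left < right ? left : right;
    unsigned hi = left < right ? right : left;
    unsigned width = hi - lo + 1;                  /* |RIGHT-LEFT| + 1 */
    uint32_t mask = (width == 32) ? UINT32_MAX     /* avoid shifting by 32 */
                                  : ((UINT32_C(1) << width) - 1);
    return (word >> lo) & mask;
}
```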

tihor@acf4.NYU.EDU (Stephen Tihor) (02/14/90)

A barrel shifter does not make you un-RISC. "RISC"-ishness is a matter of
figuring out how often you use it and how, and putting it where it gives
maximum utility for the investment, given the alternative investments.
Perhaps you put it somewhere real estate isn't so expensive, such as
off chip, nearer the I/O devices, unless there are other applications
that justify having it near the CPU.
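For a sense of the hardware cost being weighed: the byte alignment needed for unaligned data is a rotation, and a logarithmic barrel shifter performs it in log2(width) stages, each a row of 2:1 multiplexers that either passes its input or rotates it by a fixed power of two. The C model below is a sketch of that structure, one conditional per stage, for a 32-bit datapath.

```c
#include <stdint.h>

/* Model of a 32-bit logarithmic barrel rotator: five stages (1, 2, 4, 8, 16),
   each enabled by one bit of the rotate amount. */
uint32_t barrel_rotate_right(uint32_t x, unsigned amount)
{
    amount &= 31;                               /* only 5 control bits exist */
    for (unsigned stage = 0; stage < 5; stage++) {
        unsigned k = 1u << stage;               /* this stage's fixed rotation */
        if (amount & k)
            x = (x >> k) | (x << (32 - k));     /* k is 1..16, so no UB */
    }
    return x;
}
```

Five stages of 32 muxes is the area in question; whether it sits beside the load path or out near the I/O is the placement trade-off described above.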