[net.arch] Arbitrary byte alignment

ksg@houxl.UUCP (K.GRANT) (10/03/84)

I'm interested in the byte alignment problem which exists in 32 bit
machines.  If a processor is given an address for a 32 bit operand
(hereafter called a word) which is not aligned on a word boundary it
can either:
	1) fault
	2) fetch the word in two accesses.

(Are there memories which support arbitrarily aligned accesses to bytes, two
byte quantities, and four byte quantities?  If not, why not?)

Do most processors support the first or second option?

If a processor chooses the second option, how does it handle memory faults
between accesses?  How does it support restartability of that instruction?
What other problems have I omitted?  Does the logic design become unbearable?
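
For concreteness, the two options can be sketched in Python.  This is only
an illustration under stated assumptions: the memory contents, the
big-endian byte order, and the function names are made up here and do not
describe any particular processor.

```python
# Hypothetical byte-addressed memory that is physically read one aligned
# 32-bit word at a time.  Byte i simply holds the value i.
MEMORY = bytes(range(16))

def read_aligned_word(addr):
    """One memory access: return the aligned 32-bit word at addr (big-endian)."""
    if addr % 4 != 0:
        raise ValueError("alignment fault at address %d" % addr)  # option 1
    return int.from_bytes(MEMORY[addr:addr + 4], "big")

def read_word_any_alignment(addr):
    """Option 2: assemble a word at any byte address from aligned accesses."""
    offset = addr % 4
    base = addr - offset
    first = read_aligned_word(base)
    if offset == 0:
        return first
    second = read_aligned_word(base + 4)  # a fault here leaves the job half done
    combined = (first << 32) | second     # 64 bits covering both words
    return (combined >> ((4 - offset) * 8)) & 0xFFFFFFFF
```

The second call to read_aligned_word is where the restartability question
arises: if it faults, the first access has already been made.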


					Thanks,
					houxl!ksg

jackson@uiucdcsb.UUCP (10/05/84)

If I remember correctly, most IBM machines use the 1st option.

bcase@uiucdcs.UUCP (10/05/84)

The main reason most memories don't support all kinds of arbitrary
alignments is that the critical path to memory is lengthened.  Thus,
even memory references which do not require the alignment hardware
pay the price and are slowed down.  And sometimes the hardware itself
can be more expensive than the memory chips (in small memories).

    bcase

cem@intelca.UUCP (Chuck McManis) (10/06/84)

One of the nicer things the DEC-10s and 20s had was something called a
byte pointer.  In this context a byte was any arbitrary grouping of bits,
up to 16 I believe, but it may have gone up to 36.  Anyway, a string
of "bytes" had to start on a 36 bit word boundary, but you could have
very long strings (i.e. all of a text file could be considered a string
of 7 bit bytes).  There were several instructions for using bytes, notably
LDB and DPB, for Load from byte pointer and Deposit to byte pointer.  Both
of these opcodes took an accumulator and the effective address of the
byte pointer, and transferred the byte between memory and the lower
(rightmost) bits of the accumulator.  They also took care of odd bits
(i.e. five 7 bit ASCII bytes would fit into a 36 bit word with one bit
(the lsb) left over).  There were also autoincrement and autodecrement
modes (this is DEC, right? :-)), which were quite convenient for
manipulating things smaller than a word.  Memory was always accessed as
a 36 bit word, and the extraction was done in microcode, I am pretty
sure.  I sure wish some of today's processors were so talented and
didn't need such archaic things as byte and word alignment with bytes
fixed at 8 bits.
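
A rough Python sketch of the load/deposit idea, for readers who have not
seen it.  The function names and the (position, size) arguments are
illustrative assumptions, with "position" meaning the number of bits to
the right of the byte; this is not a description of the real instruction
encoding.

```python
WORD_BITS = 36

def ldb(word, position, size):
    """Load byte: return the `size`-bit field with `position` bits to its right."""
    return (word >> position) & ((1 << size) - 1)

def dpb(word, position, size, value):
    """Deposit byte: return `word` with the field replaced by `value`."""
    mask = ((1 << size) - 1) << position
    return (word & ~mask) | ((value << position) & mask)

def pack_ascii(text):
    """Pack up to five 7-bit characters, left-justified, into one 36-bit word."""
    word = 0
    for i, ch in enumerate(text[:5]):
        word = dpb(word, WORD_BITS - 7 * (i + 1), 7, ord(ch))
    return word
```

Five 7-bit characters use 35 bits, so pack_ascii leaves the least
significant bit of the word unused, as described above.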
--Chuck

-- 
-- Chuck                                    - - - D I S C L A I M E R - - - 
{ihnp4,fortune}!dual\                     All opinions expressed herein are my
        {proper,idi}-> !intelca!cem       own and not those of my employer, my
 {ucbvax,hao}!hplabs/                     friends, or my avocado plant. :-}
                             ARPAnet    : "hplabs!intelca!cem"@Berkeley

guy@rlgvax.UUCP (Guy Harris) (10/07/84)

> One of the nicer things the DEC-10s and 20s had was something called a 
> byte pointer.... Memory was always accessed as a 36 bit word
> and the extraction was done in microcode I am pretty sure. I sure 
> wish some of today's processors were so talented and didn't need
> such archaic things as byte and word alignment with bytes
> fixed at 8 bits.

On the original KA10, a quick look at the timing would indicate that
an LDB or DPB instruction did its work via the old trick of "shift and
mask"; the timings were dependent on how many bits you had to shift the
word right or left.  (The KA10 wasn't microcoded, and I don't think the
KI10 was, either; I don't remember whether the KL10 timings implied it was
done by shifting or not.)  You need a barrel shifter or somesuch to make it
doable in a fixed number of cycles.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

johnl@godot.UUCP (10/09/84)

There seem to have been four stages of byte addressing philosophy.

1.  Prehistoric: Machines like the 1620 and Z80 which were 
addressed a digit at a time, and built that way.  No alignment 
constraints, since there were no performance implications 
thereof.  

2.  Early, such as IBM System 360 and the PDP-11:  Byte addressed but word-
implemented.  Objects must be aligned on "natural" boundaries, i.e. multiples
of their own size, and you get a program fault if they're not.  Sometimes
software caught the faults and made it appear that arbitrary alignment was
possible, although very slowly.

3.  Decadent, such as IBM 370 and Vax:  Assembler programmers complained
about having to align stuff, so the misalignment was handled in microcode.
There's still a penalty for misalignment, but it's not so bad.

4.  Post-modern, such as Pyramid 90X, Berkeley RISC, and Stanford MIPS:
Hardware and software designers start to talk to each other, and find that
a) teaching compilers to deal with alignment isn't that hard, and b) if you
do so, you buy back a lot of performance.

There have also been strange intermediate stages such as at least one
post-modern machine that enforces alignment by ignoring the low-order bits
of the address.
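
The "natural boundary" rule and the two ways of enforcing it mentioned
above can be written down in a few lines of Python.  The function names
are invented for illustration, and operand sizes are assumed to be powers
of two.

```python
def is_naturally_aligned(address, size):
    """True if `address` is a multiple of `size` (size must be a power of two)."""
    return address & (size - 1) == 0

def checked_address(address, size):
    """Stage 2 behaviour: a misaligned operand causes a program fault."""
    if not is_naturally_aligned(address, size):
        raise ValueError("alignment fault: address %d, size %d" % (address, size))
    return address

def truncated_address(address, size):
    """The 'strange intermediate' behaviour: silently drop the low-order bits."""
    return address & ~(size - 1)
```

Note that truncated_address(7, 4) quietly yields 4, so a misaligned
pointer reads the wrong operand instead of faulting.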

I suppose it would be possible to have fiendishly clever memory designs
where adjacent words were always in different memory banks so you could
cycle both at the same time.  Sounds pretty awful, though, since you have
to determine for each memory reference how many memories to cycle and how
to splice the parts together.  As far as I can tell, it's never been
seriously proposed for implementation, except perhaps incidentally in very
large cached architectures such as the IBM 308X.

John Levine, ima!johnl

crandell@ut-sally.UUCP (Jim Crandell) (10/09/84)

There is in fact a rather obvious solution to the alignment/performance
issue (I cannot bring myself to call it a problem) which satisfies
everyone except the members of a group I shall refer to as X.
Assuming eight-bit bytes and 32-bit words, the method involves four
independent byte-wide* memories (two LSBs for bank select), a
+0:+1:+2:+3 circuit, and eight 4-by-4 crossbar switches.  The exact
details of implementation, the reasons for its unpopularity, and the
identity of X are left as rather trivial exercises for the reader.
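
One possible reading of this scheme, sketched in Python.  The layout
(address a lives in bank a mod 4, row a div 4), the big-endian lane order,
and the function names are assumptions made for the sketch; the post
itself leaves the details as an exercise.

```python
NUM_BANKS = 4

def make_banks(data):
    """Distribute bytes so that address a lives in bank a % 4 at row a // 4."""
    return [list(data[b::NUM_BANKS]) for b in range(NUM_BANKS)]

def read_word(banks, address):
    """Read four consecutive bytes, touching each bank exactly once."""
    lanes = []
    for k in range(NUM_BANKS):             # the "+0:+1:+2:+3" address generation
        a = address + k
        bank, row = a % NUM_BANKS, a // NUM_BANKS
        lanes.append(banks[bank][row])     # crossbar: bank output -> byte lane k
    return int.from_bytes(bytes(lanes), "big")
```

Because the four addresses are consecutive, they always fall in four
different banks, so in hardware all four could be cycled at once; the row
addresses differ by at most one.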


* Mostek legal department please note hyphen and lack of capitals.
-- 

    Jim Crandell, C. S. Dept., The University of Texas at Austin
               {ihnp4,seismo,ctvax}!ut-sally!crandell

bob@anwar.UUCP (Bob Erickson) (10/10/84)

A note about the Pyramid 90x.  After further discussion between
software and hardware engineers, it was determined that it wouldn't be
very difficult or expensive (speedwise) to implement almost arbitrary
byte alignment (e.g. longwords accessed on any even address) in the
microcode.  Pyramid will be offering this microcode change to its
customers real soon now.

I think this implies that arbitrary byte alignment does not necessarily
imply a performance penalty in the global throughput of a machine.

I know compilers can be taught to do alignment, but many programmers
using C's simple address arithmetic mechanisms, can't be. :-)

-- 


========================================================== Be

Company: 	HHB-Softron
		1000 Wyckoff Ave.
		Mahwah NJ 07430
		201-848-8000

UUCP address:	{ihnp4,decvax,allegra}!philabs!hhb!bob

bprice@bmcg.UUCP (10/10/84)

In article <ima.426> John Levine, ima!johnl writes:
>I suppose it would be possible to have fiendishly clever memory designs
>where adjacent words were always in different memory banks so you could
>cycle both at the same time.  Sounds pretty awful, though, since you have
>to determine for each memory reference how many memories to cycle and how
>to splice the parts together.  As far as I can tell, it's never been
>seriously proposed for implementation, except perhaps incidentally in very
>large cached architectures such as the IBM 308X.

Indeed, it has been done.  The Burroughs B1700-B1800-B1900 series had a memory
of two banks, interleaved by word.  If the word containing the first address
were in bank A, it was fetched.  At the same cycle, bank B was accessed with
address n or n+1, as appropriate.  As I recall, the words were 32 bits.  The
processor's data path was 24 bits, so many possibilities arose:  all bits
in the same word; all bits in two words, but in one processor word; the operand
in two memory words, and two processor words;...
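
A hedged Python sketch of how two word-interleaved banks can serve a bit
field at any bit address.  It assumes 32-bit memory words, even-numbered
words in bank A and odd-numbered words in bank B, and bit 0 as the most
significant bit of a word; these details and the names are assumptions for
illustration, not taken from Burroughs documentation.

```python
WORD_BITS = 32

def read_bits(bank_a, bank_b, bit_address, length):
    """Read `length` bits (1..24) starting at any bit address.

    No bounds handling: a field in the last word of memory would index past
    the end of the other bank.
    """
    word = bit_address // WORD_BITS
    row = word // 2
    if word % 2 == 0:
        first, second = bank_a[row], bank_b[row]      # other bank uses row n
    else:
        first, second = bank_b[row], bank_a[row + 1]  # other bank uses row n+1
    pair = (first << WORD_BITS) | second              # both words side by side
    offset = bit_address % WORD_BITS
    shift = 2 * WORD_BITS - offset - length
    return (pair >> shift) & ((1 << length) - 1)
```

Both banks are read for every access here; the field is then cut out of
the 64-bit pair whether or not it actually crosses a word boundary.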

Several features of the B1700 architecture were noteworthy.  Instead of the
bytes that we all know and love, and that we are discussing here, the B1700
addressed bits, and operand sizes were given at several levels:  address
granularity in the code, "byte" size for string operations, and string length.  The
processor was microcoded, and several interpreters were provided: COBOL-RPG,
Fortran, and MCP (the operating system), were the most popular.  Interpreter
selection was dynamic--the operating system used a different one than the
programs did, quite often.  The microcode had a dedicated memory, and the size
of the microstore was the subject of a hardware option:  it ranged from zero to
nearly enough.  Overflow microcode resided in main memory.

Enough for now.  Maybe somebody who knows more about the B1700 could carry on.

--Bill Price
-- 
--Bill Price    uucp:   {decvax!ucbvax  philabs}!sdcsvax!bmcg!bprice
                arpa:?  sdcsvax!bmcg!bprice@nosc

tom@hcrvx1.UUCP (Tom Kelly) (10/11/84)

> I suppose it would be possible to have fiendishly clever memory designs
> where adjacent words were always in different memory banks so you could
> cycle both at the same time.  Sounds pretty awful, though, since you have
> to determine for each memory reference how many memories to cycle and how
> to splice the parts together.  As far as I can tell, it's never been
> seriously proposed for implementation, except perhaps incidentally in very
> large cached architectures such as the IBM 308X.

If my memory serves me correctly, the CDC 6600 series had something like
this.  It was called "Phased Memory" - the configuration we had was described
as "40 banks phased".  My understanding was that adjacent addresses were
in different memory banks, so that memory access could be overlapped.

Of course, the 6600 had no byte addressing.  If you wanted a byte, you
fetched a word and then used shifts and masks.

Tom Kelly  (416) 922-1937
{utzoo, ihnp4, decvax}!hcr!hcrvx1!tom

wayne@bambi.UUCP (Wayne Wilner) (10/13/84)

Thanks to Bill Price for explaining the B1700...B1900
bit-addressability.  The one feature that always seems to
escape everyone's attention is that variable-length strings
were addressed by a triple:
    starting address
    length
    direction
not just the usual address-length pair.  By having the
"direction" parameter, one could fetch toward higher or
toward lower addresses from the starting address.  So add to
Bill's description that if bank A were given address N, then
bank B might be given N, N+1, or N-1.

--mhuxl!thumper!wayne Wilner
  Bell Communications Research

mjl@ritcv.UUCP (Mike Lutz) (10/14/84)

Ah, the B1700!  I've waxed loquacious on this fascinating machine
before, but what the hell! it's time for another lecture.

Bill Price discussed its primary distinctive feature already, namely
bit addressable memory.  Just to fill in a bit ;-):  a single micro
instruction could read from 1 to 24 bits beginning at any bit address
in main memory (and scanning in either direction).  The number of bits
could be a constant, or determined by the current ALU precision
(1-24).

Programmer controlled variable sized operands were supported by another
B1700 innovation: if the microinstruction register (MIR) was the
destination of a micro-instruction, then the value it was sent was
logically or'ed with the next micro-instruction fetched; the resulting
bit pattern was the actual microinstruction executed.  Thus if one had
a bit count in, say, register Y, and wanted to read that number of bits
from main memory into register X, the code looked like the following:

	MOVE Y TO MIR
	READ 0 BITS TO X

This technique was used throughout the micro architecture to support
jump tables, status bit testing, and variable length shifts & rotates.
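
The OR-into-the-next-microinstruction idea can be modelled in a few lines
of Python.  The encoding below (a made-up opcode with the bit count in the
low five bits) and the class name are purely hypothetical; only the
mechanism of OR-ing a pending value into the next fetched microinstruction
comes from the description above.

```python
READ_TO_X = 0x7000   # hypothetical opcode bits for "READ n BITS TO X"
COUNT_MASK = 0x1F    # hypothetical: low five bits hold the bit count n

class MicroMachine:
    """Toy model: a value sent to MIR is OR-ed into the next microinstruction."""

    def __init__(self):
        self.pending_or = 0

    def move_to_mir(self, value):
        """MOVE <register> TO MIR: remember the value for the next fetch."""
        self.pending_or = value

    def fetch(self, microinstruction):
        """Return the microinstruction actually executed."""
        effective = microinstruction | self.pending_or
        self.pending_or = 0          # the modification applies only once
        return effective
```

With Y holding 13, move_to_mir(13) followed by fetching READ_TO_X (a count
of zero) yields a microinstruction whose count field is 13.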

The Burroughs designers put a lot of effort into creating a micro
architecture for which efficient emulators and interpreters could be
rapidly developed.  In my case, I was a graduate student on a project
investigating microprogramming and emulation, and the B1700 was a
godsend to us, as it made it relatively easy to develop and investigate
new architectural ideas.  It's hard to explain what made the micro
architecture so good, but my experience and that of my
colleagues was that the pieces fit together well.  There always seemed
to be a natural, even elegant, solution to microprogramming problems
which fit hand in glove with what the hardware provided.

On top of this base, Burroughs constructed emulators for specialized
FORTRAN, COBOL, and SDL (a systems programming language) machines.  SDL
was used for the operating system, compilers, and most of the
utilities.  As Bill Price mentioned, processes using different
architectures could be multiprogrammed, with the appropriate emulator
being invoked as part of a context switch.  The O.S. was ahead of
its time, supporting multiprogramming, virtual memory, and multiple
emulators on systems with as little as 48Kbytes of memory and using
RK05-like discs.

Enough nostalgia, though I do believe any serious computer architect
could learn a lot from the B1700's design.

Mike Lutz

P.S.  Wayne Wilner, one of the B1700 designers, used to be on the net.
If he still is, maybe he could fill in more of the details that I
forgot.
-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
ARPA:		ritcv!mjl@Rochester.ARPA

guy@rlgvax.UUCP (Guy Harris) (10/15/84)

> Ah, the B1700!  I've waxed loquacious on this fascinating machine
> before, but what the hell! it's time for another lecture.
> 
> Bill Price discussed its primary distinctive feature already, namely
> bit addressable memory.  Just to fill in a bit ;-):  a single micro
> instruction could read from 1 to 24 bits beginning at any bit address
> in main memory (and scanning in either direction).  The number of bits
> could be a constant, or determined by the current ALU precision
> (1-24).

Well, on the IBM 7030 (a/k/a STRETCH), a single *macro* instruction (in the
sense of non-micro, not in the sense of a macro assembler) could read from
1 to 32 (or possibly even 64) bits beginning at any bit address in memory,
although it only scanned in the "forward" direction.  The field length was
part of the instruction.

> Programmer controlled variable sized operands were supported by another
> B1700 innovation: if the microinstruction register (MIR) was the
> destination of a micro-instruction, then the value it was sent was
> logically or'ed with the next micro-instruction fetched; the resulting
> bit pattern was the actual microinstruction executed.  Thus if one had
> a bit count in, say, register Y, and wanted to read that number of bits
> from main memory into register X, the code looked like the following:
> 
> 	MOVE Y TO MIR
> 	READ 0 BITS TO X

Sounds like the microinstruction equivalent of the IBM 360's EX (execute)
instruction, which uses it for much the same purpose.  The MVC (move character)
instruction has the character count in the instruction itself, so if you
have the number of characters to be moved in a register you just do an EX
and tell it to stuff the count into the instruction being executed.  (The
370 introduced a MVCL instruction (move characters long, or was it MVL for
"move long"?) which had the addresses and count in registers; it allowed you
to move more than 256 characters at a time, and was interruptible so you
couldn't lose interrupts by trying to move all of main memory around.)
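
As a sketch of the EX-plus-MVC idiom in Python: the model below treats an
instruction as a sequence of bytes, assumes the count lives in the second
byte of the MVC instruction stored as length minus one, and ORs in the low
byte of the register.  The function names and the sample operand bytes are
invented for illustration.

```python
MVC_OPCODE = 0xD2

def execute_with_modification(register, instruction):
    """Model EX: OR the register's low byte into the target's second byte.

    The instruction in storage is left unchanged; only the copy that is
    executed carries the modification.
    """
    modified = bytearray(instruction)
    modified[1] |= register & 0xFF
    return bytes(modified)

def mvc_length(instruction):
    """Number of characters moved, assuming the field holds length - 1."""
    return instruction[1] + 1

# An MVC template with a zero length field; the operand bytes are arbitrary.
TEMPLATE = bytes([MVC_OPCODE, 0x00, 0x10, 0x00, 0x20, 0x00])
```

To move 40 characters, a program would put 39 in the register and execute
TEMPLATE through execute_with_modification; an eight-bit field is also why
a single MVC tops out at 256 characters.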

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

ken@turtlevax.UUCP (Ken Turkowski) (10/15/84)

> There is in fact a rather obvious solution to the alignment/performance
> issue (I cannot bring myself to call it a problem) which satisfies
> everyone except the members of a group I shall refer to as X.
> Assuming eight-bit bytes and 32-bit words, the method involves four
> independent byte-wide* memories (two LSBs for bank select), a
> +0:+1:+2:+3 circuit, and eight 4-by-4 crossbar switches.  The exact
> details of implementation, the reasons for its unpopularity, and the
> identity of X are left as rather trivial exercises for the reader.

An extension of this technique is used in high-performance raster
graphics systems, which allows access to several horizontally-,
vertically-, or block-contiguous pixels.  It is called a tessellated
frame buffer, and is described in two recent papers, one about a year
ago in IEEE Computer Graphics and Applications, the other in this
year's SIGGRAPH tutorial on State-of-the-Art in Image Synthesis.  If
there is any interest, I can dig up the exact references and post them
to the net.
-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

arndt@ttds.UUCP (Arndt Jonasson) (10/15/84)

    >One of the nicer things the DEC-10s and 20s had was something called a 
    >byte pointer. In this context a byte was any arbitrary grouping of bits

Had? They are still very much alive, although DEC no longer
does any development on them (or so I have heard).

A PDP-10 bytepointer can access any contiguous string of bits
that lies within a 36-bit word. It doesn't have to start on a
word boundary. The usual application of byte pointers is
handling 7-bit bytes, i.e. characters, but others are used as
well. In the source code for ITS TECO (the base system for the
*real* Emacs), I recall that 36-bit pointers are used for some 
purpose.

The autoincrementing versions of LDB and DPB (ILDB and IDPB)
increment the pointer before accessing the byte. There is also
an IBP (increment byte pointer). (Actually ILDB and IDPB are
just IBP followed by LDB and DPB). There are no autodecrementing
modes, but later versions of the PDP-10 have an ADJBP (adjust
byte pointer) which adjusts a byte pointer an arbitrary amount
(forwards or backwards). ADJBP doesn't fit in very well with
the others, though. It seems like a piece of happy hacking in 
the microcode.
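
The increment-then-load behaviour can be sketched in Python as follows.
The tuple representation (word index, position, size) is an assumption
made for this sketch, with "position" meaning the number of bits to the
right of the byte and memory modelled as a list of 36-bit integers; it is
not the real pointer format.

```python
WORD_BITS = 36

def ibp(pointer):
    """Increment byte pointer: step to the next byte, moving to the next
    word when the current one has no room left for a whole byte."""
    index, position, size = pointer
    if position >= size:
        return (index, position - size, size)
    return (index + 1, WORD_BITS - size, size)

def ildb(memory, pointer):
    """Increment, then load: return (byte value, updated pointer)."""
    pointer = ibp(pointer)
    index, position, size = pointer
    value = (memory[index] >> position) & ((1 << size) - 1)
    return value, pointer
```

Starting from (0, 36, 7), i.e. just "before" the first byte of word 0,
repeated ildb calls walk through five 7-bit bytes per word and skip the
leftover low-order bit.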

   {decvax,philabs}!mcvax!enea!ttds!arndt

ken@turtlevax.UUCP (Ken Turkowski) (10/15/84)

High performance raster graphics hardware uses frame buffer
tessellation to gain access to multiple pixels on arbitrary
two-dimensional boundaries.  These pixels may be horizontally-,
vertically-, or block-contiguous.  Some references are:

    Rodney Stock, "Graphics Animation Hardware", notes for the SIGGRAPH
    1983 State-of-the-Art in Image Synthesis tutorial.

    Thomas Porter & Rodney Stock, "Image Composition", notes for the
    SIGGRAPH 1984 State-of-the-Art in Image Synthesis tutorial.

    Mary Whitton, "Memory Design for Raster Graphics Displays", IEEE
    Computer Graphics and Applications, March 1984, vol. 4, no. 3, pp.
    48-64.

The R. Stock papers may be difficult to get ahold of, because they have
only been distributed as tutorial notes, although you may be able to
write Tom Porter for any papers he has on frame buffer tessellation at:

    Thomas Porter
    Lucasfilm, Ltd.
    Computer Graphics Division
    P.O. Box 2009
    San Rafael, CA 94912

The M. Whitton paper is more comprehensive, so much so that it gives
away the secrets of frame buffer design so that anyone can design a
good one.

-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

gnu@sun.uucp (John Gilmore) (10/16/84)

>                                                            I sure 
> wish some of today's processors were so talented and didn't need
> such archaic things as byte and word alignment with bytes
> fixed at 8 bits.
> --Chuck

I thought people (even at Intel :-) ) would know by now that the 68020
has even better bit-string-manipulation facilities than the DEC 10/20.
The 10's required the bit string to fit in a single memory word; the
68020 allows totally arbitrary alignment.  The 68020 also takes a byte
address as the base, and adds a 32-bit signed bit number; if you don't
need the byte address as e.g. an array base, you can still address up to
2Gbits or 256Mbytes with simple sequential bit numbers (easy for
hardware since the word size is a power of 2).  There are 8
instructions that work with such bitfields: load (signed & unsigned),
store, set (to ones), clear, invert, test, and find-first-one-bit.
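
An unsigned bit field load of this kind can be sketched in Python.  The
sketch assumes that bit offset 0 is the most significant bit of the base
byte and that offsets count toward higher addresses; the function name and
the use of a bytes object for memory are illustrative only.

```python
def bfextu(memory, base, offset, width):
    """Extract an unsigned field of `width` bits (1..32).

    The field starts `offset` bits from the most significant bit of
    memory[base]; `offset` may be negative or larger than 7.
    """
    first_bit = base * 8 + offset     # absolute bit number, MSB-first
    last_bit = first_bit + width      # exclusive
    first_byte = first_bit // 8       # floor division also handles negatives
    last_byte = (last_bit + 7) // 8   # exclusive, rounded up to a whole byte
    chunk = int.from_bytes(memory[first_byte:last_byte], "big")
    shift = last_byte * 8 - last_bit  # unused bits to the right of the field
    return (chunk >> shift) & ((1 << width) - 1)
```

A 32-bit field that does not start on a byte boundary spans five bytes,
which is one reason such instructions cost more than an aligned word load.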

Note that such instructions have more overhead (both on the 10 and the
68020) than their simpler relations (eg load a whole aligned word).
There's still room for fixed-size bytes in an architecture -- the 432
proved that.

crandell@ut-sally.UUCP (Jim Crandell) (10/16/84)

> Actually, the 1620 addressed its digits in even/odd pairs and, although
> an address had no restriction to be even or odd, there was a performance
> gain by aligning on a pair boundary.  I guess that makes it a decadent
> machine (I know several people who would agree with that estimation).

Almost.  Instructions had to start at even addresses, and the performance
advantage (which was all of 10 musec -- 1 cycle -- on, for example,
TF and TR) applied only to the Model II.
-- 

    Jim Crandell, C. S. Dept., The University of Texas at Austin
               {ihnp4,seismo,ctvax}!ut-sally!crandell

phil@unisoft.UUCP (Phil Ronzone) (10/19/84)

Now that the 7030 has been mentioned .....

My second most favorite machine that-I've-never-programmed is the B1700.

My most favorite machine that-I've-never-programmed is the IBM 7030.

The 7030 realized as a single-chipper today would be most impressive.
I'm still trying to verify whether, as IBM claims, the 7030 first introduced
the word ``byte''. Any comments?

sysad@tikal.UUCP (sysad) (10/20/84)

>>A note about the Pyramid 90x.  After further discussion between
>>software and hardware engineers, it was determined that it wouldn't be
>>very difficult or expensive (speedwise) to implement almost arbitrary
>>byte alignment (e.g. longwords accessed on any even address) in the
>>microcode.  Pyramid will be offering this microcode change to its
>>customers real soon now.
>>
>>I think this implies that arbitrary byte alignment does not necessarily
>>imply a performance penalty in the global throughput of a machine.
>>
>>I know compilers can be taught to do alignment, but many programmers
>>using C's simple address arithmetic mechanisms, can't be. :-)

I can tell you from recent bitter experience that Pyramid will NOT be
offering this microcode change to ANYONE, anytime.  It seems that when
HHB Softron decided to re-port its code, Pyramid decided byte-alignment
was too hard.  Apparently they also decided not to tell anyone.

	Pyramid "software and hardware engineers" also seem to believe
that compilers cannot easily be taught to do alignment.  If you are
expecting arbitrary byte alignment out of Pyramid, you'd better ask
them again.

Duane Hesser
...uw-beaver!tikal!sysad

wls@astrovax.UUCP (William L. Sebok) (10/20/84)

> >                                                            I sure 
> > wish some of today's processors were so talented and didn't need
> > such archaic things as byte and word alignment with bytes
> > fixed at 8 bits.
> > --Chuck
> 
> I thought people (even at Intel :-) ) would know by now that the 68020
> has even better bit-string-manipulation facilities than the DEC 10/20.
> The 10's required the bit string to fit in a single memory word; the
> 68020 allows totally arbitrary alignment.  The 68020 also takes a byte
> address as the base, and adds a 32-bit signed bit number; if you don't
> need the byte address as e.g. an array base, you can still address up to
> 2Gbits or 256Mbytes with simple sequential bit numbers (easy for
> hardware since the word size is a power of 2).  There are 8
> instructions that work with such bitfields: load (signed & unsigned),
> store, set (to ones), clear, invert, test, and find-first-one-bit.

Since no one has mentioned it yet I think that I should say that the Vax
also has such bit string instructions that let one address a bit string of 1
to 32 bits with arbitrary alignment with respect to word boundaries.  A bit
address consists of a byte address base and a signed 32 bit offset from the
base.  Instructions provided are FFS (find first bit set), FFC (find first bit
clear), EXTV (extract bit field sign extended), EXTZV (extract bit field zero
extended), CMPV (compare sign extended bit field to integer), CMPZV (compare
zero extended bit field to integer), and INSV (move integer to bit field).
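
The two extract operations can be sketched in Python.  The sketch assumes
little-endian numbering, with bit 0 the least significant bit of the base
byte, and a signed bit position relative to that byte; the function names
and the use of a bytes object for memory are illustrative only.

```python
def extract_zero_extended(memory, base, position, size):
    """Return the `size`-bit field starting `position` bits from bit 0 of
    memory[base], zero extended."""
    first_bit = base * 8 + position
    first_byte = first_bit // 8                 # floor division handles negatives
    last_byte = (first_bit + size + 7) // 8     # exclusive
    chunk = int.from_bytes(memory[first_byte:last_byte], "little")
    return (chunk >> (first_bit - first_byte * 8)) & ((1 << size) - 1)

def extract_sign_extended(memory, base, position, size):
    """Same field, with its top bit treated as a sign bit."""
    value = extract_zero_extended(memory, base, position, size)
    if size and value >> (size - 1):
        value -= 1 << size
    return value
```

The compare and insert forms follow the same addressing: locate the field
from (base, position, size), then compare it with or overwrite it by an
integer.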
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls

greg@sdcsvax.UUCP (Greg Noel) (10/22/84)

In article <393@ism780.UUCP> darryl@ism780.UUCP writes:
>Actually, the 1620 addressed its digits in even/odd pairs and, although
>an address had no restriction to be even or odd, there was a performance
>gain by aligning on a pair boundary.  I guess that makes it a decadent
>machine (I know several people who would agree with that estimation).
>	    --Darryl Richman

But I'm one who would disagree......  It was an interesting machine, and
it became my first love.  (Now you know what's wrong with me!)

Actually, even though Darryl is correct that the 1620 fetched in even/odd
pairs, only instruction fetches were optimized to take advantage of this.
(Instructions were twelve digits long and had to be aligned on an even
address.)  Data fetches still fetched each digit individually.  Later in
its evolution, the 1620 Mod II was better at optimizing the references;
it essentially had a (four-digit?) cache, and it could get the second digit
from the same pair in about one-quarter the "normal" access time.
-- 
-- Greg Noel, NCR Torrey Pines       Greg@sdcsvax.UUCP or Greg@nosc.ARPA

jans@mako.UUCP (Jan Steinman) (10/29/84)

In article <astrovax.475> wls@astrovax.UUCP (William L. Sebok) quotes, writes:
>> ...68020 allows totally arbitrary alignment.  The 68020 also takes a byte
>> address as the base, and adds a 32-bit signed bit number; if you don't
>> need the byte address as e.g. an array base, you can still address up to
>> 2Gbits or 256Mbytes with simple sequential bit numbers (easy for
>> hardware since the word size is a power of 2).
>
>Since no one has mentioned it yet I think that I should say that the Vax
>also has such bit string instructions that let one address a bit string of 1
>to 32 bits with arbitrary alignment with respect to word boundaries.  A bit
>address consists of a byte address base and a signed 32 bit offset from the
>base.

Please add to the list the NS32000 chips.  National >almost< did it right...
The general form is a base and a signed, 30 bit, bit offset from that base.
(Offsets also come in signed 14 and 7 bit lengths for memory conservation.)
A useful instruction (CVTP) generates the absolute address for such an item.
(Kinda like an LEA for bits.)

Why do I say "almost" did it right?  The instruction set is not quite
orthogonal when it comes to >bit fields<.  While still capable of arbitrary
alignment, the field may not span more than four bytes and may not have an
immediate operand for offset!  One use of bit fields is to break up imposed
data structures, such as hardware registers, or comm strings.  Such data
structures seldom have dynamic alignment and a static, immediate operand
offset to the beginning of the bit field would be useful.

FLAME PROOF YOUR ARTICLE TODAY!  Before you protest, note that the Extract
Field Short (EXTSi) instruction is limited to a 3 bit, bit offset, which means
you must have previously obtained the byte address of the target bit field.
Useable in a pinch, but it sure smells like rubber cement!  Incomplete
orthogonality: you must use one of two instructions depending on the size of
the target.

-- 
:::::: Jan Steinman		Box 1000, MS 61-161	(w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans	Wilsonville, OR 97070	(h)503/657-7703 ::::::