[comp.arch] integer alignment problems on RT

johnny@edvvie.at (Johann Schweigl) (09/26/89)

Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler

After two nights of hunting for a bug, enlightenment finally came over me,
and with it the memory of the old law 'thou shalt write four-byte
integers to word boundaries'.

The story was as follows:
I'm producing an output stream consisting of an int containing the
length of the following string, then the string itself, repeated for every
string to be written. The strings have arbitrary length, so the next int
(4 bytes) can land at any address, even or odd, word-aligned or not.
Because I paid no attention to alignment rules, the tail of each string
was destroyed by the int that followed it.
That's it. The CPU writes every int to the nearest word boundary <= the
actual address.
This is the assembly code:
	...

#       	*msgBufCurr.Integer = curColLen;
	l	4,8+L.1L(1)    	# load msgBufCurr.Integer into R4
	l	3,12+L.1L(1)    # load curColLen into R3
	st	3,0(4)    	# store R3 to *msgBufCurr.Integer
	...

Nothing to see from outside the CPU.
What I find very suspect is that the CPU simply aligns the address
internally and writes the int to the new, aligned address.

I tried the same on my '386 AIX machine and, bells and whistles, this one
does not care at all. If you write a 4-byte int to any address,
odd or wherever you want, the CPU does it.

This leads me to the final questions: 
- is it acceptable that the CPU changes the address you delivered without any
  warning and does something you wouldn't expect?
- how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)?
- would you prefer getting an 'alignment violation trap' or something like this?
- does any CPU implement such a trap?

Besides the discussion, which I would like to follow on the net (if there is
any response), I include the C source of the program I used to prove my shame.
If you've got any of the above CPUs, or another oddball, and have a bit of
time to spend, please compile it and email me the program's output, your CPU
type and the assembler listing of the program. Just because I love to read
assembler listings for CPUs I don't know.

Thank you.
----- start of code ----------------------------------------------------------

#include <stdio.h>
#include <ctype.h>

void memHexDump();

union _ptr {
	int  	*Integer;
	char	*Character;
};

typedef union _ptr	ptr;

main()
{
	int 	iArr[4];
	ptr	foo;
	ptr	bar;

	iArr[0] = 0;
	iArr[1] = 1;
	iArr[2] = 2;
	iArr[3] = 3;

	foo.Integer = iArr;
	bar.Integer = iArr;
	memHexDump(foo.Character,16,"iArr[4] before hacking around");
	foo.Character += 5; /* Har har ack ack barf barf */
	*foo.Integer = -1;  /* 0xffffffff, a nice pattern */
	memHexDump(bar.Character,16,"iArr[4] after hacking around");
}

void memHexDump(source,n,name)
char *source;
int n;
char *name;
{
	register int 	i;
	static char	hexChars[] = "0123456789abcdef";

	printf("memHexDump: %d bytes dump of %s\n",n,name);
	printf("memHexDump: starting at address %08x\n",source);
	for (i = 0; i < n; i++) {
		putchar(hexChars[i % 16]);
	}
	putchar('\n');
	for (i = 0; i < n; i++) {
		putchar(isprint(*(source + i)) ? *(source + i) : '.');
	}
	putchar('\n');
	for (i = 0; i < n; i++) {
		putchar(hexChars[(*(source + i) >> 4) & 0x0f]);
	}
	putchar('\n');
	for (i = 0; i < n; i++) {
		putchar(hexChars[*(source + i) & (char)0x0f]);
	}
	putchar('\n');
}
-- 
       ------------------------------------------------------------------
       EDV Ges.m.b.H Vienna              Johann Schweigl    
       Hofmuehlgasse 3 - 5               USENET: johnny@edvvie.at
       A-1060 Vienna, Austria      Tel: (0043) (222) 59907 257 (8-19 CET)

tim@cayman.amd.com (Tim Olson) (10/02/89)

In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
| This leads me to the final questions: 
| - is it acceptable that the CPU changes the adress you delivered without any
|   warning and does something you wouldn't expect

I don't think it is acceptable if there is no other option.  However,
this behaviour is potentially useful (the lower address bits may be
used as tags for dynamic data-typing systems).

| - how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)
| - would you prefer getting an 'alignment violation trap' or something like this
| - does any CPU implement such a trap

The Am29000 implements an Unaligned Access Trap enable bit (TU) in the
protected Current Processor Status Register which enables this trap on
a process-by-process basis.  If enabled, unaligned word and half-word
accesses cause an Unaligned Access trap, placing the offending
accesses' virtual address, data, and control information in special
registers for use in the trap handler.  The handler can be written to
either abort the process (SIGSEGV) or emulate the transfer and return.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

luner@werewolf.CS.WISC.EDU (David L. Luner) (10/02/89)

In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
>[... Integers must be word-aligned on an RT...]
>
>[ ... but not on a '386 ...]
>...
>This leads me to the final questions: 
>- is it acceptable that the CPU changes the adress you delivered without any
>  warning and does something you wouldn't expect
>- how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)
>- would you prefer getting an 'alignment violation trap' or something like this
>- does any CPU implement such a trap
>

The full-word alignment restriction is due to the hardware design. The
last time I looked at this problem (someone's program was dying with
the usual "bus error, core dumped" message), I recall that AIX trapped
the error and produced the message (rather than altering the
destination address so things worked). It may be that under the current
release of AIX the kernel traps the error and patches things so they
work, albeit apparently incorrectly. If this is the case, you should
report the problem to IBM.

The restriction, I am told, is very common for RISC processors. To wit, I
believe that SUN SPARCstations have the same "problem".

	-- David

johnl@esegue.segue.boston.ma.us (John R. Levine) (10/03/89)

In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
>The thing that's very suspect to me is, that the CPU simply aligns the adress
>internally and writes the int to the new, aligned adress.

Yes, that is what the ROMP does.  I wrote the original ROMP AIX C compiler
and assembler.  There's no doubt that for debugging it would have been
somewhat easier if the processor faulted on a misaligned address rather than
just ignoring the low bits, but it wasn't all that tough.  I gather that the
ROMP's designers found that they could speed things up by leaving out the
alignment check.  Other than in code written by the "computer == Vax" crowd,
or perhaps the "computer == PC" crowd, misalignment is not a very big problem
in practice.

Most other RISC CPUs fault on misaligned addresses.  Most CISC CPUs accept
misaligned addresses at some loss in performance relative to aligned
addresses.  The ROMP's behavior is a little surprising, but not unreasonable.
As noted elsewhere, one could use the low two address bits as tags of some
sort, though I haven't seen a Lisp system that does so.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu
Massachusetts has 64 licensed drivers who are over 100 years old.  -The Globe

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/03/89)

In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:

|  This leads me to the final questions: 
|  - is it acceptable that the CPU changes the adress you delivered without any
|    warning and does something you wouldn't expect

  That's up to you to decide. If I were writing portable code to do this
(and I have) I would use a simple output routine for machines which
force alignment.

|  - how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)

  The GE600/6000 (now Honeywell DPS) series did this for double access.
The LSB was dropped in the address evaluation. What you are seeing is
the dropping of the two LSBs.

|  - would you prefer getting an 'alignment violation trap' or something 
|    like this

  It would prevent obscure programming errors. It would probably break a
lot of "working programs" if added as an FCO.

|  - does any CPU implement such a trap
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

mash@mips.COM (John Mashey) (10/04/89)

In article <754@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
>In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:
>
>|  This leads me to the final questions: 
>|  - is it acceptable that the CPU changes the adress you delivered without any
>|    warning and does something you wouldn't expect
>  That's up to you to decide. If I were writing portable code to do this
>(and I have) I would use a simple output routine for machines which
>force allignment.
>
>|  - how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)

Those RISCs all trap on unaligned accesses, as does HP PA.
MIPS has 4 instructions for doing unaligned 32-bit load/stores;
HP PA has an unaligned store bytes operation.

With relatively few exceptions, CPUs either:
	a) Trap on unaligned (most RISCs; S/360)
	OR
	b) Complete the access, crossing boundaries as needed.
	Many CISCs; S/370 & later.

Some combinations exist.  For example, 68000s would trap a 16-bit
(word) reference on an odd boundary, but not a 32-bit (longword)
reference on a boundary that was word-aligned but not longword-aligned.
That occasionally caused performance pain on 68020s, which allow
longword accesses on any boundary but are of course slower when the
access is unaligned.  Some 68K C compilers packed structures so that
longs often showed up on non-long boundaries.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

webb@bass.tcspa.ibm.com (Bill Webb) (10/04/89)

> 
> Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler
> 
> After two night's hunting for a bug the big enlightment came over me; 
> with it came the remembrance of the old law 'thou shalt write four byte
> integers to word boundaries'. 
>...

(I know you were using AIX, but I'm not enough of an AIX/RT user to know
if there is an equivalent document for AIX - one problem with a shelf full
of manuals is finding things!).

...

5. ALL MEMORY REFERENCES ARE ALIGNED
 Word and half-word data are stored most significant byte first and aligned on
 natural boundaries. Off-boundary store references are not supported. The low
 two or one address bits are silently ignored, creating unexpected results.

 If lint(1) is run against such programs, it complains about a "possible
 alignment problem"

...

> - is it acceptable that the CPU changes the adress you delivered without any
>   warning and does something you wouldn't expect
> - how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)
> - would you prefer getting an 'alignment violation trap' or something
>   like this
> - does any CPU implement such a trap
>        ------------------------------------------------------------------
>        EDV Ges.m.b.H Vienna              Johann Schweigl    
>        Hofmuehlgasse 3 - 5               USENET: johnny@edvvie.at
>        A-1060 Vienna, Austria      Tel: (0043) (222) 59907 257 (8-19 CET)

Your final points get into the area of "what should happen when non-portable
code is used". Other similar cases are "what is the value of * (char *) 0?"
and "what is the value of * (short *) "ab"?". If you use non-portable code,
then you are at the mercy of the hardware/software designers as to what you
get.

I won't argue with the assertion that it is usually desirable to get a trap
rather than silently ignoring the low-order bits. However, it is generally
the case that RISC processors put more demands on the programmer and compiler,
since fewer features are implemented in silicon. Newer processors generally
either implement off-boundary fetches or provide the traps that you suggest,
but they have more room on the chip in which to do so. If the RT were being
designed today, I'm sure that it would implement an Interrupt on
Unaligned Access bit in the ICS register.

Bill Webb (IBM AWD Palo Alto, (415) 855-4457).  ...!uunet!ibmsupt!webb
All opinions expressed above are my own, and quite often not those of
my employer.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/04/89)

In article <28697@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:

|  them unaligned.  Some 68K C compilers packed structures so that
|  longs often showed up on non-long boundaries.

  A good point! The Microsoft C compilers allow selection of packing on
1, 2, or 4 byte level, with the default being whatever is best for the
native hardware. Letting the CPU access packed structures is a lot
faster than unpacking by code, although I have to keep the code for
other machines.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/05/89)

In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:
> Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler
...
> After two night's hunting for a bug the big enlightment came over me; 
> with it came the remembrance of the old law 'thou shalt write four byte
> integers to word boundaries'. 
...
> The thing that's very suspect to me is, that the CPU simply aligns the adress
> internally and writes the int to the new, aligned adress.
> 
> I tried the same on my '386 AIX machine, and, whistle and bells, this one
> does not take care of anything. If you write an 4 byte int to any address,
> odd and wherever you want, the CPU does it.
> 
> This leads me to the final questions: 
> - is it acceptable that the CPU changes the adress you delivered without any
>   warning and does something you wouldn't expect
> - how do other CPU's behave (eg. 88000, 68000, SPARC, MIPS)
> - would you prefer getting an 'alignment violation trap' or something like this
> - does any CPU implement such a trap

Aha!  You too have fallen prey to this nefarious feature!  One of my co-workers
and I spent *hours* looking for an obscure bug in some code that was running
on an Intergraph Clipper workstation.  That processor can only write
double-precision values to 8-byte-aligned addresses, i.e. addresses ending
in 0 or 8 hex.  If you try to write one elsewhere, it thoughtfully strips the
low address bits and stores it at the nearest lower 8-byte-aligned address.
This restriction may be limited to values stored on the stack; I'm not sure.
In any case, the CPU gives no error trap, and it is up to the programmer to
figure it out.

While this feature is documented, it can be annoying if one is doing sorcery.
In most cases the compiler takes care of everything for you, but it can be
fooled.  Yes, I wish a trap were generated.

-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

johnny@edvvie.at (Johann Schweigl) (10/06/89)

From article <2396@ibmpa.UUCP>, by webb@bass.tcspa.ibm.com (Bill Webb):
> (I know you were using AIX, but I'm not enough of an AIX/RT user to know
> if there is an equivalent document for AIX - one problem with a shelf full
> of manuals is finding things!).

You're right. The Assembler Language Reference for st R1,D2(R2) says 
	"The effective address formed from D2 + 0/(R2) will have its low-order
        two bits forced to zero."
> 
>  If lint(1) is run against such programs, it complains about a "possible
>  alignment problem"

   Maybe lint catches inconsistent pointer usage; I'll try. In the case of the
   example I posted to the net it just says:

align.c
==============
(30)  warning: main() returns random value to invocation environment


==============
function argument ( number ) used inconsistently
    putchar( arg 1 )   	llib-lc(525) :: align.c(44)
    putchar( arg 1 )   	llib-lc(525) :: align.c(48)
    isprint( arg 1 )   	llib-lc(256) :: align.c(48)
    putchar( arg 1 )   	llib-lc(525) :: align.c(52)
    putchar( arg 1 )   	llib-lc(525) :: align.c(56)
function returns value which is always ignored
    printf	    putchar	

Remember that each member of the union _ptr is used according to the rules
for its type.

> Your final points get into the area of "what should happen with non-portable
> code is used". Other similar cases are "what is the value of * (char *) 0?"
> and ''what is the value of * (short *) "ab"?``. If one uses non-portable code,
> then you are at the mercy of the hardware/software designers as to what you
> get.
> 
> I won't argue with the assertion that it is usually desirable to get a trap
> rather than silently ignoring the low-order bits. ...

Portability has more faces than are generally talked about.
This is one more for me, one that I hadn't thought of earlier.
It's OK that the CPU is designed for maximum performance; I just think that
a trap on illegally aligned accesses would preserve performance AND make porting
a bit easier.

Bye, johnny
-- 
This does not reflect the   | Johann  Schweigl | DOS machines? 
opinions of my employer.    | johnny@edvvie.at | I don't hate DOS machines. 
I am busy enough by talking |                  | I just feel better when I
about my own ...            |   EDVG  Vienna   | don't see one ...

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/12/89)

In article <2396@ibmpa.UUCP>, webb@bass.tcspa.ibm.com (Bill Webb) writes:
> Your final points get into the area of "what should happen with non-portable
> code is used". Other similar cases are "what is the value of * (char *) 0?"
> and ''what is the value of * (short *) "ab"?``. If one uses non-portable code,
> then you are at the mercy of the hardware/software designers as to what you
> get.

The point is arguable (why else would we be discussing it :^), but I disagree.
This seems to me to be no less of an unusual event than a divide by zero
and no more code dependent.  ie. it is possible to generate faulty code
that at execution time (when it can't be detected by the compiler) causes
an arithmetic exception.  Similarly addresses can be calculated incorrectly.
Certainly perfect code would never need either kind of hardware support,
but in the real world . . .

In any case, it would seem to me that the number of extra gates to implement
this feature would be very small, even for a RISC chip.  There are things
that should be left out of a RISC chip; things that the compiler can do.
Features that should be included are things that the compiler has little
chance of doing.

-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb