johnny@edvvie.at (Johann Schweigl) (09/26/89)
Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler
After two nights of hunting for a bug, the big enlightenment came over me;
with it came the remembrance of the old law 'thou shalt write four-byte
integers to word boundaries'. 
The story was as follows:
I'm producing an output stream consisting of an int, containing the
length of the following string, and then the string itself; this is
repeated for every string to be written. The strings have arbitrary
length, so the following int (4 bytes) can land at any address, even or
odd, word-aligned or not. 
Because I paid no attention to alignment rules, the tail of each string
was destroyed by the following int. 
That's it. The CPU writes every int to the nearest word boundary <= the
actual address.
This is the assembly code:
	...
#       	*msgBufCurr.Integer = curColLen;
	l	4,8+L.1L(1)    	# load msgBufCurr.Integer into R4
	l	3,12+L.1L(1)    # load curColLen into R3
	st	3,0(4)    	# store R3 to *msgBufCurr.Integer
	...
Nothing to see from outside the CPU.
What looks very suspect to me is that the CPU simply aligns the address
internally and writes the int to the new, aligned address.
I tried the same on my '386 AIX machine, and, whistles and bells, this one
does not care about anything. If you write a 4-byte int to any address,
odd or wherever you want, the CPU does it.
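The two behaviours described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `store_word_masked` and `store_word_unaligned` are made-up names, memory is modelled as a byte array whose start is taken to be word-aligned, and the masking rule (the two low-order address bits forced to zero) is the one described in this thread, not something checked against ROMP hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the ROMP-style store described in the post:
 * the two low-order address bits are dropped, so the word lands on the
 * word boundary at or below the requested offset.
 * Assumption: mem itself starts on a word boundary. */
static void store_word_masked(unsigned char *mem, size_t offset,
                              const unsigned char word[4])
{
    size_t effective;

    assert(mem != NULL);
    effective = offset & ~(size_t)3;   /* round down to a multiple of 4 */
    memcpy(mem + effective, word, 4);
}

/* Hypothetical model of a CPU that completes the unaligned store
 * (the '386 behaviour described in the post). */
static void store_word_unaligned(unsigned char *mem, size_t offset,
                                 const unsigned char word[4])
{
    assert(mem != NULL);
    memcpy(mem + offset, word, 4);
}
```

With a 16-byte buffer and offset 5, as in the test program below, the masked model overwrites bytes 4..7 and the unaligned model overwrites bytes 5..8.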
This leads me to the final questions: 
- is it acceptable that the CPU changes the address you delivered without any
  warning and does something you wouldn't expect
- how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
- would you prefer getting an 'alignment violation trap' or something like this
- does any CPU implement such a trap
Besides this discussion, which I would like to follow on the net (if there
is any response), I include the C program source I used to prove my shame.
If you've got any of the above CPUs or another weirdo, and have a bit of
time to spend, please compile it, and email me the output of the program,
your CPU type and the assembler listing of the program. Just because I love
to read assembler listings of CPUs I don't know.
Thank you.
----- start of code ----------------------------------------------------------
#include <stdio.h>
#include <ctype.h>
void memHexDump();
union _ptr {
	int  	*Integer;
	char	*Character;
};
typedef union _ptr	ptr;
main()
{
	int 	iArr[4];
	ptr	foo;
	ptr	bar;
	iArr[0] = 0;
	iArr[1] = 1;
	iArr[2] = 2;
	iArr[3] = 3;
	foo.Integer = iArr;
	bar.Integer = iArr;
	memHexDump(foo.Character,16,"iArr[4] before hacking around");
	foo.Character += 5; /* Har har ack ack barf barf */
	*foo.Integer = -1;  /* 0xffffffff, a nice pattern */
	memHexDump(bar.Character,16,"iArr[4] after hacking around");
}
void memHexDump(source,n,name)
char *source;
int n;
char *name;
{
	register int 	i;
	static char	hexChars[] = "0123456789abcdef";
	printf("memHexDump: %d bytes dump of %s\n",n,name);
	printf("memHexDump: starting at address %08lx\n",(unsigned long)source);
	for (i = 0; i < n; i++) {
		putchar(hexChars[i % 16]);
	}
	putchar('\n');
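	for (i = 0; i < n; i++) {
		/* go through unsigned char: a negative char is not a valid isprint() argument */
		unsigned char c = (unsigned char)source[i];
		putchar(isprint(c) ? c : '.');
	}
	putchar('\n');
	for (i = 0; i < n; i++) {
		/* high nibble; the unsigned cast keeps the index within 0..15 */
		putchar(hexChars[((unsigned char)source[i] >> 4) & 0x0f]);
	}
	putchar('\n');
	for (i = 0; i < n; i++) {
		/* low nibble */
		putchar(hexChars[(unsigned char)source[i] & 0x0f]);
	}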
	putchar('\n');
}
-- 
       ------------------------------------------------------------------
       EDV Ges.m.b.H Vienna              Johann Schweigl    
       Hofmuehlgasse 3 - 5               USENET: johnny@edvvie.at
       A-1060 Vienna, Austria      Tel: (0043) (222) 59907 257 (8-19 CET)
tim@cayman.amd.com (Tim Olson) (10/02/89)
In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
| This leads me to the final questions: 
| - is it acceptable that the CPU changes the address you delivered without any
|   warning and does something you wouldn't expect
I don't think it is acceptable if there is no other option.  However,
this behaviour is potentially useful (the lower address bits may be used
as tags for dynamic data-typing systems).
| - how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
| - would you prefer getting an 'alignment violation trap' or something like this
| - does any CPU implement such a trap
The Am29000 implements an Unaligned Access Trap enable bit (TU) in the
protected Current Processor Status Register which enables this trap on a
process-by-process basis.  If enabled, unaligned word and half-word
accesses cause an Unaligned Access trap, placing the offending accesses'
virtual address, data, and control information in special registers for
use in the trap handler.  The handler can be written to either abort the
process (SIGSEGV) or emulate the transfer and return.
-- 
Tim Olson
Advanced Micro Devices
(tim@amd.com)
luner@werewolf.CS.WISC.EDU (David L. Luner) (10/02/89)
In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
>[... Integers must be word-aligned on an RT...]
>
>[ ... but not on a '386 ...]
>...
>This leads me to the final questions: 
>- is it acceptable that the CPU changes the address you delivered without any
>  warning and does something you wouldn't expect
>- how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
>- would you prefer getting an 'alignment violation trap' or something like this
>- does any CPU implement such a trap
>
The full-word alignment restriction is due to the hardware design.  The
last time I looked at this problem (someone's program was dying with the
usual "bus error, core dumped" message), I recall that AIX trapped the
error and produced the message (rather than altering the destination
address so things worked).  It may be that under the current release of
AIX the kernel traps the error and patches things so they work, albeit
apparently incorrectly.  If that is the case, you should report the
problem to IBM.
The restriction, I am told, is very common for RISC processors.  To wit,
I believe that SUN SPARCstations have the same "problem".
-- David
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/03/89)
In article <162@eliza.edvvie.at> johnny@edvvie.at (Johann Schweigl) writes:
>The thing that's very suspect to me is, that the CPU simply aligns the address
>internally and writes the int to the new, aligned address.
Yes, that is what the ROMP does.  I wrote the original ROMP AIX C compiler
and assembler.  There's no doubt that for debugging it would have been
somewhat easier if the processor faulted on a misaligned address rather
than just ignoring the low bits, but it wasn't all that tough.  I gather
that the ROMP's designers found that they could speed things up by leaving
out the alignment check.  Other than in code written by the
"computer == Vax" crowd, or perhaps the "computer == PC" crowd,
misalignment is not a very big problem in practice.
Most other RISC CPUs fault on misaligned addresses.  Most CISC CPUs accept
misaligned addresses at some loss in performance relative to aligned
addresses.  The ROMP's behavior is a little surprising, but not
unreasonable.  As noted elsewhere, one could use the low two address bits
as tags of some sort, though I haven't seen a Lisp system that does so.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu
Massachusetts has 64 licensed drivers who are over 100 years old.  -The Globe
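The tag idea mentioned in the two posts above can be sketched in portable C. This is a minimal illustration, not anything taken from a real Lisp system: the function names are invented, `int` objects are assumed to be at least 4-byte aligned, and the tag is masked off in software before the pointer is used, which is the step that hardware ignoring the low bits would make unnecessary.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)   /* the two low-order address bits */

/* Pack a 2-bit tag into the unused low bits of a word-aligned pointer.
 * Assumption: p is 4-byte aligned, so those bits are zero to begin with. */
static uintptr_t tag_pointer(int *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;

    assert((bits & TAG_MASK) == 0);
    assert(tag <= 3);
    return bits | tag;
}

/* Recover the tag stored in the low bits. */
static unsigned pointer_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}

/* Strip the tag to get back a pointer that is safe to dereference. */
static int *untag_pointer(uintptr_t tagged)
{
    return (int *)(tagged & ~TAG_MASK);
}
```

On a CPU that faults on misaligned addresses, forgetting the `untag_pointer` step shows up immediately as a trap; on one that silently masks, it goes unnoticed.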
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/03/89)
In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:
| This leads me to the final questions: 
| - is it acceptable that the CPU changes the address you delivered without any
|   warning and does something you wouldn't expect
  That's up to you to decide. If I were writing portable code to do this
(and I have) I would use a simple output routine for machines which
force alignment.
| - how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
  The GE600/6000 (now Honeywell DPS) series did this for double access.
The LSB was dropped in the address evaluation. What you are seeing is
the dropping of the two LSBs.
| - would you prefer getting an 'alignment violation trap' or something
|   like this
  It would prevent obscure programming errors. It would probably break a
lot of "working programs" if added as an FCO.
| - does any CPU implement such a trap
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
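A "simple output routine" of the kind mentioned above might look roughly like the following. It is only a sketch: `put_record` and `get_length` are hypothetical names, the record layout (an `int` length followed by the string bytes, no terminator) is the one from the original post, and the length is written in the machine's native byte order. The point is that `memcpy` moves the `int` byte by byte, so the destination may sit at any address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one record (int length, then the string bytes) at out.
 * out may be unaligned; memcpy never performs a word store through it.
 * Returns a pointer just past the record, i.e. where the next one goes. */
static unsigned char *put_record(unsigned char *out, const char *s)
{
    int len;

    assert(out != NULL && s != NULL);
    len = (int)strlen(s);
    memcpy(out, &len, sizeof len);              /* length prefix */
    memcpy(out + sizeof len, s, (size_t)len);   /* string body   */
    return out + sizeof len + len;
}

/* Read a length prefix back from a possibly unaligned position. */
static int get_length(const unsigned char *in)
{
    int len;

    assert(in != NULL);
    memcpy(&len, in, sizeof len);
    return len;
}
```

The caller is assumed to provide a buffer large enough for all records; a stream meant to be exchanged between machines would also have to fix the byte order of the length.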
mash@mips.COM (John Mashey) (10/04/89)
In article <754@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
>In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:
>
>| This leads me to the final questions: 
>| - is it acceptable that the CPU changes the address you delivered without any
>|   warning and does something you wouldn't expect
>  That's up to you to decide. If I were writing portable code to do this
>(and I have) I would use a simple output routine for machines which
>force alignment.
>
>| - how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
Those RISCs all trap on unaligned accesses, as does HP PA.
MIPS has 4 instructions for doing unaligned 32-bit load/stores;
HP PA has an unaligned store bytes operation.
With relatively few exceptions, CPUs either:
	a) Trap on unaligned (most RISCs; S/360)
	OR
	b) Complete the access, crossing boundaries as needed.
	Many CISCs; S/370 & later.
Some combinations exist.  For example, 68000s would trap a 16-bit (word)
reference on an odd boundary, but not trap a 32-bit (longword) reference
on a (word, but not longword) boundary, which occasionally caused
performance pain for 68020s, which would allow accesses to longwords on
any boundary, but were of course slower when accessing
them unaligned.  Some 68K C compilers packed structures so that
longs often showed up on non-long boundaries.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
webb@bass.tcspa.ibm.com (Bill Webb) (10/04/89)
>
> Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler
>
> After two nights of hunting for a bug, the big enlightenment came over me;
> with it came the remembrance of the old law 'thou shalt write four-byte
> integers to word boundaries'. 
>...
(I know you were using AIX, but I'm not enough of an AIX/RT user to know
if there is an equivalent document for AIX - one problem with a shelf full
of manuals is finding things!).
...
5.  ALL MEMORY REFERENCES ARE ALIGNED
Word and half-word data are stored most significant byte first and
aligned on natural boundaries.  Off-boundary store references are not
supported.  The low two (or one) address bits are silently ignored,
creating unexpected results.
If lint(1) is run against such programs, it complains about a "possible
alignment problem"
...
> - is it acceptable that the CPU changes the address you delivered without any
>   warning and does something you wouldn't expect
> - how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
> - would you prefer getting an 'alignment violation trap' or something like this
> - does any CPU implement such a trap
>        ------------------------------------------------------------------
>        EDV Ges.m.b.H Vienna              Johann Schweigl    
>        Hofmuehlgasse 3 - 5               USENET: johnny@edvvie.at
>        A-1060 Vienna, Austria      Tel: (0043) (222) 59907 257 (8-19 CET)
Your final points get into the area of "what should happen when
non-portable code is used". Other similar cases are "what is the value of
* (char *) 0?" and "what is the value of * (short *) "ab"?". If one uses
non-portable code, then one is at the mercy of the hardware/software
designers as to what one gets.
I won't argue with the assertion that it is usually desirable to get a trap
rather than silently ignoring the low-order bits. However, it is generally
the case that RISC processors put more demands on the programmer and
compiler since fewer features are implemented in silicon. 
Newer processors generally either implement off-boundary fetches or provide
the traps that you suggest, but have more room on the chips in which to do
so. If the RT was being designed today I'm sure that it would have
implemented an Interrupt on Unaligned Access bit in the ICS register.
Bill Webb (IBM AWD Palo Alto, (415) 855-4457).
...!uunet!ibmsupt!webb
All opinions expressed above are my own, and quite often not those of my
employer.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/04/89)
In article <28697@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
| them unaligned.  Some 68K C compilers packed structures so that
| longs often showed up on non-long boundaries.
  A good point! The Microsoft C compilers allow selection of packing on
1, 2, or 4 byte level, with the default being whatever is best for the
native hardware. Letting the CPU access packed structures is a lot
faster than unpacking by code, although I have to keep the code for
other machines.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
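Where a compiler actually places a long inside a structure can be inspected with standard C, without relying on any compiler-specific packing switch. The sketch below is an illustration only: `struct record` and the two helper functions are made up for this example, and the values they return depend on the compiler, its packing setting and the target, so none are quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* A one-byte member followed by a long: whether padding is inserted
 * between them is exactly the packing decision discussed above. */
struct record {
    char tag;
    long value;
};

/* Byte offset of value from the start of the structure. */
static size_t value_offset(void)
{
    return offsetof(struct record, value);
}

/* Padding bytes the compiler placed between tag and value:
 * 0 for a byte-packed layout, typically more when longs are aligned. */
static size_t padding_before_value(void)
{
    assert(value_offset() >= sizeof(char));
    return value_offset() - sizeof(char);
}
```

Printing both numbers on each target machine is a quick way to see whether code that walks such a structure byte by byte will agree with the compiler's layout.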
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/05/89)
In article <162@eliza.edvvie.at>, johnny@edvvie.at (Johann Schweigl) writes:
> Environment: IBM PC/RT, ROMP RISC CPU, AIX 2.2.1, standard AIX C compiler
...
> After two nights of hunting for a bug, the big enlightenment came over me;
> with it came the remembrance of the old law 'thou shalt write four-byte
> integers to word boundaries'. 
...
> The thing that's very suspect to me is, that the CPU simply aligns the address
> internally and writes the int to the new, aligned address.
>
> I tried the same on my '386 AIX machine, and, whistle and bells, this one
> does not take care of anything. If you write an 4 byte int to any address,
> odd and wherever you want, the CPU does it.
>
> This leads me to the final questions: 
> - is it acceptable that the CPU changes the address you delivered without any
>   warning and does something you wouldn't expect
> - how do other CPUs behave (e.g. 88000, 68000, SPARC, MIPS)
> - would you prefer getting an 'alignment violation trap' or something like this
> - does any CPU implement such a trap
Aha!  You too have fallen prey to this nefarious feature!  One of my
co-workers and I spent *hours* looking for an obscure bug in some code
that was running on an Intergraph Clipper workstation.  That processor
can only write double precision values to 8-byte aligned words (did that
make sense?), i.e. addresses ending in 0 or 8 hex.  If you try to write
one elsewhere it thoughtfully strips the lower address bits and stores it
at the nearest lower 8-byte aligned address.  This restriction may be
limited to values stored on the stack, I'm not sure.
In any case, the CPU gives no error trap, and it is up to the programmer
to figure it out.  While this feature is documented, it can be annoying
if one is doing sorcery.  In most cases the compiler takes care of
everything for you, but it can be fooled.
Yes, I wish a trap was generated.
-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
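Where the hardware gives no trap, a debug-time check in software can stand in for one. The following is a rough sketch, not a substitute for a real alignment trap: `is_aligned` and `checked_store_double` are invented names, the 8-byte rule is the one reported above for the Clipper, and the check disappears when the program is compiled with `NDEBUG` defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of alignment, which must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Store a double, aborting in debug builds when the destination is not
 * 8-byte aligned instead of letting the store land somewhere else.
 * Assumption: dest points at storage meant to hold a double. */
static void checked_store_double(void *dest, double value)
{
    assert(is_aligned(dest, 8));
    *(double *)dest = value;
}
```

Routing the few stores that go through computed addresses via such a helper is cheap, and it turns a silent overwrite into an immediate abort at the offending line.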
johnny@edvvie.at (Johann Schweigl) (10/06/89)
From article <2396@ibmpa.UUCP>, by webb@bass.tcspa.ibm.com (Bill Webb):
> (I know you were using AIX, but I'm not enough of an AIX/RT user to know
> if there is an equivalent document for AIX - one problem with a shelf full
> of manuals is finding things!).
You're right. The Assembler Language Reference for st R1,D2(R2) says
"The effective address formed from D2 + 0/(R2) will have its low-order two
bits forced to zero."
>
> If lint(1) is run against such programs, it complains about a "possible
> alignment problem"
Maybe lint catches inconsistent pointer usage, I'll try. In case of the
example I posted to the net it just says:
align.c
==============
(30)  warning: main() returns random value to invocation environment
==============
function argument ( number ) used inconsistently
    putchar( arg 1 )   	llib-lc(525) :: align.c(44)
    putchar( arg 1 )   	llib-lc(525) :: align.c(48)
    isprint( arg 1 )   	llib-lc(256) :: align.c(48)
    putchar( arg 1 )   	llib-lc(525) :: align.c(52)
    putchar( arg 1 )   	llib-lc(525) :: align.c(56)
function returns value which is always ignored
    printf	putchar
Remember that each member of the union _ptr is used according to the
rules for its type.
> Your final points get into the area of "what should happen when
> non-portable code is used". Other similar cases are "what is the value of
> * (char *) 0?" and "what is the value of * (short *) "ab"?". If one uses
> non-portable code, then one is at the mercy of the hardware/software
> designers as to what one gets.
>
> I won't argue with the assertion that it is usually desirable to get a trap
> rather than silently ignoring the low-order bits. ...
Portability has more faces than are generally talked about. One more for
me, that I didn't think of earlier.
It's ok that the CPU is designed for maximum performance, I just think that
a trap on illegally aligned accesses would preserve performance AND make
porting a bit easier.
Bye, johnny
-- 
This does not reflect the   | Johann Schweigl  | DOS machines?
opinions of my employer.    | johnny@edvvie.at | I don't hate DOS machines.
I am busy enough by talking |                  | I just feel better when I
about my own ...            | EDVG Vienna      | don't see one ...
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/12/89)
In article <2396@ibmpa.UUCP>, webb@bass.tcspa.ibm.com (Bill Webb) writes:
> Your final points get into the area of "what should happen when
> non-portable code is used". Other similar cases are "what is the value of
> * (char *) 0?" and "what is the value of * (short *) "ab"?". If one uses
> non-portable code, then one is at the mercy of the hardware/software
> designers as to what one gets.
The point is arguable (why else would we be discussing it :^), but I
disagree.  This seems to me to be no less of an unusual event than a
divide by zero and no more code dependent, i.e. it is possible to generate
faulty code that at execution time (when it can't be detected by the
compiler) causes an arithmetic exception.  Similarly addresses can be
calculated incorrectly.  Certainly perfect code would never need either
kind of hardware support, but in the real world . . .
In any case, it would seem to me that the number of extra gates to
implement this feature would be very small, even for a RISC chip.  There
are things that should be left out of a RISC chip; things that the
compiler can do.  Features that should be included are things that the
compiler has little chance of doing.
-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb