[comp.arch] RISC Machine Data Structure Word Alignment Problems?

toppin@melpar.UUCP (Doug Toppin X2075) (01/20/90)

We are using the SUN 4/260 which is a RISC architecture machine.
We are having trouble with data alignment in our data structures.
We have to communicate with external devices that require data structures
such as the following:
	struct
	{
		long  a;
		short b;
		long  c;
	};
When we compile and link something referencing this structure, the
data produced appears to have had each element aligned on a word
boundary, so that the result looks like this:
	struct
	{
		long  a;
		short b;
		short pad; /* this was inserted by cc to align the next member */
		long  c;
	};
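The padding can be observed directly with the standard `offsetof` macro. This is a minimal sketch, not the poster's code. The struct tag `dev_rec` is hypothetical, and the C99 `int32_t`/`int16_t` types stand in for the Sun-4's 4-byte `long` and 2-byte `short`, because `long` is 8 bytes on many 64-bit systems. On an ABI with natural alignment, `b` should sit at offset 4 and `c` at offset 8, with 2 pad bytes between them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical device record: 4-byte, 2-byte, 4-byte integers */
struct dev_rec {
    int32_t a;
    int16_t b;
    int32_t c;
};

/* number of bytes the compiler inserted between b and c */
size_t pad_before_c(void)
{
    return offsetof(struct dev_rec, c)
         - (offsetof(struct dev_rec, b) + sizeof(int16_t));
}

/* print the layout the compiler actually chose */
void show_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, sizeof %zu\n",
           offsetof(struct dev_rec, a), offsetof(struct dev_rec, b),
           offsetof(struct dev_rec, c), sizeof(struct dev_rec));
}
```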
This means that we lose the benefit of data abstraction and have
to create our own output without using structures.
We have not been able to find any Sun-4 cc option that eliminates
this problem. We cannot use the 'compile as Sun-3' option.
Please let us know if you know of a built-in way around this.
thanks
Doug Toppin
uunet!melpar!toppin

johnl@esegue.segue.boston.ma.us (John R. Levine) (01/22/90)

In article <111@melpar.UUCP> toppin@melpar.UUCP (Doug Toppin   X2075) writes:
>We are using the SUN 4/260 which is a RISC architecture machine.
>We are having trouble with data alignment in our data structures.
>We have to communicate with external devices that require data structures
>such as the following:
>	struct
>	{
>		long  a;
>		short b;
>		long  c;
>	};

I guess all the world's not a Vax any more, now it's a 68020.  It would be
more correct to say that your external device requires a four-byte integer, a
two-byte integer, and a four-byte integer, all sent highest byte first.  C
makes no promise that the layout of structures will be the same from machine
to machine.  For instance, if you ran this code on a 386, there doesn't need
to be any padding (though many compilers add it to make the code run faster)
but the words are all in the opposite byte order.

The SPARC and every other RISC chip requires that items be aligned on their
natural boundaries, because there is considerable performance to be gained by
doing so, and because it is not very hard to write programs that are totally
insensitive to padding and byte order.  Many people have observed this.  In
an article on the IBM 370 series in the CACM about 10 years ago one of the
370's architects noted that the 370 permits misaligned data while its
predecessor the 360 didn't, and it was a mistake to have done so because it's
rarely used and adds considerable complication to every 370 machine.

In the particular case of the SPARC, there is a C compiler option (documented
in the FM) to allow misaligned data at the enormous cost of several
instructions and sometimes a subroutine call for every load and store.  I
presume you are passing byte streams back and forth to your device; a
memory-mapped interface that requires misaligned operands is too awful to
contemplate.  You need to write something like this:

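#include <stdio.h>

struct foo { long a; short b; long c; };
static FILE *f;		/* stream from the device, opened elsewhere */
long read_long(void);
short read_short(void);

void read_foo_structure(struct foo *p)
{
	p->a = read_long();
	p->b = read_short();
	p->c = read_long();
}

long read_long(void)
{
	unsigned long v;	/* unsigned, so the << 24 is well defined */

	/* read in big endian order */
	v = (unsigned long)getc(f) << 24;	/* should do some error checking */
	v |= (unsigned long)getc(f) << 16;
	v |= (unsigned long)getc(f) << 8;
	v |= (unsigned long)getc(f);
	return (long)v;		/* not sign-extended if long is wider than 32 bits */
}

short read_short(void)
{
	unsigned int v;

	v = (unsigned int)getc(f) << 8;
	v |= (unsigned int)getc(f);
	return (short)v;
}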

This may seem like more work, but in my experience you write a few of these
things and use them all over the place.  Then your code is really portable.
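The same approach works in the other direction, and on memory buffers as well as streams. This is a minimal sketch, assuming C99 fixed-width types; `put_be32`, `get_be32`, `put_be16`, `get_be16` and `encode_rec` are hypothetical names. Every access is a single byte, so there is no alignment requirement, and the 10-byte result is the same whatever the host's byte order.

```c
#include <stdint.h>

/* store v as 4 bytes, most significant first, at any byte address */
void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
}

uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* encode the device's long/short/long record as exactly 10 bytes */
void encode_rec(unsigned char out[10], uint32_t a, uint16_t b, uint32_t c)
{
    put_be32(out, a);
    put_be16(out + 4, b);
    put_be32(out + 6, c);
}
```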
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (01/22/90)

johnl@esegue.segue.boston.ma.us (John R. Levine) writes:

| long read_long(void)
| {
| 	long v;
| 
| 	/* read in big endian order */
| 	v = getc(f) << 24;	/* should do some error checking */
| 	v |= getc(f) << 16;
| 	v |= getc(f) << 8;
| 	v |= getc(f);
| 	return v;
| }
| 
| This may seem like more work, but in my experience you write a few of these
| things and use them all over the place.  Then your code is really portable.

  I agree with your thought, although for portable transfer I usually send
LSB first (not out of any preference), just because it simplifies the loop.
Since I work with 36- and 64-bit machines, I always add a sign extension
on the read.
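One way to write such a loop, as a minimal sketch: bytes arrive least significant first, so byte `i` simply has weight 256^i, and the top bit of the last byte is copied into all higher bits so that a negative value stays negative on a wider machine. `read_lsb_signed` is a hypothetical name; it reads from a byte buffer rather than a stream and assumes `nbytes` is between 1 and 8 on a two's complement host.

```c
#include <stdint.h>

/* read nbytes (1..8) least significant byte first, then sign-extend */
int64_t read_lsb_signed(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < nbytes; i++)        /* byte i has weight 256^i */
        v |= (uint64_t)p[i] << (8 * i);

    if (nbytes < 8 && ((v >> (8 * nbytes - 1)) & 1))  /* sign bit set? */
        v |= ~(uint64_t)0 << (8 * nbytes);            /* extend it */

    return (int64_t)v;
}
```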

  At one time I was operating a PC (original IBM) with a unique
coprocessor, a Cray2, on an Ethernet link. The C2 calculated data and passed
it in 32 bit RLE format to a BASIC program which used calls to write the
display. Amazing what you can do to get a demo up FAST.
-- 
	bill davidsen - sysop *IX BBS and Public Access UNIX
davidsen@sixhub.uucp		...!uunet!crdgw1!sixhub!davidsen

"Getting old is bad, but it beats the hell out of the alternative" -anon

peter@ficc.uu.net (Peter da Silva) (01/22/90)

> I guess all the world's not a Vax any more, now it's a 68020.

Worse, since non-word-aligned values do cost extra cycles to access, any
68020 C compiler that didn't pad that structure is broken. Some "features"
of CISC processors are just too expensive to use.
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

slackey@bbn.com (Stan Lackey) (01/23/90)

In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>> I guess all the world's not a Vax any more, now it's a 68020.
>Worse, since non-word-aligned values do cost extra cycles to access, any
>68020 C compiler that didn't pad that structure is broken. Some "features"
>of CISC processors are just too expensive to use.

Just a quick summary of the last time we went around on this issue:

There are a number of interesting applications that build many
instances of small data structures, each containing varied data types.
It was said that logic simulators do this.  In a machine that forces
you to always have data aligned, this can result in lots of wasted
memory.  Not because the programmer is stupid, but because of the
nature of the application.

Now, if I have a 4MB workstation, and alignment restrictions increase
the need from under 4MB to over 4MB, there will be significant paging.
I'd rather sometimes spend two cycles to access a word than have to
page over the Ethernet.  So would the people with whom I share the
network.

------

Also: the comments on the 360 (aligned) vs 370 (unaligned):

Boy did I hear a different story.  The version I heard was that the
370 supported unaligned data, because the experience with the 360
showed it was incredibly painful to be without it.  Remember in those
days memory was VERY expensive.

:-) Stan

cik@l.cc.purdue.edu (Herman Rubin) (01/23/90)

In article <LJ81OX3ggpc2@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> > I guess all the world's not a Vax any more, now it's a 68020.
> 
> Worse, since non-word-aligned values do cost extra cycles to access, any
> 68020 C compiler that didn't pad that structure is broken. Some "features"
> of CISC processors are just too expensive to use.

Having seen the statement about penalties for unaligned access, I tried the
following code (hand-coded in assembler to eliminate unnecessary overhead):

.....
while (k < end) *k++ = *i++ ^ *j++;

and the j pointer was deliberately unaligned.  Now this was on a VAX, and it is
possible that other machines may give different results, but the time penalty,
while there, was not excessive.
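For comparison, here is a portable C rendering of the same loop, offered only as a sketch. The misaligned operand is fetched with `memcpy`, which is defined for any byte address and which many compilers turn into a plain load where the hardware permits misaligned access. The function name `xor_words`, the 32-bit word size and passing `j` as a byte pointer are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* k[x] = i[x] ^ (word x of j) for n words; j may be on any byte boundary */
void xor_words(uint32_t *k, const uint32_t *i,
               const unsigned char *j, size_t n)
{
    size_t x;

    for (x = 0; x < n; x++) {
        uint32_t w;
        memcpy(&w, j + 4 * x, sizeof w);   /* safe misaligned fetch */
        k[x] = i[x] ^ w;
    }
}
```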
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

weaver@weitek.WEITEK.COM (01/24/90)

In article <51245@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>Just a quick summary of the last time we went around on this issue:
>
>There are a number of interesting applications that build many
>instances of small data structures, each containing varied data types.
>It was said that logic simulators do this.  In a machine that forces
>you to always have data aligned, this can result in lots of wasted
>memory.  Not because the programmer is stupid, but because of the
>nature of the application.
>

I want to point out here that this data alignment problem can be 
mostly worked around for application programs. 

On a machine with "natural" alignment, a structure (record, common) 
made of primitive data items (integers, pointers, floats, etc.) 
needs no padding if the elements are ordered such that smaller items 
always follow larger items. The size ordering of primitive data
items is machine dependent, but similar from one machine to the next. 
If the entire record is not a multiple of the largest required alignment,
then some space may be lost between structures, or in nested 
structures. This cannot be handled so easily.
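Weaver's ordering rule can be checked with `sizeof`. This is a minimal sketch with hypothetical struct names. On common ABIs the declared-order version picks up padding after `tag`, `id` and `flag`, while the size-sorted version needs exactly the 16 bytes its members occupy. The exact sizes are ABI-dependent, which is the machine dependence described above.

```c
#include <stdint.h>

/* members in a "meaningful" order: typically padded after tag, id, flag */
struct decl_order {
    char    tag;
    double  weight;
    int16_t id;
    int32_t count;
    char    flag;
};

/* same members, largest first: no internal padding needed */
struct size_order {
    double  weight;
    int32_t count;
    int16_t id;
    char    tag;
    char    flag;
};

/* bytes of real data in either struct */
enum { MEMBER_BYTES = sizeof(double) + sizeof(int32_t) + sizeof(int16_t) + 2 };
```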

In summary, if you are writing an application from scratch, you
can minimize this effect in an almost (but not quite!) machine
independent way. So for new programs, I think natural alignment
is a good time/speed tradeoff. I also think that supporting 
unaligned data by both traps and special in-line code is a good
idea, since so many programs have long histories. 

Michael.

hascall@cs.iastate.edu (John Hascall) (01/24/90)

In article <21361> weaver@weitek.UUCP (Michael Gordon Weaver) writes:
}In article <51245> slackey@BBN.COM (Stan Lackey) writes:
}>There are a number of interesting applications that build many
}>instances of small data structures, each containing varied data types.
}>It was said that logic simulators do this.  In a machine that forces
}>you to always have data aligned, this can result in lots of wasted
}>memory.  Not because the programmer is stupid, but because of the
}>nature of the application.
 
}I want to point out here that this data alignment problem can be 
}mostly worked around for application programs. 
 
} [sort elements of structures by decreasing size...]


  It seems to me that now we have a conflict between "software engineering"
  and architecture.

  It surely seems to me that, from a programming point of view, you would
  want your structures in some meaningful order as an aid to program
  understanding.  Shouldn't elements that are used together be located
  together?

  And doesn't everyone pretty much expect certain elements at the top
  of structures, for example:

    struct FOO {                   struct BAR {
        struct FOO  *next;             struct BAR  *left;
        struct FOO  *prev;             struct BAR  *right;
          :                              :
    };                             };

  And on machines with "displacement mode" addressing (e.g., 32(R4) addresses
  the element 32 bytes into the structure at the address in register four)
  there is often a bonus (e.g., speed or code size) for elements within some
  distance (e.g., 127 bytes) from the start of the structure.  So if you put
  the big elements first, you minimize the number of "close" elements.

John Hascall  /  ISU Comp Ctr

gary@dgcad.SV.DG.COM (Gary Bridgewater) (01/24/90)

In article <21361@weitek.WEITEK.COM> weaver@weitek.UUCP (Michael Gordon Weaver) writes:
>In article <51245@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>Just a quick summary of the last time we went around on this issue:
>>
>>There are a number of interesting applications that build many
>>instances of small data structures, each containing varied data types.
>>It was said that logic simulators do this.  In a machine that forces
>>you to always have data aligned, this can result in lots of wasted
>>memory.  Not because the programmer is stupid, but because of the
>>nature of the application.
>>
>
>I want to point out here that this data alignment problem can be 
>mostly worked around for application programs. 

I think you missed the phrase "Not because the programmer is stupid..."

>On a machine with "natural" alignment, a structure (record, common) 
>made of primitive data items (integers, pointers, floats, etc.) 
>needs no padding if the elements are ordered such that smaller items 
>always follow larger items. The size ordering of primitive data
>items is machine dependent, but similar from one machine to the next. 
>If the entire record is not a multiple of the largest required alignment,
>then some space may be lost between structures, or in nested 
>structures. This cannot be handled so easily.

I need to allocate an array of 50,000,000 8 bit integers. How do I do this?
Which is more important: 1) overall memory use, 2) misalignment penalty, or
3) code readability? 
Then I need to allocate 1,000,000 structs containing other structs written
by another programmer. What is the natural order of the data a priori on any
machine? How big is an addr_t on a 386? Sparc? Cray? Is it bigger than a
long float?
I plan to pass these structures from a Sun 4 to a Vax to a Cray via an
ethernet connection. Now what is the natural order?

>In summary, if you are writing an application from scratch, you
>can minimize this effect in an almost (but not quite!) machine
>independent way. So for new programs, I think natural alignment
>is a good time/speed tradeoff. I also think that supporting 
>unaligned data by both traps and special in-line code is a good
>idea, since so many programs have long histories. 

I suggest that when RE-writing a program from scratch you can mitigate this
effect if you have some idea where the code is going to run. This is of
little help to Simulator vendors who have to run across different architectures.
When you write a program you have no idea if it will be successful enough to be
bothered by data alignment inefficiencies. You are usually more worried about
getting it up quickly and in the same execution universe as the specs.

In general, you are stuck and at best will have to go back and micro-tune the
heck out of it on a case-by-case basis. In your spare time, study malloc
algorithms so you can figure out how to allocate bit structures for fun and
profit.

I agree that it is easier if the hardware lets you misalign but that thinking is
passe in the brave new world of RISC where using the computer is a compiler
problem.
-- 
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@proa.sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
Networking is the worst form of data exchange except for all the others
(apologies to WC).

larus@primost.cs.wisc.edu (James Larus) (01/25/90)

In article <21361@weitek.WEITEK.COM>, weaver@weitek.WEITEK.COM writes:
> In summary, if you are writing an application from scratch, you
> can minimize this effect in an almost (but not quite!) machine
> independent way. So for new programs, I think natural alignment
> is a good time/speed tradeoff. I also think that supporting 
> unaligned data by both traps and special in-line code is a good
> idea, since so many programs have long histories. 

This statement may be true in general, but it is not always true.  For example,
I wrote a program tracing system that writes out a trace file consisting of a
mixture of bytes, halfwords, and full words.  It is crucial to this system
that the byte quantities take up only 8 bits (otherwise the already large
files grow by a factor of 2 or more).  However, it means that I need
to do unaligned stores into the trace buffer.  And, since I trace programs in
real time, I need to do the stores fast.

The MIPS R2000 has a 2 instruction sequence that can store a half/fullword
quantity on any byte boundary.  On SPARC, it takes 7 instructions to store
fullwords byte-by-byte.  Coming from Berkeley, I hate to say it, but this
is another case in which MIPS has a much better designed machine than Sun (-:
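Here is a sketch of a byte-packed trace buffer of the kind described; it is not the author's code. `trace_buf`, `emit8`, `emit16` and `emit32` are hypothetical names, the 4096-byte size is arbitrary, and fields are stored in host byte order via `memcpy`. Once an odd-sized field has been written, later 2- and 4-byte stores land on arbitrary byte boundaries, and the compiler must emit whatever instruction sequence the target needs for them.

```c
#include <stdint.h>
#include <string.h>

#define TRACE_SIZE 4096

/* hypothetical trace buffer: fields are packed with no padding */
struct trace_buf {
    unsigned char data[TRACE_SIZE];
    size_t        len;
};

static int room(const struct trace_buf *t, size_t n)
{
    return t->len + n <= TRACE_SIZE;
}

int emit8(struct trace_buf *t, uint8_t v)
{
    if (!room(t, 1)) return -1;
    t->data[t->len++] = v;
    return 0;
}

int emit16(struct trace_buf *t, uint16_t v)
{
    if (!room(t, 2)) return -1;
    memcpy(t->data + t->len, &v, 2);    /* may be a misaligned store */
    t->len += 2;
    return 0;
}

int emit32(struct trace_buf *t, uint32_t v)
{
    if (!room(t, 4)) return -1;
    memcpy(t->data + t->len, &v, 4);    /* may be a misaligned store */
    t->len += 4;
    return 0;
}
```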

/Jim

mph@lion.inmos.co.uk (Mike Harrison) (02/01/90)

In article <1810@sunquest.UUCP> terry@sunquest.UUCP (Terry Friedrichsen) writes:
>The abstruct/memstruct proposal for aligning/not aligning C structure
>members leads me to post what I thought was the obvious idea all
>along :-):
>
>Borrow Pascal's idea of "records" and "packed records".  So in C,

Or even borrow Ada's ideas, which separate the abstraction and the 
representation.

Given a set of declarations such as:

    type INTEGER32 is range -2147483648 .. 2147483647;

    type INTEGER16 is range -32768 .. 32767;

    type STRUCT is 
      record
        F1 : INTEGER32;
        F2 : INTEGER16;
        F3 : INTEGER32;
      end record;

objects of type STRUCT will be mapped in any way that the compiler wishes,
with fields (potentially) re-ordered, padding added etc., for efficiency of
access.

If I need a specific mapping I can achieve it by providing a Representation
Clause, eg. for maximum packing:

    WORD : constant := 4; -- assumes storage unit is byte, 4 bytes per word.

    for INTEGER32'SIZE use 32;

    for INTEGER16'SIZE use 16;

    for STRUCT use
      record at mod 2;  -- half-word (2-byte) alignment
        F1 at 0 * WORD range 0 .. 31;
        F2 at 1 * WORD range 0 .. 15;
        F3 at 1 * WORD range 16 .. 47;
      end record;

In this case objects of type STRUCT will be laid out exactly as shown and will
occupy exactly 80 bits, and will be aligned on a half-word boundary.
The compiler will generate appropriate code to access complete objects or 
individual fields; for a RISC of the kind being discussed this will obviously
be (?much) less efficient, but if the application needs it, who cares?

If I wish to declare an array of these objects, such as:

    type STRUCT_ARRAY is array (NATURAL range <>) of STRUCT;

I can inform the compiler that I want maximum packing by writing:

    pragma PACK (STRUCT_ARRAY);
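C has no representation clauses, but GCC and Clang offer a non-standard `__attribute__((packed))` that gives the same 10-byte layout. This is a sketch assuming one of those compilers; `packed_rec` and `get_f3` are hypothetical names. Unlike the Ada clause, it fixes only the byte offsets, not the byte order, and on strict-alignment targets the compiler generates slower misaligned-access code for `f3`.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: no padding between members */
struct __attribute__((packed)) packed_rec {
    int32_t f1;
    int16_t f2;
    int32_t f3;   /* starts at byte 6, so it is misaligned */
};

/* read f3 through the struct; the compiler emits whatever
   instruction sequence the target needs for a misaligned load */
int32_t get_f3(const struct packed_rec *r)
{
    return r->f3;
}
```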

Mike,

Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk             with STANDARD_DISCLAIMERS;
US : mph@inmos.com               use  STANDARD_DISCLAIMERS;

aglew@dwarfs.csg.uiuc.edu (Andy Glew) (02/02/90)

Terminology check:

I have been informed that DEC VAX designers call the overlap case,
the case where a data field overlaps two bus widths, non-aligned.
--
Andy Glew, aglew@uiuc.edu

peter@ficc.uu.net (Peter da Silva) (02/02/90)

In article <3428@odin.SGI.COM> pkr@maddog.sgi.com (Phil Ronzone) writes:
> No, I disagree. Most of the time the data (mis)alignments are from real world
> constraint. Compressed video data, even when capacious CD-ROMs are used, are
> full of adjacent 1, 2, 3, and 4 byte integer

And adjacent 4-, 5-, 6-, and 12-bit integers as well. I've heard of bit
addressable memory, but outside of microcontrollers I've never actually
seen it.

> And generally you want to
> read and display that data as fast as is possible. Having a microcoded
> unaligned data capability is faster than user-level instructions doing the
> same thing.

If your loop is in an instruction cache you'll probably find that your
actual memory accesses are coming as fast as the bus can fill them. Cutting
the size of your loop will just add wait states.
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

khb@chiba.kbierman@sun.com (Keith Bierman - SPD Advanced Languages) (02/02/90)

In article <1990Jan31.174500.10553@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

   >1) In the hardware, perhaps taking a performance hit on "misaligned" data.
   >2) In the compilers
   >3) Explicitly in the program.

   any of the above.  (Sun-3 C does #1, Mips C does #2 on request, SPARC C
   does #3.)  It does not *promise* anything but the lowest common denominator,
   however, to wit number 3.

Sun's compilers have a -misalign option.
--
Keith H. Bierman    |*My thoughts are my own. !! kbierman@Eng.Sun.COM
It's Not My Fault   | MTS --Only my work belongs to Sun* kbierman%eng@sun.com
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"

"There is NO defense against the attack of the KILLER MICROS!"
			Eugene Brooks

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/02/90)

In article <4YG1638xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

| And adjacent 4-, 5-, 6-, and 12-bit integers as well. I've heard of bit
| addressable memory, but outside of microcontrollers I've never actually
| seen it.

  I *think* the Intel 432 has bit addressability. I don't have my manual
(yes I kept one) here, and I evaluated it about the time of first
engineering samples.

  It was ahead of its time.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/03/90)

In article <4YG1638xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>And adjacent 4-, 5-, 6-, and 12-bit integers as well. I've heard of bit
>addressable memory, but outside of microcontrollers I've never actually
>seen it.

The CDC STAR and its relatives (Cyber 205, ETA 10) all have/had bit
addressable memory.  I think it is a good idea.

On the subject of this discussion, those machines *still* required alignment on
natural boundaries.  Bits on bits (easy :-) bytes on bytes, 32 bit on
32 bit, 64 bit on 64 bit, etc.  I note that the machine had 48 bit addresses,
breaking the 32 bit addressing boundary almost 20 years ago.  Of course, the
machine supported 64 bit registers (both 32 and 64 actually).

On the other subject of this discussion, the System Programming Language
for those machines had a special construct to be used when you needed
to create packed structures.  Other languages do too.  It is not correct to
assume that you can create an arbitrary structure and expect that the
compiler will map it in a certain way in memory.  You need a special construct
to do that.  Thank goodness (almost) nobody builds 36 bit, 48 bit, and 60 bit
machines anymore- you might even be able to do it in a portable way.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117

carlw@mercury.sybase.com (carl weidling) (02/03/90)

	The question is whether C's requirement to build structures
with the components in the order in which they were declared is a
mistake.
In article <1990Jan29.173412.2859@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
	< stuff deleted >
>The basic problem here is that the compiler cannot read minds, and the
>language does not provide a way to tell the compiler which of two
>interpretations is wanted.  The two possibilities are "I want precise
>control of what goes into memory" and "I want these members but please
>pad as necessary to make accesses fast".  Unfortunately, you can't just
>say "well, if I want padding I'll put it in myself", because many people
>want to write portable programs, and the padding requirements are *very*
>machine-specific.  Precise control of memory layout is not necessary for
	< rest of article deleted>
	Reading this I got an idea which is a slight variation on the idea
of a pragma or directive in the language.
	Why not have a PRE-processor directive that will re-arrange the
fields in a structure to maximize efficiency one way or the other? The
C-language itself is untouched, the programmer can run the pre-processor
by itself on the code to see what was done.  Perhaps lint could be made
smart enough to tell if someone was playing too many games with one of
these re-arranged structures. Something like
struct { int alpha;
#ARRANGE_ANY_WAY_YOU_WANT /* maybe specify criteria? i.e. speed vs compact */
	 long beta;
	 char gamma[3];
#END_ARRANGE
	};
-Carl Weidling

pkr@maddog.sgi.com (Phil Ronzone) (02/03/90)

In article <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>>Having a microcoded unaligned data capability is faster than
>>user-level instructions doing the same thing.
>
>Why?
>
>Microcoded unaligned data takes two cycles to load an unaligned datum.
>(Assuming the unaligned datum overlaps two data bus widths.)  MIPSco
>style load-left and load-right take two cycles to load the same
>unaligned datum.


I was thinking of bus-wide words (i.e., typically 32-bits).
You have at least:
     BUS FETCH / SHIFT ALIGN / BUS FETCH / SHIFT ALIGN / OR / STORE
Implementing these at typical user level adds even more -- tests to
figure out how much to shift etc.
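Phil's sequence can be written out in C as a sketch. The assumptions are explicit: memory is modelled as an array of aligned 32-bit words in big-endian byte order, and `load_unaligned_be` is a hypothetical name. Two aligned fetches are combined with two shifts and an OR. The extra test for the aligned case is the kind of user-level overhead Phil mentions, and it is needed here because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* load the big-endian 32-bit value starting at byte address addr,
   using only aligned word fetches from mem[] */
uint32_t load_unaligned_be(const uint32_t *mem, uint32_t addr)
{
    uint32_t word  = addr >> 2;          /* index of first word  */
    uint32_t shift = (addr & 3) * 8;     /* bits to shift left   */
    uint32_t hi    = mem[word];          /* first bus fetch      */

    if (shift == 0)                      /* already aligned      */
        return hi;

    uint32_t lo = mem[word + 1];         /* second bus fetch     */
    return (hi << shift) | (lo >> (32 - shift));   /* align and OR */
}
```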


------Me and my dyslexic keyboard----------------------------------------------
Phil Ronzone   Manager Secure UNIX           pkr@sgi.COM   {decwrl,sun}!sgi!pkr
Silicon Graphics, Inc.               "I never vote, it only encourages 'em ..."
-----In honor of Minas, no spell checker was run on this posting---------------

rpeglar@csinc.UUCP (Rob Peglar) (02/03/90)

In article <4YG1638xds13@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> In article <3428@odin.SGI.COM> pkr@maddog.sgi.com (Phil Ronzone) writes:
> > No, I disagree. Most of the time the data (mis)alignments are from real world
> > constraint. Compressed video data, even when capacious CD-ROMs are used, are
> > full of adjacent 1, 2, 3, and 4 byte integer
> 
> And adjacent 4-, 5-, 6-, and 12-bit integers as well. I've heard of bit
> addressable memory, but outside of microcontrollers I've never actually
> seen it.
> 

Actually, since the days of the CDC Star-100, that particular line of
supercomputers (Star-Cy 203-Cy 205-ETA10) supported bit-addressable memory.
This was important for things like vector bit string operations on arbitrarily
aligned operands.  Such things (bit vector C <- bit vector A && bit vector
B) were in microcode.

Just thought you'd like to know.

Rob
-- 
Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar		612-631-7800

The posting above does not necessarily represent the policies of my employer.

tihor@acf4.NYU.EDU (Stephen Tihor) (02/03/90)

Now add repr clauses to the RECORD a la Ada, in a terse C syntax of course,
to maintain consistency and maximize errors:

	record
		long a :13,35;
		...

where a is a long integer stored in bits 13 through 35 of the record structure.

shap@delrey.sgi.com (Jonathan Shapiro) (02/04/90)

In article <8314@sybase.sybase.com> carlw@mercury.UUCP (carl weidling) writes:
>	Why not have a PRE-processor directive that will re-arrange the
>fields in a structure to maximize efficiency one way or the other?

Yuck.  If this problem is worth solving, it is worth solving right.

Jon

aglew@dwarfs.csg.uiuc.edu (Andy Glew) (02/05/90)

>>>Having a microcoded unaligned data capability is faster than
>>>user-level instructions doing the same thing.
>>
>>Microcoded unaligned data takes two cycles to load an unaligned datum.
>>(Assuming the unaligned datum overlaps two data bus widths.)  MIPSco
>>style load-left and load-right take two cycles to load the same
>>unaligned datum.
>
>I was thinking of bus-wide words (i.e., typically 32-bits).
>You have at least:
>     BUS FETCH / SHIFT ALIGN / BUS FETCH / SHIFT ALIGN / OR / STORE
>Implementing these at typical user level adds even more -- tests to
>figure out how much to shift etc.

LWL/LWR seem to work this way (and the MIPSco folk will correct me, I'm sure):
    BUS FETCH / SHIFT ALIGN / STORE selected bytes - LWL
    BUS FETCH / SHIFT ALIGN / STORE selected bytes - LWR
Two instructions. Cycles depend on memory.

--
Andy Glew, aglew@uiuc.edu

henry@utzoo.uucp (Henry Spencer) (02/06/90)

In article <12780024@acf4.NYU.EDU> tihor@acf4.NYU.EDU (Stephen Tihor) writes:
>		long a :13,35;
>		...
>where a is a long integer stored in bits 13 through 35...

Bit 13, you say?  Is that the 13th (or 14th) bit from the left, or the
13th (or 14th) bit from the right?  And if it's from the right, how big
is the unit of bits, i.e. how far is bit 0 from the leftmost bit?
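Any such notation has to answer Henry's question. Here is a minimal sketch of one answer: treat the record as a string of bytes, number bits from 0 at the most significant bit of byte 0, and extract bits `first` through `last` inclusive. The helper `get_bits` and that numbering convention are assumptions, and the field is limited to 32 bits.

```c
#include <stdint.h>

/* return bits first..last (inclusive, last - first < 32) of buf,
   where bit 0 is the most significant bit of buf[0] */
uint32_t get_bits(const unsigned char *buf, unsigned first, unsigned last)
{
    uint32_t v = 0;
    unsigned n;

    for (n = first; n <= last; n++) {
        unsigned bit = (buf[n / 8] >> (7 - n % 8)) & 1;
        v = (v << 1) | bit;
    }
    return v;
}
```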
-- 
SVR4:  every feature you ever |     Henry Spencer at U of Toronto Zoology
wanted, and plenty you didn't.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

carr@gandalf.UUCP (Dave Carr) (02/06/90)

In article <11666@thorin.cs.unc.edu>, tuck@jason.cs.unc.edu (Russ Tuck) writes:
> 
> If the compiler did what you suggest and did not align struct members,
> it would in most cases be impossible to access the data member "c" above 
> without causing the program to dump core.  This would not be a useful 
> compiler "feature" :-).  SPARC (and most other RISC archs) requires all 
> ordinary memory accesses to be aligned. 

That's *most* RISC architectures.  At least with the 80960 (I know, not a true
RISC), I have the freedom to access non-word-aligned data.  I would rather
have the choice than let the RISC architecture force me.

Data explosion on RISC computers is pretty bad.  We should have the choice
of slowing the CPU down only for those accesses which are not word
aligned, and we could pad the structures to speed it back up.
-- 
Dave Carr                |  carr@e.gandalf.ca   | If you don't know where  
Gandalf Data Limited	 |  TEL (613) 723-6500  | you are going, you will
Nepean, Ontario, Canada  |  FAX (613) 226-1717  | never get there.

ccc_ldo@waikato.ac.nz (02/07/90)

In <LJ81OX3ggpc2@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) deplores
the fact that VAX and 680n0 (n > 1) processors allow word and longword accesses
on arbitrary byte boundaries, saying that "since non-word-aligned values do cost
extra cycles to access, any 68020 C compiler that didn't pad that structure is
broken."

GAK! I can't believe I'm reading this!

He goes on to say: "Some 'features' of CISC processors are just too expensive
to use."

So what's the alternative? Have you tried counting how many extra cycles you
spend doing explicit byte-by-byte accesses?

The thing is, as a programmer, I want to be able to make the tradeoff (space
versus time) myself. Sure byte-packed structures will cost more memory cycles
to access, and bit-packed structures even more so. But there are times when
I'm willing to pay the cost--think of an array of a hundred thousand 3-byte
records, or a million boolean elements. I heartily *RESENT* CPU designers
who take it upon themselves to say "Nope--that alternative costs a few too
many memory cycles for us to be comfortable with, so we'll leave out the
hardware support for it, and force you to do it in software should you feel
the urge, which will make it even *MORE* expensive."

Even Motorola aren't completely free of this disease. Look at the 68881/68882
floating-point units, which put an extra 16 padding bits into every extended-
precision quantity, just so they take up an even number of longwords.

I don't like compilers which automatically insert padding bits and bytes
between elements of arrays and structures--which is to say, most of them. My
argument is based on a very simple principle: "correctness comes before
efficiency". Pad fields can cause all kinds of problems, not just when
different compilers follow different rules in inserting them: think of
what happens when you're trying to compare two objects which happen to
differ only in the random garbage in the pad fields.
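The comparison problem is easy to reproduce. This is a sketch with hypothetical names, assuming a typical ABI that puts 3 pad bytes after `tag`. The two objects built below have equal members, but their pad bytes were filled differently, so a whole-object `memcmp` may report them unequal while a member-by-member comparison does not.

```c
#include <stdint.h>
#include <string.h>

struct item {
    char    tag;      /* typically followed by 3 pad bytes */
    int32_t value;
};

/* compares only the real members, never the padding */
int item_equal(const struct item *x, const struct item *y)
{
    return x->tag == y->tag && x->value == y->value;
}

/* fill two items with different "garbage", then set the same members */
void make_pair(struct item *x, struct item *y)
{
    memset(x, 0x00, sizeof *x);
    memset(y, 0xFF, sizeof *y);
    x->tag = y->tag = 'a';
    x->value = y->value = 42;
}
```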

I'd rather have a compiler which allowed me to specify explicit alignment
constraints on element types (with the default being byte alignment for
everything), and which reported errors with element offsets that didn't
match their alignment constraints--that is, I would be forced to put in padding
fields myself (so that I knew where they were and how big they were), rather
than having the compiler do it for me.

In conclusion, I'll say it again. *I'M* the programmer, *I'M* the only one
who knows what the performance requirements of my program are (including its
memory usage), let *ME* make the tradeoff decisions.

Lawrence D'Oliveiro
Computer Services Dept, University of Waikato
Hamilton, New Zealand
ldo@waikato.ac.nz

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (02/08/90)

In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> >Worse, since non-word-aligned values do cost extra cycles to access, any
> >68020 C compiler that didn't pad that structure is broken. 
> 
> This is nonsense.  Which you want depends whether speed or size is more
> important.  A valid criticism would be that too many C compilers don't let
> you specify which kind of optimisation you want.
> 

This discussion, IMHO, is pointless.  The C compilers work just fine the way
they are (or at least the ones I am familiar with).  I don't think some of the
people discussing this realize the implications of what they propose.

I work on an Intergraph Clipper based workstation.  Unless I am mistaken,
floating-point values must be aligned on 8-byte boundaries if the
processor is to access them in a single instruction.  If you
try to access a floating-point value that is not 8-byte aligned, it
actually grabs the value at the next lower 8-byte boundary.  It doesn't
even give a bus error trap!  In theory, the compiler could place it on
arbitrary boundaries by generating a sequence of instructions that would
read the adjacent aligned values and combine them with shifts, ANDs and ORs.
It sounds to me like we are talking about 4 or 5 instructions to do this,
so your access speed would be the pits!
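The instruction sequence described above can be sketched in C. This is a minimal model, not Clipper code. Memory is an array of aligned 32-bit words (rather than 8-byte floating-point values), and each word holds its most significant byte at the lowest address. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fetch a 32-bit value that starts at an arbitrary byte address, using
   only aligned word loads.  'mem' models memory as aligned 32-bit words,
   each with its most significant byte at the lowest address (big-endian).
   If the value straddles two words, mem[word + 1] must exist. */
uint32_t load_unaligned_be32(const uint32_t *mem, size_t byte_addr)
{
    size_t word = byte_addr / 4;                     /* word holding the first byte */
    unsigned shift = (unsigned)(byte_addr % 4) * 8;  /* bits to discard from it */

    if (shift == 0)
        return mem[word];                            /* aligned: a single load */

    /* Two loads, two shifts and an OR splice the halves together. */
    return (uint32_t)(mem[word] << shift) | (mem[word + 1] >> (32 - shift));
}
```

An aligned address costs one load. Anything else costs two loads plus shifts and an OR, which is roughly the instruction count estimated above.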

People seem to want to store values at arbitrary locations because
they need to write out contiguous regions of memory to a binary file.
They then complain that reading that file back into the memory of
another machine doesn't work.  No one ever said it would.  If you want
portable code, don't write it that way.  It is almost always possible
to sacrifice portability for speed.

I don't know why this is so astonishing; you can't exchange binary
values for integers between machines, so what would lead anyone to
believe that structures should be any different?

C is a low level language.  If you want greater data abstraction, move
to a higher level language that guarantees that data will appear to
be in the same format across systems.  That guarantee is not in the
C definition; doing so would probably limit C's ability to blast bits.
The only format that C guarantees to understand is ASCII-represented
numeric values.
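Here is what the text approach looks like in practice, as a minimal sketch assuming C99 `snprintf`. The record and function names are hypothetical. The line of decimal text means the same thing on any machine, whatever its padding, word size, or byte order.

```c
#include <stdio.h>

/* Hypothetical record, like the one that started this thread. */
struct rec {
    long  a;
    short b;
    long  c;
};

/* Write the record as one line of decimal text.  Returns snprintf's
   result: the length the full line needs, or a negative value on error. */
int rec_to_text(const struct rec *r, char *buf, size_t size)
{
    return snprintf(buf, size, "%ld %hd %ld\n", r->a, r->b, r->c);
}

/* Read it back; returns 1 if all three fields were converted, else 0. */
int rec_from_text(const char *buf, struct rec *r)
{
    return sscanf(buf, "%ld %hd %ld", &r->a, &r->b, &r->c) == 3;
}
```

The cost is the conversion work and a larger encoding. The benefit is that neither side needs to know the other's layout.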

The only thread of this discussion that might relate to comp.arch is
why processors (such as Clipper) do not give a trap if you try to
access memory on illegal boundaries.  Surely that would not require
much silicon?
-- 
  Terry Ingoldsby                ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                 or
  The City of Calgary       ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

cik@l.cc.purdue.edu (Herman Rubin) (02/11/90)

In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
> In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> > In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

			......................

> I don't know why this is so astonishing; you can't exchange binary
> values for integers between machines, so what would lead anyone to
> believe that structures should be any different?

I can see no more reason why strings of ASCII characters should be
transferrable by hardware with little software intervention than binary
integers, other fixed place binary numbers, other types of numbers (not
strings of numerals), mathematical symbols beyond the usual ones, etc.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

woody@rpp386.cactus.org (Woodrow Baker) (02/11/90)

In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
> In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:

> This discussion, IMHO, is pointless.  The C compilers work just fine the way
> they are (or at least the ones I am familiar with).  I don't think some of the
> people discussing this realize the implications of what they propose.

Wrong.  It depends on what you do.  I happen to do programming dealing
with industrial controllers.  Specifically, I maintain a compiler, editor,
downloader, and monitor package used to program Eagle Signal Controls
EPTAK series industrial controllers.  The code that I work on runs under
MS-DOS.  I have to do things like reach out over the network, and read
data structures out of the remote controllers.  These structures, for the
most part, are a mix of byte and word fields.  I then have to parse through
them, and isolate the parts.  Structures are the obvious way to do this.
BUT, the @#$% compiler chooses to pad byte or char values out to ints.
This obviously screws up the data structure access to the retrieved
values.  I have wound up doing things that I am not proud of, like unions,
monkeying around with pointers to the structures such that they don't
point to where they should, but to some offset other than the first byte
of the structure, etc.  Yes, I could choose to use an array, but it is clearer
to use standard field names (at least standard for the EPTAK controllers)
to access these data fields.
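Many compilers offer a non-standard way to ask for exactly this. The sketch below uses `#pragma pack`, which GCC, Clang and Microsoft C accept in this push/pop form. It is an extension, not standard C, so check what your own compiler supports. The record layout is hypothetical, not the actual EPTAK format.

```c
#include <stdint.h>

/* Compiler extension: no padding between the members below. */
#pragma pack(push, 1)
struct ctl_record {          /* hypothetical controller record */
    uint8_t  status;         /* byte field */
    uint16_t count;          /* word field, immediately after 'status' */
    uint8_t  mode;           /* byte field */
    uint16_t setpoint;       /* word field */
};
#pragma pack(pop)

/* Packed, the size is just the sum of the fields: 1 + 2 + 1 + 2. */
_Static_assert(sizeof(struct ctl_record) == 6, "record must be packed");
```

Packing fixes the offsets only. The word fields are still read in the host's byte order. On a processor that cannot do misaligned loads, the compiler has to generate byte-at-a-time code for them.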

Cheers
Woody

peter@ficc.uu.net (Peter da Silva) (02/11/90)

Use structs internally.

Provide functions to read and write each structure, that do the needed
conversions. Never touch the external format internally.

For example:

	Analog accumulator:

		| flags  | val.lo   val.hi |
		+--------+--------+--------+
		| BYTE 0 | BYTE 1 | BYTE 2 |

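	struct accumulator {
		char flags;
		int value;
	};

	/* Unpack 3 external bytes: flags, value low byte, value high byte.
	   unsigned char keeps the bytes from being sign-extended. */
	void read_accumulator(addr, info)
	unsigned char *addr;
	struct accumulator *info;
	{
		info->flags = addr[0];
		info->value = addr[2];				/* high byte */
		info->value = (info->value << 8) | addr[1];	/* then low byte */
	}

	/* Pack the internal struct back into the 3-byte external layout. */
	void write_accumulator(addr, info)
	unsigned char *addr;
	struct accumulator *info;
	{
		*addr++ = info->flags;
		*addr++ = info->value & 0xFF;		/* low byte */
		*addr   = (info->value >> 8) & 0xFF;	/* high byte */
	}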
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

ronald@robobar.co.uk (Ronald S H Khoo) (02/12/90)

In article <17906@rpp386.cactus.org> woody@rpp386.cactus.org (Woodrow Baker) writes:
> 
> MS-DOS.  I have to do things like reach out over the network, and read
> data structures out of the remote controllers.  These structures, for the
> most part, are a mix of byte and word fields.  I then have to parse through
> them, and isolate the parts.  Structures are the obvious way to do this.
> BUT, the @#$% compiler chooses to pad byte or char values out to ints.

#ifdef MEDIUM_MADRAS

You don't think this is a hint that it would have been *so* much easier
if everything spoke *text* instead?  Sure, there's the overhead of
binary->text->binary, but the advantages outweigh the cost, especially
if you ever have a mix of controllers with wildly differing internal
architectures.

Oh, you want to discourage that to lock your customers in? Excuse me.

#endif

-- 
Eunet: Ronald.Khoo@robobar.Co.Uk   Phone: +44 1 991 1142    Fax: +44 1 998 8343
Paper: Robobar Ltd. 22 Wadsworth Road, Perivale, Middx., UB6 7JD ENGLAND.
$Header: /usr/ronald/.signature,v 1.2 90/01/26 15:17:15 ronald Exp $ :-)

msb@sq.sq.com (Mark Brader) (02/13/90)

> I can see no more reason why strings of ASCII characters should be
> transferrable by hardware with little software intervention than binary
> integers, other fixed place binary numbers, other types of numbers ...etc.

Because ASCII is, after all, the American Standard Code for Information
Interchange, and those other things aren't.  See signature quote.


Followups to comp.arch.

-- 
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
	A standard is established on sure bases, not capriciously but with
	the surety of something intentional and of a logic controlled by
	analysis and experiment. ... A standard is necessary for order
	in human effort.				-- Le Corbusier

This article is in the public domain.

cik@l.cc.purdue.edu (Herman Rubin) (02/13/90)

In article <S_O1_F6xds13@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> Use structs internally.
> 
> Provide functions to read and write each structure, that do the needed
> conversions. Never touch the external format internally.

			[Example deleted.]

This is another situation where the procedure is extremely slow in software.
If the appropriate hardware were provided, this would not be a problem.  But
would the machine then be RISC?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

peter@ficc.uu.net (Peter da Silva) (02/14/90)

In article <1925@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> This is another situation where the procedure is extremely slow in software.
> If the appropriate hardware were provided, this would not be a problem.  But
> would the machine then be RISC?

Who cares if it's RISC, CISC, VLIW, or a bunch of elves with abaci? If it's
fast enough, fine. If it's not, unroll the loop to the LCD of the struct
size and the data size. If that doesn't do it, recode in assembler. Then get
a faster machine (where faster is defined in terms of the problem you have
to solve: if the problem involves moving weird numbers of bits around all the
byte ops in the world won't help you). Maybe a coprocessor would help (like
having a disk controller to convert NRZ into MFM instead of doing it yourself).

Most of the time this particular operation isn't a bottleneck, so who cares
how fast it is?
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

pasek@ncrcce.StPaul.NCR.COM (Michael A. Pasek) (02/14/90)

In <17906@rpp386.cactus.org> woody@rpp386.cactus.org (Woodrow Baker) writes:
>In <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
>> This discussion, IMHO, is pointless.  The C compilers work just fine the way
>> they are (or at least the ones I am familiar with).  I don't think some of the
>> people discussing this realize the implications of what they propose.
>Wrong.  It depends on what you do.  [specifics deleted..]
>  I have to do things like reach out over the network, and read
>data structures out of the remote controllers.  These structures, for the
>most part, are a mix of byte and word fields.  I then have to parse through
>them, and isolate the parts.  Structures are the obvious way to do this.
>BUT, the @#$% compiler chooses to pad byte or char values out to ints.

I also have the same problem.  Having the compiler pad to the "native" data
size is OK if (and ONLY if) you have complete control over that data structure
and do not need to share it with other programs/systems.  However, in data
communications protocols (pick one), the programmer has NO control over the
data structure -- it is predefined, and doesn't come with that nice padding
that the compiler likes to put in.  Some recent RISC compilers (I'm looking
at the 29K) allow you to specify whether structures are "packed" or not, 
which I think is mandatory.  Unfortunately, in the case of the 29K compiler,
although it will "pack" structures, as far as I know it will NOT generate
the appropriate instructions to access those structures if the external
memory subsystem does NOT support non-aligned accesses. Oh, well....

M. A. Pasek               Software Development              NCR Comten, Inc.
(612) 638-7668              MNI Development               2700 N. Snelling Ave.
pasek@c10sd3.StPaul.NCR.COM                               Roseville, MN  55113

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (02/15/90)

In article <17906@rpp386.cactus.org>, woody@rpp386.cactus.org (Woodrow Baker) writes:
> In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
> > In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> > This discussion, IMHO, is pointless.  The C compilers work just fine the way
> > they are (or at least the ones I am familiar with).  I don't think some of the
> > people discussing this realize the implications of what they propose.
> Wrong.  It depends on what you do.  I happen to do programming dealing
> with industrial controllers.  Specifically, I maintain a compiler, editor,
> downloader, and monitor package used to program Eagle Signal Controls
> EPTAK series industrial controllers.  The code that I work on runs under
> MS-DOS.  I have to do things like reach out over the network, and read
> data structures out of the remote controllers.  These structures, for the
> most part, are a mix of byte and word fields.  I then have to parse through
> them, and isolate the parts.  Structures are the obvious way to do this.
> BUT, the @#$% compiler chooses to pad byte or char values out to ints.
If you are passing values across a network to dissimilar machines, you
should be using something like XDR (External Data Representation).  This
makes for portable (although messy) code.  In your case, I would agree that
your compiler might reasonably be considered to be malfunctioning, since
the Intel processors can access arbitrarily aligned data.  The discussion
was originally about RISC processors, which can NOT access arbitrary
alignments for all data types.  In this case padding is necessary.  To
minimize the amount of padding, it is necessary to reorder the structure
elements.  This is in accordance with K&R which (as I recall) explicitly
states that the elements may be re-ordered.
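Here is a small illustration of reordering, assuming typical alignment rules. The struct names are hypothetical. Standard C places members at increasing addresses in declaration order (a point Martin Weitzel raises later in the thread). In practice, therefore, the programmer does the reordering by hand in the declaration. On a common 64-bit ABI the first struct typically occupies 24 bytes and the second 16.

```c
/* Same three members, two declaration orders. */
struct scattered {
    char   a;
    double b;   /* usually preceded by padding after 'a' */
    char   c;   /* usually followed by trailing padding */
};

struct grouped {
    double b;   /* most strictly aligned member first */
    char   a;
    char   c;   /* the two chars share one stretch of trailing padding */
};
```

Grouping members by decreasing alignment never needs more padding than the scattered order here, though the exact sizes depend on the machine.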
 
I re-iterate my original claim; it is not the compilers that are causing
the problems (your case excepted).  Rather, it is the fact that different
processors have different access requirements for data types.  Even if you
wrote your programs in RISC assembler (a horrible thought), you still could
not align your variables arbitrarily.  You would be forced to make the
same decisions/tradeoffs that the compilers make.
-- 
  Terry Ingoldsby                ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                 or
  The City of Calgary       ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

martin@mwtech.UUCP (Martin Weitzel) (02/21/90)

There were some recent postings that pointed out/complained about
'holes' in C-struct definitions. I hope it benefits some readers
if I explain an alternate point of view of C-struct-s and give some
advice on how to access a certain byte layout in memory in a
portable (yet painless) way, which avoids struct-s completely.
Because the latter may be of more interest, I'll come to it first.

Suppose you have some library function 'getmsg' that you supply with the
address of a buffer, and when the function returns it has filled the
buffer with the following information:

	2 Byte Integer	- length of message
	1 Byte		- several flag bits
	1 Byte		- type of message
	4 Byte Integer	- checksum
	100 Byte	- arbitrary message

Many C programmers would now think of defining the following:

struct m {
	short m_length;
	unsigned char m_flags;
	char m_type;
	unsigned long m_checksum;
	char m_bytes[100];
} buffer;

so that after a 'getmsg(&buffer)' they can access the individual
parts 'by name', e.g. buffer.m_length, buffer.m_flags, ....

... and as the previous posters pointed out, they eventually
get trapped by the 'holes' inserted into the struct by the
compiler for the sake of efficiency.

My advice in this situation is to change this code as follows:

char buffer[
	  2 /* length of message */
	+ 1 /* several flag bits */
	+ 1 /* type of message */
	+ 4 /* checksum */
	+ 100 /* arbitrary message */
];

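/* Add the byte offset first, then convert: (short *)(char *)(b) + 2
   would step two shorts, not two bytes. */
#define m_length(b)	(*(short *)        ((char *)(b) + 0))
#define m_flags(b)	(*(unsigned char *)((char *)(b) + 2))
#define m_type(b)	(*(char *)         ((char *)(b) + 3))
#define m_checksum(b)	(*(unsigned long *)((char *)(b) + 4))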
#define m_bytes(b)	(                   (char *)(b) + 8 )

(I inserted some white space for readability.)

All you need to know about your compiler in that case is that
a 'char' occupies exactly one byte in an 'array of char'. But
as before, you can access the individual parts 'by name' as
follows: m_length(buffer), m_flags(buffer), ....
If 'getmsg' is always supplied with the same buffer, you could
make it even simpler by avoiding parametrized macros and using

#define m_length (*(short *)buffer)
#define m_flags (*(unsigned char *)(buffer + 2))
......

Note that the above expressions are also 'lvalues', i.e. you
can use them on the left side of an assignment.

There remains only the minor problem that 'buffer' must be
properly aligned. (Techniques for achieving this are shown
in K&R - you simply have to define buffer as a union with the
type of desired alignment. Alternatively you may allocate
the buffer with 'malloc'.)
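Here is a sketch of the K&R union technique for the buffer above, assuming C11 for `_Alignof` in the check. `MSG_SIZE`, `msg_buffer` and `buffer_is_aligned` are hypothetical names. A union is aligned for its most strictly aligned member, so the `char` array inside it starts at an address that suits `long` and `double` as well.

```c
#include <stdint.h>   /* uintptr_t */

#define MSG_SIZE (2 + 1 + 1 + 4 + 100)   /* layout from the example above */

/* The unused members exist only to force alignment of 'bytes'. */
union msg_buffer {
    char   bytes[MSG_SIZE];
    long   align_long;
    double align_double;
};

union msg_buffer buffer;

/* Returns 1 if buffer.bytes is suitably aligned for long and double. */
int buffer_is_aligned(void)
{
    uintptr_t addr = (uintptr_t)(void *)buffer.bytes;
    return addr % _Alignof(long) == 0 && addr % _Alignof(double) == 0;
}
```

With this definition, `buffer.bytes` is what gets handed to the hypothetical 'getmsg', and the accessor macros are applied to `buffer.bytes`.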

If your concern is only 'reading' the elements out of the buffer,
you have the additional benefit that you can transparently compensate
for possible 'byte-order' problems. Suppose the message is produced
by some piece of hardware that assumes the LSB of a 16 Bit Integer
on the lower address, and you want to move this hardware to a system,
where the CPU takes just the opposite view. All you have to change is:

#define m_length ((short)\
	(((*(unsigned char *)(buffer+1))<<8)\
	|(*(unsigned char *)buffer)))
.......

(Hope I missed no brackets ... :-)) 
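The same byte-order compensation can be written as small functions that build each value from individual bytes. Because they only ever load single bytes, they work for either host byte order and for any alignment. This sketch assumes the message stores integers least significant byte first, as in the example above. The function names are hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value: LSB at p[0], MSB at p[1]. */
uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value: LSB at p[0], MSB at p[3].
   Each byte is widened to uint32_t before shifting, so p[3] << 24
   cannot overflow a signed int. */
uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For instance, `get_le16((unsigned char *)buffer)` yields the message length and `get_le32((unsigned char *)buffer + 4)` yields the checksum.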

Now back to an alternate view of the C-struct-s; hit 'n' if
you are no longer interested.

IMHO many features of the C language can elegantly be explained in
an easy way, if you 'translate' the feature to the 'machine level'.
(E.g. I explain much about pointers and arrays to my classes by
sketching pictures with the contents of the data segment.)

One thing easily misunderstood here is that such an explanation often
describes only *one* possible approach to implementing the abstract
concept: though it seems natural to think of a C-struct as
being a collection of individual variables located at increasing
memory addresses in the order they are declared(%) as struct-components,
it often makes more sense to see a C-struct only as a collection
of data items that are guaranteed *not* to overlap(%%). Furthermore,
the compiler guarantees that a named struct-component is always at
the same place relative to the struct's address, so the address alone
is enough to find it (important when passing struct-pointers as
function parameters).

The other guarantee, that the struct-components are located (more
or less) adjacent in memory is only of some 'practical' value,
especially if you have an 'array of struct'-s or write one struct
to a file (using write/fwrite together with sizeof), but has
nothing to do with the abstract concept of a C-struct. 

(%): Even the guarantee that the struct elements are at ascending
addresses in the order they are declared was, IMHO, only given
to avoid complex (and hard to understand) rules about when it
would and would not be allowed to rearrange the elements. Readers who
know other good reasons why this guarantee is given are welcome
to correct me (hello Chris :-)).

(%%): Note that in the case of a C-union the guarantee is *not*
that the elements overlap: They only *may* overlap (unless they
are of the same type or they are different C-structs but with
components of the same type at the beginning, which leads back
to the problem when and when not rearranging could have been
allowed ... again, correct me if I'm wrong).
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

peter@ficc.uu.net (Peter da Silva) (02/23/90)

> (%): Even the guarantee, that the struct elements are at ascending
> addresses in the order they are declared, IMHO only was given
> to avoid complex (and hard to understand) rules, when and when
> not it would be allowed to rearrange the elements. Readers who
> know other good reasons why this guarantee is given are welcome
> to correct me (hello Chris :-)).

It makes the following two practices reasonably portable:

1:
	struct list_header {
		struct list_header *next, *prev;
	};

	struct object {
		struct list_header list;	/* first member: offset 0 */
		...
	};

	struct list_header *my_list = NULL;
	struct object my_object;
	extern void add_list(struct list_header **list, struct list_header *elt);

	/* the cast is safe because 'list' is the first member */
	add_list(&my_list, (struct list_header *) &my_object);

2:
	#include <stdlib.h>	/* malloc */

	struct buffer {
		int len;
		char *next;
		char data[1];
	};

	struct buffer *new_buffer(size)
	int size;
	{
		struct buffer *temp;
		
		temp = (struct buffer *) malloc(sizeof *temp + size);
		if(temp) {
			temp->len = size;
			temp->next = &temp->data[0];
		}
		return temp;
	}
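Here is a self-contained, runnable version of the first practice, as a sketch. `add_list` is not defined in the post, so the push-to-front body here is a hypothetical stand-in, as are `object_of` and the `id` field. The point is the conversion between `struct object *` and `struct list_header *`. It works because the first member of a struct is at offset 0.

```c
#include <stddef.h>   /* NULL, offsetof */

struct list_header {
    struct list_header *next, *prev;
};

/* Any structure that begins with a list_header can be linked. */
struct object {
    struct list_header list;   /* must be the first member */
    int id;                    /* hypothetical payload */
};

/* Guaranteed by C, checked here for emphasis. */
_Static_assert(offsetof(struct object, list) == 0, "list must come first");

/* Hypothetical add_list: push 'elt' onto the front of '*list'. */
void add_list(struct list_header **list, struct list_header *elt)
{
    elt->prev = NULL;
    elt->next = *list;
    if (*list != NULL)
        (*list)->prev = elt;
    *list = elt;
}

/* Convert a header pointer back to the object that contains it. */
struct object *object_of(struct list_header *h)
{
    return (struct object *)h;
}
```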
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

djones@megatest.UUCP (Dave Jones) (02/23/90)

From article <645@mwtech.UUCP), by martin@mwtech.UUCP (Martin Weitzel):
) There were some recent postings, that pointed out/complained about
) 'holes' in C-struct definitions.
...
) 
) My advice in this situation is, to change this code as follows:
) 
) char buffer[
) 	  2 /* length of message */
) 	+ 1 /* several flag bits */
) 	+ 1 /* type of message */
) 	+ 4 /* checksum */
) 	+ 100 /* arbitrary message */
) ];
) 
) #define m_length(b)	(*(short *)        ((char *)(b) + 0))
) #define m_flags(b)	(*(unsigned char *)((char *)(b) + 2))
) #define m_type(b)	(*(char *)         ((char *)(b) + 3))
) #define m_checksum(b)	(*(unsigned long *)((char *)(b) + 4))
) #define m_bytes(b)	(                   (char *)(b) + 8 )
) 

There's probably going to be a flurry of replies telling you why
this will not work in the general case.

These casts from char* to this-or-that* are not going to work
unless the data just happen to be properly aligned for whatever
processor you happen to be using.
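One way to keep named access without the alignment risk is to copy the bytes with `memcpy` into a properly aligned local variable, instead of dereferencing a cast pointer. This sketch assumes the fields are in the host's own byte order. `memcpy` has no alignment requirement, so this works at any offset. The function and constant names are hypothetical. `int16_t` and `uint32_t` stand in for the 2- and 4-byte fields, because the sizes of `short` and `long` vary between machines.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets from the message layout quoted above. */
enum { M_LENGTH = 0, M_FLAGS = 2, M_TYPE = 3, M_CHECKSUM = 4, M_BYTES = 8 };

/* 2-byte length at offset 0, host byte order, any alignment. */
int16_t m_length_get(const unsigned char *b)
{
    int16_t v;
    memcpy(&v, b + M_LENGTH, sizeof v);
    return v;
}

/* 4-byte checksum at offset 4, host byte order, any alignment. */
uint32_t m_checksum_get(const unsigned char *b)
{
    uint32_t v;
    memcpy(&v, b + M_CHECKSUM, sizeof v);
    return v;
}

/* Store side of the checksum accessor. */
void m_checksum_set(unsigned char *b, uint32_t v)
{
    memcpy(b + M_CHECKSUM, &v, sizeof v);
}
```

Compilers commonly turn a fixed-size `memcpy` like this into a single load where the hardware allows it.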

martin@mwtech.UUCP (Martin Weitzel) (02/24/90)

In article <12118@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
}From article <645@mwtech.UUCP), by me (Martin Weitzel):
}) There were some recent postings, that pointed out/complained about
}) 'holes' in C-struct definitions.
}...
}) 
}) My advice in this situation is, to change this code as follows:
}) 
}) char buffer[
}) 	  2 /* length of message */
}) 	+ 1 /* several flag bits */
}) 	+ 1 /* type of message */
}) 	+ 4 /* checksum */
}) 	+ 100 /* arbitrary message */
}) ];
}) 
}) #define m_length(b)	(*(short *)        ((char *)(b) + 0))
}) #define m_flags(b)	(*(unsigned char *)((char *)(b) + 2))
}) #define m_type(b)	(*(char *)         ((char *)(b) + 3))
}) #define m_checksum(b)	(*(unsigned long *)((char *)(b) + 4))
}) #define m_bytes(b)	(                   (char *)(b) + 8 )
}) 
}
}There's probably going to be a flurry of replies telling you why
}this will not work in the general case.
}
}These casts from char* to this-or-that* are not going to work
}unless the data just happen to be properly aligned for whatever
}processor you happen to be using.

I'm well aware that alignment restrictions may invalidate
certain casts from one pointer type to another, but you must
see my proposal in the context of the original questions:

The posters generally complained that they were not able to
overlay certain byte patterns in memory, because the C-struct
they defined for that purpose contained holes (introduced by
the compiler). The question of which hardware or software produced
the byte patterns was never mentioned in these postings,
but because the posters seemed to be sure that (only) the
holes in the structures caused the problems, the parts must
already have been properly aligned.

If the parts of the byte patterns were not properly
aligned, even struct-s *without* holes could not have been
used for this purpose(%). So my proposal is no worse than
a struct, but sometimes gives better control over which
memory locations are accessed than struct-s can provide.
If it is only necessary to *read-access* the bytes in question,
the approach described later in my original posting for getting
'wrong' byte order 'right' may also be used in the case
of improperly aligned short-s, int-s or long-s.

(%) If a compiler that supports an option to pack structures
*always* packs them tightly, even on systems with specific alignment
requirements for short-s, int-s and long-s, it would have to emit code
to access the LSB/MSB individually and combine them in a register,
but this would be such an extreme performance penalty that
I guess such compilers are rare.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83