[comp.arch] Sorting struct members for alignment

hankd@pur-ee.UUCP (Hank Dietz) (03/28/89)

In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
>> Granted, by careful (human) reordering of the fields within a complex
>> structure, one can almost always reduce wasted space to less than one
>> machine word/line per allocated item.  This is a good thing that we all do
>> when possible on new code, but it is much more difficult (and risky)
>> to go back and try to do this with code in a very large existing base
>> of (otherwise) good, working software.
>
>Actually, it can be done by machine.  A simple working algorithm is to sort
>the elements by size, starting with the largest.  And, as I said, the wasted
>space has an upper bound of (greatest alignment restriction)-(size of smallest
>data item), usually 4 and 1 bytes, respectively.  Sometimes 8 and 1.
>
>If the structures aren't dumped, raw, to a file, I would have no hesitation
>reordering structures in my code.

All true.  In the compilers courses I teach, I've been mentioning structure
member re-arrangement as an integral part of data structure allocation
within a compiler.  It works really well for languages like Pascal, however,
there is a big problem with this for C code.

In C, it has been a very common idiom to ASSUME that the address of the first
member of a struct IS the address of the struct.  For example:

struct s {
	char c;
	double d;
} t;

char *p = &t;

Now this might get you a warning message (types don't match), but the loose
usage is very common in real programs, and is often hidden in a function:

a = f(&t);

where f expects its argument to be a char *.  Using (nasty) global flow
analysis, this use can be detected and even mechanically corrected in
MOST cases...  but is it worth it?  Opinions?

						-hankd@ee.ecn.purdue.edu

bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) (03/28/89)

In article <11118@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
>>> Granted, by careful (human) reordering of the fields within a complex
>>> structure, one can almost always reduce wasted space to less than one
>>> machine word/line per allocated item. [...]
>>Actually, it can be done by machine. [...]
>
>All true.  In the compilers courses I teach, I've been mentioning structure
>member re-arrangement as an integral part of data structure allocation
>within a compiler.  It works really well for languages like Pascal, however,
>there is a big problem with this for C code.
>
>In C, it has been a very common idiom to ASSUME that the address of the first
>member of a struct IS the address of the struct.  For example: [elided]

I brought this up as an advanced discussion topic in 211 which I TA.

The big problem that you're referring to is probably the injunction against
reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213).
You'll find it in the ANSI std too.  So, the assumption is valid for all
architectures that do not require padding before the first element of the
struct.  Does anybody know of any?  I'd conjecture that no machine is that
silly.

There are many uses for not reordering.  The first is just plain old
simplicity -- the KISS rule.  If you're writing a device driver, you'd like
to have fine control over member placement.  If you're interfacing to
foreign language routines, you'd like to have fine control over member
placement (accessing common blocks f77 on Vaxen, for example, is done via
using an extern struct/union with the name of the struct/union being that of
the common block with an underscore postpended).  Even if all you want is to
make it simple to interface with assembler, then you'd still want no
reordering:  you'd want easily understood member placement rules so your
assembler writer will have a chance -- and the leave-it-alone rule is the
easiest.

C is a low/mixed level language.  I want C compilers to do exactly what I
told it to do (whatever that may be -- look to the language specs)!  Like
structure member ordering, I want C compilers to listen when I tell it to
put something in a register -- if I don't bother, then it can do all the
analysis it wants, but if I _do_....  There are too many `optimizing'
compilers that (1) can't quite optimize the code as much as I'd like and (2)
when given hand tuned C code that should translate nicely to the underlying
machine code, bungle things because it wasn't willing to listen.  Why hand
tune C and not assembler?  Portability and maintainability.  When I hand
tune a program for a particular machine, it's nice to be able to run it on
another without having to rewrite code.

Until there's a keyword to explicitly allow/disallow member reordering (ala
register), it is an implementation flaw to reorder structure members.
Storage compaction by aggregate reordering is fine in a higher level
language -- but if you want your language to be able to express
systems-level code, leave it out!  Just because something is possible
doesn't mean it should be done.

-bsy
-- 
Internet:  bsy@cs.cmu.edu		Bitnet:	 bsy%cs.cmu.edu%smtp@interbit
CSnet:     bsy%cs.cmu.edu@relay.cs.net	Uucp:    ...!seismo!cs.cmu.edu!bsy
USPS:      Bennet Yee, CS Dept, CMU, Pittsburgh, PA 15213-3890
Voice:     (412) 268-7571
-- 

hankd@pur-ee.UUCP (Hank Dietz) (03/28/89)

In article <4582@pt.cs.cmu.edu> bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) writes:
>In article <11118@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>>In C, it has been a very common idiom to ASSUME that the address of the first
>>member of a struct IS the address of the struct.  For example: [elided]
>
...
>The big problem that you're referring to is probably the injunction against
>reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213).
>You'll find it in the ANSI std too.  So, the assumption is valid for all
>architectures that do not require padding before the first element of the
>struct.  Does anybody know of any?  I'd conjecture that no machine is that
>silly.
....

It isn't a machine issue... it is a compiler issue.  Suppose an int is 4
units and a char is 1 unit, with alignment constraints to match.  At the
risk of reopening the great endian wars, given a struct like:

struct s { char c; int i; } t;

a compiler might legally decide to make &t be 3 units BEFORE &(t.c);
effectively {pad3, char, int} versus {char, pad3, int}.  This does not
violate the above referenced ordering constraints, and might make sense in
that the machine might be able to extract the char value from a fetched word
(int) faster if the padding appears in front rather than behind.  Basically,
the difference would be extracting the char by a simple mask versus
extracting it using a multi-bit shift (and perhaps also a mask).

Why isn't this a big problem?  Easy.  The compiler doesn't know it will need
to pad struct s until it encounters the int member, and by that time most
compilers have generated the allocation offset for the char field.  In
short, it is easier for the compiler not to put padding in front.

						-hankd@ee.ecn.purdue.edu

guy@auspex.UUCP (Guy Harris) (03/29/89)

>In C, it has been a very common idiom to ASSUME that the address of the first
>member of a struct IS the address of the struct.

Well, according to the December 7, 1988 ANSI C draft, it's not an
assumption, it's a fact:

	3.5.2.1 Structure and union specifiers

	...A pointer to a structure object, suitably converted, points
	to its initial member (or if that member is a bit-field, then to
	the unit in which it resides), and vice versa. ...

peter@ficc.uu.net (Peter da Silva) (03/29/89)

In article <11118@pur-ee.UUCP>, hankd@pur-ee.UUCP (Hank Dietz) writes:
> In C, it has been a very common idiom to ASSUME that the address of the first
> member of a struct IS the address of the struct.  For example:

> struct s {
> 	char c;
> 	double d;
> } t;

A very good point. A more common case is something like this:

struct window_list {
	struct general_list wl_header;
	int wl_x0, wl_x1, wl_y0, wl_y1;
	struct damage_list wl_dlist;
	...
};
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

Devin_E_Ben-Hur@cup.portal.com (03/29/89)

[discussions of reordering C structures deleted...]

Sorry, folks, this is from K&R I, p196, "8.5 Structure and union declarations
... Within a structure, the objects declared have addresses which increase
as their declarations are read left-to-right."

There is similar language in the pANS and K&R II.  Reordering C structure
members has never been and is not now correct behavior for a C compiler.

Devin_Ben-Hur@Cup.Portal.Com
...ucbvax!sun!portal!cup.portal.com!devin_ben-hur

gd@geovision.uucp (Gord Deinstadt) (03/31/89)

In article <4582@pt.cs.cmu.edu> bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) writes:
>
>The big problem that you're referring to is probably the injunction against
>reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213).
>You'll find it in the ANSI std too.  So, the assumption is valid for all
>architectures that do not require padding before the first element of the
>struct.  Does anybody know of any?  I'd conjecture that no machine is that
>silly.

Even when (as in the original example) the first member is smaller than
a word?  I bet there are machines "that silly".  It's not a problem to
sensible code.

>[ Arguments why structure order is important in system programming. ]
>Until there's a keyword to explicitly allow/disallow member reordering (ala
>register), it is an implementation flaw to reorder structure members.

And it's about time there was such a keyword.  We've got a million plus
lines of C code here, and hardly a line of it needs to know what a structure
looks like internally.  For our purposes (as for most application programmers)
structure member reordering should be the default.

eat
this,
line
counter
-- 
Gord Deinstadt           gd@geovision.uucp

nobody@tekecs.GWD.TEK.COM (-for inetd server command) (04/05/89)

In article <16360@cup.portal.com> Devin_E_Ben-Hur@cup.portal.com writes:
>Sorry, folks, this is from K&R I, p196, "8.5 Structure and union declarations
>... Within a structure, the objects declared have addresses which increase
>as their declarations are read left-to-right."

>There is similar language in the pANS and K&R II.  Reordering C structure
>members has never been and is not now correct behavior for a C compiler.

Let's ignore sorting ALL structure elements for a moment.  Let's consider a
compiler which simply puts each new structure element at the first offset
where that new element fits.  Many times such a compiler may have to put the
new element at the end (as per K&R and pANS).  Sometimes the compiler could
"take back" padding, that the compiler previously assumed was needed
(expressly forbidden by K&R and pANS).

Let's assume the compiler treats "sub-structures" as opaque black boxes and
doesn't try to stuff anything inside a sub-structure.  This will still allow
the compiler to implement structure assignment (e.g. to the sub-structures) as
simple memory copies.

This way programs that require certain structure elements at particular
offsets on a particular machine, could specify structure elements (e.g. pad1
pad2 ...) where padding would otherwise take place.

Under such a scheme, what otherwise portable programs would quit working?

Obviously, expressions like:

    &ptr->fieldx - &ptr->fieldy

may be (unexpectedly) negative.  I'm not sure that even with the needed casts,
that this sort of thing is portable.

Paul Scherf, Tektronix, Box 1000, MS 61-028, Wilsonville, OR, USA
paulsc@orca.WV.Tek.com     503-685-2734     tektronix!orca!paulsc

jlg@lanl.gov (Jim Giles) (04/06/89)

From article <583@geovision.UUCP>, by gd@geovision.uucp (Gord Deinstadt):
> And it's about time there was such a keyword.  We've got a million plus
> lines of C code here, and hardly a line of it needs to know what a structure
> looks like internally.  For our purposes (as for most application programmers)
> structure member reordering should be the default.


I disagree.  One of the main reasons I use structures is to describe tables
that will be used to share data between machines and/or between languages.
For this reason, I want all the fields to be in the order I specify _AND_
I want them contiguous (_NO_ compiler generated padding).  Any automatically
generated padding or alignment may vary from compiler to compiler and almost
certainly _will_ vary from one machine to another.  If I design a table that
tends to span word boudaries with fields, or has fields that begin at clumsy
places - that's my fault and I'll take the performance penalty - at least
the data will be shared correctly.

cquenel@polyslo.CalPoly.EDU (44 more school days) (04/06/89)

From article <583@geovision.UUCP>, by gd@geovision.uucp (Gord Deinstadt):
> structure member reordering should be the default.

In 9296 jlg@lanl.gov (Jim Giles) sez:
>I disagree.  One of the main reasons I use structures is to describe tables
>that will be used to share data between machines and/or between languages.

	This doesn't seem to be a reason for not making 
	re-arrangement the default, just for allowing it be turned off.

	If I were writing a structure that NEEDED to be aligned
	exactly as I had specifed, I would fell MUCH safer if I
	could say "explicit" in front of it, and KNOW that either
	the compiler would code it as it stands, or give me
	a compile time error.

	This is how I'd like to see it.

	But, unfortunately, there was no precedent for this, and
	not enough people think we need it, so I'll have to wait.
	Or use a different language.

	Please note the followups have been directed to comp.lang.c

	--chris

Maybe Computer Science should be in the College of Theology.
-- 
@---@  ------------------------------------------------------------------  @---@
\. ./  | Chris Quenelle (The First Lab Rat) cquenel@polyslo.calpoly.edu |  \. ./
 \ /   |                + good; ++ good;     -- Annie Lennox            |   \ / 
==o==  ------------------------------------------------------------------  ==o==