hankd@pur-ee.UUCP (Hank Dietz) (03/28/89)
In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes: >> Granted, by careful (human) reordering of the fields within a complex >> structure, one can almost always reduce wasted space to less than one >> machine word/line per allocated item. This is a good thing that we all do >> when possible on new code, but it is much more difficult (and risky) >> to go back and try to do this with code in a very large existing base >> of (otherwise) good, working software. > >Actually, it can be done by machine. A simple working algorithm is to sort >the elements by size, starting with the largest. And, as I said, the wasted >space has an upper bound of (greatest alignment restriction)-(size of smallest >data item), usually 4 and 1 bytes, respectively. Sometimes 8 and 1. > >If the structures aren't dumped, raw, to a file, I would have no hesitation >reordering structures in my code. All true. In the compilers courses I teach, I've been mentioning structure member re-arrangement as an integral part of data structure allocation within a compiler. It works really well for languages like Pascal, however, there is a big problem with this for C code. In C, it has been a very common idiom to ASSUME that the address of the first member of a struct IS the address of the struct. For example: struct s { char c; double d; } t; char *p = &t; Now this might get you a warning message (types don't match), but the loose usage is very common in real programs, and is often hidden in a function: a = f(&t); where f expects its argument to be a char *. Using (nasty) global flow analysis, this use can be detected and even mechanically corrected in MOST cases... but is it worth it? Opinions? -hankd@ee.ecn.purdue.edu
bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) (03/28/89)
In article <11118@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes: >In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes: >>> Granted, by careful (human) reordering of the fields within a complex >>> structure, one can almost always reduce wasted space to less than one >>> machine word/line per allocated item. [...] >>Actually, it can be done by machine. [...] > >All true. In the compilers courses I teach, I've been mentioning structure >member re-arrangement as an integral part of data structure allocation >within a compiler. It works really well for languages like Pascal, however, >there is a big problem with this for C code. > >In C, it has been a very common idiom to ASSUME that the address of the first >member of a struct IS the address of the struct. For example: [elided] I brought this up as an advanced discussion topic in 211 which I TA. The big problem that you're referring to is probably the injunction against reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213). You'll find it in the ANSI std too. So, the assumption is valid for all architectures that do not require padding before the first element of the struct. Does anybody know of any? I'd conjecture that no machine is that silly. There are many uses for not reordering. The first is just plain old simplicity -- the KISS rule. If you're writing a device driver, you'd like to have fine control over member placement. If you're interfacing to foreign language routines, you'd like to have fine control over member placement (accessing common blocks f77 on Vaxen, for example, is done via using an extern struct/union with the name of the struct/union being that of the common block with an underscore postpended). Even if all you want is to make it simple to interface with assembler, then you'd still want no reordering: you'd want easily understood member placement rules so your assembler writer will have a chance -- and the leave-it-alone rule is the easiest. C is a low/mixed level language. I want C compilers to do exactly what I told it to do (whatever that may be -- look to the language specs)! Like structure member ordering, I want C compilers to listen when I tell it to put something in a register -- if I don't bother, then it can do all the analysis it wants, but if I _do_.... There are too many `optimizing' compilers that (1) can't quite optimize the code as much as I'd like and (2) when given hand tuned C code that should translate nicely to the underlying machine code, bungle things because it wasn't willing to listen. Why hand tune C and not assembler? Portability and maintainability. When I hand tune a program for a particular machine, it's nice to be able to run it on another without having to rewrite code. Until there's a keyword to explicitly allow/disallow member reordering (ala register), it is an implementation flaw to reorder structure members. Storage compaction by aggregate reordering is fine in a higher level language -- but if you want your language to be able to express systems-level code, leave it out! Just because something is possible doesn't mean it should be done. -bsy -- Internet: bsy@cs.cmu.edu Bitnet: bsy%cs.cmu.edu%smtp@interbit CSnet: bsy%cs.cmu.edu@relay.cs.net Uucp: ...!seismo!cs.cmu.edu!bsy USPS: Bennet Yee, CS Dept, CMU, Pittsburgh, PA 15213-3890 Voice: (412) 268-7571 --
hankd@pur-ee.UUCP (Hank Dietz) (03/28/89)
In article <4582@pt.cs.cmu.edu> bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) writes: >In article <11118@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes: >>In C, it has been a very common idiom to ASSUME that the address of the first >>member of a struct IS the address of the struct. For example: [elided] > ... >The big problem that you're referring to is probably the injunction against >reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213). >You'll find it in the ANSI std too. So, the assumption is valid for all >architectures that do not require padding before the first element of the >struct. Does anybody know of any? I'd conjecture that no machine is that >silly. .... It isn't a machine issue... it is a compiler issue. Suppose an int is 4 units and a char is 1 unit, with alignment constraints to match. At the risk of reopening the great endian wars, given a struct like: struct s { char c; int i; } t; a compiler might legally decide to make &t be 3 units BEFORE &(t.c); effectively {pad3, char, int} versus {char, pad3, int}. This does not violate the above referenced ordering constraints, and might make sense in that the machine might be able to extract the char value from a fetched word (int) faster if the padding appears in front rather than behind. Basically, the difference would be extracting the char by a simple mask versus extracting it using a multi-bit shift (and perhaps also a mask). Why isn't this a big problem? Easy. The compiler doesn't know it will need to pad struct s until it encounters the int member, and by that time most compilers have generated the allocation offset for the char field. In short, it is easier for the compiler not to put padding in front. -hankd@ee.ecn.purdue.edu
guy@auspex.UUCP (Guy Harris) (03/29/89)
>In C, it has been a very common idiom to ASSUME that the address of the first >member of a struct IS the address of the struct. Well, according to the December 7, 1988 ANSI C draft, it's not an assumption, it's a fact: 3.5.2.1 Structure and union specifiers ...A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
peter@ficc.uu.net (Peter da Silva) (03/29/89)
In article <11118@pur-ee.UUCP>, hankd@pur-ee.UUCP (Hank Dietz) writes: > In C, it has been a very common idiom to ASSUME that the address of the first > member of a struct IS the address of the struct. For example: > struct s { > char c; > double d; > } t; A very good point. A more common case is something like this: struct window_list { struct general_list wl_header; int wl_x0, wl_x1, wl_y0, wl_y1; struct damage_list wl_dlist; ... }; -- Peter da Silva, Xenix Support, Ferranti International Controls Corporation. Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180. Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
Devin_E_Ben-Hur@cup.portal.com (03/29/89)
[discussions of reordering C structures deleted...] Sorry, folks, this is from K&R I, p196, "8.5 Structure and union declarations ... Within a structure, the objects declared have addresses which increase as their declarations are read left-to-right." There is similar language in the pANS and K&R II. Reordering C structure members has never been and is not now correct behavior for a C compiler. Devin_Ben-Hur@Cup.Portal.Com ...ucbvax!sun!portal!cup.portal.com!devin_ben-hur
gd@geovision.uucp (Gord Deinstadt) (03/31/89)
In article <4582@pt.cs.cmu.edu> bsy@PLAY.MACH.CS.CMU.EDU (Flexi-thumbs) writes: > >The big problem that you're referring to is probably the injunction against >reordering structure members, viz., Section A8.3 of K&R 2nd Ed (pg 213). >You'll find it in the ANSI std too. So, the assumption is valid for all >architectures that do not require padding before the first element of the >struct. Does anybody know of any? I'd conjecture that no machine is that >silly. Even when (as in the original example) the first member is smaller than a word? I bet there are machines "that silly". It's not a problem to sensible code. >[ Arguments why structure order is important in system programming. ] >Until there's a keyword to explicitly allow/disallow member reordering (ala >register), it is an implementation flaw to reorder structure members. And it's about time there was such a keyword. We've got a million plus lines of C code here, and hardly a line of it needs to know what a structure looks like internally. For our purposes (as for most application programmers) structure member reordering should be the default. eat this, line counter -- Gord Deinstadt gd@geovision.uucp
nobody@tekecs.GWD.TEK.COM (-for inetd server command) (04/05/89)
In article <16360@cup.portal.com> Devin_E_Ben-Hur@cup.portal.com writes: >Sorry, folks, this is from K&R I, p196, "8.5 Structure and union declarations >... Within a structure, the objects declared have addresses which increase >as their declarations are read left-to-right." >There is similar language in the pANS and K&R II. Reordering C structure >members has never been and is not now correct behavior for a C compiler. Let's ignore sorting ALL structure elements for a moment. Let's consider a compiler which simply puts each new structure element at the first offset where that new element fits. Many times such a compiler may have to put the new element at the end (as per K&R and pANS). Sometimes the compiler could "take back" padding, that the compiler previously assumed was needed (expressly forbidden by K&R and pANS). Let's assume the compiler treats "sub-structures" as opaque black boxes and doesn't try to stuff anything inside a sub-structure. This will still allow the compiler to implement structure assignment (e.g. to the sub-structures) as simple memory copies. This way programs that require certain structure elements at particular offsets on a particular machine, could specify structure elements (e.g. pad1 pad2 ...) where padding would otherwise take place. Under such a scheme, what otherwise portable programs would quit working? Obviously, expressions like: &ptr->fieldx - &ptr->fieldy may be (unexpectedly) negative. I'm not sure that even with the needed casts, that this sort of thing is portable. Paul Scherf, Tektronix, Box 1000, MS 61-028, Wilsonville, OR, USA paulsc@orca.WV.Tek.com 503-685-2734 tektronix!orca!paulsc
jlg@lanl.gov (Jim Giles) (04/06/89)
From article <583@geovision.UUCP>, by gd@geovision.uucp (Gord Deinstadt): > And it's about time there was such a keyword. We've got a million plus > lines of C code here, and hardly a line of it needs to know what a structure > looks like internally. For our purposes (as for most application programmers) > structure member reordering should be the default. I disagree. One of the main reasons I use structures is to describe tables that will be used to share data between machines and/or between languages. For this reason, I want all the fields to be in the order I specify _AND_ I want them contiguous (_NO_ compiler generated padding). Any automatically generated padding or alignment may vary from compiler to compiler and almost certainly _will_ vary from one machine to another. If I design a table that tends to span word boudaries with fields, or has fields that begin at clumsy places - that's my fault and I'll take the performance penalty - at least the data will be shared correctly.
cquenel@polyslo.CalPoly.EDU (44 more school days) (04/06/89)
From article <583@geovision.UUCP>, by gd@geovision.uucp (Gord Deinstadt): > structure member reordering should be the default. In 9296 jlg@lanl.gov (Jim Giles) sez: >I disagree. One of the main reasons I use structures is to describe tables >that will be used to share data between machines and/or between languages. This doesn't seem to be a reason for not making re-arrangement the default, just for allowing it be turned off. If I were writing a structure that NEEDED to be aligned exactly as I had specifed, I would fell MUCH safer if I could say "explicit" in front of it, and KNOW that either the compiler would code it as it stands, or give me a compile time error. This is how I'd like to see it. But, unfortunately, there was no precedent for this, and not enough people think we need it, so I'll have to wait. Or use a different language. Please note the followups have been directed to comp.lang.c --chris Maybe Computer Science should be in the College of Theology. -- @---@ ------------------------------------------------------------------ @---@ \. ./ | Chris Quenelle (The First Lab Rat) cquenel@polyslo.calpoly.edu | \. ./ \ / | + good; ++ good; -- Annie Lennox | \ / ==o== ------------------------------------------------------------------ ==o==