toppin@melpar.UUCP (Doug Toppin X2075) (01/20/90)
We are using the SUN 4/260, which is a RISC architecture machine.
We are having trouble with data alignment in our data structures.
We have to communicate with external devices that require data structures
such as the following:

    struct
    {
        long a;
        short b;
        long c;
    };

When we compile and link something referencing this structure, the data
produced appears to have had each element word boundary aligned, so that
what results appears to be as follows:

    struct
    {
        long a;
        short b;
        short pad;  <==== this was inserted by cc to align next thing
        long c;
    };

This means that we lose the benefit of data abstraction and have to create
our own output without using structures.
We have not been able to find any Sun-4 cc option that eliminates this problem.
We cannot use the 'compile as Sun-3' option.
Please let us know if you know of a built-in way around this.
thanks
Doug Toppin
uunet!melpar!toppin
johnl@esegue.segue.boston.ma.us (John R. Levine) (01/22/90)
In article <111@melpar.UUCP> toppin@melpar.UUCP (Doug Toppin X2075) writes:
>We are using the SUN 4/260 which is a RISC architecture machine.
>We are having trouble with data alignment in our data structures.
>We have to communicate with external devices that require data structures
>such as the following:
>    struct
>    {
>        long a;
>        short b;
>        long c;
>    };

I guess all the world's not a Vax any more, now it's a 68020. It would be
more correct to say that your external device requires a four-byte integer,
a two-byte integer, and a four-byte integer, all sent highest byte first.

C makes no promise that the layout of structures will be the same from
machine to machine. For instance, if you ran this code on a 386, there
doesn't need to be any padding (though many compilers add it to make the
code run faster) but the words are all in the opposite byte order.

The SPARC and every other RISC chip requires that items be aligned on their
natural boundaries, because there is considerable performance to be gained
by doing so, and because it is not very hard to write programs that are
totally insensitive to padding and byte order. Many people have observed
this. In an article on the IBM 370 series in the CACM about 10 years ago,
one of the 370's architects noted that the 370 permits misaligned data while
its predecessor the 360 didn't, and that it was a mistake to have done so
because it's rarely used and adds considerable complication to every 370
machine.

In the particular case of the SPARC, there is a C compiler option
(documented in the FM) to allow misaligned data at the enormous cost of
several instructions and sometimes a subroutine call for every load and
store. I presume you are passing byte streams back and forth to your device;
a memory mapped interface that requires misaligned operands is too awful to
contemplate.
You need to write something like this:

    #include <stdio.h>

    extern FILE *f;          /* stream attached to the device */

    long read_long(void);
    short read_short(void);  /* same idea as read_long, two bytes */

    void read_foo_structure(struct foo *p)
    {
        p->a = read_long();
        p->b = read_short();
        p->c = read_long();
    }

    long read_long(void)
    {
        unsigned long v;

        /* read in big endian order */
        v  = (unsigned long)getc(f) << 24;  /* should do some error checking */
        v |= (unsigned long)getc(f) << 16;
        v |= (unsigned long)getc(f) << 8;
        v |= (unsigned long)getc(f);
        return (long)v;
    }

This may seem like more work, but in my experience you write a few of these
things and use them all over the place. Then your code is really portable.
--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."
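The original question was about output to the device, so here is a minimal sketch of the same idea in the other direction, working on a byte buffer rather than a stream. The names `struct foo`, `foo_pack`, `foo_unpack` and the 10-byte wire size are assumptions made for illustration (a 4-byte, 2-byte and 4-byte field sent highest byte first, as described above); none of them come from the posts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the device record. Its layout is
   whatever the compiler chooses and never goes on the wire. */
struct foo {
    int32_t a;
    int16_t b;
    int32_t c;
};

enum { FOO_WIRE_SIZE = 10 };   /* 4 + 2 + 4 bytes, no padding */

/* Store v at p, most significant byte first. */
static void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFF);
}

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

static uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Wire layout: a at bytes 0..3, b at 4..5, c at 6..9. */
void foo_pack(const struct foo *s, unsigned char out[FOO_WIRE_SIZE])
{
    assert(s != NULL && out != NULL);
    put_be32(out,     (uint32_t)s->a);
    put_be16(out + 4, (uint16_t)s->b);
    put_be32(out + 6, (uint32_t)s->c);
}

/* The casts back to signed types assume a two's-complement machine. */
void foo_unpack(struct foo *s, const unsigned char in[FOO_WIRE_SIZE])
{
    assert(s != NULL && in != NULL);
    s->a = (int32_t)get_be32(in);
    s->b = (int16_t)get_be16(in + 4);
    s->c = (int32_t)get_be32(in + 6);
}
```

Because only shifts and byte stores touch the buffer, the result does not depend on the host's padding rules or byte order.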
davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (01/22/90)
johnl@esegue.segue.boston.ma.us (John R. Levine) writes:

| long read_long(void)
| {
|     long v;
|
|     /* read in big endian order */
|     v = getc(f) << 24;    /* should do some error checking */
|     v |= getc(f) << 16;
|     v |= getc(f) << 8;
|     v |= getc(f);
|     return v;
| }
|
| This may seem like more work, but in my experience you write a few of these
| things and use them all over the place. Then your code is really portable.

  I agree with your thought, although for portable transfer I usually do
LSB first (not because of any preference) just for the loop. Since I
work with 36 and 64 bit machines, I always add a sign extend on the
read.

  At one time I was operating a PC (original IBM) with a unique
coprocessor Cray2 on an ethernet link. The C2 calculated data and
passed it in 32 bit RLE format to a BASIC program which used calls to
write the display. Amazing what you can do to get a demo up FAST.
--
bill davidsen - sysop *IX BBS and Public Access UNIX
davidsen@sixhub.uucp ...!uunet!crdgw1!sixhub!davidsen
"Getting old is bad, but it beats the hell out of the alternative" -anon
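A minimal sketch of the LSB-first loop with an explicit sign extension, as described above. The function name `read_le32_signed` and the choice of a byte buffer as input are assumptions for illustration; the post does not show its actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a 32-bit two's-complement integer sent least significant byte
   first. LSB-first order makes the loop trivial: byte i lands at bit
   position 8*i. */
long read_le32_signed(const unsigned char *p)
{
    unsigned long v = 0;
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++)
        v |= (unsigned long)(p[i] & 0xFFu) << (8 * i);

    /* Sign-extend by hand so the result is the same whether long is
       32, 36 or 64 bits wide: if bit 31 is set, the value is v - 2^32,
       computed here without overflowing a 32-bit long. */
    if (v & 0x80000000UL)
        return -(long)(0xFFFFFFFFUL - v) - 1;
    return (long)v;
}
```

Without the explicit step, a machine with a long wider than 32 bits would read the wire value 0xFFFFFFFE as a large positive number rather than -2.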
peter@ficc.uu.net (Peter da Silva) (01/22/90)
> I guess all the world's not a Vax any more, now it's a 68020.
Worse, since non-word-aligned values do cost extra cycles to access, any
68020 C compiler that didn't pad that structure is broken. Some "features"
of CISC processors are just too expensive to use.
--
_--_|\ Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/ \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
v "Have you hugged your wolf today?" `-_-'
slackey@bbn.com (Stan Lackey) (01/23/90)
In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>> I guess all the world's not a Vax any more, now it's a 68020.
>Worse, since non-word-aligned values do cost extra cycles to access, any
>68020 C compiler that didn't pad that structure is broken. Some "features"
>of CISC processors are just too expensive to use.

Just a quick summary of the last time we went around on this issue:

There are a number of interesting applications that build many
instances of small data structures, each containing varied data types.
It was said that logic simulators do this. In a machine that forces
you to always have data aligned, this can result in lots of wasted
memory. Not because the programmer is stupid, but because of the
nature of the application.

Now, if I have a 4MB workstation, and alignment restrictions increase
the need from under 4MB to over 4MB, there will be significant paging.
I'd rather spend two cycles to access a word sometimes than have to
page over the Ethernet. So would the people with whom I share the
network.

------

Also: the comments on the 360 (aligned) vs 370 (unaligned): Boy did I
hear a different story. The version I heard was that the 370 supported
unaligned data, because the experience with the 360 showed it was
incredibly painful to be without it. Remember in those days memory was
VERY expensive.
:-) Stan
cik@l.cc.purdue.edu (Herman Rubin) (01/23/90)
In article <LJ81OX3ggpc2@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes: > > I guess all the world's not a Vax any more, now it's a 68020. > > Worse, since non-word-aligned values do cost extra cycles to access, any > 68020 C compiler that didn't pad that structure is broken. Some "features" > of CISC processors are just too expensive to use. Having seen the statement about penalties for unaligned, I tried the following code (hand coded in assembler to eliminate unnecessary overhead): ..... while(k < end)*k++ = *i++ ^ *j++; and the j pointer was deliberately unaligned. Now this was on a VAX, and it is possible that other machines may give different results, but the time penalty, while there, was not excessive. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
weaver@weitek.WEITEK.COM (01/24/90)
In article <51245@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>Just a quick summary of the last time we went around on this issue:
>
>There are a number of interesting applications that build many
>instances of small data structures, each containing varied data types.
>It was said that logic simulators do this. In a machine that forces
>you to always have data aligned, this can result in lots of wasted
>memory. Not because the programmer is stupid, but because of the
>nature of the application.
>

I want to point out here that this data alignment problem can be
mostly worked around for application programs.

On a machine with "natural" alignment, a structure (record, common)
made of primitive data items (integers, pointers, floats, etc.)
needs no padding if the elements are ordered such that smaller items
always follow larger items. The size ordering of primitive data
items is machine dependent, but similar from one machine to the next.
If the entire record is not a multiple of the largest required alignment,
then some space may be lost between structures, or in nested
structures. This cannot be handled so easily.

In summary, if you are writing an application from scratch, you
can minimize this effect in an almost (but not quite!) machine
independent way. So for new programs, I think natural alignment
is a good time/speed tradeoff. I also think that supporting
unaligned data by both traps and special in-line code is a good
idea, since so many programs have long histories.

Michael.
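The ordering rule above can be checked with `sizeof` and `offsetof`. This is only a sketch: the two structures and the helper names are invented for illustration, and the actual sizes depend on the compiler and ABI, so none are claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct mixed {      /* members declared in "logical" order */
    char   tag;
    double value;
    short  count;
    int    id;
};

struct sorted {     /* same members, largest first */
    double value;
    int    id;
    short  count;
    char   tag;
};

/* Bytes of padding = total size minus the sum of the member sizes. */
#define MEMBER_BYTES \
    (sizeof(char) + sizeof(double) + sizeof(short) + sizeof(int))

size_t padding_mixed(void)  { return sizeof(struct mixed)  - MEMBER_BYTES; }
size_t padding_sorted(void) { return sizeof(struct sorted) - MEMBER_BYTES; }

/* Print what this particular compiler did with each declaration. */
void report(void)
{
    /* The first member of a struct is always at offset 0. */
    assert(offsetof(struct sorted, value) == 0);
    printf("mixed:  %zu bytes, %zu of padding, value at offset %zu\n",
           sizeof(struct mixed), padding_mixed(),
           offsetof(struct mixed, value));
    printf("sorted: %zu bytes, %zu of padding, tag at offset %zu\n",
           sizeof(struct sorted), padding_sorted(),
           offsetof(struct sorted, tag));
}
```

On a naturally aligned machine the sorted form should have no padding between members, though it may still carry padding at the end so that arrays of it stay aligned, which is the residual loss the post mentions.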
hascall@cs.iastate.edu (John Hascall) (01/24/90)
In article <21361> weaver@weitek.UUCP (Michael Gordon Weaver) writes:
}In article <51245> slackey@BBN.COM (Stan Lackey) writes:
}>There are a number of interesting applications that build many
}>instances of small data structures, each containing varied data types.
}>It was said that logic simulators do this. In a machine that forces
}>you to always have data aligned, this can result in lots of wasted
}>memory. Not because the programmer is stupid, but because of the
}>nature of the application.

}I want to point out here that this data alignment problem can be
}mostly worked around for application programs.
} [sort elements of structures by decreasing size...]

   It seems to me that now we have a conflict between "software engineering"
   and architecture.

   It surely seems to me that, from a programming point of view, you would
   want your structures in some meaningful order as an aid to program
   understanding. Shouldn't elements that are used together, be located
   together? And doesn't everyone pretty much expect certain elements
   at the top of structures, for example:

      struct FOO {                 struct BAR {
          struct FOO *next;            struct BAR *left;
          struct FOO *prev;            struct BAR *right;
               :                            :
      };                           };

   And on machines with "displacement mode" addressing (i.e., 32(R4)
   addresses the element 32 bytes into the structure at the address in
   register four) there is often a bonus (e.g., speed or code size) for
   elements within some distance (i.e., 127 bytes) from the start of the
   structure. So if you put the big elements first, you minimize the
   number of "close" elements.

John Hascall / ISU Comp Ctr
gary@dgcad.SV.DG.COM (Gary Bridgewater) (01/24/90)
In article <21361@weitek.WEITEK.COM> weaver@weitek.UUCP (Michael Gordon Weaver) writes:
>In article <51245@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>Just a quick summary of the last time we went around on this issue:
>>
>>There are a number of interesting applications that build many
>>instances of small data structures, each containing varied data types.
>>It was said that logic simulators do this. In a machine that forces
>>you to always have data aligned, this can result in lots of wasted
>>memory. Not because the programmer is stupid, but because of the
>>nature of the application.
>>
>
>I want to point out here that this data alignment problem can be
>mostly worked around for application programs.

I think you missed the phrase "Not because the programmer is stupid..."

>On a machine with "natural" alignment, a structure (record, common)
>made of primitive data items (integers, pointers, floats, etc.)
>needs no padding if the elements are ordered such that smaller items
>always follow larger items. The size ordering of primitive data
>items is machine dependent, but similar from one machine to the next.
>If the entire record is not a multiple of the largest required alignment,
>then some space may be lost between structures, or in nested
>structures. This cannot be handled so easily.

I need to allocate an array of 50,000,000 8 bit integers. How do I do this?
Which is more important: 1) overall memory use, 2) misalignment penalty, or
3) code readability?
Then I need to allocate 1,000,000 structs containing other structs written
by another programmer. What is the natural order of the data a priori on any
machine? How big is an addr_t on a 386? Sparc? Cray? Is it bigger than a
long float?
I plan to pass these structures from a Sun 4 to a Vax to a Cray via an
ethernet connection. Now what is the natural order?

>In summary, if you are writing an application from scratch, you
>can minimize this effect in an almost (but not quite!) machine
>independent way. So for new programs, I think natural alignment
>is a good time/speed tradeoff. I also think that supporting
>unaligned data by both traps and special in-line code is a good
>idea, since so many programs have long histories.

I suggest that when RE-writing a program from scratch you can mitigate this
effect if you have some idea where the code is going to run. This is of
little help to Simulator vendors who have to run across different
architectures.
When you write a program you have no idea if it will be successful enough to
be bothered by data alignment inefficiencies. You are usually more worried
about getting it up quickly and in the same execution universe as the specs.
In general, you are stuck and at best will have to go back and micro-tune
the heck out of it on a case-by-case basis. In your spare time, study malloc
algorithms so you can figure out how to allocate bit structures for fun and
profit.
I agree that it is easier if the hardware lets you misalign but that
thinking is passe in the brave new world of RISC where using the computer is
a compiler problem.
--
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@proa.sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
Networking is the worst form of data exchange except for all the others
(apologies to WC).
larus@primost.cs.wisc.edu (James Larus) (01/25/90)
In article <21361@weitek.WEITEK.COM>, weaver@weitek.WEITEK.COM writes:
> In summary, if you are writing an application from scratch, you
> can minimize this effect in an almost (but not quite!) machine
> independent way. So for new programs, I think natural alignment
> is a good time/speed tradeoff. I also think that supporting
> unaligned data by both traps and special in-line code is a good
> idea, since so many programs have long histories.

This statement may be true in general, but it is not always true. For
example, I wrote a program tracing system that writes out a trace file
consisting of a mixture of bytes, halfwords, and full words. It is
crucial to this system that the byte quantities only take up 8 bits
(otherwise the size of the already large files grows by a factor of 2 or
more). However, it means that I need to do unaligned stores into the
trace buffer. And, since I trace programs in real time, I need to do
the stores fast.

The MIPS R2000 has a 2 instruction sequence that can store a
half/fullword quantity on any byte boundary. On SPARC, it takes 7
instructions to store fullwords byte-by-byte. Coming from Berkeley,
I hate to say it, but this is another case in which MIPS has a much
better designed machine than Sun (-:

/Jim
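In C, the unaligned stores described above are usually written with `memcpy`, leaving the instruction sequence to the compiler. The sketch below is an assumption about how such a trace buffer might look; `struct trace_buf` and the `trace_*` functions are invented names, not the tracing system from the post. Values are stored in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trace buffer: records are appended back to back with no
   padding, so halfwords and words land on arbitrary byte boundaries. */
struct trace_buf {
    unsigned char data[4096];
    size_t        used;
};

/* memcpy expresses an unaligned store portably; a compiler may lower
   a fixed-size copy to whatever sequence the target handles best. */
static void trace_put(struct trace_buf *t, const void *src, size_t n)
{
    assert(t != NULL && t->used + n <= sizeof t->data);
    memcpy(t->data + t->used, src, n);
    t->used += n;
}

void trace_byte(struct trace_buf *t, uint8_t v)  { trace_put(t, &v, sizeof v); }
void trace_half(struct trace_buf *t, uint16_t v) { trace_put(t, &v, sizeof v); }
void trace_word(struct trace_buf *t, uint32_t v) { trace_put(t, &v, sizeof v); }
```

A byte followed by a word and a halfword occupies exactly seven bytes here, which is the density the post asks for; how many instructions each store costs is still decided by the architecture.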
mph@lion.inmos.co.uk (Mike Harrison) (02/01/90)
In article <1810@sunquest.UUCP> terry@sunquest.UUCP (Terry Friedrichsen) writes:
>The abstruct/memstruct proposal for aligning/not aligning C structure
>members leads me to post what I thought was the obvious idea all
>along :-):
>
>Borrow Pascal's idea of "records" and "packed records". So in C,

Or even borrow Ada's ideas, which separate the abstraction and the
representation.

Given a set of declarations such as:

    type INTEGER32 is range -2147483648 .. 2147483647;
    type INTEGER16 is range -32768 .. 32767;

    type STRUCT is
      record
        F1 : INTEGER32;
        F2 : INTEGER16;
        F3 : INTEGER32;
      end record;

objects of type STRUCT will be mapped in any way that the compiler wishes,
with fields (potentially) re-ordered, padding added etc., for efficiency
of access.

If I need a specific mapping I can achieve it by providing a Representation
Clause, e.g. for maximum packing:

    WORD : constant := 4; -- assumes storage unit is byte, 4 bytes per word.

    for INTEGER32'SIZE use 32;
    for INTEGER16'SIZE use 16;

    for STRUCT use
      record at mod 16;
        F1 at 0 * WORD range 0 .. 31;
        F2 at 1 * WORD range 0 .. 15;
        F3 at 1 * WORD range 16 .. 47;
      end record;

In this case objects of type STRUCT will be laid out exactly as shown and
will occupy exactly 80 bits, and will be aligned on a half-word boundary.

The compiler will generate appropriate code to access complete objects or
individual fields. For a RISC of the kind being discussed this will
obviously be (?much) less efficient, but if the application needs it who
cares?

If I wish to declare an array of these objects, such as:

    type STRUCT_ARRAY is array (NATURAL range <>) of STRUCT;

I can inform the compiler that I want maximum packing by writing:

    pragma PACK (STRUCT_ARRAY);

Mike,

Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk with STANDARD_DISCLAIMERS;
US : mph@inmos.com use STANDARD_DISCLAIMERS;
aglew@dwarfs.csg.uiuc.edu (Andy Glew) (02/02/90)
Terminology check: I have been informed that DEC VAX designers call the overlap case, the case where a data field overlaps two bus widths, non-aligned. -- Andy Glew, aglew@uiuc.edu
peter@ficc.uu.net (Peter da Silva) (02/02/90)
In article <3428@odin.SGI.COM> pkr@maddog.sgi.com (Phil Ronzone) writes:
> No, I disagree. Most of the time the data (mis)alignments are from real world
> constraint. Compressed video data, even when capacious CD-ROMs are used, are
> full of adjacent 1, 2, 3, and 4 byte integer

And adjacent 4-, 5-, 6-, and 12- bit integers as well. I've heard of bit
addressable memory, but outside of microcontrollers I've never actually
seen it.

> And generally you want to
> read and display that data as fast as is possible. Having a microcoded
> unaligned data capability is faster than user-level instructions doing the
> same thing.

If your loop is in an instruction cache you'll probably find that your actual
memory accesses are coming as fast as the bus can fill them. Cutting the size
of your loop will just add wait states.
--
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'
khb@chiba.kbierman@sun.com (Keith Bierman - SPD Advanced Languages) (02/02/90)
In article <1990Jan31.174500.10553@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >1) In the hardware, perhaps taking a performance hit on "misaligned" data. >2) In the compilers >3) Explicitly in the program. any of the above. (Sun-3 C does #1, Mips C does #2 on request, SPARC C does #3.) It does not *promise* anything but the lowest common denominator, however, to wit number 3. Sun's compilers have a -misalign option. -- Keith H. Bierman |*My thoughts are my own. !! kbierman@Eng.Sun.COM It's Not My Fault | MTS --Only my work belongs to Sun* kbierman%eng@sun.com I Voted for Bill & | Advanced Languages/Floating Point Group Opus | "When the going gets Weird .. the Weird turn PRO" "There is NO defense against the attack of the KILLER MICROS!" Eugene Brooks
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/02/90)
In article <4YG1638xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

| And adjacent 4-, 5-, 6-, and 12- bit integers as well. I've heard of bit
| addressable memory, but outside of microcontrollers I've never actually
| seen it.

  I *think* the Intel 432 has bit addressability. I don't have my manual
(yes I kept one) here, and I evaluated it about the time of first
engineering samples. It was ahead of its time.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Stupidity, like virtue, is its own reward" -me
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/03/90)
In article <4YG1638xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>And adjacent 4-, 5-, 6-, and 12- bit integers as well. I've heard of bit
>addressable memory, but outside of microcontrollers I've never actually
>seen it.

The CDC STAR and its relatives (Cyber 205, ETA 10) all have/had bit
addressable memory. I think it is a good idea.

On the subject of this discussion, those machines *still* required alignment
on natural boundaries. Bits on bits (easy :-) bytes on bytes, 32 bit on 32
bit, 64 bit on 64 bit, etc.

I note that the machine had 48 bit addresses, breaking the 32 bit addressing
boundary almost 20 years ago. Of course, the machine supported 64 bit
registers (both 32 and 64 actually).

On the other subject of this discussion, the System Programming Language for
those machines had a special construct to be used when you needed to create
packed structures. Other languages do too. It is not correct to assume that
you can create an arbitrary structure and expect that the compiler will map
it in a certain way in memory. You need a special construct to do that.
Thank goodness (almost) nobody builds 36 bit, 48 bit, and 60 bit machines
anymore: you might even be able to do it in a portable way.

  Hugh LaMaster, m/s 233-9,     UUCP ames!lamaster
  NASA Ames Research Center     ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone: (415)694-6117
carlw@mercury.sybase.com (carl weidling) (02/03/90)
The question is whether or not C's requirement to build structures with the
components in the order in which they were declared is a mistake or not.

In article <1990Jan29.173412.2859@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
< stuff deleted >
>The basic problem here is that the compiler cannot read minds, and the
>language does not provide a way to tell the compiler which of two
>interpretations is wanted. The two possibilities are "I want precise
>control of what goes into memory" and "I want these members but please
>pad as necessary to make accesses fast". Unfortunately, you can't just
>say "well, if I want padding I'll put it in myself", because many people
>want to write portable programs, and the padding requirements are *very*
>machine-specific. Precise control of memory layout is not necessary for
< rest of article deleted>

Reading this I got an idea which is a slight variation on the idea of a
pragma or directive in the language.
Why not have a PRE-processor directive that will re-arrange the fields in a
structure to maximize efficiency one way or the other? The C-language itself
is untouched, the programmer can run the pre-processor by itself on the code
to see what was done. Perhaps lint could be made smart enough to tell if
someone was playing too many games with one of these re-arranged structures.
Something like

    struct {
        int alpha;
    #ARRANGE_ANY_WAY_YOU_WANT  /* maybe specify criteria? i.e. speed vs compact */
        long beta;
        char gamma[3];
    #END_ARRANGE
    }

-Carl Weidling
pkr@maddog.sgi.com (Phil Ronzone) (02/03/90)
In article <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes: >>Having a microcoded unaligned data capability is faster than >>user-level instructions doing the same thing. > >Why? > >Microcoded unaligned data takes two cycles to load an unaligned datum. >(Assuming the unaligned datum overlaps two data bus widths.) MIPSco >style load-left and load-right take two cycles to load the same >unaligned datum. I was thinking of bus-wide words (i.e., typically 32-bits). You have at least: BUS FETCH / SHIFT ALIGN / BUS FETCH / SHIFT ALIGN / OR / STORE Implementing these at typical user level adds even more -- tests to figure out how much to shift etc. ------Me and my dyslexic keyboard---------------------------------------------- Phil Ronzone Manager Secure UNIX pkr@sgi.COM {decwrl,sun}!sgi!pkr Silicon Graphics, Inc. "I never vote, it only encourages 'em ..." -----In honor of Minas, no spell checker was run on this posting---------------
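The fetch / shift / fetch / shift / OR sequence listed above can be written out in C as follows. This is only a model: memory is represented as an array of aligned 32-bit words with big-endian byte numbering inside each word, and `load_unaligned` is an invented name. It is not a description of how any particular microcode or the MIPS load-left/load-right instructions are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory is modelled as an array of aligned 32-bit words, with byte 0
   of each word being its most significant byte (an assumption of this
   sketch). byte_addr is the address of the first byte wanted. */
uint32_t load_unaligned(const uint32_t *mem, size_t byte_addr)
{
    size_t   word  = byte_addr / 4;
    unsigned shift = (unsigned)(byte_addr % 4) * 8;
    uint32_t hi, lo;

    assert(mem != NULL);
    hi = mem[word];                  /* first bus fetch */
    if (shift == 0)
        return hi;                   /* aligned: one fetch is enough */
    lo = mem[word + 1];              /* second bus fetch */

    /* Shift each half into place and OR them together. */
    return (hi << shift) | (lo >> (32 - shift));
}
```

The test for `shift == 0` is the kind of extra work the post attributes to a user-level implementation; it is also needed here to avoid a 32-bit shift by 32, which C leaves undefined.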
rpeglar@csinc.UUCP (Rob Peglar) (02/03/90)
In article <4YG1638xds13@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> In article <3428@odin.SGI.COM> pkr@maddog.sgi.com (Phil Ronzone) writes:
> > No, I disagree. Most of the time the data (mis)alignments are from real world
> > constraint. Compressed video data, even when capacious CD-ROMs are used, are
> > full of adjacent 1, 2, 3, and 4 byte integer
>
> And adjacent 4-, 5-, 6-, and 12- bit integers as well. I've heard of bit
> addressable memory, but outside of microcontrollers I've never actually
> seen it.
>

Actually, since the days of the CDC Star-100, that particular line of
supercomputers (Star-Cy 203-Cy 205-ETA10) supported bit-addressable memory.
This was important for things like vector bit string operations on
arbitrarily aligned operands. Such things (bit vector C <- bit vector A &&
bit vector B) were in microcode.

Just thought you'd like to know.

Rob
--
Rob Peglar    Control Systems, Inc.    2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar    612-631-7800
The posting above does not necessarily represent the policies of my employer.
tihor@acf4.NYU.EDU (Stephen Tihor) (02/03/90)
Now add repr clauses to the RECORD a la Ada, in a terse C syntax of course,
to maintain consistency and maximize errors:

    record
        long a :13,35;
        ...

where a is a long integer stored in bits 13 through 35 of the record structure.
shap@delrey.sgi.com (Jonathan Shapiro) (02/04/90)
In article <8314@sybase.sybase.com> carlw@mercury.UUCP (carl weidling) writes: > Why not have a PRE-processor directive that will re-arrange the >fields in a structure to maximize efficiency one way or the other? Yuck. If this problem is worth solving, it is worth solving right. Jon
aglew@dwarfs.csg.uiuc.edu (Andy Glew) (02/05/90)
>>>Having a microcoded unaligned data capability is faster than >>>user-level instructions doing the same thing. >> >>Microcoded unaligned data takes two cycles to load an unaligned datum. >>(Assuming the unaligned datum overlaps two data bus widths.) MIPSco >>style load-left and load-right take two cycles to load the same >>unaligned datum. > >I was thinking of bus-wide words (i.e., typically 32-bits). >You have at least: > BUS FETCH / SHIFT ALIGN / BUS FETCH / SHIFT ALIGN / OR / STORE >Implementing these at typical user level adds even more -- tests to >figure out how much to shift etc. LWL/LWR seem to work this way (and the MIPSco folk will correct me, I'm sure): BUS FETCH / SHIFT ALIGN / STORE selected bytes - LWL BUS FETCH / SHIFT ALIGN / STORE selected bytes - LWR Two instructions. Cycles depending on mermory. -- Andy Glew, aglew@uiuc.edu
henry@utzoo.uucp (Henry Spencer) (02/06/90)
In article <12780024@acf4.NYU.EDU> tihor@acf4.NYU.EDU (Stephen Tihor) writes:
>        long a :13,35;
>        ...
>where a is a long integer stored in bits 13 through 35...

Bit 13, you say? Is that the 13th (or 14th) bit from the left, or
the 13th (or 14th) bit from the right? And if it's from the right,
how big is the unit of bits, i.e. how far is bit 0 from the leftmost bit?
--
SVR4: every feature you ever  |     Henry Spencer at U of Toronto Zoology
wanted, and plenty you didn't.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
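The ambiguity raised above can be made concrete. The sketch below extracts "bits 13 through 35" under both numbering conventions; the 64-bit unit size and the function names are assumptions chosen for illustration, since the proposed syntax specifies neither.

```c
#include <assert.h>
#include <stdint.h>

/* Bits first..last of a 64-bit unit, counting bit 0 as the LEAST
   significant bit. The result is right-justified. */
uint64_t field_from_right(uint64_t unit, unsigned first, unsigned last)
{
    unsigned width;

    assert(first <= last && last < 64);
    width = last - first + 1;
    unit >>= first;
    return width == 64 ? unit : unit & ((UINT64_C(1) << width) - 1);
}

/* The same range, counting bit 0 as the MOST significant bit. Note
   that the answer depends on the unit being 64 bits wide. */
uint64_t field_from_left(uint64_t unit, unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    return field_from_right(unit, 63 - last, 63 - first);
}
```

The same word yields different fields under the two conventions, and the left-counting version changes again if the unit is 32 or 36 bits, so a layout notation has to state both the direction and the unit size.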
carr@gandalf.UUCP (Dave Carr) (02/06/90)
In article <11666@thorin.cs.unc.edu>, tuck@jason.cs.unc.edu (Russ Tuck) writes: > > If the compiler did what you suggest and did not align struct members, > it would in most cases be impossible to access the data member "c" above > without causing the program to dump core. This would not be a useful > compiler "feature" :-). SPARC (and most other RISC archs) requires all > ordinary memory accesses to be aligned. That's *most* RISC architecture. At least with the 80960 (I know, not a true RISC), I have the freedom to access non word aligned data. I would rather have the choice than let the RISC architecture force me. Data explosion on RISC computers is pretty bad. We should have the choice between slowing the CPU down only for those accesses which are not word aligned. We could pad the structures to speed it back up. -- Dave Carr | carr@e.gandalf.ca | If you don't know where Gandalf Data Limited | TEL (613) 723-6500 | you are going, you will Nepean, Ontario, Canada | FAX (613) 226-1717 | never get there.
ccc_ldo@waikato.ac.nz (02/07/90)
In <LJ81OX3ggpc2@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) deplores the fact that VAX and 680n0 (n > 1) processors allow word and longword accesses on arbitrary byte boundaries, saying that "since non-word-aligned values do cost extra cycles to access, any 68020 C compiler that didn't pad that structure is broken." GAK! I can't believe I'm reading this! He goes on to say: "Some 'features' of CISC processors are just too expensive to use." So what's the alternative? Have you tried counting how many extra cycles you spend doing explicit byte-by-byte accesses? The thing is, as a programmer, I want to be able to make the tradeoff (space versus time) myself. Sure byte-packed structures will cost more memory cycles to access, and bit-packed structures even more so. But there are times when I'm willing to pay the cost--think of an array of a hundred thousand 3-byte records, or a million boolean elements. I heartily *RESENT* CPU designers who take it upon themselves to say "Nope--that alternative costs a few too many memory cycles for us to be comfortable with, so we'll leave out the hardware support for it, and force you to do it in software should you feel the urge, which will make it even *MORE* expensive." Even Motorola aren't completely free of this disease. Look at the 68881/68882 floating-point units, which put an extra 16 padding bits into every extended- precision quantity, just so they take up an even number of longwords. I don't like compilers which automatically insert padding bits and bytes between elements of arrays and structures--which is to say, most of them. My argument is based on a very simple principle: "correctness comes before efficiency". Pad fields can cause all kinds of problems, not just when different compilers follow different rules in inserting them: think of what happens when you're trying to compare two objects which happen to differ only in the random garbage in the pad fields. 
I'd rather have a compiler which allowed me to specify explicit alignment constraints on element types (with the default being byte alignment for everything), and which reported errors with element offsets that didn't match their alignment constraints--that is, I was forced to put in padding fields myself (so that I knew where they were and how big they were), rather than having the compiler do it for me. In conclusion, I'll say it again. *I'M* the programmer, *I'M* the only one who knows what the performance requirements of my program are (including its memory usage), let *ME* make the tradeoff decisions. Lawrence D'Oliveiro Computer Services Dept, University of Waikato Hamilton, New Zealand ldo@waikato.ac.nz
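Standard C has no switch for "report an error instead of padding", but C11 static assertions can approximate what is asked for above: the programmer writes the pad field explicitly and the build fails if the compiler's layout disagrees. This is a sketch under assumptions: `struct rec`, `rec_init` and `rec_equal` are invented names, and the offsets assume 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record with its padding written out by hand. */
struct rec {
    int32_t a;
    int16_t b;
    int16_t pad;    /* explicit, so we know where it is and can zero it */
    int32_t c;
};

/* Fail the build if the compiler's layout disagrees with ours (C11). */
_Static_assert(offsetof(struct rec, b)   == 4,  "b misplaced");
_Static_assert(offsetof(struct rec, pad) == 6,  "pad misplaced");
_Static_assert(offsetof(struct rec, c)   == 8,  "c misplaced");
_Static_assert(sizeof(struct rec)        == 12, "unexpected hidden padding");

void rec_init(struct rec *r, int32_t a, int16_t b, int32_t c)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);    /* clears pad as well */
    r->a = a;
    r->b = b;
    r->c = c;
}

/* Compare only the meaningful members; a bytewise comparison would
   also compare whatever happens to be in the pad field. */
int rec_equal(const struct rec *x, const struct rec *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}
```

Two records that agree in every real member compare equal through `rec_equal` even when their pad fields differ, while `memcmp` over the same two objects reports a difference, which is the comparison hazard the post describes.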
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (02/08/90)
In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> >Worse, since non-word-aligned values do cost extra cycles to access, any
> >68020 C compiler that didn't pad that structure is broken.
>
> This is nonsense. Which you want depends whether speed or size is more
> important. A valid criticism would be that too many C compilers don't let
> you specify which kind of optimisation you want.
>

This discussion, IMHO, is pointless. The C compilers work just fine the way
they are (or at least the ones I am familiar with). I don't think some of
the people discussing this realize the implications of what they propose.

I work on an Intergraph Clipper based workstation. Unless I am mistaken,
floating point values can only be aligned on 8 byte boundaries if the
processor is to be able to access them in a single instruction. If you
try to access a floating point value that is not 8 byte aligned, it
actually grabs the value at the next lowest 8 byte boundary. It doesn't
even give a bus error trap! In theory, the compiler could place it on
arbitrary boundaries by generating a sequence of instructions that would
read adjacent values and AND and OR the values into memory. It sounds to
me that we are talking about 4 or 5 instructions to do this, so your access
speed would be the pits!

The reason people seem to want to be able to store values at arbitrary
locations seems to have to do with the need to write out contiguous regions
of memory to a binary file. They then complain that reading that file back
into the memory of another machine doesn't work. No one ever said it would.
If you want portable code, don't write it that way. It is almost always
possible to sacrifice portability for speed.

I don't know why this is so astonishing; you can't write out binary
values for integers between machines, so what would lead anyone to believe
that structures should be any different?
C is a low level language. If you want greater data abstraction, move to a higher level language that guarantees that data will appear to be in the same format across systems. That guarantee is not in the C definition; providing it would probably limit C's ability to blast bits. The only format that C guarantees to understand is ASCII-represented numeric values. The only thread of this discussion that might relate to comp.arch is why processors (such as Clipper) do not give a trap if you try to access memory on illegal boundaries. Surely that would not require much silicon?
-- 
Terry Ingoldsby ctycal!ingoldsb@calgary.UUCP Land Information Systems or The City of Calgary ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
cik@l.cc.purdue.edu (Herman Rubin) (02/11/90)
In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: > In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes: > > In article <LJ81OX3ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: ...................... > I don't know why this is so astonishing; you can't write out binary > values for integers between machines, what would lead anyone to believe > that structures should be any different. I can see no more reason why strings of ASCII characters should be transferrable by hardware with little software intervention than binary integers, other fixed place binary numbers, other types of numbers (not strings of numerals), mathematical symbols beyond the usual ones, etc. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
woody@rpp386.cactus.org (Woodrow Baker) (02/11/90)
In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
> In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> This discussion, IMHO, is pointless. The C compilers work just fine the way
> are (or at least the ones I am familiar with). I don't think some of the
> people discussing this realize the implications of what they propose.
Wrong. It depends on what you do. I happen to do programming dealing with industrial controllers. Specifically, I maintain a compiler, editor downloader, and monitor package used to program Eagle Signal Controls EPTAK series industrial controllers. The code that I work on runs under MS-DOS. I have to do things like reach out over the network, and read data structures out of the remote controllers. These structures, for the most part, are a mix of byte and word fields. I then have to parse through them, and isolate the parts. Structures are the obvious way to do this. BUT, the @#$% compiler chooses to pad byte or char values out to ints. This obviously screws up the data structure access to the retrieved values. I have wound up doing things that I am not proud of, like unions, and monkeying around with pointers to the structures such that they don't point to where they should, but to some offset other than the first byte of the structure, etc. Yes, I could choose to use an array, but it is clearer to use standard field names (at least standard for the EPTAK controllers) to access these data fields. Cheers Woody
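Whether a given compiler has padded a struct like the ones described above can be checked rather than guessed. The following is a minimal sketch, assuming a hypothetical record `wire_record` (one byte field followed by a 16-bit field, not the actual EPTAK layout); `padding_after_flags` and `layout_matches_wire` are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one byte field followed by a 16-bit field. */
struct wire_record {
    unsigned char  flags;
    unsigned short value;
};

/* Bytes of padding the compiler inserted between flags and value. */
size_t padding_after_flags(void)
{
    /* value can never start before the end of flags */
    assert(offsetof(struct wire_record, value) >= sizeof(unsigned char));
    return offsetof(struct wire_record, value)
         - (offsetof(struct wire_record, flags) + sizeof(unsigned char));
}

/* Nonzero only when the in-memory layout is exactly the 3-byte wire image. */
int layout_matches_wire(void)
{
    return sizeof(struct wire_record) == 3 && padding_after_flags() == 0;
}
```

If `layout_matches_wire()` returns 0, overlaying the struct on received bytes will read the wrong locations, and explicit conversion functions are the safer route.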
peter@ficc.uu.net (Peter da Silva) (02/11/90)
Use structs internally.

Provide functions to read and write each structure, that do the needed conversions. Never touch the external format internally. For example:

Analog accumulator:

    | flags  | val.lo   val.hi |
    +--------+--------+--------+
    | BYTE 0 | BYTE 1 | BYTE 2 |

struct accumulator {
    char flags;
    int value;
};

read_accumulator(addr, info)
char *addr;
struct accumulator *info;
{
    info->flags = addr[0];
    /* mask each byte so a signed char cannot sign-extend into the result */
    info->value = addr[2] & 0xFF;
    info->value = (info->value << 8) | (addr[1] & 0xFF);
}

write_accumulator(addr, info)
char *addr;
struct accumulator *info;
{
    *addr++ = info->flags;
    *addr++ = info->value & 0xFF;       /* val.lo */
    *addr = (info->value >> 8) & 0xFF;  /* val.hi */
}
-- 
_--_|\ Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>. / \ \_.--._/ Xenix Support -- it's not just a job, it's an adventure! v "Have you hugged your wolf today?" `-_-'
ronald@robobar.co.uk (Ronald S H Khoo) (02/12/90)
In article <17906@rpp386.cactus.org> woody@rpp386.cactus.org (Woodrow Baker) writes: > > MS-DOS. I have to do things like reach out over the network, and read > data structures out of the remote controllers. These structures for the > most part, are a mix of byte and word fields. I then have to parse through > them, and isolate the parts. Structures are the obvious way to do this. > BUT, the @#$% compiler choses to pad byte or char values out to ints. #ifdef MEDIUM_MADRAS You don't think this is a hint that it would have been *so* much easier if everything spoke *text* instead. Sure, there's the overhead of binary->text->binary, but the advantages outweigh the cost, especially if you ever have a mix of controllers with wildly differing internal architectures. Oh, you want to discourage that to lock your customers in? Excuse me. #endif -- Eunet: Ronald.Khoo@robobar.Co.Uk Phone: +44 1 991 1142 Fax: +44 1 998 8343 Paper: Robobar Ltd. 22 Wadsworth Road, Perivale, Middx., UB6 7JD ENGLAND. $Header: /usr/ronald/.signature,v 1.2 90/01/26 15:17:15 ronald Exp $ :-)
msb@sq.sq.com (Mark Brader) (02/13/90)
> I can see no more reason why strings of ASCII characters should be > transferrable by hardware with little software intervention than binary > integers, other fixed place binary numbers, other types of numbers ...etc. Because ASCII is, after all, the American Standard Code for Information Interchange, and those other things aren't. See signature quote. Followups to comp.arch. -- Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com A standard is established on sure bases, not capriciously but with the surety of something intentional and of a logic controlled by analysis and experiment. ... A standard is necessary for order in human effort. -- Le Corbusier This article is in the public domain.
cik@l.cc.purdue.edu (Herman Rubin) (02/13/90)
In article <S_O1_F6xds13@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes: > Use structs internally. > > Provide functions to read and write each structure, that do the needed > conversions. Never touch the external format internally. [Example deleted.] This is another situation where the procedure is extremely slow in software. If the appropriate hardware were provided, this would not be a problem. But would the machine then be RISC? -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
peter@ficc.uu.net (Peter da Silva) (02/14/90)
In article <1925@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> This is another situation where the procedure is extremely slow in software.
> If the appropriate hardware were provided, this would not be a problem. But
> would the machine then be RISC?
Who cares if it's RISC, CISC, VLIW, or a bunch of elves with abaci? If it's fast enough, fine. If it's not, unroll the loop to the LCM (least common multiple) of the struct size and the data size. If that doesn't do it, recode in assembler. Then get a faster machine (where faster is defined in terms of the problem you have to solve: if the problem involves moving weird numbers of bits around, all the byte ops in the world won't help you). Maybe a coprocessor would help (like having a disk controller to convert NRZ into MFM instead of doing it yourself). Most of the time this particular operation isn't a bottleneck, so who cares how fast it is?
-- 
_--_|\ Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>. / \ \_.--._/ Xenix Support -- it's not just a job, it's an adventure! v "Have you hugged your wolf today?" `-_-'
pasek@ncrcce.StPaul.NCR.COM (Michael A. Pasek) (02/14/90)
In <17906@rpp386.cactus.org> woody@rpp386.cactus.org (Woodrow Baker) writes: >In <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: >> This discussion, IMHO, is pointless. The C compilers work just fine the way >> are (or at least the ones I am familiar with). I don't think some of the >> people discussing this realize the implications of what they propose. >Wrong. It depends on what you do. [specifics deleted..] > I have to do things like reach out over the network, and read >data structures out of the remote controllers. These structures for the >most part, are a mix of byte and word fields. I then have to parse through >them, and isolate the parts. Structures are the obvious way to do this. >BUT, the @#$% compiler choses to pad byte or char values out to ints. I also have the same problem. Having the compiler pad to the "native" data size is OK if (and ONLY if) you have complete control over that data structure and do not need to share it with other programs/systems. However, in data communications protocols (pick one), the programmer has NO control over the data structure -- it is predefined, and doesn't come with that nice padding that the compiler likes to put in. Some recent RISC compilers (I'm looking at the 29K) allow you to specify whether structures are "packed" or not, which I think is mandatory. Unfortunately, in the case of the 29K compiler, although it will "pack" structures, as far as I know it will NOT generate the appropriate instructions to access those structures if the external memory subsystem does NOT support non-aligned accesses. Oh, well.... M. A. Pasek Software Development NCR Comten, Inc. (612) 638-7668 MNI Development 2700 N. Snelling Ave. pasek@c10sd3.StPaul.NCR.COM Roseville, MN 55113
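Packing is a compiler extension, not standard C, so any example is tied to one toolchain. The sketch below uses the GCC/Clang `__attribute__((packed))` spelling as a stand-in; it is not the 29K compiler's syntax, which is not given in this thread. The struct `wire_msg` mirrors the long/short/long layout from the start of the discussion using fixed-width types, and `wire_msg_from_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension (an assumption about the toolchain):
   lay the members out with no padding between them. */
struct __attribute__((packed)) wire_msg {
    uint32_t a;
    uint16_t b;
    uint32_t c;
};

/* Fill the packed struct from a 10-byte wire image by copying,
   so the source buffer needs no particular alignment. */
struct wire_msg wire_msg_from_bytes(const unsigned char *bytes)
{
    struct wire_msg m;
    assert(bytes != NULL);
    memcpy(&m, bytes, sizeof m);
    return m;
}
```

Packing removes the holes but does nothing about byte order, and on machines that require alignment the compiler has to emit slower code for every access to the misaligned members.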
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (02/15/90)
In article <17906@rpp386.cactus.org>, woody@rpp386.cactus.org (Woodrow Baker) writes: > In article <328@ctycal.UUCP>, ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes: > > In article <1648@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes: > > This discussion, IMHO, is pointless. The C compilers work just fine the way > > are (or at least the ones I am familiar with). I don't think some of the > > people discussing this realize the implications of what they propose. > Wrong. It depends on what you do. I happen to do programming dealing > with industrial controllers. Specificaly, I maintain a compiler, editor > downloader, and monitor package used to program Eagle Signal Controls > EPTAK series industrial controllers. The code that I work on runs under > MS-DOS. I have to do things like reach out over the network, and read > data structures out of the remote controllers. These structures for the > most part, are a mix of byte and word fields. I then have to parse through > them, and isolate the parts. Structures are the obvious way to do this. > BUT, the @#$% compiler choses to pad byte or char values out to ints. If you are passing values across a network to dissimilar machines, you should be using something like the XDR (External Data Representation). This makes for portable (although messy) code. In your case, I would agree that your compiler might reasonably be considered to be malfunctioning, since the Intel processors can access arbitrarily aligned data. The discussion originally discussed RISC processors which can NOT access arbitrary alignments for all data types. In this case padding is necessary. To minimize the amount of padding, it is necessary to reorder the structure elements. This is in accordance with K&R which (as I recall) explicitly states that the elements may be re-ordered. I re-iterate my original claim; it is not the compilers that are causing the problems (your case excepted). 
Rather, it is the fact that different processors have different access requirements for data types. Even if you wrote your programs in RISC assembler (a horrible thought), you could not align your variables arbitrarily. You would be forced to make the same decisions/tradeoffs that the compilers make.
-- 
Terry Ingoldsby ctycal!ingoldsb@calgary.UUCP Land Information Systems or The City of Calgary ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
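On the reordering point above: the C standard lays struct members out at increasing addresses in declaration order, so reordering to reduce padding is something the programmer does in the source. A minimal sketch, assuming two hypothetical structs `scattered` and `grouped` with the same members in different orders:

```c
#include <assert.h>
#include <stddef.h>

/* Members in an order that tends to force padding on aligned machines. */
struct scattered {
    char   tag;
    double weight;
    char   flag;
    long   count;
};

/* The same members, reordered by hand with the widest types first. */
struct grouped {
    double weight;
    long   count;
    char   tag;
    char   flag;
};

/* Bytes saved by the hand-reordered declaration on this compiler. */
size_t bytes_saved_by_grouping(void)
{
    /* Holds on the common ABIs; the standard itself does not promise it. */
    assert(sizeof(struct scattered) >= sizeof(struct grouped));
    return sizeof(struct scattered) - sizeof(struct grouped);
}
```

The saving depends entirely on the target's alignment rules; on a machine with no alignment requirements it may be zero.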
martin@mwtech.UUCP (Martin Weitzel) (02/21/90)
There were some recent postings that pointed out/complained about 'holes' in C-struct definitions. I hope it is to the benefit of some readers to explain an alternate point of view of C-struct-s and to give some advice on how to access a certain byte-layout in memory in a portable (nevertheless painless) way, which avoids struct-s completely. Because the latter may be of more interest, I'll come to it first.

Suppose you have some library function 'getmsg' to which you supply the address of a buffer, and when the function returns it has filled the buffer with the following information:

      2 Byte Integer - length of message
      1 Byte         - several flag bits
      1 Byte         - type of message
      4 Byte Integer - checksum
    100 Byte         - arbitrary message

Many C-Programmers now think about defining the following

    struct m {
        short m_length;
        unsigned char m_flags;
        char m_type;
        unsigned long m_checksum;
        char m_bytes[100];
    } buffer;

so that after a 'getmsg(&buffer)' they can access the individual parts 'by name', eg: buffer.m_length, buffer.m_flags, .... ... and as the previous posters pointed out, they eventually get trapped by the 'holes' inserted into the struct by the compiler for the sake of efficiency. My advice in this situation is to change this code as follows:

    char buffer[
          2   /* length of message */
        + 1   /* several flag bits */
        + 1   /* type of message */
        + 4   /* checksum */
        + 100 /* arbitrary message */
    ];

    /* the offset is added to the char pointer before the cast, so it counts bytes */
    #define m_length(b)   (*(short *)         ((char *)(b) + 0))
    #define m_flags(b)    (*(unsigned char *) ((char *)(b) + 2))
    #define m_type(b)     (*(char *)          ((char *)(b) + 3))
    #define m_checksum(b) (*(unsigned long *) ((char *)(b) + 4))
    #define m_bytes(b)    (                    (char *)(b) + 8 )

(I inserted some white space for readability.) The least you must know of your compiler in that case is that a 'char' occupies exactly one byte in an 'array of char'. But as before, you can access the individual parts 'by name' as follows: m_length(buffer), m_flags(buffer), ....
If 'getmsg' is always supplied with the same buffer, you could make it even simpler by avoiding parametrized macros and using

    #define m_length (*(short *)buffer)
    #define m_flags  (*(unsigned char *)(buffer + 2))
    ......

Note that the above expressions are also 'lvalues', ie you can use them on the left side of an assignment. There remains only the minor problem that 'buffer' must be properly aligned. (Techniques for achieving this are shown in K&R - you simply have to define buffer as a union with the type of desired alignment. Alternatively you may allocate the buffer with 'malloc'.)

If your concern is only 'reading' the elements out of the buffer, you have the additional benefit that you can transparently compensate for possible 'byte-order' problems. Suppose the message is produced by some piece of hardware that assumes the LSB of a 16 Bit Integer on the lower address, and you want to move this hardware to a system where the CPU takes just the opposite view. All you have to change is:

    #define m_length ((short)\
        (((*(unsigned char *)(buffer+1))<<8)\
        |(*(unsigned char *)buffer)))
    .......

(Hope I missed no brackets ... :-))

Now back to an alternate view of the C-struct-s; hit 'n' if you are no longer interested. IMHO many features of the C language can elegantly be explained in an easy way if you 'translate' the feature to the 'machine level'. (Eg I explain much about pointers and arrays to my classes by sketching pictures with the contents of the data segment.) One thing that is easy to misunderstand here is that such an explanation often describes only *one* possible approach to implementing the abstract concept: Though it seems natural to think about a C-struct as being a collection of individual variables located at increasing memory addresses in the order they are declared(%) as struct-components, it often makes more sense to see a C-struct only as a collection of data-items that are guaranteed *not* to overlap(%%).
Furthermore the compiler asserts that access to a named struct-component will always refer to the same part of memory, even if only the struct's address is the same (important when transferring struct-pointers as function parameters). The other guarantee, that the struct-components are located (more or less) adjacent in memory, is only of some 'practical' value, especially if you have an 'array of struct'-s or write one struct to a file (using write/fwrite together with sizeof), but has nothing to do with the abstract concept of a C-struct.

(%): Even the guarantee that the struct elements are at ascending addresses in the order they are declared was, IMHO, only given to avoid complex (and hard to understand) rules about when it would and would not be allowed to rearrange the elements. Readers who know other good reasons why this guarantee is given are welcome to correct me (hello Chris :-)).

(%%): Note that in the case of a C-union the guarantee is *not* that the elements overlap: They only *may* overlap (unless they are of the same type or they are different C-structs but with components of the same type at the beginning, which leads back to the problem of when rearranging could and could not have been allowed ... again, correct me if I'm wrong).
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
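The byte-order compensation described in this post can also be written as small functions that build each value from individual bytes, which avoids both the alignment and the byte-order dependence of pointer casts. This is a minimal sketch, assuming the 108-byte message layout given above and assuming that every multi-byte field (not only the length) is stored LSB first; the offset names and the `get_le16`, `get_le32`, and `put_le16` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the fields in the message layout described above. */
enum { M_LENGTH = 0, M_FLAGS = 2, M_TYPE = 3, M_CHECKSUM = 4, M_BYTES = 8 };

/* Read a 16-bit field stored LSB first (LSB at the lower address). */
unsigned int get_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Read a 32-bit field stored LSB first. */
unsigned long get_le32(const unsigned char *p)
{
    assert(p != NULL);
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Write a 16-bit field LSB first. */
void put_le16(unsigned char *p, unsigned int v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
}
```

A call such as `get_le16(buffer + M_LENGTH)` gives the same result on either kind of CPU, because only single bytes are ever loaded from the buffer.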
peter@ficc.uu.net (Peter da Silva) (02/23/90)
> (%): Even the guarantee, that the struct elements are at ascending
> adresses in the order they are declared, IMHO only was given
> to avoid complex (and hard to understand) rules, when and when
> not it would be allowed to rearrange the elements. Readers who
> know other good reasons why this guarantee is given are welcome
> to correct me (hello Chris :-)).
It makes the following two practices reasonably portable:

1:
    struct list_header {
        struct list_header *next, *prev;
    };

    struct object {
        struct list_header list;
        ...
    };

    struct list_header *my_list = NULL;
    struct object my_object;

    extern add_list(struct list_header **list, struct list_header *elt);

    /* valid because the list header is the first member of struct object */
    add_list(&my_list, (struct list_header *)&my_object);

2:
    struct buffer {
        int len;
        char *next;
        char data[1];
    };

    struct buffer *new_buffer(size)
    int size;
    {
        struct buffer *temp;

        temp = (struct buffer *) malloc(sizeof *temp + size);
        if(temp) {
            temp->len = size;
            temp->next = &temp->data[0];
        }
        return temp;
    }
-- 
_--_|\ Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>. / \ \_.--._/ Xenix Support -- it's not just a job, it's an adventure! v "Have you hugged your wolf today?" `-_-'
djones@megatest.UUCP (Dave Jones) (02/23/90)
From article <645@mwtech.UUCP), by martin@mwtech.UUCP (Martin Weitzel): ) There were some recent postings, that pointed out/complained about ) 'holes' in C-struct definitions. ... ) ) My advice in this situation is, to change this code as follows: ) ) char buffer[ ) 2 /* length of message */ ) + 1 /* several flag bits ) + 1 /* type of message */ ) + 4 /* checksum */ ) + 100 /* arbitrary message */ ) ]; ) ) #define m_length(b) (*((short *) (char *)(b) + 0)) ) #define m_flags(b) (*((unsigned char *)(char *)(b) + 2)) ) #define m_type(b) (*((char *) (char *)(b) + 3)) ) #define m_checksum(b) (*((unsigned long *)(char *)(b) + 4)) ) #define m_bytes(b) ( (char *)(b) + 8 ) ) There's probably going to be a flurry of replies telling you why this will not work in the general case. These casts from char* to this-or-that* are not going to work unless the data just happen to be properly aligned for whatever processor you happen to be using.
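One way around the alignment objection raised here is to copy the bytes into a properly aligned object instead of dereferencing a cast pointer. A minimal sketch using the standard `memcpy`; the helper names `load_ulong` and `store_ulong` are hypothetical, and the value is read in the host's own byte order, so this addresses alignment only.

```c
#include <assert.h>
#include <string.h>

/* Copy an unsigned long out of a byte buffer at any offset.
   memcpy has no alignment requirement on its arguments. */
unsigned long load_ulong(const unsigned char *buf, size_t offset)
{
    unsigned long v;
    assert(buf != NULL);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Copy an unsigned long into a byte buffer at any offset. */
void store_ulong(unsigned char *buf, size_t offset, unsigned long v)
{
    assert(buf != NULL);
    memcpy(buf + offset, &v, sizeof v);
}
```

Unlike the macros quoted above, these are not lvalues, so a write becomes a call to `store_ulong` rather than an assignment.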
martin@mwtech.UUCP (Martin Weitzel) (02/24/90)
In article <12118@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
}From article <645@mwtech.UUCP), by me (Martin Weitzel):
}) There were some recent postings, that pointed out/complained about
}) 'holes' in C-struct definitions.
}...
})
}) My advice in this situation is, to change this code as follows:
})
}) char buffer[
}) 2 /* length of message */
}) + 1 /* several flag bits
}) + 1 /* type of message */
}) + 4 /* checksum */
}) + 100 /* arbitrary message */
}) ];
})
}) #define m_length(b) (*((short *) (char *)(b) + 0))
}) #define m_flags(b) (*((unsigned char *)(char *)(b) + 2))
}) #define m_type(b) (*((char *) (char *)(b) + 3))
}) #define m_checksum(b) (*((unsigned long *)(char *)(b) + 4))
}) #define m_bytes(b) ( (char *)(b) + 8 )
})
}
}There's probably going to be a flurry of replies telling you why
}this will not work in the general case.
}
}These casts from char* to this-or-that* are not going to work
}unless the data just happen to be properly aligned for whatever
}processor you happen to be using.
I'm well aware that alignment restrictions may invalidate certain casts from one pointer type to another, but you must see my proposal in the context of the original questions: The posters generally complained that they were not able to overlay certain byte patterns in memory, because the C-struct they defined for that purpose contained holes (introduced by the compiler). The question of which hardware or software produced the byte patterns was never mentioned in these postings, but because the posters seemed to be sure that (only) the holes in the structures caused the problems, the parts must have been properly aligned already. If the parts of the byte patterns were not properly aligned, struct-s *without* holes could not have been used for this purpose either(%). So my proposal is not worse than a struct, but sometimes helps to get better control over which memory locations are accessed than struct-s can provide.
If it is only necessary to *read-access* the bytes in question, the approach described later in my original posting for getting 'wrong' byte order 'right' may also be used in the case of not properly aligned short-s, int-s or long-s.

(%) If a compiler which supports an option to pack structures does this *always tightly*, even on systems with specific alignment requirements for short-s, int-s and long-s, it may emit code to access the LSB/MSB individually and combine them in a register, but this would be such an extreme performance penalty that I guess such compilers are rare.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83