[net.lang.c] the one and only objection to C

bing@galbp.UUCP (Bing Bang) (12/21/84)

I love C. I think it's by far the best compiler based language that has ever
been produced.

I only have one minor complaint:
why in the world do most C compilers insist on padding structures?

I am currently working on a network driver that needs to handle a data
packet that has a precise structure to it. It's easy to describe the
structure in C, but if the compiler puts in padding between fields, I can't
simply read in a packet on top of a structure. I must instead "jump" over
the padding bytes both going and coming.

"No, you stupid computer, do what I mean, not what I type!"
...akgua!galbp!bing

graham@orca.UUCP (Graham Bromley) (12/23/84)

> I love C. I think it's by far the best compiler based language 
> that has ever been produced.  I only have one minor complaint:
> why in the world do most C compilers insist on padding structures?
> I am currently working on a network driver that needs to handle a data
> packet that has a precise structure to it. It's easy to describe the
> structure in C, but if the compiler puts in padding between fields, 
> I can't simply read in a packet on top of a structure. I must 
> instead "jump" over the padding bytes both going and coming.

I agree with your appreciation of C. No ka oi. About your structure
problem, there's a reason for it. For example, floats have to be
32-bit word aligned on a VAX for use by float instructions.
Also I think a PDP-11 requires operands of a mov (as opposed to
movb) instruction to be 16-bit word aligned, i.e. an int would
have to be 16-bit word aligned. If the compiler didn't do this,
before you could use such a structure element you would have to:

1.   Copy it as a byte string (using casts) into a variable of the 
     correct type.
2.   Do the operation on that variable.
3.   Byte copy it back into the structure.
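A minimal C sketch of those three steps, using the standard memcpy for the byte copies. It assumes a hypothetical packet whose 32-bit counter sits at byte offset 1, in the host's byte order; COUNTER_OFFSET and bump_counter are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 32-bit counter stored, unaligned, at byte
   offset 1 of a raw packet buffer, in the host's own byte order. */
#define COUNTER_OFFSET 1

/* Add 'delta' to the unaligned counter without ever forming a
   misaligned uint32_t pointer. */
void bump_counter(unsigned char *pkt, uint32_t delta)
{
    uint32_t value;

    assert(pkt != NULL);

    /* 1. Copy the bytes into a properly aligned variable of the right type. */
    memcpy(&value, pkt + COUNTER_OFFSET, sizeof value);

    /* 2. Do the operation on that variable. */
    value += delta;

    /* 3. Copy the result back into the packet. */
    memcpy(pkt + COUNTER_OFFSET, &value, sizeof value);
}
```

Because the variable 'value' is an ordinary auto uint32_t, the compiler aligns it properly, and only the byte copies touch the misaligned location.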

i.e. you could use the structure to store data but couldn't do
anything with it.  However, you will have to do this if you really
want to be sure your structure is 'packed'. It should be an easy matter
to load and unload your structure, by copying each element as a
byte string using sizeof to get the appropriate number of bytes.
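The load/unload approach might look like this in C. A sketch, assuming a hypothetical three-field record whose external form is simply the members laid end to end with no padding, in host byte order; struct rec, rec_load and rec_unload are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: externally the three fields are adjacent
   (2 + 4 + 2 bytes, no padding), in host byte order. */
struct rec {
    uint16_t kind;
    uint32_t length;
    uint16_t flags;
};

#define REC_WIRE_SIZE (sizeof(uint16_t) + sizeof(uint32_t) + sizeof(uint16_t))

static_assert(REC_WIRE_SIZE == 8, "expects 8-bit bytes");

/* Unload: copy each member out as a byte string, back to back. */
void rec_unload(const struct rec *r, unsigned char *buf)
{
    unsigned char *p = buf;
    memcpy(p, &r->kind, sizeof r->kind);     p += sizeof r->kind;
    memcpy(p, &r->length, sizeof r->length); p += sizeof r->length;
    memcpy(p, &r->flags, sizeof r->flags);
}

/* Load: the reverse copy, member by member. */
void rec_load(struct rec *r, const unsigned char *buf)
{
    const unsigned char *p = buf;
    memcpy(&r->kind, p, sizeof r->kind);     p += sizeof r->kind;
    memcpy(&r->length, p, sizeof r->length); p += sizeof r->length;
    memcpy(&r->flags, p, sizeof r->flags);
}
```

Because each member is copied with its own sizeof, the in-memory struct can keep whatever padding the compiler chooses; only the 8-byte buffer has to match the external layout.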

     gbgb aka the longjmp artist

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/26/84)

> why in the world do most C compilers insist on padding structures?

On some architectures, there are alignment restrictions such that data
of a particular type must be stored at an address that is a multiple
of some number of bytes, in order to be directly accessible; e.g., on a
PDP-11 all non-char data must be stored at an even-byte address.  The C
run-time environment designer takes these and other things into account
when deciding how to align structure members.
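One way to see the alignment decisions a particular implementation made is to ask it, using offsetof and (in C11 and later) _Alignof. A sketch using the 16/32/16-bit layout that comes up later in this thread; the numbers printed depend on the compiler and machine, which is exactly the point. show_layout is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A 16-bit field, a 32-bit field, then another 16-bit field. */
struct hdr {
    uint16_t a;
    uint32_t b;
    uint16_t c;
};

/* The first member is always at offset 0; everything after it is up
   to the implementation, subject to each type's alignment. */
static_assert(offsetof(struct hdr, a) == 0, "first member at offset 0");

/* Print where the compiler actually put each member. On a machine
   where uint32_t needs 4-byte alignment this typically shows offsets
   0, 4, 8 and a size of 12, i.e. two bytes of padding after 'a' and
   two more at the end. */
void show_layout(void)
{
    printf("alignof(uint32_t) = %zu\n", _Alignof(uint32_t));
    printf("offsets: a=%zu b=%zu c=%zu, sizeof = %zu\n",
           offsetof(struct hdr, a), offsetof(struct hdr, b),
           offsetof(struct hdr, c), sizeof(struct hdr));
}
```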

On a fully byte-addressable machine, no struct padding should be used.

On the VAX, a silly decision was made.  In spite of the fact that there
are no alignment constraints imposed by the architecture, the VAX PCC
implementors nevertheless chose to align data in order to take into
account the way the 11/780 cache was designed.  The idea was to squeeze
a small speed improvement out of the VAX-11/780.  I wonder if the scheme
continued to accomplish its goal on other VAX models?

jc@mit-athena.ARPA (John Chambers) (12/26/84)

Yet another objection to alignment.  I have had the fun (:-) of
porting code that had to deal with "structured" files.  In particular,
I have brought up several C compilers as cross compilers.  Alignment
is the biggest single pain in doing this.

Suppose you're trying to read a file which starts with a 16-bit field,
followed by a 32-bit field, then another 16-bit field.  Or suppose
you're trying to handle an archive file, which has a 14-char array
followed by an int of some sort.  The turkey compiler on your 32-bit
machine insists on aligning things "properly" on 4-byte boundaries.
(This would be fine if it would also transform all the files involved,
but it isn't that smart!)

So what do you do?  You dig in to the code, changing the 32-bit fields
to pairs of 16-bit fields.  Then, wherever the fields were mentioned,
you modify the code to extract the two little fields and combine them.
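The workaround described above, sketched in C under two assumptions: the struct and field names are hypothetical, and the file is taken to store the high 16-bit half of the 32-bit value first.

```c
#include <assert.h>
#include <stdint.h>

/* The on-disk header has a 16-bit field, a 32-bit field and another
   16-bit field. Declaring the 32-bit field as two 16-bit halves keeps
   every member on a 2-byte boundary, so no 4-byte alignment hole
   appears in front of it. */
struct disk_hdr {
    uint16_t magic;
    uint16_t size_hi;   /* high half of the 32-bit size (assumed stored first) */
    uint16_t size_lo;   /* low half */
    uint16_t mode;
};

/* Checks the no-holes assumption on the compiler at hand. */
static_assert(sizeof(struct disk_hdr) == 4 * sizeof(uint16_t),
              "unexpected padding in disk_hdr");

/* Every place the old 32-bit field was used must now recombine it... */
uint32_t hdr_size(const struct disk_hdr *h)
{
    return ((uint32_t)h->size_hi << 16) | h->size_lo;
}

/* ...and every store must split it again. */
void set_hdr_size(struct disk_hdr *h, uint32_t size)
{
    h->size_hi = (uint16_t)(size >> 16);
    h->size_lo = (uint16_t)(size & 0xFFFFu);
}
```

The accessor pair is the "royal pain" in miniature: the struct now matches the file, but every use site has to go through the shifting and masking.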

This is a royal pain in the *ss!!!

The trouble is that if you are handed data that is misaligned, someone
has to do the dirty work of unpacking, shifting, masking, etc.  Why 
does it have to be me?  That's why compilers were invented.  I don't
particularly care if the poor little compiler has to do some extra work.

Arguing that the code is inefficient doesn't impress me.  The "efficient"
code doesn't work right.  It doesn't read the data that's there.  Granted,
the data should have been better formatted.  If I had written the programs
to create the data, everything would have been nicely aligned.  That's not 
much consolation when I'm stuck with writing the program to try to read 
someone else's funny file formats.

It would be much nicer if C compilers would accept the fact that some
fields are misaligned and generate the code to handle it.  This would
save me a lot of work, and isn't that why high-level languages exist?
Or am I being overly naive to think such thoughts?

Of course, it would be nice if I could get warned about such inefficiencies;
preferably by lint, though, and not by the compiler.

				John Chambers (mit-athena)

david@ukma.UUCP (David Herron, NPR Lover) (12/27/84)

> From: bing@galbp.UUCP (Bing Bang)
> Newsgroups: net.lang.c
> Subject: The one and only objection to C
> Message-ID: <69@galbp.UUCP>
> Date: Thu, 20-Dec-84 17:17:35 EST

> I love C. I think it's by far the best compiler based language that has ever
> been produced.
So do I.  C++ looks real neat though.

> I only have one minor complaint:
> why in the world do most C compilers insist on padding structures?
>
> I am currently working on a network driver that needs to handle a data
> packet that has a precise structure to it. It's easy to describe the
> structure in C, but if the compiler puts in padding between fields, I can't
> simply read in a packet on top of a structure. I must instead "jump" over
> the padding bytes both going and coming.

Huh?  I just looked at the proposed standard.  It states that the
sizeof a structure includes anything needed for padding, whether
internal or external.  I don't see why any compiler would implement
this differently.  

Ok, just engaged mind before (during anyway) writing.  Ok.  They
do it because not all machines are byte addressed.  Or have restrictions
as to where ints are placed.  That answer your question?

> "No, you stupid computer, do what I mean, not what I type!"
> ...akgua!galbp!bing


--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:-
David Herron;  ARPA-> "ukma!david"@ANL-MCS
(Try the arpa address w/ and w/o the quotes, I have had much trouble with both.)

UUCP          -:--:--:--:--:--:--:--:--:-          (follow one of these routes)

{ucbvax,unmvax,boulder,research} ! {anlams,anl-mcs} -----\  vvvvvvvvvvv
							  >-!ukma!david
   {cbosgd!hasmed,mcvax!qtlon,vax135,mddc} ! qusavx -----/  ^^^^^^^^^^^

henry@utzoo.UUCP (Henry Spencer) (12/30/84)

> Suppose you're trying to read a file which starts with a 16-bit field,
> followed by a 32-bit field, then another 16-bit field.  Or suppose
> you're trying to handle an archive file, which has a 14-char array
> followed by an int of some sort.  The turkey compiler on your 32-bit
> machine insists on aligning things "properly" on 4-byte boundaries.
> (This would be fine if it would also transform all the files involved,
> but it isn't that smart!)
> 
> So what do you do?  You dig in to the code, changing the 32-bit fields
> to pairs of 16-bit fields.  Then, wherever the fields were mentioned,
> you modify the code to extract the two little fields and combine them.

Actually, the preferred approach is to simply read the fields one at a
time instead of being lazy and trying to read the whole structure in one
gulp.  This would seem much simpler.  It also will become necessary
the instant you hit byte-ordering or word-ordering problems.  I hope you
don't think the compiler should "solve" them, too.
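Reading the fields one at a time might look like the following sketch. It assumes, purely for illustration, that the file stores integers most-significant byte first; get_be16, get_be32 and read_hdr are hypothetical names. Because each value is assembled from individual bytes in a fixed order, the same code also sidesteps the byte-ordering problem.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read a 16-bit value stored most-significant byte first. */
static int get_be16(FILE *fp, uint16_t *out)
{
    int hi = getc(fp), lo;
    if (hi == EOF || (lo = getc(fp)) == EOF)
        return -1;
    *out = (uint16_t)((hi << 8) | lo);
    return 0;
}

/* Read a 32-bit value as two 16-bit halves, high half first. */
static int get_be32(FILE *fp, uint32_t *out)
{
    uint16_t hi, lo;
    if (get_be16(fp, &hi) != 0 || get_be16(fp, &lo) != 0)
        return -1;
    *out = ((uint32_t)hi << 16) | lo;
    return 0;
}

struct hdr {
    uint16_t a;
    uint32_t b;
    uint16_t c;
};

/* Fill the struct member by member; its in-memory padding never
   matters. Returns 0 on success, -1 on a short read. */
int read_hdr(FILE *fp, struct hdr *h)
{
    if (get_be16(fp, &h->a) || get_be32(fp, &h->b) || get_be16(fp, &h->c))
        return -1;
    return 0;
}
```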

> It would be much nicer if C compilers would accept the fact that some
> fields are misaligned and generate the code to handle it.  This would
> save me a lot of work, and isn't that why high-level languages exist?

You might (repeat, might) be able to accomplish this with bitfields.
If your compiler is feeling cooperative that day...  Of course the
bitfield declarations will be machine-dependent.  You can't win.
-- 
"Face Mecca and repeat three times:  'binary file formats are not portable'".

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bing@galbp.UUCP (Bing Bang) (12/31/84)

I guess what I would really like is for C to have a "packed" keyword that
can be used in structure definitions. And if this keyword is used, the
compiler should generate any extra code that may be needed to handle the
packed structures. Is this in C++? If not, somebody should make a
suggestion.
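Standard C never acquired such a keyword, but GCC and Clang provide a non-standard __attribute__((packed)) that behaves much as described: members are laid out with no padding, and the compiler emits whatever byte-wise loads and stores the target needs to reach misaligned members. A sketch, assuming one of those compilers; struct wire_hdr and wire_b are hypothetical names, and byte order is still the host's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension, not ISO C: no padding between or after members. */
struct __attribute__((packed)) wire_hdr {
    uint16_t a;
    uint32_t b;   /* sits at offset 2, misaligned on most machines */
    uint16_t c;
};

static_assert(sizeof(struct wire_hdr) == 8, "packed: 2 + 4 + 2 bytes");

/* Copy a raw 8-byte packet into the packed struct and read a member
   by name; the compiler generates the code for the misaligned 'b'. */
uint32_t wire_b(const unsigned char *raw)
{
    struct wire_hdr h;
    memcpy(&h, raw, sizeof h);
    return h.b;
}
```

One caveat with this design: taking the address of a packed member (for example &h.b) can produce a misaligned pointer that ordinary code will then dereference, which is why recent GCC and Clang warn about it with -Waddress-of-packed-member.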

                                         -bing


-- 
----------
"No, you stupid computer, do what I mean, not what I type!"
...akgua!galbp!bing

brooks@lll-crg.ARPA (Eugene D. Brooks III) (01/01/85)

> I love C. I think it's by far the best compiler based language that has ever
> been produced.
> 
> I only have one minor complaint:
> why in the world do most C compilers insist on padding structures?
> 
> I am currently working on a network driver that needs to handle a data
> packet that has a precise structure to it. It's easy to describe the
> structure in C, but if the compiler puts in padding between fields, I can't
> simply read in a packet on top of a structure. I must instead "jump" over
> the padding bytes both going and coming.
> 
> "No, you stupid computer, do what I mean, not what I type!"
> ...akgua!galbp!bing

alignment, alignment, alignment

jack@vu44.UUCP (Jack Jansen) (01/01/85)

Re: Alignment problems, etc.
It seems that what is wanted here is something like the
Pascal 'packed' attribute.
In a good Pascal implementation (well, 'good' in my opinion,
the Pascal standard doesn't say what 'packed' should mean
precisely) if you say 'packed' before a record, it stuffs
all the fields adjacent to each other, not caring for word-
boundaries, or 'intelligent alignment' so that the fields
are easily extractable.
This is the way 'packed' is handled in for instance the Sheffield
compiler for the Prime, and all the compilers I know for
the CDC Cyber machines.
The problem is that all the compilers I know for byte-oriented
machines like a PDP or a VAX do it the easy way, i.e. making
every field an integral number of bytes, and aligning the
data to the right in this field.
If we had a precisely defined 'packed' feature in C, or even
a formal definition of the alignment of bitfields 
this would be great.
And, in the last case, I think we should *not* be worried about
inefficiency for machines with bytes the wrong way around, etc.
Just make *one* standard. Since bitfields are, as far as I know,
always used for esoteric things like grabbing status bits from
peripherals, etc. the programmer can easily define his bitfields
so that they are reasonably efficient on *his* machine.
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack
If *this* is my opinion, I wasn't sober at the time.

ken@rochester.UUCP (Ken Yap) (01/02/85)

***** Asbestos suit on *****
There are two things which come to mind here:

(1) I have seen an implementation of Pascal which allowed the
programmer to specify the byte offset of each field. Thus the
programmer gets complete control over the packing in the
record/structure. (Even overlapping fields were possible. Ick!) The
"packed" pragma would be the special case where the offsets are
consecutive.
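C has no syntax for per-field offsets, but the effect can be approximated by naming the byte offsets yourself and going through small accessor functions. A sketch, with hypothetical field names and offsets, values in host byte order; here the offsets happen to be consecutive, the "packed" special case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Explicit byte offset of each field within a raw record. */
enum {
    OFF_TYPE   = 0,   /* 1 byte */
    OFF_LENGTH = 1,   /* 4 bytes, deliberately unaligned */
    OFF_PORT   = 5,   /* 2 bytes */
    REC_SIZE   = 7    /* offsets are consecutive: no holes */
};

static uint32_t get_u32(const unsigned char *rec, size_t off)
{
    uint32_t v;
    assert(off + sizeof v <= (size_t)REC_SIZE);
    memcpy(&v, rec + off, sizeof v);
    return v;
}

static void put_u32(unsigned char *rec, size_t off, uint32_t v)
{
    assert(off + sizeof v <= (size_t)REC_SIZE);
    memcpy(rec + off, &v, sizeof v);
}

static uint16_t get_u16(const unsigned char *rec, size_t off)
{
    uint16_t v;
    assert(off + sizeof v <= (size_t)REC_SIZE);
    memcpy(&v, rec + off, sizeof v);
    return v;
}

static void put_u16(unsigned char *rec, size_t off, uint16_t v)
{
    assert(off + sizeof v <= (size_t)REC_SIZE);
    memcpy(rec + off, &v, sizeof v);
}
```

Nothing here stops two offsets from overlapping, so the "Ick!" case is just as possible as in the Pascal dialect described above.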

(2) In most programs, it is irrelevant how the fields are arranged in
the record and the programmer doesn't care. There could be another
pragma that told the compiler to arrange the fields to minimize record
size. This can be done by allocating storage to the largest fields
first, ending with byte-sized (or even bit-sized) objects. Naturally
this pragma should be turned off for compiling device drivers.
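ISO C requires non-bit-field struct members to be allocated at increasing addresses in declaration order, so a C compiler cannot do this rearrangement itself; the programmer can, though. A sketch of the saving, with hypothetical members; the sizes in the comments are typical of a 64-bit target and are not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Small and large members interleaved: typically 1 + 7 (pad) + 8 + 1
   + 7 (pad) = 24 bytes where uint64_t needs 8-byte alignment. */
struct as_declared {
    char     tag;
    uint64_t id;
    char     flag;
};

/* The same members, largest first: typically 8 + 1 + 1 + 6 (pad)
   = 16 bytes on the same target. */
struct largest_first {
    uint64_t id;
    char     tag;
    char     flag;
};

/* The only thing guaranteed is that the members fit. */
static_assert(sizeof(struct largest_first) >= sizeof(uint64_t) + 2,
              "members must fit");
```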

Further to point (2), it seems that people who write device drivers are
making assumptions about how their compiler aligns fields. Given that
the drivers are not portable anyway, this is not an issue.  There may
be a case for making (2) the default behavior and making programmers
use mechanism (1) for field alignment. A plus would be the extra
information conveyed to driver modifiers.

Ok, flame away.
-- 
	Ken Yap

UUCP: (..!{allegra, decvax, seismo}!rochester!ken) ARPA: ken@rochester.arpa
USnail:	Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!

anders@suadb.UUCP (Anders Björnerstedt) (01/02/85)

This is what DBMSs are for.

Anders Bjornerstedt
SYSLAB
Dept. of Inf. Proc. & Comp. Sci.
University of Stockholm
S-106 91  Stockholm
Sweden
{philabs!decvax}!mcvax!enea!suadb!anders

jack@vu44.UUCP (Jack Jansen) (01/02/85)

In my article <545@vu44.UUCP> I used the Pascal compilers for
the CDC Cyber as an example of how packing should be implemented.
Please ignore this, I oversimplified things a little (of course,
you could also say I was plain wrong :-).
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack
If *this* is my opinion, I wasn't sober at the time.

garys@bunker.UUCP (Gary M. Samuelson) (01/02/85)

In response to Ken Yap:

> ... it seems that people who write device drivers are
> making assumptions about how their compiler aligns fields. Given that
> the drivers are not portable anyway, this is not an issue.

If I cannot "make assumptions" about how the compiler aligns fields,
then I cannot use the compiler to write a device driver at all,
can I?

I suppose I could start by figuring out what assembly language code
I expect the compiler to generate, and then diddle with the source
until I got the result I wanted, but if I have to do that, I might
as well write in assembly in the first place.

Gary Samuelson
ittvax!bunker!garys

mauney@ncsu.UUCP (01/02/85)

>  (1) I have seen an implementation of Pascal which allowed the
>  programmer to specify the byte offset of each field. Thus the
>  programmer gets complete control over the packing in the
>  record/structure. (Even overlapping fields were possible. Ick!) The
>  "packed" pragma would be the special case where the offsets are
>  consecutive.

OK, so we want to add representation specifications and pragmas to C.
How about if we also add packages, generics, overloading, and tasking.
Then we'd really have something. :-)
-- 

Jon Mauney       mcnc!ncsu!mauney       C.S. Dept, N. C. State University

(Ada?  Isn't that a novel by Vladimir Nabokov?)

ken@rochester.UUCP (Ken Yap) (01/03/85)

It seems that my respondents picked on parts of my posting that they
did not agree with. I was trying to point out something more
interesting. Let me try again.

* As currently defined, or by the weight of existing programs, structs
have ordered fields but may have holes in them. This seems to me both
lax and restrictive. Lax because alignment is not specified, which makes
device register descriptions (for example) compiler-dependent.
Restrictive because a lot of programs do not care if the fields are not
allocated in the same order as they were declared.

* I do not seriously believe in putting an offset declaration feature
in C now. The syntax will probably be ugly and it is too radical a
change.

* The points I was trying to make are these:
(1) By removing the ordering requirement, the compiler has more room to
optimize. (Another optimization: compilers could put the more
frequently used fields nearer the front and use shorter offsets in
machine code.) Tell me if I am wrong, but Pascal can already do this
because the language does not guarantee field ordering in records.

(2) By giving the programmer a facility to control ordering/alignment
for those times when he cares, the code tells the compiler that
alignment must be preserved even at the expense of speed. This is the
answer for people who want to read binary files, write ANSI headers,
describe devices, do obscene things with data structures, et cetera.

* Surely this is the best of both worlds and worth considering in any
new language design? That is what I wanted to say.

I know this is a contentious newsgroup and I sometimes don't say what I
mean very well (I am tempted to rewrite the above).  How about some
comments if you think it worth your while commenting on, instead of
making net.lang.c a religious battlefield. I really should have added a
:-) after the "flame away" in my last posting.
-- 
	Ken Yap

UUCP: (..!{allegra, decvax, seismo}!rochester!ken) ARPA: ken@rochester.arpa
USnail:	Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!

tim@cmu-cs-k.ARPA (Tim Maroney) (01/05/85)

Look, if you absolutely have to match an externally-imposed structure
format, then write a conversion routine between the C structure and the
external structure (considering it as an array of chars).  In virtually
every such case, the C structure can be on the stack, that is, in an "auto"
variable, so you don't have dynamic storage management overhead.
-=-
Tim Maroney, Carnegie-Mellon University Computation Center
ARPA:	Tim.Maroney@CMU-CS-K	uucp:	seismo!cmu-cs-k!tim
CompuServe:	74176,1360	audio:	shout "Hey, Tim!"

"Remember all ye that existence is pure joy; that all the sorrows are
but as shadows; they pass & are done; but there is that which remains."
Liber AL, II:9.