[comp.lang.c] sizeof

levy@ttrdc.UUCP (Daniel R. Levy) (11/09/86)

In article <5310@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>Guy is still missing my point about bitmap display programming;
>I have NOT been arguing for a GUARANTEED PORTABLE way to handle
>individual bits, but rather for the ability to do so directly
>in real C on specific machines/implementations WITH THE FACILITY:
>...
>Direct use of Pixel pointers/arrays tremendously simplifies coding for
>such applications as "dmdp", where one has to pick up typically six
>bits at a time from a rectangle for each printer byte being assembled
>(sometimes none of the six bits are in the same "word", no matter how
>bits may have been clumped into words by the architect).
>
>From the resistance he's been putting up, I doubt that I will convert
>Guy to my point of view, and I'm fairly sure that many people who have
>already settled on some strategy to address the multi-byte character
>issue are not eager to back out the work they've already put into it.
>However, since I've shown that a clean conceptual model for such text
>IS workable, there's no excuse for continued claims that explicit
>byte-packing and unpacking is the only way to go.

At the risk of sounding ultra-stupid, I am having trouble trying to relate
the issue of multi-byte character handling to that of "addressing individual
bits."  They sound like separate, distinct issues to me.

However, presuming Gwyn's interest is in doing the latter, it seems to me
that a smart enough C compiler on a machine which can "address individual
bits" within a data entity (byte, word, etc.) could use the bit-addressing
instructions to advantage when presented with the code contained within a
series of the "set bit" and "clear bit" and "copy bit" (etc.) macros which Guy
is pushing for "portable" programs.  That is, the compiler would implement
the intent of the code in the macros directly through bit operations rather
than shifting, masking, OR-ing, AND-ing, etc.  This would allow things like
what I think Gwyn wants, like copying, setting or clearing one or several
bits in random places in RAM without having to read the data containing those
bits into the CPU first the way a conventional machine would.  If that were
so, then one could have the best of the portable and the bit-addressable
worlds.
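
(To be concrete, here is roughly the sort of macro package I have in mind --
purely illustrative, with a 32-bit word assumed; nobody has actually proposed
these names:)

	#define BITS_PER_WORD	32	/* illustrative assumption */

	#define SETBIT(map, n)	((map)[(n) / BITS_PER_WORD] |=  (1L << ((n) % BITS_PER_WORD)))
	#define CLRBIT(map, n)	((map)[(n) / BITS_PER_WORD] &= ~(1L << ((n) % BITS_PER_WORD)))
	#define TSTBIT(map, n)	(((map)[(n) / BITS_PER_WORD] >> ((n) % BITS_PER_WORD)) & 1L)
	#define CPYBIT(d, i, s, j)	(TSTBIT(s, j) ? SETBIT(d, i) : CLRBIT(d, i))

	long	screen[1024L * 1024 / BITS_PER_WORD];	/* a 1024x1024 bitmap */

	/* A conventional compiler turns SETBIT into load/OR/store on a
	 * whole word; a compiler for a bit-addressable machine could
	 * recognize the idiom (or supply its own definitions) and emit
	 * a single bit instruction instead.
	 */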

This seems too simple.  So, what have I missed?
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
	   go for it!  			allegra,ulysses,vax135}!ttrdc!levy

guy@sun.uucp (Guy Harris) (11/10/86)

> Guy is still missing my point about bitmap display programming;
> I have NOT been arguing for a GUARANTEED PORTABLE way to handle
> individual bits, but rather for the ability to do so directly
> in real C on specific machines/implementations WITH THE FACILITY:

Why the hell does it matter whether "real C" is used or not?  You DON'T NEED
"standard" pointers in C just to use bit-addressing hardware on machines
that have it.  Since you state below that "portable graphics programming
SUCKS", the fact that vanilla ANSI C has no constructs to support bit
addressing is a non-issue.

> Now, MC68000 and WE32000 architectures do not support this (except for
> (short char)s that are multi-bit pixels).  But I definitely want the
> next generation of desktop processors to support bit addressing.

If you're going to convince Motorola, Intel, National Semiconductor, DEC,
MIPS, etc., etc.  to put bit-addressing into their next generation of chips,
you're going to have to give them a justification of why it makes life
better for graphic applications.  Some of them are building graphic chips
instead; why should they stuff that sort of thing into the CPU?

> I am fully aware that programming at this level of detail is non-portable,
> but portable graphics programming SUCKS, particularly at the interactive
> human interface level.  Programmers who try that are doing their users
> a disservice.

Would you please justify that statement?  The only way in which portability
affects the user interface is if it makes the application run more slowly.
There could very well be hardware where using your latest CPU chip's
bit-pointer instructions makes things run *more slowly* than using some
other piece of hardware in the system.  Designing the graphics code so that
most of it deals with the hardware portably at a high level, and shoving the
device dependencies down to a lower level that the bulk of the graphics code
uses, may make this run more efficiently.
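
(A sketch of what I mean, with made-up names; this is not any particular
system's actual interface:)

	/* The bulk of the graphics code calls through a small table of
	 * device operations; only the routines behind the table know
	 * whether the hardware has bit pointers, a blitter, or neither.
	 */
	struct devops {
		void	(*d_setpixel)();	/* (x, y, value) */
		int	(*d_getpixel)();	/* (x, y) */
		void	(*d_copyrect)();	/* (sx, sy, dx, dy, w, h) */
	};

	extern struct devops *dev;	/* filled in when the device is opened */

	#define setpixel(x, y, v)	((*dev->d_setpixel)(x, y, v))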

> I say this from the perspective of one who is considered
> almost obsessively concerned with software portability and who has been
> the chief designer of spiffy commercial graphic systems (and who
> currently programs DMDs and REAL frame buffers, not Suns).

What is a "real frame buffer"?  Why is a DMD different from a Sun in this
respect?  Neither of their processors has bit addressing.

> 	ESSENTIAL:
> 		(1) New type: (short char), signedness as for (char).
> 		(2) sizeof(short char) == 1.
> 		(3) sizeof(char) >= sizeof(short char).
> 		(4) Clean up wording slightly to improve the
> 		    byte (storage cell) vs. character distinction.

None of these are "essential".  They may merely make certain kinds of
programming more convenient - maybe.  You still have provided no evidence
for your claim that bit addressing in the instruction set is some kind of
necessity; nor have you provided any justification whatsoever for your claim
that you have to make "char *" be a bit pointer in order to make bit
pointers usable in a high-level language.  Since you're not writing portable
code, who *gives* a damn if the facility you're using is part of the
"standard" language or not?

> I've previously pointed out that this has very little impact on most
> existing code, although I do know of exceptions.  (Actually, until the
> code is ported to a sizeof(short char) != sizeof(char) environment,
> it wouldn't break in this regard.  That port is likely to be a painful
> one in any case, since it would probably be to a multi-byte character
> environment, and SOMEthing would have to be done anyway.  The changes
> necessary to accommodate this are generally fewer and simpler under my
> proposal than under a (long char)/lstrcpy() approach.)

One of those "changes" would be to track down all occurrences of "char" that
really mean "storage_unit" in *existing* code, and change them.  The code
that uses "char" as "storage_unit" is code that does NOT, in general, know
about the text environment.  The code that would have to be changed for a
"long char" environment would be the code that *does* deal with text; since
this code has to change anyway, I see changing *this* code as simpler than
rewhacking all the code that's interested in storage units.

> I won't bother responding in detail on other points, such as use of
> reasonable default "DP shop" collating sequences analogous to ASCII
> without having to pack/unpack multi-byte strings.  (Yes, it's true
> that machine collating sequence isn't always appropriate -- but does
> that mean that one never encounters computer output that IS ordered by
> internal collating sequence?  Also note that strcoll() amounts to a
> declaration that there IS a natural multibyte collating sequence for
> any single environment.)

Bullshit.  The proposal *I* see in front of me, in the latest P1003 mailing,
says that "the appropriate ordering is determined by the program's current
locale"; "strcoll" does NOT always perform the same transformation - its
action is determined by whichever locale the process most recently set as
the current locale.  (Also note that "strcoll" does not have to map single
characters to single characters - in fact, it probably won't for many
languages.)

You also don't have to "pack/unpack multi-byte strings" a lot; you may have
to do that when reading from or writing to a file, but that's life.  If you
store 16-bit characters in files, even for plain ASCII text, you're going to
have to unpack ASCII files, at least, that come from other machines.
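
(A sketch of the kind of unpacking involved; "unsigned short" here just
stands in for whatever the 16-bit character type turns out to be:)

	#include <stdio.h>

	/* Widen 8-bit bytes from an ASCII file into 16-bit characters.
	 * Returns the number of characters read.
	 */
	int
	getwstr(fp, buf, n)
	FILE *fp;
	unsigned short *buf;
	int n;
	{
		int c, i;

		for (i = 0; i < n && (c = getc(fp)) != EOF; i++)
			buf[i] = (unsigned short)c;
		return i;
	}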

If something is ordered by internal collating sequence, there is no
guarantee that this ordering is necessarily meaningful to a human.  As such,
all the string comparison routine does is impose some total order on
character strings, so you can use any comparison routine you want, such as
"lstrcmp" or "strcmp".

> and discovered that much of it was due to the unquestioned assumption
> that "16-bit" text had to be considered as made of individual 8-bit
> (char)s.

I have yet to see this presented as an "unquestioned assumption".  The
impression *I* had of the AT&T proposal was that 16-bit text is considered
to be made of individual 16-bit "long char"s, and would be manipulated as
such.  You may have to pack and unpack strings made of 8-bit bytes when
reading from and writing to text files, but you don't have any choice unless
you want your whizzy 16-bit-"char" machine to be unable to export even plain
ASCII files to or import them from 8-bit-"char" machines.  You may make this
pack/unpack function a command, and require users to run this command on
files imported from other machines before they use those files; I don't
guarantee that requiring plain ASCII files to take 16 bits per character on
these machines will be considered acceptable by the users of these machines.

> If one starts to write out a BNF grammar for what text IS, it becomes
> obvious very quickly that that is an unnatural constraint.

Since I have yet to see any less-formal notion of "what text is" that can be
used as a jumping-off point for such a formal definition, if one starts to
write out a BNF grammar for "what text is" one should be reminded that
they'd better know "what text is" before they start this project.  How does
one represent font or point size?  Does one consider text to be made up of
characters or glyphs?  If the latter, does one jump to 32-bit or 64-bit
characters?  (If this is done, a naive "strcmp" will consider the boldface
and italic forms of "foo" to be different, as well as a 10-point and a
12-point "foo", or a Times Roman and Helvetica "foo".)

> Before glibly dismissing this as not well thought out, give it a genuine
> try and see what it is like for actual programming; then try ANY
> alternative approach and see how IT works in practice.

Before glibly dismissing the effort involved in converting programs that use
"char" to represent the smallest unit of storage C lets you address (having
had no better choice) to use "storage_unit" or something like that, give it
a genuine try (this means porting UNIX to such an environment, as well as
several common commercially-available applications); then try writing code
for a 16-bit character environment with a "long char" C implementation.
Then see whether the effort involved in doing the former is greater than the
extra effort involved in using a "char"/"long char" environment rather than
a "storage_unit"/"char" environment.

> If you prefer, don't consider my proposal as a panacea for such issues,
> but rather as a simple extension

It's not a "simple extension", given the fact that "char" has historically
been used to represent the "storage_unit" data type.  If there had been a
standard "typedef" that everybody used for this, I might be more inclined to
consider allowing "char" to be larger than "storage_unit" to be simple.

> What I DON'T want to see is a klutzy solution FORCED on all implementers,
> which is what standardizing a bunch of simultaneous (long char) and (char)
> string routines (lstrcpy(), etc.) would amount to.  If vendors think it
> is necessary to take the (long char) approach, the door is still open
> for them to do so under my proposal (without X3J11's blessing), but
> vendors who really don't care about 16-bit chars (yes, there are vendors
> like that!) are not forced to provide that extra baggage in their
> libraries and documentation.

Why not make support for "long char" optional, then?  Vendors who don't care
about them need not provide them; *customers* who do can either pressure
this vendor to implement them or buy other machines.

> The fact that more future CPU architectures may support tiny data types
> directly in standard C than at present is an extra benefit from my
> approach to the "multi-byte character" problem; it wasn't my original
> motivation, but I'm happy that it turned out that way.  (You can bet
> that (short char) would be heavily used for Boolean arrays, for example,
> if my proposal makes it into the standard;

It would only be so used on C implementations that support bit addressing.
(Note that there is NOT necessarily a 100% correlation between support of
bit addressing in hardware and support of bit addressing in a C
implementation for that hardware.  Even if the notion of RISCs turns out not
to have been the right idea, one benefit it will leave behind is the end of
the naive notion that there *must* be a one-to-one correspondence between
hardware features and programming-language features.)

You can bet that "bit" (not "short char", PLEASE; as a name it says nothing
that conveys the notion of "smallest addressable unit") would NOT be used at
all for Boolean arrays on implementations that do not support it.  As such,
either every implementation would have to support it or it would not be used
in portable code.  If the latter is the case, then you might as well just
throw it in as a special extension; if lots of implementors want to add this
extension, they should come up with a common scheme, work with it, beat the
bugs out of it, and then see if they can get the notion of this as a
"standard extension" into X3J11.  If you can even implement this reasonably
well on machines without bit pointers (where "reasonably well" means "code
written naively for machines with bit pointers compiles into code about as
good as the code that fairly standard idioms for this sort of thing compile
into"), it should perhaps become part of the core of C.

> However, since I've shown that a clean conceptual model for such text
> IS workable, there's no excuse for continued claims that explicit
> byte-packing and unpacking is the only way to go.

You *think* you've shown this.  I disagree, as do others whose judgement I
respect.  Furthermore, the only "explicit byte-packing and unpacking" that a
reasonable "long char" proposal would require would be for converting things
like ASCII text imported from other machines.

In some problem domains, formal elegance is one of the most important
criteria for a "good" solution.  Unfortunately, this problem is an
engineering problem; in this, as in other real-world problem domains, formal
elegance is nice but sometimes you just have to put other things first.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

jbs@mit-eddie.MIT.EDU (Jeff Siegal) (11/10/86)

In article <9081@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>>[...]  But I definitely want the
>> next generation of desktop processors to support bit addressing.
>
>If you're going to convince Motorola, Intel, National Semiconductor, DEC,
>MIPS, etc., etc.  to put bit-addressing into their next generation of chips,
>[...]

I realize this has nothing to do with the issue being discussed, but
let's keep the technical references accurate.  DEC's current
generation of processor chips (the 78032, aka MicroVAX-II) _does_
support bit addressing and manipulation.  It may be an interesting
question as to whether this is a good idea, so I've directed
discussion to comp.arch.

Jeff Siegal

guy@sun.uucp (Guy Harris) (11/10/86)

> I realize this has nothing to do with the issue being discussed, but
> let's keep the technical references accurate.  DEC's current
> generation of processor chips (the 78032, aka MicroVAX-II) _does_
> support bit addressing and manipulation.

The MicroVAX-II supports the *bit field instructions of the VAX*.  This is
*not* the same as supporting "bit addressing".  Does the MicroVAX-II have
the ability to do a "movl" to an arbitrary *bit* boundary in memory?  A
"movc5" to an arbitrary *bit* boundary in memory?  Unless it's got a whole
set of bit-boundary instructions (i.e., unless they've added a bit-boundary
version of most of the instructions it supports), it doesn't support "bit
addressing" in the general sense.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

rbutterworth@watmath.UUCP (Ray Butterworth) (11/10/86)

> In article <126@olamb.UUCP> kimcm@olamb.UUCP (Kim Chr. Madsen) writes:
> >Why not take the full step and let the datatype char be of variable size,
> >like int's and other types. Then invent the datatype ``byte'' which is exactly
> >8 bits long.

Who says bytes are 8 bits long?  They're 9 on the machine I use.
Why not make sizeof() return the number of bits?  I don't think
there is much argument about how big a bit is.  And the number of
bits is much more useful than the number of chars.  If you really
want the number of chars, use "sizeof(object)/sizeof(char)".
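
(For what it's worth, you can get at the bit count today; the sketch below
assumes the draft ANSI <limits.h>, which supplies CHAR_BIT, the number of
bits in a char -- 9 on a machine like mine:)

	#include <limits.h>	/* draft ANSI: defines CHAR_BIT */

	struct thing { long a; char b[3]; };

	/* sizeof counts chars; multiply by CHAR_BIT to count bits. */
	int	thing_bits  = sizeof(struct thing) * CHAR_BIT;
	int	thing_chars = sizeof(struct thing) / sizeof(char);	/* same as plain sizeof today */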

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/11/86)

In article <1305@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>This seems too simple.  So, what have I missed?

There are a couple of factors.  First, if I know that accessing a bit
(whether by macro or by language-supported data type) is going to actually
load up a whole word, perform separate masking operations, then store
the word back in memory, as opposed to a direct hardware access of the
bit, I am likely to design my algorithms quite differently and explicitly
handle words as well as bits in my bitmap code.
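
(To make that concrete, here is a sketch, with a 32-bit word assumed: once I
know that bit access really means word access, I write the inner loops in
terms of words, something like this:)

	#define WORDBITS	32	/* assumption for this sketch */

	void
	clearbits(map, lo, hi)		/* clear bits lo .. hi-1 of the bitmap */
	unsigned long *map;
	long lo, hi;
	{
		long w;

		while (lo < hi && lo % WORDBITS != 0) {		/* leading partial word */
			map[lo / WORDBITS] &= ~((unsigned long)1 << (lo % WORDBITS));
			lo++;
		}
		for (w = lo / WORDBITS; lo + WORDBITS <= hi; w++, lo += WORDBITS)
			map[w] = 0;				/* whole words at a time */
		while (lo < hi) {				/* trailing partial word */
			map[lo / WORDBITS] &= ~((unsigned long)1 << (lo % WORDBITS));
			lo++;
		}
	}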

Second, in order to support straightforward programming techniques, such
as looping through arrays, incrementing pointers, etc., the data type has
to be officially blessed as a basic or derived type by the compiler.

I should perhaps remind everyone that I am discussing explicitly NON-
portable bitmap programming, since I have NOT proposed that ALL C
implementations directly support bit-sized data objects.  For PORTABLE
bitmap programming (assuming you are concerned about it), one would
indeed have to assume the worst and be prepared to handle word-masking.
In case people have forgotten, C is not only a language for portable
application programming, but it is also (even foremost) a system
implementation language.  Nitty-gritty system-level programming often
has to deal with specifics of the hardware architecture.  Software
portability is important (most of you should be aware by now that I have
strong feelings about that), but concern for it should not be allowed to
limit the options of people who have an actual requirement for using C
in intrinsically "dirty" ways.

----------

Allow me to repeat:  proposal X3J11/86-136 is actually intended to help
solve the MULTI-BYTE CHARACTER PROBLEM (which DOES exist).  Its
possible ramifications for system-specific bitmap programming are really
a side issue, although such considerations can help clarify exactly what
implementation possibilities are opened up by the formal proposal.

Note that I am careful to distinguish between a character, by which I
mean an individually manipulable unit that represents a natural piece of
text, and a (char), which is a basic C data object.  I refer to
individually addressable storage units as bytes, no matter how many bits
they consist of.  If you don't keep these distinctions in mind, you will
NOT understand my proposal or explanations!

If I were developing programs to run on an Imagen print station, I would
very much prefer my compiler to support "Galactic ASCII" (16-bit data)
as a basic data type, in fact as a (char).  If I am developing DMD code,
I very much prefer my compiler/hardware to directly support the individual
bit as a basic data type/byte, but I also need characters separately; in
fact GASCII would be ideal for the DMD.  If I were developing a generic
operating system for world-wide distribution, I would very much prefer my
compiler to support individual text elements (characters; note that
"letters" is too limited a term for this) as basic data types.  All that
my proposal does is to allow compiler implementers the FREEDOM to choose
these trade-offs appropriately for the intended major application; it
doesn't force any particular choice for character or byte basic data
object sizes.  (However, if one uses an inappropriate choice for the
application, or if one doesn't have control over the compiler that will be
used, then one HAS to resort to "lowest common denominator" assumptions in
one's coding; this is also the current state of affairs.  I really don't
think insisting that a (char) must necessarily be an 8-bit byte, which is
ALREADY FALSE for K&R and X3J11 C, will help this situation.)

If you're worried about the possible impact of the proposal on your own
code, perhaps I should reassure you:  Of the approximately HALF-MILLION
lines of C code that I maintain (mostly written by others, practically
none of whom worried about these matters), not a single line is affected
by my proposal so long as the compiler implementer continues to choose to
make (char) and (short char) have the same size.  If these data types
were to have different sizes, then a few things would indeed break, as
follows:
	use of sizeof"string_constant" instead of
	strlen("string_constant")+1 : occurs in about 10 places (I had to
	find all these once, since an older Gould compiler insisted that
	sizeof"string_constant"==sizeof(char *) .)

	coercing of other pointer types to (char *), doing address
	arithmetic, then coercing the pointer back: this is atrocious
	practice in the first place, and seldom occurs; I estimate at most
	20 to 50 places would need to be fixed, by using (short char *)
	instead of (char *) (or better, by redesign of the code); a small
	sketch of this case follows the list.

	specifically byte I/O routines, such as are required to meet
	predefined or machine-independent protocols: these occur in
	nearly 100 places, and most of them are written so that they
	make rather severe assumptions about the run-time environment,
	usually that getc/putc necessarily input/output precisely 8 bits
	at a time.  It is simple to adapt these to the multi-byte (char)
	environment, such as by using getsc/putsc, but such pieces of
	code are necessarily implementation-dependent anyway and should
	always be checked when porting to significantly different
	environments.
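
To make the second case concrete, here is a sketch (the struct and routine
are invented purely for illustration, and the (short char *) fix appears only
in a comment, since that type is of course only proposed):

	struct widget { long w_x, w_y; };

	/* A common idiom today: step a structure pointer forward by some
	 * number of storage units by detouring through (char *).  This
	 * silently assumes that a (char) IS one storage unit.
	 */
	struct widget *
	skipbytes(p, nbytes)
	struct widget *p;
	int nbytes;
	{
		return (struct widget *)((char *)p + nbytes);
	}
	/* If sizeof(char) became 2 storage units, this would advance twice
	 * as far as intended; the fix is to write (short char *) here --
	 * or, better, to redesign so the coercion isn't needed.
	 */
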
Actually, most of this code was developed for a 7-or-8 bit character,
one character per (char), environment and POTENTIALLY needs a fair amount
of rework for a more general character environment no matter WHAT approach
is used.  With my proposal, VERY LITTLE need be changed in such code,
since text handling is already being done with the idea that (char)
represents a single character (see my NOTE above!); with (long char)
approaches, a SUBSTANTIAL amount of rework would be needed.  To be fair,
the amount of rework for (long char) can be reduced if one artificially
constrains (long char)s so that neither byte is allowed to be zero except
for the "null character" string terminator.  Such a constraint is not at
all necessary with my approach, for which a "null character" is precisely
one that has 0 numeric value (without worrying about subfields), as in
current K&R and X3J11 C.  Note also that such an artificial constraint is
known by the pejorative name of "kludge"; some of us have an aversion,
not necessarily irrational, to kludges.

I finally should remark that Guy Harris shows every sign of having made
his mind up on the issue in advance of knowing what was proposed.  The
fact that he labeled my comments about implications of the strcoll()
approach "bullshit" and proceeded to explain setlocale() to me indicate
that he isn't LISTENING to what I'm saying; after all, I'm one of the
people who decided how those facilities would be specified.  Who does he
think he is?  The implication is that I must terribly stupid since I
don't understand stuff I helped design.  If instead one were to assume
the more likely theory that I DO understand the significance of those
facilities, then it would appear that Guy doesn't appreciate the point I
was making.  My guess is that he is so accustomed to responding to
ignorant amateurs in this newsgroup that he automatically assumes when
he doesn't immediately agree with someone they too must be "morons" and
their remarks are consequently not worth the effort or courtesy of
understanding before responding.  Because I have taken a lot of trouble
in choosing my exact wording, I also resent very much his apparent
assumption that my words represent sloppy approximate concepts; just
because many people write like that is no reason to assume that I do!

Rather than be misled by other people's misconceptions, if you seriously
want to evaluate my proposed solution to the multi-byte character problem
and don't have access to X3J11/86-136, then refer to the latter part of
my article <5310@brl-smoke.ARPA> (pretty much skipping the discussion of
bitmap programming until after you understand the logical meaning of the
formal proposal), rather than relying on the hash made of the proposal in
some people's responses.  Try assuming that I have NOT made some trivial
blunder, then figure out what my point of view is that allows me to make
the claims that I have been making.  Once you understand precisely WHAT I
have in mind, only THEN go back and examine counter-responses.  (This is
the approach that you should be taking to intellectual issues anyway.)

I'm asking that you figure out this proposal from what I have presented,
rather than spending lots of net time arguing over misconceptions.  I'm
fully prepared to admit that there are pros and cons to any alternative
solution to the multi-byte character issue (or to bitmap programming
issues, if that's more your concern), and that one might rationally
disagree with my proposal because of different value weighting of the
trade-offs.  However, rational discussion first requires accurate
communication and understanding of the ideas in question.  I've done the
best I can to explain them; now it's your turn to do the best you can to
understand them.  Otherwise, let's end the discussion now.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/11/86)

In article <5355@brl-smoke.ARPA> I wrote:
>	use of sizeof"string_constant" instead of
>	strlen("string_constant")+1 : occurs in about 10 places (I had to
>	find all these once, since an older Gould compiler insisted that
>	sizeof"string_constant"==sizeof(char *) .)

It occurs to me that if I don't add the following, someone will
undoubtedly pick up on a point that would result in even more net traffic:

	I deliberately didn't show the simpler, more efficient fix of
	using sizeof"string_constant"/sizeof(char) because that wasn't
	helpful for the Gould compiler's problem.  However, if one's
	compiler can be counted on to work, this latter is a much better
	fix.  Both approaches are also correct for pre-X3J11 C compilers.
	The important thing to notice is that you can prepare your code
	NOW for any possible future transition to a compiler that doesn't
	assume sizeof(char)==1.  I highly recommend this whether or not
	my proposal is adopted, since it helps to maintain the CONCEPTUAL
	distinction between characters and memory storage unit cells,
	much as I recommend writing C code AS THOUGH the C language
	(like Algol) distinguished between Booleans and ints.  We
	shouldn't have to argue the merits of data abstraction here,
	since that is taught in basic computer science courses (according
	to what I hear -- there weren't any such courses when I first got
	into this business, since we were busy inventing the discipline).
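
In concrete terms (a sketch only; GREETING and the function are merely
illustrative):

	#include <string.h>

	#define GREETING	"hello, world"

	int
	greeting_size()
	{
		/* Both yield the number of characters including the
		 * terminating null, whether or not sizeof(char) stays 1:
		 */
		int a = strlen(GREETING) + 1;			/* run time */
		int b = sizeof GREETING / sizeof(char);		/* compile time */

		return a == b ? a : -1;		/* they had better agree */
	}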

pec@necntc.UUCP (Paul Cohen) (11/11/86)

On the subject of bit addressing, NEC's new V60 and V70 microprocessors
come very close to full bit addressing.  In particular, bit-string 
operations can move, search or perform primitive logic operations on 
strings of bits starting at any bit in the virtual address space and 
ending up to 4 giga-bits later.  Additional bit-field and bit operations 
are available, but for these instructions the bit address must be formed 
using a bit offset of no more than 32 from a designated byte.  The 
bitstring operations have no limitation on the size of the offset.

On a related subject, are there any strong feelings out there about
whether a high level language like C should support bit addressing?
For example, what about arrays of bits?  This would seem to be very
useful at least for graphics applications.

K & R, not to mention the proposed ANSI C standard have nothing to say
about the permitted size of bitfields.  Does anyone know of a C compiler
that supports bitfields in excess of 32 bits?  

jbs@mit-eddie.MIT.EDU (Jeff Siegal) (11/11/86)

In article <9116@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>[...]
>The MicroVAX-II supports the *bit field instructions of the VAX*.  This is
>*not* the same as supporting "bit addressing".  Does the MicroVAX-II have
>the ability to do a "movl" to an arbitrary *bit* boundary in memory?
>[...]it doesn't support "bit addressing" in the general sense.

In the general sense, no, but you can move any string of up to 32 bits
from any bit boundary in memory to any other.  I don't see the
difference between this and a movl to any bit location.  It is
certainly much different than having to read memory in byte (or word)
boundaries and do shifts.

Jeff Siegal

guy@sun.uucp (Guy Harris) (11/12/86)

> If these data types were to have different sizes, then a few things
> would indeed break, as follows:
>	...

	declarations of pointers to fundamental storage units as
	"char *", "unsigned char *", etc. rather than as "storage_unit *".

Yes, they *can* be changed.  Programs that use "char" *can* also be changed
to use "long char".  The question is "which is more work"?  I am still not
convinced that changing those declarations is less work than changing code
that handles characters, especially since the latter code will have to be
changed anyway in many cases to make it work with non-ASCII character sets.

> With my proposal, VERY LITTLE need be changed in such code,
> since text handling is already being done with the idea that (char)
> represents a single character (see my NOTE above!);

I'm not talking about code that processes characters; I'm talking about code
that processes storage units.  Maybe I'm biased, since I've spent a fair bit
of time recently working with streams module code, where you do a *lot* of
stuffing of data structures into and extracting data structures from arrays
of storage units, but I'd rather not have to worry about that code, since it
is not the code I'd be changing to internationalize a system.

> with (long char) approaches, a SUBSTANTIAL amount of rework would be
> needed.  To be fair, the amount of rework for (long char) can be reduced
> if one artificially constrains (long char)s so that neither byte is
> allowed to be zero except for the "null character" string terminator.

How much rework is needed to change "strcpy" to "lstrcpy"?  Note that, with
proper ANSI C declarations in <string.h>, changing the string types from
types derived from "char" to types derived from "long char" will cause the
compiler to flag many of these anyway.

> I finally should remark that Guy Harris shows every sign of having made
> his mind up on the issue in advance of knowing what was proposed.

Oh, good grief.  The only thing I've "made up my mind on" is that the claim
that there isn't much work involved in making all C code work correctly if
"char" is not the fundamental unit of storage has not been demonstrated.

> The fact that he labeled my comments about implications of the strcoll()
> approach "bullshit" and proceeded to explain setlocale() to me indicate
> that he isn't LISTENING to what I'm saying; after all, I'm one of the
> people who decided how those facilities would be specified.

If it is indeed the case that there is more than one way of sorting text in,
say, Oriental languages, then either 1) "setlocale" is a poor name, because
it takes into account more than just the locale, or 2) it is a poor routine,
because it doesn't take into account more than just the locale.

I notice in my copy of "Inside Macintosh" that they *do* support more than
one collating sequence for their extended character set for the benefit of
German (the vowels equipped with diaereses sort in the same place as the
unadorned vowels in the primary ordering sequence for non-German languages,
but sort in the same place as the ligature composed of that vowel and the
letter "e" in the primary ordering sequence for German).  Now I am not
willing to rule out the possibility that a site might want to have both
documents in French and in German.  As such, code that would sort lists of
names in these documents would have to set the "locale" based on an
indication of the language the document is in (not from, say, an environment
variable).

The claim you made, that "strcoll() amounts to a declaration that there
IS a natural multibyte collating sequence for any single environment", is a
little hard to parse.  I assume you mean that "by specifying that there
is such a routine, the proposers of strcoll() are declaring that there IS a
natural multibyte collating sequence for any single environment."  Given
that "setlocale" exists, I fail to see how it declares this, unless
"environment" is defined so that an environment always specifies a single
collating sequence.  In the latter case, the claim is true, but trivially so.

> I'm fully prepared to admit that there are pros and cons to any alternative
> solution to the multi-byte character issue (or to bitmap programming
> issues, if that's more your concern), and that one might rationally
> disagree with my proposal because of different value weighting of the
> trade-offs.

Fine.  Are you prepared to admit that there *is* a non-trivial trade-off
involved in the "short char" proposal (i.e., that it is not a given that
few, if any, lines of *existing* code need change so that it can work
equally well in a one-storage-unit "char" and a two-storage-unit "char"
environment), and that some people might rationally disagree with your value
weighting of the changes needed to existing code to make it work in a
two-storage-unit "char" environment and to make it work in a "long char"
environment?
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

hansen@mips.UUCP (Craig Hansen) (11/12/86)

> >>[...]  But I definitely want the
> >> next generation of desktop processors to support bit addressing.
> >
> >If you're going to convince Motorola, Intel, National Semiconductor, DEC,
> >MIPS, etc., etc.  to put bit-addressing into their next generation of chips,
> >[...]
> 

One reason to avoid bit-addressing is that it uses up three more bits of
addresses, pointers, offsets, etc.  Given a 32-bit word-size, which can be
reasonably expected to be the norm for some time, and that the IBM XA
conversion (as well as the 68010->68012/68030 conversion) indicates that
24-bit addressing isn't nearly enough, those three bits are remarkably
precious.

Of course, one can use word- or byte-address pointers with bit offsets,
but that isn't quite the same thing as a simple linear bit-address,
and is harder to manipulate.
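
(A sketch of the byte-pointer-plus-bit-offset form, just to show the
manipulation cost; the layout is made up, not any real machine's format:)

	struct bitptr {
		unsigned char	*bp_byte;	/* ordinary byte address */
		int		bp_bit;		/* 0..7 within that byte (8-bit byte assumed) */
	};

	int
	bp_fetch(p)			/* fetch the addressed bit */
	struct bitptr p;
	{
		return (*p.bp_byte >> p.bp_bit) & 1;
	}

	void
	bp_incr(p)			/* step to the next bit: two fields to juggle */
	struct bitptr *p;
	{
		if (++p->bp_bit == 8) {
			p->bp_bit = 0;
			p->bp_byte++;
		}
	}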

-- 

Craig Hansen			|	 "Evahthun' tastes
MIPS Computer Systems		|	 bettah when it
...decwrl!mips!hansen		|	 sits on a RISC"

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/13/86)

In article <9181@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>If it is indeed the case that there is more than one way of sorting text in,
>say, Oriental languages, then either 1) "setlocale" is a poor name, because
>it takes into account more than just the locale, or 2) it is a poor routine,
>because it doesn't take into account more than just the locale.

The name is short for "set locale-specific information", which reflects the
main motivation for the function.  There were several suggestions for the
name, but we couldn't find one that we liked better, other than contractions
of "set environment", which had to be rejected for the obvious reason.
Actually, it WAS intended that setlocale() indeed mean "change or query the
program's entire LOCALE or portions thereof", where the term "locale" was
to be defined in section 1.5.  However, something appears to have gone awry
in the process of making this last-minute addition to the draft proposed
standard document, since there are two sentences in the description of
setlocale (section 4.4.1.1) that say almost the same thing using different
words, and section 1.5 defines "locale-specific behavior" but not "locale".
The general term "locale" is intended in the context of X3J11 to refer to
a complete, orthogonal set of selections of conventions for items that are
allowed to affect program operation based on nationality, culture, or
language.  Thus "locale" is not synonymous with "location".

By the way, one doesn't have to turn to oriental languages to find more
than one way of sorting text.  Even English has several different collating
sequences, depending on the specific application.

>The claim you made, that "strcoll() amounts to a declaration that there
>IS a natural multibyte collating sequence for any single environment", is a
>little hard to parse.  I assume you mean that "by specifying that there
>is such a routine, the proposers of strcoll() are declaring that there IS a
>natural multibyte collating sequence for any single environment."  Given
>that "setlocale" exists, I fail to see how it declares this, unless
>"environment" is defined so that an environment always specifies a single
>collating sequence.  In the latter case, the claim is true, but trivially so.

I used "environment" rather than "locale" since the technical X3J11 meaning
of the latter is not well known.  The existence of a natural collating
sequence for a locale is not at all obvious; one might question whether
it is really true for languages that use ideographs for their printed
representation, for example.

>Fine.  Are you prepared to admit that there *is* a non-trivial trade-off
>involved in the "short char" proposal (i.e., that it is not a given that
>few, if any, lines of *existing* code need change so that it can work
>equally well in a one-storage-unit "char" and a two-storage-unit "char"
>environment), and that some people might rationally disagree with your value
>weighting of the changes needed to existing code to make it work in a
>two-storage-unit "char" environment and to make it work in a "long char"
>environment?

I have been maintaining that very little existing code is affected:
NONE on implementations that decide to make sizeof(char)==1, and almost
none for the vast majority of applications code on implementations that
decide to support multi-byte (char)s.  I even gave examples of most
typical code dependence on sizeof(char)==1.  I can well believe that
AT&T's STREAMS code would be heavily dependent on the constraint (in
fact, I wonder whether it could even be made to work on a 20- or 36-bit
word architecture, if it depends so much on the size of a (char));
however, I don't mind nearly so much making more work for kernel workers,
network hackers, and other lower life forms as I do making more work for
application developers.  (As I said, different value weighting.)

rgenter@labs-b.bbn.com (Rick Genter) (11/14/86)

All of this talk about bits and bytes and chars and short chars and long chars
and ints and pointers and on and on has made me realize what all you people
are *really* fighting about.  C has blurred the distinction between the size
of data and the type of data, primarily through its use of the keywords int,
short, and char.  Note that "long" is OK: a "long float" (generally) gives you
a longer floating point datum than a "float".

What X3J11 really needs to do is specify a "fixed" type.  Then we could have
"fixed", "short fixed", "long fixed".  In addition, we should allow "short
float" on those architectures where you have three precisions of floating point
data (IBM 370s come to mind).

But what does this do for the poor bitmap programmer?  Well, we could add a
length specification, so that you say:

	fixed(1) bitmap[ 8 ][ 1024 ][ 1024 ];

for your bitmap image.  Of course, then we can add offsets, for those who want
real fixed point calculations:

	fixed(32,16) a_fixed_point_integer;

But what about those accounting types, who don't grok binary?  I know, let's
add a base specification!  We can limit ourselves to decimal for now:

	fixed decimal (10,2) profits;

Uh, oh.  I think I just invented PL/I.  Please, Mr. IBM, no, don't shoot!
Aauuggghhhh!!!!!!!

I love the "American" attitude presented in this group.  "If it caint be dun
in mah language, it aint wuth doin'!"  (or alternatively, "..., then we'll
change the language so it *cain* be dun!").  Let's face it, there is no such
thing as the perfect general purpose language (sorry PL/I and Ada fans), and
I think it is a mistake to try and create one, because there are conflicting
requirements across the set of all possible desired programs.   Can't we just
leave C alone?

By the way, has anyone else noticed that an increasing number of the articles
being posted contain LONG SEQUENCES of CAPITAL LETTERS, causing them to look
like something said by ZIPPY THE PINHEAD?  Rbj, have you been FOOLING with my
MAILBOX again?

					:-) Rick
--------
Rick Genter 				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/18/86)

In article <9053@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>Furthermore, I don't know how you sort words in Oriental languages, although
>I remember people saying there *is* no unique way of sorting them.

Perhaps not a unique way of sorting them.  (Then again, I can sort
English words in several ways ... look in the Reference section of
your local library ...)  But I remember going through some of my
Dad's dictionaries years ago.  They had pretty much the same sort
sequence: first by number of strokes in the character, then by one
or two other characteristics that I never fully mastered.  I still
don't speak Chinese ...

Another way of sorting them (not as usable to the average person)
would be to express the words in the phonetic alphabet, and then
sort them by those glyphs.  Although this is intuitive to us users
of the Roman (Greek, Cyrillic, ...) alphabet, this doesn't seem
to be as intuitive to users of a character alphabet.

Japanese and Chinese readers of this newsgroup may be able to
enlighten us further, and perhaps correct this one's tentative
attempts?
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

mouse@mcgill-vision.UUCP (11/23/86)

In article <3853@mit-eddie.MIT.EDU>, jbs@mit-eddie.MIT.EDU (Jeff Siegal) writes:
> In article <9116@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>> Does the MicroVAX-II have the ability to do a "movl" to an arbitrary
>> *bit* boundary in memory?

Yes, but it's called INSV.  Look it up.

> In the general sense, no, but you can move any string of up to 32
> bits from any bit boundry in memory to any other.

I can't see any way of doing this without using a longword-aligned
intermediate stopping place (such as a register).  How?

> It is certainly much different than having to read memory in byte (or
> word) boundries and do shifts.

It is.

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu

[USA NSA food: terrorist, cryptography, DES, drugs, CIA, secret, decode]

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/24/86)

In article <3853@mit-eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>In article <9116@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>>The MicroVAX-II supports the *bit field instructions of the VAX*.  This is
>>*not* the same as supporting "bit addressing".
>In the general sense, no, but you can move any string of up to 32 bits
>from any bit boundry in memory to any other.

I'm surprised Guy didn't jump on this (or did we just miss it?).
The Vax EXTV/EXTZV and INSV (and CMPV/CMPZV) instructions go
between an arbitrary string of 0-32 bits and a byte-aligned
longword of 4 8-bit bytes.  Not the same as any 0-32 bits to
any 0-32 bits.  If I want to move bits 4-19 of word 10 to bits
5-20 of word 14, I'd have to do an EXTZV/INSV pair, clobbering
a longword somewhere in memory (or a register) as a temporary
holding place.
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

throopw@dg_rtp.UUCP (Wayne Throop) (12/08/86)

> henry@utzoo.UUCP (Henry Spencer)

>> Can I do something like this:
>>     char a[sizeof(struct name *)0->element];

> Don't think so.

Why not?  Note what H&S have to say about sizeof, on page 153:

    Applying the sizeof operator to an expression yields the same result
    as if it had been applied to the name of the type of the expression.
    [...]
    When sizeof is applied to an expression, the expression is analyzed
    at compile time to determine its type, but the expression itself is
    not compiled into executable code.

And draft X3J11, in section 3.3.3.4:

    The size is determined from the type of the operand, which is not
    itself evaluated.

Note that the evaluation of an indirection of the null pointer is indeed
illegal.  But that isn't what is going on here.

So again... why is sizeof(((struct_type *)0)->member_name) illegal?  I'm
not saying it IS legal, mind you (I don't want to be in the unenviable
position of disagreeing with Henry on a matter of C semantics).  I just
don't see any reason why it isn't.
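
(For concreteness, the construct fully parenthesized, as a sketch:)

	struct name {
		int	count;
		char	element[32];
	};

	/* The 0 is never dereferenced: sizeof only looks at the type of
	 * the expression, so this declares a[32].
	 */
	char	a[sizeof(((struct name *)0)->element)];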

--
A LISP programmer knows the value of everything, but the cost of nothing.
                                --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

henry@utzoo.UUCP (Henry Spencer) (12/16/86)

> >> Can I do something like this:
> >>     char a[sizeof(struct name *)0->element];
> 
> > Don't think so.
> 
> Why not?  Note what H&S have to say about sizeof, on page 153...
> 
> Note that the evaluation of an indirection of the null pointer is indeed
> illegal.  But that isn't what is going on here...

Hmmm...  On thinking this over and consulting the Scriptures, I think
Wayne is right and the construct is legal.  It gives me the creeps, and
I'd never use it, but I think it's legal.  Recent X3J11 drafts are quite
careful to say that certain operators must not be *evaluated* in constant
expressions, rather than that they must not be present at all.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

jsdy@hadron.UUCP (Joseph S. D. Yao) (12/29/86)

In article <7418@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
	[no attribution]
>> >> Can I do something like this:
>> >>     char a[sizeof(struct name *)0->element];

I hope someone said this while our news was down, but ... you may
not use the above construct.  For precedence (and then also for
readability), the inner expression should be as in the Subject:
line:
	sizeof(((struct_type *)0)->member_name)

Some C compilers also haven't allowed arbitrarily complex or
semi-self-referential array size specifiers.  (The latter is,
e.g.:
	struct { googol_t a; } b[BLKSIZ/sizeof(b[0])];
.)  This is less of a problem these days, but may still be a
bug in some compilers.
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

tps@sdchem.UUCP (Tom Stockfisch) (12/30/86)

In article <312@hadron.UUCP> jsdy@hadron.UUCP (Joseph S. D. Yao) writes:
>Some C compilers also haven't allowed arbitrarily complex or
>semi-self-referential array size specifiers.  (The latter is,
>e.g.:
>	struct { googol_t a; } b[BLKSIZ/sizeof(b[0])];
>.)  This is less of a problem these days, but may still be a
>bug in some compilers.

If this is legal, then the 4.3BSD C compiler is broken.  The file
below was sent to it,

	# define BLKSIZ	100

	typedef int	googol_t;

	/*###5 [cc] redeclaration of b%%%*/
	/*###5 [cc] illegal indirection%%%*/
	/*###5 [cc] warning: undeclared initializer name b%%%*/
	struct { googol_t a; } b[BLKSIZ/sizeof(b[0])];

as well as the much simpler

	/*###1 [cc] redeclaration of a%%%*/
	/*###1 [cc] illegal indirection%%%*/
	/*###1 [cc] warning: undeclared initializer name a%%%*/
	int	a[ sizeof(a[0]) ];

Can the legality of this construct be deduced from K&R?
Or is it an (obviously non-universal) extension?

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD

jwf@munsell.UUCP (Jim Franklin) (08/17/87)

Is the size of a bit-field structure required to be at least sizeof(int)?
K&R is vague about this -- all I can find is

        "Field members are packed into machine integers; they do not
        straddle words." (p. 196)

It doesn't say anything about the size of the resulting structure.
As a specific example, consider

#include <sys/types.h>
union xtbl_entry {
	u_short         x_entry;
	struct {
		u_int   x_spare : 5;
		u_int   x_val : 11;
	} x_bits;
};
main ()
{
        printf("sizeof(union xtbl_entry) = %d\n", sizeof(union xtbl_entry));
}

This prints out 2 on my SUN-3 (Release 3.3).  In this case, this is exactly
what I wanted - the union of a short int and a bit-field structure.  But is
this portable?  I suspect not, since another compiler might allocate a
32-bit int for struct x_bits.  Does ANSI-C address this?
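
(One portable alternative, sketched below: keep the entry as a u_short and
spell the layout out with shifts and masks, so nothing depends on how a
particular compiler packs bit-fields.  Here the spare bits are arbitrarily
taken to be the high 5; whatever the external format requires has to be
pinned down explicitly anyway:)

	#define X_SPARE(e)	(((e) >> 11) & 0x1f)	/* high 5 bits */
	#define X_VAL(e)	((e) & 0x7ff)		/* low 11 bits */
	#define X_MAKE(s, v)	((u_short)((((s) & 0x1f) << 11) | ((v) & 0x7ff)))
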
-----
{harvard!adelie,{decvax,allegra,talcott}!encore}!munsell!jwf

Jim Franklin, Eikonix Corp., 23 Crosby Drive, Bedford, MA 01730
Phone: (617) 663-2115 x415

LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU (02/14/88)

I need some help from the readers of comp.lang.c/info-c in preparation
for making a proposal to X3J11. The gist of my proposal is that the
sizeof operator, when applied to a function name, would return the
length of the function rather than the size of a pointer to a function.

While this is a change, it is also (probably) an upward compatible extension
that would provide a capability not currently available in C but in keeping
with the "spirit of C" (Let's hear an AMEN, brothers and sisters!) as
this information is generally available in the assembly code that C allows
us to forego.

What I request from the readers of this newsgroup/mailing-list is
feedback if you know of code that this would break. Please reply directly
to me and not to the whole group as I will summarize any interesting
correspondence.

        Thanks,
         David
--------------
David Linn
INET:   drl@vuse.vanderbilt.edu         UUCP:   ...!uunet!vuse!drl
CSNET:  drl@vanderbilt.csnet            BITNET: linndr@vuengvax
AT&T:   (615)322-7924

chris@trantor.umd.edu (Chris Torek) (02/14/88)

In article <11801@brl-adm.ARPA> LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU writes:
>... The gist

[someone spelled `gist' right in netnews! hurrah!]

>of my proposal is that the
>sizeof operator, when applied to a function name, would return the
>length of the function rather than the size of a pointer to a function.

It might be nice, but it is a much larger change than you think.
In particular, the compiler does not know the size of a function [*],
hence the result would not be a constant.  It is probably possible
to implement this under most systems, although many might require
work on the linker.

-----
[*] Many architectures have span-dependent instructions.  A good
compiler/linker system will use the smallest instructions that fit,
and this cannot be determined until link time.
-- 
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu		Path: not easily reachable

cjc@ulysses.homer.nj.att.com (Chris Calabrese[rs]) (02/16/88)

In article <11801@brl-adm.ARPA>, LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU writes:
> I need some help from the readers of comp.lang.c/info-c in preparation
> for making a proposal to X3J11. The gist of my proposal is that the
> sizeof operator, when applied to a function name, would return the
> length of the function rather than the size of a pointer to a function.

How is this possible?  How can the length of a function be determined at
compile time without having compiled all the functions dependent upon
the function under consideration first?

What happens when they're in two different files and get compiled
separately?  What happens if two functions are dependent upon
each other for size information?

Somebody tell me how this is possible?

Another, more interesting possibility is that sizeof return the number
of arguments which the function takes.

Of course, what do we do with the C programs which already use
the current meaning of the sizeof operator on functions?

	Chris Calabrese
	AT&T Bell Labs
	ulysses!cjc

nevin1@ihlpf.ATT.COM (00704a-Liber) (02/16/88)

In article <11801@brl-adm.ARPA> LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU writes:
>The gist of my proposal is that the
>sizeof operator, when applied to a function name, would return the
>length of the function rather than the size of a pointer to a function.
>
>While this is a change, it is also (probably) an upward compatible extension
>that would provide a capability not currently available in C but in keeping
>with the "spirit of C" (Let's hear an AMEN, brothers and sisters!) as
>this information is generally available in the assembly code that C allows
>us to forego.

Obviously, it is NOT upward compatible, although not many functions bother to
find the size of different types of pointers.  But what happens when you need
the size of a pointer to a function??   How do you find it??

Also, I do not think it can be implemented all that easily to remain compatible
with current versions of C libraries and object code (which currently do not
necessarily keep the 'length' of the *function* around).  Also, what exactly do
you mean by the 'length' of the function?  Is it the actual code segment size,
does it include automatic vars and static data, etc.??  Suppose it is just the
code segment size; then how do I get the sizeof the data segment??  And what
exactly do you do for functions that produce in-line assembly instead of actual
function calls?

This function would be incomplete, at best.  And for that reason alone I don't
think it is worth changing.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

jv@mhres.mh.nl (Johan Vromans) (02/17/88)

In article <11801@brl-adm.ARPA> LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU writes:
>I need some help from the readers of comp.lang.c/info-c in preparation
>for making a proposal to X3J11. The gist of my proposal is that the
>sizeof operator, when applied to a function name, would return the
>length of the function rather than the size of a pointer to a function.
 ^^^^^^^^^^^^^^^^^^^^^^

What is the length of a function? Its size in bytes? Source lines? Before
or after cpp? Object code? Instructions?

And what use is this length?

-- 
Johan Vromans                              | jv@mh.nl via European backbone
Multihouse N.V., Gouda, the Netherlands    | uucp: ..{uunet!}mcvax!mh.nl!jv
"It is better to light a candle than to curse the darkness"

pjb@dutesta.UUCP (P.J. Brand) (02/17/88)

From article <11801@brl-adm.ARPA>, by LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU:
> I need some help from the readers of comp.lang.c/info-c in preparation
> for making a proposal to X3J11. The gist of my proposal is that the
> sizeof operator, when applied to a function name, would return the
> length of the function rather than the size of a pointer to a function.
> 

Well I wouldn't know of any code that would get into trouble, but
what the ?!#??? should you want to do with the length of a function??



==============================================================================
Paul Brand
Delft University of Technology         INTERNET : pjb@dutesta.UUCP
Faculty of Electrical Engineering      UUCP     : ..!mcvax!dutrun!dutesta!pjb

jsb@actnyc.UUCP (The Invisible Man) (02/18/88)

In article <2296@umd5.umd.edu> chris@trantor.umd.edu (Chris Torek) writes:
]In article <11801@brl-adm.ARPA> LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU writes:
]>... The gist
]>of my proposal is that the
]>sizeof operator, when applied to a function name, would return the
]>length of the function rather than the size of a pointer to a function.
]
]It might be nice, but it is a much larger change than you think.
]In particular, the compiler does not know the size of a function [*],
]hence the result would not be a constant.  It is probably possible
]to implement this under most systems, although many might require
]work on the linker.
]

Also, it is not clear what this might mean in the presence of optimization.
After an optimization pass, must we go back and change the 'sizeof' values?
And what if a function is expanded in line? 

Then again, there are times when we want the size of a pointer to a function;
I assume you want to reserve the new meaning for use of the actual 'name'
of the function.

-- 
	
				jim [missing right bracket in expression

				(uunet!actnyc!jsb)

kyriazis@pawl11.pawl.rpi.edu (George Kyriazis) (02/20/88)

In article <1077@dutesta.UUCP> pjb@dutesta.UUCP (P.J. Brand) writes:
>From article <11801@brl-adm.ARPA>, by LINNDR%VUENGVAX.BITNET@CUNYVM.CUNY.EDU:
>> I need some help from the readers of comp.lang.c/info-c in preparation
>> for making a proposal to X3J11. The gist of my proposal is that the
>> sizeof operator, when applied to a function name, would return the
>> length of the function rather than the size of a pointer to a function.
>> 
>
>Well I wouldn't know of any code that would get into trouble, but
>what the ?!#??? should you want to do with the length of a function??
>
  It is *not* useless.  I wanted to get the sizeof() a function when I was
writing a program for the PC.  Consider writing device drivers and keeping
them in a file.  The application program can then read the sizeof() the
function and then read() the function.  From the developer's point of view,
having a sizeof(f()) option would make life a lot easier.  Write the function,
debug it, and then compile it together with a program that gets the function's
size and write()'s it to a file.  Any comments?


*******************************************************
*George C. Kyriazis                                   *    Gravity is a myth
*userfe0e@mts.rpi.edu or userfe0e@rpitsmts.bitnet     *        \    /
*Electrical and Computer Systems Engineering Dept.    *         \  /
*Rensselear Polytechnic Institute, Troy, NY 12180     *          ||
*******************************************************      Earth sucks.

aglew@ccvaxa.UUCP (02/22/88)

..> sizeof(function)

I can see the need for this.

I have long wanted (and have promised to come up with a portable standard
for, really, I have) the ability to manipulate code as data.

Eg.
	actsize = compile(string,databuffer,maxsize); 
		/* compile a C language string into databuffer */
	if( actsize <= sizeof(function) )
		codecpyinto(function,databuffer,actsize);
	

ok@quintus.UUCP (Richard A. O'Keefe) (02/22/88)

In article <395@imagine.PAWL.RPI.EDU>, kyriazis@pawl11.pawl.rpi.edu (George Kyriazis) writes:
>   It is *not* useless.  I wanted to get the sizeof() a function when I was
> writing a program for the PC.  Consider writing device drivers and keeping
> them in a file.  The application program can then read the sizeof() the
> function and then read() the function.  From the developer's point of view,
> having a sizeof(f()) option would make life a lot easier.  Write the function,
> debug it, and then compile it together with a program that gets the function's
> size and write()'s it to a file.  Any comments?
> 
Given that C is supposed to make sense on machines with separate
instruction and data space (such as some PDP-11s, the Ridge (I think),
the ICL version of the PERQ, and many many others), this doesn't seem
like something that belongs in standard C.

Having (sizeof fn) in C isn't going to help much when a function is in
several discontiguous pieces (some UNIX systems generate code in as many
as three separate chunks, which is then pasted together, but the code
for a single function may not be contiguous).  It isn't going to help
with the relocation of the function (I haven't heard the phrase
"position- independent code" in AGES, is it still fashionable?). Bearing
in mind that in some architectures, the size of an identical sequence of
symbolic instructions may depend on where they are loaded (anyone
remember page-relative addressing?)  even a contiguous function may not
HAVE a definite size.  It isn't going to help when the function requires
other functions.  There may be (I am thinking of a real case) a global
optimisation pass in the link-editor.  And so on, and so on.

We provide a dynamic loading facility in our product.  It runs under VMS
and both BSD and Missed'Em V UNIX (and other things too).  Not only did
we not need a (sizeof fn) in C to support this, we never do find out how
big a function is.  What you need is a loader which is willing to load a
file relative to an existing symbol table, and a fairly basic facility
for finding out how big the result is.  It's not the size of a function
that matters, but the size of a *file*.  

If you are using a system where this kind of operation makes sense,
you had better be able to read .obj files (or whatever they are called).
Ask the linker how big functions are, not the compiler.
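
For illustration only, here is a minimal sketch of the "size of a *file*, not
of a function" idea on a UNIX-like system.  The file name and helper function
are invented for the example; they are not part of any product mentioned above.

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>

	/* Ask the filesystem, not the compiler, how big the object file is. */
	long object_file_size(path)
	char *path;
	{
		struct stat st;

		if (stat(path, &st) != 0)
			return -1L;	/* caller decides what to do */
		return (long)st.st_size;
	}

	main()
	{
		long n = object_file_size("driver.o");	/* invented name */

		if (n < 0)
			fprintf(stderr, "cannot stat driver.o\n");
		else
			printf("driver.o is %ld bytes\n", n);
		return 0;
	}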

pardo@june.cs.washington.edu (David Keppel) (02/23/88)

[ applying "sizeof()" to functions ]

Andy Glew posted some stuff about this a while back.  His proposal was
slightly more general, namely how to migrate compiled-on-the-fly code from
d-space to i-space in a portable way.  If you want a copy of his posting,
I've got it and will send it to you.

    ;-D on  (There are many solutions for which solutions don't exist)  Pardo
    pardo@june.cs.washinton.edu		..!ucbvax!uw-beaver!uw-june!pardo

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/24/88)

In article <28700026@ccvaxa> aglew@ccvaxa.UUCP writes:
>I have long wanted (and have promised to come up with a portable standard
>for, really, I have) the ability to manipulate code as data.

Good luck -- on some (important) architectures, you have to get the
operating system to help with this, since the hardware distinguishes
between instruction and data address spaces.

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (03/10/88)

In article <2809@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
> >In article <1988Feb25.202237.8688@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> >>Most machines cannot handle bits with anywhere near the efficiency
> >>with which they handle bytes; the appropriate base unit for efficient code
> >>*is* the byte.
> 
> "Word-addressible cannot handle bytes with anywhere near the efficiency
> with which they handle words; the appropriate base unit for efficient code
> is the word."  Right?
> 
> On a word-addressible machine, there is a perceived need to address objects
> smaller than one word; hence most C compilers give you access to individual
> bytes via "char".  (There may be some that define char as a machine word, but
> I'm not familiar with any.)  Similarly, on byte-addressible machines there is
> a (less) perceived need to address objects smaller than one byte.

The Honeywell Bull GCOS-8 C compiler packs chars into individual bytes
for arrays and structures, but individual external, static, or auto char
variables occupy a full word.  Accessing words (36 bits) on this machine
is easy.  Accessing individual bytes is not (typically a 3-word instruction
sequence is needed to access an individual byte).

Certainly as far as this machine is concerned, there is no reason to
say that sizeof(char) should be 1, any more than it should be 9.

If C left the units of sizeof up to the implementor it would solve
a lot of problems (e.g. none of the multi-byte character mess that
is now in the proposed standard, and the problems of having to have
functions that know about the two different types of strings).

Other than allowing sloppily written code to continue to be written,
I can see no reason whatsoever for requiring that sizeof(char) be 1.
In Japan it could be 2, while on machines that do a lot of bit
manipulation it could be 16.  Both are appropriate for their needs.
Compiler writers that want to continue to support their old code
that assumed sizeof(char)==1 are free to write their compilers with
that same assumption.  But requiring sizeof(char)==1 on all compilers
imposes a needless limit on the language and forces implementors to
use various messy kluges and inefficiencies.

msb@sq.uucp (Mark Brader) (03/10/88)

Ray Butterworth (rbutterworth@watmath.waterloo.edu), with whom I usually
agree, writes:

> The Honeywell Bull GCOS-8 C compiler packs chars into individual [9-bit]
> bytes for arrays and structure, but individual external, static, or auto
> char variables occupy a full word. ... Certainly as far as this machine is
> concerned, there is no reason to say that sizeof(char) should be 1 ...

If chars in arrays and structures occupy 1 byte, sizeof(char) is 1.
That the machine chooses to align char scalars on word boundaries
is irrelevant.  Of course, if unsigned char scalars are then allowed to take
on values in excess of 511, the implementation is buggy.  [C defines overflow
semantics for unsigned types only.]

> If C left the units of sizeof up to the implementor it would solve
> a lot of problems ...
> Other than allowing sloppily written code to continue to be written,
> I can see no reason whatsoever for requiring that sizeof(char) be 1.

It is true that K&R appendix A noted that sizeof(char) is 1 only "in all
existing implementations" (page 188), while it is only in the main part
of the Book, which is not definitive, that this qualification is omitted
(page 126).  However, I think that the sizeof(char) assumption is *now*
so widespread that it must indeed be enshrined in the coming standard,
just like the description of the C library.

Mark Brader		"The default choice ... is in many ways the most
utzoo!sq!msb		 important thing.  ... People can get started without
msb@sq.com		 reading a big manual."		-- Brian W. Kernighan

gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/11/88)

In article <17395@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>If C left the units of sizeof up to the implementor it would solve
>a lot of problems (e.g. none of the multi-byte character mess that
>is now in the proposed standard, and the problems of having to have
>functions that know about the two different types of strings).

>Other than allowing sloppily written code to continue to be written,
>I can see no reason whatsoever for requiring that sizeof(char) be 1.
>In Japan it could be 2, while on machines that do a lot of bit
>manipulation it could be 16.  Both are appropriate for their needs.

Yes!  I would appreciate it if you would send in a comment suggesting
that the approach in X3J11/86-205 (my sizeof(short char)==1 proposal)
should be considered in place of the proposed multi-byte character
support.  That is somewhat more ambitious than your sizeof(char)>=1
proposal, in that it provides an extra data type, so if that bothers
you you could suggest merely that the requirement sizeof(char)==1 be
dropped (which would leave the door open for later adoption of
something like X3J11/86-205, but unfortunately would not eliminate
the proposed multi-byte character features).

dhesi@bsu-cs.UUCP (Rahul Dhesi) (03/11/88)

Ray Butterworth:
     Other than allowing sloppily written code to continue to be
     written, I can see no reason whatsoever for requiring that
     sizeof(char) be 1.

Doug Gwyn:
     Yes!  I would appreciate it if you would send in a comment...

Please don't do this.  Code that conforms to K&R should not be called
sloppy.  From page 126:

     The size is given in unspecified units called "bytes," which are
     the same size as a char.

Existing code will break if sizeof(char) != 1, and let's not blame the
programmer for simply taking K&R at face value.  And nowhere do I find
K&R saying that byte == char == 1 unit is simply a compiler peculiarity
and not a design assumption in the language.  I have always believed,
and I'm sure others share this reasonable belief, that the C
programming language is defined by all of K&R, and not just by the
brief semi-formal reference manual in the back of the book.

I think "long char" is a better bet than changing the meaning of
"char".  Simply making "char" into two bytes will not allow
new code, that needs long characters, to be usable with old compilers
anyway.  So there is no great need to continue to use the type "char"
for long characters (and it could be deceptive to do so).  With "long
char" one would simply need a new set of str functions such as
lstrcmp(), lstrcpy(), etc.  So long as we need long chars, and also
wish to permit short chars to be used for storage economy, we cannot
avoid having two sets of string functions anyway.

There is precedent for this sort of change in seek() going to lseek().
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

dhesi@bsu-cs.UUCP (Rahul Dhesi) (03/12/88)

In article <2328@bsu-cs.UUCP> I wrote:
>And nowhere do I find
>K&R saying that byte == char == 1 unit is simply a compiler peculiarity
>and not a design assumption in the language.

Looking again, more carefully this time, I find this on page 188:

     A byte is undefined by the language except in terms of the value
     of sizeof.

And yet, looking at the extensive discussion of a function alloc() to
allocate memory and a portable implementation of it, we find that char
and byte are used synonymously in the examples given.  See pages 96-98
and then 173-177.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/24/88)

In article <439@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>I don't know how much existing code this would break (though I'd bet
>there would be quite a bit of it). It does mean that I, too, will be
>careful not to make that assumption...

Dennis certainly thought sizeof(char) SHOULD ==1 last time I asked
him about it, and there is indeed a large body of code that assumes
this.  As it now stands it is required for ANSI C.

gay%CLSEPF51.BITNET@CUNYVM.CUNY.EDU (David Gay) (04/16/88)

In article <259@sdrc.UUCP>, Larry Jones <scjones@sdrc.uucp> writes:
>In article <7684@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> In article <8646@eleazar.Dartmouth.EDU> major@eleazar.Dartmouth.EDU (Lou
> Major) writes:
>> >char foo[]="This is a test.";
>> >sizeof (foo) == sizeof (char *)
>>
>> [...]
>>
>> (I don't think the array name is turned into a pointer just because it's
>> surrounded by parentheses.)
>
>If it ain't, the compiler's broke!  The sizeof operator can be applied to a
>parenthesized type name or to an expression.  Since "foo" isn't a type name,
>the operand of sizeof is an expression.  When an array name appears in an
>expression and it's not the operand of & or sizeof (whose operand is the
parenthesized expression, remember), it's converted into a pointer to the first
>element.

But that isn't what it says; in K&R (1st edition), under the description of
sizeof, you see:

   " [...] When applied to an array, the result is the total number of bytes
    in the array."
    [Appendix A, 7.2, p188]
I would think that even with the array name surrounded by parentheses, sizeof
is still being applied to an array.  Also, precisely this syntax is used when
sizeof is introduced (p126); the statement is

#define NKEYS (sizeof(keytab) / sizeof(struct key))

with

struct key { ... } keytab[] =
{
...
};

>
>----
>Larry Jones                         UUCP: uunet!sdrc!scjones
>SDRC                                MAIL: 2000 Eastman Dr., Milford, OH  45150
>                                    AT&T: (513) 576-2070
>"When all else fails, read the directions."
Precisely.
>

David Gay                                   GAY@CLSEPF51.bitnet

No one bears any responsibility for my opinions.

john@bc-cis.UUCP (John L. Wynstra) (07/20/88)

	I recently ran up against this one and would like to toss it out to
NetLand.  I had coded (on a 3B2 running System V.3) the following,

typedef struct {
	char	x[10];
	char	y;
	char	xx[10];
	char	yy;
} Stuff;

Stuff z;

Later on in the same code I had a reference to sizeof(z) expecting to get 22
(which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!

	"Aha!" said I, "either character data is being aligned to even byte
boundaries [which, with an octal dump, I later proved to myself it wasn't] or
I've discovered a compiler bug".  A colleague at work pointed out that, perhaps,
what I'm seeing is that storage is being allocated in units of 4 bytes, but
somehow that just doesn't seem right:  I should think that sizeof( _variable_ )
should be the length of the _variable_ not the length of the memory allocated
to it.  Ah, well...

	So, can anyone out there enlighten me as to what is going on?

--john

PS: As luck would have it I was posting while our news feeder was receiving,
and, try as I might, I have no way of knowing if I was quick enough in my
attempt to cancel the first posting.  Sorry if this got out twice.

kluft@hpcupt1.HP.COM (Ian Kluft) (07/23/88)

john@bc-cis.UUCP (John L. Wynstra) writes:
> 	I recently ran up against this one, and, would like to toss it out to
> NetLand.  I had coded (on a 3b2 running System V.3) the following,
> 
> typedef struct {
> 	char	x[10];
> 	char	y;
> 	char	xx[10];
> 	char	yy;
> } Stuff;
> 
> Stuff z;
> 
> Later on in the same code I had a reference to sizeof(z) expecting to get 22
> (which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!

The 3B2 runs AT&T's WE32000 series processors.  They're 32 bit machines and
align structures on 4-byte (1-machine word) boundaries.  That way, in an array
of these structures, items after the first are still properly aligned.

------------------------------------------------------------------
    Ian Kluft			RAS Lab
    UUCP: hplabs!hprasor!kluft	HP Systems Technology Division
    ARPA: kluft@hpda.hp.com	Cupertino, CA
------------------------------------------------------------------

bill@proxftl.UUCP (T. William Wells) (07/23/88)

In article <1264@bc-cis.UUCP> john@bc-cis.UUCP (John L. Wynstra) writes:
:
:       I recently ran up against this one, and, would like to toss it out to
: NetLand.  I had coded (on a 3b2 running System V.3) the following,
:
: typedef struct {
:       char    x[10];
:       char    y;
:       char    xx[10];
:       char    yy;
: } Stuff;
:
: Stuff z;
:
: Later on in the same code I had a reference to sizeof(z) expecting to get 22
: (which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!
:
:       "Aha!" said I, "either character data is being aligned to even byte
: boundaries [which, with an octal dump, I later proved to myself it wasn't] or
: I've discovered a compiler bug".  A colleague at work pointed out that, perhaps,
: what I'm seeing is that storage is being allocated in units of 4 bytes, but
: somehow that just doesn't seem right:  I should think that sizeof( _variable_ )
: should be the length of the _variable_ not the length of the memory allocated
: to it.  Ah, well...
:
:       So, can anyone out there enlighten me as to what is going on?

What is happening is this: C defines the size of a type as if it
were an element of an array.  Thus the size of Stuff is whatever
the size of one array element would be in an array of Stuff.  Now,
if it weren't for alignment, one could pack the array elements as
close as one would like; however, for an array there may be padding
to make the next element line up.

A lot of compiler writers get lazy at this point.  It seems that
most of them want to make structure pointers all exactly
equivalent.  This means giving them the most restrictive
alignment possible.  In your case, there is no good reason why
the structure must be aligned, so this is compiler-created
waste.

Note that this is not a compiler bug, any more than failing to
write a good optimizer in the compiler is a bug.  It is, however,
very irritating.

les@chinet.chi.il.us (Leslie Mikesell) (07/25/88)

In article <529@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>:.....  I should think that sizeof( _variable_ )
>: should be the length of the _variable_ not the length of the memory allocated
>: to it.

>What is happening is this: C defines the size of a type as if it
>were part of an array.

The point of this is that you can access corresponding elements of successive
structures in an array by adding sizeof(struct) to a pointer to the previous
element.  If sizeof(struct) did not include the padding, there would be no way
to find the next corresponding element from the address of the previous one,
and loops accessing one element of each struct in an array would have to be
written in a less efficient way.  (You might also have to deal with pointers
to structs, and the need to cast them for most operations, instead of using
pointers to the data type of the element you are accessing.)
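
A small sketch of that idiom; the struct and its fields are invented for the
example, and only the use of sizeof as the stride matters:

	#include <stdio.h>

	struct rec {
		char	name[10];
		char	grade;
	};

	main()
	{
		static struct rec table[3] =
			{ {"ann", 'A'}, {"bob", 'B'}, {"cal", 'C'} };
		char *p = table[0].name;	/* "name" is the first member */
		int i;

		/* sizeof(struct rec) includes any trailing padding, so adding
		   it to a char pointer lands on the same member of the next
		   element, whatever padding the compiler chose. */
		for (i = 0; i < 3; i++, p += sizeof(struct rec))
			printf("%s\n", p);
		return 0;
	}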

  Les Mikesell

ok@quintus.uucp (Richard A. O'Keefe) (07/25/88)

In article <6089@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
>In article <529@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>>What is happening is this: C defines the size of a type as if it
>>were part of an array.
>
>The point of this is that you can access corresponding elements of successive
>structures in an array by adding sizeof(struct) to a pointer to the
>previous element.

There are two ways of using 'sizeof': 
	sizeof <variable>		-- not using official
and	sizeof (<type>)			-- nonterminals
The connection between sizeof (<type>) and arrays appears to me to mean
that sizeof (<type>) is redundant *IN SOME IMPLEMENTATIONS*:

#define	SIZEOF(Type) ( (char*)&1[(Type *)0] - (char*)&0[(Type *)0] )

This would have worked on the PDP-11, /370, VAX, 3B, &c, and works on Suns.
[The offsetof() macro works much the same way.]
Don't get me wrong: sizeof is a Good Thing because it is much clearer
and more portable than this sort of magic.  The question is whether
sizeof (<type>) was originally foresight or oversight?  Another one
is: in which implementations does SIZEOF not work?
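
For what it's worth, a small demonstration of the macro (struct key is borrowed
from the keytab example quoted earlier; on the implementations named above the
two results agree, though the null-pointer arithmetic is not blessed by the
standard):

	#include <stdio.h>

	#define	SIZEOF(Type) ( (char*)&1[(Type *)0] - (char*)&0[(Type *)0] )

	struct key { char *word; int count; };

	main()
	{
		printf("sizeof: %d  SIZEOF: %d\n",
		       (int)sizeof(struct key), (int)SIZEOF(struct key));
		return 0;
	}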

guy@gorodish.Sun.COM (Guy Harris) (07/26/88)

> The 3B2 runs AT&T's WE32000 series processors.  They're 32 bit machines and
> align structures on 4-byte (1-machine word) boundaries.

Correct, but...

> That way, in an array of these structures, items after the first are still
> properly aligned.

this depends on your definition of "properly".  A structure consisting only of
"char"s and arrays of same need not be aligned on a 4-byte boundary, even on
machines that require 4-byte alignment for 4-byte quantities; the SPARC
processor requires 4-byte alignment for 4-byte quantities, but structures are
not automatically aligned on 4-byte boundaries by the SPARC C compiler.
("sizeof z" in the example given yields 22 on a SPARC-based system.)

PCC and its derivatives align structures to the maximum of:

	1) the "default structure alignment", which differs from machine to
	   machine;

	2) the maximum alignment requirement of all the members of the
	   structure.

Thus, a structure containing a "long" on a machine requiring "long"s to be
aligned on 4-byte boundaries will be aligned on at least a 4-byte boundary.

The only advantage I see to aligning all structures on some "natural" boundary
would be that you'd be more likely to be able to use "fast" memory copy code,
if that code required alignment on some "natural" boundary.  I don't know if
the WE32K compiler aligns structures on 4-byte boundaries for this reason, or
because the porter got confused and thought the "default structure alignment"
for a machine has to be the most restrictive alignment of all data types on
machines such as the WE32100 with restrictive alignment requirements (not
realizing that the compiler handles this for you, by 2) above).
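
A concrete (and implementation-dependent) illustration of rule 2; the numbers
in the comment are what a typical 32-bit PCC derivative would print, not
anything the language guarantees:

	#include <stdio.h>

	struct all_chars { char a[3]; };		/* no member needs alignment */
	struct has_long  { char a[3]; long b; };	/* "long" forces word alignment */

	main()
	{
		/* Typically 3 and 8; a compiler that pads every struct to a
		   word boundary would print 4 and 8 instead. */
		printf("all_chars %d, has_long %d\n",
		       (int)sizeof(struct all_chars),
		       (int)sizeof(struct has_long));
		return 0;
	}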

henry@utzoo.uucp (Henry Spencer) (07/28/88)

In article <1264@bc-cis.UUCP> john@bc-cis.UUCP (John L. Wynstra) writes:
>Later on in the same code I had a reference to sizeof(z) expecting to get 22
>(which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!
>... I should think that sizeof( _variable_ )
>should be the length of the _variable_ not the length of the memory allocated
>to it...

Well, yes and no.  Sizeof has to include any necessary padding, so that
things like "foovector = (foo *)malloc(n * sizeof(foo))" work properly.
The key word is "necessary".  With only char members in the struct, on
most machines there should be no padding needed.  Evidently your compiler
is making some worst-case assumptions about structs and is not going to
the trouble of recognizing your struct as an unusually favorable case.
This is perhaps a bit sloppy but is not a violation of specs, since the
specs don't put any constraints on padding.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/29/88)

In article <1264@bc-cis.UUCP> john@bc-cis.UUCP (John L. Wynstra) writes:
>typedef struct {
>	char	x[10];
>	char	y;
>	char	xx[10];
>	char	yy;
>} Stuff;
>Stuff z;
>Later on in the same code I had a reference to sizeof(z) expecting to get 22
>(which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!

That's not a bug.  The sizeof an object must include any alignment padding.
Since you can make arrays of Stuff, the Stuff is padded out to a multiple
of 4 bytes (the machine word alignment requirement).  It happens that this
particular structure didn't really need the padding, but since all structs
are treated the same in this regard by the compiler, you got the padding
anyway.

You shouldn't make assumptions about the padding inside a struct.
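
If a program really needs to know where the members landed, the portable move
is to ask, with the offsetof() macro from <stddef.h>, rather than to assume;
a minimal sketch reusing the Stuff type from above:

	#include <stdio.h>
	#include <stddef.h>

	typedef struct {
		char	x[10];
		char	y;
		char	xx[10];
		char	yy;
	} Stuff;

	main()
	{
		/* Report what the implementation chose instead of assuming
		   offsets 0, 10, 11, 21 and size 22. */
		printf("y %d, xx %d, yy %d, sizeof %d\n",
		       (int)offsetof(Stuff, y), (int)offsetof(Stuff, xx),
		       (int)offsetof(Stuff, yy), (int)sizeof(Stuff));
		return 0;
	}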

aglew@urbsdc.Urbana.Gould.COM (07/30/88)

>In article <1264@bc-cis.UUCP> john@bc-cis.UUCP (John L. Wynstra) writes:
>typedef struct {
>	char	x[10];
>	char	y;
>	char	xx[10];
>	char	yy;
>} Stuff;
>Stuff z;
>Later on in the same code I had a reference to sizeof(z) expecting to get 22
>(which is btw what I just now got on the bsd 4.2 vax), but what I got was 24!

Oh, you're going to love me...  I have proposed, and in my spare-time
university research (this is definitely not related to my employer!)
occasionally do some work on evaluating, a system where ALL data structures
have to be aligned on a power-of-two boundary... no, that phrase isn't
quite correct.  I call it "strict power of two alignment", and it means
that all objects are rounded up to a power of two in size, and have to be
aligned on a multiple of their rounded-up size.

So, in the above example, x would be at location 0, y would be at location
16, xx would be at location 32, and yy would be at location 48. 
sizeof(Stuff) would be 64.
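
A sketch of just the size rule (this is only the arithmetic described above,
not a compiler that implements the scheme):

	#include <stdio.h>

	/* Smallest power of two >= n, for n > 0: the "rounded up" size. */
	long round_up_pow2(n)
	long n;
	{
		long p = 1;

		while (p < n)
			p *= 2;
		return p;
	}

	main()
	{
		/* The 10-byte arrays in Stuff each round up to 16; the whole
		   49-byte layout then rounds up to 64, as quoted above. */
		printf("10 -> %ld, 49 -> %ld\n",
		       round_up_pow2(10L), round_up_pow2(49L));
		return 0;
	}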

Why do this? To sell more memory chips... :-) Actually, to shave the proverbial
10% off address generation in a microprocessor (the actual savings varies,
and depends on what you measure - cycle time, or factor in increased miss
ratios due to memory expansion, etc.)

Note, of course, that rearranging some structs can produce savings (although
not in this case), but C guarantees ordering of struct fields, although
not contiguity.

I'm interested in any holes that people can blow in this memory alignment
scheme. Send mail, or post in comp.arch.


Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
    aglew@gould.com     	- preferred, if you have MX records
    aglew@xenurus.gould.com     - if you don't
    ...!ihnp4!uiucuxc!ccvaxa!aglew  - paths may still be the only way
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (08/13/88)

In article <1988Jul27.200546.21084@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>
>Well, yes and no.  Sizeof has to include any necessary padding, so that
>things like "foovector = (foo *)malloc(n * sizeof(foo))" work properly.
>The key word is "necessary".  With only char members in the struct, on
>most machines there should be no padding needed.  Evidently your compiler
>is making some worst-case assumptions about structs and is not going to
>the trouble of recognizing your struct as an unusually favorable case.
>This is perhaps a bit sloppy but is not a violation of specs, since the
>specs don't put any constraints on padding.

The compiler writer may have good reason for deciding to pad "all char"
structs to a word (or double word) boundary.  There are a number of
machines on which the block move instructions will work much faster
if the data is aligned to an appropriate boundary.   It may also be
that it was just a simplifying assumption that reduces the complexity
of the code generator.  Simplifying the compiler is not of itself a
good thing, but it tends to increase compiler speed and reduce bugs.

schmidt@glacier.ics.uci.edu (Doug Schmidt) (04/14/89)

Please forgive me for asking an ``obvious'' question.  What exactly
does the ANSI-C draft say regarding the minimum guaranteed sizes of
short, int, and long integers in a conforming implementation of
standard C?

I realize the relation short <= int <= long holds, I'm just curious
whether there is any minimum that these basic types must meet (e.g.,
short >= 16 bits, etc.).

Thanks in advance.

Doug
--
On a clear day, under blue skies, there is no need to seek.
And asking about Buddha                +------------------------+
Is like proclaiming innocence,         | schmidt@ics.uci.edu    |
With loot in your pocket.              | office: (714) 856-4043 |

richard@pantor.UUCP (Richard Sargent) (04/17/89)

gwyn@smoke.BRL.MIL (Doug Gwyn) in Message-ID: <10044@smoke.BRL.MIL> writes:
> 
> In article <12005@paris.ics.uci.edu> Doug Schmidt <schmidt@glacier.ics.uci.edu> writes:
> >I realize the relation short <= int <= long holds, I'm just curious
> >whether there is any minimum that these basic types must meet (e.g.,
> >short >= 16 bits, etc.).
> 
> chars are at least 8 bits,
> shorts are at least 16 bits,
> longs are at least 32 bits.

I quote from the ANSI C DRAFT dated January 11, 1988, Section 3.1.2.5:

   An object declared as type  char  is large enough to store any member
   of the basic execution character set.  ...

   There are four _signed integer types_, designated as  signed char,
   short int,  int,  and  long int.  ...

   ...  A "plain"  int  object has the natural size SUGGESTED by the
   architecture of the execution environment ( ... in the header <limits.h>).
   In the list of signed integer types above, the range of values of each
   type is a subrange of the values of the next type in the list.

Please note that this definition explicitly avoids any claims about the sizes
of the types except for the "<=" business in the first message.  It most
definitely does NOT say anything about 8, 16, or 32 bits!  In fact, the
definition permits implementations of 8 bit longs and others of 32 bit chars!
(Of course, no one in their right mind would try to sell such a product,
but it is not forbidden by the language definition.)

Richard Sargent
Systems Analyst

austing@Apple.COM (Glenn L. Austin) (04/18/89)

In article <10044@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <12005@paris.ics.uci.edu> Doug Schmidt <schmidt@glacier.ics.uci.edu> writes:
>>I realize the relation short <= int <= long holds, I'm just curious
>>whether there is any minimum that these basic types must meet (e.g.,
>>short >= 16 bits, etc.).
>
>chars are at least 8 bits,
>shorts are at least 16 bits,
>longs are at least 32 bits.

longs are guaranteed to be at least 24 bits (according to C++, sec 2.3.1)


-----------------------------------------------------------------------------
| Glenn L. Austin             | The nice thing about standards is that      | 
| Apple Computer, Inc.        | there are so many of them to choose from.   | 
| Internet: austing@apple.com |       -Andrew S. Tanenbaum                  |
-----------------------------------------------------------------------------
| All opinions stated above are mine -- who else would want them?           |
-----------------------------------------------------------------------------

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/18/89)

In article <29127@apple.Apple.COM> austing@Apple.COM (Glenn L. Austin) writes:
-In article <10044@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
->longs are at least 32 bits.
-longs are guaranteed to be at least 24 bits (according to C++, sec 2.3.1)

That's nice, but Mr. Schmidt was inquiring about the C standard, not C++.

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/18/89)

In article <7.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:
>gwyn@smoke.BRL.MIL (Doug Gwyn) in Message-ID: <10044@smoke.BRL.MIL> writes:
>> chars are at least 8 bits,
>> shorts are at least 16 bits,
>> longs are at least 32 bits.
>I quote from the ANSI C DRAFT dated January 11, 1988, Section 3.1.2.5:
[irrelevant stuff omitted]

Whoopee do.

Now try reading section 2.2.4.2.

	- D A Gwyn
	X3J11 response document editor

richard@pantor.UUCP (Richard Sargent) (04/18/89)

austing@Apple.COM (Glenn L. Austin) in > Message-ID: <29127@apple.Apple.COM>
writes:
> In article <10044@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
> >In article <12005@paris.ics.uci.edu> Doug Schmidt <schmidt@glacier.ics.uci.edu> writes:
> >>I realize the relation short <= int <= long holds, I'm just curious
> >>whether there is any minimum that these basic types must meet (e.g.,
> >>short >= 16 bits, etc.).
> >
> >chars are at least 8 bits,
> >shorts are at least 16 bits,
> >longs are at least 32 bits.
> 
> longs are guaranteed to be at least 24 bits (according to C++, sec 2.3.1)


procedure flame()
BEGIN
Please, if you are going to quote chapter and verse, do so in context.
Section 2.3.1 EXPLICITLY says "all that is guaranteed is [the usual <=
relationships]."  The sentence that Mr. Austin refers to begins "However,
it is usually reasonable to assume ... a long has at least 24 bits."

The immediate next sentence, in the same paragraph, says "Assuming more
is hazardous, and even this rule of thumb does not apply universally."

RTFM and quote it properly, too!

END;

Nothing in C or C++ is _guaranteed_ about sizes except
  1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

For all we may wish otherwise (at times), that's life, so let's just
live it.

Richard Sargent
Systems Analyst

austing@Apple.COM (Glenn L. Austin) (04/19/89)

In article <10064@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <29127@apple.Apple.COM> austing@Apple.COM (Glenn L. Austin) writes:
>-In article <10044@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>->longs are at least 32 bits.
>-longs are guaranteed to be at least 24 bits (according to C++, sec 2.3.1)
>
>That's nice, but Mr. Schmidt was inquiring about the C standard, not C++.

Considering that C++ is an *EXTENSION* of C (and was written with K&R as
active participants), the fact that C++ talks about longs as at least 24 bits
is true for C as well.


-----------------------------------------------------------------------------
| Glenn L. Austin             | The nice thing about standards is that      | 
| Apple Computer, Inc.        | there are so many of them to choose from.   | 
| Internet: austing@apple.com |       -Andrew S. Tanenbaum                  |
-----------------------------------------------------------------------------
| All opinions stated above are mine -- who else would want them?           |
-----------------------------------------------------------------------------

guy@auspex.auspex.com (Guy Harris) (04/19/89)

>Please note that this definition explicitly avoids any claims about the sizes
>of the types except for the "<=" business in the first message.  It most
>definitely does NOT say anything about 8, 16, or 32 bits!  In fact, the
>definition permits implementations of 8 bit longs and others of 32 bit chars!
>(Of course, no one in their right mind would try to sell such a product,
>but it is not forbidden by the language definition.)

Oh, yes, it is.  Try reading 2.2.4.2 "Numerical limits" as well, in the
December 7, 1988 draft (although it may well be in your older draft as
well):


	   A conforming implementation shall document all the limits
	specified in this section...

	Sizes of integral types <limits.h>

	   ...Their implementation-defined values shall be equal or
	greater in magnitude (absolute value) to those shown, with the
	same sign.

	   + minimum value for an object of type "signed char"

	     SCHAR_MIN				-127

	   + maximum value for an object of type "signed char"

	     SCHAR_MAX				+127

(8 bits per "char", minimum, on a binary machine - "-127", rather than
"-127", since the machine may be one's complement; for "unsigned char",
the "minimum maximum" value is 255, so again it's 8 bits, minimum, on
binary machines)

	   + minimum value for an object of type "short int"

	     SHRT_MIN				-32767

	   + maximum value for an object of type "short int"

	     SHRT_MAX				+32767

(16 bits per "short int", minimum, on a binary machine; USHRT_MAX is
65535, so it's again 16 bits on binary machines)

It says similar things for "int" (+/-32767 for "int", 65535 for
"unsigned int", so 16 bits, again, on binary machines) and "long int"
(+/-2147483647 for "long int", 4294967295 for "unsigned long int", so
it's 32 bits on binary machines).
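
A two-minute check of what any particular compiler actually provides; every
name here is a standard <limits.h> macro:

	#include <stdio.h>
	#include <limits.h>

	main()
	{
		printf("char : %d bits, %d..%d\n", CHAR_BIT, CHAR_MIN, CHAR_MAX);
		printf("short: %d..%d\n", SHRT_MIN, SHRT_MAX);
		printf("int  : %d..%d\n", INT_MIN, INT_MAX);
		printf("long : %ld..%ld\n", LONG_MIN, LONG_MAX);
		return 0;
	}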

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/19/89)

In article <29193@apple.Apple.COM> austing@Apple.COM (Glenn L. Austin) writes:
-In article <10064@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
->In article <29127@apple.Apple.COM> austing@Apple.COM (Glenn L. Austin) writes:
->-In article <10044@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
->->longs are at least 32 bits.
->-longs are guaranteed to be at least 24 bits (according to C++, sec 2.3.1)
->That's nice, but Mr. Schmidt was inquiring about the C standard, not C++.
-Considering that C++ is an *EXTENSION* of C (and was written with K&R as
-active participants), the fact that C++ talks about longs as at least 24 bits
-is true for C as well.

(1)  C++ is NOT an extension of Standard C.  They are fundamentally
     incompatible in a few ways and are likely to remain so.
(2)  C is in no way constrained by what Mr. Stroustrup may have to say
     about the rules for C++.
(3)  The forthcoming C standard requires longs to be at least 32 bits.
(4)  What do Brian and Dennis have to do with this?

guy@auspex.auspex.com (Guy Harris) (04/19/89)

>Considering that C++ is an *EXTENSION* of C (and was written with K&R as
>active participants), the fact that C++ talks about longs as at least 24 bits
>is true for C as well.

Assuming that every statement made about C++ of that sort must *ipso
facto* apply to C, I guess that's true, since the pANS says that longs
must hold values in a range that, in binary, requires 32 bits, and 32 is
greater than 24....  However, K&R I makes no statement about "long"s,
and the pANS quite clearly says, for binary implementations, "32 bits",
not "24 bits".

However, the fact that C++ talks about the "class" keyword doesn't make
it a C keyword - since C++ is an extension of C, there are a lot of
things C++ talks about that *aren't* true of C.  As such, the assumption
in paragraph 1) isn't true, so C++ isn't all that relevant in this
particular case anyway.

On top of that, the C++ Reference Manual appendix of the first edition
of Stroustrup (is there a second edition?) sayeth nothing about the minimum size
of a "long", and the section that *does* say something merely says that
"it is usually reasonable to assume" that a "long" has at least 24 bits,
and right after it says quite explicitly that "even this rule of thumb
does not apply universally."

In short, the statement in the C++ book has nothing authoritative to say
whatsoever about the sizes of "long"s in C - and it doesn't even seem to
have anything authoritative to say about the sizes of "long"s in C++!

henry@utzoo.uucp (Henry Spencer) (04/19/89)

In article <7.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:
>Please note that this definition explicitly avoids any claims about the sizes
>of the types except for the "<=" business in the first message.  It most
>definitely does NOT say anything about 8, 16, or 32 bits!  In fact, the
>definition permits implementations of 8 bit longs and others of 32 bit chars!

If you look at the section on implementation limits, however, it effectively
constrains short to be at least 16 bits and long to be at least 32.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

pjh@mccc.UUCP (Pete Holsberg) (04/20/89)

In "Standard C", Plauger and Brodie say:
	char: at least an 8-bit (signed or unsigned) integer
	short: at least a 16-bit signed integer; at least as large as char
	int: at least a 16-bit signed integer; at least as large as short
	long: at least a 32-bit signed integer; at least as large as int
-- 
Pete Holsberg                   UUCP: {...!rutgers!}princeton!mccc!pjh
Mercer College				CompuServe: 70240,334
1200 Old Trenton Road           GEnie: PJHOLSBERG
Trenton, NJ 08690               Voice: 1-609-586-4800

guy@auspex.auspex.com (Guy Harris) (04/21/89)

>Nothing in C or C++ is _guaranteed_ about sizes except
>  1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

Possibly true of C++ and of pre-ANSI C; however, once the standard is
approved and compilers start claiming conformance, anybody who claims
that their compiler conforms but whose compiler doesn't have:

	a "char" type that holds values between -127 and 127;
	an "unsigned char" type that holds values between 0 and 255;
	a "short" or "int" type that holds values between -32767 and
	    32767;
	an "unsigned short" or "unsigned int" type that holds values
	    between 0 and 65535;
	a "long" type that holds values between -(2^31-1) and (2^31-1);
	an "unsigned long" type that holds values between 0 and (2^32-1);

is lying.

bill@twwells.uucp (T. William Wells) (04/21/89)

In article <1478@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
:       a "char" type that holds values between -127 and 127;

Or 0..255; char doesn't have to be signed.

---
Bill                            { uunet | novavax } !twwells!bill
(BTW, I'm may be looking for a new job sometime in the next few
months.  If you know of a good one where I can be based in South
Florida do send me e-mail.)

guy@auspex.auspex.com (Guy Harris) (04/22/89)

>Possibly true of C++ and of pre-ANSI C; however, once the standard is
>approved and compilers start claiming conformance, anybody who claims
>that their compiler conforms but whose compiler doesn't have:
>
>	a "char" type that holds values between -127 and 127;
	...

>is lying.

Make that

	a "signed char" type that holds values between -127 and 127;

Thanks to Tony Hansen for pointing this out.

henry@utzoo.uucp (Henry Spencer) (04/25/89)

In article <29193@apple.Apple.COM> austing@Apple.COM (Glenn L. Austin) writes:
>Considering that C++ is an *EXTENSION* of C (and was written with K&R as
>active participants), the fact that C++ talks about longs as at least 24 bits
>is true for C as well.

C++ is only approximately an extension of C; there are a few incompatibilities.
Also, C++ is approximately an extension of pre-ANSI C, and there are a number
of minor discrepancies that Bjarne and crew are trying to sort out, so current
C++ cannot be considered a reliable guide to ANSI C.

One should also note that "A is an extension of B" means, roughly speaking,
"everything true of B is also true of A", not vice-versa.
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

meissner@tiktok.dg.com (Michael Meissner) (04/26/89)

In article <7.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:
| gwyn@smoke.BRL.MIL (Doug Gwyn) in Message-ID: <10044@smoke.BRL.MIL> writes:
| > 
| > In article <12005@paris.ics.uci.edu> Doug Schmidt <schmidt@glacier.ics.uci.edu> writes:
| > >I realize the relation short <= int <= long holds, I'm just curious
| > >whether there is any minimum that these basic types must meet (e.g.,
| > >short >= 16 bits, etc.).
| > 
| > chars are at least 8 bits,
| > shorts are at least 16 bits,
| > longs are at least 32 bits.
| 
| I quote from the ANSI C DRAFT dated January 11, 1988, Section 3.1.2.5:
| 
|    An object declared as type  char  is large enough to store any member
|    of the basic execution character set.  ...
| 
|    There are four _signed integer types_, designated as  signed char,
|    short int,  int,  and  long int.  ...
| 
|    ...  A "plain"  int  object has the natural size SUGGESTED by the
|    architecture of the execution environment ( ... in the header <limits.h>).
|    In the list of signed integer types above, the range of values of each
|    type is a subrange of the values of the next type in the list.
| 
| Please note that this definition explicitly avoids any claims about the sizes
| of the types except for the "<=" business in the first message.  It most
| definitely does NOT say anything about 8, 16, or 32 bits!  In fact, the
| definition permits implementations of 8 bit longs and others of 32 bit chars!
| (Of course, no one in their right mind would try to sell such a product,
| but it is not forbidden by the language definition.)

And I quote from an earlier section (2.2.4.2) in the draft, dated
December 7, 1988 -- it seems like it's nailed down to me.  By the way,
this section has been in the draft for at least two years.

    The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives.  Moreover, except
for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by
expressions that have the same type as would an expression that is an
object of the corresponding type converted according to the integral
promotions.  Their implementation-defined values shall be equal or
greater in magnitude (absolute value) to those shown, with the same
sign.

* number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT	8

* minimum value for an object of type signed char

SCHAR_MIN	-127

* maximum value for an object of type signed char

SCHAR_MAX	+127

* maximum value for an object of type unsigned char

UCHAR_MAX	255

* minimum value for an object of type char

CHAR_MIN	see below

* maximum value for an object of type char

CHAR_MAX	see below

* maximum number of bytes in a multibyte character, for any supported
locale.

MB_LEN_MAX	1

* minimum value for an object of type short int

SHRT_MIN	-32767

* maximum value for an object of type short int

SHRT_MAX	32767

* maximum value for an object of type unsigned short int

USHRT_MAX	65535

* minimum value for an object of type int

INT_MIN		-32767

* maximum value for an object of type int

INT_MAX		32767

* minimum value for an object of type long int

LONG_MIN	-2147483647

* maximum value for an object of type long int

LONG_MAX	2147483647

* maximum value for an object of type unsigned long int

ULONG_MAX	4294967295

    If the value of an object of type char is treated as a signed
integer when used in an expression, the value of CHAR_MIN shall be the
same as SCHAR_MIN and the value of CHAR_MAX shall be the same as
SCHAR_MAX.  Otherwise, the value of CHAR_MIN shall be 0 and the value
of CHAR_MAX shall be the same as UCHAR_MAX. [see 3.1.2.5]
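
Since these macros must be usable in #if directives, a program whose
assumptions go beyond the guaranteed minima can refuse to compile rather than
silently misbehave; a minimal sketch:

	#include <limits.h>

	#if CHAR_BIT != 8
	#error "this code assumes 8-bit bytes"
	#endif

	#if INT_MAX < 2147483647
	#error "this code assumes 32-bit ints"
	#endif

	main()
	{
		return 0;
	}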

--
Michael Meissner, Data General.
Uucp:		...!mcnc!rti!xyzzy!meissner		If compiles were much
Internet:	meissner@dg-rtp.DG.COM			faster, when would we
Old Internet:	meissner%dg-rtp.DG.COM@relay.cs.net	have time for netnews?

sho@pur-phy (Sho Kuwamoto) (04/27/89)

In article <5387@xyzzy.UUCP> meissner@tiktok.UUCP (Michael Meissner) writes:
>* minimum value for an object of type int
>INT_MIN		-32767
>* maximum value for an object of type int
>INT_MAX		32767

The above was given only as an example of how an ANSI-compliant C
could define these values, but why not make INT_MIN -32768?  This is
more than a knee-jerk reaction against Pascal.  I remember some
example program or other written for the Mac in Pascal.  Some
routine in the ROMs needed a 16-bit value, and the worst of it was
that the program in question needed to pass it 0xf000.  Because of
Pascal's strong typechecking, this value was not allowed, and they had
to put in some ugly hack.

Now I understand that you could always use an unsigned int for
something like this, but it seems un c-like to make 0xf000 somehow an
illegal value.  This is a contrived example, but suppose I was writing
something which scanned characters, and checked to see if the high bit
was set.  Would it be inelegant to say something like:

signed char c;
[...]
   if(c<0){
      [...]

In either case, I want to be able to access all possible values with
my bits, regardless of whether or not my variable is signed.
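
One way to write the high-bit test so that it does not depend on whether
plain char is signed (the mask is derived from CHAR_BIT rather than assuming
8-bit chars):

	#include <stdio.h>
	#include <limits.h>

	main()
	{
		char c = 'x';

		/* Look at the representation via unsigned char; this works
		   whether plain char is signed or unsigned. */
		if ((unsigned char)c & (1 << (CHAR_BIT - 1)))
			printf("high bit set\n");
		else
			printf("high bit clear\n");
		return 0;
	}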

-Sho

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/27/89)

In article <2199@pur-phy> sho@newton.physics.purdue.edu.UUCP (Sho Kuwamoto) writes:
>why not make INT_MIN -32768?

How would you represent that on a 16-bit one's complement machine?
And no, it is not the place of the C standard to say that one's
complement machines are less desirable than two's complement;
in fact, I'd take issue with such a claim.  One of the nicest
machines I ever programmed was one's complement.

>it seems un c-like to make 0xf000 somehow an illegal value.

But it's not an illegal value.

guy@auspex.auspex.com (Guy Harris) (04/28/89)

>The above was given only as an example of how an ANSI compliant C
>could define these values, but why not make INT_MIN -32768?

To allow ANSI C to be implemented on machines with, say, 16-bit "int"s
and a one's complement or sign-magnitude representation of integral
types.

>In either case, I want to be able to access all possible values with
>my bits, regardless of whether or not my variable is signed.

Then don't buy one's complement or sign-magnitude machines.  The pANS
doesn't *require* that you be able to represent -32768 as an "int", but
it doesn't *forbid* it, either.  (What happens to "-0" on a one's
complement or sign-magnitude machine?)

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/29/89)

In article <1514@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>(What happens to "-0" on a one's complement or sign-magnitude machine?)

The integer value -0 has to be represented the same as 0, namely all 0 bits.
To write a "minus zero" the simplest thing is to write ~0.  Of course, the
fact that your code is dealing with an explicit minus zero already makes it
dependent on ones-complement architecture.  (I don't know how to do the
equivalent thing on a sign/magnitude architecture; there may be a way.)

daver@hcx2.SSD.HARRIS.COM (05/01/89)

>>Considering that C++ is an *EXTENSION* of C (and was written with K&R as
>>active participants), the fact that C++ talks about longs as at least 24 bits
>>is true for C as well.

>C++ is only approximately an extension of C; there are a few incompatibilities.
>Also, C++ is approximately an extension of pre-ANSI C, and there are a number
>of minor discrepancies that Bjarne and crew are trying to sort out, so current
>C++ cannot be considered a reliable guide to ANSI C.

>One should also note that "A is an extension of B" means, roughly speaking,
>"everything true of B is also true of A", not vice-versa.

From "Computerworld" (4/24/89, p. 31):
 "... A key selling point of C++ is its relationship with the established C
 language.  It includes ANSI-standard C as a subset, as does Objective C, ..."

Perhaps statements like the above cause the confusion.  (B is a subset of A
==> A is an extension of B ?)
  

seth@ctr.columbia.edu (Seth Robertson) (08/22/89)

Hi.

I have a problem in that when I pass an array, it loses track of its size.

ex.
------------------------------------------------------------------
unsigned long testcode[] =
{
  3422617600,
  3422752768,
  513998896,
  2684354560,
  0 };

main()
{
  int size = sizeof(testcode) / sizeof(testcode[0]);

  printit(testcode);
  printf("Loading %d %d %d\n",size,sizeof(testcode),sizeof(testcode[0]));
}


printit(code)
unsigned long code[];
{
  int size = sizeof(code) / sizeof(code[0]);

  printf("Loading %d %d %d\n",size,sizeof(code),sizeof(code[0]));
}
----------------------------------------------------------------------
Output:

Loading 1 4 4		<-- Incorrect
Loading 5 20 4		<-- Correct


I guess I understand what it is does (only passing the address, not any
additional information), but how do I get it to do what I want?


In case you are wondering, yes, there are good reasons for making
`printit' a subroutine, for passing code as a parameter instead of
explicitly naming testcode, and for making testcode a global,
automatically initialized, variably sized array (variable only at
compile time, obviously).


-- 
                                        -Seth Robertson
                                         seth@ctr.columbia.edu

chad@csd4.csd.uwm.edu (D. Chadwick Gibbons) (08/22/89)

In article <1989Aug22.024808.17913@ctr.columbia.edu> Seth Robertson writes:
|I have a problem in that when I pass an array, it loses track of its size.

[example program deleted]

	As you conjectured, the parameter passing mechanism of C is indeed
only passing an address to your array.  You must remember that in C, arrays
are almost nonexistent.  Compilers convert array references (or at least, they
are _supposed_ to) to a pointer/offset expression of the form (*(a + i)),
which equals a[i] in the original program.  Array declarations in C
allow for little more than an easier way to initialize the size and contents
of a block of memory.  This also explains why a is not a modifiable lvalue and
expressions such as a++ are illegal.  This can cause confusion to many new C
programmers, especially since there is a large following who declare the second
argument to main (argv) as char *argv[]; when really this isn't the case - it's
done to point out that argv has a "constant" size.

|In case you are wondering, yes, there are good reasons for making
|`printit' a subroutine, for passing code as a parameter instead of
|explicitly naming testcode, and for making testcode a global
|automatically initialized variable (variable only at compile time,
|obviously) sized array.

	I might argue with your reasons for doing so, but I can see what you are
trying to accomplish.  The problem of computing an array size can really only
be dealt with in two ways - there are probably more, but I can't recall them
at this time.  One is to abuse the preprocessor and define a constant
expression equal to the sizeof a given object.  As an example:

#define ASIZE sizeof(testcode)

This has the disadvantage (or advantage, depending on how you look at it) of
being a compile-time construction.  The alternate way of computing an array's
size is to flag the end of the array with an unused data value - in a
string array, a zero-length string comes in handy.  This requires a loop to
determine the array size - yuck.  The other problem with this is that it is
more than likely implementation dependent - an old version of the M-word (I
can't bring myself to say it) compiler used to place padding in arrays in
order to satisfy word alignment.  Terrible.

	There are other ways, of course: one that just came to my mind is
a negative subscript to store the size, but this is clumsy and often more
trouble than it's worth.  Re-examine your code structure.
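
A sketch of the end-marker approach using the original testcode array, which
already ends in a 0 entry:

	#include <stdio.h>

	unsigned long testcode[] = {
		3422617600,
		3422752768,
		513998896,
		2684354560,
		0 };

	/* Count entries up to, but not including, the terminating 0. */
	int count_entries(code)
	unsigned long code[];
	{
		int n = 0;

		while (code[n] != 0)
			n++;
		return n;
	}

	main()
	{
		printf("%d entries before the sentinel\n",
		       count_entries(testcode));
		return 0;
	}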
--
D. Chadwick Gibbons (chad@csd4.csd.uwm.edu)

scs@adam.pika.mit.edu (Steve Summit) (08/22/89)

In article <1989Aug22.024808.17913@ctr.columbia.edu> seth@ctr.columbia.edu (Seth Robertson) writes:
>I have a problem in that when I pass an array, it loses track of its size.
 [...passes array to subroutine, then tries to compute sizeof(array).]
>I guess I understand what it is does (only passing the address, not any
>additional information), but how do I get it to do what I want?

Just call sizeof before the call, passing the size to the subroutine:

	printit(testcode, sizeof(testcode) / sizeof(testcode[0]));
	...
	printit(code, nents)
	unsigned long code[];
	int nents;

This may seem slightly clumsy, having to compute sizeof() at
the point of each call (you had wanted to centralize the sizeof
processing in the printit subroutine, and centralization of
common functionality is usually a good idea).  However, the
resultant subroutine is somewhat more general.  For one thing,
I find it's often convenient to define large data structures in
separate source files, referenced at point of use with an
extern declaration.  In this case, I end up with something like

	extern unsigned long testcode[];

in the file making use of the testcode array, and I couldn't
compute sizeof(testcode) (even if I wanted to) for a different
but equally compelling reason.  Therefore, I usually define
another global

	int codesize = sizeof(testcode) / sizeof(testcode[0]);

in the same file in which testcode is defined.  The call is then

	extern unsigned long testcode[];
	extern int codesize;

	printit(testcode, codesize);

Passing in an explicit size to printit() also makes it
useful for printing selected subparts of the array, and for
handling the day when the array size becomes dynamic:

	unsigned long *testcode = NULL;
	int codesize = 0;
	...
	codesize = 23;
	testcode = (unsigned long *)malloc(codesize * sizeof(unsigned long));
	...
	printit(testcode, codesize);
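
(If you find yourself writing the sizeof division at many call sites, a
common idiom is to hide it in a macro -- the name NELEMS here is just an
example; like sizeof itself, it only works where the actual array, not a
pointer or an extern array of unspecified size, is visible:)

	#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

	printit(testcode, NELEMS(testcode));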

                                            Steve Summit
                                            scs@adam.pika.mit.edu

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (11/15/89)

Thanks to everyone that replied to my question about sizeof(int),
both in news articles and by mail.

I wanted to know if there were any reasons,
other than making it easier to compile non-portable code,
for having sizeof(int)==sizeof(long) on the 16-bit Atari ST.

No one could come up with any other reason
(although maybe some think they did :-).

Below are excerpts from most of the responses.
If yours isn't there, it only means that I didn't disagree
with what you said.

If anyone disagrees with my comments below,
it is probably best to continue by mail or follow up to group
comp.lang.c since this topic is now about C programming and
not directly related to the ST.

====

> From: stephen@oahu.cs.ucla.edu (Steve Whitney)
> Organization: UCLA Computer Science Department
> 
> You can always use this simple hack:
> #define int short
> But you have to be careful with constants.  If you pass 3 as an argument
> to a routine expecting a 16 bit integer, it will pass 00 00 00 03 instead
> of 00 03.  To get around that, pass your constants as (short)3 instead.

No. For "func(int x)", or if there is no prototype in scope,
"func( (short)3 )" will convert 3 to a short, and then convert that
short back to an int before putting it on the stack.

The same thing will happen with non-constants too.
"func( (short)x )", if x is type char, short, or int, will convert the
value of x to a short, and then convert this possibly truncated value
back to an int before putting it on the stack.
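
A small sketch of what this means in practice (func is a made-up name;
assume no prototype is in scope, so the default argument promotions apply):

	extern int func();

	void caller()
	{
		short s = 3;

		func((short)3);     /* 3 -> short -> widened back to int */
		func((short)s);     /* s truncated to short, then widened to int */
	}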

> Of course if you haven't written your program yet, just write stuff to
> use shorts instead of ints.

I use shorts when I want shorts, longs when I want longs,
and ints when I don't care which I get.
That is what ints are supposed to be for:
if it doesn't matter whether it is long or short,
the compiler will use the most efficient type.

====

> From: ron@argus.UUCP (Ron DeBlock)
> Organization: NJ Inst of Tech, Newark NJ
> 
> This  isn't a flame, just a reminder:
> 
> If you want 16 bit ints, you must declare short ints.
             ^
             at least
> If you want 32 bit ints, you must declare long ints.
             ^
             at least
> If you just declare int, the size is implementation dependent.
Correct.

The rule is chars are at least 8 bits, shorts are at least 16 bits,
longs are at least 32 bits, and
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

Note that I sometimes use a machine that has 36-bit shorts:
the code to access 2-byte objects is very inefficient
and not worth implementing, since most programs ask for shorts
for the wrong reason (i.e. except for explicitly non-portable
code, shorts should only be used in very large arrays).

> You're right that a 16 bit int makes more sense in a 16 bit architecture,
> but DO NOT depend on all compilers to do it that way!

I don't.
I rely on ints being at least as big as shorts and hope that the
compiler will give me the appropriate size for the current architecture.
For many ST C compilers, it doesn't.  This makes the program a little
slower, but since I write portable code it still works fine.

====

> From: apratt@atari.UUCP (Allan Pratt)
> Organization: Atari Corp., Sunnyvale CA
> 
> > "so that badly written code will still work ok"
> 
> I *do* consider this a valid reason.

Sorry.  It is certainly a valid reason if one is faced with having
to compile non-portable code.  When I said "not valid", I meant not
valid with respect to my question.  I was simply asking if anyone knew
of any *other* possible reasons.  So far, no one has come up with one.

> Personally, I consider "int" to
> be an evil data type when the issue of portability comes up.

Funny.  Personally I consider "int" to be the best thing for portability.
In fact that is its sole reason for existing.
If you don't use "int" for portability, what do you use it for?

One should only use int in cases where it doesn't matter whether
the compiler generates code for long, short, or something in between.
What is wrong with int is the way so many programmers have misused it.

> But look
> at UNIX and all the libraries meant to look like UNIX: malloc, fwrite,
> etc. all take "int" arguments, because "int" is "big enough" on those
> machines.  A 16-bit library will probably work, but you can't malloc /
> write more than 32K at a time.  Thanks.  MWC suffers from this, as does
> Alcyon and some Mac-derived compilers (but for a different reason).

True, but the ANSI standard version of C fixes all or most of these problems.
That's why I'm using GNU C instead of one of the older non-standard compilers.

====

> From kirkenda@jove.cs.pdx.edu  Sat Nov  4 00:41:24 1989
> From: Steve Kirkendall <kirkenda%jove.cs.pdx.edu@RELAY.CS.NET>
> Organization: Dept. of Computer Science, Portland State University; Portland OR
> 
> GCC has been ported to Minix-ST, and that version has a 16-bit int option.
> Unfortunately, it has a few problems (eg. sizeof(foo) returns a long value,
> which breaks code such as "p = (foo *)malloc(sizeof(foo))" because malloc()
> doesn't expect a long argument).

That problem will go away in general as more compilers
conform to the standard.

If minix is using the GNU compiler and the GNU library,
there shouldn't be any problem.  Under ANSI C, malloc()'s argument
is supposed to be type (size_t), which is the same type as the result
returned by sizeof.  For an ST, this is probably typedefed to
(unsigned long).  I don't know why the minix version shouldn't work.
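
A minimal sketch of why the ANSI declarations make the mismatch impossible
(struct foo here is just a made-up record type):

	#include <stdlib.h>     /* ANSI: declares void *malloc(size_t) */

	struct foo { long l; char c; };

	int main(void)
	{
		/* sizeof yields a size_t, exactly the type malloc's
		   prototype expects, so no int-vs-long mismatch can occur */
		struct foo *p = (struct foo *)malloc(sizeof(struct foo));

		free(p);
		return 0;
	}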

> As I said before, the speed difference is about 20% overall.

I didn't mean to claim that everything would be twice as fast.
I was simply curious as to what people thought they were gaining
by chosing an option that made things at best no slower and at
worst a lot slower.

====

> From iuvax!STONY-BROOK.SCRC.Symbolics.COM!jrd  Thu Nov  2 17:19:05 1989
> From: John R. Dunning <iuvax!STONY-BROOK.SCRC.Symbolics.COM!jrd>
> 
>     From: watmath!rbutterworth@iuvax.cs.indiana.edu  (Ray Butterworth)
>     This reminds me of something I've been wondering about for a while.
>     Why does GCC on the ST have 32 bit ints?
> 
> GCC is written with the assumption that modern machines have at least
> 32-bit fixnums.

Fine.  But it shouldn't assume that those 32-bit integers will
have type (int).
When I write code that needs more than 16 bits of integer,
I ask for type (long), not for type (int).

> (As near as I can tell, all GNU code is.  That's
> because it's mostly true) GCC will not compile itself if you break that
> assumption

Then I'd say the compiler is badly written
(though there are of course varying degrees of badness,
and of all the bad code I've seen, this stuff is pretty good).
You should be able to write code that doesn't rely on (int) being
the same as (long) or being big enough to hold a pointer,
without any loss of efficiency.  i.e. on architectures where (int)
and (long) are the same, it will generate exactly the same machine code.

>   > Surely 16 is the obvious size considering it has 16 bit memory access.
> 
> Nonsense.  It makes far more sense to make the default sized frob be a
> common size, so you don't need to worry about it.  For cases you care
> about speed, you optimize.

Change "Nonsense." to "Nonsense:", and I'll agree.

The whole point of type (int) is so the compiler will optimize for me
when I don't know what architecture I am writing for.
And if I am writing portable code there is no way I should know what
architecture I am writing for.
If I want an integer value that is more than 16 bits, I'll ask for (long).
If I want an integer value that doesn't need to be more than 16 bits,
I'll ask for (int).  The compiler might give me 16, it might give me 32,
or it might give me 72; I don't really care.  The important thing is that
the compiler should give me the most efficient type that is at least 16
bits.

>   > (Note that I don't consider
>   >  "so that badly written code will still work ok"
>   >  as a valid reason.)

> Fine, turn it around.  Why is it valid for things that know they want to
> hack 16-bit frobs to declare them ints?

It isn't.  They should be declared int only if I want them to be
at least 16 bits.

> To avoid being 'badly written code', they should declare them shorts.

No.  If I want something that is *exactly* 16 bits, 
I am in one of two different situations:
1) I am writing machine specific code.
2) I am writing a special numerical algorithm.

In the first case, my code is obviously non-portable
so it is fine to use short, or char, or whatever type other than (int)
it is that will guarantee me 16 bit integers on this architecture.
Such code should of course be isolated from the rest of the portable
code, and documented as being very architecture specific so as to
minimize the amount of work required to port the program to a
different architecture.

In the second case, I can still write portable code,
but I have to be very careful about what assumptions I make.
e.g. #define SIGNBIT(x) (0x8000 & (x))
makes a big assumption about int being 16 bits.
But  #define SIGNBIT(x) ( (~(( (unsigned int)(~0) )>>1)) & (x) )
will work fine regardless of the size of int, and will generate
the same machine code as the first macro when int is 16 bits.
Coding for portability may require a little extra effort,
but it doesn't mean the result has to be any less efficient.

>   > Every time anything accesses an int in GCC, it requires two memory
>   > accesses.  Most programs are full of things like "++i" or "i+=7"
>   > or "if(i>j)", and such things take approximately 100% longer when
>   > ints are 32 bits.
> 
> If you don't optimize them or aren't careful about how you declare them,
> sure.

But I did optimize; I declared them (int) expecting the compiler to give
me the integral type that is at least 16 bits long and is the most
efficient for the current architecture.  On an ST, that is 16 bits.

I hope you're not suggesting that by "optimizing" I should have
declared it (short) knowing that I am working on an ST.
That is known as writing bad, non-portable code.

On some word-addressable machines, for instance, there are
perfectly correct compilers for which the code for accessing a short
can be far less efficient than the code for accessing a long.
e.g. 3 or more instructions instead of 1.

>   > Has anyone made a 16-bit GCC and library and done a comparison?
> 
> Yes, it's for sure faster to use shorts than ints.

That's what I was objecting to.
By the original definition of (int),
using int should be exactly as fast as using the faster of (short) or (long).
i.e. regardless of architecture, the best choice should be (int).

> Rather than griping, why not just use the -mshort switch?  It's designed
> for just this kind of thing.  It's even documented.

Sorry, I wasn't intending to gripe (although I know I often sound like it).
I also wasn't wanting to know how to make it behave the way I wanted.
I was simply wondering why the default behaviour was the way it was.

I think the GNU compiler and libraries are the best available for the ST,
and I certainly can't complain about the price.

I really was curious to see if anyone had any reason why they would
want a 32-bit (int) on an ST, other than for the obvious reason that it
makes it easier to compile badly written code (i.e. code that makes
non-portable assumptions), and so far no one has.

====

> From: hyc@math.lsa.umich.edu (Howard Chu)
> Organization: University of Michigan Math Dept., Ann Arbor
> 
> For compatibility - using the 32 bit int mode, I can take just about
> any source file off a Sun, Usenet (comp.sources.unix), etc., type
> "cc" or "make" and have a running executable without ever having to
> edit a single source file.

i.e. so you can compile code that makes some non-portable assumptions,
namely that sizeof(int) == sizeof(long).
i.e. so you can easily compile badly written code.
I already know about that reason.  I spend half my time at work
trying to get all-the-world's-a-VAX Berkeley programs to work on
other hardware.

> >Surely 16 is the obvious size considering it has 16 bit memory access.
> Yep. So obvious, in fact, that every version of ST GCC that's been
> posted has been sent out with *both* 16 *and* 32 bit libraries. What
> are you complaining about?

The first version I had didn't have the 16-bit library.
I have since obtained updates.

====

> From: 7103_300@uwovax.uwo.ca
> From Eric R. Smith
> 
> The most up-to-date version of the libraries (both 16 and 32 bit) are
> available on dsrgsun.ces.cwru.edu, in ~ftp/atari. If you don't have
> these libraries, get them; thanks to some nice work by Dale Schumacher
> (dLibs), Henry Spencer (strings), Jwahar Bammi (lots of stuff), the
> people at Berkeley (curses and doprnt) and yours truly, they're a big
> improvement on the original GCC library.

Yes.  I just picked these up last week.  Thank you all.

(Of course gulam still dropped a couple of bombs the first time I
 used "more", so I aliased it away and use vi.)

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (11/22/89)

In article <31505@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>e.g. #define SIGNBIT(x) (0x8000 & (x))
>makes a big assumption about int being 16 bits.
>But  #define SIGNBIT(x) ( (~(( (unsigned int)(~0) )>>1)) & (x) )
>will work fine regardless of the size of int, and will generate
>the same machine code as the first macro when int is 16 bits.

Thanks to Niels J|rgen Kruse (njk@diku.dk), who had to tell me twice
before I'd believe the obvious, and Karl Heuer (karl@haddock.isc.com),
who informed me that the above isn't quite as portable as I'd thought.

All of the operators are logical-bit-operators *except* for the cast.
Casting to (unsigned int) is an arithmetic operator and as such it
might change the bit pattern, in particular on 1's-complement and
sign-magnitude architectures.

The standard (3.1.2.5) requires that "(unsigned int)n" have the
same bit pattern as "n" if n is a non-negative int, so the cast
should of course be done to the 0, not to the ~0.  i.e.
#define SIGNBIT(x) ( (~(( ~(unsigned int)0 )>>1)) & (x) )
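
A tiny test of the corrected macro (the test values are arbitrary):

	#include <stdio.h>

	#define SIGNBIT(x) ( (~(( ~(unsigned int)0 )>>1)) & (x) )

	int main(void)
	{
		printf("%u\n", SIGNBIT(-1));    /* nonzero: sign bit set */
		printf("%u\n", SIGNBIT(42));    /* 0: sign bit clear */
		return 0;
	}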

>Coding for portability may require a little extra effort,
>but it doesn't mean the result has to be any less efficient.

Maybe that "little" is an understatement.

ccdn@levels.sait.edu.au (09/23/90)

ccdn@levels.sait.edu.au writes:
>       #define _iorc(x) ((sizeof(x)==1)?(unsigned char)(x):(x))
>
> Is sizeof's expression parameter ever executed?  In particular need
> I be concerned about side effects in code like:
>
>       while (_ctype[_iorc(*a++)]);


A more careful reading of K&R does answer this:  "This expression [sizeof]
is semantically an integer constant".  I guess that answers my question.
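
A quick way to convince oneself (a made-up test, not from the original code):

	#include <stdio.h>

	int main(void)
	{
		char *a = "hello";
		unsigned n = (unsigned) sizeof(*a++);   /* operand not evaluated */

		printf("%u %c\n", n, *a);       /* prints "1 h": a never moved */
		return 0;
	}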

Sorry for asking foolish questions.


David Newall, who no longer works       Phone:  +61 8 344 2008
for SA Institute of Technology          E-mail: ccdn@lux.sait.edu.au
"Life is uncertain:  Eat dessert first"  *Check the return address!*

mayne@sun10.scri.fsu.edu (William (Bill) Mayne) (10/20/90)

Some time ago the issue of the value of sizeof(S) where S is
a structure requiring alignment came up in this news group.
It turns out that sizeof(S) includes not only all the
content of S but also any padding necessary to ensure
the alignment of another S, even if S is not declared as
part of an array. This is good if your usage is:

   s_array = malloc(n*sizeof(S)); /* Where s_array is a pointer
                                     to type S */

But in other cases it is not so good. If sizeof() is being used
to get the number of bytes for memcpy() the padding bytes will
be copied unnecessarily. If space is being allocated for a single
structure extra memory may be allocated, but this probably won't
make any difference since the space for the padding would usually
be wasted anyway. 

"Big deal!" you say? Actually it could make an important differnce
if the structure is being written as a record to a file. Then  extra 
file space will be used. Unless I intend to read or write arrays of
records some of the time, changing the number of records handled, 
with each operation, this is a real waste.

The $64K question is: How do I portably get the actual size of
a structure without padding when I need that rather than
what sizeof() tells me? I have some methods that will work,
but I am not sure how portable they'd be and I hope someone
has a better way.

Bill Mayne     mayne@nu.cs.fsu.edu

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/20/90)

In article <1229@sun13.scri.fsu.edu> mayne@sun10.scri.fsu.edu (William (Bill) Mayne) writes:
> The $64K question is: How do I portably get the actual size of
> a structure without padding when I need that rather than
> what sizeof() tells me?

You don't.

If you want to add up the sizes of the structure elements, do that.
Write each one to disk individually, and take compactness over speed.

To put it differently, if you want to write a data structure to a file,
you should respect its rights as a *structure*, rather than treating it
as a byte array. As you observed, byte arrays are fine for some things,
like allocating memory, but not for others.

---Dan

scs@adam.mit.edu (Steve Summit) (10/20/90)

In article <1229@sun13.scri.fsu.edu> mayne@sun10.scri.fsu.edu (William (Bill) Mayne) writes:
[sizeof() reflects padding at ends of structures.]
>"Big deal!" you say? Actually it could make an important differnce
>if the structure is being written as a record to a file. Then  extra 
>file space will be used.
>The $64K question is: How do I portably get the actual size of
>a structure without padding when I need that rather than
>what sizeof() tells me?

My $64,000 question is: why are so many poor souls condemned to
try to read and write binary data files?  Yes, it is easy and
quick to use fwrite and sizeof to write whole structs to a file,
but those are the only two good things which can be said about it.
Binary files are otherwise quite difficult to work with, not to
mention unportable, sometimes even to the same machine on which
they were written.  More flexible (e.g. text) file formats offer
hosts of advantages, and can be implemented reasonably easily and
efficiently as well.  (This debate appears here endlessly, and I
should add it to the FAQ if I could think of a succinct way.  I
won't repeat all the pros and cons right now.)

If a binary format has not already been forced upon you by some
prior decision or external piece of software (which it sounds
like it has not, since you are trying to figure out how to change
the format by eliminating the padding you feel is spurious) you
should strongly consider dropping binary formats entirely, and
moving to a simple text format.  I'd describe good ways to do so,
but I recently discovered that everything I'd say about data
files has already been published, in Jon Bently's book
Programming Pearls (or perhaps the sequel, More Programming
Pearls).

If you must compute the size of a structure less any trailing
padding, one way to do so would be

	struct x x;
	offsetof(struct x, last_member) + sizeof(x.last_member)

offsetof is a (relatively) new macro, standardized by ANSI, and
#defined in <stddef.h>.
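
Fleshed out into a complete example (the struct is hypothetical; the two
numbers printed will differ only if the compiler adds trailing padding):

	#include <stddef.h>
	#include <stdio.h>

	struct x { double d; char last_member; };

	int main(void)
	{
		struct x x;
		size_t padded  = sizeof(struct x);
		size_t trimmed = offsetof(struct x, last_member)
		                 + sizeof(x.last_member);

		printf("%lu %lu\n", (unsigned long)padded, (unsigned long)trimmed);
		return 0;
	}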

(By the way, it is generally agreed that the advantages of having
sizeof account for trailing padding far outweigh any disadvantages.
Yet this issue apparently belongs in the FAQ as well.)

                                            Steve Summit
                                            scs@adam.mit.edu

stelmack@screamer.csee.usf.edu (Gregory M. Stelmack) (10/20/90)

In article <1990Oct20.003222.25439@athena.mit.edu> scs@adam.mit.edu (Steve Summit) writes:
>
>My $64,000 question is: why are so many poor souls condemned to
>try to read and write binary data files?  Yes, it is easy and

You mention that this has been debated here, but I haven't seen it in the 6
months or so I've been reading the group, and I do have a response: Many of us
use binary files because our OS does not differentiate between text and binary
files. My Suns at school do not even have the ability to choose which one to
open -- look at my recent question about reading/writing binary files...
So, I'd just as soon stick with my binary files...might as well with binary
data.

-- Greg Stelmack
-- Email: stelmack@sol.csee.usf.edu
-- USmail: USF Box 1510, Tampa, FL 33620-1510
-- Amiga: the only way to compute!
-- IRAQNOPHOBIA: Nothing a little RAID wouldn't cure!

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/22/90)

In article <1229@sun13.scri.fsu.edu>, mayne@sun10.scri.fsu.edu (William (Bill) Mayne) writes:
> "Big deal!" you say? Actually it could make an important differnce
> if the structure is being written as a record to a file. Then  extra 
> file space will be used. Unless I intend to read or write arrays of
> records some of the time, changing the number of records handled, 
> with each operation, this is a real waste.

> The $64K question is: How do I portably get the actual size of
> a structure without padding when I need that rather than
> what sizeof() tells me?

There is no such thing as "the actual size of a structure without padding".
Much of the padding required for alignment is likely to be within the
record.  Consider
	struct foo { char a; double d; short s; void *p; };
			    ^                  ^
I've marked the two places that padding is likely to go on a machine that
requires 32-bit alignment for doubles and data pointers.  They are *inside*.

Suppose we could squeeze the padding out.  The size of the record with
padding is 20 bytes.  Without padding it'd be 15.  Big deal.  If you
order the fields in a struct thus:
	first long double
	then  double
	then  float
	then  pointer-to-function
	then  pointer
	then  long
	then  int
	then  short
	last  char
then all the padding required for alignment is likely to be at the end,
and on a 32-bit byte-addressed machine the total amount of padding is
likely to be at most 3 bytes.  This isn't going to be perfect on all
machines, but it's pretty good for most.  (It's a generalisation of the
ordering prudent Fortran programmers use within COMMON blocks.)
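
A small illustration of the reordering; the exact sizes are of course
implementation-dependent:

	#include <stdio.h>

	struct foo_orig   { char a; double d; short s; void *p; };  /* padding inside */
	struct foo_sorted { double d; void *p; short s; char a; };  /* padding, if any,
	                                                               mostly at the end */
	int main(void)
	{
		printf("original %lu, reordered %lu\n",
		       (unsigned long)sizeof(struct foo_orig),
		       (unsigned long)sizeof(struct foo_sorted));
		return 0;
	}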

If you are only writing one or two records of a particular type to a
file, why do you care about the padding?  If you are writing thousands,
then you may well want to read and write arrays of them.

If it is absolutely essential that you write your records in the most
compact form, then don't use ``raw'' fwrite and fread.  Write your own

	int fwrite_foo(FILE *f, struct foo *p) { ... }
	int fread_foo(FILE *f, struct foo *p)  { ... }

functions that write/read the fields separately.
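
A sketch of what such a pair might look like (the struct is a cut-down
version of the earlier example -- writing a pointer member to a file would
be meaningless anyway -- and the result is still byte-order dependent,
just free of padding):

	#include <stdio.h>

	struct foo { char a; double d; short s; };

	int fwrite_foo(FILE *f, struct foo *p)
	{
		return fwrite(&p->a, sizeof p->a, 1, f) == 1
		    && fwrite(&p->d, sizeof p->d, 1, f) == 1
		    && fwrite(&p->s, sizeof p->s, 1, f) == 1;
	}

	int fread_foo(FILE *f, struct foo *p)
	{
		return fread(&p->a, sizeof p->a, 1, f) == 1
		    && fread(&p->d, sizeof p->d, 1, f) == 1
		    && fread(&p->s, sizeof p->s, 1, f) == 1;
	}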

But above all, keep in mind that in C, there is no such thing as "the
actual size of a structure without padding".  "sizeof" is as actual 
as it ever gets.

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/22/90)

In article <98@screamer.csee.usf.edu>, stelmack@screamer.csee.usf.edu (Gregory M. Stelmack) writes:
> Many of us use binary files because our OS does not differentiate
> between text and binary files.

I'm sorry, but this is just so totally weird.  I can understand the
reasoning that goes "this operating system forces me to distinguish
between binary and text, and this is one of the cases where it wants
binary, so I'm stuck with binary".  But I don't see why an operating
system that doesn't force you to use a binary representation is a
reason for doing so anyway.

> My Suns at school do not even have the ability to choose which one to
> open -- look at my recent question about reading/writing binary files...

Oh come on.  The Suns at your school *do* give you the choice of opening
a file for binary transput or opening it for text transput.  Either way,
you write exactly the same code.  This is like complaining that your
local milk company doesn't let you choose whether to drink from the bottle
or out of a glass because it gives you exactly the same bottle in both
cases.

The realistic way to put it is that UNIX-based systems like the Suns
let you have *any* mix of text and binary within the same file.  You
can, for example, put source code (text), object code (binary), and
archives (structured) into archives (.a files, managed by the 'ar'
command).  This is a facility that operating systems which distinguish
between text and data *deny* you.

Stop and think about it.  The ability to mix text and binary in one file
gives us the best of both worlds.  Suppose we structure a file like this:
	<directory>
	<text version>
	<binary version>
where the <directory> is
	<binary edition> <pointer to binary version>
and <binary edition> is some magic tag that lets you determine whether
the binary version is appropriate for this machine&os&compiler&release.
If it is, the <pointer to binary version> is a decimal integer or integers
encoding the address of the binary version; you just go there and slurp it
up.  If the binary version isn't appropriate, you read the text version,
rewrite the binary version, and update the header.  That way you get the
speed benefit of reading a binary version, and the portability benefit of
having a text version, and you've still got only one file to worry about.

What makes this possible?  *NOT* distinguishing between text and binary
in the os.

> So, I'd just as soon stick with my binary files...might as well with binary
> data.

You'd better stick with your machine too...

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

djones@megatest.UUCP (Dave Jones) (10/25/90)

From article <1229@sun13.scri.fsu.edu>, by mayne@sun10.scri.fsu.edu (William (Bill) Mayne):

> 
> The $64K question is: How do I portably get the actual size of
> a structure without padding when I need that rather than
> what sizeof() tells me?

You don't. When you write structures directly to disc, it is non-portable,
period -- whether or not you write the tail-padding. Different machines will
have different internal padding, even different data formats. For most
kinds of data, text (ASCII or EBCDIC) is the perfect solution. Use fprintf()
or get fancy with a library package. I think there are some public domain
ones. Somebody will also mention Sun's XDR, which may work well for you,
particularly if your machine uses the same data-representations that Sun
machines do.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/26/90)

In article <14309@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
> When you write structures directly to disc, it is non-portable,
> period -- whether or not you write the tail-padding. Different machines will
> have different internal padding, even different data formats.

Sometimes you have to manually swap memory to disk for efficiency.
You're never going to use the temporary file on another machine, so the
format doesn't have to be portable. All you need is a guarantee from the
OS that memory can be written out and read back in without corruption.

I agree that the original problem is best solved with a text format.

---Dan

bright@nazgul.UUCP (Walter Bright) (10/31/90)

In article <1229@sun13.scri.fsu.edu> mayne@sun10.scri.fsu.edu (William (Bill) Mayne) writes:
<The $64K question is: How do I portably get the actual size of
<a structure without padding when I need that rather than
<what sizeof() tells me? I have some methods that will work,
<but I am not sure how portable they'd be and I hope someone
<has a better way.

What you need is to turn off the alignment for a struct. There is no
portable way to do it. Compilers typically provide a command line switch
or a pragma to turn off alignment.

If you want to guarantee that alignment is off for a particular struct,
try this:

	struct ABC { int a; char c; };
	...
	assert(sizeof(struct ABC) == sizeof(int) + sizeof(char));

jacob@latcs1.oz.au (Jacob L. Cybulski) (11/05/90)

Being relatively new to C, I found a bit of a problem with a THINK C program
that goes more or less like this :-

typedef unsigned char Pattern[8];

void foo (Pattern x)
{
/* 0 */	printf("%d\n", sizeof(Pattern);			/* prints 8 */
/* 1 */	printf("%d\n", sizeof(x));			/* prints 4 */
/* 2 */	printf("%d\n", sizeof(*x));			/* prints 1 */
/* 3 */	printf("%d\n", sizeof((Pattern) x);		/* illegal */
/* 4 */	printf("%d\n", sizeof(*((Pattern *) x));	/* prints 8 */
}

The intuition says that sizeof(Pattern) = sizeof(x) regardless of the
Pattern definition. Well, not in C, if Pattern is an array then it is
implemented as a pointer so sizeof(x) is a pointer length (case 1), *x is an
address of the first array element so its length is that of its elements
(case 2), the cast of x into its type is illegal possibly because x is
implicitly defined as a (Pattern *) (case 3), finally a convoluted casting
and redirection gives you the right answer (case 4). Sure, it would be simple
and "portable" to say sizeof(Pattern), but if you did use sizeof(x) then
your function becomes context dependent and you cannot rely on the
macro-expanded code anymore.

Now is it the fault of my compiler (THINK C) to give me such a hard time,
is it my bad C programming style, or is it the ANSI standard which has some
gaping semantic holes?

Any answers?

Jacob L. Cybulski

Amdahl Australian Intelligent Tools Program
Department of Computer Science
La Trobe University
Bundoora, Vic 3083, Australia

Phone: +613 479 1270
Fax:   +613 470 4915
Telex: AA 33143
EMail: jacob@latcs1.oz.au

karl@ima.isc.com (Karl Heuer) (11/05/90)

In article <9156@latcs1.oz.au> jacob@latcs1.oz.au (Jacob L. Cybulski) writes:
>	typedef unsigned char Pattern[8];
>	void foo (Pattern x) { ... }

This is related to the pointer-vs-array stuff in the FAQ, which you should
read if you haven't already, but I'll post this answer which is in terms of
your specific example.

The typedef just disguises things; this is equivalent to
	void foo(unsigned char x[8]) { ... }
which, it can be argued, is logically incorrect since C does not allow arrays
to be passed by value.  However, tradition and the ANSI Standard say that this
should be accepted anyway, and *silently rewritten* as
	void foo(unsigned char *x) { ... }
since that's `probably' what the user really meant.  (A better alternative, in
my opinion, would have been to simply forbid array-parameter declarations in
function prototypes until such time as array copy is legalized.)

>the cast of x into its type is illegal possibly because x is implicitly
>defined as a (Pattern *) (case 3),

No, there are no `Pattern *' expressions in your code.  The type of x is
`unsigned char *' even if you thought you declared it as `Pattern'.  (This
rewrite is *only* true of formal parameters, and does not apply to any other
kind of declaration.)  You get an error because it's illegal to cast anything
to an array type.

To avoid this type of confusion, I recommend:

(a) *NEVER* declare formal arguments of array type; always use pointer syntax.
    (Some people like to use pointer vs. array syntax to distinguish how they
    intend to use the object, but this causes more confusion than it's worth,
    and besides it doesn't generalize to non-parameter declarations.)

(b) If you want to use opaque types via typedefs, pass them in by reference
    (`Pattern *x' would have produced more consistent results, and is one of
    the few reasonable uses for a pointer-to-entire-array).

(c) If you might have to port to a pre-ANSI implementation that doesn't
    believe in array pointers, then don't use arrays in typedefs either.
    Enclose it in a struct first:
	typedef struct { unsigned char c[8]; } Pattern;
	... x->c[i]; ... /* was x[i] */

>Now is it the fault of my compiler (THINK C) to give me such a hard time,
>is it my bad C programming style, or is it the ANSI standard which has some
>gaping semantic holes?

The compiler is innocent.  I can't really call your style `bad' without
insulting several expert contributors to this group who also happen to use it,
but I always disrecommend it since it does lead to this common pitfall.  I
would classify it as a mistake in Classic C, a botch which X3J11 unfortunately
chose to repeat rather than fix.  (For more details on a possible fix, see
alt.lang.cfutures.)

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

pfalstad@phoenix.Princeton.EDU (Paul John Falstad) (11/05/90)

In article <9156@latcs1.oz.au> jacob@latcs1.oz.au (Jacob L. Cybulski) writes:
>typedef unsigned char Pattern[8];
>
>void foo (Pattern x)
>{
>/* 0 */	printf("%d\n", sizeof(Pattern));			/* prints 8 */
>/* 1 */	printf("%d\n", sizeof(x));			/* prints 4 */
>/* 2 */	printf("%d\n", sizeof(*x));			/* prints 1 */
>/* 3 */	printf("%d\n", sizeof((Pattern) x);		/* illegal */
>/* 4 */	printf("%d\n", sizeof(*((Pattern *) x)));	/* prints 8 */

(I added some missing parens)

>The intuition says that sizeof(Pattern) = sizeof(x) regardless of the
>Pattern definition.

This is true, if x is of type Pattern.  It isn't in this case, although
it looks like it.

>Well, not in C, if Pattern is an array then it is
>implemented as a pointer so sizeof(x) is a pointer length (case 1), *x is an

Not quite.  Arrays used as function arguments are implemented as
pointers.  void foo (Pattern x) is equivalent to void foo (char *).

>address of the first array element so its length is that of its elements
>(case 2), the cast of x into its type is illegal possibly because x is
>implicitly defined as a (Pattern *) (case 3), finally a convoluted casting

x is implicitly defined as a char *.  You can't cast something to an
array.

>Now is it the fault of my compiler (THINK C) to give me such a hard time,
>is it my bad C programming style, or is it the ANSI standard which has some
>gaping semantic holes?

The ANSI standard is giving you these results.  I don't consider them
gaping semantic holes.  As long as you remember that whenever you pass
an array as a function argument, a pointer is actually passed, the above
all makes sense.  To make it clear, you should probably declare foo as
void foo(char x[]) or void foo (char *x).  That style also emphasizes
the fact that the called function just gets a pointer to the first
element of the array, and therefore cannot possibly know the size of the
array, so sizeof will not work.

--
Paul Falstad, pfalstad@phoenix.princeton.edu PLink:HYPNOS GEnie:P.FALSTAD
"Your attention, please.  Would anyone who knows where the white courtesy
phone is located please pick up the white courtesy phone."

chris@mimsy.umd.edu (Chris Torek) (11/05/90)

In article <9156@latcs1.oz.au> jacob@latcs1.oz.au (Jacob L. Cybulski)
asks why, given the quoted code, sizeof(Pattern) != sizeof(x):
>typedef unsigned char Pattern[8];
>void foo(Pattern x) {
>/* 0 */	printf("%d\n", sizeof(Pattern);			/* prints 8 */
>/* 1 */	printf("%d\n", sizeof(x));			/* prints 4 */

>The intuition says that sizeof(Pattern) = sizeof(x) regardless of the
>Pattern definition.

Almost true---but *not* regardless of the `x' definition.  (Incidentally,
your example is clearly not out of a working program, as it has a missing
close parenthesis.)

>Well, not in C, if Pattern is an array then it is implemented as a
>pointer so sizeof(x) is a pointer length ....

Bad referent for `it' (core dumped) :-)

In C, there is no such thing as an array parameter.  In a language much
like C, but more strict, compilers would be required to reject any
parameter declaration whose type is `array N of T', for any N and T.
But the C language grants explicit rope with which you may hang yourself:
you may declare any parameter as an array, and the compiler will say
`Oh, that's not an array N of T, that's a pointer to T.  I'll just tidy
up here and pretend you wrote ``pointer to T''.'

But there *are* array objects, and any other declaration of type
`array N of T' really does mean `array N of T'.

Since `typedef' does not make a new type, but rather a synonym for an
old type, `typedef unsigned char Pattern[8];' simply makes `Pattern'
a synonym for `array 8 of unsigned char'.  The compiler sees your
function definition as

	Declare foo as a function returning void, in which
	there is one argument whose type is `array 8 of unsigned
	char'.  Define the function with the contents of the block.

The compiler does its bit with the type of the argument and changes
this to:

	Declare foo as a function returning void, in which
	there is one argument whose type is `pointer to unsigned
	char'.

Thus, you have not actually written `void foo(Pattern x)'.  You have
actually written `void foo(unsigned char *x)'.

>/* 2 */	printf("%d\n", sizeof(*x));			/* prints 1 */

>*x is an address of the first array element so its length is that
>of its elements

No: `x' is `pointer to unsigned char', so `*x' is `unsigned char', and
sizeof(unsigned char) is 1.

>/* 3 */	printf("%d\n", sizeof((Pattern) x);		/* illegal */

>the cast of x into its type is illegal possibly because x is
>implicitly defined as a (Pattern *)

No, the cast is illegal because there is no such thing as an array value,
and the result of a cast is a value, not an object.  It is therefore
illegal to use an array type (such as `array 8 of unsigned char', aka
`Pattern') in a cast.

>/* 4 */	printf("%d\n", sizeof(*((Pattern *) x));	/* prints 8 */

>finally a convoluted casting and redirection gives you the right answer

Sort of.  `x' is a pointer to unsigned char, and can be cast
to a different pointer type (with implementation-defined semantics for
the actual value produced, although here the value is not used and need
not even be generated by the compiler).  `Pattern *' is another name for
`pointer to array 8 of unsigned char'.  Indirecting (the first unary `*'
above) produces an object---the result of an indirection is always an
object, much as the result of a cast or address-of operator is never an
object---of type `array 8 of unsigned char'.  In a value context, this
object would be transformed into a value of type `pointer to unsigned
char', but unlike an actual function parameter, sizeof() does not call
for a value.  Thus, the object is in object context and remains an array,
and hence has size 8*sizeof(unsigned char) or 8*1 or 8.

>Any answers?

Never use array parameters. :-)

More seriously: for fixed opaque types that happen to be implementable
as an array, you are often better off making the type be a structure
whose only member is an array.  Instead of

	typedef unsigned char Pattern[8];

write

	typedef struct Pattern { unsigned char p_foo[8]; } Pattern;

You now have a compound object which acts more like simple objects,
rather than one that displays the peculiarity of changing type (and
hence also value) whenever it appears in a value context.
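
A minimal sketch of the difference this makes; with the struct version,
sizeof(Pattern) and sizeof(x) agree (8 on most machines, since a struct
holding only an unsigned char [8] rarely needs padding):

	#include <stdio.h>

	typedef struct Pattern { unsigned char p_foo[8]; } Pattern;

	void foo(Pattern x)          /* x really is a Pattern, passed by value */
	{
		printf("%lu %lu\n",
		       (unsigned long)sizeof(Pattern),
		       (unsigned long)sizeof(x));
	}

	int main(void)
	{
		Pattern p = { { 0 } };
		foo(p);
		return 0;
	}
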
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

dans@microsoft.UUCP (Dan SPALDING) (11/06/90)

In article <9156@latcs1.oz.au> jacob@latcs1.oz.au (Jacob L. Cybulski) writes:
-Being relatively new to C, I found a bit of a problem with a THINK C program
-that goes more or less like this :-
-
-typedef unsigned char Pattern[8];
-
-void foo (Pattern x)
-{
-/* 0 */	printf("%d\n", sizeof(Pattern);			/* prints 8 */
-/* 1 */	printf("%d\n", sizeof(x));			/* prints 4 */
-/* 2 */	printf("%d\n", sizeof(*x));			/* prints 1 */
-/* 3 */	printf("%d\n", sizeof((Pattern) x);		/* illegal */
-/* 4 */	printf("%d\n", sizeof(*((Pattern *) x));	/* prints 8 */
-}
-
In C, the base of an array is synonymous with a pointer to an element (the
base element) of that array.  Therefore, Pattern is 8 bytes, x is 4 (the size
of all pointers on the Mac) and *x will be 1 byte (an unsigned char).


Hope this helps.




-- 
------------------------------------------------------------------------
dan spalding -- microsoft systems/languages -- microsoft!dans
"there has not been a word invented that would convey my indifference to
that remark." - paraphrase from hawkeye pierce

chris@mimsy.umd.edu (Chris Torek) (11/07/90)

In article <58808@microsoft.UUCP> dans@microsoft.UUCP (Dan SPALDING) writes:
>In C, the base of an array is synonymous with a pointer to an element (the
>base element) of that array.

Not exactly.  A precise definition is:

   Given an object of type `array N of T' placed in a value context, a
   C system must produce a value of type `pointer to T' which points
   to the first (0th) element of that array.

`Array' and `pointer' are completely different, but one happens to be
converted to the other IN VALUE CONTEXTS.  (A `value context' is any
place where you need the value of an expression.  Thus, given

	a = b;

`b' is in a value context, but `a' is not, because this puts the value
of b into a, but says nothing about the value of a.)  C happens to have
about three contexts which are not value contexts: declarations, left
hand side of assignment operators (including `i' in `++i' and `i++'),
and following `sizeof'.
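
A short illustration of the distinction (the declarations are made up):

	#include <stdio.h>

	int main(void)
	{
		char buf[10];
		char *p;

		p = buf;        /* value context: buf becomes a pointer to buf[0] */

		printf("%lu\n", (unsigned long)sizeof buf);  /* 10: no conversion */
		printf("%lu\n", (unsigned long)sizeof p);    /* size of a pointer */
		return 0;
	}
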
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.brl.mil (Doug Gwyn) (11/07/90)

In article <9156@latcs1.oz.au> jacob@latcs1.oz.au (Jacob L. Cybulski) writes:
>typedef unsigned char Pattern[8];
>void foo (Pattern x)

Pattern is an array of 8 chars.  x is a pointer to a char.  This is the
way C has been for years.

>The intuition says that sizeof(Pattern) = sizeof(x) regardless of the
>Pattern definition.

Why are you relying on intuition rather than the language definition?

>Now is it the fault of my compiler (THINK C) to give me such a hard time,
>is it my bad C programming style, or is it the ANSI standard which has some
>gaping semantic holes?

I think the problem is your intuition.  THINK C is producing correct results.

volpe@camelback.crd.ge.com (Christopher R Volpe) (11/08/90)

In article <27498@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
|>`b' is in a value context, but `a' is not, because this puts the value
|>of b into a, but says nothing about the value of a.)  C happens to have
|>about three contexts which are not value contexts: declarations, left
|>hand side of assignment operators (including `i' in `++i' and `i++'),
|>and following `sizeof'.

Also, following "&".

|>-- 
|>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
|>Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
                        
==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com

chris@mimsy.umd.edu (Chris Torek) (11/08/90)

>In article <27498@mimsy.umd.edu> I wrote:
>>C happens to have about three contexts which are not value contexts:
>>declarations, left hand side of assignment operators (including `i'
>>in `++i' and `i++'), and following `sizeof'.

In article <13504@crdgw1.crd.ge.com> volpe@camelback.crd.ge.com
(Christopher R Volpe) writes:
>Also, following "&".

Oops.

(I had the feeling I was forgetting at least one, hence the `about three'
in the original text.)  Have we forgotten any more? :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (11/09/90)

While we're on the subject of sizeof surprises, here is one that did
genuinely throw me off:

     main() {
        char c;
        c = 'c';     /* char variable holds char value */

        printf("sizeof c       = %d\n", sizeof c);
        printf("sizeof 'c'     = %d\n", sizeof 'c');
        printf("sizeof (char)  = %d\n", sizeof (char));
     }

Non-C wizards should try to guess what the program will print before
they run the program.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

pfalstad@stroke.Princeton.EDU (Paul John Falstad) (11/09/90)

In article <2665@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>        char c;
>        c = 'c';     /* char variable holds char value */
>        printf("sizeof c       = %d\n", sizeof c);
>        printf("sizeof 'c'     = %d\n", sizeof 'c');
>        printf("sizeof (char)  = %d\n", sizeof (char));

It prints:

sizeof c = 1
sizeof 'c' = 4
sizeof (char) = 1

Which is no real surprise if you know that 'c' is not a char.
Character literals are really ints.

--
Paul Falstad, pfalstad@phoenix.princeton.edu PLink:HYPNOS GEnie:P.FALSTAD
"Your attention, please.  Would anyone who knows where the white courtesy
phone is located please pick up the white courtesy phone."

friedl@mtndew.Tustin.CA.US (Stephen Friedl) (11/09/90)

In article <2665@cirrusl.UUCP>, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
> While we're on the subject of sizeof surprises, here is one that did
> genuinely throw me off:
> 
>		/* other stuff deleted */
>
>         printf("sizeof (char)  = %d\n", sizeof (char));

Hey, if you're using LPI C on the UNIX PC, it prints zero!

     Stephen

-- 
Stephen J. Friedl, KA8CMY / I speak for me only / Tustin, CA / 3B2-kind-of-guy
+1 714 544 6561  / friedl@mtndew.Tustin.CA.US  / {uunet,attmail}!mtndew!friedl

LPI C is for people who don't actually intend to write software

darcy@druid.uucp (D'Arcy J.M. Cain) (11/10/90)

In article <2665@cirrusl.UUCP> Rahul Dhesi writes:
> [...]
>     main() {
>        char c;
>        c = 'c';     /* char variable holds char value */
>
>        printf("sizeof c       = %d\n", sizeof c);
>        printf("sizeof 'c'     = %d\n", sizeof 'c');
>        printf("sizeof (char)  = %d\n", sizeof (char));
>     }
>
>Non-C wizards should try to guess what the program will print before
>they run the program.

OK, I'll byte.  What's the surprise?  I get 1, 4 and 1 on my 386 system
which is exactly what I expected.  Were you surprised that "'c'" was 4?
You shouldn't be.  That expression evaluates to an int, not a char.

-- 
D'Arcy J.M. Cain (darcy@druid)     |
D'Arcy Cain Consulting             |   I support gun control.
West Hill, Ontario, Canada         |   Let's start with the government!
+ 416 281 6094                     |

avery@netcom.UUCP (Avery Colter) (11/11/90)

sizeof(confusion) == worldwide.

-- 
Avery Ray Colter    {apple|claris}!netcom!avery  {decwrl|mips|sgi}!btr!elfcat
(415) 839-4567   "Fat and steel: two mortal enemies locked in deadly combat."
                                     - "The Bending of the Bars", A. R. Colter

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (11/14/90)

>>        printf("sizeof c       = %d\n", sizeof c);
>>        printf("sizeof 'c'     = %d\n", sizeof 'c');

>Were you surprised that "'c'" was 4?
>You shouldn't be.  That expression evaluates to an int, not a char.

If characters are promoted to ints in expressions, then why isn't
|sizeof c| equivalent to |sizeof (int) c|?  The confusion arises
because the term "expression" is defined differently in the definition
of C and in colloquial conversation.

From my point of view -- call it naive if you will -- anything
that has a value is an "expression".  Therefore, if |c| as used above
has a value, it's an expression.  Therefore |c| must be promoted to
int.  Therefore |sizeof c| is equivalent to |sizeof (int) c|.  Hence
the surprise.

I'm sure K&R, H&S, and the ANSI standard all define these things in
various places.  But they can only guarantee what things mean, not
whether the meanings they define will surprise programmers.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

asd@cbnewsj.att.com (Adam S. Denton) (11/14/90)

In article <2692@cirrusl.UUCP>, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
> If characters are promoted to ints in expressions, then why isn't
> |sizeof c| equivalent to |sizeof (int) c|?  The confusion arises
> because the term "expression" is defined differently in the definition
> of C and in colloquial conversation.

You take sizeof() of an object, not an expression.

> From my point of view -- call it naive if you will -- anything
> that has a value is an "expression".  Therefore, if |c| as used above
> has a value, it's an expression.  Therefore |c| must be promoted to
> int.  Therefore |sizeof c| is equivalent to |sizeof (int) c|.  Hence
> the surprise.

Your view is nice, but is not C.

> I'm sure K&R, H&S, and the ANSI standard all define these things in
> various places.  But they can only guarantee what things mean, not
> whether the meanings they define will surprise programmers.

It is the responsibility of the programmer to know the language
and use it properly, whether the language is perfect or not.
A hammer works quite well driving nails.  It's not the hammer's
fault if I have difficulty drilling holes with it.  C works
quite well with sizeof() and promotions defined the way they are.
Perhaps they could be better, but widely-available C compilers
do them the current way -- for good or ill.  If I want to avoid
writing my own compiler and my own non-portable code, I had better
code using what's out there and accepted NOW.  So (IMHO) no one
should EVER be "surprised" at C.  If they are, then they haven't
read and/or understood the manual or the Standard or whatever applies.
To use a tool properly, you must know how to do so.  C is a tool.

Adam Denton
asd@mtqua.att.com

gwyn@smoke.brl.mil (Doug Gwyn) (11/14/90)

In article <2692@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>If characters are promoted to ints in expressions, ...

But they aren't always.  It depends on the context.
The same is true for conversion of array name to pointer.

>I'm sure K&R, H&S, and the ANSI standard all define these things in
>various places.  But they can only guarantee what things mean, not
>whether the meanings they define will surprise programmers.

A good way to avoid being surprised is to learn what the things mean
before attempting to use them.

frank@grep.co.uk (Frank Wales) (11/15/90)

In article <2692@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>If characters are promoted to ints in expressions, then why isn't
>|sizeof c| equivalent to |sizeof (int) c|?  The confusion arises
>because the term "expression" is defined differently in the definition
>of C and in colloquial conversation.

Not really, since sizeof operates on objects, rather than expressions,
although the object may be defined by an expression.  For example,
  sizeof (int)c   returns the size of the (anonymous) object which is the
result of evaluating the expression  (int)c .  If   sizeof c  operated on the
result of the expression  c  rather than the object named by c, then
sizeof could not be used to find out the size of the object c itself.
Some quoting mechanism would then be needed, and things might end up
syntactically more complex than they currently are.

>From my point of view -- call it naive if you will -- anything
>that has a value is an "expression".

Okay; "it's naive". :-)  Although expressions have values, that doesn't
mean that all values are the result of an expression.  There are times
when it must be possible to ask questions about objects themselves, rather
than the values they normally evaluate to.  It just so happens that
sizeof makes this distinction by virtue of its definition.  This
is also why it must be a part of the language itself, rather than an
external function.

>Therefore, if |c| as used above
>has a value, it's an expression.  Therefore |c| must be promoted to
>int.  Therefore |sizeof c| is equivalent to |sizeof (int) c|.  Hence
>the surprise.

The logic is sound, but the premise is faulty.

>But they can only guarantee what things mean, not
>whether the meanings they define will surprise programmers.

Ah, now you're talking about the interaction of people and ideas, at
which point the psychology of programming becomes the subject, and
so we must leave comp.lang.c ...
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303

gwyn@smoke.brl.mil (Doug Gwyn) (11/16/90)

In article <1990Nov14.154213.27324@cbnewsj.att.com> asd@cbnewsj.att.com (Adam S. Denton) writes:
>You take sizeof() of an object, not an expression.

Not so; there are two distinct uses of sizeof:
	sizeof unary_expression
	sizeof ( type_name )

The former does operate on an expression; however the expression is not
evaluated (so no side effects occur), only the type of the expression is
relevant.

A particularly simple form of unary expression consists of just the
identifier for an object, but more complex expressions are permitted.
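
For instance (the declarations are just examples):

	#include <stddef.h>

	double d;
	size_t a = sizeof d;        /* expression form: parentheses optional */
	size_t b = sizeof(d);       /* parenthesized expression, same thing */
	size_t c = sizeof(double);  /* type-name form: parentheses required */
	size_t e = sizeof d + 1;    /* binds tightly: (sizeof d) + 1 */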

>Your [dhesi's] view is nice, but is not C.
>[...]
>It is the responsibility of the programmer to know the language
>and use it properly, whether the language is perfect or not.

Now that you got right.

scjones@thor.UUCP (Larry Jones) (11/16/90)

In article <1990Nov14.154213.27324@cbnewsj.att.com>, asd@cbnewsj.att.com (Adam S. Denton) writes:
> You take sizeof() of an object, not an expression.

Sorry, wrong answer.  sizeof is an operator just like +, %, or -> --
its operand most certainly IS an expression.  sizeof (a * b / c) is
every bit as valid as sizeof a (or sizeof(a) for those who think it's
some kind of magic function rather than an operator).  The key point
(which someone else has already mentioned) is that expressions do not
undergo any conversions simply because they are expressions; rather,
the operands of each operator are converted according to the rules for
the particular operator.  So, although most operators convert an operand
which is an identifier into the corresponding value (i.e. convert an
lvalue into an rvalue), sizeof does not.
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
I hope Mom and Dad didn't rent out my room. -- Calvin

scs@adam.mit.edu (Steve Summit) (11/16/90)

In article <2692@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>>>        printf("sizeof 'c'     = %d\n", sizeof 'c');
>
>>Were you suprised that "'c'" was 4?
>>You shouldn't be.  That expression evaluates to an int, not a char.
>
>If characters are promoted to ints in expressions, then why isn't
>|sizeof c| equivalent to |sizeof (int) c|?

As Doug has pointed out, the Standard contains exhaustive
specifications of exactly when the "usual arithmetic conversions"
(i.e. promoting char to int) are carried out.

Though it's an ill-defined term (appearing nowhere in the Standard)
I find it useful to think of the notion of "dereference time."
(Please, let's not restart Usenet Standard Discussion #12671 on
the word "dereference.")  "Dereference time" is when you might
need the value of something, i.e. when you're doing arithmetic on
it (or, for pointers, using the unary * operator), but _not_ when
it's on the left-hand-side of the assignment operator, or the
operand of unary &, or the operand of sizeof.  "Dereference time"
is when you have to "look inside" an identifier, to pull out its
value.

(It is instructive to consider how differently the identifiers a
and b are handled in a simple assignment expression like "a = b".
b's value is fetched, and in fact this could be done fairly early
in the course of evaluation, but "a" must remain as a location%% --
an "lvalue" -- so we'll know where to store the value of the
right-hand-side when we're through evaluating it.  Fetching a's
value early in evaluation, or in fact at any point, would be a
completely counterproductive thing to do.)

It is at "dereference time" that the usual arithmetic conversions
are applied.  "Dereference time" is also when arrays decay into
pointers.
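
A quick way to watch the conversion happen (the second number is simply
sizeof(int) on your machine):

	#include <stdio.h>

	int main(void)
	{
		char c = 'c';

		/* no "dereference": c is not converted, so this is always 1 */
		printf("%lu\n", (unsigned long)sizeof c);

		/* the + needs c's value, so the usual promotion to int happens
		   and this is sizeof(int) */
		printf("%lu\n", (unsigned long)sizeof (c + 0));
		return 0;
	}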

                                            Steve Summit
                                            scs@adam.mit.edu

(Now I'll get seventeen responses poking holes in this
"dereference time" notion and noting situations in which it's
incorrect or misleading.  I _said_ it was an ill-defined term,
and I'm only suggesting using it conceptually.  I suppose I
should think of a few exceptions first and mention them, to
forestall followups.)

%% The term "lvalue" takes its name from "left hand side," not
"location," and referring to "a," or any lvalue, as a "location"
can be misleading.  The currently Official definition of lvalue
is "an expression that designates an object" (ANSI section
3.2.2.1).  Note that "dereferencing" turns an lvalue into an
rvalue.

danm@hpnmdla.HP.COM (Dan Merget) (11/16/90)

In comp.lang.c, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:

> >>        printf("sizeof c       = %d\n", sizeof c);
> >>        printf("sizeof 'c'     = %d\n", sizeof 'c');

> >Were you surprised that "'c'" was 4?
> >You shouldn't be.  That expression evaluates to an int, not a char.
                      ^^^^
> If characters are promoted to ints in expressions, then why isn't
> |sizeof c| equivalent to |sizeof (int) c|?  The confusion arises
> because the term "expression" is defined differently in the definition
> of C and in colloquial conversation.
                                                                      ********
                                                    ****  ****  ****  *      *
no no no No No No NO NO NO NO NO NO *NO* *NO* *NO*  *NO*  *NO*  *NO*  * NO!! *
                           ^^ ^^ ^^  ^^   ^^   ^^   ****  ****  ****  *      *
                                                                      ********

The confusion has nothing to do with the definition of "expression", and
nothing is getting promoted.  The poster did not say "All chars are promoted
to ints in expressions"; he said "That expression ["'c'"] evaluates to an int".

Remember, a "character" in c is simply the integer which corresponds to the
ASCII (or EBCDIC, etc) representation of that character.  When storing an array
of characters, you should use the 8-bit "char" type.  However, when you are
dealing with an individual character, you will get better milage out of the
"int" type.  In recognition of this fact, A CHARACTER BETWEEN TWO SINGLE
QUOTES IS AN **INT**!  EVEN THOUGH THEY CALL IT A "CHARACTER CONSTANT", IT IS
**NOT** AN 8-BIT CHAR!  (Exception: some compiler optimizations)
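
A short demonstration; the character code shown assumes ASCII, and the size
of int is implementation-dependent:

	#include <stdio.h>

	int main(void)
	{
		/* 'c' is an int whose value is the character's code */
		printf("%d\n", 'c');				/* 99 in ASCII */
		printf("%lu\n", (unsigned long)sizeof 'c');	/* sizeof(int), e.g. 4 */
		printf("%lu\n", (unsigned long)sizeof (char));	/* 1 by definition */
		return 0;
	}
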
       _
      | |   /\           ________________________________________________
    __| |__/ /_/\____/\__\   This note is not intended to represent the  \
   / _  |_  __/  \__/  \___  viewpoints of my god, my country, my company,\
  / / | |/ / / /\ \/ /\ \  \ or my family.  I'm not even certain that      \
  \ \_| |\ \/\ \ \  / / /  / it accurately represents my own.              /
   \____| \__/\/  \/  \/  /_______________________________________________/

    Dan Merget
    danm@hpnmdla.HP.COM

ch@dce.ie (Charles Bryant) (11/16/90)

In article <14462@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>...
> there are two distinct uses of sizeof:
>	sizeof unary_expression
>	sizeof ( type_name )

That must make parsing it more difficult than if the parentheses were always
required:
	sizeof (type_name) - 1
since in other contexts "(type_name) - 1" would be an expression containing
a cast.
-- 
Charles Bryant (ch@dce.ie)
--
/usr/ch/.signature: Block device required

henry@zoo.toronto.edu (Henry Spencer) (11/17/90)

In article <1990Nov16.141352.22426@dce.ie> ch@dce.ie (Charles Bryant) writes:
>> there are two distinct uses of sizeof:
>>	sizeof unary_expression
>>	sizeof ( type_name )
>
>That must make parsing it more difficult than if the parentheses were always
>required:
>	sizeof (type_name) - 1
>since in other contexts "(type_name) - 1" would be an expression containing
>a cast.

Requiring parentheses wouldn't make any difference, since it's still legal
to wrap a unary expression in parentheses, so you can't tell from the
opening parenthesis which form is coming.  What *would* be useful is if
casts began with something that *couldn't* begin a unary expression.
As it is, either you need to sneak a peek a bit further ahead, or you
get to write some contorted code that swallows the parenthesis and then
decides, based on the next token, whether it has cast-minus-leading-paren
or parenthesized-expression-minus-leading-paren.  Definitely a nuisance,
and it's not the only such case in C.
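
The classic illustration, with a made-up typedef name T and an ordinary
variable t: the same-looking source text parses two entirely different ways
depending on what the identifier was declared to be.

	typedef int T;

	int f(void)
	{
		int t = 5;
		int a = (T) - 1;	/* a cast: the int value -1     */
		int b = (t) - 1;	/* a subtraction: 5 - 1, i.e. 4 */
		return a + b;
	}
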
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

gwyn@smoke.brl.mil (Doug Gwyn) (11/17/90)

In article <4040002@hpnmdla.HP.COM> danm@hpnmdla.HP.COM (Dan Merget) writes:
>The confusion has nothing to do with the definition of "expression" ...

Especially note that
	'c'
is NOT a "'...'" operator being applied to a "c" operand!  It's a single
unanalyzable entity in its own right, known as a "character constant".

gwyn@smoke.brl.mil (Doug Gwyn) (11/17/90)

In article <1990Nov16.141352.22426@dce.ie> ch@dce.ie (Charles Bryant) writes:
>> there are two distinct uses of sizeof:
>>	sizeof unary_expression
>>	sizeof ( type_name )
>That must make parsing it more difficult than if the parentheses were always
>required ...

Yes, but this is necessary since C has "always" (actually, just for a long
time) had those two distinct usages of "sizeof".

Note that most published precedence tables for C operators, including K&R's,
get this wrong.  Using an example similar to the one you gave:

	sizeof ( short ) - 1
K&R's chart (p. 49 or 53, depending on edition) would have one parse this as
	sizeof ((short) (- 1))
whereas the only valid parse according to the C standard (and all existing
compilers of which I am aware) is
	(sizeof (short)) - 1
which has a different value.
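
Anyone who wants to convince themselves can run a two-line test (the
numbers depend on sizeof(short), commonly 2):

	#include <stdio.h>

	int main(void)
	{
		/* parsed as (sizeof (short)) - 1, e.g. 2 - 1 == 1 */
		printf("%lu\n", (unsigned long)(sizeof (short) - 1));

		/* the parse K&R's chart suggests: sizeof ((short) -1), e.g. 2 */
		printf("%lu\n", (unsigned long)sizeof ((short) -1));
		return 0;
	}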