[comp.sys.atari.st] sizeof

dillon@CORY.BERKELEY.EDU.UUCP (04/12/87)

>>In doing this I have found two (more!) bugs in Megamax:-
>>
>>  sizeof("anystring") returns 4 (the size of the pointer!!!!), not 10.
>
>Excuse me, but this doesn't sound like a bug...  If you take a look at
>K&R page 94, "In fact, any reference to an array is converted by the
>compiler to a pointer to the beginning of the array."  This means you're
>taking sizeof a pointer, which is in fact 4...(Remember, in C a string
>is no different from a null terminated array of characters.)


	Wrong, it should be 10.  Remember that sizeof() is special when it
comes to arrays.  sizeof() always returns the effective storage.

	char array[32];
	char *ptr;

	sizeof(array)		-result had better be 32

	The same goes for strings...

	sizeof("hello")		-result had better be 6 (it includes the \0)

	But pointers, on the other hand...

	sizeof(ptr)		-result is 4
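
	A minimal compilable sketch of the three cases above (the expected
values in the comments assume 4-byte pointers, as on the ST; the names are
just for illustration):

#include <stdio.h>

char array[32];
char *ptr;

int main(void)
{
    /* an array expression keeps its full storage size under sizeof */
    printf("%d\n", (int)sizeof(array));      /* 32 */
    /* a string literal is an array of char, terminator included */
    printf("%d\n", (int)sizeof("hello"));    /* 6 */
    /* a pointer is just a pointer, no matter what it points at */
    printf("%d\n", (int)sizeof(ptr));        /* 4, assuming 4-byte pointers */
    return 0;
}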


			-Matt

burris@ihuxz.UUCP (04/14/87)

In article <8704120143.AA11235@cory.Berkeley.EDU>, dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
> >>In doing this I have found two (more!) bugs in Megamax:-
> >>
> >>  sizeof("anystring") returns 4 (the size of the pointer!!!!), not 10.
> >
> >Excuse me, but this doesn't sound like a bug...  If you take a look at
> 
> 	Wrong, it should be 10.  Remember that sizeof() is special when it
> comes to arrays.  sizeof() always returns the effective storage.
> 
> 	char array[32];
> 	char *ptr;
> 
> 	sizeof(array)		-result had better be 32
> 
> 	The same goes for strings...
> 
> 	sizeof("hello")		-result had better be 6 (it includes the \0)
> 
> 	But pointers, on the other hand...
> 
> 	sizeof(ptr)		-result is 4
> 
> 
> 			-Matt


The first response was correct. That's why there's a strlen() function.
Now if it were as follows:

char	string[ 20 ];
char	*strpt = "hello";

main()
{
	printf( "%d\n", sizeof( string ) );
	printf( "%d\n", sizeof( strpt ) );
	printf( "%d\n", sizeof( "hello" ) );
}

the result would be:

20
4
4

Dave Burris
ihnp4!ihuxz!burris

bds@mtgzz.UUCP (04/14/87)

I strongly recommend using either of the constructs:

static char xyz[] = "string";
/* at some point later in your code */
... sizeof xyz ... /* no overhead in using the symbol: xyz */

OR

... sizeof (char *) ...

Depending on what result you are looking for.

Note: (sizeof xyz) == (strlen(xyz) + 1) only in THIS case.  After all, I could
set xyz[0] = '\0', and then the equality would no longer hold.
Remember: sizeof does not generate any run time code (unlike strlen());
it is evaluated at compile time.
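
A small compilable illustration of that difference (the array name xyz is
taken from the example above):

#include <stdio.h>
#include <string.h>

static char xyz[] = "string";

int main(void)
{
    /* sizeof is a compile-time constant: the full storage, '\0' included */
    printf("sizeof xyz  = %d\n", (int)sizeof xyz);    /* 7 */
    /* strlen is a run-time call: characters before the first '\0' */
    printf("strlen(xyz) = %d\n", (int)strlen(xyz));   /* 6 */

    xyz[0] = '\0';
    /* sizeof is unchanged, but strlen now sees an empty string */
    printf("sizeof xyz  = %d\n", (int)sizeof xyz);    /* still 7 */
    printf("strlen(xyz) = %d\n", (int)strlen(xyz));   /* 0 */
    return 0;
}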

sansom@trwrb.UUCP (04/14/87)

In article <2006@ihuxz.ATT.COM> burris@ihuxz.ATT.COM (Burris) writes:
>bla, bla, bla, ...Now if it were as follows:
>char	string[ 20 ];
>char	*strpt = "hello";
>
>main()
>{
>	printf( "%d\n", sizeof( string ) );
>	printf( "%d\n", sizeof( strpt ) );
>	printf( "%d\n", sizeof( "hello" ) );
>}
>the result would be:
>20
>4
>4

(This is getting out of hand) - I just ran your program on our 4.2 BSD
system and came up with the following results:

20
4
6

Our compiler is of the "portable" variety, so I would think that it is
generating "correct" code.  Still, there clearly seems to be some fuzziness
out there regarding this type of "problem".  Some compilers treat a string
literal (e.g., "hello") as an array, and some treat it as a pointer to its
first character.  I guess that's one of the reasons an ANSI C standard is
in the works.

-Rich
-- 
   /////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  /// Richard E. Sansom                   TRW Electronics & Defense Sector \\\
  \\\ {decvax,ucbvax,ihnp4}!trwrb!sansom  Redondo Beach, CA                ///
   \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////

dillon@CORY.BERKELEY.EDU.UUCP (04/15/87)

:main()
:{
:	printf( "%d\n", sizeof( string ) );
:	printf( "%d\n", sizeof( strpt ) );
:	printf( "%d\n", sizeof( "hello" ) );
:}
:
:the result would be:
:
:20
:4
:4
:
:Dave Burris

	Sorry Dave, you're wrong.  Not only should sizeof("hello") be 6
theoretically, but every compiler I've ever used (and I have NOT used
Megamax) returns 6.  Just how long have you been programming in C?

					-Matt

pett@socrates.ucsf.edu.UUCP (04/15/87)

In article <2006@ihuxz.ATT.COM> burris@ihuxz.ATT.COM (Burris) writes:
|In article <8704120143.AA11235@cory.Berkeley.EDU>, dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
|> >>  sizeof("anystring") returns 4 (the size of the pointer!!!!), not 10.
|> >
|> >Excuse me, but this doesn't sound like a bug...  
|
|The first response was correct. That's why there's a strlen() function.
|Now if it were as follows:
|
|char	string[ 20 ];
|char	*strpt = "hello";
|
|main()
|{
|	printf( "%d\n", sizeof( string ) );
|	printf( "%d\n", sizeof( strpt ) );
|	printf( "%d\n", sizeof( "hello" ) );
|}
|
|the result would be:
|
|20
|4
|4
|
|Dave Burris
|ihnp4!ihuxz!burris

Unfortunately, if you'd bothered to actually run your program (as I did
on our VAX 8600) you would have noticed the actual result:

20 
4
6

				Eric Pettersen
				pett@cgl.ucsf.edu

jmg@cernvax.UUCP (04/15/87)

In article <2006@ihuxz.ATT.COM> burris@ihuxz.ATT.COM (Burris) writes:
>
>The first response was correct. That's why there's a strlen() function.
>Now if it were as follows:
>
>char	string[ 20 ];
>char	*strpt = "hello";
>
>main()
>{
>	printf( "%d\n", sizeof( string ) );
>	printf( "%d\n", sizeof( strpt ) );
>	printf( "%d\n", sizeof( "hello" ) );
>}
>
>the result would be:
>
>20
>4
>4

As the originator of all of this C(ontroversy) I thought that I ought
to see what happens on our BSD4.2 system. The output of the test program:


20
4
6

The moral is clear: either say what the test program DID produce (on
system xyz with compiler pqr) or risk dropping a clanger.

Moshe did suggest to me a trick which made sizeof work. I forget what
it was.

Two further clarifications on Megamax:

1. It is %* (the * field width) which is not accepted in printf
   format strings.

2. Megamax does not like #define statements stuck just anywhere.
   It appears that they must be at the start of the text (I am not
   sure of the exact restrictions, but a perfectly valid gnuplot #define
   of a symbol is rejected).

Also, no getenv for megamax!

burris@ihuxz.UUCP (04/15/87)

In article <1765@trwrb.UUCP>, sansom@trwrb.UUCP (Richard Sansom) writes:
> (This is getting out of hand) - I just ran your program on our 4.2 BSD
> system and came up with the following results:
> 
> 20
> 4
> 6
> 

char	string[ 20 ];
char	*strpt = "hello";

main()
{
	printf( "%d\n", sizeof( string ) );
	printf( "%d\n", sizeof( strpt ) );
	printf( "%d\n", sizeof( "hello" ) );
}

I stand corrected. This is the correct answer, provided that your machine
has a pointer size of 4 bytes. Evidently the compiler allocates the string
literal as:

char	string[ 6 ];

Sorry for the confusion. If I had taken the time to actually run the program
I posted, I would have seen that my answer was wrong.


Dave Burris
ihnp4!ihuxz!burris

CS112170@YUSOL.BITNET.UUCP (04/22/87)

Date:     Wed, 22 Apr 87 14:05 EDT
From:     <CS112170@YUSOL.BITNET>
Subject:  re: sizeof() dilemma
To:       info-atari16@su-score.stanford.edu
X-Original-To:  edu%"info-atari16@su-score.stanford.edu", CS112170


To whoever is interested....

A sizeof("abcdef") on a VAX 8600, running C V2.2
returns a value of * 7 *

Dave Pullara
CS112170@YUSOL (on bitnet)

braner@batcomputer.tn.cornell.edu (braner) (04/26/87)

[]

The 'trick' was (?): instead of sizeof("string constant") (fails in Megamax)
use sizeof(array name).  So in the example at hand:
	char prompt[] = "gnuplot>";
	#define PROMPT prompt
	...
	...sizeof(PROMPT)...	/* original code unchanged */
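
A compilable sketch of the workaround (the printf calls are just for
illustration; only the array declaration and the #define matter):

#include <stdio.h>

char prompt[] = "gnuplot>";   /* an array, so sizeof sees the full storage */
#define PROMPT prompt         /* existing references to PROMPT are unchanged */

int main(void)
{
    fputs(PROMPT, stdout);
    /* sizeof(PROMPT) is now the sizeof of an array name: 9, not 4 */
    printf(" (%d bytes)\n", (int)sizeof(PROMPT));
    return 0;
}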

A problem with Megamax is that there is no way to _just_ use the preprocessor.
Sometimes you might want to check out what's happening in some tricky macro
expansion, sometimes you might want to use the preprocessor for a purpose
unrelated to C (other languages, even word processing!).  (To defend Megamax:
they did it this way to keep the compiler small and fast.)  Another problem
is that the preprocessor is not documented in the Megamax docs _nor_ in the
C 'bible' (K&R) (not fully, anyway).

So: is there a PD C preprocessor (text->text) out there, preferably in C
source code form?

- Moshe Braner

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (11/15/89)

Thanks to everyone that replied to my question about sizeof(int),
both in news articles and by mail.

I wanted to know if there were any reasons,
other than making it easier to compile non-portable code,
for having sizeof(int)==sizeof(long) on the 16-bit Atari ST.

No one could come up with any other reason
(although maybe some think they did :-).

Below are excerpts from most of the responses.
If yours isn't there, it only means that I didn't disagree
with what you said.

If anyone disagrees with my comments below,
it is probably best to continue by mail or follow up to group
comp.lang.c since this topic is now about C programming and
not directly related to the ST.

====

> From: stephen@oahu.cs.ucla.edu (Steve Whitney)
> Organization: UCLA Computer Science Department
> 
> You can always use this simple hack:
> #define int short
> But you have to be careful with constants.  If you pass 3 as an argument
> to a routine expecting a 16 bit integer, it will pass 00 00 00 03 instead
> of 00 03.  To get around that, pass your constants as (short)3 instead.

No. For "func(int x)", or if there is no prototype in scope,
"func( (short)3 )" will convert 3 to a short, and then convert that
short back to an int before putting it on the stack.

The same thing will happen with non-constants too.
"func( (short)x )", if x is type char, short, or int, will convert the
value of x to a short, and then convert this possibly truncated value
back to an int before putting it on the stack.
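
A small sketch of that promotion, using a variadic function (its trailing
arguments always get the old no-prototype treatment; the names are made up):

#include <stdarg.h>
#include <stdio.h>

static void show(int count, ...)
{
    va_list ap;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        printf("%d\n", va_arg(ap, int));  /* read back as int: it was widened */
    va_end(ap);
}

int main(void)
{
    short s = 3;

    /* both arguments are promoted to int on the way in, cast or no cast */
    show(2, (short)3, (short)s);
    return 0;
}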

> Of course if you haven't written your program yet, just write stuff to
> use shorts instead of ints.

I use shorts when I want shorts, longs when I want longs,
and ints when I don't care which I get.
That is what ints are supposed to be for.
So that if it doesn't matter whether it is long or short
the compiler will use the most efficient type.

====

> From: ron@argus.UUCP (Ron DeBlock)
> Organization: NJ Inst of Tech, Newark NJ
> 
> This  isn't a flame, just a reminder:
> 
> If you want 16 bit ints, you must declare short ints.
             ^
             at least
> If you want 32 bit ints, you must declare long ints.
             ^
             at least
> If you just declare int, the size is implementation dependent.
Correct.

The rule is chars are at least 8 bits, shorts are at least 16 bits,
longs are at least 32 bits, and
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).
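
A small sketch that simply reports what a particular compiler chose for each
type (nothing in it is specific to the ST):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* the guarantees are minimums plus an ordering, not exact sizes */
    printf("char : %d byte(s), CHAR_BIT = %d\n", (int)sizeof(char), CHAR_BIT);
    printf("short: %d byte(s)\n", (int)sizeof(short));
    printf("int  : %d byte(s)\n", (int)sizeof(int));
    printf("long : %d byte(s)\n", (int)sizeof(long));
    return 0;
}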

Note that I sometimes use a machine that has 36-bit shorts:
the code to access 2-byte objects would be very inefficient
and was not worth implementing, since most programs ask for shorts
for the wrong reason (i.e. except for explicitly non-portable
code, shorts should only be used in very large arrays).

> You're right that a 16 bit int makes more sense in a 16 bit architecture,
> but DO NOT depend on all compilers to do it that way!

I don't.
I rely on ints being at least as big as shorts and hope that the
compiler will give me the appropriate size for the current architecture.
For many ST C compilers, it doesn't.  This makes the program a little
slower, but since I write portable code it still works fine.

====

> From: apratt@atari.UUCP (Allan Pratt)
> Organization: Atari Corp., Sunnyvale CA
> 
> > "so that badly written code will still work ok"
> 
> I *do* consider this a valid reason.

Sorry.  It is certainly a valid reason if one is faced with having
to compile non-portable code.  When I said "not valid", I meant not
valid with respect to my question.  I was simply asking if anyone knew
of any *other* possible reasons.  So far, no one has come up with one.

> Personally, I consider "int" to
> be an evil data type when the issue of portability comes up.

Funny.  Personally I consider "int" to be the best thing for portability.
In fact that is its sole reason for existing.
If you don't use "int" for portability, what do you use it for?

One should only use int in cases where it doesn't matter whether
the compiler generates code for long, short, or something in between.
What is wrong with int is the way so many programmers have misused it.

> But look
> at UNIX and all the libraries meant to look like UNIX: malloc, fwrite,
> etc. all take "int" arguments, because "int" is "big enough" on those
> machines.  A 16-bit library will probably work, but you can't malloc /
> write more than 32K at a time.  Thanks.  MWC suffers from this, as does
> Alcyon and some Mac-derived compilers (but for a different reason).

True, but the ANSI standard version of C fixes all or most of these problems.
That's why I'm using GNU C instead of one of the older non-standard compilers.

====

> From kirkenda@jove.cs.pdx.edu  Sat Nov  4 00:41:24 1989
> From: Steve Kirkendall <kirkenda%jove.cs.pdx.edu@RELAY.CS.NET>
> Organization: Dept. of Computer Science, Portland State University; Portland OR
> 
> GCC has been ported to Minix-ST, and that version has a 16-bit int option.
> Unfortunately, it has a few problems (eg. sizeof(foo) returns a long value,
> which breaks code such as "p = (foo *)malloc(sizeof(foo))" because malloc()
> doesn't expect a long argument).

That problem will go away in general as more compilers
conform to the standard.

If minix is using the GNU compiler and the GNU library,
there shouldn't be any problem.  Under ANSI C, malloc()'s argument
is supposed to be type (size_t), which is the same type as the result
returned by sizeof.  For an ST, this is probably typedefed to
(unsigned long).  I don't know why the minix version shouldn't work.
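
A minimal sketch of the idiom under discussion, assuming an ANSI <stdlib.h>
that declares malloc() as taking a size_t (the struct here is invented):

#include <stdlib.h>

struct foo {
    long value;
    char name[32];
};

int main(void)
{
    /* sizeof yields a size_t, which is exactly what ANSI malloc() expects,
       however wide size_t happens to be on the machine */
    struct foo *p = (struct foo *)malloc(sizeof(struct foo));

    if (p == NULL)
        return 1;
    free(p);
    return 0;
}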

> As I said before, the speed difference is about 20% overall.

I didn't mean to claim that everything would be twice as fast.
I was simply curious as to what people thought they were gaining
by choosing an option that made things at best no slower and at
worst a lot slower.

====

> From iuvax!STONY-BROOK.SCRC.Symbolics.COM!jrd  Thu Nov  2 17:19:05 1989
> From: John R. Dunning <iuvax!STONY-BROOK.SCRC.Symbolics.COM!jrd>
> 
>     From: watmath!rbutterworth@iuvax.cs.indiana.edu  (Ray Butterworth)
>     This reminds me of something I've been wondering about for a while.
>     Why does GCC on the ST have 32 bit ints?
> 
> GCC is written with the assumption that modern machines have at least
> 32-bit fixnums.

Fine.  But it shouldn't assume that those 32-bit integers will
have type (int).
When I write code that needs more than 16 bits of integer,
I ask for type (long), not for type (int).

> (As near as I can tell, all GNU code is.  That's
> because it's mostly true) GCC will not compile itself if you break that
> assumption

Then I'd say the compiler is badly written
(though there are of course varying degrees of badness,
and of all the bad code I've seen, this stuff is pretty good).
You should be able to write code that doesn't rely on (int) being
the same as (long) or being big enough to hold a pointer,
without any loss of efficiency.  i.e. on architectures where (int)
and (long) are the same, it will generate exactly the same machine code.

>   > Surely 16 is the obvious size considering it has 16 bit memory access.
> 
> Nonsense.  It makes far more sense to make the default sized frob be a
> common size, so you don't need to worry about it.  For cases you care
> about speed, you optimize.

Change "Nonsense." to "Nonsense:", and I'll agree.

The whole point of type (int) is so the compiler will optimize for me
when I don't know what architecture I am writing for.
And if I am writing portable code there is no way I should know what
architecture I am writing for.
If I want an integer value that is more than 16 bits, I'll ask for (long).
If I want an integer value that doesn't need to be more than 16 bits,
I'll ask for (int).  The compiler might give me 16, it might give me 32,
or it might give me 72; I don't really care.  The important thing is that
the compiler should give me the most efficient type that is at least 16
bits.

>   > (Note that I don't consider
>   >  "so that badly written code will still work ok"
>   >  as a valid reason.)

> Fine, turn it around.  Why is it valid for things that know they want to
> hack 16-bit frobs to declare them ints?

It isn't.  They should be declared int only if I want them to be
at least 16 bits.

> To avoid being 'badly written code', they should declare them shorts.

No.  If I want something that is *exactly* 16 bits,
I am in one of two different situations:
1) I am writing machine specific code.
2) I am writing a special numerical algorithm.

In the first case, my code is obviously non-portable
so it is fine to use short, or char, or whatever type other than (int)
it is that will guarantee me 16 bit integers on this architecture.
Such code should of course be isolated from the rest of the portable
code, and documented as being very architecture specific so as to
minimize the amount of work required to port the program to a
different architecture.

In the second case, I can still write portable code,
but I have to be very careful about what assumptions I make.
e.g. #define SIGNBIT(x) (0x8000 & (x))
makes a big assumption about int being 16 bits.
But  #define SIGNBIT(x) ( (~(( (unsigned int)(~0) )>>1)) & (x) )
will work fine regardless of the size of int, and will generate
the same machine code as the first macro when int is 16 bits.
Coding for portability may require a little extra effort,
but it doesn't mean the result has to be any less efficient.
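
The two macros side by side, as a compilable sketch (the SIGNBIT16 name is
mine, just to let both versions coexist; the test values are arbitrary):

#include <stdio.h>

#define SIGNBIT16(x) (0x8000 & (x))                              /* assumes 16-bit int */
#define SIGNBIT(x)   ( (~(( (unsigned int)(~0) )>>1)) & (x) )    /* any int width */

int main(void)
{
    int a = -5, b = 42;

    /* the portable form builds its mask from the real width of unsigned int */
    printf("sign bit of %d is %s\n", a, SIGNBIT(a) ? "set" : "clear");
    printf("sign bit of %d is %s\n", b, SIGNBIT(b) ? "set" : "clear");
    return 0;
}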

>   > Every time anything accesses an int in GCC, it requires two memory
>   > accesses.  Most programs are full of things like "++i" or "i+=7"
>   > or "if(i>j)", and such things take approximately 100% longer when
>   > ints are 32 bits.
> 
> If you don't optimize them or aren't careful about how you declare them,
> sure.

But I did optimize; I declared them (int) expecting the compiler to give
me the integral type that is at least 16 bits long and is the most
efficient for the current architecture.  On an ST, that is 16 bits.

I hope you're not suggesting that by "optimizing" I should have
declared it (short) knowing that I am working on an ST.
That is known as writing bad, non-portable code.

On some word-addressable machines, for instance, there are
perfectly correct compilers for which the code for accessing a short
can be far less efficient than the code for accessing a long.
e.g. 3 or more instructions instead of 1.

>   > Has anyone made a 16-bit GCC and library and done a comparison?
> 
> Yes, it's for sure faster to use shorts than ints.

That's what I was objecting to.
By the original definition of (int),
using int should be exactly as fast as using the faster of (short) or (long).
i.e. regardless of architecture, the best choice should be (int).

> Rather than griping, why not just use the -mshort switch?  It's designed
> for just this kind of thing.  It's even documented.

Sorry, I wasn't intending to gripe (although I know I often sound like it).
I also wasn't wanting to know how to make it behave the way I wanted.
I was simply wondering why the default behaviour was the way it was.

I think the GNU compiler and libraries are the best available for the ST,
and I certainly can't complain about the price.

I really was curious to see if anyone had any reason why they would
want a 32-bit (int) on an ST, other than for the obvious reason that it
makes it easier to compile badly written code (i.e. code that makes
non-portable assumptions), and so far no one has.

====

> From: hyc@math.lsa.umich.edu (Howard Chu)
> Organization: University of Michigan Math Dept., Ann Arbor
> 
> For compatibility - using the 32 bit int mode, I can take just about
> any source file off a Sun, Usenet (comp.sources.unix), etc., type
> "cc" or "make" and have a running executable without ever having to
> edit a single source file.

i.e. so you can compile code that makes some non-portable assumptions,
namely that sizeof(int) == sizeof(long).
i.e. so you can easily compile badly written code.
I already know about that reason.  I spend half my time at work
trying to get all-the-world's-a-VAX Berkeley programs to work on
other hardware.

> >Surely 16 is the obvious size considering it has 16 bit memory access.
> Yep. So obvious, in fact, that every version of ST GCC that's been
> posted has been sent out with *both* 16 *and* 32 bit libraries. What
> are you complaining about?

The first version I had didn't have the 16-bit library.
I have since obtained updates.

====

> From: 7103_300@uwovax.uwo.ca
> From Eric R. Smith
> 
> The most up-to-date version of the libraries (both 16 and 32 bit) are
> available on dsrgsun.ces.cwru.edu, in ~ftp/atari. If you don't have
> these libraries, get them; thanks to some nice work by Dale Schumacher
> (dLibs), Henry Spencer (strings), Jwahar Bammi (lots of stuff), the
> people at Berkeley (curses and doprnt) and yours truly, they're a big
> improvement on the original GCC library.

Yes.  I just picked these up last week.  Thank you all.

(Of course gulam still dropped a couple of bombs the first time I
 used "more", so I aliased it away and use vi.)