[comp.lang.c] Uninitialized externals and statics

dror@infmx.UUCP (Dror Matalon) (08/17/89)

	K&R 2.4 say "External and static variables are initialized 
to zero by default, but it is good style to state the initialization
anyway."

	Is this really portable ? I always initialize globals but I want
to know if I need to change some old stuff that counts on uninitialized
variables being initialized to zero.
-- 
Dror Matalon                        Informix Software Inc.		
{pyramid,uunet}!infmx!dror          4100 Bohannon drive			
                                    Menlo Park, Ca. 94025
                                    415-926-6426

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/17/89)

In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
-	K&R 2.4 say "External and static variables are initialized 
-to zero by default, but it is good style to state the initialization
-anyway."
-	Is this really portable ?

It's supposed to have always been the rule.
There certainly is a lot of C code that depends on it.

henry@utzoo.uucp (Henry Spencer) (08/17/89)

In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
>	K&R 2.4 say "External and static variables are initialized 
>to zero by default, but it is good style to state the initialization
>anyway."
>
>	Is this really portable ? I always initialize globals but I want
>to know if I need to change some old stuff that counts on uninitialized
>variables being initialized to zero.

The initialization to zero for external and static variables is a property
of the C language; all definitions of the language agree on this.  Any
compiler that does not implement it is broken.

Note that automatic variables (i.e., essentially all variables defined
within a function) do *not* get initialized to anything in particular.
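
To make the distinction concrete, a small sketch (the names are invented
for illustration):

	int global_count;		/* external: guaranteed to start at 0 */
	static char flags[100];		/* static: guaranteed to start all zero */

	void f()
	{
		int n;			/* automatic: garbage until assigned */
		static long calls;	/* static duration: starts at 0 */

		calls++;		/* fine */
		n = global_count;	/* fine; using n before assigning it
					   would not be */
	}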
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bengsig@oracle.nl (Bjorn Engsig) (08/18/89)

The default initialization of statics and externals without explicit initial
values also has the advantage (at least on some systems) that the load
module will be smaller.  If you explicitly initialize to zero, all those
zeroes will be stored in the file.
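
A small sketch of the effect, assuming a typical Unix compiler that puts
implicitly zero objects in bss and explicitly initialized objects in the
data section:

	int table1[1000];	/* bss:  zero at run time, no bytes in the file */
	int table2[1000] = {0};	/* data: zero at run time, but sizeof(int)*1000
				   bytes of zeroes stored in the executable */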
-- 
Bjorn Engsig, ORACLE Europe         \ /    "Hofstadter's Law:  It always takes
Path:   mcvax!orcenl!bengsig         X      longer than you expect, even if you
Domain: bengsig@oracle.nl           / \     take into account Hofstadter's Law"

bill@twwells.com (T. William Wells) (08/19/89)

In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
: The default initialization of statics and externals without explicit inital
: values also has the advantage (at least on some systems) that the load
: module will be smaller.  If you explicitly initialize to zero, all those
: zeroes will be stored in the file.

At one point, we got toasted by some of our customers because our
executables were excessively large. It seems that one of our
programmers did things like:

int     Array[1000] = {0};

This sort of thing made the difference between a product that could
be shipped on one floppy and one that required two.

Guk.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

barkus@amcom.UUCP (todd barkus) (08/19/89)

In article <10764@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
>>-	K&R 2.4 say "External and static variables are initialized 
>>-to zero by default, but it is good style to state the initialization
>>-anyway."
>>-	Is this really portable ?
>>
>It's supposed to have always been the rule.
>There certainly is a lot of C code that depends on it.

Rules are great, especially when everyone follows them.
Unfortunately not everyone does.  We have one if not two boxes
whose compilers evidently do not know how to read (some of us keep our
K&R right next to the terminal, so it's not like they wouldn't have
access to one).

"The person who assumes the answer often answers to their assumption".

I think that is a tebarkus original (it just popped into my head),
which is not to say someone else with a lot of unused space in their
head might not have had the idea first :-).

davidsen@sungod.crd.ge.com (ody) (08/21/89)

  Although the proposed ANSI standard (3.5.7 line 20) calls for
initialization to zero, cast to the appropriate type (my paraphrase) for
arithmetic and pointer types, virtually all implementations initialize
to zero (without cast) in the absence of explicit initialization.

  For reasons of "real" portability (what works vs. what any standard
says) I recommend initializing all float and pointer types explicitly if
you want to be sure code will work on machines in which float zero and
NULL are not "all bits off." This will not in any way make your code less
portable to environments which implement the proposed standard, but will
minimize your "learning experiences."
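
A sketch of that defensive style (the names are invented):

	double	scale = 0.0;	/* explicit: right even where f.p. zero is
				   not all bits off */
	char	*name = NULL;	/* explicit: right even where a null pointer
				   is not all bits off */
	long	count;		/* integral types are safe either way: integer
				   zero is all bits off in the usual binary
				   representations */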
	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

mike@hpfcdc.HP.COM (Mike McNelly) (08/21/89)

> The default initialization of statics and externals without explicit inital
> values also has the advantage (at least on some systems) that the load
> module will be smaller.  If you explicitly initialize to zero, all those
> zeroes will be stored in the file.

Several years ago our HP 9000/Series 300 customers (rightly) complained
that those external and static variables that were explicitly
initialized to zero were taking up too much data space.  This is no
longer the case.  The necessary changes to the compiler were quite small
and easily accomplished.  Now our compiler puts these data items into
BSS just as though they were not explicitly initialized.

Not only does this change result in smaller executable files but it can
speed up compilation considerably.  Some of our biggest gains have been
in system code and in our graphics packages.

Mike McNelly
mike%hpfcla@hplabs.hp.com

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/22/89)

In article <1989Aug19.053711.7462@twwells.com> bill@twwells.com (T. William Wells) writes:
>int     Array[1000] = {0};
>This sort of thing made the difference between a product that could
>be shipped on one floppy and one that required two.

The interesting thing is, the compiler is entitled to treat this exactly
the same as the non-explicit initializer case.  The difference is a side
effect of UNIX having adopted the COMMON model for extern data.  Somewhere
along the way, AT&T PCC releases started supporting DEF/REF (in effect),
without the extra cleverness that would have kept executables from turning
.BSS into .DATA.

timcc@csv.viccol.edu.au (Tim Cook) (08/22/89)

In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
> In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
> : The default initialization of statics and externals without explicit inital
> : values also has the advantage (at least on some systems) that the load
> : module will be smaller.  If you explicitly initialize to zero, all those
> : zeroes will be stored in the file.
> 
> At one point, we got toasted by some of our customers because our
> executables were excessively large. It seems that one of our
> programmers did things like:
> 
> int     Array[1000] = {0};
> 
> This sort of thing made the difference between a product that could
> be shipped on one floppy and one that required two.
> 
> Guk.

Let's not misappropriate blame here.  It seems to me that your compiler
should take the blame in this scenario.  Your programmer is simply making
sure of what will be in "Array" when the program starts (sounds like a
worthwhile programming practice).

It's not his fault if the compiler can't sense that he has initialized it
to the default.  Seems like a simple optimization to me.

(Of course, most C compilers produce assembler, so they could have a go at
passing the buck on this one).

bill@twwells.com (T. William Wells) (08/22/89)

In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
: In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
: > In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
: > : The default initialization of statics and externals without explicit inital
: > : values also has the advantage (at least on some systems) that the load
: > : module will be smaller.  If you explicitly initialize to zero, all those
: > : zeroes will be stored in the file.
: >
: > At one point, we got toasted by some of our customers because our
: > executables were excessively large. It seems that one of our
: > programmers did things like:
: >
: > int     Array[1000] = {0};
: >
: > This sort of thing made the difference between a product that could
: > be shipped on one floppy and one that required two.
: >
: > Guk.
:
: Let's not misappropriate blame here.  It seems to me that your compiler
: should take the blame in this scenario.  Your programmer is simply making
: sure of what will be in "Array" when the program starts (sounds like a
: worthwhile programming practice).
:
: It's not his fault if the compiler can't sense that he has initialized it
: to the default.  Seems like a simple optimization to me.

#1: Essentially *every* compiler does this particular bogosity. That
    means that a competent programmer had better be aware of it and
    deal with it. (Let me put it another way: I don't know of any
    that don't.)

#2: We shipped *source code* to our customers. They were complaining
    because *their* compilers made the executables too large. (So
    also did, and do, ours.)

#3: No, we could *not* tell them to use another compiler. Firstly,
    they wouldn't. Second, it almost always wouldn't make a
    difference (see #1). And third, in some cases, there *weren't*
    alternate compilers.

    (Which reminds me: someone asserted that there are more 80x86's
    running C programs than any other microprocessor. I doubt it. I
    suspect that it is something like an 8051, Z80, or other equally
    puerile processor. Do you know how many typewriters, toaster
    ovens, and computer toys are out there today? Programmed in C?
    For a guess, someone might want to look up Franklin Computer's
    sales of their hand-held spellers: these are sold in the
millions. Most have something less than an 8086 in them. And they
    are all programmed mostly in C.

    Why am I reminded? Guess which processors have the greatest lack
    of even semi-functional C compilers? And which require the
    greatest competence in programmers to make things come out
    reasonably?)

Welcome to the real world.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

richard@aiai.ed.ac.uk (Richard Tobin) (08/23/89)

In article <1786@crdgw1.crd.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>  Although the proposed ANSI standard (3.5.7 line 20) calls for
>initialization to zero, cast to the appropriate type (my paraphrase) for
>arithmetic and pointer types, virtually all implementations initialize
>to zero (without cast) in the absense of explicit initialization.

Are there any well-known machines on which these aren't equivalent, and
on which the "wrong" initialization is done?

-- Richard


-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

karzes@mfci.UUCP (Tom Karzes) (08/23/89)

In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
>In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
>> At one point, we got toasted by some of our customers because our
>> executables were excessively large. It seems that one of our
>> programmers did things like:
>> 
>> int     Array[1000] = {0};
>
>Let's not misappropriate blame here.  It seems to me that your compiler
>should take the blame in this scenario.  Your programmer is simply making
>sure of what will be in "Array" when the program starts (sounds like a
>worthwhile programming practice).

Actually, given that the programmer is unwilling to rely on implicit
zero initialization of statics, he/she is only making sure that
the first element of the array is initialized to 0 in this example.

>It's not his fault if the compiler can't sense that he has initialized it
>to the default.  Seems like a simple optimization to me.

Yes, it is a simple optimization.  However, standard Unix C compilers have
always placed explicitly initialized objects in the data section, regardless
of whether or not they're initialized with zero.  One important benefit of
this is that it permits the value in the executable to be patched with adb.
If it's in the bss section, you can't patch it in the file, and are forced
to modify and recompile the defining file, then relink the executable.

When we added this optimization to our compiler, there were so many
complaints about not being able to patch C executables that we added 2
switches to control this behavior.  One switch forces EVERY defined object
to the data section, even if it isn't initialized at all (this is fairly
extreme and almost never used; it certainly isn't the default).  The second
limits the maximum size of an object which will be placed in the data
section when explicitly initialized with zero (the first switch overrides
this switch).

Thus, by setting the first option to FALSE and the second to, say, 16, the
behavior is to place all uninitialized objects in the bss section, and all
objects which are explicitly initialized to zero in the data section, unless
their size is greater than 16 bytes, in which case they're placed in the bss
section.  It was felt that 16 was a conservative figure (it forces all
explicitly initialized scalars, including double complex (if you're dealing
with Fortran code), into the data section, but gives you the space savings
you want when large arrays are involved).
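
Roughly, with those settings (assuming 4-byte ints and 8-byte doubles;
sizes are for illustration only):

	int	a;		/* no initializer: bss */
	int	b = 0;		/* explicit zero, 4 bytes <= 16: data,
				   so it can still be patched with adb */
	int	c = 5;		/* nonzero initializer: data */
	double	z[100] = {0};	/* explicit zero, 800 bytes > 16: bss */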

>(Of course, most C compilers produce assembler, so they could have a go at
>passing the buck on this one).

This is unreasonable.  If you tell an assembler to place a 0 in the
data section, it has absolutely no business trying to second guess
your intent and placing it elsewhere.  Your code could be making
all kinds of assumptions about the location of that entity.  (Besides,
in most Unix assemblers all it has is one or more labels followed
by some data; how is it supposed to know that one of those labels
is ONLY used to refer to the following chunk of zero data, and that
it will be used for nothing else, and that it can all be safely moved
elsewhere?)

fredex@cg-atla.UUCP (Fred Smith) (08/24/89)

In article <783@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <1786@crdgw1.crd.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>>  Although the proposed ANSI standard (3.5.7 line 20) calls for
>>initialization to zero, cast to the appropriate type (my paraphrase) for
>>arithmetic and pointer types, virtually all implementations initialize
>>to zero (without cast) in the absense of explicit initialization.
>
>Are there any well-known machines on which these aren't equivalent, and
>on which the "wrong" initialization is done?
>
>-- Richard

On a Prime 50-series machine the representation of a NULL pointer
is not all zeroes!

As far as I know, however, this does not cause a problem in such
initializations. It is appropriate when testing a pointer for being
the null pointer to use a cast, thusly:

      char * foo;

      if (foo == (char *)NULL)

but then doing such a cast is ALWAYS appropriate, on any machine, since
right after you leave the company somebody will get the bright idea
of porting your code to some new whiz-bang-100 processor with weird
architecture. This is also appropriate on 8086-class machines, since
the representation of a pointer, including the null pointer, will vary
with memory model. (I hate segmented architectures.)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (08/24/89)

In article <783@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:

> Are there any well-known machines on which these aren't equivalent, and
> on which the "wrong" initialization is done?

  Well known machines, yes. I don't have access to them anymore. The
Honeywell DPS series (36 bit) has 400000000000(8) for f.p. zero and
xxxxxx00004x(8) for byte pointer (x's are address bits). I believe that
some DG models have char ptrs which are non-zero when NULL, but I
haven't looked at one in close to ten years.

  Can someone help on this?

-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/25/89)

In article <7550@cg-atla.UUCP> fredex@cg-atla.UUCP (Fred Smith) writes:
-It is appropriate when testing a pointer against
-for being the null pointer to do a case, thusly:
-      if (foo == (char *)NULL)
-but then doing such a cast is ALWAYS appropriate, on any machine, since
-right after you leave the company somebody will get the bright idea
-of porting your code to some new whiz-bang-100 processor with weird
-architecture. This is also apprpriate on 8086-class machines, since
-the representation of a pointer, including the null pointer, will vary
-with memory model. (I hate segmented architectures.)

The cast would be necessary only if the implementor has screwed up the
definition of NULL.  #define NULL 0 would always be a correct implementation
definition for NULL, and the cast would never be necessary.  We explain this
in this newsgroup every few months.
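
In other words (a sketch; the function name is invented):

	#define NULL 0			/* a correct definition */

	char *foo;

	int foo_is_null()
	{
		return foo == NULL;	/* the 0 becomes a null pointer of
					   type char * in the comparison
					   itself; an explicit (char *) cast
					   adds nothing */
	}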

kenny@m.cs.uiuc.edu (08/25/89)

>On a Prime 50-series machine the representation of a NULL pointer
>is not all zeroes!

>As far as I know, however, this does not cause a problem in such
>initializations. It is appropriate when testing a pointer against
>for being the null pointer to do a case, thusly:

>      char * foo;

>      if (foo == (char *)NULL)

>but then doing such a cast is ALWAYS appropriate, on any machine, since
>right after you leave the company somebody will get the bright idea
>of porting your code to some new whiz-bang-100 processor with weird
>architecture. This is also apprpriate on 8086-class machines, since
>the representation of a pointer, including the null pointer, will vary
>with memory model. (I hate segmented architectures.)

This is BROKEN.  How many times do those of us that understand this
have to shout it?  When a pointer is compared with an integer, it is
implicitly promoted to an integer.  Saying
	if (foo == NULL)
means EXACTLY the same thing as saying
	if (foo == (char *) NULL)
and if the NULL pointer doesn't have an all-zero representation, the
compiler is responsible for promoting it.  Any compiler that doesn't
promote pointers in comparisons with integers has a serious bug.

The non-all-zero representation will break questionable code like
	struct zot {
		long zot_l;
		char *zot_p;
		} barf;

where the pointer barf.zot_p *will* be initialized to the all-zero bit
pattern.  You can't have everything.

A-T

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (08/25/89)

In article <4700042@m.cs.uiuc.edu>, kenny@m.cs.uiuc.edu writes:
> This is BROKEN.  How many times do those of us that understand this
> have to shout it?  When a pointer is compared with an integer, it is
> implicitly promoted to an integer.  Saying
	[ many things ]

  I think you may have missed the original poster's point (he didn't shout
it). He was saying that on his machine a NULL pointer is not all bits zero.
Therefore if a C implementation set the pointer to "all bits zero" the
result would not be a NULL pointer, and would not compare equal to NULL.

  The ANSI standard calls for initialization to zero *cast to the
appropriate type* here, which would be another value. His compiler may be
non-conforming, but the point he was making has nothing to do with
promoting pointers to int (actually I think it's the other way round, since
an int may not be able to hold a pointer).
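
In other words (a sketch):

	static char *p;		/* must behave exactly as if written
				   "static char *p = 0;", i.e. a genuine null
				   pointer, even on a machine where a null
				   char * is not the all-zero bit pattern */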

  The standard also allows NULL to be a pointer type ((void *) 0) which
would make it somewhat arcane to convert two pointers to integers to compare
them. 
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

maart@cs.vu.nl (Maarten Litmaath) (08/25/89)

kenny@m.cs.uiuc.edu writes:
\...  Any compiler that doesn't
\promote pointers in comparisons with integers has a serious bug.

No, YOU have a serious bug!

	foo	*p;

	if (p == 0)

means

	if (<bit pattern of p> == <bit pattern of nil pointer of type foo *>)

and not

	if (<integer promotion of p> == 0)
-- 
"rot H - dD/dt = J, div D = rho, div B = 0, |Maarten Litmaath @ VU Amsterdam:
rot E + dB/dt = 0" and there was 7-UP Light.|maart@cs.vu.nl, mcvax!botter!maart

henry@utzoo.uucp (Henry Spencer) (08/26/89)

In article <4700042@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>... When a pointer is compared with an integer, it is
>implicitly promoted to an integer.  Saying
>	if (foo == NULL)
>means EXACTLY the same thing as saying
>	if (foo == (char *) NULL)
>and if the NULL pointer doesn't have an all-zero representation, the
>compiler is responsible for promoting it...

Right conclusion, seriously wrong reasons.  Comparing a pointer to an
integer is *illegal* in general.  There is one, repeat one, special
case:  an integer constant expression of value zero -- repeat, an
integer CONSTANT expression of value ZERO -- gets turned into a NULL
pointer of the appropriate type when compared to a pointer.  Note that
it is the integer, not the pointer, that is converted.  Note that no
such conversion is done on integer variables, integer constant expressions
with non-zero values, or general integer expressions.
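
A small sketch of the rule (the illegal case is left in a comment):

	void g(p, i)
	char *p;
	int i;
	{
		if (p == 0)		/* legal: 0 is an integer constant
					   expression, so it is converted to
					   a null pointer of type char * */
			return;
		if (p == (char *)0)	/* legal, and means exactly the same */
			return;
	/*	if (p == i)		   NOT legal in general: i is a
					   variable, so no conversion to a
					   null pointer is performed */
	}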
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

dbrooks@osf.osf.org (David Brooks) (08/26/89)

In article <1989Aug25.185428.3511@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
[...]
> There is one, repeat one, special
>case:  an integer constant expression of value zero -- repeat, an
>integer CONSTANT expression of value ZERO -- gets turned into a NULL
>pointer of the appropriate type when compared to a pointer.  Note that
>it is the integer, not the pointer, that is converted.  Note that no
>such conversion is done on integer variables, integer constant expressions
>with non-zero values, or general integer expressions.

I was about to make the same point myself.  This can be determined by
careful reading of K&R II: try section A6.6, page 198.  The constant 0
may be converted by a cast, by assignment, or by comparison, to a
pointer.  This legitimizes "if (p == 0)".  Requiring an actual
conversion step removes any implication that the pointer is zero-valued.

Anyway, I had a question: what is this assumption about "all bits
zero" for the common case of initializing ints?  I wonder if there's
any machine out there that represents int 0 with some other bit
pattern...

Those of us old enough to remember when ones-complement seemed like a
good idea can begin to break into a sweat at this point :-)
-- 
David Brooks			dbrooks@osf.org
Open Software Foundation	uunet!osf.org!dbrooks
11 Cambridge Center		Personal views, not necessarily those
Cambridge, MA 02142, USA	of OSF, its sponsors or members.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/26/89)

In article <4700042@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>This is BROKEN.  How many times do those of us that understand this
>have to shout it?  When a pointer is compared with an integer, it is
>implicitly promoted to an integer.

That's not right either.  Pointers may validly be compared only with
pointers to the same type or with null pointer constants.  "NULL" of
course is required to act like a null pointer constant in such contexts,
and so is "0" (quote marks not included).

If you want to promote a pointer to an integral type, an explicit
cast is required.  Which integral type is appropriate is implementation
defined.

bill@twwells.com (T. William Wells) (08/27/89)

In article <984@m3.mfci.UUCP> karzes@mfci.UUCP (Tom Karzes) writes:
: In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
: >In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
: >> At one point, we got toasted by some of our customers because our
: >> executables were excessively large. It seems that one of our
: >> programmers did things like:
: >>
: >> int     Array[1000] = {0};
: >
: >Let's not misappropriate blame here.  It seems to me that your compiler
: >should take the blame in this scenario.  Your programmer is simply making
: >sure of what will be in "Array" when the program starts (sounds like a
: >worthwhile programming practice).
:
: Actually, given that the programmer is unwilling to rely on implicit
: zero initialization of statics, he/she is is only making sure that that
: the first element of the array is initialized to 0 in this example.

Actually, the programmer was just following someone's brain-damaged
advice to initialize all globals. He had no idea why one might want
to do so. No, he wasn't a very good programmer. Or even a good coder.

Which reminds me of an answer to the "What makes a C expert" question
that I was going to give but didn't get around to. The "competent C
programmer" knows *what* is valid C. The "C expert" knows *why* it is
valid C; moreover, he is capable of selecting the best C tool for the
job, as opposed to merely using the first thing that comes to mind.

Fundamentally, the difference is that between knowledge and judgement.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

henry@utzoo.uucp (Henry Spencer) (08/27/89)

In article <609@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>Anyway, I had a question: what is this assumption about "all bits
>zero" for the common case of initializing ints?  I wonder if there's
>any machine out there that represents int 0 with some other bit
>pattern...

It would be difficult, probably impossible, to build an ANSI-conforming
C implementation for such a machine.  ANSI C leaves representation of most
data types largely up to the implementor, but integers are pinned down
fairly thoroughly and pretty well have to be binary using one of the
orthodox representations.  I believe the standard is flexible enough in
key places for one's-complement to work, but more radical departures
from current orthodoxy will have trouble.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gary@dgcad.SV.DG.COM (Gary Bridgewater) (08/27/89)

In article <131@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>In article <783@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
>... I believe that some DG models have char ptrs which are non-zero when NULL,
>but I haven't looked at one in close to ten years.

DG's 32-bit addresses come in two flavors: a 2-byte word address with the
leftmost bit indicating indirection, the next three bits indicating ring
address (MULTICS-like rings) and the final 28 bits of word address.  User
rings are 4-7 so the address of the 'first' word in the lowest user ring
is 0x40000000.  Byte pointers are word pointers shifted left (no indirect
byte pointers).  So the byte address of the first byte in the user's rings
is 0x80000000.  Our C compiler is quite happy to allow NULL=0 to make
common Cisms work
fine but if you violate the rules and expect to dereference a NULL pointer
you will get a ring validity trap. Crock? Bug? Feature? So far only
people with belly buttons have had an opinion on this. :-)
-- 
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or 
{amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/28/89)

In article <609@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>Anyway, I had a question: what is this assumption about "all bits
>zero" for the common case of initializing ints?  I wonder if there's
>any machine out there that represents int 0 with some other bit
>pattern...

I doubt that it would be standard-conforming.

The proposed C standard does impose some constraints on implementations
that were not technically necessary.  Among these are: integers must be
represented by a binary numeration system (allows ones and twos complement,
maybe even sign/magnitude, but not several other reasonable representations);
'0' through '9' must have ascending contiguous integral values.

The former constraint doesn't bother me, but the latter does.

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/28/89)

In article <10831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
 
}The proposed C standard does impose some constraints on implementations
}that were not technically necessary.  [...binary numeration...]
}'0' through '9' must have ascending contiguous integral values.
 
}The former constraint doesn't bother me, but the latter does.

      Why!!! 
      What kind of idiot would design a character code with '0'..'9'
      in any other fashion.  The same can be said for 'a'..'z' and
      'A'..'Z', but we know which idiots would do that.

      These are the sorts of things which should fall under the principle
      of least astonishment!  Just because something is technically possible
      does not mean it is a good idea.

      It seems like the committee spent a lot of time thinking up obscure
      technically possible behavior just to see how clever they could be.

      "In the Klat-Klala numbering system all the odd digits come before
      all the even ones, we should allow for this."

John Hascall
(ps.  Apply `:-)'s above as needed to smother flames)
(pps. I think trigraphs were a misguided effort as well)

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/29/89)

In article <1392@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>      What kind of idiot would design a character code with '0'..'9'
>      in any other fashion.  The same can be said for 'a'..'z' and
>      'A'..'Z', but we know which idiots would do that.

Well, you see, it is not the job of X3J11 to determine what is idiotic
and what is not.  It is X3J11's job to specify a maximally useful
programming language.  Gratuitously excluding certain classes of
architecture would violate the Committee's charter.

If you were to consider EBCDIC's 8-bit bytes as signed, then the codes
for '0' .. '9' would appear in descending order.  That's not excessively
unreasonable.

>      It seems like the committee spent a lot of time thinking up obscure
>      technically possible behavior just to see how clever they could be.

Not really.  We did spend a lot of time determining just how much
variation had to be accommodated.  There are many interesting computer
architectures for which a C implementation would be something to be
encouraged.  Not all of them look like the systems you've encountered.

>(pps. I think trigraphs were a misguided effort as well)

I think that most of X3J11 might even privately agree with that
assessment.  However, they serve a possibly useful function with
very little adverse impact (mainly on idiots who use "??!").
The real problem with trigraphs is that they've been misconstrued
as an attempt to solve the international character set issue for
practical programming purposes.  The current party line is that
they're of use primarily in code transport among varying systems,
not for everyday programmer use.

diamond@csl.sony.co.jp (Norman Diamond) (08/29/89)

In article <10859@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:

>If you were to consider EBCDIC's 8-bit bytes as signed, then the codes
>for '0' .. '9' would appear in descending order.  That's not excessively
>unreasonable.

Nope; they're still ascending.  That (along with big-endianness) is why
a Fortran-66 program could read into an integer using A4 format and get
correct results.

-- 
Norman Diamond, Sony Corporation (diamond@ws.sony.junet)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/29/89)

In article <10859> gwyn@brl.arpa (Doug Gwyn) writes:
}In some article I rant and rave:

}>      What kind of idiot would design a character code with '0'..'9'
}>      in any other fashion.  The same can be said for 'a'..'z' and
}>      'A'..'Z', but we know which idiots would do that.
 
}Well, you see, it is not the job of X3J11 to determine what is idiotic
}and what is not.  It is X3J11's job to specify a maximally useful
}programming language.  Gratuitously excluding certain classes of
}architecture would violate the Committee's charter.

   If a standard is so broad as to include everything is it still a
   standard?  Is it wise to try to include every possible aberrant
   behavior (you are only going to encourage them)?  Where do you stop?
   Should we also require that C compilers recognize the keywords in
   other languages (say Spanish) so as not to "gratuitously exclude
   certain classes" of programmers?

   If you are not going to restrict the "local alphabet" *
   characters to a contiguous sequence of integer values it certainly
   makes the problem of writing a portable sorting routine difficult.
   (The only alternative I can come up with off the top of my head is
   to have something like the following in a standard header:)

       #define _COLL_SEQ "abcdefghijklmnopqrstuvwxyz"

   and even that has its problems.


}If you were to consider EBCDIC's 8-bit bytes as signed, then the codes
}for '0' .. '9' would appear in descending order.  

   And if you were to consider them as tiny little floating point numbers,
   then the codes for '0'..'9' would make no sense at all.  :-)

}>(pps. I think trigraphs were a misguided effort as well)
 
}I think that most of X3J11 might even privately agree with that ...
}The real problem with trigraphs is that they've been misconstrued
}as an attempt to solve the international character set issue for
}practical programming purposes.  The current party line is that
}they're of use primarily in code transport among varying systems,
}not for everyday programmer use.

   That's the point.  They should be an international data transport
   standard not a C programming language standard.

   What if some group decides on an "insert other language here"
   standard that wants to use quadgraphs of $*$c, for example,
   for this file transfer purpose?


John "Someone has to ask these stupid questions" Hascall

*  I was privately chided for my ethnocentric use of 'a'..'z'

barmar@think.COM (Barry Margolin) (08/29/89)

In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>   If you are not going to restrict the "local alphabet" *
>   characters to a contiguous sequence of integer values it certainly
>   makes the problem of writing a portable sorting routine difficult.

>*  I was privately chided for my ethnocentric use of 'a'..'z'

Well, since you mention that you aren't talking about just US ASCII,
it's worth pointing out that the international standard 8-bit
character code DOESN'T have the alphabetic characters contiguous.  It
was designed to be a superset of 7-bit ASCII.  This prevents it from
keeping the letters contiguous, since the alphabetic characters are
surrounded by non-alphabetic characters.  All the added characters
have their high order bit on.

So, if ANSI C were to require alphabetic characters to be contiguous,
it would not be possible to implement one that also supported the
standard character encodings.  Fully general lexicographic sorting
programs can't just use the numeric character values; indeed,
different countries that use the same alphabet may have different
ordering conventions, so you can't even use a fixed ordering.  You
need a locale-dependent character-ordering predicate to do it right.

In Common Lisp, we define a partial ordering of the alphanumeric
characters required by the standard.  We specify that the uppercase
and lowercase characters must each be ordered alphabetically, that the
digits be ordered numerically, and that the digits not be interleaved
with any of the alphabetics.  We don't, however, require that the
characters be sequential within any of the three groups of characters,
nor do we specify the relative ordering of the three groups.  These
rules were designed so that both ASCII and EBCDIC could be used.  We
also define CHAR< and related predicates to permit the program to
access the actual order of the characters in the implementation.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)

In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>   If you are not going to restrict the "local alphabet" *
>   characters to a contiguous sequence of integer values it certainly
>   makes the problem of writing a portable sorting routine difficult.

Sorting in dictionary order is obviously locale-dependent.  The C standard
specifies facilities to assist in this (see strcoll()).  Note that it did
not attempt to constrain the alphabet.
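
For reference, a minimal sketch of its use (setlocale() and strcoll() are
both part of the proposed standard library):

	#include <locale.h>
	#include <string.h>

	/* Compare two strings in the collating order of the current locale.
	   Call setlocale(LC_COLLATE, "") once at startup to pick up the
	   local conventions; strcmp() would compare raw character codes
	   instead. */
	int dictcmp(const char *a, const char *b)
	{
		return strcoll(a, b);
	}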

>   That's the point.  They should be an international data transport
>   standard not a C programming language standard.

There IS such a standard (ISO 646), but it doesn't include representations
for certain glyphs that are used in C source code!

>   What if some group decides on a "insert other langauge here"
>   standard that wants to use quadgraphs of $*$c, for example
>   for this file transfer purpose.

C appears to have the first programming language standard that made a
serious attempt to address internationalization concerns.  Others can
do what they will, but some of them may follow C's general approaches.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)

In article <10759@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>Nope; they're still ascending.

Oops, I stand corrected.  The magnitudes are descending but the values
(being negative) are ascending.  Oh well, it was just an example.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)

In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>If a standard is so broad as to include everything is it still a standard?

This question is phrased misleadingly.  X3.159 is not a computer architecture
standard; it is a C programming language standard.  Certainly it should
accommodate the widest feasible range of those factors that it cannot
constrain, so long as the utility of the language is not significantly
reduced thereby.

henry@utzoo.uucp (Henry Spencer) (08/30/89)

In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>   If you are not going to restrict the "local alphabet" *
>   characters to a contiguous sequence of integer values it certainly
>   makes the problem of writing a portable sorting routine difficult.

Uh, if you think that's the worst problem with writing a portable sorting
routine, you have no *concept* of the horrors that European languages commit
in defining collating sequences.  (The less said about Asian languages
the better...)  This is the least of the problems.  Building a sorting
routine that will "do the right thing" portably is a staggering task.

Incidentally, wishing for a contiguous alphabet will not make IBM (and
its non-contiguous-alphabet character set, EBCDIC) go away.  That alone
kills the idea.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

tneff@bfmny0.UUCP (Tom Neff) (08/31/89)

In article <10831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>The proposed C standard does impose some constraints on implementations
>that were not technically necessary.  Among these are: integers must be
>represented by a binary numeration system (allows ones and twos complement,
>maybe even sign/magnitude, but not several other reasonable representations);
>'0' through '9' must have ascending contiguous integral values.
>
>The former constraint doesn't bother me, but the latter does.

These constraints are reasonable.

The penalty for violating them in some future character set or machine
design will only be the lack of a fully ANSI conformant C compiler, and
the risk that ported ANSI C programs which explicitly take advantage of
these constraints in the code (not merely using system headers or macros
whose implementations *typically* rely on them, since the weirdo vendor
could be expected to provide workarounds in his supplied headers) will
not execute correctly when compiled without modification.

I suspect the purveyor of such an oddball CPU will have many worse
problems to deal with. :-)
-- 
"We walked on the moon --	((	Tom Neff
	you be polite"		 )) 	tneff@bfmny0.UU.NET

mcdonald@uxe.cso.uiuc.edu (08/31/89)

>Incidentally, wishing for a contiguous alphabet will not make IBM (and
>its non-contiguous-alphabet character set, EBCDIC) go away.  That alone
>kills the idea.

No, it won't. But it is easy to avoid: when you specify a computer,
simply specify that a certain character set (i.e. standard 
ASCII characters from 32 to 127) be used for all external and
internal purposes. This will automatically exclude EBCDIC and 
other perversions like CDC's "display code" or radix-50 filenames
(certain PDP-11 OS's). IBM mainframes are a world apart - and destined
for the graveyard of history, albeit very slowly.

Incidentally, I still have not forgiven IBM for the evil thing they
did when going from Model 26 keypunches (BCD) to Model 29 ones
(EBCDIC).

Doug McDonald

cowan@marob.masa.com (John Cowan) (09/01/89)

In article <10870@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <10759@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>>Nope; they're still ascending.
>
>Oops, I stand corrected.  The magnitudes are descending but the values
>(being negative) are ascending.  Oh well, it was just an example.


Does the pANS still guarantee that the chars used in C programming (letters,
numbers, !@#$%^&*()_+ etc.) are non-negative?  K&R-1 made such a guarantee,
and it seems to be true on all "real" machines.  Only signed-byte machines
using EBCDIC and machines that use neither ASCII nor EBCDIC would violate
this rule.
-- 
Internet/Smail: cowan@marob.masa.com	Dumb: uunet!hombre!marob!cowan
Fidonet:  JOHN COWAN of 1:107/711	Magpie: JOHN COWAN, (212) 420-0527
		Charles li reis, nostre emperesdre magnes
		Set anz toz pleins at estet in Espagne.

evans@ditsyda.oz (Bruce Evans) (09/01/89)

In article <10870@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>Oops, I stand corrected.  The magnitudes are descending but the values
>(being negative) are ascending.  Oh well, it was just an example.

If chars are unsigned, they will not be negative (butterfly order :-).
-- 
Bruce Evans		evans@ditsyda.oz.au

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/01/89)

In article <24FD69D9.12F@marob.masa.com> cowan@marob.masa.com (John Cowan) writes:
>Does the pANS still guarantee that the chars used in C programming (letters,
>numbers, !@#$%^&*()_+ etc.) are non-negative?

The actual source code characters don't have values.  The execution-
environment values for the corresponding run-time characters (thus,
character constants) are indeed required to be positive.  Other
characters (not corresponding to those in the official C source
character set) can have any values representable in a byte.  (A byte
is whatever a C char is, not necessarily 8 bits.)