dror@infmx.UUCP (Dror Matalon) (08/17/89)
K&R 2.4 says "External and static variables are initialized to zero by
default, but it is good style to state the initialization anyway."

Is this really portable?  I always initialize globals, but I want to know if
I need to change some old stuff that counts on uninitialized variables being
initialized to zero.
--
Dror Matalon, Informix Software Inc.          {pyramid,uunet}!infmx!dror
4100 Bohannon Drive, Menlo Park, Ca. 94025    415-926-6426
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/17/89)
In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
- K&R 2.4 says "External and static variables are initialized
-to zero by default, but it is good style to state the initialization
-anyway."
- Is this really portable ?
It's supposed to have always been the rule.
There certainly is a lot of C code that depends on it.
henry@utzoo.uucp (Henry Spencer) (08/17/89)
In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
> K&R 2.4 says "External and static variables are initialized
>to zero by default, but it is good style to state the initialization
>anyway."
>
> Is this really portable?  I always initialize globals but I want
>to know if I need to change some old stuff that counts on uninitialized
>variables being initialized to zero.

The initialization to zero for external and static variables is a property
of the C language; all definitions of the language agree on this.  Any
compiler that does not implement it is broken.

Note that automatic variables (i.e., essentially all variables defined
within a function) do *not* get initialized to anything in particular.
--
V7 /bin/mail source: 554 lines.  |  Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages.   |  uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
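To make the guarantee (and its limits) concrete, here is a minimal sketch;
the names are illustrative and not taken from any posting above:

    #include <stdio.h>

    int global_counter;        /* static storage duration: guaranteed to start at 0 */
    static double table[4];    /* likewise: every element starts at 0.0             */
    static char *name;         /* likewise: starts out as a null pointer            */

    int main(void)
    {
        int local;             /* automatic: indeterminate value, do not read it    */
        static int calls;      /* static even inside a function: starts at 0        */

        printf("%d %g %p %d\n", global_counter, table[0], (void *)name, calls);
        /* printf("%d\n", local);  -- reading the uninitialized automatic
           variable would be undefined behavior */
        return 0;
    }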
bengsig@oracle.nl (Bjorn Engsig) (08/18/89)
The default initialization of statics and externals without explicit initial
values also has the advantage (at least on some systems) that the load module
will be smaller.  If you explicitly initialize to zero, all those zeroes will
be stored in the file.
--
Bjorn Engsig, ORACLE Europe      \ /   "Hofstadter's Law:  It always takes
Path:   mcvax!orcenl!bengsig      X     longer than you expect, even if you
Domain: bengsig@oracle.nl        / \    take into account Hofstadter's Law"
bill@twwells.com (T. William Wells) (08/19/89)
In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
: The default initialization of statics and externals without explicit initial
: values also has the advantage (at least on some systems) that the load
: module will be smaller. If you explicitly initialize to zero, all those
: zeroes will be stored in the file.
At one point, we got toasted by some of our customers because our
executables were excessively large. It seems that one of our
programmers did things like:
int Array[1000] = {0};
This sort of thing made the difference between a product that could
be shipped on one floppy and one that required two.
Guk.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
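The size complaint comes from the typical (though not universal) Unix
treatment of initializers discussed in this thread: an explicit initializer,
even an all-zero one, historically moved an object from the zero-filled bss
segment into the data segment, so the zeroes were written into the executable
file.  A sketch of the three cases:

    /* Typical historical Unix C compiler placement (not guaranteed by any standard): */

    int big_table[1000];            /* no initializer:        bss, takes no space in the file  */
    int big_zeros[1000] = {0};      /* explicit zero:         data, roughly 4000 bytes on disk */
    int big_ones[1000]  = {1};      /* first element nonzero: data, necessarily stored on disk */

    /* big_table and big_zeros are both guaranteed to hold all zeroes at program
       startup; only the executable-size consequences differ. */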
barkus@amcom.UUCP (todd barkus) (08/19/89)
In article <10764@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <2128@infmx.UUCP> dror@infmx.UUCP (Dror Matalon) writes:
>>- K&R 2.4 says "External and static variables are initialized
>>-to zero by default, but it is good style to state the initialization
>>-anyway."
>>- Is this really portable?
>>
>It's supposed to have always been the rule.
>There certainly is a lot of C code that depends on it.

Rules are great, especially when everyone follows them.  Unfortunately, not
everyone does.  We have one if not two boxes whose compilers evidently do not
know how to read (some of us keep our K&R right next to the terminal, so it's
not like they wouldn't have access to one).

"The person who assumes the answer often answers to their assumption."  I
think that is a tebarkus original (it just popped into my head), which is not
to say someone else with a lot of unused space in their head might not have
had the idea first :-).
davidsen@sungod.crd.ge.com (ody) (08/21/89)
Although the proposed ANSI standard (3.5.7 line 20) calls for initialization
to zero, cast to the appropriate type (my paraphrase), for arithmetic and
pointer types, virtually all implementations initialize to zero (without
cast) in the absence of explicit initialization.

For reasons of "real" portability (what works vs. what any standard says) I
recommend initializing all float and pointer types explicitly if you want to
be sure code will work on machines in which float zero and NULL are not "all
bits off."  This will not in any way make your code less portable to
environments which implement the proposed standard, but will minimize your
"learning experiences."

bill davidsen  (davidsen@crdos1.crd.GE.COM)  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
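A sketch of the defensive style being recommended (the names are
illustrative); the explicit initializers cost nothing on a conforming
implementation but protect against older compilers that merely fill static
storage with zero bits:

    #include <stddef.h>

    double scale = 0.0;    /* explicit, in case floating-point zero is not all bits zero */
    char *head   = NULL;   /* explicit, in case the null pointer is not all bits zero    */
    long count;            /* integral types are safe to leave implicit in practice      */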
mike@hpfcdc.HP.COM (Mike McNelly) (08/21/89)
> The default initialization of statics and externals without explicit initial
> values also has the advantage (at least on some systems) that the load
> module will be smaller.  If you explicitly initialize to zero, all those
> zeroes will be stored in the file.

Several years ago our HP 9000/Series 300 customers (rightly) complained that
those external and static variables that were explicitly initialized to zero
were taking up too much data space.  This is no longer the case.  The
necessary changes to the compiler were quite small and easily accomplished.
Now our compiler puts these data items into BSS just as though they were not
explicitly initialized.

Not only does this change result in smaller executable files but it can speed
up compilation considerably.  Some of our biggest gains have been in system
code and in our graphics packages.

Mike McNelly
mike%hpfcla@hplabs.hp.com
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/22/89)
In article <1989Aug19.053711.7462@twwells.com> bill@twwells.com (T. William Wells) writes:
>int Array[1000] = {0};
>This sort of thing made the difference between a product that could
>be shipped on one floppy and one that required two.

The interesting thing is, the compiler is entitled to treat this exactly the
same as the non-explicit initializer case.  The difference is a side effect
of UNIX having adopted the COMMON model for extern data.  Somewhere along the
way, AT&T PCC releases started supporting DEF/REF (in effect), without the
extra cleverness that would have kept executables from turning .BSS into
.DATA.
timcc@csv.viccol.edu.au (Tim Cook) (08/22/89)
In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
> In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
> : The default initialization of statics and externals without explicit initial
> : values also has the advantage (at least on some systems) that the load
> : module will be smaller.  If you explicitly initialize to zero, all those
> : zeroes will be stored in the file.
>
> At one point, we got toasted by some of our customers because our
> executables were excessively large.  It seems that one of our
> programmers did things like:
>
> int Array[1000] = {0};
>
> This sort of thing made the difference between a product that could
> be shipped on one floppy and one that required two.
>
> Guk.

Let's not misappropriate blame here.  It seems to me that your compiler
should take the blame in this scenario.  Your programmer is simply making
sure of what will be in "Array" when the program starts (sounds like a
worthwhile programming practice).

It's not his fault if the compiler can't sense that he has initialized it to
the default.  Seems like a simple optimization to me.  (Of course, most C
compilers produce assembler, so they could have a go at passing the buck on
this one.)
bill@twwells.com (T. William Wells) (08/22/89)
In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
: In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
: > In article <478.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
: > : The default initialization of statics and externals without explicit initial
: > : values also has the advantage (at least on some systems) that the load
: > : module will be smaller.  If you explicitly initialize to zero, all those
: > : zeroes will be stored in the file.
: >
: > At one point, we got toasted by some of our customers because our
: > executables were excessively large.  It seems that one of our
: > programmers did things like:
: >
: > int Array[1000] = {0};
: >
: > This sort of thing made the difference between a product that could
: > be shipped on one floppy and one that required two.
: >
: > Guk.
:
: Let's not misappropriate blame here.  It seems to me that your compiler
: should take the blame in this scenario.  Your programmer is simply making
: sure of what will be in "Array" when the program starts (sounds like a
: worthwhile programming practice).
:
: It's not his fault if the compiler can't sense that he has initialized it
: to the default.  Seems like a simple optimization to me.

#1: Essentially *every* compiler does this particular bogosity.  That means
that a competent programmer had better be aware of it and deal with it.  (Let
me put it another way: I don't know of any that don't.)

#2: We shipped *source code* to our customers.  They were complaining because
*their* compilers made the executables too large.  (So also did, and do,
ours.)

#3: No, we could *not* tell them to use another compiler.  First, they
wouldn't.  Second, it almost always wouldn't make a difference (see #1).  And
third, in some cases, there *weren't* alternate compilers.

(Which reminds me: someone asserted that there are more 80x86's running C
programs than any other microprocessor.  I doubt it.  I suspect that it is
something like an 8051, Z80, or other equally puerile processor.  Do you know
how many typewriters, toaster ovens, and computer toys are out there today?
Programmed in C?  For a guess, someone might want to look up Franklin
Computer's sales of their hand-held spellers: these are sold in the millions.
Most have something less than an 8086 in them.  And they are all programmed
mostly in C.  Why am I reminded?  Guess which processors have the greatest
lack of even semi-functional C compilers?  And which require the greatest
competence in programmers to make things come out reasonably?)

Welcome to the real world.
---
Bill                { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
richard@aiai.ed.ac.uk (Richard Tobin) (08/23/89)
In article <1786@crdgw1.crd.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
> Although the proposed ANSI standard (3.5.7 line 20) calls for
>initialization to zero, cast to the appropriate type (my paraphrase) for
>arithmetic and pointer types, virtually all implementations initialize
>to zero (without cast) in the absence of explicit initialization.

Are there any well-known machines on which these aren't equivalent, and on
which the "wrong" initialization is done?

-- Richard
--
Richard Tobin,                  JANET: R.Tobin@uk.ac.ed
AI Applications Institute,      ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.           UUCP:  ...!ukc!ed.ac.uk!R.Tobin
karzes@mfci.UUCP (Tom Karzes) (08/23/89)
In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
>In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
>> At one point, we got toasted by some of our customers because our
>> executables were excessively large.  It seems that one of our
>> programmers did things like:
>>
>> int Array[1000] = {0};
>
>Let's not misappropriate blame here.  It seems to me that your compiler
>should take the blame in this scenario.  Your programmer is simply making
>sure of what will be in "Array" when the program starts (sounds like a
>worthwhile programming practice).

Actually, given that the programmer is unwilling to rely on implicit zero
initialization of statics, he/she is only making sure that the first element
of the array is initialized to 0 in this example.

>It's not his fault if the compiler can't sense that he has initialized it
>to the default.  Seems like a simple optimization to me.

Yes, it is a simple optimization.  However, standard Unix C compilers have
always placed explicitly initialized objects in the data section, regardless
of whether or not they're initialized with zero.  One important benefit of
this is that it permits the value in the executable to be patched with adb.
If it's in the bss section, you can't patch it in the file, and are forced to
modify and recompile the defining file, then relink the executable.

When we added this optimization to our compiler, there were so many
complaints about not being able to patch C executables that we added 2
switches to control this behavior.  One switch forces EVERY defined object to
the data section, even if it isn't initialized at all (this is fairly extreme
and almost never used; it certainly isn't the default).  The second limits
the maximum size of an object which will be placed in the data section when
explicitly initialized with zero (the first switch overrides this switch).

Thus, by setting the first option to FALSE and the second to, say, 16, the
behavior is to place all uninitialized objects in the bss section, and all
objects which are explicitly initialized to zero in the data section, unless
their size is greater than 16 bytes, in which case they're placed in the bss
section.  It was felt that 16 was a conservative figure (it forces all
explicitly initialized scalars, including double complex (if you're dealing
with Fortran code), into the data section, but gives you the space savings
you want when large arrays are involved).

>(Of course, most C compilers produce assembler, so they could have a go at
>passing the buck on this one).

This is unreasonable.  If you tell an assembler to place a 0 in the data
section, it has absolutely no business trying to second guess your intent and
placing it elsewhere.  Your code could be making all kinds of assumptions
about the location of that entity.  (Besides, in most Unix assemblers all it
has is one or more labels followed by some data; how is it supposed to know
that one of those labels is ONLY used to refer to the following chunk of zero
data, and that it will be used for nothing else, and that it can all be
safely moved elsewhere?)
fredex@cg-atla.UUCP (Fred Smith) (08/24/89)
In article <783@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <1786@crdgw1.crd.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>> Although the proposed ANSI standard (3.5.7 line 20) calls for
>>initialization to zero, cast to the appropriate type (my paraphrase) for
>>arithmetic and pointer types, virtually all implementations initialize
>>to zero (without cast) in the absence of explicit initialization.
>
>Are there any well-known machines on which these aren't equivalent, and
>on which the "wrong" initialization is done?

On a Prime 50-series machine the representation of a NULL pointer is not all
zeroes!  As far as I know, however, this does not cause a problem in such
initializations.

It is appropriate when testing a pointer for being the null pointer to do a
cast, thusly:

	char * foo;
	if (foo == (char *)NULL)

but then doing such a cast is ALWAYS appropriate, on any machine, since right
after you leave the company somebody will get the bright idea of porting your
code to some new whiz-bang-100 processor with weird architecture.  This is
also appropriate on 8086-class machines, since the representation of a
pointer, including the null pointer, will vary with memory model.  (I hate
segmented architectures.)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (08/24/89)
In article <783@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
> Are there any well-known machines on which these aren't equivalent, and
> on which the "wrong" initialization is done?

Well known machines, yes.  I don't have access to them anymore.  The
Honeywell DPS series (36 bit) has 400000000000(8) for f.p. zero and
xxxxxx00004x(8) for byte pointer (x's are address bits).  I believe that some
DG models have char ptrs which are non-zero when NULL, but I haven't looked
at one in close to ten years.  Can someone help on this?
--
bill davidsen  (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools.  They blindly follow their so-called
'reason' in the face of the church and common sense.  Any fool can see
that the world is flat!" - anon
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/25/89)
In article <7550@cg-atla.UUCP> fredex@cg-atla.UUCP (Fred Smith) writes:
-It is appropriate when testing a pointer
-for being the null pointer to do a cast, thusly:
- if (foo == (char *)NULL)
-but then doing such a cast is ALWAYS appropriate, on any machine, since
-right after you leave the company somebody will get the bright idea
-of porting your code to some new whiz-bang-100 processor with weird
-architecture. This is also appropriate on 8086-class machines, since
-the representation of a pointer, including the null pointer, will vary
-with memory model. (I hate segmented architectures.)
The cast would be necessary only if the implementor has screwed up the
definition of NULL. #define NULL 0 would always be a correct implementation
definition for NULL, and the cast would never be necessary. We explain this
in this newsgroup every few months.
kenny@m.cs.uiuc.edu (08/25/89)
>On a Prime 50-series machine the representation of a NULL pointer
>is not all zeroes!
>As far as I know, however, this does not cause a problem in such
>initializations.  It is appropriate when testing a pointer
>for being the null pointer to do a cast, thusly:
>	char * foo;
>	if (foo == (char *)NULL)
>but then doing such a cast is ALWAYS appropriate, on any machine, since
>right after you leave the company somebody will get the bright idea
>of porting your code to some new whiz-bang-100 processor with weird
>architecture.  This is also appropriate on 8086-class machines, since
>the representation of a pointer, including the null pointer, will vary
>with memory model.  (I hate segmented architectures.)

This is BROKEN.  How many times do those of us that understand this have to
shout it?  When a pointer is compared with an integer, it is implicitly
promoted to an integer.  Saying

	if (foo == NULL)

means EXACTLY the same thing as saying

	if (foo == (char *) NULL)

and if the NULL pointer doesn't have an all-zero representation, the compiler
is responsible for promoting it.  Any compiler that doesn't promote pointers
in comparisons with integers has a serious bug.

The non-all-zero representation will break questionable code like

	struct zot {
		long zot_l;
		char *zot_p;
	} barf;

where the pointer barf.zot_p *will* be initialized to the all-zero bit
pattern.  You can't have everything.

A-T
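That last observation is worth separating from the disputed claim about
comparisons: static-duration objects must be initialized to genuine null
pointers, whatever their bit pattern, whereas merely zeroing the bits of
memory need not produce null pointers on a machine like the Prime.  A small
sketch (illustrative code, not from the posting):

    #include <stdlib.h>
    #include <string.h>

    struct zot {
        long  zot_l;
        char *zot_p;
    };

    struct zot statically;      /* static storage: zot_p is a real null pointer,
                                   whatever bit pattern that requires */

    int main(void)
    {
        struct zot stacked;
        struct zot *heaped;

        memset(&stacked, 0, sizeof stacked);    /* all bits zero: zot_p may NOT be null */
        heaped = calloc(1, sizeof *heaped);     /* calloc zeroes bits: same caveat      */

        /* The portable way to get null pointers into non-static objects: */
        stacked.zot_p = NULL;
        if (heaped != NULL)
            heaped->zot_p = NULL;

        free(heaped);
        return 0;
    }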
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (08/25/89)
In article <4700042@m.cs.uiuc.edu>, kenny@m.cs.uiuc.edu writes:
> This is BROKEN.  How many times do those of us that understand this
> have to shout it?  When a pointer is compared with an integer, it is
> implicitly promoted to an integer.  Saying [ many things ]

I think you may have missed the original poster's point (he didn't shout it).
He was saying that on his machine a NULL pointer is not all bits zero.
Therefore if a C implementation set the pointer to "all bits zero" the result
would not be a NULL pointer, and would not compare equal to NULL.  The ANSI
standard calls for initialization to zero *cast to the appropriate type*
here, which would be another value.

His compiler may be non-conforming, but the point he was making has nothing
to do with promoting pointers to int (actually I think it's the other way
round, since an int may not be able to hold a pointer).  The standard also
allows NULL to be a pointer type ((void *) 0) which would make it somewhat
arcane to convert two pointers to integers to compare them.
--
bill davidsen  (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools.  They blindly follow their so-called
'reason' in the face of the church and common sense.  Any fool can see
that the world is flat!" - anon
maart@cs.vu.nl (Maarten Litmaath) (08/25/89)
kenny@m.cs.uiuc.edu writes:
\... Any compiler that doesn't
\promote pointers in comparisons with integers has a serious bug.
No, YOU have a serious bug!
foo *p;
if (p == 0)
means
if (<bit pattern of p> == <bit pattern of nil pointer of type foo *>)
and not
if (<integer promotion of p> == 0)
--
"rot H - dD/dt = J, div D = rho, div B = 0, |Maarten Litmaath @ VU Amsterdam:
rot E + dB/dt = 0" and there was 7-UP Light.|maart@cs.vu.nl, mcvax!botter!maart
henry@utzoo.uucp (Henry Spencer) (08/26/89)
In article <4700042@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>... When a pointer is compared with an integer, it is
>implicitly promoted to an integer.  Saying
>	if (foo == NULL)
>means EXACTLY the same thing as saying
>	if (foo == (char *) NULL)
>and if the NULL pointer doesn't have an all-zero representation, the
>compiler is responsible for promoting it...

Right conclusion, seriously wrong reasons.  Comparing a pointer to an integer
is *illegal* in general.  There is one, repeat one, special case: an integer
constant expression of value zero -- repeat, an integer CONSTANT expression
of value ZERO -- gets turned into a NULL pointer of the appropriate type when
compared to a pointer.  Note that it is the integer, not the pointer, that is
converted.  Note that no such conversion is done on integer variables,
integer constant expressions with non-zero values, or general integer
expressions.
--
V7 /bin/mail source: 554 lines.  |  Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages.   |  uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
dbrooks@osf.osf.org (David Brooks) (08/26/89)
In article <1989Aug25.185428.3511@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
[...]
> There is one, repeat one, special
>case: an integer constant expression of value zero -- repeat, an
>integer CONSTANT expression of value ZERO -- gets turned into a NULL
>pointer of the appropriate type when compared to a pointer.  Note that
>it is the integer, not the pointer, that is converted.  Note that no
>such conversion is done on integer variables, integer constant expressions
>with non-zero values, or general integer expressions.

I was about to make the same point myself.  This can be determined by careful
reading of K&R II: try section A6.6, page 198.  The constant 0 may be
converted by a cast, by assignment, or by comparison, to a pointer.  This
legitimizes "if (p == 0)".  Requiring an actual conversion step removes any
implication that the pointer is zero-valued.

Anyway, I had a question: what is this assumption about "all bits zero" for
the common case of initializing ints?  I wonder if there's any machine out
there that represents int 0 with some other bit pattern...  Those of us old
enough to remember when ones-complement seemed like a good idea can begin to
break into a sweat at this point :-)
--
David Brooks				dbrooks@osf.org
Open Software Foundation		uunet!osf.org!dbrooks
11 Cambridge Center			Personal views, not necessarily those
Cambridge, MA 02142, USA		of OSF, its sponsors or members.
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/26/89)
In article <4700042@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>This is BROKEN.  How many times do those of us that understand this
>have to shout it?  When a pointer is compared with an integer, it is
>implicitly promoted to an integer.

That's not right either.  Pointers may validly be compared only with pointers
to the same type or with null pointer constants.  "NULL" of course is
required to act like a null pointer constant in such contexts, and so is "0"
(quote marks not included).

If you want to promote a pointer to an integral type, an explicit cast is
required.  Which integral type is appropriate is implementation defined.
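Putting the corrections together, a short sketch (illustrative code, not from
any posting) of which comparisons are and are not legitimate:

    #include <stddef.h>

    /* Three equivalent, legal ways of asking "is p a null pointer?" */
    static int is_null(char *p)
    {
        if (p == 0)             /* the constant 0 is converted to a null char * */
            return 1;
        if (p == NULL)          /* NULL must act as a null pointer constant     */
            return 1;
        if (p == (char *)0)     /* explicit cast: same meaning, never required  */
            return 1;
        return 0;
    }

    /* By contrast, these are constraint violations, not merely poor style:
     *
     *     int i = 0;
     *     if (p == i)    ...  i is a variable, not a constant expression
     *     if (p == 1)    ...  a non-zero integer constant
     */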
bill@twwells.com (T. William Wells) (08/27/89)
In article <984@m3.mfci.UUCP> karzes@mfci.UUCP (Tom Karzes) writes:
: In article <1423@csv.viccol.edu.au> timcc@csv.viccol.edu.au (Tim Cook) writes:
: >In article <1989Aug19.053711.7462@twwells.com>, bill@twwells.com (T. William Wells) writes:
: >> At one point, we got toasted by some of our customers because our
: >> executables were excessively large.  It seems that one of our
: >> programmers did things like:
: >>
: >> int Array[1000] = {0};
: >
: >Let's not misappropriate blame here.  It seems to me that your compiler
: >should take the blame in this scenario.  Your programmer is simply making
: >sure of what will be in "Array" when the program starts (sounds like a
: >worthwhile programming practice).
:
: Actually, given that the programmer is unwilling to rely on implicit
: zero initialization of statics, he/she is only making sure that
: the first element of the array is initialized to 0 in this example.

Actually, the programmer was just following someone's brain-damaged advice to
initialize all globals.  He had no idea why one might want to do so.  No, he
wasn't a very good programmer.  Or even a good coder.

Which reminds me of an answer to the "What makes a C expert" question that I
was going to give but didn't get around to.  The "competent C programmer"
knows *what* is valid C.  The "C expert" knows *why* it is valid C; moreover,
he is capable of selecting the best C tool for the job, as opposed to merely
using the first thing that comes to mind.  Fundamentally, the difference is
that between knowledge and judgement.
---
Bill                { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
henry@utzoo.uucp (Henry Spencer) (08/27/89)
In article <609@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>Anyway, I had a question: what is this assumption about "all bits
>zero" for the common case of initializing ints?  I wonder if there's
>any machine out there that represents int 0 with some other bit
>pattern...

It would be difficult, probably impossible, to build an ANSI-conforming C
implementation for such a machine.  ANSI C leaves representation of most data
types largely up to the implementor, but integers are pinned down fairly
thoroughly and pretty well have to be binary using one of the orthodox
representations.  I believe the standard is flexible enough in key places for
one's-complement to work, but more radical departures from current orthodoxy
will have trouble.
--
V7 /bin/mail source: 554 lines.  |  Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages.   |  uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
gary@dgcad.SV.DG.COM (Gary Bridgewater) (08/27/89)
In article <131@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>In article <783@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
>... I believe that some DG models have char ptrs which are non-zero when NULL,
>but I haven't looked at one in close to ten years.

DG's 32-bit addresses come in two flavors.  A word address has the leftmost
bit indicating indirection, the next three bits indicating the ring
(MULTICS-like rings), and the final 28 bits of word address.  User rings are
4-7, so the address of the 'first' word in the lowest user ring is
0x40000000.  Byte pointers are word pointers shifted left (no indirect byte
pointers), so the byte address of the first byte in the user's rings is
0x80000000.

Our C compiler is quite happy to allow NULL=0 to make common Cisms work fine,
but if you violate the rules and expect to dereference a NULL pointer you
will get a ring validity trap.  Crock?  Bug?  Feature?  So far only people
with belly buttons have had an opinion on this. :-)
--
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or {amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/28/89)
In article <609@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>Anyway, I had a question: what is this assumption about "all bits
>zero" for the common case of initializing ints?  I wonder if there's
>any machine out there that represents int 0 with some other bit
>pattern...

I doubt that it would be standard-conforming.

The proposed C standard does impose some constraints on implementations that
were not technically necessary.  Among these are: integers must be
represented by a binary numeration system (allows ones and twos complement,
maybe even sign/magnitude, but not several other reasonable representations);
'0' through '9' must have ascending contiguous integral values.

The former constraint doesn't bother me, but the latter does.
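The digit constraint is what licenses the usual hand-written numeric
conversion; a small sketch of the idiom it blesses:

    #include <ctype.h>

    /* Convert a string of decimal digits to an int.  The expression c - '0'
       is portable precisely because '0' through '9' are guaranteed to have
       ascending, contiguous values; no such guarantee exists for letters. */
    static int digits_to_int(const char *s)
    {
        int value = 0;

        while (isdigit((unsigned char)*s))
            value = value * 10 + (*s++ - '0');
        return value;
    }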
hascall@atanasoff.cs.iastate.edu (John Hascall) (08/28/89)
In article <10831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
}The proposed C standard does impose some constraints on implementations
}that were not technically necessary.  [...binary numeration...]
}'0' through '9' must have ascending contiguous integral values.
}The former constraint doesn't bother me, but the latter does.

Why!!!  What kind of idiot would design a character code with '0'..'9' in any
other fashion.  The same can be said for 'a'..'z' and 'A'..'Z', but we know
which idiots would do that.

These are the sorts of things which should fall under the principle of least
astonishment!  Just because something is technically possible does not mean
it is a good idea.  It seems like the committee spent a lot of time thinking
up obscure technically possible behavior just to see how clever they could
be.  "In the Klat-Klala numbering system all the odd digits come before all
the even ones, we should allow for this."

John Hascall
(ps.  Apply `:-)'s above as needed to smother flames)
(pps. I think trigraphs were a misguided effort as well)
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/29/89)
In article <1392@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
> What kind of idiot would design a character code with '0'..'9'
> in any other fashion.  The same can be said for 'a'..'z' and
> 'A'..'Z', but we know which idiots would do that.

Well, you see, it is not the job of X3J11 to determine what is idiotic and
what is not.  It is X3J11's job to specify a maximally useful programming
language.  Gratuitously excluding certain classes of architecture would
violate the Committee's charter.

If you were to consider EBCDIC's 8-bit bytes as signed, then the codes for
'0' .. '9' would appear in descending order.  That's not excessively
unreasonable.

> It seems like the committee spent a lot of time thinking up obscure
> technically possible behavior just to see how clever they could be.

Not really.  We did spend a lot of time determining just how much variation
had to be accommodated.  There are many interesting computer architectures
for which a C implementation would be something to be encouraged.  Not all of
them look like the systems you've encountered.

>(pps. I think trigraphs were a misguided effort as well)

I think that most of X3J11 might even privately agree with that assessment.
However, they serve a possibly useful function with very little adverse
impact (mainly on idiots who use "??!").  The real problem with trigraphs is
that they've been misconstrued as an attempt to solve the international
character set issue for practical programming purposes.  The current party
line is that they're of use primarily in code transport among varying
systems, not for everyday programmer use.
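For readers who have not run into them, a minimal illustration of the
replacement being alluded to (the string itself is made up):

    #include <stdio.h>

    int main(void)
    {
        /* A conforming ANSI C compiler replaces the three-character sequence
           question mark, question mark, exclamation point with '|' even inside
           a string literal, so this prints "Huh|" rather than "Huh??!". */
        printf("Huh??!\n");
        return 0;
    }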
diamond@csl.sony.co.jp (Norman Diamond) (08/29/89)
In article <10859@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>If you were to consider EBCDIC's 8-bit bytes as signed, then the codes
>for '0' .. '9' would appear in descending order.  That's not excessively
>unreasonable.

Nope; they're still ascending.  That (along with big-endianness) is why a
Fortran-66 program could read into an integer using A4 format and get correct
results.
--
Norman Diamond, Sony Corporation (diamond@ws.sony.junet)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.
hascall@atanasoff.cs.iastate.edu (John Hascall) (08/29/89)
In article <10859> gwyn@brl.arpa (Doug Gwyn) writes:
}In some article I rant and rave:
}> What kind of idiot would design a character code with '0'..'9'
}> in any other fashion.  The same can be said for 'a'..'z' and
}> 'A'..'Z', but we know which idiots would do that.
}Well, you see, it is not the job of X3J11 to determine what is idiotic
}and what is not.  It is X3J11's job to specify a maximally useful
}programming language.  Gratuitously excluding certain classes of
}architecture would violate the Committee's charter.

If a standard is so broad as to include everything, is it still a standard?
Is it wise to try to include every possible aberrant behavior (you are only
going to encourage them)?  Where do you stop?  Should we also require that C
compilers recognize the keywords in other languages (say Spanish) so as not
to "gratuitously exclude certain classes" of programmers?

If you are not going to restrict the "local alphabet" * characters to a
contiguous sequence of integer values, it certainly makes the problem of
writing a portable sorting routine difficult.  (The only alternative I can
come up with off the top of my head is to have something like the following
in a standard header:)

	#define _COLL_SEQ "abcdefghijklmnopqrstuvwxyz"

and even that has its problems.

}If you were to consider EBCDIC's 8-bit bytes as signed, then the codes
}for '0' .. '9' would appear in descending order.

And if you were to consider them as tiny little floating point numbers, then
the codes for '0'..'9' would make no sense at all. :-)

}>(pps. I think trigraphs were a misguided effort as well)
}I think that most of X3J11 might even privately agree with that ...
}The real problem with trigraphs is that they've been misconstrued
}as an attempt to solve the international character set issue for
}practical programming purposes.  The current party line is that
}they're of use primarily in code transport among varying systems,
}not for everyday programmer use.

That's the point.  They should be an international data transport standard,
not a C programming language standard.  What if some group decides on an
"insert other language here" standard that wants to use quadgraphs of $*$c,
for example, for this file transfer purpose?

John "Someone has to ask these stupid questions" Hascall
* I was privately chided for my ethnocentric use of 'a'..'z'
barmar@think.COM (Barry Margolin) (08/29/89)
In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
> If you are not going to restrict the "local alphabet" *
> characters to a contiguous sequence of integer values it certainly
> makes the problem of writing a portable sorting routine difficult.
>* I was privately chided for my ethnocentric use of 'a'..'z'

Well, since you mention that you aren't talking about just US ASCII, it's
worth pointing out that the international standard 8-bit character code
DOESN'T have the alphabetic characters contiguous.  It was designed to be a
superset of 7-bit ASCII.  This prevents it from keeping the letters
contiguous, since the alphabetic characters are surrounded by non-alphabetic
characters.  All the added characters have their high order bit on.  So, if
ANSI C were to require alphabetic characters to be contiguous, it would not
be possible to implement one that also supported the standard character
encodings.

Fully general lexicographic sorting programs can't just use the numeric
character values; indeed, different countries that use the same alphabet may
have different ordering conventions, so you can't even use a fixed ordering.
You need a locale-dependent character-ordering predicate to do it right.

In Common Lisp, we define a partial ordering of the alphanumeric characters
required by the standard.  We specify that the uppercase and lowercase
characters must each be ordered alphabetically, that the digits be ordered
numerically, and that the digits not be interleaved with any of the
alphabetics.  We don't, however, require that the characters be sequential
within any of the three groups of characters, nor do we specify the relative
ordering of the three groups.  These rules were designed so that both ASCII
and EBCDIC could be used.  We also define CHAR< and related predicates to
permit the program to access the actual order of the characters in the
implementation.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)
In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
> If you are not going to restrict the "local alphabet" *
> characters to a contiguous sequence of integer values it certainly
> makes the problem of writing a portable sorting routine difficult.

Sorting in dictionary order is obviously locale-dependent.  The C standard
specifies facilities to assist in this (see strcoll()).  Note that it did not
attempt to constrain the alphabet.

> That's the point.  They should be an international data transport
> standard not a C programming language standard.

There IS such a standard (ISO 646), but it doesn't include representations
for certain glyphs that are used in C source code!

> What if some group decides on an "insert other language here"
> standard that wants to use quadgraphs of $*$c, for example
> for this file transfer purpose.

C appears to have the first programming language standard that made a serious
attempt to address internationalization concerns.  Others can do what they
will, but some of them may follow C's general approaches.
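strcoll() is the facility referred to; a minimal sketch (the helper names are
made up, the library calls are ANSI) of locale-dependent sorting built on it:

    #include <locale.h>
    #include <stdlib.h>
    #include <string.h>

    /* qsort comparison function: order two strings by the collating sequence
       of the current locale rather than by raw character codes. */
    static int collate(const void *a, const void *b)
    {
        return strcoll(*(char *const *)a, *(char *const *)b);
    }

    static void sort_words(char *words[], size_t n)
    {
        setlocale(LC_COLLATE, "");      /* adopt the user's native locale */
        qsort(words, n, sizeof *words, collate);
    }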
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)
In article <10759@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>Nope; they're still ascending.

Oops, I stand corrected.  The magnitudes are descending but the values (being
negative) are ascending.  Oh well, it was just an example.
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)
In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>If a standard is so broad as to include everything, is it still a standard?

This question is phrased misleadingly.  X3.159 is not a computer architecture
standard; it is a C programming language standard.  Certainly it should
accommodate the widest feasible range of those factors that it cannot
constrain, so long as the utility of the language is not significantly
reduced thereby.
henry@utzoo.uucp (Henry Spencer) (08/30/89)
In article <1403@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
> If you are not going to restrict the "local alphabet" *
> characters to a contiguous sequence of integer values it certainly
> makes the problem of writing a portable sorting routine difficult.

Uh, if you think that's the worst problem with writing a portable sorting
routine, you have no *concept* of the horrors that European languages commit
in defining collating sequences.  (The less said about Asian languages the
better...)  This is the least of the problems.  Building a sorting routine
that will "do the right thing" portably is a staggering task.

Incidentally, wishing for a contiguous alphabet will not make IBM (and its
non-contiguous-alphabet character set, EBCDIC) go away.  That alone kills the
idea.
--
V7 /bin/mail source: 554 lines.  |  Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages.   |  uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
tneff@bfmny0.UUCP (Tom Neff) (08/31/89)
In article <10831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>The proposed C standard does impose some constraints on implementations
>that were not technically necessary.  Among these are: integers must be
>represented by a binary numeration system (allows ones and twos complement,
>maybe even sign/magnitude, but not several other reasonable representations);
>'0' through '9' must have ascending contiguous integral values.
>
>The former constraint doesn't bother me, but the latter does.

These constraints are reasonable.  The penalty for violating them in some
future character set or machine design will only be the lack of a fully ANSI
conformant C compiler, plus the risk that ported ANSI C programs which
explicitly take advantage of these constraints in the code will not execute
correctly when compiled without modification.  (Programs that merely use
system headers or macros whose implementations *typically* rely on these
constraints don't count, since the weirdo vendor could be expected to provide
workarounds in his supplied headers.)  I suspect the purveyor of such an
oddball CPU will have many worse problems to deal with. :-)
--
"We walked on the moon --	(( Tom Neff
 you be polite"			 )) tneff@bfmny0.UU.NET
mcdonald@uxe.cso.uiuc.edu (08/31/89)
>Incidentally, wishing for a contiguous alphabet will not make IBM (and
>its non-contiguous-alphabet character set, EBCDIC) go away.  That alone
>kills the idea.

No, it won't.  But it is easy to avoid: when you specify a computer, simply
specify that a certain character set (i.e. standard ASCII characters from 32
to 127) be used for all external and internal purposes.  This will
automatically exclude EBCDIC and other perversions like CDC's "display code"
or radix-50 filenames (certain PDP-11 OS's).  IBM mainframes are a world
apart - and destined for the graveyard of history, albeit very slowly.

Incidentally, I still have not forgiven IBM for the evil thing they did when
going from Model 26 keypunches (BCD) to Model 29 ones (EBCDIC).

Doug McDonald
cowan@marob.masa.com (John Cowan) (09/01/89)
In article <10870@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <10759@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>>Nope; they're still ascending.
>
>Oops, I stand corrected.  The magnitudes are descending but the values
>(being negative) are ascending.  Oh well, it was just an example.

Does the pANS still guarantee that the chars used in C programming (letters,
numbers, !@#$%^&*()_+ etc.) are non-negative?  K&R-1 made such a guarantee,
and it seems to be true on all "real" machines.  Only signed-byte machines
using EBCDIC and machines that use neither ASCII nor EBCDIC would violate
this rule.
--
Internet/Smail: cowan@marob.masa.com	Dumb: uunet!hombre!marob!cowan
Fidonet: JOHN COWAN of 1:107/711	Magpie: JOHN COWAN, (212) 420-0527
	Charles li reis, nostre emperesdre magnes
	Set anz toz pleins at estet in Espagne.
evans@ditsyda.oz (Bruce Evans) (09/01/89)
In article <10870@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>Oops, I stand corrected.  The magnitudes are descending but the values
>(being negative) are ascending.  Oh well, it was just an example.

If chars are unsigned, they will not be negative (butterfly order :-).
--
Bruce Evans		evans@ditsyda.oz.au
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/01/89)
In article <24FD69D9.12F@marob.masa.com> cowan@marob.masa.com (John Cowan) writes:
>Does the pANS still guarantee that the chars used in C programming (letters,
>numbers, !@#$%^&*()_+ etc.) are non-negative?

The actual source code characters don't have values.  The execution-
environment values for the corresponding run-time characters (thus, character
constants) are indeed required to be positive.  Other characters (not
corresponding to those in the official C source character set) can have any
values representable in a byte.  (A byte is whatever a C char is, not
necessarily 8 bits.)
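One practical consequence of characters outside the C source character set
possibly being negative: the <ctype.h> functions require an argument
representable as an unsigned char (or equal to EOF), so a cast is the usual
defensive idiom.  A small sketch with illustrative names:

    #include <ctype.h>

    /* Count alphabetic characters in a string.  The cast matters on machines
       where plain char is signed and the text contains characters outside the
       basic C set (e.g. accented letters in an 8-bit code): passing other
       negative values to the <ctype.h> functions is not defined. */
    static long count_alpha(const char *s)
    {
        long n = 0;

        while (*s != '\0') {
            if (isalpha((unsigned char)*s))
                n++;
            s++;
        }
        return n;
    }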