mendozag@pur-ee.UUCP (Grado) (10/21/88)
A guy around here is trying to port to several machines a program he
hacked together on a PC using Lattice C.  For some obscure reason, in his
original program he decided to use only low-level I/O.  That forced him
to "split" integers, save them as 2 bytes, and then reassemble the
integers when the file is read back(!).

However, much to his dismay, other compilers (LSC, MSC, and Unix)
require him to declare the I/O buffer (which he also uses for arithmetic
operations) as unsigned char, else the chars come out as negative
numbers when their contents represent values > 127.  (He does a lot of
arithmetic with characters representing integers.)

He claims the compilers are at fault and that all compilers should have
'unsigned char' as the default for characters so you can do all sorts of
arithmetic with them.  Any comments and/or suggestions I can pass along?
[He basically learned C while developing this program and now has the
chance to port it to other machines, with copyright and all!]

mendozag@ecn.purdue.edu
friedl@vsi.COM (Stephen J. Friedl) (10/22/88)
In article <9563@pur-ee.UUCP>, mendozag@pur-ee.UUCP (Grado) writes:
> He claims the compilers are at fault and that all the compilers
> should have 'unsigned char' as default for characters so you
> can do all sorts of arithmetic with them.
> Any comments and/or suggestions I can pass along?

There are very good reasons for this.  The large (overwhelming?) use of
char variables is for characters, where sign is not an issue.  While
most modern architectures can handle all data types in both signed and
unsigned manners, older machines had a "natural" method for byte
handling, with a substantial penalty for doing it "the other way".
Apparently it was felt that this penalty was too high for what was seen
as limited utility.

If a machine supports both signed and unsigned byte operations, it is up
to the compiler writer to select whichever one she likes the most.  The
dpANS will allow the /signed/ keyword to do the obvious thing to chars,
but it is unwise to rely on anything other than unsigned.

-- 
Steve Friedl    V-Systems, Inc.    +1 714 545 6442    3B2-kind-of-guy
friedl@vsi.com    {backbones}!vsi.com!friedl    attmail!vsi!friedl
---------Nancy Reagan on the Three Stooges: "Just say Moe"---------
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)
In article <9563@pur-ee.UUCP> mendozag@ee.ecn.purdue.edu (Victor M Grado) writes:
- He claims the compilers are at fault and that all the compilers
- should have 'unsigned char' as default for characters so you
- can do all sorts of arithmetic with them.
- Any comments and/or suggestions I can pass along?
Perhaps you could tell him to learn C instead of guessing about it.
henry@utzoo.uucp (Henry Spencer) (10/23/88)
In article <9563@pur-ee.UUCP> mendozag@ee.ecn.purdue.edu (Victor M Grado) writes:
> He claims the compilers are at fault and that all the compilers
> should have 'unsigned char' as default for characters...

[Possibly we ought to have a "frequently-asked questions" posting in
this group.  Here, slightly modified, is something I posted two years
ago, when a debate raged on this issue.]

Would he still feel this way if all manipulations of unsigned char took
three times as long as those of signed char?  It can happen.  All
potential participants in this debate please attend to the following.

- There exist machines (e.g. pdp11) on which unsigned chars are a lot
  less efficient than signed chars.
- There exist machines (e.g. ibm370) on which signed chars are a lot
  less efficient than unsigned chars.
- Many applications do not care whether the chars are signed or
  unsigned, so long as they can be twiddled efficiently.
- For this reason, char is intended to be the more efficient of the two.
- Many old programs assume that char is signed; this does not make it
  so.  Those programs are wrong, and have been all along.  Alas, this is
  not a comfort if you have to run them.
- The Father, the Son, and the Holy Ghost (K&R1, H&S, and X3J11 resp.)
  all agree that characters in the "source character set" (roughly,
  those one uses to write C) must look positive.  Actually, the Father
  and the Son gave considerably broader guarantees, but the Holy Ghost
  had to water them down a bit.
- The "unsigned char" type exists (in most newer compilers) because
  there are a number of situations where sign extension is very
  awkward.  For example, getchar() wants to do a non-sign-extended
  conversion from char to int.
- X3J11, in its semi-infinite wisdom, has decided that it would be nice
  to have a signed counterpart to "unsigned char", to wit "signed
  char".  Therefore it is reasonable to expect that most new compilers,
  and old ones brought into conformance with the yet-to-be-issued
  standard, will give you the full choice: signed char if you need
  signs, unsigned char if you need everything positive, and char if you
  don't care but want it to run fast.
- Given that many compilers have not yet been upgraded to match even
  the current X3J11 drafts, much less the final end product (which
  doesn't exist yet), any application which cares about signedness
  should use typedefs or macros for its char types, so that the
  definitions can be revised later.
- The only things you can safely put into a char variable, and depend
  on having them come out unchanged, are characters from the native
  character set and small *positive* integers.
- Dennis Ritchie is on record, as I recall, as saying that if he had to
  do it all over again, he would consider changing his mind about
  making chars signed on the pdp11 (which is how this mess got
  started).  The pdp11 hardware strongly encouraged this, but it *has*
  caused a lot of trouble since.  It is, however, much too late to make
  such a change to C.

-- 
The meek can have the Earth;    | Henry Spencer at U of Toronto Zoology
the rest of us have other plans.| uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
carroll@s.cs.uiuc.edu (10/23/88)
In article <9563@pur-ee.UUCP> mendozag@ee.ecn.purdue.edu (Victor M Grado) writes:
- (...) He claims the compilers are at fault and that all the compilers
- should have 'unsigned char' as default for characters (...)
Absolutely not!  The reason is that 'unsigned' is a keyword in C, and
'signed' is not.  I got screwed by this porting stuff to the 3b systems,
where unsigned is the default, but the code thought signed was the
default.  There is no way to fix that.  Whereas, if you assumed unsigned,
you merely have to put the 'unsigned' keyword in front of your chars.
FLAME ON -
This bug shows up in 'units', where the exponents are stored in chars,
*signed* chars. On a 3b, this means that units can't deal with negative
powers of dimensions, which is somewhat of a fatal flaw. Although there is
a simple fix (change 'char' to 'short int'), AT&T, through several releases,
*still* hasn't gotten it to work. Who knows what other bugs are floating around
because of something like this?
FLAME OFF
Alan M. Carroll "How many danger signs did you ignore?
carroll@s.cs.uiuc.edu How many times had you heard it all before?" - AP&EW
CS Grad / U of Ill @ Urbana ...{ucbvax,pur-ee,convex}!s.cs.uiuc.edu!carroll
knudsen@ihlpl.ATT.COM (Knudsen) (10/25/88)
In article <9563@pur-ee.UUCP>, mendozag@pur-ee.UUCP (Grado) writes:
> A guy around here is trying to port to several machines a program he
> hacked away in a PC using Lattice C.  For some obscure reason in his
> original program he decided to use only low-level I/O.  That forced
> him to "split" integers and then save them as 2 bytes and then later
> when the file is read back the integers are put together(!).

At least on a Motorola micro (6809 or 680x0) you can say
write(chan, int, 2) and put out the whole integer at once.

> require him to declare as unsigned char the I/O buffer (which he
> also uses for arithmetic operations) else the chars are negative
> numbers when their contents represent values > 127.  (He does
> a lot of arithmetic with characters representing integers).

This is often a problem.  If he doesn't want to declare the buffer
unsigned (or his compiler, like mine, doesn't support unsigned char),
he can replace c with (c & 255) whenever c is used as an int.

> He claims the compilers are at fault and that all the compilers
> should have 'unsigned char' as default for characters so you
> can do all sorts of arithmetic with them.

All compilers should have unsigned char, but why as default?  Half the
time you want short-range *signed* variables, -128 to +127.  And if no
unsigned char type is supported, the (c & 255) fix is relatively cheap;
the reverse fix (unsigned to signed) is harder.  Also, the (c & 255)
fix protects you against unknown compilers, by guaranteeing unsigned no
matter what the default is.

I DO wish compilers would tell you somehow what the default is; the 3B2
compilers seem to default to unsigned char, which breaks a lot of old
EOF loops.

Finally, your friend should minimize char->int conversions as much as
possible: read the stuff in, transfer it to an int variable, and work on
that exclusively.  Since he learned C on it, he should now learn some
more C by thoroughly re-working the code for style and efficiency
anyway.  I can't stomach some of the stuff I wrote a few years back.

-- 
Mike Knudsen    Bell Labs(AT&T)    att!ihlpl!knudsen
"Lawyers are like handguns and nuclear bombs.  Nobody likes them,
but the other guy's got one, so I better get one too."
crossgl@ingr.UUCP (Gordon Cross) (10/25/88)
In article <9563@pur-ee.UUCP>, mendozag@pur-ee.UUCP (Grado) writes:
> However, much to his dismay, other compilers (LSC, MSC, and Unix)
> require him to declare as unsigned char the I/O buffer (which he
> also uses for arithmetic operations) else the chars are negative
> numbers when their contents represent values > 127.  (He does
> a lot of arithmetic with characters representing integers).

The proposed ANSI C standard states (I am quoting directly from the
document):

  "An object declared as a character (char) is large enough to store
  any member of the required source character set [ .. ].  If such a
  character is stored in a char object, its value is guaranteed to be
  non-negative.  If other quantities are stored in a char object, the
  behavior is implementation-defined: the values are treated as either
  signed or non-negative integers."

Basically, this allows each compiler writer to explore his whims.
Hope it helps!

Gordon Cross
dzoey@umd5.umd.edu (Joe Herman) (10/25/88)
From article <7354@ihlpl.ATT.COM>, by knudsen@ihlpl.ATT.COM (Knudsen):
> In article <9563@pur-ee.UUCP>, mendozag@pur-ee.UUCP (Grado) writes:
>> A guy around here is trying to port to several machines a program he
>> hacked away in a PC using Lattice C.  For some obscure reason in his
>> original program he decided to use only low-level I/O.  That forced
>> him to "split" integers and then save them as 2 bytes and then later
>> when the file is read back the integers are put together(!).

Try this:

    union intaschar {
        char hilo[sizeof (int)];
        int  val;
    } foo;

    foo.val = somenumber;
    write (fh, foo.hilo, sizeof (int));

if for some reason he can't just write the integer out like below.

> At least on a Motorola micro (6809 or 680x0) you can say
> write(chan, int, 2) and put out the whole integer at once.

Ick, I assume you mean:

    write (chan, &int, sizeof (int));  /* excuse the overloading of 'int' */

otherwise you're writing out two bytes of the address of 'int'.  Also,
for PCs (at least with Microsoft) make sure you open the file in binary
mode if you're going to do binary I/O.

> I DO wish compilers would tell you somehow what the default is;
> the 3B2 compilers seem to default to unsigned char, which breaks
> a lot of old EOF loops.

Remember, functions like getchar, getc, &c. return an int, not a char,
which gets you around the problem of '\377' being confused with EOF.

> Mike Knudsen    Bell Labs(AT&T)    att!ihlpl!knudsen
> "Lawyers are like handguns and nuclear bombs.  Nobody likes them,
> but the other guy's got one, so I better get one too."

Nice quote.

Joe Herman
The University of Maryland
dzoey@terminus.umd.edu
-- 
"Everything is wonderful until you know something about it."
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/26/88)
In article <2723@ingr.UUCP> crossgl@ingr.UUCP (Gordon Cross) writes: >The proposed ANSI C standard states (I am quoting directly from the document): The definition of "character" has been changed. However, whether a plain char acts like signed char or unsigned char is up to the implementation, as it has been since the early days of C.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/26/88)
In article <207600005@s.cs.uiuc.edu> carroll@s.cs.uiuc.edu writes: >Although there is a simple fix (change 'char' to 'short int'), AT&T, >through several releases, *still* hasn't gotten it to work. >Who knows what other bugs are floating around because of something >like this? Apparently nobody is paid to go around cleaning up old (yet still important) code. My favorite was the "#if u3b5|u3b2"s scattered around in the PWB/Graphics sources where fixing the original bug ("char c=getchar()" etc.) right would have been much simpler. One would hope that by now all the bugs that Guy Harris, I, and others tracked down have been fixed in the AT&T master sources, but somehow I doubt it.