[comp.lang.c] characters

henry@zoo.toronto.edu (Henry Spencer) (09/18/90)

In article <2517@idunno.Princeton.EDU> pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:
>I, for one, loathe the concept of signed chars.  I've wasted countless
>hours of programming time searching for bugs caused because I forgot that
>chars are signed by default.  I think chars (in fact, all integer types)
>should be unsigned by default.  Comments?

I believe I've heard Dennis Ritchie say that he'd consider making char
unsigned if he had to do it all over again.

Extending this to the rest of the integer types is silly, unless you
stop calling them (e.g.) "int" and make it, say, "nat" instead.  People
have this strange impression that something called "int" should behave
somewhat like an integer, and integers are signed by definition.  (The
difference between natural numbers and integers is the inclusion of
negative numbers.)
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

pfalstad@phoenix.Princeton.EDU (Paul John Falstad) (09/20/90)

In article <1990Sep18.162407.15525@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <2517@idunno.Princeton.EDU> pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:
>>I, for one, loathe the concept of signed chars.  I've wasted countless
>>hours of programming time searching for bugs caused because I forgot that
>>chars are signed by default.  I think chars (in fact, all integer types)
>>should be unsigned by default.  Comments?
>Extending this to the rest of the integer types is silly, unless you
>stop calling them (e.g.) "int" and make it, say, "nat" instead.  People

Ok, THAT was silly.  I got a bit carried away.

In some programs, I just got so annoyed that I did a typedef unsigned
char uchar, and then used uchar instead.  The problem with that is all
the library functions use char *.  I think ANSI C is actually supposed
to require a diagnostic if you mix the two, since unsigned char * and
char * are incompatible pointer types (though whether that's an error
or just a warning is up to the compiler).  This leaves me with three
options: (1) edit the include files (bad idea), (2) do a cast
each time.  But having to do strlen((char *) str) each time is very
annoying.

The third option is to just remember that chars are signed and watch out
for problems caused by the sign extension.  Hmmm.  But complaining is so much
easier...

Incidentally, someone sent me mail saying that neither K&R nor ANSI
specify that chars must be signed.  True?  All the implementations I've come
across have chars signed, however, which is the only fact I'm interested
in.

Paul Falstad, pfalstad@phoenix.princeton.edu PLink:HYPNOS GEnie:P.FALSTAD
For viewers at home, the answer is coming up on your screen.  For those of
you who wish to play it the hard way, stand upside down with your head in a
bucket of piranha fish.

bengsig@oracle.nl (Bjorn Engsig) (09/20/90)

Article <2657@idunno.Princeton.EDU> by pfalstad@phoenix.Princeton.EDU (Paul John Falstad) says:
|
|Incidentally, someone sent me mail saying that neither K&R nor ANSI
|specify that chars must be signed.  True?
Yes.
|All the implementations I've come across have chars signed, however,
Several of my compilers have chars unsigned.  One even has a switch, so I can
decide.
-- 
Bjorn Engsig,	Domain:		bengsig@oracle.nl, bengsig@oracle.com
		Path:		uunet!mcsun!orcenl!bengsig
		From IBM:	auschs!ibmaus!cs.utexas.edu!uunet!oracle!bengsig

steve@taumet.com (Stephen Clamage) (09/20/90)

pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:

>The third option is to just remember that chars are signed and watch out
>for problems caused by the sign extension.  Hmmm.  But complaining is so much
>easier...

The best option is not to write code which depends on the signed-ness of
chars.  The primary error is writing code which uses chars as tiny
integers.  In most cases, if you examine the resulting code, you will
find that this makes for no savings in code space, and often the code
is bigger and slower than if you just used ints (or possibly shorts)
instead.  If you have a large array of these things, there will be a
data savings by using chars.  To be safe, never use values outside
the range 0-127 with type char.

>Incidentally, someone sent me mail saying that neither K&R nor ANSI
>specify that chars must be signed.  True?

True.
K&R 1, section 6.1 in the Reference Manual, says:
	"Whether or not sign-extension occurs for characters is
	machine dependent ... Of the machines treated by this
	manual, only the PDP-11 sign-extends."
ANSI Standard, section 3.1.2.5, says of objects of type char:
	"... the behavior is implementation-defined: the values
	are treated as either signed or nonnegative integers."
The reason for this is that hardware varies as to whether
sign-extension is more expensive than zero-padding.  The idea was to
allow the compiler to generate the most efficient code when "plain"
char was specified.

>All the implementations I've come across have chars signed,
>however, which is the only fact I'm interested in.

Until you use an implementation where chars are unsigned and your
code stops working.  Your interests will then broaden.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

henry@zoo.toronto.edu (Henry Spencer) (09/20/90)

In article <2657@idunno.Princeton.EDU> pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:
>Incidentally, someone sent me mail saying that neither K&R nor ANSI
>specify that chars must be signed.  True?

Correct.  On some machines, like the IBM 360/370 series -- *not* an uncommon
machine, and yes, there are C compilers (and even Unix systems) for it --
you take a major performance hit if you insist that char must be signed.
Char is signed or unsigned, whichever is faster.  Programs should avoid
depending on the specific choice.  (It's not hard when you try.)

>All the implementations I've come
>across have chars signed, however, which is the only fact I'm interested
>in.

I assume, then, that you are not interested in making your code portable.
Sooner or later you *will* encounter an unsigned-char machine; refusing
to prepare for this event guarantees that it will be traumatic.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

flaps@dgp.toronto.edu (Alan J Rosenthal) (09/23/90)

mikey@ontek.com (krill o mine) writes:
>Most high school algebra texts do not consider zero a natural number.
>I would venture that you meant whole numbers.

I have NEVER heard a real mathematician say "whole number".

It seems to me that nearly all mathematicians in mathematics departments start
the naturals with one, but many, possibly most, math-like folks in computer
science departments start them with zero.  It doesn't matter as much in
mathematics.

In any case, high school algebra textbooks are certainly not a definitive
reference for mathematics.

karl@haddock.ima.isc.com (Karl Heuer) (09/25/90)

In article <459@taumet.com> steve@taumet.com (Stephen Clamage) writes:
>The best option is not to write code which depends on the signed-ness of
>chars.  The primary error is writing code which uses chars as tiny
>integers [which can be fixed by using ints or something]

It turns out to be harder than one might suppose, even for programs that don't
have that problem.  Consider the case typified by `if (getchar() == *s++)':
a successful getchar() always returns a zero-extended value, but `*s++' may do
sign extension if `s' was declared `char *'.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint