[comp.lang.c] type of character constants

throopw@agarn.dg.com (Wayne A. Throop) (03/02/89)

> scm@datlog.co.uk ( Steve Mawer )
>> diamond@diamond. (Norman Diamond)
>> When you assign 'x' to a character, you are assigning an int to a
>> character.  The reader knows that the type mismatch was intentional.
> Not if he knows the C language.  A single character written within
> single quotes is a *character constant*.  This isn't an int.

Ha.  A lot Steve knows.  From K&R, 1st ed, pg 185

    Character constants have type int; floating constants are double.

From Harbison and Steele, page 19

    Character constants have type int.

From the latest dpANS C standard I have, 3.1.3.4, first sentence
under "semantics":

    An integer character constant has type int.
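
A quick way to check this claim on a present-day compiler is the C11
_Generic selection, which postdates every reference quoted above, so
treat this as a sketch under that assumption.  TYPE_NAME and the
function names are made up for illustration.

```c
#include <string.h>

/* Map an expression to the name of its type.  Requires C11 _Generic;
   only the types discussed in this thread are listed. */
#define TYPE_NAME(x) _Generic((x), \
    char:    "char",               \
    int:     "int",                \
    long:    "long",               \
    double:  "double",             \
    default: "other")

/* Type name of a character constant: "int" in C. */
const char *type_of_char_constant(void)
{
    return TYPE_NAME('x');
}

/* Type name of an object declared char: "char". */
const char *type_of_char_object(char c)
{
    return TYPE_NAME(c);
}

/* Nonzero when the compiler treats 'x' as an int. */
int char_constant_is_int(void)
{
    return strcmp(type_of_char_constant(), "int") == 0;
}
```

On a conforming C compiler char_constant_is_int() returns nonzero,
while a variable declared char still reports "char".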

Strangely, I can't tell what type character constants have by reading
K&RII.  The references lead off to a note saying something like "...
and all constants have some type, see section mumble about types".
Section mumble talks about types alright, but nowhere does anything I
can find state just what constant forms have what types.  I wonder if
I'm missing something, or if this is a little oversight?

--
You will not see a monster {at Loch Ness},
just as millions before you have not.
                              --- Charles Kuralt
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

charlie@vicorp.UUCP (Charlie Goldensher) (03/11/89)

In article <3711@xyzzy.UUCP>, throopw@agarn.dg.com (Wayne A. Throop) writes:
> > scm@datlog.co.uk ( Steve Mawer )
> >> diamond@diamond. (Norman Diamond)
> >> When you assign 'x' to a character, you are assigning an int to a
> >> character.  The reader knows that the type mismatch was intentional.
> > Not if he knows the C language.  A single character written within
> > single quotes is a *character constant*.  This isn't an int.
> 
> Ha.  A lot Steve knows.  From K&R, 1st ed, pg 185
> 
>     Character constants have type int; floating constants are double.
> 

The paragraph from which this is excerpted is as follows:

	A constant is a primary expression.  Its type may be int,
	long, or double depending on its form.  Character constants
	have type int; floating constants are type double.

On the previous page (page 184, 6.6 Arithmetic conversions) K&R say:

	First, any operands of type char or short are converted to
	type int, and any of type float are converted to double.

So what does it matter if a character constant is of type char or of
type int?  If it is of type char, it will be *converted* to type int
in any expression in which it is used.  And if it is of type int it
will be implicitly cast to type char if it is assigned to a variable of
type char.

I don't care how the compiler writer chooses to implement it internally
as long as they follow the appropriate conversion rules.

Is there some reason I *should* care?
-- 
charlie@vicorp.uu.NET	--	Charlie Goldensher

bill@twwells.uucp (T. William Wells) (03/11/89)

In article <1644@vicorp.UUCP> charlie@vicorp.UUCP (Charlie Goldensher) writes:
: So what does it matter if a character constant is of type char or of
: type int?  If it is of type char, it will be *converted* to type int
: in any expression in which it is used.  And if it is of type int it
: will be implicitly cast to type char if it is assigned to a variable of
: type char.
:
: I don't care how the compiler writer chooses to implement it internally
: as long as they follow the appropriate conversion rules.
:
: Is there some reason I *should* care?

There is a subtle difference. If 'c' is an integer constant, '\377'
represents the value 255. If, on the other hand, it is a char
constant, and characters sign extend, it represents -1.

---
Bill
{ uunet | novavax } !twwells!bill
(BTW, I'm going to be looking for a new job sometime in the next
few months.  If you know of a good one, do send me e-mail.)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/12/89)

In article <1644@vicorp.UUCP> charlie@vicorp.UUCP (Charlie Goldensher) writes:
>Is there some reason I *should* care?

Not unless you want your code to work right.

flaps@dgp.toronto.edu (Alan J Rosenthal) (03/14/89)

charlie@vicorp.UUCP (Charlie Goldensher) writes:
>On the previous page (page 184, 6.6 Arithmetic conversions) K&R say:
>
>	First, any operands of type char or short are converted to
>	type int, and any of type float are converted to double.
>
>So what does it matter if a character constant is of type char or of
>type int?  If it is of type char, it will be *converted* to type int
>in any expression in which it is used...

It affects sizeof 'a', which is sizeof(int), not sizeof(char).
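
A minimal sketch of that difference, with hypothetical function names;
it assumes nothing beyond a C compiler (in C++ the first result would
be 1, since character literals have type char there).

```c
#include <stddef.h>

/* A character constant has type int in C, so this is sizeof(int). */
size_t size_of_char_constant(void)
{
    return sizeof 'a';
}

/* An object declared char has size 1 by definition. */
size_t size_of_char_object(void)
{
    char c = 'a';
    return sizeof c;
}
```

The two results differ on any implementation where sizeof(int) is
greater than 1, which is the common case.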


bill@twwells.uucp (T. William Wells) writes:
>There is a subtle difference. If 'c' is an integer constant, '\377'
>represents the value 255. If, on the other hand, it is a char
>constant, and characters sign extend, it represents -1.

Nope, it is an int constant, and if chars are 8 bits and the system is two's
complement and chars sign-extend, it is -1.
Reference: K&R II p193, second-to-last sentence of the last full paragraph.

ajr

--
"The goto statement has been the focus of much of this controversy."
	    -- Aho & Ullman, Principles of Compiler Design, A-W 1977, page 54.

bill@twwells.uucp (T. William Wells) (03/15/89)

In article <8903140309.AA02400@champlain.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
: bill@twwells.uucp (T. William Wells) writes:
: >There is a subtle difference. If 'c' is an integer constant, '\377'
: >represents the value 255. If, on the other hand, it is a char
: >constant, and characters sign extend, it represents -1.
:
: Nope, it is an int constant, and if chars are 8 bits and the system is two's
: complement and chars sign-extend, it is -1.
: Reference: K&R II p193, second-to-last sentence of the last full paragraph.

Oops! I don't believe I wrote that!

There is a distinction somewhere along those lines, but it's not
relevant here. (See H&S, 2.7.3, p.21).

Sorry.

---
Bill
{ uunet | novavax } !twwells!bill

scm@datlog.co.uk ( Steve Mawer ) (03/15/89)

First let me start by apologising for the content of my misguided
contribution to the character constant type/size debate.  I had always
interpreted the meaning of character constant as 'constant of type char(acter)'
and have used expressions of the form 'x' whenever I've needed to define
characters.  In a lot of other code I've seen (not influenced by me!)
others appear to share my (incorrect) assumptions. 

Thanks to the responses I've received, I believe I now fully understand
the concept of character constants.  Where I've been bitten in the past
is when I've used them to define 8 bit characters (e.g. for PC code sets)
and have found that few compilers I've used believe e.g. (0200 == '\200').
I'd always thought this was due to the 8 bit value '\200' being widened
to an int for the comparison and having (optional) sign extension done.
However, many compilers didn't even believe (0200 == (unsigned)'\200').
I now realise this must be due to the \200 being the 8 bit value and the
inclusion of the surrounding quotes causing the widening, leaving the value
(optionally) sign extended at that time.
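
The failing comparison can be sketched as follows, assuming 8-bit
chars and two's complement; the function names are illustrative only.
Where plain char is signed, '\200' is already the int -128 before the
cast to unsigned is applied, so the cast cannot recover 0200.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The constant converted straight to unsigned int.  With signed
   8-bit chars the constant is -128, and converting -128 to unsigned
   gives UINT_MAX - 127, not 0200. */
unsigned widened_to_unsigned(void)
{
    return (unsigned)'\200';
}

/* Converting to unsigned char first reduces the value modulo 256,
   which yields 0200 whether or not char is signed. */
unsigned narrowed_first(void)
{
    return (unsigned char)'\200';
}
```

Under those assumptions narrowed_first() == 0200 holds everywhere,
while widened_to_unsigned() == 0200 holds only where plain char is
unsigned.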

That's cleared that up (or so I thought).  But then ..
in article <766@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
>
>There is a subtle difference. If 'c' is an integer constant, '\377'
>represents the value 255. If, on the other hand, it is a char
>constant, and characters sign extend, it represents -1.
>

As noted above, most compilers I've used believe that '\377' *is* -1; in
fact the only one I can recall that makes it 255 is that supplied with AIX.

So what is the 'correct' value for '\377', 255 or -1?
-- 
Steve C. Mawer        <scm@datlog.co.uk> or < {backbone}!ukc!datlog!scm >
                       Voice:  +44 1 863 0383 (x2153)

henry@utzoo.uucp (Henry Spencer) (03/19/89)

In article <1813@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
>So what is the 'correct' value for '\377', 255 or -1?

Precisely! :-)  The correct value is "255 or -1".  If char is signed in
the particular implementation, the value is -1 (assuming 8-bit chars);
if char is unsigned, the value is 255.  The X3J11 rule is essentially
that 'x' == (int)(char)'x' for all x, so it depends on whether conversion
of char to int sign-extends.  Code that depends on one particular choice
is non-portable.
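
The rule can be checked directly.  This is only a sketch with made-up
function names, and the expected value assumes 8-bit chars and two's
complement.

```c
#include <limits.h>

/* The rule as stated: the constant equals itself converted to char
   and back to int. */
int rule_holds(void)
{
    return '\377' == (int)(char)'\377';
}

/* What this implementation actually uses for '\377'. */
int actual_377(void)
{
    return '\377';
}

/* What the rule predicts: -1 if plain char is signed, else 255. */
int expected_377(void)
{
    return CHAR_MIN < 0 ? -1 : 255;
}
```

rule_holds() should be nonzero on any conforming implementation;
actual_377() is the part that varies from one to the next.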
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

guy@auspex.UUCP (Guy Harris) (03/21/89)

>>There is a subtle difference. If 'c' is an integer constant, '\377'
>>represents the value 255. If, on the other hand, it is a char
>>constant, and characters sign extend, it represents -1.

As pointed out in other articles, the statement you're quoting here from
the previous posting isn't quite correct....

>As noted above, most compilers I've used believe that '\377' *is* -1; in
>fact the only one I can recall that makes it 255 is that supplied with AIX.
>
>So what is the 'correct' value for '\377', 255 or -1?

It depends.  To quote from the December 7, 1988 dpANS:

	If an integer character constant contains a single character or
	escape sequence, its value is the one that results when an
	object of type "char" whose value is that of the single
	character or escape sequence is converted to type "int".

An example would be:

	char c = '\377';	/* "an object of type 'char' whose value
				   is that of the single character or
				   escape sequence" */

	printf("%d\n", c);	/* "is converted to type 'int'" in this
				   code, due to the usual argument
				   promotions */

If the implementation sign-extends "char"s, this should print -1 (we
assume two's complement arithmetic here; this may or may not make a
difference - I'm not about to dive that deeply into the dpANS right
now); if it does not sign-extend "char"s, this should print 255. 

The result it prints is the correct value for '\377' on that
implementation.  More than one sign-extending implementation exists;
more than one non-sign-extending exists.  UNIX implementations of both
flavors exist; many individual copies of both flavors exist.  (In other
words, AIX ain't an isolated instance, not even on e.g. the RT PC. 
Don't make your code depend on whether "char"s are signed or unsigned.)
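
One common way to follow that advice when handling 8-bit codes is to
convert each char to unsigned char before comparing.  The sketch below
uses a hypothetical helper, count_byte, and assumes 8-bit chars.

```c
#include <stddef.h>

/* Count the bytes in buf[0..n-1] equal to code.  Each element is
   converted to unsigned char before the comparison, so the result
   does not depend on whether plain char is signed. */
size_t count_byte(const char *buf, size_t n, unsigned char code)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if ((unsigned char)buf[i] == code)
            count++;
    }
    return count;
}
```

For example, count_byte("\200a\200", 3, 0200) is 2 on both
sign-extending and non-sign-extending implementations.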

usenet@xyzzy.UUCP (Usenet Administration) (03/27/89)

In article <1989Mar18.221754.27335@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
| In article <1813@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
| >So what is the 'correct' value for '\377', 255 or -1?
| 
| Precisely! :-)  The correct value is "255 or -1".  If char is signed in
| the particular implementation, the value is -1 (assuming 8-bit chars);
| if char is unsigned, the value is 255.  The X3J11 rule is essentially
| that 'x' == (int)(char)'x' for all x, so it depends on whether conversion
| of char to int sign-extends.  Code that depends on one particular choice
| is non-portable.
From: meissner@bert.dg.com (Michael Meissner)
Path: bert!meissner

Warning, nit picking ahead....

Henry is of course right for just about any machine that you are
likely to run into, but two additional values are also possible.  If
you have a machine that uses one's complement, signed 8-bit bytes, you
would get back -0 for '\377'.  On the other hand, if you have a
machine that uses signed magnitude, signed 8-bit bytes, you would get
back -127 for '\377'.
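
The three readings of an 8-bit pattern can be worked out in portable
arithmetic.  This is a sketch with illustrative function names; it
models the representations rather than relying on the host machine's
own, so "negative zero" comes out as an ordinary 0.

```c
/* Interpret the low 8 bits of `bits` as a signed 8-bit quantity
   under each of the three representations. */

int as_twos_complement(unsigned bits)
{
    bits &= 0xFFu;
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

int as_ones_complement(unsigned bits)
{
    bits &= 0xFFu;
    /* A negative value is the bitwise complement of its magnitude;
       all ones is negative zero, which compares equal to 0. */
    return (bits & 0x80u) ? -(int)(~bits & 0xFFu) : (int)bits;
}

int as_sign_magnitude(unsigned bits)
{
    bits &= 0xFFu;
    /* Top bit is the sign, the remaining seven bits the magnitude. */
    return (bits & 0x80u) ? -(int)(bits & 0x7Fu) : (int)bits;
}
```

For the pattern 0377 these give -1, 0 (that is, -0) and -127
respectively, matching the values above.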

The one's complement machines that I'm aware of are the Univac
mainframes, and the old Control Data Cybers (6xxx and 170 series).
The Univacs used 6, 8, and 9 bit bytes, depending on the compiler,
and user options.  In terms of C, I believe they use 9 bit bytes,
since 8 bit bytes don't fit evenly in a 36-bit word.  The CDC machines
used 6 bit bytes, with a 12 bit extended code.  I don't think CDC ever
produced a C compiler for these machines, and their new machines (the
180 series) use two's complement and 8 bit bytes in native mode,
though they do support the 170 series in some sort of emulation mode.

The only signed magnitude machine that I'm aware of is the Burroughs
mainframe (the A series and its predecessors).  Given that they use
EBCDIC, I would suspect that they don't sign extend characters.  Also,
I believe that they have not yet released a C compiler, though I could
be wrong......

Michael Meissner, Data General.
Uucp:		...!mcnc!rti!xyzzy!meissner		If compiles were much
Internet:	meissner@dg-rtp.DG.COM			faster, when would we
Old Internet:	meissner%dg-rtp.DG.COM@relay.cs.net	have time for netnews?