[net.unix-wizards] unsigned char -> unsigned int conversion

keesan@bbncca.ARPA (Morris Keesan) (06/12/84)

------------------------------

    haddock!lee says,

> Note that "char" is part of the sequence char-short-int-long, whereas
> "unsigned char" is part of the sequence uchar-ushort-uint-ulong, where the
> second sequence has the property "unsigned".  During coercions, unsigned
> chars are extended to unsigned whereas chars are extended to int (by zero
> extension or sign extension, whichever is appropriate).

    What are you using as a reference?  Clearly not Kernighan & Ritchie,
since unsigned chars weren't invented as of publication.  The only reference I
have to unsigned chars is "The C Programming Language" in the System V
"Programming Guide", and it doesn't allow unsigned short or unsigned long.
Is this a Berkeleyism?  What does the current draft ANSI standard say
about it?
    In upgrading our local C compiler to be System V compatible, I blithely
assumed that since our chars are unsigned, all I had to do was the syntactic
magic to make "unsigned char" equivalent to "char".  A careful reading of the
System V C manual leaves the question up in the air.  Their section on the
usual arithmetic conversions omits any mention of unsigned char (an obvious
oversight).  Would somebody with a Bell Labs System V C compiler try Lee's

    unsigned char foo=1; if( (-1|foo) > 0 ) printf("-1 is negative\n");

and report what it does?  Thanks.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12,ima}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA

guy@rlgvax.UUCP (06/16/84)

The C language does permit "unsigned short int" and "unsigned long int".
The fact that one can say "unsigned short x;" or "unsigned long x;" on many
compilers can be considered either a side-effect of the compilers being used,
or a consequence of the rules permitting "register x;" or even "x[30];" (give
a look at the "cb" source sometime).  If either K or R is reading this,
could they give an authoritative answer to the question of whether
"unsigned short x;" should be considered legal or not?

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

gwyn@BRL-VLD.ARPA (06/17/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA>

Unsigned integer types are widened by adding zeros for the most significant
bits; i.e. they do not sign-extend.  So coercing an unsigned char to an int
should not make it negative.

I have no idea if any of this is in the manual.  Presumably the issue has
been dealt with in the ANSI committee.  They are making noises about an
actual standard as soon as the end of 1985.  (Hope, hope.)
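
A minimal sketch of the widening rule described above, written for a
present-day C compiler rather than any of the 1984 compilers in this thread.
The function names widen_unsigned and widen_signed are made up for
illustration, and the sketch assumes int can represent every unsigned char
value (UCHAR_MAX <= INT_MAX).

```c
#include <limits.h>

/* Hypothetical helper: widen an unsigned char to int.
 * The value is zero-extended, so the result is never negative
 * (assuming UCHAR_MAX <= INT_MAX, as noted above). */
int widen_unsigned(unsigned char c)
{
    return c;
}

/* Hypothetical helper: widen a signed char to int.
 * The value is sign-extended, so negative values stay negative. */
int widen_signed(signed char c)
{
    return c;
}
```

Under these assumptions widen_unsigned(0xFF) yields 255, while
widen_signed(-1) yields -1.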

chris@umcp-cs.UUCP (06/18/84)

While testing the assertion that ``unsigned char'' types propagate
the ``unsigned''ness into expressions, I discovered something very
interesting in the 4.1BSD (PCC) C compiler.  The compiler emits
``.long''s in the assembly output!  Here's a sample:

	unsigned char foo = 1;

	main() {
		if ((-1 | foo) > 0)
			printf("hi there\n");
	}

PCC output (slightly edited):

		.data
		.globl	_foo
	_foo:
		.long	0x1
		.data	1
	L19:
		.ascii	"hi there\12\0"
		.text

		.align	1
		.globl	_main
	_main:
		.word	L13
		movzbl	_foo,r0
		bisl2	$-1,r0
		jeql	L17
		pushl	$L19
		calls	$1,_printf
	L17:
		ret
		.set	L13,0x0
		.data

Further experimentation shows that all ``static'' variables, and all
initialized variables, are generated with a minimum size of 4 bytes.
This apparently includes arrays as well (but ``char c[8]'' doesn't
generate 32 bytes).

I guess PCC is trying to keep things longword-aligned.

As to the original point, we can see from the ``jeql'' (where a signed
comparison would need ``jleq'') that PCC *does* propagate the unsigned
attribute.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
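
The two readings of the test expression can be written out with explicit
casts, so that the result does not depend on which promotion rule the
compiler itself follows.  This is only a sketch: the names
unsigned_preserving and value_preserving are labels invented here, and it
assumes a two's-complement machine on which int is wider than char.

```c
/* Reading 1 ("unsigned preserving", what the PCC output above does):
 * the unsigned char becomes unsigned int, -1 is converted to
 * unsigned as well, and the comparison is unsigned.  The OR result
 * has every bit set, so it is nonzero and the test succeeds. */
int unsigned_preserving(unsigned char foo)
{
    return ((unsigned int)-1 | (unsigned int)foo) > 0;
}

/* Reading 2 ("value preserving"): the unsigned char becomes a plain
 * int because int can hold all of its values.  -1 | foo is then the
 * signed value -1, and the test fails. */
int value_preserving(unsigned char foo)
{
    return (-1 | (int)foo) > 0;
}
```

With foo equal to 1, the first function returns 1 and the second returns 0.
A compiler following the later ANSI/ISO standard uses the second reading for
the bare expression (-1|foo) > 0 wherever int can represent every unsigned
char value.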

henry@utzoo.UUCP (Henry Spencer) (06/20/84)

The C Reference Manual that's part of the K&R book is a little vague
on the subject of what types "unsigned" can be applied to.  In the
detailed discussion it says that "unsigned int" is the only legitimate
form, but elsewhere it implies the existence of "unsigned long" in
some implementations.

Note that even the V7 compiler accepts "unsigned short".

I believe the draft ANSI C standard says that "unsigned" can be applied
to "char", "short", "int", and "long".
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

keesan@bbncca.ARPA (Morris Keesan) (06/20/84)

------------------------------
> The C language does permit "unsigned short int" and "unsigned long int".
> The fact that one can say "unsigned short x;" or "unsigned long x;" on many
> compilers can be considered either a side-effect of the compilers being used,
>   . . .
> 	  Guy Harris

    WHICH C language?  The one defined in Kernighan and Ritchie doesn't; the
one defined in the System V Release 1 "Programming Guide" doesn't.  The only
C language DEFINITION I know of which allows unsigned shorts and longs is the
System V Release 2 C Language document.  The following excerpt posted recently
by Jim Balter of INTERACTIVE Systems (Thank you, Jim) clearly allows
"unsigned short int x", "unsigned long int x", "unsigned short x",
and "unsigned long x".

    At most one of the words long or short may be specified in conjunction
    with int; the meaning is the same as if int were not mentioned.  The
    word long may be specified in conjunction with float; the meaning is
    the same as double.  The word unsigned may be specified alone, or in
    conjunction with int or any of its short or long varieties, or with
    char. 

I have no idea what the Berkeley compilers allow.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12,ima}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA
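
The spellings permitted by the excerpt above can be sketched as follows.
The variable names and the same_sizes function are invented for
illustration, and the code is meant for a compiler that accepts all of
these forms, which not every compiler discussed in this thread did.

```c
#include <limits.h>

/* Each declaration below uses a spelling allowed by the quoted rule. */
unsigned short int us_int  = USHRT_MAX; /* unsigned with "short int"    */
unsigned short     us_bare = USHRT_MAX; /* "int" omitted, same meaning  */
unsigned long int  ul_int  = ULONG_MAX; /* unsigned with "long int"     */
unsigned long      ul_bare = ULONG_MAX; /* "int" omitted, same meaning  */
unsigned           u_bare  = UINT_MAX;  /* unsigned alone: unsigned int */
unsigned char      uc      = UCHAR_MAX; /* unsigned with char           */

/* Hypothetical check: returns 1 if the paired spellings have the
 * same size, as expected when they name the same type. */
int same_sizes(void)
{
    return sizeof us_int == sizeof us_bare
        && sizeof ul_int == sizeof ul_bare
        && sizeof u_bare == sizeof(unsigned int);
}
```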

keesan@bbncca.ARPA (Morris Keesan) (06/20/84)

-----------------------------
> The C Reference Manual that's part of the K&R book is a little vague
> on the subject of what types "unsigned" can be applied to.  In the
> detailed discussion it says that "unsigned int" is the only legitimate
> form, but elsewhere it implies the existence of "unsigned long" in
> some implementations.
> 
> Note that even the V7 compiler accepts "unsigned short".
>     . . .
> 				  Henry Spencer @ U of Toronto Zoology

    I'm well familiar with the "detailed discussion" (section 8.2), since I've
been citing it a lot recently, but where "elsewhere" does it "imply" unsigned
long?

    I checked BBN's C compiler for our C/70 machine, which is based on the
V7 PDP-11 C compiler, and found that it does indeed accept "unsigned short".
Curious, I checked to see why and discovered that it's because "short" is
defined as lexically equivalent to "int".  This results in the amusing
behaviour of the compiler accepting "unsigned short" (equivalent to "unsigned"
or "unsigned int"), but rejecting "short int" (because it looks like "int int").
I'd be interested in knowing whether this happens in real V7 compilers, or only
in BBN's.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12,ima}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA
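
The effect described above can be modelled with a toy rewriter.  This is
not the compiler's actual code, which is not shown in this thread; the
functions short_to_int and has_doubled_int are hypothetical and only mimic
the reported behaviour on simple declarations.

```c
#include <string.h>

/* Hypothetical model: copy a declaration, replacing each whole word
 * "short" with "int".  Words are runs of lowercase letters, which is
 * enough for the examples here; everything else is copied as is.
 * The caller supplies an output buffer of outsize bytes. */
void short_to_int(const char *decl, char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0)
        return;
    while (*decl != '\0' && o + 1 < outsize) {
        size_t len = strspn(decl, "abcdefghijklmnopqrstuvwxyz");
        const char *word = decl;
        size_t wlen = len;

        if (len == 0) {                 /* not a letter: copy one char */
            out[o++] = *decl++;
            continue;
        }
        if (len == 5 && strncmp(decl, "short", 5) == 0) {
            word = "int";               /* treat "short" as "int" */
            wlen = 3;
        }
        if (o + wlen >= outsize)        /* would overflow: stop early */
            break;
        memcpy(out + o, word, wlen);
        o += wlen;
        decl += len;
    }
    out[o] = '\0';
}

/* Returns 1 if the rewritten text contains the doubled "int int"
 * that would be rejected.  This is a naive substring check. */
int has_doubled_int(const char *text)
{
    return strstr(text, "int int") != NULL;
}
```

In this model "unsigned short x;" becomes "unsigned int x;" and is accepted,
while "short int x;" becomes "int int x;" and is flagged.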

henry@utzoo.UUCP (Henry Spencer) (06/21/84)

Morris Keesan asks, in discussion of the C Reference Manual:

   > ..........elsewhere it implies the existence of "unsigned long" in
   > some implementations.
   
   ............. but where "elsewhere" does it "imply" unsigned
   long?

Section 4 of the K&R CRM notes:

	...
	(On the PDP-11, unsigned long quantities are not supported.)
	...

Also, the problem he reports (V7 cc bounces "short int" because it's just
translating "short" to "int" everywhere) is present in the vanilla V7 cc
as well as the BBN one.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

gwyn@brl-vgr.ARPA (Doug Gwyn) (07/02/84)

Like most such bugs, the "short int" problem was fixed long ago.
I just tried it on our (UNIX System V) Ritchie PDP-11 C compiler.