[net.unix-wizards] Why chars unsigned on 3B?

kendall@wjh12.UUCP (Sam Kendall) (06/04/84)

Is it true what I have heard, there are instructions on the 3B to
implement signed chars easily?  If so, why are chars unsigned in C?

	Sam Kendall	{allegra,ihnp4,ima,amd70}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall

rcd@opus.UUCP (Dick Dunn) (06/06/84)

>Is it true what I have heard, there are instructions on the 3B to
>implement signed chars easily?  If so, why are chars unsigned in C?

Because someone is trying to drive home the fact that you shouldn't program
in a way that depends on whether chars are signed! :-)

Seriously, why not have chars unsigned by definition on this machine?
There's one particular problem - we've gotten a mess somewhere in the long
path from concept to implementation:  Char not only means a character; it
means an 8-bit quantity.  (I know - where does it say that?  I'm talking
practice here.)  So in a sense, when you mean "signed 8-bit integer" you
say "char" on most machines.  If you mean "unsigned 8-bit integer" you say
"unsigned char".  But how do you say "signed 8-bit integer" on the 3B?  You
get frustrated, kick the terminal, and end up in a cast.  I wish that
tradition hadn't gotten us into this corner...
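
The cast Dunn describes usually amounts to sign-extending the byte by hand.
This is a minimal sketch, assuming 8-bit chars and two's-complement
arithmetic. The helper name sext8 is hypothetical and does not come from
any library.

```c
/* Hypothetical helper: read the low 8 bits of b as a two's-complement
 * signed value, whether plain char is signed or unsigned here.
 * Assumes 8-bit chars (CHAR_BIT == 8). */
int sext8(char b)
{
    int v = (unsigned char)b;   /* zero-extend first: 0..255 */
    if (v & 0x80)               /* high bit set: the value is negative */
        v -= 0x100;             /* map 128..255 onto -128..-1 */
    return v;
}
```

On a machine with signed chars the explicit test is redundant. The same
source still gives the same answer on the 3B, and that is the point.
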
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
	...Never offend with style when you can offend with substance.

henry@utzoo.UUCP (Henry Spencer) (06/09/84)

Sam Kendall asks:

   Is it true what I have heard, there are instructions on the 3B to
   implement signed chars easily?  If so, why are chars unsigned in C?

It also, almost certainly, has instructions to implement unsigned
chars easily.  It is fashionable nowadays to provide both.  Thus the
C implementor has to choose:  does he do the clean thing and make the
chars unsigned, or does he opt for maximum compatibility at the cost
of perpetuating a dreadful botch that was an accidental side effect
of the PDP11 design?  Clearly the compiler implementor for the 3B did
the right thing and made chars unsigned.  Please note that sections 4
and 6.1 of the C Reference Manual say, in so many words:

	Objects declared as characters (char) are large enough to
	store any member of the implementation's character set, and
	if a genuine character from that character set is stored in
	a character variable, its value is equivalent to the integer
	code for that character.  Other quantities may be stored into
	character variables, but the implementation is machine-dependent.

	...

	Whether or not sign-extension occurs for characters is machine-
	dependent, [although] it is guaranteed that a member of the
	standard character set is non-negative.
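
The machine-dependence the manual describes can be probed directly.
This is a minimal sketch, and both function names are hypothetical.
Converting -1 to char is well defined on either kind of machine. It stays
-1 where char is signed and becomes the largest char value where char is
unsigned.

```c
/* Hypothetical helper: returns 1 if plain char sign-extends when
 * promoted to int, 0 if it zero-extends. */
int char_is_signed(void)
{
    char c = (char)-1;   /* -1 if char is signed, UCHAR_MAX if not */
    return c < 0;        /* promotion to int preserves that value */
}

/* Per the manual, members of the standard character set compare as
 * non-negative whichever answer char_is_signed() gives. */
int standard_chars_nonnegative(void)
{
    const char *s = "abcXYZ019+-*/";
    for (; *s != '\0'; s++)
        if (*s < 0)
            return 0;
    return 1;
}
```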

Dennis Ritchie has been heard to say [forgive me, Dennis, if I'm
remembering this wrong] that the signedness of char on the 11 was
definitely a mistake, although hard to avoid given the way the 11
does byte moves.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

kendall@wjh12.UUCP (Sam Kendall) (06/10/84)

> Why do you care whether chars default to unsigned or signed?

I have gotten this question from several intelligent people; I guess the
answer isn't as obvious as I thought it was.  There are two main reasons.

(1) I want chars to be signed because making them so provides 8-bit
    signed integers, not otherwise available through C on the 3B.  Some
    object that this type should not exist because it cannot be used
    portably.  But this type can be used portably through parameterization
    of code: 

	#if CHAR_SIGNED
	typedef char SMALLINT;
	#else
	typedef short SMALLINT;
	#endif

    or the like.

(2) Most UNIX machines have signed characters.  Porting is easier to a
    machine with signed characters also.  Sure, I am careful to avoid
    writing programs which depend on signed characters, other than as
    illustrated above; but I have to port other people's programs
    sometimes.  Idealism cannot escape that fact.
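
Kendall's CHAR_SIGNED parameter can also be derived rather than edited per
machine, by testing CHAR_MIN from <limits.h>. That header postdates this
thread, because it arrived with ANSI C in 1989. This is therefore a sketch
that assumes such a compiler. The table step is a hypothetical use of
SMALLINT.

```c
#include <limits.h>

/* Derive CHAR_SIGNED unless the build already supplies it:
 * CHAR_MIN is 0 when plain char is unsigned, negative when signed. */
#ifndef CHAR_SIGNED
#if CHAR_MIN < 0
#define CHAR_SIGNED 1
#else
#define CHAR_SIGNED 0
#endif
#endif

#if CHAR_SIGNED
typedef char SMALLINT;   /* plain char already holds small signed values */
#else
typedef short SMALLINT;  /* fall back to a wider signed type */
#endif

/* Hypothetical example: small signed offsets, stored one byte per
 * entry where the machine allows it. */
static const SMALLINT step[] = { -1, 0, 1, -100, 100 };
```

With an ANSI compiler the test is arguably unnecessary. That standard
added the signed keyword, so typedef signed char SMALLINT; names a small
signed type on the 3B and the VAX alike.
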

	Sam Kendall	{allegra,ihnp4,ima,amd70}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall

trt@rti-sel.UUCP (06/11/84)

Some compilers make chars signed, others make them unsigned.
I wonder if any choose randomly (:-)).
I suppose there are some interesting language questions involved here
but the major effect is to slow down program porting.

A "solution" to the signed/unsigned char problem:
One of the niftier features of the Gould UTX port
is that the compiler can generate code to emulate either signed
or unsigned chars.  That is, "cc foo.c" generates
code for signed chars (VAX-like) whereas "cc -XU foo.c"
generates unsigned code (Amdahl-like).

The default code is VAX-like for the obvious reason.
Programs that were written on, say, a 68K should probably be compiled '-XU'.
Hack though it be, it sure does save time.

By the way, our Gould Concept needs two instructions to sign-extend a byte.
However, in usual cases (e.g. while(*p++ = *q++);) the sign-extension
is not required and is not generated.
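
Truscott's point can be seen by contrasting a byte copy with a comparison.
This is a sketch that uses hypothetical function names and assumes 8-bit
chars. The copy only moves bytes and tests them against zero, which no
extension can change. Comparing a char with 0x80 forces the promotion to
int, so the result depends on sign extension.

```c
/* Byte-for-byte copy: each char goes straight from *q to *p, and the
 * loop test only asks whether it is zero, which gives the same answer
 * whether or not the byte is sign-extended. */
void copy_str(char *p, const char *q)
{
    while ((*p++ = *q++) != '\0')
        ;
}

/* Non-portable: where chars are signed, (char)0xE9 typically becomes
 * -23 and this returns 0; where they are unsigned it is 233 and this
 * returns 1. */
int is_high_byte_naive(char c)
{
    return c >= 0x80;
}

/* Portable: converting to unsigned char first gives 0..255 everywhere. */
int is_high_byte(char c)
{
    return (unsigned char)c >= 0x80;
}
```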
	Tom Truscott