[net.lang.c] Significant deficiency in C

00R0DHESI%bsu.csnet@CSNET-RELAY.ARPA (Rahul Dhesi) (09/27/86)

That C promotes all values of type char to int before using them in an
expression may on the face of it seem to simplify expression evaluation and the
semantics of the language, but it also makes the implementation grossly
inefficient in some cases.  Experimenting with Lempel-Ziv compression that also
involved calculating a CRC, I found that code compiled by Microsoft C was as
much as 10 times slower than hand-crafted assembly code (this on an 8088 CPU)
even when macros were liberally used instead of functions. 

Inspecting critical loops in the generated code, I was shocked to find that the
compiler was doing what it could to make things fast, but it was hindered by
the requirement that char types be promoted to int before being used in an
expression.  The 8088 CPU has 16-bit registers that are made up of pairs of
8-bit registers and each 8-bit register can be separately used for 8-bit
operations, making character-oriented code extremely efficient.  But the
compiler was forced to treat such code as int-oriented code (ints being 16 bits
in this implementation).  Hand-crafted assembly code, where the programmer
knows whether or not 16 bits will be ultimately needed, isn't similarly
restricted. 
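(For illustration, the sort of critical loop in question might look like the
sketch below -- a hypothetical byte-wise XOR checksum, not Dhesi's actual CRC
code.  Every operation fits in 8 bits, yet the language rules require each
char operand to be promoted to int first.)

```c
#include <stddef.h>

/* Hypothetical byte-wise checksum (illustration only, not the
 * original CRC).  Every operation fits in 8 bits, but C requires
 * each char operand to be promoted to int before the XOR, so a
 * naive compiler emits 16-bit operations throughout the loop. */
unsigned char checksum(const unsigned char *buf, size_t len)
{
    unsigned char sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum ^= buf[i];   /* sum and buf[i] are promoted to int here */
    return sum;          /* the result is truncated back to 8 bits */
}
```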

I think this is a significant deficiency in C.  Is ANSI doing anything to
eliminate this or decrease its effect?  Considering that the vast majority of
UNIX tools do text processing in 7-bit units, one would think that somebody
would have done something by now.  Is a potential doubling or tripling of 
text processing speed not significant enough to be worth redefining a 
programming language for?

Perhaps the problem is that UNIX and C have traditionally been implemented only
on machines (e.g. PDP-11, VAX-11/7xx) in which using an 8-bit register always
automatically tied up a 16-bit or larger register, and nobody ever imagined
that we would be running C programs on cheap microprocessors with 8-bit
registers. 

Rahul Dhesi
dhesi%bsu@csnet-relay.ARPA
...!seismo!csnet-relay.ARPA!bsu!dhesi

ark@alice.UucP (Andrew Koenig) (09/30/86)

> Inspecting critical loops in the generated code, I was shocked to find that the
> compiler was doing what it could to make things fast, but it was hindered by
> the requirement that char types be promoted to int before being used in an
> expression.  The 8088 CPU has 16-bit registers that are made up of pairs of
> 8-bit registers and each 8-bit register can be separately used for 8-bit
> operations, making character-oriented code extremely efficient.  But the
> compiler was forced to treat such code as int-oriented code (ints being 16 bits
> in this implementation).  Hand-crafted assembly code, where the programmer
> knows whether or not 16 bits will be ultimately needed, isn't similarly
> restricted. 

One would think that for many char operations it is possible for
the compiler to figure out that doing it in 8 bits and expanding to
16 at the end only if necessary will give the same result as expanding
to 16 bits immediately.

Can you give us some examples of code that you think would be
impossible to get right without changing the rules?
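(One easy case, as an illustration -- my sketch, not code from the thread:
when the sum is stored straight back into a char, the truncation makes the
intermediate width invisible, so 8-bit arithmetic is provably equivalent.)

```c
/* Sketch: because the result is truncated back to 8 bits, adding in
 * 8-bit registers gives exactly the same bits as promoting to int,
 * adding in 16 bits, and truncating -- so a compiler may legally use
 * 8-bit arithmetic here without changing any observable result. */
unsigned char add_chars(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);   /* a + b is an int; the cast truncates */
}
```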

mark@dmcnh.UUCP (Mark Roddy) (09/30/86)

[ re char promotion very inefficient for target CPUs with 8 bit registers ]

Very true.  Good C compilers for 8-bit processors come with non-standard
options that let you turn off promotion.  The increase in speed and the
decrease in code size are dramatic.

mwm@eris.berkeley.edu (Mike Meyer) (10/02/86)

In article <6129@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>One would think that for many char operations it is possible for
>the compiler to figure out that doing it in 8 bits and expanding to
>16 at the end only if necessary will give the same result as expanding
>to 16 bits immediately.
>
>Can you give us some examples of code that you think would be
>impossible to get right without changing the rules?

Yes. How about (from a bug report on Lattice C in net.micro.amiga):

	char	x = 0XFF, y = 2, z ;

	z = (x + y) / 2 ;

If you do this in character arithmetic, you get 0 (1/2). If you do it
as something longer, you get 128.

Of course, whether or not it is "right" is open to question.

	<mike
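(Made runnable, the example above shows the competing readings side by side.
The explicit signed/unsigned types are my addition -- whether plain char
behaves like the signed or the unsigned version is implementation-defined.)

```c
/* Mike's example, spelled out with explicit types (illustration). */

int as_eight_bit(void)            /* unsigned 8-bit wraparound */
{
    unsigned char x = 0xFF, y = 2;
    return (unsigned char)(x + y) / 2;   /* 0xFF + 2 wraps to 1; 1/2 == 0 */
}

int as_promoted_int(void)         /* the promotion C actually requires */
{
    unsigned char x = 0xFF, y = 2;
    return (x + y) / 2;                  /* 255 + 2 == 257; 257/2 == 128 */
}

int as_signed_char(void)          /* the usual signed-char reading of 0xFF */
{
    signed char x = -1, y = 2;
    return (x + y) / 2;                  /* (-1 + 2) / 2 == 0 */
}
```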

karl@haddock (10/02/86)

00R0DHESI%bsu.cs@CSNET-RELAY.ARPA (Rahul Dhesi) writes:
>I think [char to int promotion] is a significant deficiency in C.  Is ANSI
>doing anything to eliminate this or decrease its effect?

I doubt it.  Too many programs depend on it.

>Considering that [most] UNIX tools do text processing in 7-bit units, ...

But if they use getchar(), they have to declare the variable "int" anyway.

However, a compiler is free to use 8-bit arithmetic if it can determine that
the result will be the same (e.g. char2 = char0 + char1); I think more work
needs to be done in this area.  (Of course, there will always be cases where
it lacks the information to perform the optimization.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
P.S. I just tried c2=c0+c1 on the vax; it did cvtbl, cvtbl, addl2, cvtbl, and
cvtlb.  This sort of slop encourages programmers to declare everything "int".

henry@utzoo.UUCP (Henry Spencer) (10/04/86)

> That C promotes all values of type char to int before using them in an
> expression ... [is a performance disaster on an 8-bit machine] ...
> I think this is a significant deficiency in C.  Is ANSI doing anything to
> eliminate this or decrease its effect? ...  Is a potential doubling or
> tripling of text processing speed not significant enough to be worth
> redefining a  programming language for?

Only when it's really necessary to redefine the language to solve the
problem.  The X3J11 drafts state clearly that the semantics as given in
the draft define the required effects, and the compiler is free to get
those effects by whatever means it chooses.  In particular, if it can
determine that the result will only be 8 bits and that no overflow hassles
intervene, it is entirely free to use 8 bits, essentially optimizing out
the promotion.  This is in fact an example the drafts use.

> Perhaps the problem is that UNIX and C have traditionally been implemented
> only on machines ...in which using an 8-bit register always automatically
> tied up a 16-bit or larger register...

This is quite true, but as it happens it's not that hard for a compiler for
a small machine to optimize the code to use 8 bits most of the time.  The
only unoptimizable case is in function calls, and that is less of a
hassle.  It is, of course, quite possible to write a stupid compiler that
doesn't try to do this optimization.  In fact, there are quite a few of
them, particularly in the 8088 world, which is such a huge and uncritical
market that any get-rich-quick artist who puts an attractive package around
shoddy software can count on making a profit.  (There *is* some pretty good
8088 software, but Sturgeon's Law -- "90% of everything is crap" -- very
definitely applies.)

In short, you are being victimized by a stupid compiler, not a stupid
language.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

garys@bunker.UUCP (Gary M. Samuelson) (10/06/86)

In article <1372@jade.BERKELEY.EDU> mwm@eris.UUCP (Mike Meyer) writes:
>In article <6129@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>>One would think that for many char operations it is possible for
>>the compiler to figure out that doing it in 8 bits and expanding to
>>16 at the end only if necessary will give the same result as expanding
>>to 16 bits immediately.
>>
>>Can you give us some examples of code that you think would be
>>impossible to get right without changing the rules?
>
>Yes. How about (from a bug report on Lattice C in net.micro.amiga):
>
>	char	x = 0XFF, y = 2, z ;
>
>	z = (x + y) / 2 ;
>
>If you do this in character arithmetic, you get 0 (1/2). If you do it
>as something longer, you get 128.

Unless, of course, char is considered signed, in which case the result
will still be 0.

(In any event, I am not sure that a 'bug report' is a valid source for
the kind of example Mr. Koenig is looking for.  Fix the bug, and then
it may be possible to get it right without changing the rules.)

Gary Samuelson

guy@sun.uucp (Guy Harris) (10/06/86)

> (In any event, I am not sure that a 'bug report' is a valid source for
> the kind of example Mr. Koenig is looking for.  Fix the bug, and then
> it may be possible to get it right without changing the rules.)

The fact that the string "bug report" appeared there is irrelevant; it
connotes nothing.  This is not a problem with a particular C implementation.
The bug report merely indicates that the problem exists if a C
implementation doesn't follow the rules in question.  Presumably, they were
trying to be "clever" and evaluate the expression in question in 8-bit
arithmetic.

Unfortunately, this *is* one of the cases where you can't do it using 8-bit
arithmetic without changing the rules; in this case, 8-bit arithmetic is
inequivalent to 16-bit arithmetic, and if "int"s are 16 bits (as they
presumably are in this case), the C language requires that the results be
those produced by 16-bit arithmetic.

You can fix the bug by changing the code to use 16-bit arithmetic, in which
case you "get it right" in the sense of giving the right answer but not in
the sense of using 8-bit arithmetic.  You can also "fix" the bug by changing
the rules.  But you can't get it right just by fixing the bug without changing
the rules; the problem with the rules is not caused by the presence of a bug!
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)