dhesi%bsu.csnet@CSNET-RELAY.ARPA (Rahul Dhesi) (09/27/86)
That C promotes all values of type char to int before using them in an
expression may on the face of it seem to simplify expression evaluation and
the semantics of the language, but it also makes the implementation grossly
inefficient in some cases. Experimenting with Lempel-Ziv compression that
also involved calculating a CRC, I found that code compiled by Microsoft C
was as much as 10 times slower than hand-crafted assembly code (this on an
8088 CPU), even when macros were liberally used instead of functions.

Inspecting critical loops in the generated code, I was shocked to find that
the compiler was doing what it could to make things fast, but it was
hindered by the requirement that char types be promoted to int before being
used in an expression. The 8088 CPU has 16-bit registers that are made up
of pairs of 8-bit registers, and each 8-bit register can be separately used
for 8-bit operations, making character-oriented code extremely efficient.
But the compiler was forced to treat such code as int-oriented code (ints
being 16 bits in this implementation). Hand-crafted assembly code, where
the programmer knows whether or not 16 bits will ultimately be needed,
isn't similarly restricted.

I think this is a significant deficiency in C. Is ANSI doing anything to
eliminate this or decrease its effect? Considering that the vast majority
of UNIX tools do text processing in 7-bit units, one would think that
somebody would have done something by now. Is a potential doubling or
tripling of text processing speed not significant enough to be worth
redefining a programming language for?

Perhaps the problem is that UNIX and C have traditionally been implemented
only on machines (e.g. PDP-11, VAX-11/7xx) in which using an 8-bit register
always automatically tied up a 16-bit or larger register, and nobody ever
imagined that we would be running C programs on cheap microprocessors with
8-bit registers.

Rahul Dhesi
dhesi%bsu@csnet-relay.ARPA
...!seismo!csnet-relay.ARPA!bsu!dhesi
ark@alice.UucP (Andrew Koenig) (09/30/86)
> Inspecting critical loops in the generated code, I was shocked to find
> that the compiler was doing what it could to make things fast, but it
> was hindered by the requirement that char types be promoted to int
> before being used in an expression. The 8088 CPU has 16-bit registers
> that are made up of pairs of 8-bit registers and each 8-bit register can
> be separately used for 8-bit operations, making character-oriented code
> extremely efficient. But the compiler was forced to treat such code as
> int-oriented code (ints being 16 bits in this implementation).
> Hand-crafted assembly code, where the programmer knows whether or not
> 16 bits will be ultimately needed, isn't similarly restricted.

One would think that for many char operations it is possible for the
compiler to figure out that doing it in 8 bits and expanding to 16 at the
end only if necessary will give the same result as expanding to 16 bits
immediately.

Can you give us some examples of code that you think would be impossible
to get right without changing the rules?
mark@dmcnh.UUCP (Mark Roddy) (09/30/86)
[ re char promotion very inefficient for target CPUs with 8-bit registers ]

Very true. Good C compilers for 8-bit processors come with non-standard
options that allow you to turn off promotion. The increase in speed and
decrease in code size are dramatic.
mwm@eris.berkeley.edu (Mike Meyer) (10/02/86)
In article <6129@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>One would think that for many char operations it is possible for
>the compiler to figure out that doing it in 8 bits and expanding to
>16 at the end only if necessary will give the same result as expanding
>to 16 bits immediately.
>
>Can you give us some examples of code that you think would be
>impossible to get right without changing the rules?

Yes. How about this (from a bug report on Lattice C in net.micro.amiga):

	char x = 0XFF, y = 2, z ;

	z = (x + y) / 2 ;

If you do this in character arithmetic, you get 0 (1/2, truncated). If you
do it as something longer, you get 128. Of course, whether or not it is
"right" is open to question.

<mike
karl@haddock (10/02/86)
dhesi%bsu.csnet@CSNET-RELAY.ARPA (Rahul Dhesi) writes:
>I think [char to int promotion] is a significant deficiency in C. Is ANSI
>doing anything to eliminate this or decrease its effect?

I doubt it. Too many programs depend on it.

>Considering that [most] UNIX tools do text processing in 7-bit units, ...

But if they use getchar(), they have to declare the variable "int" anyway.
However, a compiler is free to use 8-bit arithmetic if it can determine
that the result will be the same (e.g. char2 = char0 + char1); I think
more work needs to be done in this area. (Of course, there will always be
cases where it lacks the information to perform the optimization.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

P.S. I just tried c2=c0+c1 on the vax; it did cvtbl, cvtbl, addl2, cvtbl,
and cvtlb. This sort of slop encourages programmers to declare everything
"int".
henry@utzoo.UUCP (Henry Spencer) (10/04/86)
> That C promotes all values of type char to int before using them in an
> expression ... [is a performance disaster on an 8-bit machine] ...
> I think this is a significant deficiency in C. Is ANSI doing anything to
> eliminate this or decrease its effect? ... Is a potential doubling or
> tripling of text processing speed not significant enough to be worth
> redefining a programming language for?

Only when it's really necessary to redefine the language to solve the
problem. The X3J11 drafts state clearly that the semantics as given in the
draft define the required effects, and the compiler is free to get those
effects by whatever means it chooses. In particular, if it can determine
that the result will only be 8 bits and that no overflow hassles
intervene, it is entirely free to use 8 bits, essentially optimizing out
the promotion. This is in fact an example the drafts use.

> Perhaps the problem is that UNIX and C have traditionally been
> implemented only on machines ... in which using an 8-bit register always
> automatically tied up a 16-bit or larger register ...

This is quite true, but as it happens it's not that hard for a compiler
for a small machine to optimize the code to use 8 bits most of the time.
The only unoptimizable case is in function calls, and that is less of a
hassle.

It is, of course, quite possible to write a stupid compiler that doesn't
try to do this optimization. In fact, there are quite a few of them.
Particularly in the 8088 world, which is such a huge and uncritical market
that any get-rich-quick artist who puts an attractive package around
shoddy software can count on making a profit. (There *is* some pretty
good 8088 software, but Sturgeon's Law -- "95% of everything is crap" --
very definitely applies.)

In short, you are being victimized by a stupid compiler, not a stupid
language.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
garys@bunker.UUCP (Gary M. Samuelson) (10/06/86)
In article <1372@jade.BERKELEY.EDU> mwm@eris.UUCP (Mike Meyer) writes:
>In article <6129@alice.uUCp> ark@alice.UucP (Andrew Koenig) writes:
>>One would think that for many char operations it is possible for
>>the compiler to figure out that doing it in 8 bits and expanding to
>>16 at the end only if necessary will give the same result as expanding
>>to 16 bits immediately.
>>
>>Can you give us some examples of code that you think would be
>>impossible to get right without changing the rules?
>
>Yes. How about (from a bug report on Lattice C in net.micro.amiga):
>
>	char x = 0XFF, y = 2, z ;
>
>	z = (x + y) / 2 ;
>
>If you do this in character arithmetic, you get 0 (1/2). If you do it
>as something longer, you get 128.

Unless, of course, char is considered signed, in which case the result
will still be 0. (In any event, I am not sure that a 'bug report' is a
valid source for the kind of example Mr. Koenig is looking for. Fix the
bug, and then it may be possible to get it right without changing the
rules.)

Gary Samuelson
guy@sun.uucp (Guy Harris) (10/06/86)
> (In any event, I am not sure that a 'bug report' is a valid source for
> the kind of example Mr. Koenig is looking for. Fix the bug, and then
> it may be possible to get it right without changing the rules.)

The fact that the string "bug report" appeared there is irrelevant; it
connotes nothing. This is not a problem with a particular C
implementation. The bug report merely indicates that the problem exists
if a C implementation doesn't follow the rules in question. Presumably,
they were trying to be "clever" and evaluate the expression in question
in 8-bit arithmetic.

Unfortunately, this *is* one of the cases where you can't do it using
8-bit arithmetic without changing the rules; in this case, 8-bit
arithmetic is inequivalent to 16-bit arithmetic, and if "int"s are 16
bits (as they presumably are in this case), the C language requires that
the results be those produced by 16-bit arithmetic.

You can fix the bug by changing the code to use 16-bit arithmetic, in
which case you "get it right" in the sense of giving the right answer,
but not in the sense of using 8-bit arithmetic. You can also "fix" the
bug by changing the rules. You can't get it right without changing the
rules just by fixing the bug; the problem with the rules is not caused by
the presence of a bug!
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)