yaping@eleazar.dartmouth.edu (Yaping Xu) (10/21/88)
[This article was written by Scott Horne, not Yaping Xu. Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0? They often skip every other keypress on me--and in one case, they skip two keypresses out of three! Maybe it's my code. This occurs mainly when I try

    c = toupper(getch());

Do those functions work in MSC 5.1? I'll dig out some examples.

Thanks.

--Scott
mru@unccvax.UUCP (Markus Ruppel) (10/21/88)
>Scott Horne:
>
>    c = toupper(getch());
>

By default, toupper() is implemented as a macro that evaluates its argument more than once, so any side effects in the argument are repeated. You have to '#undef toupper' (the bare name, without parentheses) to force the compiler to use the function version. This also applies to tolower().

Markus Ruppel
Dept. of Chemistry
UNCC USA
UUCP: ...mcnc!mru ...mcnc!unccvax!mru
BITNET: ACC00MR1@UNCCVM
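A minimal sketch of the #undef workaround described above, assuming a <ctype.h> that supplies a real toupper() function behind any macro (standard C permits the #undef). The key source next_key(), the string keys and the counter calls are hypothetical stand-ins for getch(), which is not portable:

```c
#include <ctype.h>

/* #undef takes the bare macro name. It is harmless if toupper was never
   defined as a macro in the first place. */
#undef toupper

/* Hypothetical stand-in for getch(): hands out characters from a string
   and counts how many times it has been called. */
static const char *keys = "abc";
static int calls = 0;

static int next_key(void)
{
    calls++;
    return (unsigned char)*keys++;
}

/* With the macro out of the way, toupper() is an ordinary function call,
   so its argument is evaluated exactly once. */
static int read_upper(void)
{
    return toupper(next_key());
}
```

Writing (toupper)(next_key()), with the name in parentheses, also bypasses a function-like macro without needing the #undef.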
scs@athena.mit.edu (Steve Summit) (10/21/88)
This is a snide, whiney "I told you so" to the efficiency addicts and macro panderers out there.

In article <10508@dartvax.Dartmouth.EDU> Scott Horne writes:
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three! Maybe it's my code. This occurs mainly when I try
>
>    c = toupper(getch());

(getch and getche are fairly pointless and superfluous low-level analogues to getchar, but this is irrelevant.)

In the old days, the toupper macro worked correctly only on lowercase alphabetic characters, which meant that one often ended up writing

    if(islower(c))
        c = toupper(c);

The hackers at the Shady Hill home for arthritic-fingered programmers got tired of typing this, so a variant appeared: toupper could be made to work correctly (a laudable goal) with an implementation such as:

    #define _toupper(c) ((c) - ('a' - 'A'))
    #define toupper(c) (islower(c) ? _toupper(c) : (c))

Now, there are three conventions for writing macros:

    1. Parenthesize fully, inside and out

    2. Use capital letters in the name, to remind the reader it's a macro and may therefore act weird

    3. Make every effort not to repeat "arguments," so that side effects aren't replicated

A "side effect" is anything that an expression does other than "return" a value, and is therefore a problem if something like

    toupper(*p++)

is (textually, before the code generator gets to it) expanded to

    islower(*p++) ? _toupper(*p++) : *p++

How many times is p incremented?

Besides pre- and postincrement and -decrement, the other classic example of a side effect is I/O. What a coincidence: look at what Scott Horne used as an argument to toupper, and note the curious concordance between the period of its failure mode (two out of three) and the number of times toupper's argument is repeated in its expansion.
Rule 2 is occasionally broken by "standard library" facilities, but generally only when rule 3 is observed, so that the distinction between function and macro is transparent to the caller. The "improved" toupper macro, scrupulous as it is in its adherence to rule 1, violates both rules 2 and 3, and is therefore a perfect ticking time bomb, a long-term booby trap, and a recurring nightmare for unsuspecting programmers everywhere.

If it is desirable for toupper to work correctly on characters that are nonalphabetic or already upper-case (I believe this property is called "idempotence," and as I said, it is a laudable goal), then the macro implementation has to be sacrificed, and toupper() made a proper function.

By the way, the fancy toupper macro also violates a fourth rule, almost universally ignored today, which is that macros shouldn't expand to "too much" code, because in the old days we only had 64K or so to play with, and every byte counted. The most famous exception is the recent Berkeley line-buffered putc macro, which is something like seven backslash-continued lines long, although, believe it or not, it does manage to guarantee a single evaluation of its first argument, so putc(*p++, fd) will work, as indeed it must. One would try something ludicrous like

    FILE *fdarray[10];
    ...
    putc(c, fdarray[i++])

at one's extreme peril, however.

Now, with respect to Microsoft, their run-time library gets tugged in several directions as they try to maintain compatibility with existing code while migrating toward ANSI, and in version 4 I believe they had two separate versions of toupper, depending on which header file you #included. To make things even more confusing, I think one header file gave you the unsafe macro I'm disparaging, and the other got you a real function. (Of course, there was also a third implementation, called "_toupper", which is the non-checking version, safely implementable as a macro, such as appears in the example towards the beginning of this article.)
(These difficulties may be resolved in Microsoft's Version 5. Although I happen to use Microsoft V5, I don't pay much attention to its or anyone's implementation of islower/toupper any more. Any code of mine that cares protects itself with

    #ifdef _toupper
    #undef toupper
    #define toupper _toupper
    #endif

which recreates, with only the barest twinges of worry about undermining _reserved ANSI identifiers, a cozy V7 environment. I'll call islower() explicitly, thank you. Note that I do this not for efficiency's sake but for safety; an even more likely side-effect-containing argument for ctype macros than getch() is *p++.)

The bottom line is, don't implement things with macros unless it's absolutely safe. The potential efficiency improvements simply aren't worth it when they lead to these "little surprises." In those rare cases where the efficiency gain is significant and important, capitalize the hell out of the macro name and plaster the code and documentation with big warnings, and budget some time for the confusion and stubborn bugs which will still inevitably arise.

Speaking of documentation: some will haughtily tell the original complainant to RTFM; Microsoft's manual may well state that toupper is a macro and can't be used on arguments with side effects. That's unacceptable. Someone coined a nice phrase called the "principle of least surprise." Among other things, it holds that there is a class of mistakes which are so easy to make that no amount of documentation will rescue them; the only solution is to remove the problem, in this case the dangerous macro implementation.

Let's not get started on tweaks to the preprocessor to make dangerous macros safer to write; we just spent a month or so exhaustively treating how not to square numbers. If you want to work on something, work on good inlining algorithms instead.
And before you think that your proposed improvements to the preprocessor make whacko macros safe, or even that the three or four rules listed above are sufficient, consider

    putc(c,
         fd);

which is what people like me write when we've indented ourselves into a brick wall at the right margin but are for some stupid reason reluctant to break out into another subroutine. Although ANSI says macro invocations are allowed to cross newline boundaries, there are a lot of existing preprocessors which can't handle them without explicit backslash continuations. (I can't say I blame them, macro invocations spanning newlines being rather extremely painful to implement correctly.)

Steve Summit
scs@adam.pika.mit.edu
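The failure mode discussed in this post can be reproduced without a keyboard. The sketch below assumes ASCII; UNSAFE_TOUPPER, fake_getch(), keys and calls are hypothetical names, and the macro body simply copies the islower()-then-subtract expansion quoted in the post:

```c
#include <ctype.h>

/* Same shape as the macro quoted in the thread: the argument c appears
   three times in the text of the expansion. */
#define UNSAFE_TOUPPER(c) (islower(c) ? ((c) - ('a' - 'A')) : (c))

/* Hypothetical stand-in for getch(): returns successive characters from a
   string and counts the calls. */
static const char *keys = "abcdef";
static int calls = 0;

static int fake_getch(void)
{
    calls++;
    return (unsigned char)*keys++;
}

/* Unsafe: one call is consumed by the islower() test and a second one by
   whichever branch of the conditional is selected. */
static int read_upper_unsafe(void)
{
    return UNSAFE_TOUPPER(fake_getch());
}

/* Safe with any toupper implementation: read the key into a variable
   first, then convert the variable. */
static int read_upper_safe(void)
{
    int c = fake_getch();
    return UNSAFE_TOUPPER(c);
}
```

With the keys "abcdef", the unsafe version tests 'a', throws it away, and returns the upper-cased 'b', which matches the "skips every other keypress" symptom.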
cpp90221@dcscg1.UUCP (Duane L. Rezac) (10/21/88)
From article <10508@dartvax.Dartmouth.EDU>, by yaping@eleazar.dartmouth.edu (Yaping Xu):
> Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
> They often skip every other keypress on me--and in one case, they skip two
> keypresses out of three! Maybe it's my code. This occurs mainly when I try
>
>     c = toupper(getch());
> --Scott

I'm not sure about Microsoft C, but with Turbo C and the C86 optimizing compiler, getch() and getche() read one character out of the buffer. I have run into the problem of these functions skipping inputs because of information left in the buffer from a previous read. When the function is called a second time, it reads the remaining data in the buffer, appearing to run right past the requested input. At times I have had to add an extra getche() or getch() in front of the one that is skipping the input in order to clear the buffer.

(By the way, if anyone has a good method to ensure that the keyboard buffer is empty, please post it.)

-- 
+-----------------------+---------------------------------------------------+
| Duane L. Rezac        |These views are my own, and NOT representative of  |
| dsacg1!dcscg1!cpp90221|my place of Employment.                            |
+-----------------------+---------------------------------------------------+
yaping@eleazar.dartmouth.edu (Yaping Xu) (10/21/88)
[This article was written by Scott Horne, not Yaping Xu. Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

Several people have answered my question about getch() & getche() in MSC. I didn't know that `toupper' was a macro, which it is:

    # define toupper(c) ( (islower(c)) ? _toupper(c) : (c) )

which caused the problem: "toupper(getch())" would expand to

    ( (islower(getch())) ? _toupper(getch()) : (getch()) )

and islower() would be expanded, and so would _toupper() be. Thus getch() is called at least twice: once for the islower() test and again in whichever branch is taken.

Thanks for pointing out my stupid mistake--and please stop filling my mailbox with responses! :-)

--Scott
chris@mimsy.UUCP (Chris Torek) (10/21/88)
In article <7594@bloom-beacon.MIT.EDU> scs@athena.mit.edu (Steve Summit) writes:
> 1. Parenthesize fully, inside and out
> 2. Use capital letters in the name, to remind the reader
>    it's a macro and may therefore act weird
> 3. Make every effort not to repeat "arguments," so that
>    side effects aren't replicated

Actually, these are all good arguments for an `inline' keyword, a la C++. It is worth noting that GCC has an inline keyword, and one can write, e.g.,

    inline int toupper(int c) {
        return (islower(c) ? _toupper(c) : c);
    }

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
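A hedged sketch of the inline approach in portable form. It assumes a C99 or later compiler (inline was a GCC extension when this was posted) and ASCII; the names to_upper_inline() and upcase() are hypothetical, chosen to avoid colliding with the library's toupper():

```c
#include <ctype.h>

/* A real function, so the argument is evaluated once no matter what
   expression the caller passes; the compiler may still expand it inline. */
static inline int to_upper_inline(int c)
{
    return islower(c) ? c - ('a' - 'A') : c;
}

/* Copies src to dst in upper case. The argument *src++ has a side effect,
   which would be replicated by the unsafe macro but happens exactly once
   here. dst must have room for the string and its terminator. */
static void upcase(char *dst, const char *src)
{
    while (*src != '\0')
        *dst++ = (char)to_upper_inline((unsigned char)*src++);
    *dst = '\0';
}
```

The static qualifier keeps the sketch self-contained in one translation unit; a header-based version would need to follow the compiler's rules for inline definitions.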
burgett@galaxy.COM (Michael Burgett) (10/21/88)
In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>    c = toupper(getch());
> --Scott

I think that toupper() is both a function and a macro. Try adding a

    #undef toupper

before you make the call and see if that helps.

Mike Burgett
adobe!burgett@decwrl.dec.com
mustard@sdrc.UUCP (Sandy Mustard) (10/22/88)
toupper is implemented as a macro that evaluates its parm more than once.
ok@quintus.uucp (Richard A. O'Keefe) (10/22/88)
In article <7594@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>This is a snide, whiney "I told you so" to the efficiency addicts
>and macro panderers out there.
>
>    #define _toupper(c) ((c) - ('a' - 'A'))
>    #define toupper(c) (islower(c) ? _toupper(c) : (c))
>
>If it is desirable for toupper to work correctly on characters
>that are nonalphabetic or already upper-case (I believe this
>property is called "idempotence," and as I said, it is a laudable
>goal), then the macro implementation has to be sacrificed, and
>toupper() made a proper function.

This conclusion does not follow. *THAT* version of toupper() has to go, but you can still usefully use a macro.

    extern char _utab[];
    #define toupper(c) _utab[(c) & 255]

Merits:
(1) single evaluation
(2) usually faster than a function call
(3) works nicely with EBCDIC or ISO 8859, not just ASCII

This is a good way of turning any function-from-characters into a macro: compute all the function values when your program starts and store them in an array. (Look at the is<class>() macros in /usr/include/ctype.h.)
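The table technique can be sketched as follows. The names utab, init_utab() and TABLE_TOUPPER are hypothetical (the post's _utab would live inside the library), and the table is filled from the library's own toupper() at start-up, which is one possible way to "compute all the function values":

```c
#include <ctype.h>
#include <limits.h>

/* One entry per possible unsigned char value. */
static unsigned char utab[UCHAR_MAX + 1];

/* Fill the table once, before the macro is first used. Using the library
   function here means the table follows the current character set. */
static void init_utab(void)
{
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        utab[i] = (unsigned char)toupper(i);
}

/* The argument appears exactly once in the expansion, so a side effect
   such as *p++ or a key read happens exactly once. The mask maps negative
   char values onto the table. */
#define TABLE_TOUPPER(c) (utab[(c) & UCHAR_MAX])
```

One caveat of this sketch: the mask folds EOF into the table as well, so unlike the library function it does not hand EOF back unchanged.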
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)
In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>    c = toupper(getch());

The problem is almost certainly due to toupper() being implemented as an "unsafe" macro, i.e. one that evaluates its argument more than once, so that if the argument has side-effects the result is different from what a function toupper() would have done. Obviously getch() has side-effects.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)
In article <7594@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>If it is desirable for toupper to work correctly on characters
>that are nonalphabetic or already upper-case (I believe this
>property is called "idempotence," and as I said, it is a laudable
>goal), then the macro implementation has to be sacrificed, and
>toupper() made a proper function.

No, toupper() can be correctly implemented as a "safe" macro, at least in an environment where all locales use character sets that fit in 8-bit bytes. Think about how other <ctype.h> functions are typically implemented as safe macros and you should be able to see how toupper() could be so done.
gsmith@umd5.umd.edu (Gordon Smith) (10/24/88)
In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>
> [This article was written by Scott Horne, not Yaping Xu.
>
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three! Maybe it's my code. This occurs mainly when I try
>
>    c = toupper(getch());
>

The reason your program is not working correctly is the toupper, not the getch. toupper() is implemented as a macro, not a function. Therefore the code MAY look similar to this:

    c = isalpha(getch()) && isupper(getch()) ? getch() : getch()-'a'+'A';

so getch() is executed more than once. This may not be the exact macro definition for toupper, but it does illustrate the point.
knudsen@ihlpl.ATT.COM (Knudsen) (10/25/88)
In article <10523@dartvax.Dartmouth.EDU>, yaping@eleazar.dartmouth.edu (Yaping Xu) writes:
> I didn't know that `toupper' was a macro, which it is:
> Thanks for pointing out my stupid mistake--and please stop filling my mailbox

Stupid, hell. Don't you wish that every C system had some standard, easy way to check which "fcns" are macros? That lint warned you about such problems? Meanwhile, better print out your stdio.h and other header files and make a list....
-- 
Mike Knudsen Bell Labs(AT&T) att!ihlpl!knudsen
"Lawyers are like handguns and nuclear bombs. Nobody likes them, but the other guy's got one, so I better get one too."
swarbric@tramp.Colorado.EDU (Frank Swarbrick) (10/25/88)
In article <397@dcscg1.UUCP> cpp90221@dcscg1.UUCP (Duane L. Rezac) writes:
>I'm not sure about Microsoft C, but with turbo C and C86 optimizing Compiler,
>the getch() and getche() read one character out of the buffer. I have run into
>the problem with these functions skipping inputs due to some information that
>was left in the buffer from a previous read. When the second calling of the
>function occurs, it reads the remaining data in the buffer, appearing to run
>right past the requested input. At times I have had to add an extra getche() or
>getch() in front of the one that is skipping the input in order to clear the
>buffer.
>
>(by the way, if anyone has a good method to insure that the keyboard buffer is
> empty, Please post it.)

To clear the keyboard buffer I just do

    #define clrkbdbuf() while (kbhit()) getch()

(or, what I mean is, I write that define and then call it as clrkbdbuf();)

There are probably better ways, but this works fine for me.

Frank Swarbrick (and, yes, the net.cat)
University Of Colorado, Boulder
swarbric@tramp.Colorado.EDU ...!{ncar|nbires}!boulder!tramp!swarbric
"...don't believe in Goldman, his type like a curse
Instant Karma's gonna get him if I don't get him first" --U2
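In keeping with the thread's advice to prefer functions over macros, the same drain loop can be written as a function. This is only a sketch: fake_kbhit() and fake_getch() are hypothetical stand-ins for the DOS <conio.h> kbhit() and getch(), simulated with an in-memory queue so the example compiles anywhere:

```c
/* Simulated keyboard buffer: three keys are already waiting. */
static char pending[16] = "xyz";
static int head = 0;

/* Stand-in for kbhit(): nonzero while a key is waiting. */
static int fake_kbhit(void)
{
    return pending[head] != '\0';
}

/* Stand-in for getch(): removes and returns the next waiting key.
   Only called when fake_kbhit() has reported a key. */
static int fake_getch(void)
{
    return (unsigned char)pending[head++];
}

/* Function form of clrkbdbuf(): discards every waiting key and returns
   how many were thrown away. */
static int clear_keyboard_buffer(void)
{
    int discarded = 0;

    while (fake_kbhit()) {
        (void)fake_getch();
        discarded++;
    }
    return discarded;
}
```

A function also avoids the usual trap of a bare while-loop macro being used as the body of an if with an else following it.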
guy@auspex.UUCP (Guy Harris) (10/26/88)
>Stupid, hell. Don't you wish that every C system had some standard,
>easy way to check which "fcns" are macros?

Many of them do; it's called "the manual". For instance, from SunOS 4.0 (I suspect these items go back to the V7 documentation):

    DESCRIPTION
        getc() returns the next character (that is, byte) from the named input stream, as an integer. It also moves the file pointer, if defined, ahead one character in stream. getchar() is defined as getc(stdin). getc and getchar are macros.

    ...

    BUGS
        Because it is implemented as a macro, getc() treats a stream argument with side effects incorrectly. In particular, getc(*f++) does not work sensibly. fgetc() should be used instead.

What do you mean by "check"? Do you mean

    #if "getchar is a macro"
        code that works if "getchar" is a macro...
    #else
        code that doesn't...
    #endif

If so, try

    #ifdef getchar
        code that works if "getchar" is a macro...
    #else
        code that doesn't...
    #endif

although it would be better to just write

    code that works if "getchar" is a macro...

and be done with it.

>That lint warned you about such problems?

Well, yeah, it'd be nice if "lint" warned about attempts to dereference null pointers, too, and misspellings in character strings, and....
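Applied to toupper, the #ifdef test might look like the sketch below. TOUPPER_IS_MACRO and upper_of() are hypothetical names; the test only reports whether a macro named toupper is defined at that point in the file, not whether that macro is safe:

```c
#include <ctype.h>

/* Record at compile time whether this <ctype.h> defines toupper as a macro. */
#ifdef toupper
#define TOUPPER_IS_MACRO 1
#else
#define TOUPPER_IS_MACRO 0
#endif

/* "Code that works if it is a macro": the argument handed to toupper() is
   a plain variable, so evaluating it more than once is harmless. */
static int upper_of(int c)
{
    return toupper(c);
}
```

Calling upper_of(getch()) is then safe on either kind of system, because the key is read once into the parameter before toupper ever sees it.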
hermit@shockeye.UUCP (Mark Buda) (10/26/88)
In article <10523@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>
> [This article was written by Scott Horne, not Yaping Xu.
> Direct all e-mail responses to me at jalphin@prism.clemson.edu.]
>
>Several people have answered my question about getch() & getche() in MSC.
>I didn't know that `toupper' was a macro, which it is:
>
># define toupper(c) ( (islower(c)) ? _toupper(c) : (c) )

Okay, I'm confused. We've got a System V Release 1<n<2 system here, and conv(3c) says that toupper is a function and _toupper is the macro. We've got a Genix (4.1BSD) system that says toupper is a macro (and notes that it is the same as SysV _toupper.) Turbo C agrees with SysV.

How many systems have it which way? (I know. It is unwise to depend on "toupper" in portable programs...)
-- 
Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189
Dumb UUCP: ...rutgers!bpa!vu-vlsi!devon!shockeye!hermit
Entropy will get you in the end.
"A little suction does wonders." - Gary Collins
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/27/88)
In article <236@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
>How many systems have it which way? (I know. It is unwise to depend on
>"toupper" in portable programs...)

All C implementations should provide toupper() via <ctype.h>. Whether it is implemented as a macro or a function is the only significant variable. Some macro implementations are "unsafe" (with respect to side-effects in their arguments), so for maximum portability you should not rely on toupper() being "safe" even though ANSI C will require that.

Don't use _toupper(), which may not even exist in many implementations.
lvc@cbnews.ATT.COM (Lawrence V. Cipriani) (10/27/88)
In article <8764@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>All C implementations should provide toupper() via <ctype.h>.
...
>Don't use _toupper(), which may not even exist in many implementations.

A related suggestion... I have a program that makes '_' an alphabetic by changing _ctype[]. This was a bad idea since 1) the "array" has a different name on different systems, e.g. BSD vs. AT&T, and 2) the #define symbols for character class definition, e.g. _U vs. _UPPER (in uSoft), vary as well.
-- 
Larry Cipriani, AT&T Network Systems, Columbus OH, cbnews!lvc lvc@cbnews.ATT.COM
guy@auspex.UUCP (Guy Harris) (10/28/88)
>Okay, I'm confused. We've got a System V Release 1<n<2 system here, and
>conv(3c) says that toupper is a function and _toupper is the macro.
>We've got a Genix (4.1BSD) system that says toupper is a macro (and notes
>that it is the same as SysV _toupper.) Turbo C agrees with SysV.
>
>How many systems have it which way? (I know. It is unwise to depend on
>"toupper" in portable programs...)

UNIX V7 had "toupper" a macro, with no "_toupper". I think they renamed that macro "_toupper" and added the function "toupper" - which, unlike the macro, is supposed to leave characters that aren't lower-case letters alone, rather than performing unnatural acts on them - in S5 (it may have been S3). 4.xBSD didn't pick up the S5 stuff; it stuck with the V7 version.

Systems based on V7 or 4.xBSD (and maybe S3) that haven't made themselves S5-compatible will probably have "toupper" as a macro; systems that have made themselves S5-compatible will have it as a function, at least in their S5-compatible environment - if they also offer a BSD-compatible environment, it will probably be a macro in that environment. I suspect most of the microcomputer systems will work in S5 fashion. VAX C probably does it in BSD fashion.

K&R Second Edition, based on some ANSI C draft, specifies that "toupper" must work in the S5 fashion (leaving characters that aren't lower-case letters alone), although (not having a draft handy) I don't know whether ANSI C allows this to be done with a macro or not. It doesn't say anything about "_toupper", so I assume it's not guaranteed to exist in an ANSI C implementation.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/28/88)
In article <1737@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence V. Cipriani) writes:
>A related suggestion... I have a program that makes '_' an alphabetic
>by changing _ctype[]. This was a bad idea ...

Yeah, I've seen a couple of instances of that. Usually I fix it by simply finding where the macro is used and adding the additional test for '_' there. I've never seen a significant loss of speed thereby.

The general principle is to avoid relying on any details of the specific implementation(s). If something is not guaranteed by the spec, it is subject to change even on the same system but certainly across systems.
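The call-site fix described above might look like this minimal sketch. The helper names is_ident_start() and is_ident_char() are hypothetical; the point is that the extra test for '_' sits in the program's own code and the library's classification table is left untouched:

```c
#include <ctype.h>

/* True for a letter or underscore: the characters that may begin an
   identifier in this hypothetical program. */
static int is_ident_start(int c)
{
    return c == '_' || isalpha((unsigned char)c);
}

/* True for a letter, digit or underscore. */
static int is_ident_char(int c)
{
    return c == '_' || isalnum((unsigned char)c);
}
```

Because isalpha('_') stays false, any library routine that depends on the standard classification keeps working.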
karl@haddock.ima.isc.com (Karl Heuer) (10/28/88)
In article <1737@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence V. Cipriani) writes:
>[Poking _ctype[] to make '_' appear alphabetic is] a bad idea since 1) the
>"array" has a different name on different systems, eg. BSD vs. AT&T, and 2)
>the #define symbols for character class definition, eg. _U vs. _UPPER (in
>uSoft) vary as well.

Also because (3) _ctype[] may not be writable, (4) other library routines may be depending on isalpha('_') being false, and (5) even if _ctype[] exists, isalpha() might not use it.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
guy@auspex.UUCP (Guy Harris) (10/28/88)
>A related suggestion... I have a program that makes '_' an alphabetic
>by changing _ctype[]. This was a bad idea since 1) the "array" has a
>different name on different systems, eg. BSD vs. AT&T,

Try "AT&T vs. AT&T"; V7 used "_ctype_", and a different AT&T release (S3 or S5) removed the "_". Don't assume every place where BSD and S5 differ is the result of AT&T and Berkeley deciding to do things differently; sometimes it was just one or more parts of AT&T deciding to do things differently....

And yes, it was a bad idea (I think the S5 "m4" code does the same thing); don't assume you know the way some system-defined function works internally, because some day you may find a system on which it works differently.... (Furthermore, some library routine your program calls may have expected "isalpha('_')" to be false, in which case it was in for a rude surprise.)