[comp.lang.c] getch

yaping@eleazar.dartmouth.edu (Yaping Xu) (10/21/88)

	[This article was written by Scott Horne, not Yaping Xu.
	Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
They often skip every other keypress on me--and in one case, they skip two
keypresses out of three!  Maybe it's my code.  This occurs mainly when I try

	c = toupper(getch());

Do those functions work in MSC 5.1?

I'll dig out some examples.  Thanks.

				--Scott

mru@unccvax.UUCP (Markus Ruppel) (10/21/88)

>Scott Horne:
> 
> 	c = toupper(getch());
>
By default, toupper() is implemented as a macro that evaluates its argument
more than once, so any side effects in the argument are repeated.
You have to '#undef toupper' to force the compiler to use the function
version. This also applies to 'tolower()'.

Markus Ruppel
Dept. of Chemistry
UNCC
USA

UUCP: ...mcnc!mru
      ...mcnc!unccvax!mru
BITNET: ACC00MR1@UNCCVM

scs@athena.mit.edu (Steve Summit) (10/21/88)

This is a snide, whiney "I told you so" to the efficiency addicts
and macro panderers out there.

In article <10508@dartvax.Dartmouth.EDU> Scott Horne writes:
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three!  Maybe it's my code.  This occurs mainly when I try
>
>	c = toupper(getch());

(getch and getche are fairly pointless and superfluous low-level
analogues to getchar, but this is irrelevant.)

In the old days, the toupper macro worked correctly only on
lowercase alphabetic characters, which meant that one often
ended up writing

	if(islower(c))
		c = toupper(c);

The hackers at the Shady Hill home for arthritic-fingered
programmers got tired of typing this, so a variant appeared:
toupper could be made to work correctly (a laudable goal) with
an implementation such as:

	#define _toupper(c)  ((c) - ('a' - 'A'))
	#define toupper(c) (islower(c) ? _toupper(c) : (c))

Now, there are three conventions for writing macros:

     1.	Parenthesize fully, inside and out

     2. Use capital letters in the name, to remind the reader
	it's a macro and may therefore act weird

     3.	Make every effort not to repeat "arguments," so that
	side effects aren't replicated

A "side effect" is anything that an expression does other than
"return" a value, and is therefore a problem if something like

	toupper(*p++)

is (textually, before the code generator gets to it) expanded to

	islower(*p++) ? _toupper(*p++) : *p++

How many times is p incremented?

Besides pre- and postincrement and -decrement, the other classic
example of a side effect is I/O.  What a coincidence: look at
what Scott Horne used as an argument to toupper, and note the
curious concordance between the period of its failure mode (two
out of three) and the number of times toupper's argument is
repeated in its expansion.

Rule 2 is occasionally broken by "standard library" facilities,
but generally only when rule 3 is observed, so that the
distinction between function and macro is transparent to the
caller.

The "improved" toupper macro, scrupulous as it is in its
adherence to rule 1, violates both rules 2 and 3, and is
therefore a perfect ticking time bomb long term booby trap of
a recurring nightmare for unsuspecting programmers everywhere.

If it is desirable for toupper to work correctly on characters
that are nonalphabetic or already upper-case (I believe this
property is called "idempotence," and as I said, it is a laudable
goal), then the macro implementation has to be sacrificed, and
toupper() made a proper function.

By the way, the fancy toupper macro also violates a fourth rule,
almost universally ignored today, which is that macros shouldn't
expand to "too much" code, because in the old days we only had
64K or so to play with, and every byte counted.  The most famous
exception is the recent Berkeley line-buffered putc macro, which
is something like seven backslash-continued lines long, although,
believe it or not, it does manage to guarantee a single
evaluation of its first argument, so putc(*p++, fd) will work, as
indeed it must.  One would try something ludicrous like
FILE *fdarray[10]; ... putc(c, fdarray[i++]) at one's extreme peril,
however.

Now, with respect to Microsoft, their run-time library gets
tugged in several directions as they try to maintain
compatibility with existing code while migrating toward ANSI, and
in version 4 I believe they had two separate versions of toupper,
depending on which header file you #included.  To make things
even more confusing, I think one header file gave you the unsafe
macro I'm disparaging, and the other got you a real function.
(Of course, there was also a third implementation, called
"_toupper", which is the non-checking version, safely
implementable as a macro, such as appears in the example towards
the beginning of this article.)

(These difficulties may be resolved in Microsoft's Version 5.
Although I happen to use Microsoft V5, I don't pay much attention
to its or anyone's implementation of islower/toupper any more.
Any code of mine that cares protects itself with

	#ifdef _toupper
	#undef toupper
	#define toupper _toupper
	#endif

which recreates, with only the barest twinges of worry about
undermining _reserved ANSI identifiers, a cozy V7 environment.
I'll call islower() explicitly, thank you.  Note that I do this
not for efficiency's sake but for safety; an even more likely
side-effect-containing argument for ctype macros than getch() is
*p++.)

The bottom line is, don't implement things with macros unless
it's absolutely safe.  The potential efficiency improvements
simply aren't worth it when they lead to these "little
surprises."  In those rare cases where the efficiency gain is
significant and important, capitalize the hell out of the macro
name and plaster the code and documentation with big warnings,
and budget some time for the confusion and stubborn bugs which
will still inevitably arise.

Speaking of documentation: some will haughtily tell the original
complainant to RTFM; Microsoft's manual may well state that
toupper is a macro and can't be used on arguments with side
effects.  That's unacceptable.  Someone coined a nice phrase
called the "principle of least surprise."  Among other things, it
holds that there is a class of mistakes which are so easy to make
that no amount of documentation will rescue them; the only
solution is to remove the problem, in this case the dangerous
macro implementation.

Let's not get started on tweaks to the preprocessor to make
dangerous macros safer to write; we just spent a month or so
exhaustively treating how not to square numbers.  If you want to
work on something, work on good inlining algorithms instead.  And
before you think that your proposed improvements to the
preprocessor make whacko macros safe, or even that the three or
four rules listed above are sufficient, consider
							putc(c,
							     fd);

which is what people like me write when we've indented ourselves
into a brick wall at the right margin but are for some stupid
reason reluctant to break out into another subroutine.  Although
ANSI says macro invocations are allowed to cross newline
boundaries, there are a lot of existing preprocessors which can't
handle them without explicit backslash continuations.  (I can't
say I blame them, macro invocations spanning newlines being
rather extremely painful to implement correctly.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu

cpp90221@dcscg1.UUCP (Duane L. Rezac) (10/21/88)

From article <10508@dartvax.Dartmouth.EDU>, by yaping@eleazar.dartmouth.edu (Yaping Xu):
> Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
> They often skip every other keypress on me--and in one case, they skip two
> keypresses out of three!  Maybe it's my code.  This occurs mainly when I try
> 
> 	c = toupper(getch());
 
> 				--Scott
I'm not sure about Microsoft C, but with Turbo C and the C86 optimizing compiler, 
the getch() and getche() read one character out of the buffer. I have run into
the problem with these functions skipping inputs due to some information that 
was left in the buffer from a previous read. When the second calling of the 
function occurs, it reads the remaining data in the buffer, appearing to run 
right past the requested input. At times I have had to add an extra getche() or
getch() in front of the one that is skipping the input in order to clear the 
buffer. 

(By the way, if anyone has a good method to ensure that the keyboard buffer is 
 empty, please post it.)

-- 
+-----------------------+---------------------------------------------------+
| Duane L. Rezac        |These views are my own, and NOT representative of  |
| dsacg1!dcscg1!cpp90221|my place of Employment.                            |
+-----------------------+---------------------------------------------------+

yaping@eleazar.dartmouth.edu (Yaping Xu) (10/21/88)

	[This article was written by Scott Horne, not Yaping Xu.
	Direct all e-mail responses to me at jalphin@prism.clemson.edu.]

Several people have answered my question about getch() & getche() in MSC.
I didn't know that `toupper' was a macro, which it is:

#	define	 toupper(c)	( (islower(c)) ? _toupper(c) : (c) )

which caused the problem:  "toupper(getch())" would evaluate to

	((islower(getch())) ? _toupper(getch()) : (getch()))

and islower() and _toupper() would be expanded in turn.  Either way, getch()
is called at least twice: once for the islower() test and once more in
whichever branch is chosen.

Thanks for pointing out my stupid mistake--and please stop filling my mailbox
with responses!  :-)

				--Scott

chris@mimsy.UUCP (Chris Torek) (10/21/88)

In article <7594@bloom-beacon.MIT.EDU> scs@athena.mit.edu (Steve Summit)
writes:
>     1. Parenthesize fully, inside and out
>     2. Use capital letters in the name, to remind the reader
>	 it's a macro and may therefore act weird
>     3. Make every effort not to repeat "arguments," so that
>	 side effects aren't replicated

Actually, these are all good arguments for an `inline' keyword, a la
C++.  It is worth noting that GCC has an inline keyword, and one can
write, e.g.,

	inline int toupper(int c) {
		return (islower(c) ? _toupper(c) : c);
	}
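
In later C the same idea is spelled with C99's "static inline".  A
minimal sketch; safe_toupper and upcase_copy are hypothetical names,
and the library toupper() stands in for _toupper, which is not
available everywhere:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* C99 spelling of the idea; safe_toupper is a hypothetical name.
   Being a real function, it evaluates its argument exactly once. */
static inline int safe_toupper(int c)
{
    return islower(c) ? toupper(c) : c;
}

/* Copy src to dst in upper case.  The argument *src++ has a side
   effect, which would be unsafe with a multiply-evaluating macro. */
static void upcase_copy(char *dst, const char *src)
{
    while (*src != '\0')
        *dst++ = (char)safe_toupper((unsigned char)*src++);
    *dst = '\0';
}

int main(void)
{
    char buf[8];

    upcase_copy(buf, "a1bZ");
    assert(strcmp(buf, "A1BZ") == 0);
    printf("%s\n", buf);
    return 0;
}
```
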
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

burgett@galaxy.COM (Michael Burgett) (10/21/88)

In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>	c = toupper(getch());
>				--Scott
I think that toupper() is both a function and a macro, try adding a 
#undef toupper before you make the call and see if that helps..

		Mike Burgett  adobe!burgett@decwrl.dec.com

mustard@sdrc.UUCP (Sandy Mustard) (10/22/88)

toupper is implemented as a macro that evaluates its parm more than once.

ok@quintus.uucp (Richard A. O'Keefe) (10/22/88)

In article <7594@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>This is a snide, whiney "I told you so" to the efficiency addicts
>and macro panderers out there.
>
>	#define _toupper(c)  ((c) - ('a' - 'A'))
>	#define toupper(c) (islower(c) ? _toupper(c) : (c))
>
>If it is desirable for toupper to work correctly on characters
>that are nonalphabetic or already upper-case (I believe this
>property is called "idempotence," and as I said, it is a laudable
>goal), then the macro implementation has to be sacrificed, and
>toupper() made a proper function.

This conclusion does not follow.  *THAT* version of toupper() has to
go, but you can still usefully use a macro.

	extern char _utab[];
	#define toupper(c) _utab[(c) & 255]

Merits:	(1) single evaluation
	(2) usually faster than a function call
	(3) works nicely with EBCDIC or ISO 8859, not just ASCII

This is a good way of turning any function-from-characters into a macro:
compute all the function values when your program starts and store them
in an array.  (Look at the is<class>() macros in /usr/include/ctype.h .)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)

In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>	c = toupper(getch());

The problem is almost certainly due to toupper() being implemented as an
"unsafe" macro, i.e. one that evaluates its argument more than once, so
that if the argument has side-effects the result is different from what
a function toupper() would have done.  Obviously getch() has side-effects.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)

In article <7594@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>If it is desirable for toupper to work correctly on characters
>that are nonalphabetic or already upper-case (I believe this
>property is called "idempotence," and as I said, it is a laudable
>goal), then the macro implementation has to be sacrificed, and
>toupper() made a proper function.

No, toupper() can be correctly implemented as a "safe" macro,
at least in an environment where all locales use character sets
that fit in 8-bit bytes.  Think about how other <ctype.h>
functions are typically implemented as safe macros and you
should be able to see how toupper() could be so done.

gsmith@umd5.umd.edu (Gordon Smith) (10/24/88)

In article <10508@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>
>	[This article was written by Scott Horne, not Yaping Xu.
>
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three!  Maybe it's my code.  This occurs mainly when I try
>
>	c = toupper(getch());
>

    Your program is not working correctly because of the toupper, not
the getch.  toupper() is implemented as a macro, not a function, so the
expanded code MAY look similar to this:
	c = isalpha(getch()) && isupper(getch()) ? getch() : getch()-'a'+'A';

This executes getch() more than once.  It may not be the exact
macro definition for toupper, but it does illustrate the point.

knudsen@ihlpl.ATT.COM (Knudsen) (10/25/88)

In article <10523@dartvax.Dartmouth.EDU>, yaping@eleazar.dartmouth.edu (Yaping Xu) writes:
> I didn't know that `toupper' was a macro, which it is:
> Thanks for pointing out my stupid mistake--and please stop filling my mailbox

Stupid, hell.  Don't you wish that every C system had some standard,
easy way to check which "fcns" are macros?  That lint warned you
about such problems?

Meanwhile, better print out your stdio.h and other header files
and make a list....
-- 
Mike Knudsen  Bell Labs(AT&T)   att!ihlpl!knudsen
"Lawyers are like handguns and nuclear bombs.  Nobody likes them,
but the other guy's got one, so I better get one too."

swarbric@tramp.Colorado.EDU (Frank Swarbrick) (10/25/88)

In article <397@dcscg1.UUCP> cpp90221@dcscg1.UUCP (Duane L. Rezac) writes:
>I'm not sure about Microsoft C, but with Turbo C and the C86 optimizing compiler, 
>the getch() and getche() read one character out of the buffer. I have run into
>the problem with these functions skipping inputs due to some information that 
>was left in the buffer from a previous read. When the second calling of the 
>function occurs, it reads the remaining data in the buffer, appearing to run 
>right past the requested input. At times I have had to add an extra getche() or
>getch() in front of the one that is skipping the input in order to clear the 
>buffer. 
>
>(By the way, if anyone has a good method to ensure that the keyboard buffer is 
> empty, please post it.)

To clear the keyboard buffer I just do

#define clrkbdbuf() while (kbhit()) getch()

(That is, I write that #define and then call it as clrkbdbuf();)

There're probably better ways, but this works fine for me.

Frank Swarbrick (and, yes, the net.cat)       University Of Colorado, Boulder
swarbric@tramp.Colorado.EDU          ...!{ncar|nbires}!boulder!tramp!swarbric
"...don't believe in Goldman, his type like a curse
 Instant Karma's gonna get him if I don't get him first" --U2

guy@auspex.UUCP (Guy Harris) (10/26/88)

>Stupid, hell.  Don't you wish that every C system had some standard,
>easy way to check which "fcns" are macros?

Many of them do; it's called "the manual".   For instance, from SunOS 4.0
(I suspect these items go back to the V7 documentation):

DESCRIPTION
     getc() returns the next character (that is, byte)  from  the
     named  input  stream, as an integer.  It also moves the file
     pointer,  if  defined,  ahead  one  character   in   stream.
     getchar()  is  defined as getc(stdin).  getc and getchar are
     macros.

...

BUGS
     Because it is implemented as a macro, getc() treats a stream
     argument  with  side  effects  incorrectly.   In particular,
     getc(*f++) does not work sensibly.  fgetc() should  be  used
     instead.

What do you mean by "check"?  Do you mean

#if "getchar is a macro"
	code that works if "getchar" is a macro...
#else
	code that doesn't...
#endif

If so, try

#ifdef getchar
	code that works if "getchar" is a macro...
#else
	code that doesn't...
#endif

although it would be better to just write

	code that works if "getchar" is a macro...

and be done with it.

>That lint warned you about such problems?

Well, yeah, it'd be nice if "lint" warned about attempts to dereference
null pointers, too, and misspellings in character strings, and....

hermit@shockeye.UUCP (Mark Buda) (10/26/88)

In article <10523@dartvax.Dartmouth.EDU> jalphin@prism.clemson.edu writes:
>
>	[This article was written by Scott Horne, not Yaping Xu.
>	Direct all e-mail responses to me at jalphin@prism.clemson.edu.]
>
>Several people have answered my question about getch() & getche() in MSC.
>I didn't know that `toupper' was a macro, which it is:
>
>#	define	 toupper(c)	( (islower(c)) ? _toupper(c) : (c) )

Okay, I'm confused. We've got a System V Release 1<n<2 system here, and
conv(3c) says that toupper is a function and _toupper is the macro.
We've got a Genix (4.1BSD) system that says toupper is a macro (and notes
that it is the same as SysV _toupper.) Turbo C agrees with SysV.

How many systems have it which way? (I know. It is unwise to depend on
"toupper" in portable programs...)
-- 
Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189
Dumb UUCP: ...rutgers!bpa!vu-vlsi!devon!shockeye!hermit
Entropy will get you in the end.
"A little suction does wonders." - Gary Collins

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/27/88)

In article <236@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
>How many systems have it which way? (I know. It is unwise to depend on
>"toupper" in portable programs...)

All C implementations should provide toupper() via <ctype.h>.
Whether it is implemented as a macro or a function is the only
significant variable.  Some macro implementations are "unsafe"
(with respect to side-effects in their arguments), so for
maximum portability you should not rely on toupper() being
"safe" even though ANSI C will require that.

Don't use _toupper(), which may not even exist in many implementations.

lvc@cbnews.ATT.COM (Lawrence V. Cipriani) (10/27/88)

In article <8764@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>All C implementations should provide toupper() via <ctype.h>.
	...
>Don't use _toupper(), which may not even exist in many implementations.

A related suggestion...  I have a program that makes '_' an alphabetic
by changing _ctype[].  This was a bad idea since 1) the "array" has a
different name on different systems, eg. BSD vs. AT&T, and 2) the #define
symbols for character class definition, eg. _U vs. _UPPER (in uSoft) vary
as well.

-- 
Larry Cipriani, AT&T Network Systems, Columbus OH, cbnews!lvc lvc@cbnews.ATT.COM

guy@auspex.UUCP (Guy Harris) (10/28/88)

>Okay, I'm confused. We've got a System V Release 1<n<2 system here, and
>conv(3c) says that toupper is a function and _toupper is the macro.
>We've got a Genix (4.1BSD) system that says toupper is a macro (and notes
>that it is the same as SysV _toupper.) Turbo C agrees with SysV.
>
>How many systems have it which way? (I know. It is unwise to depend on
>"toupper" in portable programs...)

UNIX V7 had "toupper" a macro, with no "_toupper".  I think they renamed
that macro "_toupper" and added the function "toupper" - which, unlike
the macro, is supposed to leave characters that aren't lower-case
letters alone, rather than performing unnatural acts on them - in S5 (it
may have been S3).

4.xBSD didn't pick up the S5 stuff; it stuck with the V7 version. 
Systems based on V7 or 4.xBSD (and maybe S3) that haven't made
themselves S5-compatible will probably have "toupper" as a macro;
systems that have made themselves S5-compatible will have it as a
function, at least in their S5-compatible environment - if they also
offer a BSD-compatible environment, it will probably be a macro in that
environment.

I suspect most of the microcomputer systems will work in S5 fashion. 
VAX C probably does it in BSD fashion.

K&R Second Edition, based on some ANSI C draft, specifies that "toupper"
must work in the S5 fashion (leaving characters that aren't lower-case
letters alone), although (not having a draft handy) I don't know whether
ANSI C allows this to be done with a macro or not.   It doesn't say
anything about "_toupper", so I assume it's not guaranteed to exist in
an ANSI C implementation.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/28/88)

In article <1737@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence V. Cipriani) writes:
>A related suggestion...  I have a program that makes '_' an alphabetic
>by changing _ctype[].  This was a bad idea ...

Yeah, I've seen a couple of instances of that.  Usually I fix it by
simply finding where the macro is used and adding the additional test
for '_' there.  I've never seen a significant loss of speed thereby.

The general principle is to avoid relying on any details of the
specific implementation(s).  If something is not guaranteed by the
spec, it is subject to change even on the same system but certainly
across systems.

karl@haddock.ima.isc.com (Karl Heuer) (10/28/88)

In article <1737@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence V. Cipriani) writes:
>[Poking _ctype[] to make '_' appear alphabetic is] a bad idea since 1) the
>"array" has a different name on different systems, eg. BSD vs. AT&T, and 2)
>the #define symbols for character class definition, eg. _U vs. _UPPER (in
>uSoft) vary as well.

Also because (3) _ctype[] may not be writable, (4) other library routines may
be depending on isalpha('_') being false, and (5) even if _ctype[] exists,
isalpha() might not use it.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

guy@auspex.UUCP (Guy Harris) (10/28/88)

>A related suggestion...  I have a program that makes '_' an alphabetic
>by changing _ctype[].  This was a bad idea since 1) the "array" has a
>different name on different systems, eg. BSD vs. AT&T,

Try "AT&T vs. AT&T"; V7 used "_ctype_", and a different AT&T release (S3
or S5) removed the trailing "_".

Don't assume every place where BSD and S5 differ is the result of AT&T
and Berkeley deciding to do things differently; sometimes it was just
one or more parts of AT&T deciding to do things differently....

And yes, it was a bad idea (I think the S5 "m4" code does the same
thing); don't assume you know the way some system-defined function works
internally, because some day you may find a system on which it works
differently....  (Furthermore, some library routine your program calls
may have expected "isalpha('_')" to be false, in which case it was in
for a rude surprise.)