eggert@twinsun.com (Paul Eggert) (05/08/91)
The traditional implementation of isdigit() in <ctype.h> is typically something like this: #define isdigit(c) ((_ctype_+1)[c] & 4) which requires indexing through a global array followed by a masking operation. Why not use the following implementation instead? #define isdigit(c) ((unsigned)((c)-'0') < 10) This needs just a subtraction followed by a comparison. It's faster on all the systems I've tried it on, and is strictly conforming ANSI C.
steve@taumet.com (Stephen Clamage) (05/08/91)
eggert@twinsun.com (Paul Eggert) writes: >The traditional implementation of isdigit() in <ctype.h> is typically >something like this: > #define isdigit(c) ((_ctype_+1)[c] & 4) >which requires indexing through a global array followed by a masking >operation. Why not use the following implementation instead? > #define isdigit(c) ((unsigned)((c)-'0') < 10) >This needs just a subtraction followed by a comparison. It's faster on >all the systems I've tried it on, and is strictly conforming ANSI C. Maybe you didn't try it on enough systems :-) The macro you suggest requires a test and jump, and on some modern RISC machines the penalty is very high. For example, I tried a loop which only tested for isdigit and incremented a global based on the test. On a Sun-4 (SPARC), the program using your macro took 44% longer than the one using the usual ctype macro. On a DECStation (MIPS), your macro took 46% longer. This points up the principle that you shouldn't diddle your code looking for small speedups based on characteristics of the compiler or target machine. Your attempts may backfire badly when the code is re-used. (I'm not accusing Paul of diddling, I just like to make this point.) Apart from that, the other ctype tests (except isascii) are always faster using the lookup table, and it is simpler to use a uniform set of macros. -- Steve Clamage, TauMetric Corp, steve@taumet.com
wicklund@intellistor.com (Tom Wicklund) (05/08/91)
In <1991May8.030515.7004@twinsun.com> eggert@twinsun.com (Paul Eggert) writes: >The traditional implementation of isdigit() in <ctype.h> is typically >something like this: > #define isdigit(c) ((_ctype_+1)[c] & 4) >which requires indexing through a global array followed by a masking >operation. Why not use the following implementation instead? > #define isdigit(c) ((unsigned)((c)-'0') < 10) >This needs just a subtraction followed by a comparison. It's faster on >all the systems I've tried it on, and is strictly conforming ANSI C. Two portability problems -- 1) Numeric digits aren't adjacent character codes in all character sets (though I can't think of an example where this isn't true off hand). 2) The (unsigned) cast may not work (I think an implementation could return abs(x) as the result of an (unsigned) cast but am not sure), so (unsigned)(c-'0') would be TRUE for "c" in the range '0'-9 .. '0'+9. Note the first "traditional" implementation isn't completely portable either since it assumes that EOF is -1, not necessarily true. Two possible performance problems: 1) Your example is probably faster in an -if- statement, but slower in a statement such as: x = isdigit(y); 2) On machines with complex addressing modes (e.g. 80x86 and 68xxx) it should be possible to compile the table driven isdigit into a single instruction which doesn't use an extra register. Your example is a single instruction but requires a register to hold (c - '0'), which could mean other code won't optimize as well. I think the "traditional" implementation is done to be consistent with macros such as isalpha, which is much more efficient as a table (and letters aren't adjacent codes in EBCDIC or foreign language ISO/ASCII based character sets). So both approaches should work on most modern machines and compilers, so there's nothing wrong with implementing either.
ark@alice.att.com (Andrew Koenig) (05/08/91)
In article <1991May8.030515.7004@twinsun.com> eggert@twinsun.com (Paul Eggert) writes: > Why not use the following implementation instead? > #define isdigit(c) ((unsigned)((c)-'0') < 10) > This needs just a subtraction followed by a comparison. It's faster on > all the systems I've tried it on, and is strictly conforming ANSI C. This is surely an implementation issue. However, it does suggest an interesting puzzle: can you find a character set that meets the ANSI C constraints (in particular, the one that says that all digits are contiguous in the character set) for which this macro will not work? (Yes, I think there is one) -- --Andrew Koenig ark@europa.att.com
exspes@gdr.bath.ac.uk (P E Smee) (05/08/91)
In article <1991May8.163547.21677@gdr.bath.ac.uk> P.Smee@bristol.ac.uk (Paul Smee) writes: >In article <1991May8.030515.7004@twinsun.com> eggert@twinsun.com (Paul Eggert) writes: >>The traditional implementation of isdigit() in <ctype.h> is typically >>something like this: >> >> #define isdigit(c) ((_ctype_+1)[c] & 4) >> >>which requires indexing through a global array followed by a masking >>operation. Why not use the following implementation instead? >> >> #define isdigit(c) ((unsigned)((c)-'0') < 10) >> >>This needs just a subtraction followed by a comparison. It's faster on >>all the systems I've tried it on, and is strictly conforming ANSI C. > >And, it works just fine, if you believe that the chars /.-,+*()' are >digits. :-) Silly me. Ignore that, if it gets to your site before I manage to cancel it. My excuse is simply that I've been in too long today, sigh... -- Paul Smee, Computing Service, University of Bristol, Bristol BS8 1UD, UK P.Smee@bristol.ac.uk - ..!uunet!ukc!bsmail!p.smee - Tel +44 272 303132
henry@zoo.toronto.edu (Henry Spencer) (05/09/91)
In article <716@taumet.com> steve@taumet.com (Stephen Clamage) writes: >> #define isdigit(c) ((unsigned)((c)-'0') < 10) > >The macro you suggest requires a test and jump, and on some modern RISC >machines the penalty is very high... No, it requires a test and convert-test-value-to-data operation, which on some RISC machines does not require a jump. (I'll avoid putting in a plug for my favorite one... :-)) Besides, the main use for predicates like this is in driving conditional jumps, so in fact the conversion is rarely needed and there is no added jump. -- And the bean-counter replied, | Henry Spencer @ U of Toronto Zoology "beans are more important". | henry@zoo.toronto.edu utzoo!henry
worley@compass.com (Dale Worley) (05/09/91)
In article <1991May8.154453.4462@intellistor.com> wicklund@intellistor.com (Tom Wicklund) writes:
2) The (unsigned) cast may not work (I think an implementation could
return abs(x) as the result of an (unsigned) cast but am not sure),
so (unsigned)(c-'0') would be TRUE for "c" in the range '0'-9 .. '0'+9.
No, it can't. The conversion of a signed integer to an unsigned
integer has to be done by the traditional modular arithmetic. See
section 3.2.1.2 of the ANSI specification.
Dale
Dale Worley Compass, Inc. worley@compass.com
--
Why are you RUNNING? Cerebus just wants to KILL you a little...
-- "Cerebus"
eggert@twinsun.com (Paul Eggert) (05/09/91)
steve@taumet.com (Stephen Clamage) writes: >The macro you suggest requires a test and jump, and on some modern RISC >machines the penalty is very high. For example, I tried a loop which >only tested for isdigit and incremented a global based on the test. On >a Sun-4 (SPARC), the program using your macro took 44% longer than the one >using the usual ctype macro. On a DECStation (MIPS), your macro took >46% longer. That's odd. I tried several systems and got speedups for the proposed isdigit() every time, if anything even more for RISC than CISC. E.g.: prop trad speedup 2.5 4.5 1.8 Sparcstation 1 SunOS 4.1.1 cc -O 3.6 5.4 1.5 DECstation 3100 Ultrix V4.0 (Rev. 179) cc -O 9.7 13.3 1.4 Sun-3/260 SunOS 4.1 cc -O prop = time with proposed ``#define isdigit(c) ((unsigned)((c)-'0') < 10))'' trad = time with traditional isdigit() in <ctype.h> speedup = trad/prop = how much faster the proposed isdigit() is All times are the sum of the user+system CPU time in seconds, and are measured with the command `time ./a.out 10000000' under csh. Here's the program I used. #include <stdio.h> #if traditional # include <ctype.h> #else # define isdigit(c) ((unsigned)((c) - '0') < 10) #endif main(argc, argv) int argc; char **argv; { char *p = argv[1]; int i = atoi(p), c = *p; while (0 <= --i) if (isdigit(c)) c = ++*p; printf("%s\n", p); return 0; }
steve@taumet.com (Stephen Clamage) (05/09/91)
I wrote: |eggert@twinsun.com (Paul Eggert) writes: |>Why not use the following implementation instead? |> #define isdigit(c) ((unsigned)((c)-'0') < 10) |>This needs just a subtraction followed by a comparison. It's faster on |>all the systems I've tried it on, and is strictly conforming ANSI C. |Maybe you didn't try it on enough systems :-) |The macro you suggest requires a test and jump, and on some modern RISC |machines the penalty is very high. Bruce Evans pointed out to me that since isdigit() is used in tests, the test-and-jump is already paid for. I checked my test code to see why I got the results I did, and found they were the opposite of what I reported. I'm sorry for wasting everyone's time with this. Paul's macro is indeed faster. -- Steve Clamage, TauMetric Corp, steve@taumet.com
eggert@twinsun.com (Paul Eggert) (05/09/91)
ark@alice.att.com (Andrew Koenig) writes: >> #define isdigit(c) ((unsigned)((c)-'0') < 10) >... can you find a character set that meets the ANSI C constraints (in >particular, the one that says that all digits are contiguous in the >character set) for which this macro will not work? Yes: any host where char is the same as int, and where '0' != 0. The subtraction might overflow. The following slightly revised isdigit() works around this problem because unsigned arithmetic cannot overflow: #define isdigit(c) ((unsigned)(c) - '0' < 10) but frankly I worry that some old buggy compilers will generate the wrong code for this ``more portable'' version.
dik@cwi.nl (Dik T. Winter) (05/09/91)
In article <20325@alice.att.com> ark@alice.UUCP () writes: > In article <1991May8.030515.7004@twinsun.com> eggert@twinsun.com (Paul Eggert) writes: > > #define isdigit(c) ((unsigned)((c)-'0') < 10) > However, it does suggest an interesting puzzle: can you find a > character set that meets the ANSI C constraints (in particular, > the one that says that all digits are contiguous in the character > set) for which this macro will not work? (Yes, I think there is one) That is easy. If your character set uses '5'-s complement to represent the digits it will fail. But the one I know that does that uses only 6 bits for a char, so that violates another ANSI C constraint. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
andy@research.canon.oz.au (Andy Newman) (05/09/91)
eggert@twinsun.com (Paul Eggert) writes: > Why not use the following implementation instead? > > #define isdigit(c) ((unsigned)((c)-'0') < 10) Only works for character sets that have the decimal digits arranged contiguously and with digit '0' having a lower numeric value than any other digit character. This works for ASCII and EBCDIC but what about other character sets? -- Andy Newman (andy@research.canon.oz.au) Canon Info. Systems Research Australia "If education were illegal would more people would try it?"
steve@taumet.com (Stephen Clamage) (05/09/91)
wicklund@intellistor.com (Tom Wicklund) writes: >In <1991May8.030515.7004@twinsun.com> eggert@twinsun.com (Paul Eggert) writes: >>Why not use the following implementation instead? >> #define isdigit(c) ((unsigned)((c)-'0') < 10) >Two portability problems -- >1) Numeric digits aren't adjacent character codes in all character >sets ... ANSI C requires that '0' thru '9' be contiguous and in that order. >2) The (unsigned) cast may not work ANSI C requires that the unsigned cast work as well-defined modulo arithmetic, so that any negative value cast to unsigned looks like a large positive value. >Note the first "traditional" implementation isn't completely portable >either since it assumes that EOF is -1, not necessarily true. EOF and the <ctype.h> functions are supplied as part of the C library. They must all be consistent. Any library for which EOF is not -1 would not use _ctype[(c)+1] to index an array. We are talking about the library implementation, not arbitrary user code. -- Steve Clamage, TauMetric Corp, steve@taumet.com
sven@cs.widener.edu (Sven Heinicke) (05/09/91)
In <1991May9.025552.2292@research.canon.oz.au>, andy@research.canon.oz.au writes: >eggert@twinsun.com (Paul Eggert) writes: >> Why not use the following implementation instead? >> >> #define isdigit(c) ((unsigned)((c)-'0') < 10) > >Only works for character sets that have the decimal digits >arranged contiguously and with digit '0' having a lower >numeric value than any other digit character. This works >for ASCII and EBCDIC but what about other character sets? I am not saying that is is a bad argument because we don't know what is to come in the future but are there anyother kinds of systems that are used besides ASCII and EBCDIC that show promise in the future (I hope there are, because ASCII could use some work) -- sven@cs.widener.edu Widener CS system manager Sven Mike Heinicke and Student (pssmheinicke@cyber.widener.edu (if you must))
rcd@ico.isc.com (Dick Dunn) (05/10/91)
wicklund@intellistor.com (Tom Wicklund) writes: > 1) Numeric digits aren't adjacent character codes in all character > sets (though I can't think of an example where this isn't true off > hand). This seems like a weak statement! If there exists such a character set, it ought to be possible to discover it, shouldn't it? (Translation: reality check!) -- Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870 ...If you plant ice, you're gonna harvest wind.
gwyn@smoke.brl.mil (Doug Gwyn) (05/11/91)
In article <1991May8.154453.4462@intellistor.com> wicklund@intellistor.com (Tom Wicklund) writes: >1) Numeric digits aren't adjacent character codes in all character >sets (though I can't think of an example where this isn't true off >hand). Oh, yes they are, at least in terms of the C source character set, because that is a requirement the standard imposes on a conforming implementation. (I argues and voted against that, by the way.) >2) The (unsigned) cast may not work (I think an implementation could >return abs(x) as the result of an (unsigned) cast but am not sure), Type conversions are carefully specified in the C standard; what you describe is not allowed in a conforming implementation.
sparkman@hydra3c.cs.utk.edu (Donald Sparkman) (05/15/91)
As far as a faster isdigit if you can insure only ascii then #define isdigit(c) ((c^0x30)<10) may be faster since its using bitwise Xor instead of cast and subtraction Don Sparkman