etxnisj@eos8c21.ericsson.se (Niklas Sjovall) (03/20/91)
Hi,
I want to use a macro defined in ctype.h on a Sun4 (4.03), but I don't
fully understand it.
The macro is:
#define _U 01
#define _L 02
extern char _ctype_[];
#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
It's the part (_ctype_+1)[c] I don't understand. Could there be any
segmentation errors using this?
Thanks
gwyn@smoke.brl.mil (Doug Gwyn) (03/21/91)
In article <1991Mar20.112543.5515@ericsson.se> etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
-I want to use a macro defined in ctype.h on a Sun4 (4.03), but i don't
-fully understand it.
-The macro is:
-#define _U 01
-#define _L 02
-extern char _ctype_[];
-#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
-It's the part (_ctype_+1)[c] i don't understand. Could there be any
-segmentation errors using this?
No, in fact that's a quite standard implementation of <ctype.h>.
The +1 is used to allow EOF (defined as -1) to also be used as the
argument to the is*() macros.
Note that some implementations will fail if fed arbitrary garbage
for arguments to the is*() macros, for example any integer value
more negative than -1.
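The layout described above can be sketched as a toy table. This is only an illustration, not Sun's source: the names toy_ctype, TOY_U, TOY_L, toy_isalpha and toy_init are made up here, and the table is filled for the 26 plain letters only. It assumes EOF is -1, as in the implementation under discussion.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

#define TOY_U 01     /* upper-case flag, mirroring _U in the thread */
#define TOY_L 02     /* lower-case flag, mirroring _L in the thread */

/* Slot 0 is reserved for EOF; slot c + 1 holds the flags for character c. */
static char toy_ctype[UCHAR_MAX + 2];

/* Same shape as the macro quoted in the question. */
#define toy_isalpha(c) ((toy_ctype + 1)[c] & (TOY_U | TOY_L))

/* Fill the table for the 26 plain letters only (an assumption; a real
   library table also classifies digits, spaces, punctuation and so on). */
static void toy_init(void)
{
    const char *up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *lo = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    assert(EOF == -1);   /* the slot-0 trick relies on this value */

    for (p = up; *p != '\0'; p++)
        (toy_ctype + 1)[(unsigned char)*p] = TOY_U;
    for (p = lo; *p != '\0'; p++)
        (toy_ctype + 1)[(unsigned char)*p] = TOY_L;
}
```

After toy_init(), toy_isalpha('a') is non-zero and toy_isalpha(EOF) is zero, because EOF lands on slot 0, which is never given a flag. An argument such as -2 would index before the start of the table, which is the failure mode mentioned above.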
wirzenius@cc.helsinki.fi (Lars Wirzenius) (03/21/91)
In article <1991Mar20.112543.5515@ericsson.se>, etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
> #define _U 01
> #define _L 02
> extern char _ctype_[];
> #define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>
> It's the part (_ctype_+1)[c] i don't understand. Could there be any
> segmentation errors using this?

Since isalpha is a library function (and a common one at that), there shouldn't be any errors if you use it correctly, i.e. only give it valid arguments. In this case, the arguments have to be valid characters or the value of EOF (as defined in <stdio.h>).

The way Sun (seems to have) implemented this is: _ctype_ is an array, which is subscripted with the character argument (henceforth referred to as c), and each element of the array is a collection of flags that identify various characteristics of the character, such as whether it is a letter or not.

As long as you only need to test real characters, you can simply use _ctype_[c]. However, isalpha should handle the value of EOF also. We could first test whether c == EOF, and use _ctype_ only if it isn't, but that requires using c twice, which isn't good, because of possible side effects (isalpha(getchar()) is quite reasonable sometimes).

What we do instead is define EOF as -1 (we can do that, since we're writing the whole library), and arrange so that EOF's flags come at the beginning of the array (_ctype_[0]), then the real characters' flags, each at an index one greater than the numeric value of the character. This means that we can write _ctype_[c+1] to access the flags for character c; EOF is -1 so its flags come at _ctype_[-1+1], i.e. _ctype_[0].

Another way to write the expression is to use pointer arithmetic. This is what Sun has done. The value of the name of an array, _ctype_, becomes in value contexts a pointer to the first element of the array, &_ctype_[0]. If we add 1 to this pointer, we get a pointer to the next element, _ctype_[1]. This pointer is then subscripted with the character argument, since now the flags for character c are at offset c. The flags for EOF are at index -1, which in this case is a valid index, since it is still inside the real array, _ctype_.

However, subscripting _ctype_ itself with -1 (i.e. _ctype_[-1]) is quite illegal, and can very well result in a segmentation error; the same happens if you call isalpha(-2). Exactly what happens depends on the system; I believe 'undefined behaviour' is the phrase used in the ANSI standard for C (there have been many nice suggestions for this behaviour, ranging from mailing a complaint to Dennis Ritchie, to launching a nuclear attack; segmentation errors and system crashes are more normal ones (I hope :-)).
--
Lars Wirzenius wirzenius@cc.helsinki.fi
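The point about using c twice can be made concrete with a small sketch. Everything here is hypothetical and only for illustration: demo_tab, DEMO_ALPHA, the two macros, and fake_getchar (a stand-in for getchar() that counts how often it is called). It assumes EOF is -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

#define DEMO_ALPHA 01                    /* hypothetical "is a letter" flag */
static char demo_tab[UCHAR_MAX + 2];     /* slot 0: EOF, slot c + 1: character c */

/* Tests for EOF explicitly, so the argument expression appears twice. */
#define ISALPHA_TWICE(c) ((c) == EOF ? 0 : (demo_tab[(c) + 1] & DEMO_ALPHA))

/* Offset form: the argument expression appears exactly once. */
#define ISALPHA_ONCE(c) ((demo_tab + 1)[c] & DEMO_ALPHA)

static int reads;                        /* number of calls to fake_getchar() */

/* Stand-in for getchar(): always returns 'x' and counts each call. */
static int fake_getchar(void)
{
    reads++;
    return 'x';
}

/* Mark 'x' as a letter so both macros have something to find. */
static void demo_init(void)
{
    assert(EOF == -1);                   /* the slot-0 layout assumes this */
    (demo_tab + 1)['x'] = DEMO_ALPHA;
}

/* How many characters does one use of each macro consume? */
static int reads_with_twice(void)
{
    reads = 0;
    (void)ISALPHA_TWICE(fake_getchar());
    return reads;
}

static int reads_with_once(void)
{
    reads = 0;
    (void)ISALPHA_ONCE(fake_getchar());
    return reads;
}
```

With the explicit EOF test, a single use of the macro swallows two characters of input and classifies the second one; the offset form reads exactly one.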
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/21/91)
In article <1991Mar20.112543.5515@ericsson.se>, etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
> I want to use a macro defined in ctype.h on a Sun4 (4.03), but i don't
> fully understand it.

You should read the manual page. That tells you everything you need to know in order to USE the macro. In UNIX, it used to be the case that the <ctype.h> macros were defined for EOF (-1) and for the integers which satisfy isascii(). In ANSI C, the macros are defined for EOF and for any value representable as unsigned char. Think -1..255.

> It's the part (_ctype_+1)[c] i don't understand. Could there be any
> segmentation errors using this?

(_ctype_+1)[c] is identical to *((_ctype_+1)+(c)), which is identical to _ctype_[(c)+1]. The +1 is there to map the lowest legal value EOF (-1) to 0 (the lowest element of the array). If you had full sources you'd probably find
    char _ctype_[257];
somewhere.

Yes, of course there can be segmentation errors using this, if the value of c is outside the range -1 .. UCHAR_MAX, but you have to keep your subscripts in range for _any_ C array.
--
Seen from an MVS perspective, UNIX and MS-DOS are hard to tell apart.
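The three spellings above can be checked against each other by comparing the addresses they designate. This is a minimal sketch with made-up names (sketch_tab, sketch_in_range, sketch_same_element); it assumes EOF is -1 and is not taken from any vendor's library.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

static char sketch_tab[UCHAR_MAX + 2];   /* 257 entries when char has 8 bits */

/* Is c a legal argument, i.e. EOF or a value representable as unsigned char? */
static int sketch_in_range(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}

/* Do all three spellings name the same array element for this c? */
static int sketch_same_element(int c)
{
    assert(sketch_in_range(c));          /* out-of-range c would be undefined */
    return &(sketch_tab + 1)[c] == (sketch_tab + 1) + c
        && (sketch_tab + 1) + c == &sketch_tab[c + 1];
}
```

A caller that cannot guarantee its argument is in range could test it with something like sketch_in_range() before using the table; the library macros themselves do no such check.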
collinsa@p4.cs.man.ac.uk (Adrian Collins) (03/22/91)
In <1991Mar20.112543.5515@ericsson.se> etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
>Hi,
>I want to use a macro defined in ctype.h on a Sun4 (4.03), but i don't
>fully understand it.
>The macro is:
>#define _U 01
>#define _L 02
>extern char _ctype_[];
>#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>It's the part (_ctype_+1)[c] i don't understand. Could there be any
>segmentation errors using this?

From what I gather, _ctype_[] is an array (probably 256 bytes in length). Each character has a corresponding entry in the table, which contains information about the character's type, such as whether it is printable, whitespace, uppercase or lowercase. In the example above, the macro checks whether the bits corresponding to either uppercase or lowercase are set. If either is set, then the character is alphabetic.

For some reason the first entry in the array isn't used for holding character type information (beats me why), in which case the array is probably 257 in length, presuming it isn't null terminated.

Adrian
---
Adrian Collins                    collinsa@uk.ac.man.cs.p4
Department of Computer Science    a.m.collins@uk.ac.mcc
University of Manchester
Manchester, UK.
"Let me face the peril"
"No, it's too perilous!" - The Holy Grail
john@iastate.edu (Hascall John Paul) (03/25/91)
In article <collinsa.669647114@p4.cs.man.ac.uk> collinsa@p4.cs.man.ac.uk (Adrian Collins) writes:
}In <1991Mar20.112543.5515@ericsson.se> etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
}>I want to use a macro defined in ctype.h ... i don't fully understand it.
}>#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
}>It's the part (_ctype_+1)[c] i don't understand.
}For some reason the first entry in the array isn't used for holding
}character type information (beats me why) ...

The is????? macros are defined over the set (-1 ... 255), hence
the need to offset by 1 to `align' with C's "start at 0" arrays (-1
is for EOF). This is so stuff like the following works correctly.

    do {
        c = getchar();
          :
        if (isalpha(c)) fribbles(c);
          :
    } while (c != EOF);
--
John Hascall                        An ill-chosen word is the fool's messenger.
Project Vincent
Iowa State University Computation Center        john@iastate.edu
Ames, IA 50011                                  (515) 294-9551
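The loop above can be turned into something self-contained by reading from a string instead of stdin. This is a sketch only: struct str_input, str_getchar and count_letters are hypothetical helpers invented for the example, with str_getchar imitating getchar() by returning either an unsigned char value or EOF. The standard isalpha from <ctype.h> is used for the test itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>   /* EOF, NULL */

/* Hypothetical string-backed input source standing in for stdin. */
struct str_input {
    const char *next;
};

/* Like getchar(): returns the next character as an unsigned char value
   converted to int, or EOF once the string is exhausted. */
static int str_getchar(struct str_input *in)
{
    assert(in != NULL && in->next != NULL);
    if (*in->next == '\0')
        return EOF;
    return (unsigned char)*in->next++;
}

/* Count letters using the same do/while shape as the loop above. */
static int count_letters(const char *text)
{
    struct str_input in;
    int c;               /* int, not char, so EOF stays distinct from any character */
    int letters = 0;

    in.next = text;
    do {
        c = str_getchar(&in);
        if (isalpha(c))  /* the final pass hands EOF to isalpha, which is allowed */
            letters++;
    } while (c != EOF);
    return letters;
}
```

For example, count_letters("ab1 c") visits 'a', 'b', '1', ' ', 'c' and finally EOF, and counts three letters. The last iteration is the case the offset in the table exists for.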
stan@Dixie.Com (Stan Brown) (03/26/91)
john@iastate.edu (Hascall John Paul) writes:
=>In article <collinsa.669647114@p4.cs.man.ac.uk> collinsa@p4.cs.man.ac.uk (Adrian Collins) writes:
=>}In <1991Mar20.112543.5515@ericsson.se> etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
=>}>I want to use a macro defined in ctype.h ... i don't fully understand it.
=>}>#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
=>}>It's the part (_ctype_+1)[c] i don't understand.
=>}For some reason the first entry in the array isn't used for holding
=>}character type information (beats me why) ...
=> The is????? macros are defined over the set (-1 ... 255) hence
=>the need to offset by 1 to `align' with C's "start at 0" arrays (-1
=>is for EOF). This is so stuff like the following works correctly.
=>    do {
=>        c = getchar();
=>          :
=>        if (isalpha(c)) fribbles(c);
=>          :
=>    } while (c != EOF);

There was an excellent discussion of this subject about two months ago in _C_USERS_ magazine. It was part of a series that will eventually cover all the standard headers for an ANSI-compliant compiler.
--
Stan Brown    P. c. Design    404-363-2303    Atlanta, Ga.
(emory|gatech|uunet) rsiatl!sdba!stan    "vi forever"
"Operating Systems, Like Editors Are Religions" -- Armando Stettner
dds@doc.ic.ac.uk (Diomidis Spinellis) (03/26/91)
In article <collinsa.669647114@p4.cs.man.ac.uk> collinsa@p4.cs.man.ac.uk (Adrian Collins) writes:
>In <1991Mar20.112543.5515@ericsson.se> etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
> [...]
>>#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>
>>It's the part (_ctype_+1)[c] i don't understand. Could there be any
>>segmentation errors using this?
[...]
> For some reason the first entry in the array isn't used for holding
> character type information (beats me why), in which case the array is
> probably 257 in length presuming it isn't null terminated.

In this particular implementation, _ctype_[0] holds the type value of the special constant EOF, defined in stdio.h, which happens to have the value -1. Thus _ctype_[0] has the type information for EOF (-1), _ctype_[1] has the type information for character 0, etc.

Diomidis
--
Diomidis Spinellis        Internet: dds@doc.ic.ac.uk
Department of Computing   UUCP: ...!ukc!icdoc!dds
Imperial College, London SW7
#define O(b,f,u,s,c,a)b(){int o=f(); ...