[comp.lang.c] two chars at once...

sr16+@andrew.cmu.edu (Seth Benjamin Rothenberg) (09/13/89)

We have a language-sensitive editor which maintains the code in
a parsed form.  It currently generates Fortran; I need to make it
generate C.  One of the nasty things it does is treat 2 characters
as a single 2-byte integer.  I have to convert the following
types of expressions
      if  "ax" = mystr         ->  if mystr[1]=>C1D9  (or something like that)
      mystr = "AX"

Should I just change this call to use macros, like
    if cmp2("ax", mystr);
    cpy2(mystr, "AX");
or is a more direct (kludgy?) way possible?

(I suspect I should avoid something like this as a hardware-dependency)

Thanks
Seth
sr16@andrew.cmu.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/14/89)

In article <QZ3bmsW00WB4E5BEob@andrew.cmu.edu> sr16+@andrew.cmu.edu (Seth Benjamin Rothenberg) writes:
>One of the nasty things it does is treat 2 characters
>as a single 2-byte integer.
>      if  "ax" = mystr         ->  if mystr[1]=>C1D9  (or something like that)
>      mystr = "AX"
>Should I just change this call to use macros, like
>    if cmp2("ax", mystr);
>    cpy2(mystr, "AX");
>or is a more direct (kludgy?) way possible?

The direct equivalent in C would be to use multi-character character
constants such as 'AX', which are ints containing the multiple
character codes "somehow".  The details of how they are represented
are implementation-dependent; however, it is probable that 'AX' would
be equal to either 'A'<<CHAR_BIT|'X' or 'X'<<CHAR_BIT|'A'.
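
A minimal sketch of how one might check which of the two layouts a
particular compiler picks for 'AX'.  The names HI_FIRST, LO_FIRST and
ax_layout are invented for this illustration; the value of 'AX' is
implementation-defined, and some compilers warn about such constants.

```c
#include <limits.h>

/* The two layouts described above as probable for 'AX'. */
#define HI_FIRST(a, b) (((a) << CHAR_BIT) | (b))
#define LO_FIRST(a, b) (((b) << CHAR_BIT) | (a))

/*
 * Hypothetical helper: returns 1 if 'AX' holds 'A' in the higher byte,
 * 2 if it holds 'A' in the lower byte, and 0 if the compiler uses
 * some other representation.
 */
int ax_layout(void)
{
    int v = 'AX';   /* implementation-defined value */

    if (v == HI_FIRST('A', 'X'))
        return 1;
    if (v == LO_FIRST('A', 'X'))
        return 2;
    return 0;
}
```

Because the result can differ between compilers, code that depends on
it is only as portable as the answer ax_layout() happens to give.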

Using 2-character strings would be more portable, of course.
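
One possible shape for the portable version, assuming the cmp2 and cpy2
names from the original question and assuming mystr is a two-byte char
buffer with no terminating null.  This is only a sketch built on the
standard memcmp and memcpy; demo_two_char_strings is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Assumed helpers, named after the macros proposed in the question. */
#define cmp2(a, b)     (memcmp((a), (b), 2) == 0)       /* nonzero if both bytes match */
#define cpy2(dst, src) ((void)memcpy((dst), (src), 2))  /* copies exactly 2 bytes */

/* Hypothetical demo of the two conversions asked about. */
void demo_two_char_strings(void)
{
    char mystr[2] = { 'a', 'x' };

    assert(cmp2("ax", mystr));      /* if "ax" = mystr */
    cpy2(mystr, "AX");              /* mystr = "AX"    */
    assert(cmp2(mystr, "AX"));
    assert(!cmp2(mystr, "ax"));
}
```

Nothing here depends on byte order or on how the compiler packs
characters into an int, which is the portability gain.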

karl@haddock.ima.isc.com (Karl Heuer) (09/19/89)

In article <11057@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <QZ3bmsW00WB4E5BEob@andrew.cmu.edu> sr16+@andrew.cmu.edu (Seth Benjamin Rothenberg) writes:
>>One of the nasty things it does is treat 2 characters
>>as a single 2-byte integer.
>>	if  "ax" = mystr    ->  if mystr[1]=>C1D9  (or something like that)
>
>The direct equivalent in C would be to use multi-character character
>constants such as 'AX', which [contain both characters in an implementation-
>dependent manner].  Using 2-character strings would be more portable

(Terminology: since X3J11 has already claimed the words "multibyte character"
and "wide character", and neither one of them refers to the construct above,
I have taken to calling them "siamese character constants".)

If the inefficiency of using two-character strings is a problem, and you don't
want to rely on the properties (or even the existence) of siamese character
constants, you can replace them with a macro:
	#include <limits.h>	/* for CHAR_BIT */
	#define two(a,b) (((a)<<CHAR_BIT)|(b))
	if (two('a','x') == mystr) ...	/* mystr is the packed 2-byte integer here */
(It doesn't matter whether two() is defined as above or the other way around,
as long as it's consistent.)
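
A slightly fuller sketch of the macro approach, assuming the two
characters arrive in a char buffer and have to be packed before they can
be compared.  pack2 and classify are hypothetical names added for this
illustration; only two() comes from the post.

```c
#include <assert.h>
#include <limits.h>

#define two(a,b) (((a)<<CHAR_BIT)|(b))

/* Hypothetical helper: pack the first two chars of s in the same order two() uses. */
int pack2(const char *s)
{
    return two((unsigned char)s[0], (unsigned char)s[1]);
}

/*
 * two() applied to character constants is an integer constant
 * expression, so it can also serve as a case label.
 */
int classify(const char *s)
{
    switch (pack2(s)) {
    case two('a', 'x'): return 1;
    case two('A', 'X'): return 2;
    default:            return 0;
    }
}
```

Since pack2 and the case labels both go through two(), swapping the
argument order inside the macro changes every value consistently and
the comparisons still agree.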

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint