[comp.std.c] Hexadecimal Escape Sequence

iiitsh@cybaswan.UUCP (Steve Hosgood) (01/15/90)

I recently discovered in K&R Edition 2 (ANSI C) that the hex escape
sequence will accept any number of valid hex characters after the "\x".
This means that the printf statement:-

	printf("\x1bfred");	/* i.e "<ESC>fred" */

..suddenly failed in a program when we updated from Microsoft 4.0 to
5.1 recently.

According to K&R2, (my only reference), MSC5.1 interprets this correctly
as "<hex BF>red", and 4.0 was wrong to limit itself to 2 hex chars following
the \x. It seems that an infinite number of hex characters may follow the
\x sequence, though what happens if the result fails to fit in a char is
undefined.
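
As a rough illustration of the greedy rule, `strtoul` with base 16 also consumes the longest run of hex digits, so feeding it the text that follows the `\x` splits it the way the compiler does. The helper names `hex_escape_value` and `fits_in_uchar` are made up for this sketch. It is only an analogy: `strtoul` also accepts leading white space, a sign and a `0x` prefix, none of which a hex escape accepts.

```c
#include <limits.h>
#include <stdlib.h>

/* Illustration only (hypothetical helper): mimic the rule that a hex
   escape swallows every hex digit after "\x". strtoul in base 16 also
   stops at the first non-hex character, so it splits plain digit
   strings the same way. */
unsigned long hex_escape_value(const char *after_x, const char **rest)
{
    char *end;
    unsigned long value = strtoul(after_x, &end, 16);
    *rest = end;              /* first character not taken by the escape */
    return value;
}

/* Hypothetical helper: nonzero if the escape value fits in an unsigned
   char, the range a plain character escape must stay within. */
int fits_in_uchar(unsigned long value)
{
    return value <= UCHAR_MAX;
}
```

For the text after `\x` in `"\x1bfred"`, `hex_escape_value("1bfred", &rest)` takes `1`, `b` and `f`, returning 0x1BF (447) with `rest` pointing at `"red"`; with 8-bit chars that exceeds `UCHAR_MAX` (255).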

Is this what you'd call "expected behaviour"?

After all, the octal escape sequence limits itself to 3 characters...

If it IS correct, how do you write "<ESC>fred" using a hex escape? I
ended up having to use the octal escape, which seems rather
an inelegant method.
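
The octal workaround is safe because of that three-digit limit: the escape ends after at most three octal digits, whatever follows. A minimal sketch (the array names are illustrative; reading `\033` as ESC assumes ASCII, but the sizes and values checked hold on any implementation):

```c
/* The octal workaround: "\033" is ESC in ASCII, and 'f' is not an
   octal digit, so the escape ends before it. */
static const char esc_fred[] = "\033fred";    /* ESC f r e d NUL: 6 bytes */

/* The limit holds even when another octal digit follows: "\1234" is
   the character with value 0123 followed by a plain '4'. */
static const char octal_then_4[] = "\1234";   /* '\123' '4' NUL: 3 bytes */
```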

Thanks in advance for any insights.
Steve 

gwyn@smoke.BRL.MIL (Doug Gwyn) (01/16/90)

In article <1335@cybaswan.UUCP> iiitsh@cybaswan.UUCP (Steve Hosgood) writes:
>	printf("\x1bfred");	/* i.e "<ESC>fred" */
>... It seems that an infinite number of hex characters may follow the
>\x sequence, though what happens if the result fails to fit in a char is
>undefined.
>Is this what you'd call "expected behaviour"?

It's what I would expect.  Two hex digits is not always enough.

>After all, the octal escape sequence limits itself to 3 characters...

That's a deficiency in the octal escape sequence design that we were
able to remedy for the newly invented hex sequences.

>If it IS correct, how do you write "<ESC>fred" using a hex escape?

The simplest method is to use string concatenation:
	printf("\x1b""fred");
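
To check that the concatenated form yields exactly the same bytes as the octal spelling, a small comparison could look like this (`via_hex`, `via_octal` and `same_bytes` are illustrative names):

```c
#include <string.h>

/* Two adjacent literals: the hex escape ends at the closing quote, so
   'f' starts the second literal instead of joining the escape. */
static const char via_hex[]   = "\x1b""fred";
static const char via_octal[] = "\033fred";

/* Nonzero if both spellings produce identical arrays, terminating
   NUL included. */
int same_bytes(void)
{
    return sizeof via_hex == sizeof via_octal
        && memcmp(via_hex, via_octal, sizeof via_hex) == 0;
}
```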

walter@hpclwjm.HP.COM (Walter Murray) (01/17/90)

Steve Hosgood writes:

> I recently discovered in K&R Edition 2 (ANSI C) that the hex escape
> sequence will accept any number of valid hex characters after the "\x".
> This means that the printf statement:-
> 	printf("\x1bfred");	/* i.e "<ESC>fred" */
> ..suddenly failed in a program when we updated from Microsoft 4.0 to
> 5.1 recently.

> According to K&R2, (my only reference), MSC5.1 interprets this correctly
> as "<hex BF>red", and 4.0 was wrong to limit itself to 2 hex chars following
> the \x. It seems that an infinite number of hex characters may follow the
> \x sequence, though what happens if the result fails to fit in a char is
> undefined.

A Standard-conforming compiler is required to produce a diagnostic if
the value of a hexadecimal escape sequence doesn't fit in an unsigned
char.

> Is this what you'd call "expected behaviour"?

It's for the benefit of implementations where a char is more than
eight bits.
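
The arithmetic behind this: each hex digit carries four bits, so writing any value of a `CHAR_BIT`-bit char needs ceil(CHAR_BIT / 4) hex digits, two for 8-bit chars but four for 16-bit ones. The same reasoning shows that three octal digits (at most 0777, i.e. 511) only cover chars of up to nine bits. The function names below are illustrative.

```c
#include <limits.h>

/* Hex digits needed for a value of the given width: 4 bits per digit,
   rounded up. */
int hex_digits_for_bits(int bits)
{
    return (bits + 3) / 4;
}

/* Octal digits needed: 3 bits per digit, rounded up. */
int octal_digits_for_bits(int bits)
{
    return (bits + 2) / 3;
}

/* Digits needed to spell any unsigned char value on this implementation
   (2 when CHAR_BIT is 8). */
int hex_digits_for_char(void)
{
    return hex_digits_for_bits(CHAR_BIT);
}
```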

> After all, the octal escape sequence limits itself to 3 characters...

True.  Hexadecimal escape sequences are different.

> If it IS correct, how do you write "<ESC>fred" using a hex escape? I
> ended up having to use the octal escape, which seems rather
> an inelegant method.

You use adjacent string literals.  You can safely write
   printf("\x1b" "fred");
or
   #define ESC "\x1b"
   printf(ESC "fred");
This works because escape sequences are converted to single characters
prior to the concatenation of adjacent string literals.
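
A sketch of why the ordering matters (array names are illustrative): in the Standard's translation phases, escape sequences are converted in phase 5 and adjacent string literals are concatenated in phase 6, so concatenation can never glue a hex escape to digits in the next literal.

```c
/* Escapes are converted (phase 5) before adjacent literals are
   concatenated (phase 6). */
#define ESC "\x1b"

static const char esc_fred[] = ESC "fred";  /* ESC f r e d NUL */

/* "\x4" "1" stays two characters; it does not turn into "\x41". */
static const char split[]  = "\x4" "1";     /* 0x04 '1' NUL */
static const char joined[] = "\x41";        /* 0x41 NUL */
```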

Walter
----------

henry@utzoo.uucp (Henry Spencer) (01/17/90)

In article <11960@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>... Two hex digits is not always enough.

Admittedly a problem...

>>After all, the octal escape sequence limits itself to 3 characters...
>That's a deficiency in the octal escape sequence design that we were
>able to remedy for the newly invented hex sequences.

Introducing, in return, a new deficiency:  the inability to terminate
the hex sequence simply and cleanly when desired.  Don't think this
wasn't pointed out during the public reviews, by me among others.
Among other things, it is a violation of the "prior art" rule, since
prior experience (C++ and some C compilers) has been entirely (I think)
with limited-length versions.

>>If it IS correct, how do you write "<ESC>fred" using a hex escape?
>The simplest method is to use string concatenation:
>	printf("\x1b""fred");

As the man said, "there's *got* to be a better way".  Unfortunately, none
made it into the standard.  Using string concatenation for this is an
ugly kludge, not a proper solution.  The right way would have been to
include some sort of bracketing as part of the escape, e.g. "\x(ab)".
This would even have allowed compatibility with C++ and other prior art:
length is unlimited only within brackets, with plain "\xabc" being limited
to three digits like the octal escapes.
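
To make the proposed rule concrete, here is a toy run-time decoder for it. The notation is hypothetical and was never adopted, and `decode_hex_escape` is an illustrative name, not part of any real compiler or library.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy decoder for the proposed (NOT standard) notation: "\x(...)" takes
   any number of hex digits up to ')', while a plain "\x" takes at most
   three hex digits, like an octal escape. s must point at the backslash.
   Stores the value in *out and returns a pointer just past the escape,
   or NULL if the text is malformed. No overflow check is made. */
const char *decode_hex_escape(const char *s, unsigned long *out)
{
    unsigned long value = 0;
    int ndigits = 0;
    int bracketed;

    if (s[0] != '\\' || s[1] != 'x')
        return NULL;
    s += 2;
    bracketed = (*s == '(');
    if (bracketed)
        s++;

    /* Unlimited digits inside brackets, at most three without. */
    while (isxdigit((unsigned char)*s) && (bracketed || ndigits < 3)) {
        int c = tolower((unsigned char)*s);
        value = value * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
        ndigits++;
        s++;
    }

    if (ndigits == 0)
        return NULL;            /* "\x" with no digits */
    if (bracketed) {
        if (*s != ')')
            return NULL;        /* missing closing bracket */
        s++;
    }
    *out = value;
    return s;
}
```

Under this rule `\x(1b)fred` decodes to 0x1B followed by `fred`, while the plain `\x1bfred` would still take three digits (0x1BF), so the original example would need the bracketed form.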

Alas, it's too late now.  I expect the result will be widespread aversion
to the error-prone context-sensitive hex escapes, defeating the original
purpose of making life easier.
-- 
1972: Saturn V #15 flight-ready|     Henry Spencer at U of Toronto Zoology
1990: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (01/17/90)

In article <1990Jan16.194556.6727@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Introducing, in return, a new deficiency:  the inability to terminate
>the hex sequence simply and cleanly when desired.  Don't think this
>wasn't pointed out during the public reviews, by me among others.

>>The simplest method is to use string concatenation:
>>	printf("\x1b""fred");

>As the man said, "there's *got* to be a better way".  Unfortunately, none
>made it into the standard.  Using string concatenation for this is an
>ugly kludge, not a proper solution.  The right way would have been to
>include some sort of bracketing as part of the escape, e.g. "\x(ab)".
>This would even have allowed compatibility with C++ and other prior art:
>length is unlimited only within brackets, with plain "\xabc" being limited
>to three digits like the octal escapes.

I bet you couldn't get agreement on whether that should be three or two.
And people with 16-bit chars would be pissed off at either choice.

The only alternative I recall X3J11 voting on was one that I proposed,
namely that the number of hex digits glommed onto after \x would be
implementation-defined.  I don't recall a bracketed notion being voted
on or even proposed.  (I would slightly prefer \xabx over \x(ab).)

The behavior actually adopted has the merits of being simple to specify
and being interpreted the same in all implementations (up to
representation overflow).

Besides, all you need do is think of ""\x and "" as being the actual
brackets, if you want a uniform method.  "foo""\xab""bar"

>Alas, it's too late now.  I expect the result will be widespread aversion
>to the error-prone context-sensitive hex escapes, defeating the original
>purpose of making life easier.

They don't seem particularly error-prone to me, because the idea of
embedding character code values in string literals is highly
nonportable, so I'm unlikely to do it anyway.  About the only way I'd
use these would be to have something like the following in a system
configuration file:
	#define	CSI	"\x1b["	/* control sequence introducer */
	#define	ACK	'\x06'	/* ACKnowledge successful receipt */
and there it would be pretty simple to remember to think about what
I'm doing with the \x notation.
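
A sketch of how such configuration macros might be used, assuming an ECMA-48 (ANSI X3.64) style terminal on which ESC [ 2 J erases the display; `clear_screen` and `send_clear_screen` are hypothetical names. Because the rest of each sequence goes into its own literal, the escape has already been converted before the pieces are joined.

```c
#include <stdio.h>

#define CSI "\x1b["   /* control sequence introducer */
#define ACK '\x06'    /* ACKnowledge successful receipt */

/* "ESC [ 2 J" (erase in display, whole screen) on ECMA-48 terminals.
   "2J" is a separate literal, joined only after escape conversion. */
static const char clear_screen[] = CSI "2J";

/* Returns a nonnegative value on success, EOF on error (as fputs does). */
int send_clear_screen(FILE *out)
{
    return fputs(clear_screen, out);
}
```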