[comp.lang.c] How can I de-escape my strings at run time?

cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) (05/30/90)

 
Is there an easy way to read a string into a buffer with automatic run-time
translation of the escape sequences?  I want to do something like this:
 
 { char fmt[100];
   gets(fmt);
      descape(fmt);   /*   ... This is the function I need   */
   printf(fmt,123);
   ...
 }
 
 The user should be able to enter his input data line like this
 
 \n\nThe value of \x41 is %d\n
 
 and I'd like it to work as if the program contained the statement
 
  printf("\n\nThe value of \x41 is %d\n",123);
 
(I can do it the hard way by parsing the string and substituting.  What
I want to know is whether there is a standard function or I/O routine
or a simple trick that can do the conversion at run time.)
 
Pete

--
EP Wentworth - Dept. of Computer Science - Rhodes University - Grahamstown.
Internet: cspw.quagga@f4.n494.z5.fidonet.org
Uninet: cspw@quagga
uucp: ..uunet!m2xenix!quagga!cspw



--  
uucp: uunet!m2xenix!puddle!5!494!4.0!cspw.quagga
Internet: cspw.quagga@p0.f4.n494.z5.fidonet.org

henry@utzoo.uucp (Henry Spencer) (06/01/90)

In article <6550.26639B0A@puddle.fidonet.org> cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) writes:
>Is there an easy way to read a string into a buffer with automatic run-time
>translation of the escape sequences? ...

Alas, no.  It would be very nice if *scanf and *printf provided variants of
%s that would do this.  At one point I considered formally proposing this
for ANSI C, but decided that I could not point to sufficient experience
with it to justify adding it to the standard.
-- 
As a user I'll take speed over|     Henry Spencer at U of Toronto Zoology
features any day. -A.Tanenbaum| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

rsalz@bbn.com (Rich Salz) (06/01/90)

In <6550.26639B0A@puddle.fidonet.org> cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) writes:
>Is there an easy way to read a string into a buffer with automatic run-time
>translation of the escape sequences?  I want to do something like this:
> 
> { char fmt[100];
>   gets(fmt);
>      descape(fmt);   /*   ... This is the function I need   */
>   printf(fmt,123);
>   ...
> }

/*
**  Convert C escape sequences in a string.  Returns a pointer to
**  malloc'd space, or NULL if malloc failed.
*/
#include <stdio.h>
#include <ctype.h>

#define OCTDIG(c)	('0' <= (c) && (c) <= '7')
#define HEXDIG(c)	isxdigit(c)

char *
UnEscapify(text)
    register char	*text;
{
    extern char		*malloc();
    register char	*p;
    char		*save;
    int			i;

    if ((save = malloc(strlen(text) + 1)) == NULL)
	return NULL;

    for (p = save; *text; text++, p++) {
	if (*text != '\\')
	    *p = *text;
	else {
	    switch (*++text) {
	    default:				/* Undefined; ignore it	*/
	    case '\'': case '\\': case '"': case '?':
		*p = *text;
		break;

	    case '\0':
		*p = '\0';
		return save;

	    case '0': case '1': case '2': case '3':
	    case '4': case '5': case '6': case '7':
		for (*p = 0, i = 0; OCTDIG(*text) && i < 3; text++, i++)
		    *p = (*p << 3) + *text - '0';
		text--;
		break;

	    case 'x':
		for (*p = 0; *++text && isxdigit(*text); )
		    if (isdigit(*text))
			*p = (*p << 4) + *text - '0';
		    else if (isupper(*text))
			*p = (*p << 4) + *text - 'A';
		    else
			*p = (*p << 4) + *text - 'a';
		text--;
		break;

	    case 'a':	*p = '\007';	break;	/* Alert		*/
	    case 'b':	*p = '\b';	break;	/* Backspace		*/
	    case 'f':	*p = '\f';	break;	/* Form feed		*/
	    case 'n':	*p = '\n';	break;	/* New line		*/
	    case 'r':	*p = '\r';	break;	/* Carriage return	*/
	    case 't':	*p = '\t';	break;	/* Horizontal tab	*/
	    case 'v':	*p = '\n';	break;	/* Vertical tab		*/

	    }
	}
    }
    *p = '\0';
    return save;
}

#ifdef	TEST

main()
{
    char	buff[256];
    char	*p;

    printf("Enter strings, EOF to quit:\n");
    while (gets(buff)) {
	if ((p = UnEscapify(buff)) == NULL) {
	    perror("Malloc failed");
	    abort();
	}
	printf("|%s|\n", p);
	free(p);
    }
    exit(0);

}
#endif	/*TEST */
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

rsalz@bbn.com (Rich Salz) (06/01/90)

Oops...

From: Kevin Braunsdorf <ksb@nostromo.cc.purdue.edu>
To: rsalz@BBN.COM

In article <2596@litchi.bbn.com> you write:
|	    case 'x':
|		for (*p = 0; *++text && isxdigit(*text); )
|		    if (isdigit(*text))
|			*p = (*p << 4) + *text - '0';
|		    else if (isupper(*text))
|			*p = (*p << 4) + *text - 'A';
|		    else
|			*p = (*p << 4) + *text - 'a';
|		text--;
|		break;
Nope.  You forgot to add 10 for the 'a' and 'A' case.
			*p = (*p << 4) + *text - 'A' + 10;
		    else
			*p = (*p << 4) + *text - 'a' + 10;

Sorry about that.

david@cs.uow.edu.au (David E A Wilson) (06/03/90)

In article <2596@litchi.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> 	    case 'v':	*p = '\n';	break;	/* Vertical tab		*/

Shouldn't this be '\v' or at least '\013' (for ASCII vertical tab)?

David Wilson
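
Putting the two corrections from this thread together (adding 10 for the
hex letters, and '\v' rather than '\n' for vertical tab), a corrected
routine might look like the sketch below.  It is a rewrite in ANSI-style C
rather than the posted code: the names unescape, hexval and UNESCAPE_DEMO
are invented here, and it keeps the posted behaviour for a "\x" with no
digits (it yields a zero byte) and for unknown escapes (the backslash is
dropped).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit.  The caller guarantees isxdigit(c).  Assumes the
 * letters a-f and A-F are contiguous, which holds in ASCII and EBCDIC. */
static int hexval(int c)
{
    if (isdigit(c))
        return c - '0';
    if (isupper(c))
        return c - 'A' + 10;
    return c - 'a' + 10;
}

/* Convert C escape sequences in text.  Returns malloc'd space that the
 * caller must free, or NULL if malloc failed. */
char *unescape(const char *text)
{
    char *save = malloc(strlen(text) + 1);
    char *p = save;

    if (save == NULL)
        return NULL;

    while (*text != '\0') {
        unsigned value = 0;
        int i;

        if (*text != '\\') {            /* ordinary character */
            *p++ = *text++;
            continue;
        }
        text++;                         /* step over the backslash */
        switch (*text) {
        case '\0':                      /* lone trailing backslash: drop it */
            continue;                   /* the loop test now ends the scan */
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7':
            for (i = 0; i < 3 && *text >= '0' && *text <= '7'; i++)
                value = value * 8 + (unsigned)(*text++ - '0');
            *p++ = (char)value;
            continue;
        case 'x':                       /* "\x" with no digits yields 0 */
            text++;
            while (isxdigit((unsigned char)*text))
                value = value * 16 + (unsigned)hexval((unsigned char)*text++);
            *p++ = (char)value;
            continue;
        case 'a': *p = '\a'; break;
        case 'b': *p = '\b'; break;
        case 'f': *p = '\f'; break;
        case 'n': *p = '\n'; break;
        case 'r': *p = '\r'; break;
        case 't': *p = '\t'; break;
        case 'v': *p = '\v'; break;     /* was '\n' in the posted code */
        default:  *p = *text; break;    /* \\ \' \" \? and unknown escapes */
        }
        p++;
        text++;
    }
    *p = '\0';
    return save;
}

#ifdef UNESCAPE_DEMO
int main(void)
{
    char buff[256];
    char *p;

    while (fgets(buff, sizeof buff, stdin) != NULL) {
        buff[strcspn(buff, "\n")] = '\0';   /* strip the newline fgets keeps */
        if ((p = unescape(buff)) == NULL) {
            perror("malloc failed");
            return EXIT_FAILURE;
        }
        printf("|%s|\n", p);
        free(p);
    }
    return 0;
}
#endif
```

Compiling with -DUNESCAPE_DEMO gives a filter in the spirit of the TEST
driver above.  Note that a hex escape swallows every hex digit that
follows it, as in C source, so "\x41b" is one escape, not 'A' followed
by 'b'.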

peter@ficc.ferranti.com (Peter da Silva) (06/03/90)

I'm mildly surprised X3.159 doesn't include \e for escape, since they
added \xNN \a and so on... was it considered?
-- 
`-_-' Peter da Silva. +1 713 274 5180.  <peter@ficc.ferranti.com>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.

meissner@osf.org (Michael Meissner) (06/04/90)

In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter
da Silva) writes:

| I'm mildly surprised X3.159 doesn't include \e for escape, since they
| added \xNN \a and so on... was it considered?

It came up a few times.  The problem is that ANSI C is not mandated to
require ASCII (or even ISO646).  EBCDIC is the classic counterpoint.
Some of the people in the committee also observed that it was kind of
silly to specify something, which is always used in a non-portable
fashion (ie, terminal/printer control strings), when there was always
\nnn around to do exactly the same thing, in the same non-portable
manner.
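
As a concrete illustration of the \nnn point: on an ASCII system the
escape character can be spelled with the octal escape \033.  The sketch
below assumes ASCII and a device that honours X3.64-style control
sequences; the macro ESC and the helper cursor_to are names made up for
this example, and snprintf is a later (C99) library function.

```c
#include <stdio.h>

/* Assumption: the execution character set is ASCII, where ESC is octal 033.
 * On a non-ASCII system this constant would be the wrong character. */
#define ESC "\033"

/* Hypothetical helper: formats an X3.64-style "cursor position" sequence
 * (ESC [ row ; col H) into buf.  Returns what snprintf returns: the number
 * of characters the full sequence needs, excluding the terminating NUL. */
int cursor_to(char *buf, size_t size, int row, int col)
{
    return snprintf(buf, size, ESC "[%d;%dH", row, col);
}
```

The string is exactly as unportable as a hypothetical "\e[%d;%dH" would
have been, which is the committee's point.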
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

henry@utzoo.uucp (Henry Spencer) (06/04/90)

In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>I'm mildly surprised X3.159 doesn't include \e for escape, since they
>added \xNN \a and so on... was it considered?

Yes.  "Escape" is a character-set-specific concept, however, and it was
thought inappropriate to demand that it exist in all C implementations.
(Personally I'd view \a much the same way, but this is the official
explanation...)

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/04/90)

In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter
da Silva) wrote:
: I'm mildly surprised X3.159 doesn't include \e for escape, since they
: added \xNN \a and so on... was it considered?
In article <MEISSNER.90Jun3154100@curley.osf.org>,
meissner@osf.org (Michael Meissner) wrote:
: It came up a few times.  The problem is that ANSI C is not mandated to
: require ASCII (or even ISO646).  EBCDIC is the classic counterpoint.

Er, EBCDIC _has_ an ESC character.
Are there any character sets C is known to be used with that haven't?
-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."

peter@ficc.ferranti.com (Peter da Silva) (06/04/90)

In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
> It came up a few times.  The problem is that ANSI C is not mandated to
> require ASCII (or even ISO646).  EBCDIC is the classic counterpoint.

Are the rest of the escapes, in fact, portable? For example, does ebcdic
have a separate \r and \n? I know some ASCII-based systems use the two
interchangeably (OS/9, for example).

Not to mention that C pretty much assumes you'll have non-portable
characters like # and {} available...

With another ANSI standard (X3.64, I think) specifying the interpretation of
escape sequences, it's not even that unportable...

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/05/90)

In article <+2X3GW9@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
: In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
: > It came up a few times.  The problem is that ANSI C is not mandated to
: > require ASCII (or even ISO646).  EBCDIC is the classic counterpoint.

: Are the rest of the escapes, in fact, portable? For example, does ebcdic
: have a separate \r and \n? I know some ASCII-based systems use the two
: interchangeably (OS/9, for example).

EBCDIC has three separate characters: NL (\n), CR (\r), and LF (\012).
Some C compilers for /370s identify \n with LF, some with NL (\x15).
Since IBM mainframes use length (fixed or variable) to specify record
boundaries, not embedded special characters, only the C library cares
what \n is.  In fact most of the ASCII "control characters" have
equivalents in EBCDIC, and many of them even have the same numeric value.
In particular, \e for ESC would have been _more_ portable to EBCDIC than
\n is: there is only one candidate for ESC and three for end of line.
(What would have been wrong with mapping \n to Record Separator?)


meissner@osf.org (Michael Meissner) (06/05/90)

In article <+2X3GW9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter
da Silva) writes:

| In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
| > It came up a few times.  The problem is that ANSI C is not mandated to
| > require ASCII (or even ISO646).  EBCDIC is the classic counterpoint.
| 
| Are the rest of the escapes, in fact, portable? For example, does ebcdic
| have a separate \r and \n? I know some ASCII-based systems use the two
| interchangeably (OS/9, for example).

The C standard mandates that \r and \n have separate numeric values.
ANSI C doesn't cover what the system really does with \r and \n, just
the programmer's intent.  I personally think \a, \r, and \v should not
be in the standard.  The mainframe crowd at ANSI did say that there
were EBCDIC equivalents for the other escape sequences.

| Not to mention that C pretty much assumes you'll have non-portable
| characters like # and {} available...

That's why there are trigraphs.

| With another ANSI standard (X3.64, I think) specifying the interpretation of
| escape sequences, it's not even that unportable...

Not every terminal speaks X3.64.  Try it on your local 3270 terminal
(or your DG terminal in DG mode....).  Also, not everything is a
terminal, escape whatever also does things to printers, and such.


peter@ficc.ferranti.com (Peter da Silva) (06/06/90)

In article <MEISSNER.90Jun5102326@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
> The C standard mandates that \r and \n have separate numeric values.

That'll be fun for Microware and people using OS/9.

> | Not to mention that C pretty much assumes you'll have non-portable
> | characters like # and {} available...

> That's why there are trigraphs.

Does anyone actually use them for work? It seems to me they're pretty much
unusable in practice except for transferring code between environments.

> (or your DG terminal in DG mode....).  Also, not everything is a
> terminal, escape whatever also does things to printers, and such.

And there is a standard for that.

prc@erbe.se (Robert Claeson) (06/07/90)

In article <BSY3JUC@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:

> > That's why there are trigraphs.
> 
> Does anyone actually use them for work? It seems to me they're pretty much
> unusable in practice except for transferring code between environments.

Glad you asked. Yes, trigraphs are used for work, especially when not in
an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
so C programmers in an EBCDIC environment are more or less forced to use
trigraphs.
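
For readers who have not met them, the nine trigraphs defined by ANSI C
can be tabulated as in the sketch below.  The struct and function names
are invented for illustration.  Each pair of question marks is written as
?\? inside the string literals so that a compiler with trigraph
processing enabled does not replace the sequences while reading the
table itself.

```c
#include <stddef.h>
#include <string.h>

/* One trigraph sequence and the single character it stands for. */
struct trigraph {
    const char *seq;
    char ch;
};

/* The nine ANSI C trigraphs.  "?\?" is two question marks, spelled with
 * the \? escape so the source text never contains a real trigraph. */
static const struct trigraph trigraphs[] = {
    { "?\?=", '#'  }, { "?\?(", '[' }, { "?\?/", '\\' },
    { "?\?)", ']'  }, { "?\?'", '^' }, { "?\?<", '{'  },
    { "?\?!", '|'  }, { "?\?>", '}' }, { "?\?-", '~'  },
};

/* Returns the character that the three-character sequence at seq stands
 * for, or 0 if seq does not start with a trigraph. */
char trigraph_char(const char *seq)
{
    size_t i;

    for (i = 0; i < sizeof trigraphs / sizeof trigraphs[0]; i++)
        if (strncmp(seq, trigraphs[i].seq, 3) == 0)
            return trigraphs[i].ch;
    return 0;
}
```

With trigraphs, a subscript such as a[i] is typed as a??(i??) and a
block is opened with ??< and closed with ??>.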

Most national variants of the ISO 646 7-bit character set (except for the
U.S. and U.K. variants) don't have them either, but programmers have learned
to use whatever character happens to have the same character code as
the special characters. For example, using the Swedish variant of ISO 646,
'[' is substituted with the alphabetical character A-diaeresis, '^' is
substituted with U-diaeresis, '~' is substituted with u-diaeresis and so on.

It is at least good to have a standard for the 'special characters'.
Pascal programmers in an EBCDIC environment have to use .( and .) instead
of [ and ], but there's no standard for that so it is not portable.

-- 
          Robert Claeson      E-mail: rclaeson@erbe.se
	  ERBE DATA AB

chip@tct.uucp (Chip Salzenberg) (06/07/90)

According to peter@ficc.ferranti.com (Peter da Silva):
>meissner@osf.org (Michael Meissner) writes:
>> The C standard mandates that \r and \n have separate numeric values.
>
>That'll be fun for Microware and people using OS/9.

I once did a cross-compiler for OS/9.
OS/9 text files have lines terminated with 0x0D.
So I defined '\n' as 0x0D.

I had to define '\r' as something different from '\n'.
You guessed it.
I defined '\r' as 0x0A.

Shoot me now.
-- 
Chip, the new t.b answer man      <chip@tct.uucp>, <uunet!ateng!tct!chip>

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/08/90)

In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
> Glad you asked. Yes, trigraphs are used for work, especially when not in
> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,

Er, this turns out not to be the case.  EBCDIC _has_ got curly braces.
Square brackets are not quite as good; there are actually _two_ different
sets of codes for the square brackets (historically connected with two
different "print chains") but the C compilers I've seen accept both.

> so C programmers in an EBCDIC environment are more or less forced to use
> trigraphs.

Whether EBCDIC has codes for these characters is one question (to which the
answer is, yes it has); whether you can easily use those characters in an
IBM environment (under VM/CMS for example) is another question, to which
the answer is again, _yes_.  I've sat by someone's side as he edited a C
program (the source code of TeX, as it happens, and TeX also relies
heavily on curly braces) using XEDIT, and it worked just fine.  There are
occasional glitches (BROWSE likes to display braces as blanks) but C and
TeX work just fine in an EBCDIC environment.

exspes@gdr.bath.ac.uk (P E Smee) (06/11/90)

In article <3190@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
>> Glad you asked. Yes, trigraphs are used for work, especially when not in
>> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
>
>Er, this turns out not to be the case.  EBCDIC _has_ got curly braces.
>Square brackets are not quite as good; there are actually _two_ different
>sets of codes 
>
>> so C programmers in an EBCDIC environment are more or less forced to use
>> trigraphs.
>
>Whether EBCDIC has codes for these characters is one question (to which the
>answer is, yes it has); whether you can easily use those characters in an
>IBM environment (under VM/CMS for example) is another question, to which
>the answer is again, _yes_.  

Or, sometimes, _no_.  This depends heavily on precisely what
make/model, and even submodel of terminal you have (some but not all
3270's work, for example) and on the precise details of how they are
connected to the machine.  And, possibly, even on how your MAINT has
configured things.

I had to port a large C program to VM/CMS and had no problems while I
was working in the machine room.  When I got things stabilised enough
to work from my office, I found that a number of the C 'special chars'
didn't work (and worse, would be garbaged by the editor) and was forced
into using trigraphs.  The only reconfiguration I could find to avoid
this had the unfortunate side-effect of making one of our printers
(connected through the same controller) print garbage, and so wasn't
on.  Our IBM bod suggested that the solution was to buy yet another
controller, dedicated to the printer.

-- 
Paul Smee, Computing Service, University of Bristol, Bristol BS8 1UD, UK
 P.Smee@bristol.ac.uk - ..!uunet!ukc!bsmail!p.smee - Tel +44 272 303132

jeffe@sandino.austin.ibm.com (Peter Jeffe 512.823.4091/500000) (06/12/90)

In article <1990Jun11.092136.7800@gdr.bath.ac.uk> P.Smee@bristol.ac.uk (Paul Smee) writes:
>>Whether EBCDIC has codes for these characters is one question (to which the
>>answer is, yes it has); whether you can easily use those characters in an
>>IBM environment (under VM/CMS for example) is another question, to which
>>the answer is again, _yes_.  
>
>Or, sometimes, _no_.  This depends heavily on precisely what
>make/model, and even submodel of terminal you have (some but not all
>3270's work, for example) and on the precise details of how they are
>connected to the machine.  And, possibly, even on how your MAINT has
>configured things.

On VM, I think that the only trick is in doing the appropriate CP command to
translate the incoming/outgoing codes; unfortunately (for you; fortunately
for me) it has been too long for me to remember the exact command, but it
was something like "CP TERM [IN | OUT] CHAR1 CHAR2"; I had it in my
profile.exec and it did the trick.  I successfully worked with Whitesmith C
on a 9370 and CP did the terminal-bracket-to-Whitesmith-bracket conversion
just fine (there were occasional glitches, but nothing compared to having
to use the trigraph abominations).  Sorry I can't provide more info, but I've
happily blocked out the whole miserable experience.
----------------------------------------------------------------------
disclaimer: all persons (including myself) and events mentioned herein
are fictitious and, given the subjective nature of reality, can bear no
resemblance to any other person's conception of real persons or events.
	Peter Jeffe 512.823.4091 jeffe@sandino.austin.ibm.com
	...cs.utexas.edu!ibmchs!auschs!sandino.austin.ibm.com!jeffe

math1i7@jetson.uh.edu (06/12/90)

In article <1990Jun11.092136.7800@gdr.bath.ac.uk>, exspes@gdr.bath.ac.uk (P E Smee) writes:
> In article <3190@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>>In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
>>> Glad you asked. Yes, trigraphs are used for work, especially when not in
>>> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
>>

I have occasion to use C/370 on the IBM mainframes at work.  The compiler on 
that machine expects to find certain EBCDIC codes for left and right brackets.
These codes do not, unfortunately, display correctly on the terminals.  So 
someone in the support group wrote a pair of XEDIT macros that automatically
convert the codes for left and right brackets into different codes that
(usually) display correctly on the terminal, then reconvert them back when
you file and exit.  The problem with that is that different terminals (or
should it be different controllers) use different EBCDIC codes to display
the brackets ....  oh well (trigraphs do work)
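
The kind of conversion those macros perform can be sketched in C as a
256-entry byte translation table.  This is only an illustration of the
idea, not the XEDIT macros themselves: the function names are invented,
and the actual code values for the brackets are left as parameters
because, as noted, they differ between terminals and controllers.

```c
#include <stddef.h>

/* Build a translation table that maps every byte to itself except the two
 * bracket codes, which are mapped from one pair of codes to another.
 * All four code values are supplied by the caller (site-specific). */
void make_bracket_table(unsigned char table[256],
                        unsigned char from_left, unsigned char from_right,
                        unsigned char to_left, unsigned char to_right)
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    table[from_left] = to_left;
    table[from_right] = to_right;
}

/* Translate len bytes of buf in place through table. */
void translate(unsigned char *buf, size_t len, const unsigned char table[256])
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

One table (compiler codes to display codes) would be applied when the
file is loaded and the reverse table when it is filed.  The round trip
is only lossless if the display codes do not otherwise occur in the
file, which may be one source of the "(usually)" above.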

Gordon Tillman