cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) (05/30/90)
Is there an easy way to read a string into a buffer with automatic run-time
translation of the escape sequences? I want to do something like this:
  { char fmt[100];
    gets(fmt);
    descape(fmt);   /* ... This is the function I need */
    printf(fmt,123);
    ...
  }
The user should be able to enter his input data line like this:
\n\nThe value of \x41 is %d\n
and I'd like it to work as if the program contained the statement
printf("\n\nThe value of \x41 is %d\n",123);
(I can do it the hard way by parsing the string and substituting. What
I want to know is whether there is a standard function or I/O routine
or a simple trick that can do the conversion at run time.)
Pete
--
EP Wentworth - Dept. of Computer Science - Rhodes University - Grahamstown.
Internet: cspw.quagga@f4.n494.z5.fidonet.org
Uninet: cspw@quagga
uucp: ..uunet!m2xenix!quagga!cspw
--
uucp: uunet!m2xenix!puddle!5!494!4.0!cspw.quagga
Internet: cspw.quagga@p0.f4.n494.z5.fidonet.org
henry@utzoo.uucp (Henry Spencer) (06/01/90)
In article <6550.26639B0A@puddle.fidonet.org> cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) writes:
>Is there an easy way to read a string into a buffer with automatic run-time
>translation of the escape sequences? ...

Alas, no. It would be very nice if *scanf and *printf provided variants
of %s that would do this. At one point I considered formally proposing
this for ANSI C, but decided that I could not point to sufficient
experience with it to justify adding it to the standard.
--
As a user I'll take speed over| Henry Spencer at U of Toronto Zoology
features any day. -A.Tanenbaum| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
rsalz@bbn.com (Rich Salz) (06/01/90)
In <6550.26639B0A@puddle.fidonet.org> cspw.quagga@p0.f4.n494.z5.fidonet.org (cspw quagga) writes:
>Is there an easy way to read a string into a buffer with automatic run-time
>translation of the escape sequences? I want to do something like this:
>
>   { char fmt[100];
>     gets(fmt);
>     descape(fmt);   /* ... This is the function I need */
>     printf(fmt,123);
>     ...
>   }

/*
**  Convert C escape sequences in a string.  Returns a pointer to
**  malloc'd space, or NULL if malloc failed.
*/
#include <stdio.h>
#include <ctype.h>

#define OCTDIG(c)       ('0' <= (c) && (c) <= '7')
#define HEXDIG(c)       isxdigit(c)

char *
UnEscapify(text)
    register char       *text;
{
    extern char         *malloc();
    register char       *p;
    char                *save;
    int                 i;

    if ((save = malloc(strlen(text) + 1)) == NULL)
        return NULL;
    for (p = save; *text; text++, p++) {
        if (*text != '\\')
            *p = *text;
        else {
            switch (*++text) {
            default:                            /* Undefined; ignore it */
            case '\'': case '\\': case '"': case '?':
                *p = *text;
                break;

            case '\0':
                *p = '\0';
                return save;

            case '0': case '1': case '2': case '3':
            case '4': case '5': case '6': case '7':
                for (*p = 0, i = 0; OCTDIG(*text) && i < 3; text++, i++)
                    *p = (*p << 3) + *text - '0';
                text--;
                break;

            case 'x':
                for (*p = 0; *++text && isxdigit(*text); )
                    if (isdigit(*text))
                        *p = (*p << 4) + *text - '0';
                    else if (isupper(*text))
                        *p = (*p << 4) + *text - 'A';
                    else
                        *p = (*p << 4) + *text - 'a';
                text--;
                break;

            case 'a':   *p = '\007';    break;  /* Alert            */
            case 'b':   *p = '\b';      break;  /* Backspace        */
            case 'f':   *p = '\f';      break;  /* Form feed        */
            case 'n':   *p = '\n';      break;  /* New line         */
            case 'r':   *p = '\r';      break;  /* Carriage return  */
            case 't':   *p = '\t';      break;  /* Horizontal tab   */
            case 'v':   *p = '\n';      break;  /* Vertical tab     */
            }
        }
    }
    *p = '\0';
    return save;
}

#ifdef  TEST
main()
{
    char        buff[256];
    char        *p;

    printf("Enter strings, EOF to quit:\n");
    while (gets(buff)) {
        if ((p = UnEscapify(buff)) == NULL) {
            perror("Malloc failed");
            abort();
        }
        printf("|%s|\n", p);
        free(p);
    }
    exit(0);
}
#endif  /*TEST */
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
rsalz@bbn.com (Rich Salz) (06/01/90)
Oops...

From: Kevin Braunsdorf <ksb@nostromo.cc.purdue.edu>
To: rsalz@BBN.COM

In article <2596@litchi.bbn.com> you write:
|            case 'x':
|                for (*p = 0; *++text && isxdigit(*text); )
|                    if (isdigit(*text))
|                        *p = (*p << 4) + *text - '0';
|                    else if (isupper(*text))
|                        *p = (*p << 4) + *text - 'A';
|                    else
|                        *p = (*p << 4) + *text - 'a';
|                text--;
|                break;

Nope. You forgot to add 10 for the 'a' and 'A' case.

                        *p = (*p << 4) + *text - 'A' + 10;
                    else
                        *p = (*p << 4) + *text - 'a' + 10;

Sorry about that.
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
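The missing `+ 10` is easier to see when the digit conversion is pulled out into a function of its own. The sketch below is not from either posting, and `hexval` is a hypothetical name. It returns the value of one hexadecimal digit, or -1 for anything else, and it assumes that the letters a-f have consecutive codes, which holds in ASCII and EBCDIC although the C standard only promises it for the digits 0-9.

```c
#include <ctype.h>

/* Hypothetical helper: value of a single hexadecimal digit, or -1 if c
   is not one.  Pass a value representable as unsigned char (or EOF), as
   the <ctype.h> functions require.  Letters are worth 10 to 15, hence
   the + 10 that the posted UnEscapify() left out.  Assumes 'a'..'f' are
   consecutive codes (true in ASCII and EBCDIC). */
int hexval(int c)
{
    if (isdigit(c))
        return c - '0';                 /* the standard guarantees '0'..'9' are contiguous */
    if (isxdigit(c))
        return tolower(c) - 'a' + 10;   /* 'a' or 'A' -> 10, 'f' or 'F' -> 15 */
    return -1;
}
```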
david@cs.uow.edu.au (David E A Wilson) (06/03/90)
In article <2596@litchi.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
>            case 'v':   *p = '\n';      break;  /* Vertical tab     */

Shouldn't this be '\v' or at least '\013' (for ASCII vertical tab)?

David Wilson
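With both corrections applied (the `+ 10` for hex letters and `'\v'` for vertical tab), the in-place `descape()` the original poster asked for can be sketched as follows. This is not Rich Salz's code, only one possible version under these assumptions: the buffer is rewritten in place, which is safe because an escape sequence never produces more characters than it occupies; `\x` consumes every following hex digit, as a C compiler does, and a `\x` with no digits is kept as a plain `x`; unknown escapes such as `\q` yield the character itself; a lone trailing backslash is dropped. An octal or hex escape for code 0 still embeds a NUL, which ends the string as far as `printf()` is concerned.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical in-place version of the descape() asked for in the
   original question.  Translates C escape sequences in s and returns s.
   The result is never longer than the input, so the buffer is reused. */
char *descape(char *s)
{
    char *src = strchr(s, '\\');   /* text before the first backslash is unchanged */
    char *dst = src;

    if (src == NULL)
        return s;

    while (*src) {
        if (*src != '\\') {
            *dst++ = *src++;
            continue;
        }
        src++;                              /* step past the backslash */
        switch (*src) {
        case 'a': *dst++ = '\a'; src++; break;
        case 'b': *dst++ = '\b'; src++; break;
        case 'f': *dst++ = '\f'; src++; break;
        case 'n': *dst++ = '\n'; src++; break;
        case 'r': *dst++ = '\r'; src++; break;
        case 't': *dst++ = '\t'; src++; break;
        case 'v': *dst++ = '\v'; src++; break;   /* '\v', not '\n' */
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7': {
            unsigned v = 0;
            int i;
            /* at most three octal digits, as in C source */
            for (i = 0; i < 3 && *src >= '0' && *src <= '7'; i++, src++)
                v = (v << 3) + (unsigned)(*src - '0');
            *dst++ = (char)v;
            break;
        }
        case 'x': {
            unsigned v = 0;
            int n = 0;
            src++;                          /* step past the 'x' */
            while (isxdigit((unsigned char)*src)) {
                int c = tolower((unsigned char)*src++);
                /* letters need the + 10 */
                v = (v << 4) + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
                n++;
            }
            /* values above CHAR_MAX convert in an implementation-defined way */
            *dst++ = n ? (char)v : 'x';     /* "\x" with no digits: keep the x */
            break;
        }
        case '\0':                          /* lone trailing backslash: drop it */
            break;
        default:                            /* \\ \' \" \? and unknown escapes */
            *dst++ = *src++;
            break;
        }
    }
    *dst = '\0';
    return s;
}
```

Used as in the question, `descape(fmt)` turns the typed line `\n\nThe value of \x41 is %d\n` into the same bytes as the string literal `"\n\nThe value of \x41 is %d\n"`.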
peter@ficc.ferranti.com (Peter da Silva) (06/03/90)
I'm mildly surprised X3.159 doesn't include \e for escape, since they
added \xNN \a and so on... was it considered?
--
 `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.ferranti.com>
  'U`  Have you hugged your wolf today? <peter@sugar.hackercorp.com>
 @FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.
meissner@osf.org (Michael Meissner) (06/04/90)
In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
| I'm mildly surprised X3.159 doesn't include \e for escape, since they
| added \xNN \a and so on... was it considered?

It came up a few times. The problem is that ANSI C is not mandated to
require ASCII (or even ISO646). EBCDIC is the classic counterpoint.
Some of the people in the committee also observed that it was kind of
silly to specify something which is always used in a non-portable
fashion (i.e., terminal/printer control strings), when there was always
\nnn around to do exactly the same thing, in the same non-portable
manner.
--
Michael Meissner    email: meissner@osf.org    phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so
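Meissner's point is that an octal escape already does the job of the missing `\e`. A minimal sketch, assuming an ASCII execution character set (where ESC is code 27, octal 033) and an X3.64-style terminal; `ESC` and `erase_display` are illustrative names only. Building the sequence by string-literal concatenation is a habit worth keeping: an octal escape stops after three digits, but a hex escape such as `\x1b` swallows every hex digit that follows, so `"\x1bA"` would not mean ESC followed by A.

```c
/* Sketch assuming an ASCII execution character set: ESC is code 27,
   spelled with the octal escape \033 because ANSI C has no \e.
   On an EBCDIC system ESC has a different code. */
#define ESC "\033"

/* X3.64 (ECMA-48) "erase in display" control sequence: ESC [ 2 J.
   Adjacent string literals are concatenated at compile time. */
const char erase_display[] = ESC "[2J";
```

Writing `fputs(erase_display, stdout)` would then clear the screen of a terminal that follows X3.64, and, as the rest of the thread points out, do nothing sensible on one that does not.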
henry@utzoo.uucp (Henry Spencer) (06/04/90)
In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>I'm mildly surprised X3.159 doesn't include \e for escape, since they
>added \xNN \a and so on... was it considered?

Yes. "Escape" is a character-set-specific concept, however, and it was
thought inappropriate to demand that it exist in all C implementations.
(Personally I'd view \a much the same way, but this is the official
explanation...)
--
As a user I'll take speed over| Henry Spencer at U of Toronto Zoology
features any day. -A.Tanenbaum| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/04/90)
In article <:9W3JZ3@ggpc2.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) wrote:
: I'm mildly surprised X3.159 doesn't include \e for escape, since they
: added \xNN \a and so on... was it considered?

In article <MEISSNER.90Jun3154100@curley.osf.org>, meissner@osf.org (Michael Meissner) wrote:
: It came up a few times. The problem is that ANSI C is not mandated to
: require ASCII (or even ISO646). EBCDIC is the classic counterpoint.

Er, EBCDIC _has_ an ESC character. Are there any character sets C is
known to be used with that haven't?
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists. However, no example could be found to include here."
peter@ficc.ferranti.com (Peter da Silva) (06/04/90)
In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
> It came up a few times. The problem is that ANSI C is not mandated to
> require ASCII (or even ISO646). EBCDIC is the classic counterpoint.

Are the rest of the escapes, in fact, portable? For example, does ebcdic
have a separate \r and \n? I know some ASCII-based systems use the two
interchangeably (OS/9, for example).

Not to mention that C pretty much assumes you'll have non-portable
characters like # and {} available...

With another ANSI standard (X3.64, I think) specifying the interpretation of
escape sequences, it's not even that unportable...
--
 `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.ferranti.com>
  'U`  Have you hugged your wolf today? <peter@sugar.hackercorp.com>
 @FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/05/90)
In article <+2X3GW9@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
: In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
: > It came up a few times. The problem is that ANSI C is not mandated to
: > require ASCII (or even ISO646). EBCDIC is the classic counterpoint.
: Are the rest of the escapes, in fact, portable? For example, does ebcdic
: have a separate \r and \n? I know some ASCII-based systems use the two
: interchangeably (OS/9, for example).

EBCDIC has three separate characters: NL (\n), CR (\r), and LF (\012).
Some C compilers for /370s identify \n with LF, some with NL (\x15).
Since IBM mainframes use length (fixed or variable) to specify record
boundaries, not embedded special characters, only the C library cares
what \n is.

In fact most of the ASCII "control characters" have equivalents in
EBCDIC, and many of them even have the same numeric value. In
particular, \e for ESC would have been _more_ portable to EBCDIC than
\n is: there is only one candidate for ESC and three for end of line.
(What would have been wrong with mapping \n to Record Separator?)
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists. However, no example could be found to include here."
meissner@osf.org (Michael Meissner) (06/05/90)
In article <+2X3GW9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
| In article <MEISSNER.90Jun3154100@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
| > It came up a few times. The problem is that ANSI C is not mandated to
| > require ASCII (or even ISO646). EBCDIC is the classic counterpoint.
|
| Are the rest of the escapes, in fact, portable? For example, does ebcdic
| have a separate \r and \n? I know some ASCII-based systems use the two
| interchangeably (OS/9, for example).

The C standard mandates that \r and \n have separate numeric values.
ANSI C doesn't cover what the system really does with \r and \n, just
the programmer's intent. I personally think \a, \r, and \v should not
be in the standard. The mainframe crowd at ANSI did say that there were
EBCDIC equivalents for the other escape sequences.

| Not to mention that C pretty much assumes you'll have non-portable
| characters like # and {} available...

That's why there are trigraphs.

| With another ANSI standard (X3.64, I think) specifying the interpretation of
| escape sequences, it's not even that unportable...

Not every terminal speaks X3.64. Try it on your local 3270 terminal
(or your DG terminal in DG mode....). Also, not everything is a
terminal, escape whatever also does things to printers, and such.
--
Michael Meissner    email: meissner@osf.org    phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so
peter@ficc.ferranti.com (Peter da Silva) (06/06/90)
In article <MEISSNER.90Jun5102326@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
> The C standard mandates that \r and \n have separate numeric values.

That'll be fun for Microware and people using OS/9.

> | Not to mention that C pretty much assumes you'll have non-portable
> | characters like # and {} available...
> That's why there are trigraphs.

Does anyone actually use them for work? It seems to me they're pretty much
unusable in practice except for transferring code between environments.

> (or your DG terminal in DG mode....). Also, not everything is a
> terminal, escape whatever also does things to printers, and such.

And there is a standard for that.
--
 `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.ferranti.com>
  'U`  Have you hugged your wolf today? <peter@sugar.hackercorp.com>
 @FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.
prc@erbe.se (Robert Claeson) (06/07/90)
In article <BSY3JUC@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> > That's why there are trigraphs.
>
> Does anyone actually use them for work? It seems to me they're pretty much
> unusable in practice except for transferring code between environments.

Glad you asked. Yes, trigraphs are used for work, especially when not in
an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
so C programmers in an EBCDIC environment are more or less forced to use
trigraphs.

Most national variants of the ISO 646 7-bit character set (except for the
U.S. and U.K. variants) don't have them either, but programmers have learned
to use whatever character happens to have the same character code as the
special character. For example, in the Swedish variant of ISO 646, '[' is
replaced by the letter A-diaeresis, '^' by U-diaeresis, '~' by u-diaeresis,
and so on.

At least it is good to have a standard for the 'special characters'. Pascal
programmers in an EBCDIC environment have to use .( and .) instead of [ and ],
but there's no standard for that, so it is not portable.
--
Robert Claeson                    E-mail: rclaeson@erbe.se
ERBE DATA AB
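For reference, the nine ANSI C trigraphs are ??= ??( ??/ ??) ??' ??< ??! ??> ??- standing for # [ \ ] ^ { | } ~. The function below is a sketch of the replacement a compiler performs in its first translation phase; it is not a tool mentioned in the thread, and the name `expand_trigraphs` is hypothetical. It rewrites a buffer in place, scanning left to right as the compiler does. The code itself never writes two question marks in a row, so it means the same thing whether or not the compiler that builds it honors trigraphs.

```c
#include <string.h>

/* ANSI C trigraphs: two question marks followed by a character in
   trigraph_from stand for the character at the same index in
   trigraph_to.  (No two consecutive question marks appear in this
   code, so trigraph processing by the compiler cannot alter it.) */
static const char trigraph_from[] = "=(/)'<!>-";
static const char trigraph_to[]   = "#[\\]^{|}~";

/* Hypothetical helper: replace trigraphs in s, in place, and return s.
   Scanning one character at a time matches the compiler: three question
   marks followed by '=' become one question mark and a '#'. */
char *expand_trigraphs(char *s)
{
    char *src = s, *dst = s;

    while (*src) {
        const char *hit = NULL;

        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(trigraph_from, src[2]);
        if (hit != NULL) {
            *dst++ = trigraph_to[hit - trigraph_from];
            src += 3;                   /* consume the whole trigraph */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```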
chip@tct.uucp (Chip Salzenberg) (06/07/90)
According to peter@ficc.ferranti.com (Peter da Silva):
>meissner@osf.org (Michael Meissner) writes:
>> The C standard mandates that \r and \n have separate numeric values.
>
>That'll be fun for Microware and people using OS/9.

I once did a cross-compiler for OS/9. OS/9 text files have lines
terminated with 0x0D. So I defined '\n' as 0x0D.

I had to define '\r' as something different from '\n'. You guessed it.
I defined '\r' as 0x0A.

Shoot me now.
--
Chip, the new t.b answer man      <chip@tct.uucp>, <uunet!ateng!tct!chip>
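Chip's story shows what the requirement Meissner cites forces on an implementer: '\n' and '\r' must be different codes even when the host really has only one end-of-line character. A porting sketch, not from the thread and with a made-up typedef name, turns that requirement into a compile-time check using a pre-C11 idiom: if the two escapes ever had the same value, the array size would be negative and compilation would fail.

```c
/* Compile-time check that works with C89 compilers: the array size is
   1 when the condition holds and -1 (a constraint violation, so the
   compile fails) when it does not.  C11 code could use _Static_assert. */
typedef char newline_differs_from_cr['\n' != '\r' ? 1 : -1];
```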
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/08/90)
In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
> Glad you asked. Yes, trigraphs are used for work, especially when not in
> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,

Er, this turns out not to be the case. EBCDIC _has_ got curly braces.
Square brackets are not quite as good; there are actually _two_ different
sets of codes for the square brackets (historically connected with two
different "print chains") but the C compilers I've seen accept both.

> so C programmers in an EBCDIC environment are more or less forced to use
> trigraphs.

Whether EBCDIC has codes for these characters is one question (to which the
answer is, yes it has); whether you can easily use those characters in an
IBM environment (under VM/CMS for example) is another question, to which
the answer is again, _yes_. I've sat by someone's side as he edited a C
program (the source code of TeX, as it happens, and TeX also relies heavily
on curly braces) using XEDIT, and it worked just fine. There are occasional
glitches (BROWSE likes to display braces as blanks) but C and TeX work just
fine in an EBCDIC environment.
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists. However, no example could be found to include here."
exspes@gdr.bath.ac.uk (P E Smee) (06/11/90)
In article <3190@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
>> Glad you asked. Yes, trigraphs are used for work, especially when not in
>> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
>
>Er, this turns out not to be the case. EBCDIC _has_ got curly braces.
>Square brackets are not quite as good; there are actually _two_ different
>sets of codes
>
>> so C programmers in an EBCDIC environment are more or less forced to use
>> trigraphs.
>
>Whether EBCDIC has codes for these characters is one question (to which the
>answer is, yes it has); whether you can easily use those characters in an
>IBM environment (under VM/CMS for example) is another question, to which
>the answer is again, _yes_.

Or, sometimes, _no_. This depends heavily on precisely what
make/model, and even submodel of terminal you have (some but not all
3270's work, for example) and on the precise details of how they are
connected to the machine. And, possibly, even on how your MAINT has
configured things.

I had to port a large C program to VM/CMS and had no problems while I
was working in the machine room. When I got things stabilised enough to
work from my office, I found that a number of the C 'special chars'
didn't work (and worse, would be garbaged by the editor) and was forced
into using trigraphs. The only reconfiguration I could find to avoid
this had the unfortunate side-effect of making one of our printers
(connected through the same controller) print garbage, and so wasn't
on. Our IBM bod suggested that the solution was to buy yet another
controller, dedicated to the printer.
--
Paul Smee, Computing Service, University of Bristol, Bristol BS8 1UD, UK
P.Smee@bristol.ac.uk - ..!uunet!ukc!bsmail!p.smee - Tel +44 272 303132
jeffe@sandino.austin.ibm.com (Peter Jeffe 512.823.4091/500000) (06/12/90)
In article <1990Jun11.092136.7800@gdr.bath.ac.uk> P.Smee@bristol.ac.uk (Paul Smee) writes:
>>Whether EBCDIC has codes for these characters is one question (to which the
>>answer is, yes it has); whether you can easily use those characters in an
>>IBM environment (under VM/CMS for example) is another question, to which
>>the answer is again, _yes_.
>
>Or, sometimes, _no_. This depends heavily on precisely what
>make/model, and even submodel of terminal you have (some but not all
>3270's work, for example) and on the precise details of how they are
>connected to the machine. And, possibly, even on how your MAINT has
>configured things.

On VM, I think that the only trick is in doing the appropriate CP command
to translate the incoming/outgoing codes; unfortunately (for you; fortunately
for me) it has been too long for me to remember the exact command, but it
was something like "CP TERM [IN | OUT] CHAR1 CHAR2"; I had it in my
profile.exec and it did the trick. I successfully worked with Whitesmith C
on a 9370 and CP did the terminal-bracket-to-Whitesmith-bracket conversion
just fine (there were occasional glitches, but nothing compared to having
to use the trigraph abominations).

Sorry I can't provide more info, but I've happily blocked out the whole
miserable experience.
----------------------------------------------------------------------
disclaimer: all persons (including myself) and events mentioned herein are
fictitious and, given the subjective nature of reality, can bear no
resemblance to any other person's conception of real persons or events.
Peter Jeffe   512.823.4091   jeffe@sandino.austin.ibm.com
...cs.utexas.edu!ibmchs!auschs!sandino.austin.ibm.com!jeffe
math1i7@jetson.uh.edu (06/12/90)
In article <1990Jun11.092136.7800@gdr.bath.ac.uk>, exspes@gdr.bath.ac.uk (P E Smee) writes:
> In article <3190@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>>In article <1600@hulda.erbe.se>, prc@erbe.se (Robert Claeson) writes:
>>> Glad you asked. Yes, trigraphs are used for work, especially when not in
>>> an ASCII environment. EBCDIC, for example, doesn't have brackets and braces,
>>

I have occasion to use C/370 on the IBM mainframes at work. The compiler
on that machine expects to find certain EBCDIC codes for left and right
brackets. These codes do not, unfortunately, display correctly on the
terminals. So someone in the support group wrote a pair of XEDIT macros
that automatically convert the codes for left and right brackets into
different codes that (usually) display correctly on the terminal, then
convert them back when you file and exit.

The problem with that is that different terminals (or should it be
different controllers) use different EBCDIC codes to display the
brackets... oh well (trigraphs do work).

Gordon Tillman
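The XEDIT macros Gordon Tillman describes amount to a byte translation applied on load and undone on file. A minimal C sketch of that idea follows; `swap_table` and its functions are hypothetical names, not anything from the thread or from XEDIT. If each compiler bracket code is swapped with the matching display code, the table is its own inverse, so applying it a second time restores the original bytes, even when the display codes also occur in the file. Which codes to pair depends on the compiler's and the terminal's code pages, which, as the thread shows, vary from site to site, so any particular values are placeholders.

```c
#include <stddef.h>

/* Hypothetical sketch: a byte translate table in which each listed pair
   of codes is swapped and every other byte maps to itself.  Because a
   swap is its own inverse, running a buffer through the same table a
   second time restores the original bytes (convert on load, convert
   back on file). */
typedef struct {
    unsigned char map[256];
} swap_table;

void swap_table_init(swap_table *t)
{
    int i;
    for (i = 0; i < 256; i++)
        t->map[i] = (unsigned char)i;   /* identity to start with */
}

/* Exchange codes a and b.  Neither code should already belong to another
   pair, or the table stops being its own inverse. */
void swap_table_add(swap_table *t, unsigned char a, unsigned char b)
{
    t->map[a] = b;
    t->map[b] = a;
}

void swap_table_apply(const swap_table *t, unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = t->map[buf[i]];
}
```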